How to compare array's values with each other?

Given an array a with indices from 1 to m, I want to compare its values with each other pairwise. If the absolute difference between two values a(i) and a(j) is greater than a value z, I want to save the two indices i and j and show them in the output. I wrote this code:
if abs(a(i)-a(j))> z
disp(i);
disp(j);
fprintf('result is between %10.6f and %10.6f',i,j);
end
but the if line raises an error:
Subscript indices must either be real positive integers or logicals.
How should I define the indices in MATLAB? Is a for loop (for i=1:m) needed to traverse the array? If a loop is necessary, should I move fprintf outside the loop so that it is not repeated? For saving and displaying the indices i and j, I'm also looking for better functions than disp or fprintf.

Accepted Answer

Guillaume on 31 Jul 2019
It's unclear how you would get that error if your i and j were simply created with for i = 1:m and for j = 1:m. They must be something else for you to get that error.
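One possible cause, offered as an assumption because the question does not show how i and j were defined: in MATLAB, i and j return the imaginary unit until they are assigned, and indexing with a complex value raises exactly this kind of subscript error. A minimal loop sketch that assigns both indices and collects the pairs (the sample values of a and z are hypothetical):

```matlab
a = [1 5 2 9];                  % hypothetical sample data
z = 3;                          % hypothetical threshold
m = numel(a);
pairs = zeros(0, 2);            % one row per index pair found
for i = 1:m                     % assigning i and j shadows the imaginary unit
    for j = i+1:m               % j > i, so each unordered pair is tested once
        if abs(a(i) - a(j)) > z
            pairs(end+1, :) = [i, j];   % save the pair instead of printing it
        end
    end
end
% print all saved pairs once, after the loops
fprintf('difference exceeds z between indices %d and %d\n', pairs.');
```

Collecting the pairs in a matrix and printing after the loops keeps the output in one place, and %d is the natural format specifier for integer indices.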
Anyway, assuming a is a vector and you are using MATLAB R2016b or later, this is very straightforward:
distance = abs(a - a.')
will create an m-by-m matrix of the distances between a(i) and a(j) for all i and j.
Finding the i and j of the elements for which the distance is greater than z is also easy:
[i, j] = find(distance > z)
which you could store in a two-column matrix if you wanted:
pairs = [i, j]
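As a small worked example of the steps above (the sample values are hypothetical): because the distance matrix is symmetric, every pair appears twice, once as (i, j) and once as (j, i). One way to keep each pair only once, which is an addition to the answer and not part of it, is to mask the matrix with triu before calling find:

```matlab
a = [1 5 2 9];                          % hypothetical sample data
z = 3;                                  % hypothetical threshold
distance = abs(a(:) - a(:).');          % m-by-m; needs implicit expansion (R2016b+)
% triu(..., 1) keeps only entries strictly above the diagonal, i.e. i < j
[i, j] = find(triu(distance > z, 1));
pairs = [i, j];                         % one row per unordered pair
```

Using a(:) makes the sketch work whether a is a row or a column vector.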

5 Comments

phdcomputer Eng on 31 Jul 2019
Edited: phdcomputer Eng on 31 Jul 2019
Thank you very much.
I'm very grateful for your help.
I wrote a program to select the best features of a dataset, and the aim of my question about comparing an array's elements was to find the best threshold for cutting features.
First I loaded the lung dataset. Because this data has 2 class labels, I separated them and then computed the Hamming distance for each feature. Then I sorted the distances in descending order and saved the results in the array A. I calculated the z value, and with your help I could find the best range of thresholds for feature selection.
I wanted to ask your opinion about the results in the pairs matrix.
close all;
clc
load lung.mat
data=lung;
[n,m]=size(data);
l=1;
t=1;
data1=[];
data2=[];
for i=1:n
if data(i,m)==1
data1(l,:)=data(i,:);
l=l+1;
else
data2(t,:)=data(i,:);
t=t+1;
end
end
if t>l
data1(l:t-1,:)=0;
else
data2(t:l-1,:)=0;
end
for i=1: m
thisCol1=data1(:,i);
thisCol2=data2(:,i);
a(i)=fHammingDist(thisCol1,thisCol2);
end
[A,indA1]=sort(a,'descend');
z=sum(A)/(m-1);
distance=bsxfun(@minus,A,A.');
[i,j]=find(distance>z);
pairs=[i,j];
According to the pairs results in the output:
pairs5 =
55 1
56 1
57 1
55 2
56 2
57 2
56 3
57 3
56 4
57 4
56 5
57 5
56 6
57 6
56 7
57 7
56 8
57 8
56 9
57 9
56 10
57 10
56 11
57 11
56 12
57 12
56 13
57 13
57 14
57 15
57 16
57 17
57 18
57 19
57 20
57 21
57 22
I wanted to set the threshold between a(i) and a(j) where the difference between them is greater than z, but I don't know how to find the threshold among these values. I would be very grateful for your opinion.
Thanks
Guillaume on 31 Jul 2019
Edited: Guillaume on 31 Jul 2019
I'm really confused as to what you're trying to do. Why are you calculating distances between your Hamming distances? On the other hand, it's not my field, so maybe it makes sense to calculate the distance of distances.
I also don't understand why you're sorting your Hamming distances, thereby separating their ordering from the ordering of the feature vectors. The z calculation and the distance-of-distances calculation don't depend on the order, so why?
----
Unrelated to this, the data1 and data2 separation can be done more simply with just:
data1 = data(data(:, end) == 1, :);
data2 = data(data(:, end) ~= 1, :);
data1(end+1:size(data2, 1), :) = 0; %will add rows to data1 if shorter than data2, otherwise does nothing
data2(end+1:size(data1, 1), :) = 0; %will add rows to data2 if shorter than data1, otherwise does nothing
And it would be wiser to use a cell array instead of numbered variables (numbered variables are always a bad idea), particularly if in the future you have more than 2 classes:
classdata{1} = data(data(:, end) == 1, :);
classdata{2} = data(data(:, end) ~= 1, :);
classdata{1}(end+1:size(classdata{2}, 1), :) = 0; %will add rows to classdata{1} if shorter than classdata{2}, otherwise does nothing
classdata{2}(end+1:size(classdata{1}, 1), :) = 0; %will add rows to classdata{2} if shorter than classdata{1}, otherwise does nothing
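To illustrate why a cell array scales better, here is a hedged sketch for an arbitrary number of classes. The variable names classdata, labels and maxRows and the sample matrix are hypothetical; only the convention that the last column holds the class label comes from the thread:

```matlab
% hypothetical data: two feature columns, last column is the class label
data = [0.1 0.2 1; 0.3 0.4 2; 0.5 0.6 1; 0.7 0.8 3];
labels = unique(data(:, end));            % distinct class labels, sorted
classdata = cell(numel(labels), 1);       % one cell per class
for k = 1:numel(labels)
    classdata{k} = data(data(:, end) == labels(k), :);
end
% zero-pad every class to the row count of the largest class
maxRows = max(cellfun(@(c) size(c, 1), classdata));
for k = 1:numel(classdata)
    classdata{k}(end+1:maxRows, :) = 0;   % does nothing if already maxRows long
end
```

The same loop handles 2 or 20 classes without any new variable names.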
@Guillaume Thank you very much for your valuable and helpful tips; I'm very grateful for your attention.
I calculated the distance between the part of each feature that belongs to class 1 and the part of the feature that belongs to class 2.
My aim was to select the most discriminative features of the data by sorting these distance values in descending order and then cutting at the larger values, keeping just that number of features and discarding the rest.
For this purpose, when I plot the sorted distances (A), it is very hard to find the best threshold for cutting the features by observation.
I wanted to use the z value in this way: if the difference between two computed values (elements of A, for example at i and j) is greater than z, the program keeps the indices of the two features.
Using this approach I obtained the above results for i and j, but they seem meaningless. I think i and j should be consecutive, for example 10 and 11, so that we can select 10 features of the data.
Your valuable advice will help me a lot.
Thanks greatly
Guillaume on 9 Aug 2019
"My aim was selecting the most discriminative features among all of the features of the data by sorting these distance values descendingly and then cut the greater values so just keep this count of features and discard the rest of them."
As I said, this is not my field. If "most discriminative features" is equivalent to the pairs of features with the largest Hamming distance between them, then that part makes sense.
What I don't understand is what you do afterwards. If you have Hamming distance a(i) between features V(m) and V(n), and Hamming distance a(j) between features V(x) and V(y), what does a(i)-a(j) mean (which is what you calculate with your distance)?
phdcomputer Eng on 11 Aug 2019
Edited: phdcomputer Eng on 11 Aug 2019
Thanks. I'm very grateful for your attention.
First I tried to find the best point for cutting the features by plotting the sorted distances (A), but it's complicated: in some parts of the figure the values change gradually, while at some points the decrease is sudden. Sometimes I can't choose which point is better as a threshold, for example when the values drop suddenly at several points.
As you said, suppose a(i) and a(j) are distances.
a(i)-a(j) is the difference between two points in the plot(A) figure. We can impose the condition that if the difference between two points in the figure is more than a computed value, for example z, we keep the points i and j and cut at i features.
By threshold I mean the point after which the distance values decrease markedly.
Thank you very much
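One way to read the "sudden drop" idea above is to compare only consecutive values of the sorted array rather than all pairs. The following is a sketch under that assumption, not a method confirmed anywhere in this thread; the sample values and the names drops, cutIdx and nKeep are hypothetical:

```matlab
A = [9 8.5 8 4 3.8 3.5 1];     % hypothetical distances, sorted in descending order
z = 2;                         % hypothetical drop threshold
drops = -diff(A);              % drops(k) = A(k) - A(k+1), nonnegative for descending A
cutIdx = find(drops > z);      % every k where the drop after position k exceeds z
[~, biggest] = max(drops);     % position of the single largest drop
nKeep = biggest;               % keep the first nKeep entries of the sorted list
```

With the index vector returned by sort in the code earlier in this thread (indA1), the original feature numbers of the kept entries would be indA1(1:nKeep).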


More Answers (1)

Jon on 31 Jul 2019
Staying close to what you have started here, you could put your code into a double loop, for example:
% assign threshold
z = 10; % or whatever your threshold is
% find number of elements to loop through
N = length(a);
% preallocate logical array to hold results
% elements of D will be set to true (1) when
% a(i) and a(j) are further apart than threshold
D = false(N,N);
for i = 1:N
    for j = 1:N
        D(i,j) = abs(a(i)-a(j)) > z;
    end
end
% display indices of elements whose absolute difference exceeds threshold, z
[idxI, idxJ] = find(D);
disp(idxI)
disp(idxJ)

5 Comments

Jon on 31 Jul 2019
I had not seen Guillaume's answer until after I posted mine, but his approach is clearly much more compact and in the spirit of good MATLAB programming: avoid loops and vectorize when possible. In this case he takes good advantage of MATLAB's implicit expansion capability, introduced in R2016b. See https://www.mathworks.com/help/matlab/matlab_prog/compatible-array-sizes-for-basic-operations.html and also https://blogs.mathworks.com/loren/2016/10/24/matlab-arithmetic-expands-in-r2016b/
I still forget that it is now a well-defined operation to, for example, subtract a row vector from a column vector (assuming they both have the same number of elements).
Even before R2016b, you could do the same with bsxfun (or meshgrid, ndgrid, or repmat at the expense of a bit more memory):
distance = bsxfun(@minus, a, a.'); %works both pre- and post- R2016b, but from R2016b implicit expansion is usually faster
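A quick check of the equivalence described above, using hypothetical sample values. It also shows that without abs the result is signed, which matters when comparing it against a threshold:

```matlab
a = [1 5 2 9];                     % hypothetical row vector
dOld = bsxfun(@minus, a, a.');     % works before and after R2016b
dNew = a - a.';                    % implicit expansion, R2016b or later
sameResult = isequal(dOld, dNew);  % both forms build the same 4-by-4 matrix
% For a row vector a, element (r, c) is a(c) - a(r), so the matrix is
% antisymmetric; wrap it in abs() to get a symmetric distance.
signedEntry = dNew(1, 2);          % a(2) - a(1)
```

If abs is omitted, find(d > z) reports each qualifying pair in one orientation only, namely the one where the difference is positive.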
Jon on 31 Jul 2019
Edited: Jon on 31 Jul 2019
Good point.
One question @Guillaume, I notice you use the transpose operator:
.'
rather than the conjugate transpose operator:
'
For real vectors the result will be the same. Do you recommend using the transpose operator even when the problem involves only real matrices?
Guillaume on 31 Jul 2019
For real matrices, I don't think there's any difference in performance between the two, so you can indeed use either.
However, since the OP never specified that the vectors were purely real, and since the original code would have worked with complex numbers, I used the plain transpose so as not to change the meaning of the distance formula.
By default, I tend to use .' so that the code works the same with real or complex numbers when all that is meant is changing the direction of a vector.
I'm not a mathematician, so maybe it makes sense that the shorter ' is a conjugate transpose. If the design had been up to me, I would have swapped the meaning of the two so that ' was a plain transpose and .' a conjugate transpose.
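A small sketch of the difference between the two operators on complex data (the sample vector is hypothetical), and of how the conjugate transpose would change the distance formula:

```matlab
v = [1+2i, 3-4i];          % hypothetical complex row vector
p = v.';                   % plain transpose: [1+2i; 3-4i]
c = v';                    % conjugate transpose: [1-2i; 3+4i]
dPlain = abs(v - v.');     % diagonal is zero: each element compared with itself
dConj  = abs(v - v');      % diagonal is |v(k) - conj(v(k))| = 2*|imag(v(k))|
```

With ' the "distance" of an element to itself is no longer zero for complex input, which is why .' preserves the meaning of the original formula.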
Jon on 31 Jul 2019
@Guillaume - Thanks for the explanation.
