How to calculate similarity between any two points using a custom distance function

I have a data set with m rows (data points) and 4 columns (properties), as below:
```
     i    j    k    p
1    30   50   30   1.2
2    30   60   30   1.3
..   ..   ..   ..   ..
m    80   40   40   0.2
```
i, j, k are coordinates, and p is another property, say weight.
I want to define the distance (dissimilarity) between any two data points as ( sqrt( (i1-i2)^2+(j1-j2)^2+(k1-k2)^2 ) / Dmax ) × ( |p1-p2| / Pmax ), where Dmax is the maximum distance between two points and Pmax is the maximum weight gap between two points.
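As a minimal sketch, the dissimilarity defined above can be computed for all pairs at once in base MATLAB. It assumes Dmax is the largest pairwise Euclidean distance over (i, j, k) and Pmax is the largest weight gap, max(p) - min(p); those definitions are an interpretation of the question, and the variable names are illustrative.

```matlab
% Example data: columns are [i j k p], one data point per row
M = [30 50 30 1.2
     30 60 30 1.3
     80 40 40 0.2];

X = M(:, 1:3);   % coordinates
p = M(:, 4);     % weight

% Pairwise Euclidean distances between coordinate rows
% (implicit expansion, MATLAB R2016b or later)
n = size(X, 1);
E = zeros(n);
for c = 1:3
    E = E + (X(:, c) - X(:, c).').^2;
end
E = sqrt(E);

P = abs(p - p.');    % pairwise weight gaps |p1 - p2|

Dmax = max(E(:));    % assumed: largest coordinate distance between two points
Pmax = max(P(:));    % assumed: largest weight gap between two points

Dn = (E / Dmax) .* (P / Pmax)   % normalized dissimilarity matrix
```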
Can anyone give me a little hint for this one? If pdist does not work for this, is there another function I can use? Or do I have to write code that calculates the dissimilarity every time, merges the points with the smallest dissimilarity, updates the dissimilarity matrix and the original data matrix, and repeats the cycle? If that is the way, is there an efficient way to get it done?
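The merge loop described above can be sketched in base MATLAB as follows. This is a minimal sketch that assumes single linkage as the update rule (after a merge, the new group's dissimilarity to any other group is the smaller of its members' values); the question does not specify a rule, and average or complete linkage would change only the update line. It also drops Dmax and Pmax, which does not change which pair is closest. Each merge scans the whole n-by-n matrix, so the loop is O(n^3) overall and suits only modest n.

```matlab
M = [30 50 30 1.2
     30 60 30 1.3
     80 40 40 0.2];
n = size(M, 1);

% Dissimilarity matrix: Euclidean distance of (i,j,k) times |p1 - p2|
D = zeros(n);
for r = 1:n
    for c = 1:n
        D(r, c) = norm(M(r,1:3) - M(c,1:3)) * abs(M(r,4) - M(c,4));
    end
end

D(1:n+1:end) = Inf;           % ignore self-dissimilarities
merges  = zeros(n-1, 2);      % indices of the two groups merged at each step
heights = zeros(n-1, 1);      % dissimilarity at which they were merged
for s = 1:n-1
    % Closest pair among the groups that still exist
    [h, idx] = min(D(:));
    [a, b] = ind2sub([n n], idx);
    merges(s, :) = sort([a b]);
    heights(s) = h;
    % Single-linkage update: the merged group keeps index a, and its
    % dissimilarity to every other group is the smaller of the two old values
    D(a, :) = min(D(a, :), D(b, :));
    D(:, a) = D(a, :).';
    D(a, a) = Inf;
    % Retire group b so it is never picked again
    D(b, :) = Inf;
    D(:, b) = Inf;
end
```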
Thanks a lot.
Derrick

Accepted Answer

Star Strider on 15 Jul 2016
I would leave out ‘Dmax’ and ‘Pmax’ because they're constants: dividing every distance by the same positive number rescales all of them equally, so it does not change which pairs are closest, and that is what you want. Computing them would only impair the efficiency of your code.
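One way to check this is to sort the pairwise distances with and without the division and compare the orderings. A minimal sketch, assuming Dmax and Pmax are the largest pairwise distance and the largest weight gap in the data:

```matlab
M = [30 50 30 1.2
     30 60 30 1.3
     80 40 40 0.2];
pairs = nchoosek(1:size(M,1), 2);   % every pair of rows, one pair per row
np = size(pairs, 1);

E = zeros(np, 1);   % Euclidean distance of (i,j,k) for each pair
P = zeros(np, 1);   % weight gap |p1 - p2| for each pair
for q = 1:np
    E(q) = norm(M(pairs(q,1),1:3) - M(pairs(q,2),1:3));
    P(q) = abs(M(pairs(q,1),4) - M(pairs(q,2),4));
end

raw    = E .* P;                          % without Dmax and Pmax
scaled = (E / max(E)) .* (P / max(P));    % with Dmax and Pmax

% Dividing by a positive constant preserves the order of the distances
[~, order_raw]    = sort(raw);
[~, order_scaled] = sort(scaled);
same_order = isequal(order_raw, order_scaled)
```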
I would create a distance function and use pdist:
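```matlab
M = [30 50 30 1.2
     30 60 30 1.3
     80 40 40 0.2];
% pdist calls the function with one row x1 and a matrix x2 holding one or
% more rows, and expects a column vector with one distance per row of x2
% (implicit expansion needs R2016b or later; use bsxfun in older releases)
dist_fcn = @(x1,x2) sqrt(sum((x1(1:3) - x2(:,1:3)).^2, 2)) .* abs(x1(4) - x2(:,4));
D = pdist(M, dist_fcn);
Dm = squareform(D)
```
```
Dm =
        0        1   51.962
        1        0   60.249
   51.962   60.249        0
```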
See the documentation for pdist and squareform for details.
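For the merge-and-update loop asked about in the question, the Statistics and Machine Learning Toolbox function linkage performs agglomerative clustering directly on the vector returned by pdist. A minimal sketch, assuming single linkage (merge by smallest dissimilarity) is the rule you want; cutting the tree into 2 clusters is only an example:

```matlab
M = [30 50 30 1.2
     30 60 30 1.3
     80 40 40 0.2];

% Custom distance: Euclidean distance of (i,j,k) times |p1 - p2|,
% returning one distance per row of x2 as pdist requires
dist_fcn = @(x1,x2) sqrt(sum((x1(1:3) - x2(:,1:3)).^2, 2)) .* abs(x1(4) - x2(:,4));

D = pdist(M, dist_fcn);              % condensed distance vector
Z = linkage(D, 'single');            % repeatedly merge the closest groups
labels = cluster(Z, 'maxclust', 2)   % cut the merge tree into 2 clusters
```

Each row of Z records which two groups were merged and at what dissimilarity, so the whole merge history is available without writing the loop by hand.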
Derrick Fu on 19 Jul 2016
Oh my god, you are a genius! Thank you so much Star!!!


More Answers (1)

John BG on 14 Jul 2016
Hi Derrick
Your approach

( sqrt( (i1-i2)^2+(j1-j2)^2+(k1-k2)^2 ) / Dmax ) × ( |p1-p2| / Pmax )

has a problem: your input has 4 variables for each point,

[i j k p]

and dividing by Dmax and Pmax unnecessarily reduces resolution. So:
1. Arrange the data in an m*4 array:
```matlab
ore1 = [30 50 30 1.2;
        30 60 30 1.3;
        % ... remaining rows ...
        80 40 40 0.2];
```
2. Say you have a number of pairs to measure; arrange them in another array with two columns, one pair per row:
```matlab
check1 = [1 3;
          20 106;
          50 19;
          % ... more pairs ...
          ];
```
where each row check1(k,:) points at 2 different rows of ore1.
3. Since
* the coordinates you are using do not seem to have negative values,
* and you are using the weight as a 4th dimension,

try skipping the Euclidean distance and just sum the absolute differences:
```matlab
npairs = size(check1, 1);
meas1 = zeros(1, npairs);           % meas1 holds one measurement per pair
for k = 1:npairs
    L1 = ore1(check1(k,1), :);      % first point of pair k
    L2 = ore1(check1(k,2), :);      % second point of pair k
    meas1(k) = sum(abs(L1 - L2));   % city-block distance over [i j k p]
end
```
If not all coordinates are positive, perhaps you can shift the origin to a point far enough away that they all are.
4. Now, in meas1, the pairs that are more similar (closer to each other in the ore) have lower values. Invert them so that higher values mean higher similarity:
```matlab
meas12 = max(meas1) - meas1;
```
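Steps 2 to 4 can also be written without a loop. A minimal sketch, assuming you want every pair rather than a hand-picked list (nchoosek builds m*(m-1)/2 pairs, which grows quickly with m); the variable most_similar_pair is illustrative:

```matlab
% Example ore: columns are [i j k p]
ore1 = [30 50 30 1.2
        30 60 30 1.3
        80 40 40 0.2];
m = size(ore1, 1);

% Step 2: every unordered pair of rows, one pair per row
check1 = nchoosek(1:m, 2);

% Step 3 without a loop: sum of absolute differences over [i j k p].
% Only coordinate differences enter, so shifting the origin leaves meas1 unchanged.
A = ore1(check1(:,1), :);
B = ore1(check1(:,2), :);
meas1 = sum(abs(A - B), 2).';

% Step 4: higher value = more similar; report the most similar pair
meas12 = max(meas1) - meas1;
[~, best] = max(meas12);
most_similar_pair = check1(best, :)
```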
You can't go much simpler than these lines.
Check the time with tic/toc on a small sample to estimate how long larger ores will take.
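A minimal timing sketch: the random ore and pairs below are hypothetical stand-ins for a small sample, and the extrapolation assumes the cost per pair stays roughly constant, so total time grows about linearly with the number of pairs.

```matlab
% Hypothetical random sample standing in for a small part of the ore
nsample = 1000;
ore1 = [randi(100, nsample, 3), rand(nsample, 1)];   % [i j k p]
check1 = randi(nsample, 5000, 2);                    % 5000 random pairs

tic
meas1 = zeros(1, size(check1, 1));
for k = 1:size(check1, 1)
    meas1(k) = sum(abs(ore1(check1(k,1),:) - ore1(check1(k,2),:)));
end
t = toc;

% Scale up to estimate the time for a larger job (hypothetical 1e7 pairs)
npairs_big = 1e7;
t_estimate = t * npairs_big / size(check1, 1)
```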
You can also try

(a*(i1-i2)^2 + b*(j1-j2)^2 + c*(k1-k2)^2 + d*(p1-p2)^2)^.5

where a, b, c, d may be chosen, for instance, as

a=1; b=1; c=1; d=10
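The weighted formula above can be evaluated for all pairs at once by squaring the coordinate differences and multiplying by a weight vector. A minimal sketch, assuming the example weights a=1, b=1, c=1, d=10 and a small example list of pairs:

```matlab
ore1 = [30 50 30 1.2
        30 60 30 1.3
        80 40 40 0.2];
check1 = [1 2; 1 3; 2 3];      % example pairs of row indices

w = [1 1 1 10];                % weights a, b, c, d

% Weighted Euclidean distance for every pair:
% sqrt(a*di^2 + b*dj^2 + c*dk^2 + d*dp^2)
diffs = ore1(check1(:,1), :) - ore1(check1(:,2), :);
meas_w = sqrt(diffs.^2 * w(:))     % column vector, one distance per pair
```

Raising d makes differences in weight count more relative to differences in position.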
Try measuring the distance between the same 2 points with different distance methods and you will realise that Dmax and Pmax are not needed.
You can still normalise the distance measurements, but do it after calculating the distances.
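A minimal sketch of normalising once all distances are known, shown with two common choices (dividing by the largest distance, or min-max rescaling to [0, 1]); which one suits your data is an assumption to check:

```matlab
ore1 = [30 50 30 1.2
        30 60 30 1.3
        80 40 40 0.2];
check1 = nchoosek(1:size(ore1,1), 2);
meas1 = sum(abs(ore1(check1(:,1),:) - ore1(check1(:,2),:)), 2).';

% Normalise only after all distances have been computed
meas_unit = meas1 / max(meas1);    % largest distance becomes 1
% Min-max rescaling to [0, 1]; undefined if all distances are equal
meas_range = (meas1 - min(meas1)) / (max(meas1) - min(meas1));
```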
If you find this answer of any help in solving your question,
please mark it as the accepted answer or click the thumbs-up vote link.
Thanks in advance,
John

