**You are now following this question**

- You will see updates in your followed content feed.
- You may receive emails, depending on your communication preferences.

# Sorting points based on comparing distances.

63 views (last 30 days)

Show older comments

Hello, I am using matlab 2019a and I am trying to sort points based on comparing the distance between them. I use the pdist2 function to calculate the the distance between the points stored in matricies. One matrix has 2 sets of coordinates

A= [0 0, -5.52181708330649 5.78874218621840]

the other has 7 sets

B = [38.8197976955836 -29.0434116907097

-37.0532684880158 0.644925865563445

-4.49735986053992 57.3402937422674

-43.7442096431328 38.5935144262550

41.5359946082739 41.4696332098067

57.3572679057450 8.87552592324304

-29.8320366435934 -43.1286701525403]

After calculating the distance between each point in A and every point in B i end up with a 2x7 matrix C. C(1,1) is the distance between the first point in A and the first point in B while C(2,1) is the distance between point 2 in A and point 1 in B. I want to compare each of these distances between C(1,2) to C(2,2).... to the end and sort the points in B based on which point in A they are closest to. I hope to end up with a matrix of points closest to the first point in A and another matrix of points closest to the second point in A. If the code works with a variable number of points in A that would be greatly apprecitated. Thanks for the help.

##### 0 Comments

### Accepted Answer

Adam Danz
on 1 Sep 2019

Edited: Adam Danz
on 8 Sep 2019

Use the 2nd output to min() to pair the points in B to the nearest points in A. Then use splitapply to segment the points in B into nearest neighbors in A. This works for any amount of coordinates in A.

% Creat demo data

A= [0 0;

-5.52181708330649 5.78874218621840];

B = [38.8197976955836 -29.0434116907097

-37.0532684880158 0.644925865563445

-4.49735986053992 57.3402937422674

-43.7442096431328 38.5935144262550

41.5359946082739 41.4696332098067

57.3572679057450 8.87552592324304

-29.8320366435934 -43.1286701525403];

% compute distances between A, B

C = pdist2(A,B);

% or better: C = sqrt((B(:,1)-A(:,1)').^2 + (B(:,2)-A(:,2)').^2)';

% Split B into groups that are nearest to points in A

[~, minRowIdx] = min(C,[],1); %

neighborGroups = splitapply(@(x){x},B,minRowIdx(:));

neighborGroups{n} are all the coordinates in B that are closest to A(n,:).

##### 31 Comments

Vance Blake
on 1 Sep 2019

Thanks again Adam!

Vance Blake
on 2 Sep 2019

Hey Adam sorry to disturb you again but I was working with the code you gave and trying to calculate the vector addition in the cell arrays but Im having a bit of trouble. I am able to calculate the vectors but when I use the sum command it spits out an error. the sum command works with just the regular neighborGroups cell array but not when I assign them to a new variable D. The code I use to get the vectors is below but now I want to add the values in each group in D. I will then use the cell2mat command to return the 2 combined vectors so that I can plot them.

Edit: Nvm I figured it out. needed to use the same format i used to call the cell arrays in neighborGroups to get it right but I do have one question. How come D has to be a cell array but I can return sum D to regular matrix in E ?? Thanks again for your time and help!

SizeA = size(A);

for i = 1:SizeA

D{i,:} = neighborGroups{i:i, 1:1}-A(i,:);

E(i,:) = sum(D{i:i, 1:1});

end

Adam Danz
on 2 Sep 2019

To start, this line below is very confusing....

neighborGroups{i:i, 1:1}

...and it does the same thing as this

neighborGroups{i}

If you want to add up the x and y coordinates within each group, use this line below. "groupSums" will be a nx2 matrix where columns are [x,y] and rows are each neighborGroups.

groupSums = cell2mat(cellfun(@sum, neighborGroups, 'UniformOutput', false));

Vance Blake
on 2 Sep 2019

Hm when I tried it on my side it wasnt working when I only used i which i thought was strange as well. this code below worked when i tried it on the small scale but when I took it to my main code and changed the appropriate variable names it failed at this point below. and told me brace indexing is not supported for variables of this type.

D{i,:} = neighborGroups{i:i, 1:1}-A(i,:);

E(i,:) = sum(D{i:i, 1:1});

%%

for i = 1:SizeA

D{i,:} = neighborGroups{i:i, 1:1}-A(i,:);

E(i,:) = sum(D{i:i, 1:1});

end

E = 8*(E./norm(E));

radii_vein = 4;

centers_node = [E(:,1), E(:,2)];

plot(E(:,1), E(:,2), '*', 'color', 'k', 'markersize', 12);

viscircles(centers_node, radii_vein, 'color', 'k');

A = [A;E];

But what Im trying to do is calculate the vector between the HS stored in each group and the vein node they are closest to, and then sum each group of vectors normalize them and then plot the resultant 2 combined vectors on the graph. Im working with the same points from the big code we worked on earlier.

Adam Danz
on 2 Sep 2019

"Hm when I tried it on my side it wasnt working "

What you need to do is run my version (below), then run your version, and look carefully what the differences are. Ask yourself, does "neighborGroups" look the same between our versions? If you see a difference, look into your code to determine where the difference started. Also, please remind me what version of matlab you're using (I doubt that's the problem but it's good to know).

% compute distances between A, B

C = pdist2(A,B);

% Split B into groups that are nearest to points in A

[~, minRowIdx] = min(C,[],1); %

neighborGroups = splitapply(@(x){x},B,minRowIdx(:));

% show the first group

neighborGroups{1}

Vance Blake
on 2 Sep 2019

Im using matlab r2019a. and the output from my code and yours is the same. and when I run the next section where it goes

SizeA = size(A);

for i = 1:SizeA

D{i,:} = neighborGroups{i}-A(i,:);

E(i,:) = sum(D{i});

end

also works just fine and does exactly what I wanted as well. I changed it to your notation cause it was much clear and still work the same way.

Edit: in the main code it doesn't seem to be calculating D correctly it only reports the first group in D and not the second group in D which in turn messes up the calculation step for E. Im only changing variable names between the small scale and the main code. they are working with the same data points so the outputs should be the same if everything is working properly.

Vance Blake
on 2 Sep 2019

I figured it out I forgot to change one of the variable names in the main code (been a long day lol) but now its working properly. And you're right it does return a vector... someone on different question said that using length was bad and to use size instead. I don't want to use numel because it would be 4. Is the benefit of writing it as you suggested just so that it always makes sure to pull the largest number ie row number?

also i could make this a separate question just so that you can get credit for helping me find my error. It wasn't until i looked very closely at my code and yours that I realized I just forgot to change variable name.

Adam Danz
on 2 Sep 2019

" someone on different question said that using length was bad and to use size instead."

That's good advice but you need to use the 2nd input to size() to specify whether you want the number of rows (1) or columns (2).

The reason why length() is not a good choice is because it will always return the larger dimension. Let's say you only have 1 coordinate in "A". A = [2,3]; length(A) equal 2!

Instead, you want to loop through the coordinates in A and the coordinates are arranged by row. So you specifically want the number of rows which is obtained by size(A,1).

No need to make an additional question. Glad you solved this one!

Vance Blake
on 2 Sep 2019

Vance Blake
on 8 Sep 2019

Adam Danz
on 8 Sep 2019

Hi Vance, I responded to that question.

However, I didn't understand this

"I need to do this because as the black vein nodes expand across the graph they will absorb older HS and I don't have a clean way of removing them from my plot without clearing the axes and replotting the remaining points"

partailly because it's been a while since I've sifted through your code and comments and kind of forget the background. But also because I don't now what it means, "absorb older HS" and what you have to remove.

Vance Blake
on 8 Sep 2019

yeah it has been a bit but I am able to grow the next set of vein nodes (black dots in the picture attached) and I realized that as I expand this chain with more nodes then the older HS from previous generations red (HS gen 1), green (HS gen 2), will fail the various elimination tests and need to be removed but since they are plotted as sets I dont know how to remove just the specific point that fails without clearing the axes and replotting. But if i do that I would need to keep the points the same color so that the generation they originated from can be tracked.

As the chain of black vein nodes continues to grow the red HS that is nearest to the chain will eventaully fail the criteria specified in the code and then need to be removed form the plot.

Stephen
on 8 Sep 2019

"I dont know how to remove just the specific point that fails without clearing the axes and replotting"

Clearing and replotting is inefficient.

You can simply change the plotted data: e.g .for a line object change the XData and YData:

For this you will need to obtain and refer to explicit graphics handles (which of course in the interest of writing reliable, robust, predictable code you should be doing anyway).

Vance Blake
on 8 Sep 2019

Stephen
on 8 Sep 2019

"... in your example I call the graph handle ..."

I do not know what you mean by to "call" a graphics object's handle: functions are called, handles are not. A graphics object contains the data that is plotted, the handle is a reference to it. Thus you can change properties of the graphics object (e.g. the plotted data) using its handle:

":...just change the corresponding xy data to NAN or something else ??"

Sure, you could change the data to NaN, or just remove those points entirely by entirely replacing the data... whatever suits your workflow.

"How would that work with multiple points from different plot steps ??"

Vance Blake
on 8 Sep 2019

my bad by call i meant refer to the handle but i see what you mean. thanks for the help.

Edit: Is there a way for matlab to automatically recognize the handle for which a point is attached too??

@Adam thanks for the tip!

Bruno Luong
on 8 Sep 2019

Vance Blake
on 8 Sep 2019

Adam Danz
on 8 Sep 2019

"Is there a way for matlab to automatically recognize the handle for which a point is attached too"

If you click on a point and then run gco() it will return the handle to the object that contains that point. But I doubt that's the solution you're looking for.

Programmatically there is no builtin way to get the handle(s) to a object that contains a given coordinate. You could search the XData and YData for all objects and hope to find 1 and only 1 match. But that could be messy.

In your recent question I recommended gscatter() so you can color code different groups of data. Maybe scatter() would be better given this context. It keeps the data together in 1 handle.

h = scatter(1:5,1:5,50,[1 0 0; 1 0 0; 0 1 0; 0 1 0; 0 0 1])

Stephen
on 8 Sep 2019

"... then I select 1 point from 3 of the separate handles..."

What do you mean "select": interactively via the GUI (e.g. the user clicks on a plotted point/line/...) or some other method (e.g. programmatically, so you get an index or similar) ?

"Is there a way for matlab to automatically recognize the handle for which a point is attached too??"

Most likely that concept is entirely reversed to how you should manage your graphics data: you know (after all you write the code) which objects contains what kind of data, you also know which points have been plotted in which objects: the rest is basic MATLAB array manipulation.

Bruno Luong
on 8 Sep 2019

Vance Blake
on 8 Sep 2019

Bruno Luong
on 8 Sep 2019

Well when you sort [sic] (programmatically) the points you must know which points you keep. So it is your duty (the programmer) to keep track which points (the index) are keep and plotted, where, in which colors and what the graphic handle associated with.

Don't relly on MATLAB to tell you magically which points are plotted. You (the programmer) can and must keep track of those informations for your own use.

Of course you can compare the points coordinates between your array and what is in the plots, but this is BAD programming.

Stephen
on 8 Sep 2019

"...therefore stored under different handle names"

Most likely a terrible approach. Just use an array of handles (much simpler).

"So I was hoping for a solution whereby I could use the set of coordinates to recover their corresponding handle and then programmably edit the x y data under the corresponding handle to remove the point from the graph so that I don't have to clear and re plot."

For some reason you are thinking that you need to go around in a rather complicated indirect route, like reaching London from Paris by sailing down Africa, around Cape Horn, up the Pacific, through the Northern Passage, ... et voici!

You do NOT need to "recover" any handles. You already know everything: the groups, the data, which handle in the handle array a particular type of data was plotted to. You really just need indexing (which means you keep track of your own data), followed by set or whatever.

Adam Danz
on 8 Sep 2019

Use scatter() to plot your data and control the color of each coordinate. That way all of your XData and YData are in 1 handle.

Let's say you need to remove the coordinates stored at (x,y) and the object handles is 'h',

h.XData(h.XData == x) =NaN;

h.XData(h.yData == y) =NaN;

This assumes x and y have an exact match in your data. If you run into round-off error where your x or y value differs by an extremely tiny amount from your XData and YData, you may have to build in a tolerance and we (or better yet, google or answers in this forum) can help you with that).

Vance Blake
on 8 Sep 2019

@Stephen @Bruno I know im a bad programmer thats why I on here asking for help lol. But when it comes to creating an array of handle names how do I assign a handle name to an object that doesnt exist yet? I think you guys are assuming that my variables are all created and existing at the same time but thats not the case. If I want to use indexing for storing my plots, handle names, or whatever then wouldnt each of them have to be created already or after creation appended in to the same matrix/cell array?? Unless Im misunderstanding something which could very well be (and probably is) the case. I have read the tutorial documents for not using eval and such but it seems im missing something about indexing.

Either way thank your for all of your help. I would not have gotten to this point without it. I will look into and hopefully understand all the information that has been shown to me.

Stephen
on 8 Sep 2019

"when it comes to creating an array of handle names how do I assign a handle name to an object that doesnt exist yet?"

I do not know what "handle names" are.

However graphics objects certainly do NOT need to all exist when you create the handles array for their handles, as you can just preallocate a handles array:

or (less optimally) collect the handles into a container array, e.g. a cell array.

"I think you guys are assuming that my variables are all created and existing at the same time but thats not the case."

By the time you have plotted that data, then you have created the graphics objects, so you will certainly be able to store their handles.

"...after creation appended in to the same matrix/cell array??"

Yes. That is how robust code is writen. That is what we are advising you to do store the graphics handles in one handles array, so you can trivially access them later.

"I have read the tutorial documents for not using eval and such but it seems im missing something about indexing."

That intent of that link was to dissuade you from using "different handle names" as you wrote earlier, as it seemed that you were referring to dynamic variable names. But now I notice that you use the term "handle names" to apparently refer to graphics object handles themselves, so perhaps that link was superfluous. Or perhaps not, depending on your meaning of "handle name".

Bruno Luong
on 9 Sep 2019

Handle "name"? Exists at the creation? Not sure what you are concerned.

I show you a pseudo test code.

% Generate some dummy test data

A = rand(5,2);

B = rand(100,2);

idx = nearestNeighbor(delaunayTriangulation(A),B);

gidx = unique(idx);

ngrp = length(gidx);

h = zeros(1,ngrp); % prealloate a table of graphic handles

ColorTable = jet(ngrp);

figure

hold on

for k = 1:ngrp

Bgrp = B(idx==gidx(k),:); % filter

Color = ColorTable(k,:); % Select the color of your specific group

h(k) = plot(Bgrp(:,1), Bgrp(:,2), 'o', 'Color', Color, 'MarkerFace', Color, 'Markersize', 10);

end

h(k) is array of handles, each correspond to a plot with a group.

You know which plot contains which B from idx. The information is encoded in arrays idx, gidx, ngrp, h. You can further split idx and ngrp of cell, put them in array of structures (or table), gives it a name for each individual structure if you like.

What is the problem with those arrays, what you can't figure out with array of handles h()?

### More Answers (1)

Bruno Luong
on 8 Sep 2019

Edited: Bruno Luong
on 8 Sep 2019

You might learn to use Delaunay triangulation that can find nearrest points much more efficiently than pdist2 or such (applicable if the number of points of A is >=3). scatteredInterpolant is based in Delaunay T.

A=rand(1000,2);

B=rand(1000,2);

% pdist2

[~,i]=min(pdist2(A,B),[],1)

% scatteredInterpolant

F=scatteredInterpolant(A(:,1),A(:,2),(1:size(A,1))','nearest');

i = F(B(:,1),B(:,2))'

% Direct call of Delaunay

i = nearestNeighbor(delaunayTriangulation(A),B)'

##### 6 Comments

Adam Danz
on 8 Sep 2019

@Bruno Luong, if we're aiming for efficiency then apply the pythagorean theorem directly is the best solution to calculate distance.

c2 = sqrt((B(:,1)-A(:,1)').^2 + (B(:,2)-A(:,2)').^2)';

This produces the same result as

C = pdist2(A,B);

Vance Blake
on 8 Sep 2019

Adam Danz
on 8 Sep 2019

Using the A, B data in the question, the result is

Warning: Insufficient number of data points to create a triangulation-based interpolant.

ans =

[]

but this is interesting!

I compared the distribution of tic/toc times for both methods across 10000 repetitions. The scatteredInterpolant method is about 2x faster than the median speed of pdist2 .

Bruno Luong
on 8 Sep 2019

DELAUNAY is even better in this sense:

When you compute PDIST2 of M x N points it computes O(M x N) arithmetic operations. M is the number of points in A, and N in B.

When you call DELAUNAY for NEAREST search, the complexity is (M log(M)) to build F, and then N*log(M) to query nearest point.

For M = 1000 and N = 1000 you reduce the compuation from billions (pdist2) to few thousands (delaunay based).

It's the complexity of the search not about computing euclidian distance.

First NOTE: As I said you need >= 3 points in A to be able to use Delaunay.

Second NOTE: you don't even need to call SQRT to compute nearest by "dumb" method. Call min(D.^2) leads to the same result.

Bruno Luong
on 8 Sep 2019

"If you could expand more on what is specifically happening with each step of your code that would be great. "

Well I apply pretty directly scatteredInterpolant or delaunayTriangulation on your data points, there is nothing mysterious about it. Please read the doc of these functions/class. It's pretty straighforward once you know what it supposes to do.

Vance Blake
on 8 Sep 2019

### See Also

### Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!**An Error Occurred**

Unable to complete the action because of changes made to the page. Reload the page to see its updated state.