Use a custom distance with kmeans
Emanuele Gandolfi, 13 Jan 2022
Commented: Walter Roberson, 15 Jan 2022
            Hello everyone. 
I'm not very good with MATLAB, so I'm asking for help. For a university project I need to group users who are farthest away from each other within a rectangular area. I am using kmeans and I see two possibilities: the first is to write a custom distance function, but I have read that I should use kmedoids for that; the second is to pass it a custom distance matrix.
At the moment I am following the second path, but I do not understand how to do it. Here is the code I have so far:
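N = 10;
x = rand(N,1)*5;                    % users' x-coordinates (rectangle 5 wide)
y = rand(N,1)*2.5;                  % users' y-coordinates (rectangle 2.5 high)
figure
scatter(x, y)
M = [x, y];
num_medoids = 1;
eucli_dis = pdist(M);               % pairwise distances as a row vector
eucli_dis = squareform(eucli_dis);  % converted to an N-by-N matrix
inv_eucli_dis = 1./eucli_dis;       % note: the diagonal becomes Inf (1/0)
for ii = 1:N                        % 1:N rather than a hard-coded 1:10
    text(M(ii,1), M(ii,2), num2str(ii));
end
% gscatter needs a group vector as its third argument, e.g. cluster indices idx:
% gscatter(M(:,1), M(:,2), idx)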
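For reference, the 'Distance' option of kmeans only accepts built-in names ('sqeuclidean', 'cityblock', 'cosine', 'correlation', 'hamming'), so neither a custom distance function nor a precomputed distance matrix can be passed to it. kmedoids does accept a function handle. Below is a minimal sketch, assuming the Statistics and Machine Learning Toolbox and R2016b or later (for implicit expansion). The handle name myDist and k = 2 are hypothetical, and the distance is just Euclidean distance written by hand to show the required signature:

```matlab
% Sketch: kmedoids with a custom distance function handle.
% Requires the Statistics and Machine Learning Toolbox.
rng(0);                               % reproducible random points
N = 10;
M = [rand(N,1)*5, rand(N,1)*2.5];     % N users in a 5-by-2.5 rectangle
k = 2;                                % hypothetical number of groups

% Custom distance: ZI is 1-by-p (one row of M), ZJ is m-by-p (several rows).
% It must return an m-by-1 column of distances from ZI to each row of ZJ.
% Here it is ordinary Euclidean distance, written out only to show the signature.
myDist = @(ZI, ZJ) sqrt(sum((ZJ - ZI).^2, 2));

[idx, C] = kmedoids(M, k, 'Distance', myDist);

% gscatter takes the group vector as its third argument
figure
gscatter(M(:,1), M(:,2), idx)
hold on
plot(C(:,1), C(:,2), 'kx', 'MarkerSize', 12, 'LineWidth', 2)   % medoids
for ii = 1:N
    text(M(ii,1), M(ii,2), num2str(ii));
end
hold off
```

An "inverse distance" such as 1./d is Inf between a point and itself, so it does not behave like a distance; plugged into a medoid-based method like this it is likely to give meaningless groups.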
6 comments
Walter Roberson, 13 Jan 2022
What is the code for your custom distance function?
What is the code for your call to kmedoids?
Emanuele Gandolfi, 14 Jan 2022
Edited: Walter Roberson, 14 Jan 2022
Accepted Answer
Image Analyst, 14 Jan 2022
Here is code that randomly lays down points, draws a thin black line between a pair of points if they are far away from each other, and draws a thicker red line between a pair if the points are close together:
N = 10
x=rand(N,1)*5
y=rand(N,1)*2.5 
plot(x, y, 'b.', 'MarkerSize', 30);
grid on;
xy = [x(:), y(:)];
distances = pdist2(xy, xy)
% Zero out lower triangle because it's a repeat of the upper triangle
distances = triu(distances)
nonZeroIndexes = distances > 0;
medianDistance = median(distances(nonZeroIndexes))
thresholdValue = medianDistance/2; % Whatever you want.
% Find pairs that are far apart.
[rows, columns] = find(distances > thresholdValue);
hold on;
% Plot pairs that are far apart.
for k = 1 : length(rows)
    index1 = columns(k);
    index2 = rows(k);
    xp = [x(index1), x(index2)];
    yp = [y(index1), y(index2)];
    plot(xp, yp, 'k-', 'LineWidth', 1, 'MarkerSize', 30);
end
% Find pairs that are close together.
[rows2, columns2] = find((distances > 0) & (distances <= thresholdValue));
hold on;
% Plot pairs that are close together.
for k = 1 : length(rows2)
    index1 = columns2(k);
    index2 = rows2(k);
    xp = [x(index1), x(index2)];
    yp = [y(index1), y(index2)];
    plot(xp, yp, 'r-', 'LineWidth', 2, 'MarkerSize', 30);
end
title('Black lines are far away, red lines are close')

7 comments
Walter Roberson, 15 Jan 2022
"I wanted to know for sure if it could also be solved with kmeans by using a custom distance function or by passing it an inverse distance matrix."
No, it cannot be solved that way "for sure". Unless there are exactly two choices in two dimensions, the Voting Paradox shows that no algorithm can reliably generate optimal outcomes for all nodes.
More Answers (2)
Image Analyst, 13 Jan 2022
Not sure why you think there are clusters. I'd just use pdist2() and then threshold the result to find points that are farther apart than some distance. Something like:
xy = [x(:), y(:)];
distances = pdist2(xy, xy);
thresholdValue = .3; % Whatever you want.
[rows, columns] = find(distances > thresholdValue);
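Because the distance matrix is symmetric, find on the full matrix reports every pair twice, once as (i,j) and once as (j,i). A minimal sketch that keeps each pair only once by restricting the test to the entries above the diagonal; the random x and y and the 0.3 threshold are placeholders, as in the snippet above:

```matlab
N = 10;
x = rand(N,1)*5;
y = rand(N,1)*2.5;
xy = [x(:), y(:)];
distances = pdist2(xy, xy);            % N-by-N, symmetric, zeros on the diagonal
thresholdValue = 0.3;                  % whatever you want
% Logical mask of entries strictly above the diagonal, so each pair appears once
upperMask = triu(ones(N), 1) > 0;
farMask = (distances > thresholdValue) & upperMask;
[rows, columns] = find(farMask);
% One row per far-apart pair: [point i, point j, distance between them]
farPairs = [rows, columns, distances(sub2ind(size(distances), rows, columns))];
```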
0 comments
Image Analyst, 14 Jan 2022
Edited: Image Analyst, 14 Jan 2022
You might also consider SVM (support vector machine). It tries to find a dividing line between two groups such that the gap between them is as wide as possible, so the two groups are as far apart as possible. See

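SVM is a supervised method: every point needs a known group label before it can look for the widest-margin line. A minimal sketch with fitcsvm (Statistics and Machine Learning Toolbox), assuming hypothetical labelled data in two groups separated along x. With a linear kernel and the default KernelScale of 1, the dividing line is Beta(1)*x + Beta(2)*y + Bias = 0:

```matlab
rng(2);
N = 20;
% Hypothetical labelled data: group 1 on the left, group 2 on the right
xy = [rand(N,1)*2,     rand(N,1)*2.5;     % group 1: x in [0, 2]
      3 + rand(N,1)*2, rand(N,1)*2.5];    % group 2: x in [3, 5]
labels = [ones(N,1); 2*ones(N,1)];

Mdl = fitcsvm(xy, labels, 'KernelFunction', 'linear');

% Dividing line: b(1)*x + b(2)*y + b0 = 0 (KernelScale defaults to 1).
% Solved for x because with this data the line is close to vertical.
b = Mdl.Beta;
b0 = Mdl.Bias;
yLine = [0 2.5];
xLine = -(b(2)*yLine + b0) / b(1);

figure
gscatter(xy(:,1), xy(:,2), labels)
hold on
plot(xLine, yLine, 'k-', 'LineWidth', 2)
sv = Mdl.IsSupportVector;                             % points that define the margin
plot(xy(sv,1), xy(sv,2), 'ko', 'MarkerSize', 10)
hold off
```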
Or use dbscan (demo attached). It tries to find all points that can be connected to each other by hops shorter than a distance you specify:

A point with at least a certain number of neighbors within that distance is called a "core point". A point that is not a core point can still be part of the cluster if it is within that distance of a core point. So, for example, a dumbbell shape could have core points in the ends and connected points in the middle, with the whole thing being one single cluster. An isolated point that is not closer to any other point than your specified distance is not a core point (like point N above).
In this diagram, minPts = 4. Point A and the other red points are core points, because the area within an ε radius of each of them contains at least 4 points (including the point itself). Because they are all reachable from one another, they form a single cluster. Points B and C are not core points, but they are reachable from A (via other core points) and thus belong to the cluster as well. Point N is a noise point that is neither a core point nor directly reachable. You can think of it as being in a cluster all by itself.
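A minimal dbscan sketch (Statistics and Machine Learning Toolbox, R2019a or later) using minPts = 4 as in the diagram; the epsilon value and the random points are assumptions to tune for real data. dbscan labels noise points like N with -1, and its second output flags the core points. It also accepts a precomputed distance matrix through 'Distance','precomputed', which is one way to feed it the kind of matrix built in the question:

```matlab
rng(1);
N = 60;
xy = [rand(N,1)*5, rand(N,1)*2.5];   % random users in a 5-by-2.5 rectangle
epsilon = 0.4;                       % hypothetical neighborhood radius (the ε in the diagram)
minPts = 4;                          % minimum neighborhood size for a core point

[idx, corepts] = dbscan(xy, epsilon, minPts);
isNoise = (idx == -1);               % noise points, like point N
numClusters = numel(unique(idx(~isNoise)));
fprintf('%d clusters, %d noise points, %d core points\n', ...
    numClusters, nnz(isNoise), nnz(corepts));

% Same clustering, but from a precomputed N-by-N distance matrix
D = squareform(pdist(xy));
idx2 = dbscan(D, epsilon, minPts, 'Distance', 'precomputed');

figure
gscatter(xy(:,1), xy(:,2), idx)
hold on
plot(xy(corepts,1), xy(corepts,2), 'ko', 'MarkerSize', 8)   % circle the core points
hold off
```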
0 comments