Why does the kmeans function give different answers?

I have noticed that the kmeans function, called once for a single value of k, gives different cluster indices than when it is called inside a loop over varying k, say 2:N. I do not understand why. It would be great if someone could explain this.

Accepted Answer

José-Luis
José-Luis on 22 Sep 2014

1 vote

Because, if you are using the default settings, kmeans() selects its starting centroids at random. The algorithm is therefore not deterministic, and the result can depend on that starting position.

2 comments

So what is the default setting, then? I have used:
rng('default');
Am I right?
Try using the 'replicates' option for kmeans to automatically run the algorithm multiple times and return the best answer:
>> doc kmeans
You can set the order of random numbers generated with the rng command:
>> doc rng
Putting something like rng(3) before kmeans will make the results repeatable even though it involves random starting points.
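As a minimal sketch of that idea, the snippet below seeds the generator immediately before each kmeans call and checks that the two calls return the same labels. The data matrix X is synthetic and only an assumption for illustration; kmeans requires the Statistics and Machine Learning Toolbox.

```matlab
% Synthetic two-cluster data (an assumption; substitute your own matrix).
rng(0);                              % seed only so the example data is repeatable
X = [randn(50,2); randn(50,2) + 4];

rng(3);                              % fix the seed right before kmeans
idx1 = kmeans(X, 2);

rng(3);                              % same seed, so the same random starting points
idx2 = kmeans(X, 2);

sameResult = isequal(idx1, idx2);    % compare the two label vectors
```

Note that the seed has to be reset before every call that should match. Seeding once at the top of a script only fixes the sequence as a whole, so later kmeans calls start from a different point in the random stream.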


More Answers (1)

Image Analyst
Image Analyst on 22 Sep 2014

0 votes

Like many other types of numerical minimizations, the solution that kmeans reaches often depends on the starting points. It is possible for kmeans to reach a local minimum, where reassigning any one point to a new cluster would increase the total sum of point-to-centroid distances, but where a better solution does exist. However, you can use the optional 'replicates' parameter to overcome that problem.
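A minimal sketch of the 'Replicates' option follows, assuming synthetic data in X (not the asker's data) and the Statistics and Machine Learning Toolbox. kmeans runs the clustering from several random starts and returns the solution with the lowest total within-cluster sum of distances.

```matlab
% Synthetic three-cluster data (an assumption for illustration only).
rng(0);
X = [randn(50,2); ...
     randn(50,2) + 4; ...
     bsxfun(@plus, randn(50,2), [4 -4])];

% Run kmeans from 10 random starts and keep the best solution found.
[idx, C, sumd] = kmeans(X, 3, 'Replicates', 10);

% sumd holds the within-cluster sums of point-to-centroid distances;
% their total is the quantity kmeans tries to minimize.
totalDist = sum(sumd);
```

More replicates lower the chance of ending in a poor local minimum, but the result still depends on the random stream unless the seed is fixed before the call.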

1 comment

Yes, I do understand. However, I get a different answer when I run it once for a single number of clusters, like
[idx,cent,sumdist] = kmeans(param_sac,nkmeans,'dist',dist_alg,...
'replicates',8, 'display','iter');
than when I run it inside a loop, like
rng('default'); % For reproducibility (seeds once, before the loop only)
param_sac = load('param2W_sac.cld');
size(param_sac);
dist_alg = 'sqEuclidean';
iditer = [];      % cluster indices, one column per k
sumdistitr = [];
meansil = [];     % rows of [k, mean silhouette value]
silhitr = [];     % silhouette values, one column per k
for nkmeans = 1:10
    % Note: 'replicates' is nkmeans here, not 8 as in the single call above.
    [idx,cent,sumdist] = kmeans(param_sac,nkmeans,'dist',dist_alg,...
        'replicates',nkmeans, 'display','iter');
    [silh,h] = silhouette(param_sac,idx);
    xlabel('Silhouette Value')
    ylabel('Cluster');
    meanh = mean(silh);
    iditer = [iditer idx];
    % cen = [cen cent];
    % sumdistitr = [sumdistitr sumdist];
    meansil = [meansil; nkmeans meanh];
    silhitr = [silhitr silh];
end
I get a totally different classification.
Thanks to all for the responses.
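One possible way to make the loop and the single call agree, sketched under assumptions: reset the seed immediately before every kmeans call and use the same number of replicates in both places. The data X, the seed value, and the variable names kValues, nReps and idxAll are hypothetical and not taken from the original code.

```matlab
% Synthetic data (an assumption; replace with your own matrix).
rng(0);
X = [randn(50,2); randn(50,2) + 4];

seed    = 1;       % hypothetical seed, reused before every kmeans call
nReps   = 8;       % same replicate count in the loop and in the single call
kValues = 2:5;

idxAll = zeros(size(X,1), numel(kValues));   % one column of labels per k
for i = 1:numel(kValues)
    rng(seed);                               % reset the random stream each time
    idxAll(:,i) = kmeans(X, kValues(i), 'Distance', 'sqeuclidean', ...
        'Replicates', nReps, 'EmptyAction', 'singleton');
end

% Single call for k = 3 with the same seed and the same options.
rng(seed);
idxSingle = kmeans(X, 3, 'Distance', 'sqeuclidean', ...
    'Replicates', nReps, 'EmptyAction', 'singleton');

matchesLoop = isequal(idxSingle, idxAll(:, kValues == 3));
```

In the loop posted above, the generator is seeded only once and the replicate count changes with k, so by the time a given k is reached the random stream is in a different state than in the standalone call. Either difference can be enough to produce different cluster labels.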

