Clustering evaluation with Silhouette extremely slow

Phu Lai on 9 Jan 2017
Commented: Stephen john on 24 May 2022
I am evaluating my kmeans clustering solutions using the built-in evalclusters function with Silhouette criterion:
eva = evalclusters(data,idx,'Silhouette');
The data size is 434874x4, tested on my laptop (Core i7, 8 GB RAM). I have been waiting for more than an hour and it still has not completed. Is there any way to speed up Silhouette evaluation in MATLAB?
Thanks a lot.
  2 comments
Fernando Isorna Retamino on 11 Feb 2017
I have the same problem. My data size is 1726944x7 and I have been waiting for more than 2 weeks with a similar computer.
eva = evalclusters(data,'kmeans','Silhouette','KList',[1:10]);
Stephen john on 23 May 2022
@Fernando Isorna Retamino how did you solve that?


Accepted Answer

John D'Errico on 11 Feb 2017
Edited: John D'Errico on 11 Feb 2017
Your computer is not infinitely large or infinitely fast, unless of course, you are on TV or in the movies. Then anything is computable, merely by typing a few characters, and it will happen immediately, because who wants to wait for a few hours if they are watching a movie?
In real life, big problems take time. No matter how fast your chosen algorithm is, you can always throw more data at it than it can handle efficiently. In real life, I don't know of anybody who does not want their code to execute more quickly.
There are a few simple choices to make here:
1. Learn patience, get some coffee and read a good book while you wait. Preferably one about MATLAB, or perhaps one about numerical methods in computing.
2. Find a seriously faster computer to use. This is never as cheap as you want.
3. Choose a different, more efficient algorithm. Different tools have different properties, but faster tools are not always as good as others.
4. Learn enough about the methods that you can implement the algorithm EFFICIENTLY in a lower level language. Note that merely compiling your code from MATLAB is rarely going to give much of a speedup. That question gets asked on a weekly basis here.
5. Learn enough about parallel computing that you can take advantage of tools like the Parallel Computing Toolbox. There are many issues here, and not all codes can be sped up.
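
As a rough illustration of option 3: the silhouette criterion needs distances between every pair of points, so for 434874 rows that is roughly 1.9e11 pairs. Two cheaper routes are sketched below: estimating the mean silhouette on a random subsample, and switching evalclusters to the 'CalinskiHarabasz' criterion, which avoids the pairwise distances. This is a minimal sketch, not something proposed in the thread. The synthetic data and the subsample size nSub are assumptions for illustration, and a subsample only approximates the full silhouette value.

```matlab
% Sketch: estimate silhouette quality on a random subsample instead of all rows.
% 'data' and 'idx' stand in for the matrix and kmeans labels from the question;
% small synthetic values are used here so the snippet runs on its own.
rng(0);                                   % reproducible sampling
data = [randn(500,4); randn(500,4) + 5];  % stand-in for the real 434874x4 matrix
idx  = kmeans(data, 2);                   % stand-in for the existing kmeans labels

nSub = 400;                               % hypothetical subsample size; tune to time/memory
sel  = randperm(size(data,1), nSub);      % random row indices without replacement

% One silhouette value per sampled point; cost scales with nSub^2, not n^2
s = silhouette(data(sel,:), idx(sel));
meanSil = mean(s);                        % approximate average silhouette

% Alternative: a criterion that does not need all pairwise distances
evaCH = evalclusters(data, idx, 'CalinskiHarabasz');
```

Repeating the subsample a few times with different random draws gives a feel for how much the estimate varies.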

More Answers (1)

neuromechanist on 30 May 2019
A problem with evalclusters is that it essentially runs multiple k-means (or any other clustering algorithm) one by one, not in parallel. I think adding parallel capability to the function would increase its speed dramatically, provided that the Parallel Computing Toolbox is available.
evalclusters is written in object-oriented form, and for now I don't know how to change for loops to parfor in an object-oriented setup. I will update my response if I figure it out.
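
One way to get parallel k-means runs without modifying evalclusters itself is to compute the candidate solutions in a parfor loop and then pass them to evalclusters as a matrix with one column per solution. The following is a minimal sketch under stated assumptions, not a fix found in this thread: the data is synthetic, kList and the 'Replicates' value are hypothetical, and the 'CalinskiHarabasz' criterion is swapped in because the parfor loop parallelizes only the clustering step, not the silhouette computation.

```matlab
% Sketch: run the candidate k-means solutions in parallel, then score them in one call.
% Assumes Parallel Computing Toolbox; without it, parfor runs as an ordinary loop.
rng(0);
data  = [randn(500,4); randn(500,4) + 5];   % stand-in data for illustration
kList = 2:6;                                % hypothetical candidate cluster counts
n     = size(data,1);
sol   = zeros(n, numel(kList));             % one column of labels per candidate k

parfor j = 1:numel(kList)
    % Each iteration is independent, so the k-means runs can proceed concurrently
    sol(:,j) = kmeans(data, kList(j), 'Replicates', 3);
end

% evalclusters accepts a matrix of precomputed solutions (one column each)
eva   = evalclusters(data, sol, 'CalinskiHarabasz');
bestK = eva.OptimalK;                       % number of clusters with the best criterion value
```

Scoring the same matrix with 'Silhouette' also works, but that step would still be the slow part on very large data.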
  3 comments
neuromechanist on 23 May 2022
Hi Stephen,
No, unfortunately I did not find a way to speed up the evalclusters function.
Sorry to disappoint.
Stephen john on 24 May 2022
@neuromechanist did you use any other method to solve this?
Or did you change the algorithm?
