
How to use testing data to validate kmeans?

Mnr on 22 Mar 2014
Commented: Mnr on 23 Mar 2014
Hello there,
I have some data in 8 text files, and I would like to group the similar ones into the same classes. I am using k-means for now, with 5 of the files for training and 3 for testing. I have used the kmeans command to get k classes, but I do not know how to validate my results. In other words, how do I use my testing data to calculate the error? I would appreciate any help. Thanks in advance.

Accepted Answer

Image Analyst on 23 Mar 2014
If you do not know the "ground truth" of your data, then there's no way to tell if a result is "wrong". The only thing you can do (I think) is to classify your "unknown" data and measure how far your data are from the means of the classes. For example, let's say you had a cluster of data "class#1" around 30 +/- 5, and a second cluster "class#2" at 100 +/- 20. You run kmeans with 2 classes and it finds those two classes, with means at 30 and 100. Now you have a data point in the "non-training" set with a value of 70. It is 40 from the class#1 mean and 30 from the class#2 mean, so you can say that the 70 belongs to class#2, the nearer one. You can do the same for all other data in your test sets.
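The idea above could be sketched in MATLAB roughly as follows. This is only an illustration, not the asker's actual setup: the synthetic `train` and `test` vectors, the seed, and the choice of `k = 2` are assumptions that mirror the 30/100 example, and `kmeans` and `pdist2` require the Statistics and Machine Learning Toolbox. Note that the result is a distance to the nearest class mean, not a classification error, since no true labels are available.

```matlab
% Hypothetical 1-D data mirroring the 30 +/- 5 and 100 +/- 20 example
% (stand-ins for the asker's training and testing files).
rng(0);                                   % fixed seed so runs are repeatable
train = [30 + 5*randn(50,1); 100 + 20*randn(50,1)];
test  = [70; 28; 110];                    % held-out points (assumed values)

% Fit k-means on the training data only.
k = 2;
[~, C] = kmeans(train, k, 'Replicates', 5);   % C holds one centroid per row

% Euclidean distance from every test point to every centroid
% (numTest-by-k matrix).
D = pdist2(test, C);

% Assign each test point to its nearest centroid and keep that distance.
[distToNearest, testLabel] = min(D, [], 2);

% One possible summary of how well the test data fit the trained clusters.
meanTestDist = mean(distToNearest);

disp(table(test, testLabel, distToNearest));
fprintf('Mean distance to nearest centroid: %.2f\n', meanTestDist);
```

Comparing `meanTestDist` with the same quantity computed on the training data gives a rough sense of whether the test files look like the clusters found during training. The same code works for multi-column data, because `pdist2` takes one observation per row.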
  3 Comments
Image Analyst on 23 Mar 2014
To get the error accurately you have to know the true values, don't you? And you don't know those. So all you have is a guess.
Mnr on 23 Mar 2014
Thanks!
