
K-mode clustering algorithm to cluster categorical data?

Dankur Mcgoo on 10 Aug 2018
Commented: Image Analyst on 12 Aug 2018
Has anyone come across a k-mode script in the Matlabsphere? I've seen people respond with links to supervised learning algorithms, but I need an unsupervised one. Even pseudocode would be fine, so I can build it myself.
I'm using R2017b.
I'm really trying to avoid using R.

Answers (1)

Image Analyst on 11 Aug 2018
I can't imagine why you'd use kmeans with categorical data. If it's categorical, you can simply use the category to classify the data point, right?
  4 comments
Dankur Mcgoo on 12 Aug 2018
Edited: Image Analyst on 12 Aug 2018
I apologize for not stating my question clearly. I was just hoping someone had come across a k-mode script, but I'll try to pose my question better.
I think this analogy is close enough to my data set. I have 200 questionnaires, and each questionnaire has 40 categorical questions. I would like to cluster them so that similar questionnaires end up together. Even if 1-2 questions were answered differently, the distance between those two data points should not be too large.
Here is how my question differs from what you replied (perhaps my interpretation is wrong): I can't simply cluster the questionnaires based on one arbitrary question (i.e. just Question 1, or just the car makers). I need to consider all of them.
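One distance that behaves the way described here is the simple matching (Hamming) distance: count the questions on which two questionnaires disagree. Below is a minimal sketch in base MATLAB, assuming the answers are stored as integer codes in a matrix with one row per questionnaire. The toy matrix X is made up for illustration and is not the real data.

```matlab
% Toy data: 4 questionnaires x 5 categorical questions, coded as integers.
% The codes are labels only; their numeric order carries no meaning.
X = [1 3 2 2 1;
     1 3 2 2 4;    % differs from row 1 on one question
     2 1 4 3 3;
     2 1 4 3 1];   % differs from row 3 on one question

n = size(X, 1);
D = zeros(n);      % D(i,j) = number of questions answered differently
for i = 1:n
    for j = 1:n
        D(i, j) = sum(X(i, :) ~= X(j, :));
    end
end
disp(D)
```

Rows that differ on only one question get a distance of 1, regardless of which codes were used. If the Statistics and Machine Learning Toolbox is available, pdist(X,'hamming') should give the same pairwise quantity, expressed as a fraction of questions rather than a count and returned in condensed vector form.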
k-means is appropriate for numerical data. There is no way of translating my categorical data into meaningful numeric data. They are currently numeric in my matrix, but consecutive numbers are not related and thus any distance measure is meaningless.
Does that make more sense?
I've found this, https://shapeofdata.wordpress.com/2014/03/04/k-modes/, which looks like it could be useful, and it is the approach I am looking to try. I would just rather avoid coding it myself because of time constraints.
I would also entertain any other suggestion of data clustering. I am not sold on k-mode.
Image Analyst on 12 Aug 2018
I'm not an expert on questionnaires, though we have many statisticians in our company who spend their whole lives doing that. I'd suggest you try the Classification Learner app and pick the best-performing model. Check out this page: https://www.mathworks.com/help/stats/machine-learning-in-matlab.html. You have unsupervised learning because you have data but no ground truth - you don't know the classes/groupings of any of them in advance.
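Since the original question asked for at least pseudocode, here is a minimal sketch of k-modes as it is usually described: assign each row to the nearest center by mismatch count, then replace each center with the per-question mode of its members, and repeat until the assignments stop changing. It uses base MATLAB only (implicit expansion needs R2016b or later). The toy data, the iteration cap maxIter, and the fixed starting rows are all assumptions made for the demo. In practice one would try several random starts, for example with randperm, because the result depends on initialisation.

```matlab
% Minimal k-modes sketch. X: n-by-p matrix of integer category codes.
X = [1 3 2 2 1;
     1 3 2 2 4;
     1 3 2 1 1;
     2 1 4 3 3;
     2 1 4 3 1;
     2 1 3 3 3];
k = 2;                          % number of clusters (chosen for the toy data)
maxIter = 100;                  % assumed iteration cap

n = size(X, 1);
centers = X([1 4], :);          % fixed start for the demo; normally random rows
labels = zeros(n, 1);

for iter = 1:maxIter
    % Assignment step: mismatch count from every row to every center.
    D = zeros(n, k);
    for c = 1:k
        D(:, c) = sum(X ~= centers(c, :), 2);
    end
    [~, newLabels] = min(D, [], 2);

    if isequal(newLabels, labels)
        break                   % assignments unchanged: stop
    end
    labels = newLabels;

    % Update step: each center becomes the per-question mode of its members.
    for c = 1:k
        members = X(labels == c, :);
        if ~isempty(members)
            centers(c, :) = mode(members, 1);
        end
    end
end

disp(labels')
disp(centers)
```

On this toy input the first three rows end up in one cluster and the last three in the other. With the Statistics and Machine Learning Toolbox, kmedoids(X, k, 'Distance', 'hamming') may also be worth a look as a ready-made alternative; it uses actual data rows as cluster centers instead of computed modes, so it is a related technique rather than k-modes itself.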

