How do I visualize high-dimensional clusters from the "kmeans" function?

Question

MathWorks Support Team on 18 Apr 2019


https://fr.mathworks.com/matlabcentral/answers/457228-how-do-i-visualize-high-dimensional-clusters-from-the-kmeans-function

Answered: MathWorks Support Team on 18 Apr 2019

Accepted Answer: MathWorks Support Team

I applied the "kmeans" function to a dataset of 24 variables with the number of clusters set to 3. How can I visualize the three clusters and their centroids?


Answer 1

MathWorks Support Team on 19 Apr 2019


https://fr.mathworks.com/matlabcentral/answers/457228-how-do-i-visualize-high-dimensional-clusters-from-the-kmeans-function#answer_371246

Because the data are 24-dimensional, they cannot be plotted directly. A common approach is to first project or transform the data into a lower-dimensional space (typically 2-D or 3-D) and then visualize the reduced data. As an example, suppose the "kmeans" function is applied to a 300-by-24 data matrix "data" with the number of clusters set to 3 (random data is used here purely for illustration, so the resulting clusters reflect no real structure):

rng("default");
data = randn(300, 24);
[idx, C] = kmeans(data, 3);

Here are some visualization options:

   Option 1: Plot two or three dimensions of interest. For instance, to plot the 4th dimension (x-axis) against the 9th dimension (y-axis) of your data:

figure;
scatter(data(:, 4), data(:, 9), [], idx);   % plot the three clusters in different colors
hold on;
plot(C(:, 4), C(:, 9), 'kx', 'MarkerSize', 15, 'LineWidth', 3);   % plot the centroids as black crosses
hold off;
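Option 1 also allows three dimensions. A minimal 3-D sketch, repeating the example setup so it runs on its own, might look like the following. The choice of dimension 12 as the third axis is arbitrary and only for illustration; substitute whichever dimensions interest you.

```matlab
% Example setup (same as above)
rng("default");
data = randn(300, 24);
[idx, C] = kmeans(data, 3);

dims = [4 9 12];   % hypothetical choice of three dimensions to inspect
figure;
% Color each point by its cluster index
scatter3(data(:, dims(1)), data(:, dims(2)), data(:, dims(3)), 36, idx, 'filled');
hold on;
% Overlay the centroids, restricted to the same three dimensions
plot3(C(:, dims(1)), C(:, dims(2)), C(:, dims(3)), 'kx', 'MarkerSize', 15, 'LineWidth', 3);
hold off;
xlabel(sprintf('Dimension %d', dims(1)));
ylabel(sprintf('Dimension %d', dims(2)));
zlabel(sprintf('Dimension %d', dims(3)));
view(3);   % 3-D view; rotate interactively to inspect cluster separation
```

Keep in mind that any three raw dimensions show only a small slice of the 24-D structure, so clusters may overlap in this view even when they are separated in the full space.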

   Option 2: First reduce the dimensionality of your data using principal component analysis (PCA), and then plot the data in the principal-component space:

[standard_data, mu, sigma] = zscore(data);   % standardize data so that each variable has mean 0 and variance 1
[coeff, score] = pca(standard_data);          % perform PCA; column j of score holds the j-th principal component
new_C = (C - mu) ./ sigma * coeff;            % apply the same standardization and PCA rotation to the centroids
figure;
scatter(score(:, 1), score(:, 2), [], idx);   % plot the first 2 principal components (three clusters in different colors)
hold on;
plot(new_C(:, 1), new_C(:, 2), 'kx', 'MarkerSize', 15, 'LineWidth', 3);   % plot the projected centroids
hold off;
xlabel('Principal component 1');
ylabel('Principal component 2');
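A 2-D PCA plot is only as informative as the share of variance the first two components capture; when that share is low, clusters that are well separated in 24 dimensions can still overlap in the plot. The sketch below (an addition, not part of the original answer) reads that share from the fifth output of "pca" and also checks the centroid projection: assuming the default squared Euclidean distance and that "kmeans" converged, each centroid is the mean of its cluster, and because the standardization plus PCA rotation is affine, the projected centroids should match the per-cluster means of the scores up to round-off.

```matlab
% Example setup (same as above)
rng("default");
data = randn(300, 24);
[idx, C] = kmeans(data, 3);

[standard_data, mu, sigma] = zscore(data);
% 5th output of pca: percentage of total variance explained by each component
[coeff, score, ~, ~, explained] = pca(standard_data);
fprintf('PC1 + PC2 explain %.1f%% of the total variance\n', sum(explained(1:2)));

% Project the centroids exactly as in Option 2
new_C = (C - mu) ./ sigma * coeff;

% Average the projected points within each cluster
cluster_means = zeros(3, size(score, 2));
for k = 1:3
    cluster_means(k, :) = mean(score(idx == k, :), 1);
end

% Should be near floating-point round-off if kmeans converged
max_gap = max(abs(new_C(:) - cluster_means(:)));
fprintf('Largest centroid discrepancy: %g\n', max_gap);
```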

   Option 3: Use the "silhouette" function to measure the quality of the clustering: