MATLAB Answers

How do I visualize high-dimensional clusters from the "kmeans" function?

98 views (last 30 days)
I applied the "kmeans" function to a dataset of 24 variables with the number of clusters being set to 3. How can I visualize the three clusters and their centroids?

Accepted Answer

MathWorks Support Team
MathWorks Support Team on 19 Apr 2019

Because the cluster data is 24-dimensional, it is often difficult to visualize them directly. A common way to deal with this is to first project or transform the data to lower dimensions (typically 2 or 3) and then apply visualization techniques to the reduced-dimensional data. As an example, suppose the "kmeans" function is applied to a data matrix "data" (300 x 24) with the number of clusters being set to 3:

rng("default");
data = randn(300, 24);
[idx, C] = kmeans(data, 3);

Then here are some visualization options:

   Option 1: Plot 2 or 3 dimensions of your interest. For instance, to plot the 4th dimension versus the 9th dimension of your data, one can do the following
scatter(data(:,4), data(:,9), [], idx);   % plot three clusters with different colors
hold on;
plot(C(:, 4), C(:, 9), 'kx');   % plot centroids
   Option 2: First reduce the dimensionality of your data using principal component analysis (PCA), and then plot the data in the principal-component space:
[standard_data, mu, sigma] = zscore(data);     % standardize data so that the mean is 0 and the variance is 1 for each variable
[coeff, score, ~]  = pca(standard_data);     % perform PCA
new_C = (C-mu)./sigma*coeff;     % apply the PCA transformation to the centroid data
scatter(score(:, 1), score(:, 2), [], idx)     % plot 2 principal components of the cluster data (three clusters are shown in different colors)
hold on
plot(new_C(:, 1), new_C(:, 2), 'kx')     % plot 2 principal components of the centroid data

​​ Option 3: Use "silhouette" function to measure the goodness of the clustering:

silhouette(data, idx);

  2 Comments

ananya mittal
ananya mittal on 13 Jun 2020
how can we visualize in case significant pc for the data comes to be 10 after reducing from 534 variable as in case of spectroscopic data?
shimji k
shimji k on 27 Nov 2020
my data size is 262144*483 so i done this dimensionsioality reduction as in option2.But
there comes an error 'out of memory'.Now what can i do to resolve this issue. Please replay.Thanks in advance.

Sign in to comment.

More Answers (0)

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by