Using Mahalanobis distance in hierarchical cluster analysis error

Sriparna Sen on 4 Mar 2020
Commented: Sriparna Sen on 12 Mar 2020
Hi! Thank you in advance for the help! I am currently creating a hierarchical cluster tree using the linkage function in MATLAB. I call the function as follows:
links = linkage(samples,'complete', 'mahalanobis');
My variable, samples, is a 25 x 106720 matrix of class double that contains t-values.
Every time I run this in MATLAB, however, I get the following error message:
Error using *
Requested 106720x106720 (84.9GB) array exceeds maximum array size preference. Creation of arrays greater than this limit
may take a long time and cause MATLAB to become unresponsive. See array size limit or preference panel for more
information.
Error in nancov>localcov (line 173)
c = xc' * xc / denom;
Error in nancov (line 116)
c = localcov(x,domle);
Error in pdist (line 181)
additionalArg = nancov(X);
Error in linkage (line 259)
Z = internal.stats.linkagemex(Y,method,pdistArg, memEff);
How do I get around this error? Or is there another way for me to calculate the Mahalanobis distance for hierarchical clustering?

Answers (1)

Rajani Mishra on 11 Mar 2020
The error occurs because the Mahalanobis distance requires the covariance matrix of your variables (the columns of the data). For your data “samples” of size 25 x 106720, linkage calls pdist, which computes this matrix with “nancov()”. The result is a 106720 x 106720 matrix, which exceeds the maximum array size preference.
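As a rough check on the 84.9GB figure in the error message, the size of a 106720 x 106720 matrix of doubles can be worked out directly. This is only a back-of-the-envelope sketch, assuming 8 bytes per double and reporting the size in units of 2^30 bytes. The variable names are illustrative.

```matlab
% Size of the covariance matrix that Mahalanobis distance would need.
p = 106720;                      % number of variables (columns of samples)
bytesNeeded = p^2 * 8;           % p-by-p matrix, 8 bytes per double
gib = bytesNeeded / 2^30;        % convert bytes to units of 2^30 bytes
fprintf('%.1f GB\n', gib);       % prints 84.9 GB
```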
One option is to reduce the number of variables with dimensionality reduction before clustering. The literature I came across while researching your question discusses this approach, so it is worth consulting as well. You can use the function “pca()” for dimensionality reduction. To learn more about “pca()”, see: https://www.mathworks.com/help/stats/pca.html
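A minimal sketch of the dimensionality reduction route, assuming the Statistics and Machine Learning Toolbox is available. The random matrix below is only a smaller stand-in for the real 25 x 106720 data, and k = 5 is an arbitrary illustrative choice rather than a recommendation. With 25 observations, at most 24 principal components carry any variance, so k has to stay below that.

```matlab
% Stand-in data: 25 observations, many variables (hypothetical values).
rng(0);                                   % make the stand-in reproducible
samples = randn(25, 2000);                % replace with the real 25 x 106720 matrix

k = 5;                                    % hypothetical number of components to keep
[~, score] = pca(samples, 'NumComponents', k);   % score is 25 x k

% Mahalanobis distance now only needs a k x k covariance matrix.
links = linkage(score, 'complete', 'mahalanobis');
```

The resulting links matrix can be passed to dendrogram or cluster in the same way as the output of the original call.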
Alternatively, you can store the data in tall arrays for hierarchical clustering. Tall arrays are designed for working with out-of-memory data. For more information, see: https://www.mathworks.com/help/stats/examples/statistics-and-machine-learning-with-big-data-using-tall-arrays.html
1 Comment
Sriparna Sen on 12 Mar 2020
Hi Rajani! Thanks for your answer! I'll take a look at this!
