Why does pca on my matrix give a first value in latent that is greater than one?

I have a 626284-by-26 matrix whose entries are all zeros and ones. I ran [coeff,score,latent] = pca(X) on it, but latent gave me the following values:
1.47069819212040
0.338544895320084
0.225716863688052
0.188056189419163
0.157949433440297
0.126385063251976
0.0906964951134501
0.0773105845697984
0.0738595589018172
0.0659590250255644
0.0616215954476751
0.0537688669401442
0.0262686347674844
0.0160550157883815
0.0112744279903577
0.0105353514551859
6.11095771880279e-33
6.03879225801973e-33
6.03879225801973e-33
6.03879225801973e-33
6.03879225801973e-33
6.03879225801973e-33
6.03879225801973e-33
6.03879225801973e-33
6.03879225801973e-33
5.96730010116445e-33
So what could be the reason?
Thank you for your guidance.

Accepted Answer

David Goodmanson on 6 Mar 2019
Edited: David Goodmanson on 6 Mar 2019
Hi Penny,
Is there a reason that you think that a matrix of all ones and zeros can't have a latent value greater than 1? Here is a counterexample:
```matlab
n = 50;
m = 20;
A = [ones(n,m); triu(ones(m,m)); zeros(n,m)];
[coeff,score,latent] = pca(A);
rA = rank(A)
```
Results:
```
latent =
    4.4864
    0.2955
    0.0826
    0.0379
    0.0218
    0.0143
    0.0102
    0.0077
    0.0061
    0.0050
    0.0042
    0.0036
    0.0032
    0.0029
    0.0026
    0.0025
    0.0023
    0.0022
    0.0022
    0.0021

rA = 20
```
The triu block was inserted so that the columns are linearly independent, which avoids an artificial situation in which many columns are identical. Matrix A has full rank, 20.
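For contrast, here is a sketch of the identical-columns case that the triu rows avoid. If every column is the same 0/1 vector, the centered data has rank 1, so all of the variance lands in a single latent value equal to the sum of the column variances. The sizes N and m are arbitrary choices for illustration, and latent is computed as eig(cov(X)) rather than with pca, on the assumption (per the pca documentation) that latent is the set of eigenvalues of the covariance matrix; this also lets the sketch run without the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical example: m identical columns, each half ones and half zeros
N = 100;                          % number of rows (arbitrary, must be even here)
m = 20;                           % number of columns (arbitrary)
x = [ones(N/2,1); zeros(N/2,1)];  % one 0/1 column with mean 0.5
X = repmat(x, 1, m);              % m identical copies of that column

% pca's latent corresponds to the eigenvalues of cov(X), in descending order
latentX = sort(eig(cov(X)), 'descend');

disp(latentX(1))                  % m*var(x) = 500/99, about 5.0505
disp(max(abs(latentX(2:end))))    % the remaining values are zero up to rounding error
```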
pca starts out by subtracting the mean of each column, so the idea here was to make the deviations from the mean as large as possible. With only 1 and 0 available, this means creating columns that are half ones and half zeros (or close to it). After that, making many of the columns nearly parallel puts most of the deviation along a single axis.
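The reasoning above can be checked numerically on the same matrix A. A 0/1 column in which a fraction p of the N entries are ones has sample variance p(1-p)N/(N-1), which is largest (about 0.25) when p is one half. Because the latent values are nonnegative and sum to the total variance sum(var(A)), latent(1) can be at most about 0.25 times the number of columns, which is well above 1 for 20 columns. The sketch below computes latent as eig(cov(A)) instead of calling pca, assuming the two agree as the pca documentation describes; the ratio latent(1)/sum(latent) shows how much of the total lies along the first axis.

```matlab
% Rebuild the counterexample matrix A (120-by-20, ones and zeros)
n = 50;
m = 20;
A = [ones(n,m); triu(ones(m,m)); zeros(n,m)];
N = size(A,1);                          % 120 rows

% For a 0/1 column with a fraction p of ones, the sample variance is p(1-p)*N/(N-1)
p = mean(A);                            % fraction of ones in each column
colVar = p.*(1-p)*N/(N-1);              % should match var(A) column by column
maxVar = 0.25*N/(N-1);                  % largest possible value, reached at p = 0.5

% latent (eigenvalues of cov(A)) sums to the total variance sum(var(A))
latentA = sort(eig(cov(A)), 'descend');
totalVar = sum(var(A));

fprintf('max |colVar - var(A)|    = %g\n', max(abs(colVar - var(A))));
fprintf('sum(latent), sum(var(A)) = %.4f, %.4f\n', sum(latentA), totalVar);
fprintf('upper bound m*maxVar     = %.4f\n', m*maxVar);
fprintf('latent(1)                = %.4f\n', latentA(1));
fprintf('share of first component = %.3f\n', latentA(1)/totalVar);
```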
David Goodmanson on 8 Mar 2019
Hi Penny,
There is plenty of information out there, starting with 'help pca' and then Wikipedia, but in brief: yes, the latent vector is as you say, but there is no reason the variances need to be small. A variance is just the sum of squared deviations from the mean divided by the number of measurements minus one (essentially an average), and it can be large. If you take a set of data and multiply all the values by 10, the variance goes up by a factor of 100. It's not like the correlation coefficient, which is normalized and always lies between -1 and +1.
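A minimal sketch of the scaling point, assuming any data matrix will do (here the 0/1 matrix A from the counterexample): multiplying the data by 10 multiplies every latent value by 100, while the correlation coefficients from corrcoef stay between -1 and +1 and are unchanged by the scaling. As in the other sketches, latent is computed as eig(cov(...)) on the assumption that this matches what pca returns.

```matlab
n = 50;
m = 20;
A = [ones(n,m); triu(ones(m,m)); zeros(n,m)];

latent1  = sort(eig(cov(A)), 'descend');      % variances of the original data
latent10 = sort(eig(cov(10*A)), 'descend');   % variances after scaling by 10

ratio = latent10(1)/latent1(1);               % 100, up to rounding error
R   = corrcoef(A);                            % normalized: entries lie in [-1, 1]
R10 = corrcoef(10*A);                         % the same matrix, up to rounding error

fprintf('latent(1): %.4f before, %.4f after scaling, ratio %.1f\n', ...
    latent1(1), latent10(1), ratio);
fprintf('largest |correlation| = %.4f, unchanged by scaling: %d\n', ...
    max(abs(R(:))), max(abs(R(:) - R10(:))) < 1e-12);
```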
The columns of coeff are the principal axes, which are unit vectors, so the sum of squares of each column of coeff equals 1. Scores are the coordinates of each measurement (row), after the column means are subtracted, along the principal axes; the variance of each column of score equals the corresponding entry of latent.
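A sketch that checks these properties on the counterexample matrix A, assuming the Statistics and Machine Learning Toolbox is available for pca and MATLAB R2016b or later for implicit expansion in A - mean(A). It confirms that each column of coeff has unit length, that score equals the mean-centered data multiplied by coeff, and that the variance of each column of score equals the corresponding entry of latent.

```matlab
n = 50;
m = 20;
A = [ones(n,m); triu(ones(m,m)); zeros(n,m)];
[coeff, score, latent] = pca(A);

unitLen  = sqrt(sum(coeff.^2));                    % length of each principal axis
centered = A - mean(A);                            % pca subtracts the column means first
scoreErr = max(max(abs(centered*coeff - score)));  % score is the projection onto the axes
varErr   = max(abs(var(score) - latent'));         % column variances of score vs. latent

fprintf('axis lengths between %.6f and %.6f\n', min(unitLen), max(unitLen));
fprintf('max |centered*coeff - score| = %g\n', scoreErr);
fprintf('max |var(score) - latent|    = %g\n', varErr);
```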
Penny13 on 11 Mar 2019
Thank you so much for your answer, David.

