How to apply PCA correctly?
Hello
I'm currently struggling with PCA in MATLAB. Let's say we have a data matrix X and a response y (classification task). X consists of 12 rows and 4 columns. The rows are the data points, the columns are the predictors (features).
Now, I can do PCA with the following command:
[coeff, score] = pca(X);
As I understand from the MATLAB documentation, coeff contains the loadings and score contains the principal components in its columns. That means the first column of score contains the first principal component (associated with the highest variance) and the first column of coeff contains the loadings for the first principal component.
Is this correct?
But if this is correct, why is then X * coeff not equal to score?
1 comment
Sepp @Sepp
Your doubt can be clarified by this tutorial (even though it is in the context of another program), especially after the 5-minute mark: https://www.youtube.com/watch?v=eJ08Gdl5LH0
the cyclist
fabulous and generous explanation
Accepted answer
More answers (2)
Yaser Khojah
on 17 Apr 2019
2 votes
Dear the cyclist, thanks for showing this example. I have a question regarding the order of the columns of COEFF, since they differ from V. Is there any way to see how these columns are ordered? In other words, which variables does each column correspond to?
8 comments
the cyclist
on 17 Apr 2019
Edited: the cyclist
on 17 Apr 2019
Quoting from the first section of the documentation for the pca function.
"Each column of coeff contains coefficients for one principal component, and the columns are in descending order of component variance."
You can see that
var(dataInPrincipalComponentSpace)
has values in descending order.
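As a quick check of that ordering, here is a minimal sketch on hypothetical random data with the same 12-by-4 shape as in the question. It uses the `score` output of `pca`, which is assumed here to correspond to `dataInPrincipalComponentSpace`, and compares its column variances with the third output, `latent`.

```matlab
rng(0);                                % fixed seed so the run is repeatable
X = randn(12, 4) * diag([5 3 2 1]);    % hypothetical data: columns with different spreads

[coeff, score, latent] = pca(X);

scoreVar = var(score)';                % variance of each principal component score (4-by-1)

% The variances are sorted from largest to smallest ...
isDescending = all(diff(scoreVar) <= 0);

% ... and agree with latent up to floating-point error
maxDiff = max(abs(scoreVar - latent));
```

`isDescending` should be true and `maxDiff` should be at round-off level, since `latent` holds the variances of the principal components.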
Yaser Khojah
on 17 Apr 2019
I understand that, but I do not see how a PC is related to the columns of the original data (X). How can I know which variables from the original data have the strongest impact?
Nyssa Capman
on 5 Jan 2020
Edited: Nyssa Capman
on 11 Mar 2020
I believe each row of coeff corresponds to one variable, in the order in which the variables were input.
So, the first column has the coefficients of the 1st* PC for each variable. The second column has the coefficients of the 2nd PC for each variable, and so on.
This post is now several months old, and not really the original question, however I was also confused by this when getting started so I wanted to add this in case someone else is confused in the future and finds this post.
*[edited typo from '2nd' to '1st']
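To make the row/column layout concrete, here is a minimal sketch. The random data and the variable names in `varNames` are hypothetical and exist only for illustration; the only toolbox call is `pca`. It looks up, for each principal component (column of `coeff`), the original variable (row of `coeff`) with the largest absolute coefficient.

```matlab
rng(1);                                  % fixed seed so the run is repeatable
X = randn(12, 4);                        % hypothetical data: 12 observations, 4 variables
varNames = {'v1', 'v2', 'v3', 'v4'};     % hypothetical names, same order as the columns of X

coeff = pca(X);                          % coeff(i, j): weight of variable i in PC j

% For each PC (column), find the variable (row) with the largest absolute weight
[~, topVar] = max(abs(coeff), [], 1);

for j = 1:size(coeff, 2)
    fprintf('PC%d: largest coefficient on %s (%.3f)\n', ...
        j, varNames{topVar(j)}, coeff(topVar(j), j));
end
```

Comparing coefficient magnitudes like this is only meaningful when the variables are on comparable scales; if they are not, standardizing the columns first (for example with `zscore`) is one common option.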
Image Analyst
on 5 Jan 2020
"So, the first column has the coefficients for the 2nd PC, for each variable. " ??? Huh? And this is supposed to reduce confusion?
Alex
on 31 Mar 2020
Hello,
I have some doubts on pca.
I have 2 variables with n observations each, and the coeff matrix is the following:
0.9999 -0.00944
0.0094 0.9999
As I understand it, the first column holds the coefficients of the first principal component: 0.9999 is for the first variable in the initial matrix and 0.0094 for the second one.
But why does the linear combination coeff*variable not give the same result as the first column of score?
Thank you
the cyclist
on 31 Mar 2020
As you can see in my code above it is
X * coeff
that should equal score, not
coeff * X
(where X is the de-meaned input to pca).
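For the two-variable case, the multiplication order can be checked by writing the first score column as an explicit weighted sum. This is only a sketch on hypothetical random data (not the data from the question), and the names `Xc`, `pc1`, `errPc1` and `errAll` are made up for the illustration.

```matlab
rng(2);                                % fixed seed so the run is repeatable
n = 50;
X = [10 * randn(n, 1), randn(n, 1)];   % hypothetical data: n observations, 2 variables
[coeff, score] = pca(X);

Xc = X - mean(X, 1);                   % de-meaned copy (implicit expansion, R2016b or later)

% First PC score as an explicit weighted sum of the two centered variables:
% the weights come from the FIRST COLUMN of coeff.
pc1 = coeff(1, 1) * Xc(:, 1) + coeff(2, 1) * Xc(:, 2);

errPc1 = max(abs(pc1 - score(:, 1)));         % first column only
errAll = max(max(abs(Xc * coeff - score)));   % all columns at once
```

Both errors should be at the level of floating-point round-off. Note that `Xc` is n-by-2 and `coeff` is 2-by-2, so `coeff * Xc` is not even a valid product unless n happens to be 2.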
Yuan Luo
on 8 Nov 2020
Why does X need to be de-meaned, since pca will center the data by default?
the cyclist
on 26 Dec 2020
Sorry it took me a while to see this question.
If you do
[coeff,score] = pca(X);
it is true that pca() will internally de-mean the data. So, score is derived from de-meaned data.
But it does not mean that X itself [outside of pca()] has been de-meaned. So, if you are trying to re-create what happens inside pca(), you need to manually de-mean X first.
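A small sketch of that point, using hypothetical data with deliberately large column means. It relies on the sixth output of `pca`, `mu`, which holds the column means that were removed internally; the `err...` variable names are made up for the illustration.

```matlab
rng(3);                                % fixed seed so the run is repeatable
X = randn(12, 4) + [10 20 30 40];      % hypothetical data with non-zero column means (R2016b or later)

[coeff, score, ~, ~, ~, mu] = pca(X);  % mu: 1-by-4 row of column means removed by pca

errRaw      = max(max(abs(X * coeff - score)));         % large: X itself was never centered
errCentered = max(max(abs((X - mu) * coeff - score)));  % round-off level

% Going the other way: recover the original data from the scores
Xrec   = score * coeff' + mu;
errRec = max(max(abs(Xrec - X)));
```

Here `errRaw` is large while `errCentered` and `errRec` are at round-off level, which is the mismatch described in the original question.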
Greg Heath
on 13 Dec 2015
0 votes
Hope this helps.
Thank you for formally accepting my answer
Greg