MATLAB Answers

How to apply PCA correctly?

Asked by Sepp
on 12 Dec 2015
Latest activity Commented on by the cyclist
on 24 Jul 2019
Hello
I'm currently struggling with PCA and Matlab. Let's say we have a data matrix X and a response y (classification task). X consists of 12 rows and 4 columns. The rows are the data points, the columns are the predictors (features).
Now, I can do PCA with the following command:
[coeff, score] = pca(X);
As I understand from the MATLAB documentation, coeff contains the loadings and score contains the principal components in its columns. That means the first column of score contains the first principal component (associated with the highest variance), and the first column of coeff contains the loadings for the first principal component.
Is this correct?
But if this is correct, why then is X * coeff not equal to score?

3 Answers

Answer by the cyclist
on 12 Dec 2015
 Accepted Answer

Maybe this script will help.
rng 'default'
M = 7; % Number of observations
N = 5; % Number of variables observed
X = rand(M,N);
% De-mean
X = bsxfun(@minus,X,mean(X));
% Do the PCA
[coeff,score,latent] = pca(X);
% Calculate eigenvalues and eigenvectors of the covariance matrix
covarianceMatrix = cov(X);
[V,D] = eig(covarianceMatrix);
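% "coeff" are the principal component vectors. These are the eigenvectors of the covariance matrix.
% Note that eig does not sort the way pca does: the columns of V may come in a different order
% (typically ascending eigenvalue for a symmetric matrix) and with flipped signs. Compare ...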
coeff
V
% Multiply the original data by the principal component vectors to get the projections of the original data on the
% principal component vector space. This is also the output "score". Compare ...
dataInPrincipalComponentSpace = X*coeff
score
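% The columns of X*coeff are orthogonal to each other (uncorrelated, since X was de-meaned), so the
% off-diagonal entries of the correlation matrix are zero up to round-off. This is shown with ...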
corrcoef(dataInPrincipalComponentSpace)
% The variances of these vectors are the eigenvalues of the covariance matrix, and are also the output "latent". Compare
% these three outputs
var(dataInPrincipalComponentSpace)'
latent
sort(diag(D),'descend')
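The comparison of coeff and V in the script above can be made exact with a small amount of bookkeeping. The following is only a sketch, assuming the Statistics and Machine Learning Toolbox (for pca) and R2016b or later (for implicit expansion); the names Vsorted, Valigned, maxCoeffDiff and maxLatentDiff are made up for illustration. It sorts the eigenpairs in descending order and flips signs, because an eigenvector is only defined up to sign.

```matlab
rng('default')
X = rand(7,5);                        % 7 observations, 5 variables, as in the script above
X = X - mean(X);                      % de-mean (implicit expansion)

[coeff,~,latent] = pca(X);
[V,D] = eig(cov(X));

% Sort the eigenpairs by descending eigenvalue, which is the order pca uses
[eigVals,order] = sort(diag(D),'descend');
Vsorted = V(:,order);

% Flip each column of Vsorted so that it points the same way as the
% matching column of coeff (the dot product of the pair is +1 or -1)
signs = sign(sum(Vsorted .* coeff, 1));
Valigned = Vsorted .* signs;

% Both differences should be at round-off level
maxCoeffDiff  = max(abs(Valigned(:) - coeff(:)))
maxLatentDiff = max(abs(eigVals - latent))
```

After this alignment, column k of Valigned and column k of coeff describe the same principal component.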

  9 Comments

the cyclist
on 22 Mar 2019
Yes, bsxfun is a built-in function. It applies the element-wise operation, implicitly expanding either array, if necessary. With more modern versions of MATLAB (R2016b and later), implicit expansion happens automatically, so one could actually replace that line with
X = X - mean(X);
Take a look at this CrossValidated answer about why centering (i.e. de-meaning) can be important.
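Centering is also what explains the original question. By default pca subtracts the column means before projecting, so score matches the projection of the centered data, not X*coeff on the raw data. Here is a minimal sketch, assuming the Statistics and Machine Learning Toolbox and R2016b or later; the 12-by-4 random matrix stands in for the asker's data, and the variable names are made up for illustration.

```matlab
rng('default')
X = rand(12,4);                       % stand-in data: 12 observations, 4 predictors (not centered)

[coeff,score,~,~,~,mu] = pca(X);      % mu holds the column means that pca subtracts

rawProjection      = X * coeff;           % differs from score
centeredProjection = (X - mu) * coeff;    % matches score up to round-off

maxDiffRaw      = max(abs(rawProjection(:) - score(:)))
maxDiffCentered = max(abs(centeredProjection(:) - score(:)))

% Alternatively, tell pca not to center. Then score equals X*coeff, but the
% components describe variation about the origin rather than about the mean.
[coeffU,scoreU] = pca(X,'Centered',false);
uncenteredProjection = X * coeffU;
maxDiffUncentered = max(abs(uncenteredProjection(:) - scoreU(:)))
```

Only maxDiffRaw should be clearly nonzero; the other two should be at round-off level.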
This is very interesting, but a question comes to my mind. Coeffs are the eigenvectors and scores are the projection of the data in the principal component space.
Are these then equivalent to the eigenfunctions and random variables of the Karhunen-Loève expansion?
the cyclist
on 24 Jul 2019
That's a math question, not a MATLAB question. :-)
I don't really know, but this abstract -- I did not access or read the paper itself -- suggests that KL and PCA are not strictly equivalent.


Answer by Yaser Khojah on 17 Apr 2019

Dear the cyclist, thanks for showing this example. I have a question regarding the order of the columns of COEFF, since they differ from those of V. Is there any way to see how these columns are ordered? In other words, which variables does each column correspond to?

  2 Comments

the cyclist
on 17 Apr 2019
Quoting from the first section of the documentation for the pca function.
"Each column of coeff contains coefficients for one principal component, and the columns are in descending order of component variance."
You can see that
var(dataInPrincipalComponentSpace)
has values in descending order.
I understand that, but I do not see how each PC is related to the columns of the original data (X). How can I know which variables from the original data have the strongest impact?
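One possible way to look at this, offered as a sketch and not as the answer given in this thread: row i of coeff belongs to column i of X, and column k of coeff belongs to principal component k, so the entries of coeff(:,k) with the largest absolute value mark the original variables that weigh most heavily in that component. The snippet assumes the Statistics and Machine Learning Toolbox; the table column names are made up for illustration. If the variables have different units or scales, the loadings mostly reflect scale, so standardizing the data first (for example with zscore) is a common choice.

```matlab
rng('default')
X = rand(7,5);                        % same shape as the script in the accepted answer
[coeff,~,~,~,explained] = pca(X);

% Rank the original variables by the size of their loading on component k
k = 1;
[absLoading,varIdx] = sort(abs(coeff(:,k)),'descend');
rankedLoadings = table(varIdx, coeff(varIdx,k), absLoading, ...
    'VariableNames', {'OriginalColumn','Loading','AbsLoading'})

% Percentage of the total variance captured by each component
explained'
```

The first row of rankedLoadings names the column of X that contributes most to component k. Changing k repeats the ranking for another component.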


Answer by Greg Heath
on 13 Dec 2015

Hope this helps.
Thank you for formally accepting my answer
Greg
