Asked by Sepp
on 12 Dec 2015

Hello

I'm currently struggling with PCA and MATLAB. Let's say we have a data matrix X and a response y (classification task). X consists of 12 rows and 4 columns. The rows are the data points, and the columns are the predictors (features).

Now, I can do PCA with the following command:

```matlab
[coeff, score] = pca(X);
```

As I understood from the MATLAB documentation, coeff contains the loadings and score contains the principal components in its columns. That means the first column of score contains the first principal component (the one associated with the highest variance), and the first column of coeff contains the loadings for the first principal component.

Is this correct?

But if this is correct, why is X * coeff not equal to score?
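A minimal sketch that reproduces the mismatch, assuming the Statistics and Machine Learning Toolbox `pca` with its default options. By default `pca` centers each column (the 'Centered' option is true), so `score` is the projection of the *centered* data, not of the raw X. The random 12-by-4 matrix is a stand-in for the asker's data, and the gap variable names are illustrative only.

```matlab
% Stand-in data with the question's shape: 12 observations (rows), 4 predictors (columns)
rng('default')
X = rand(12, 4);

% pca subtracts the column means internally before computing coeff and score
[coeff, score] = pca(X);

% Raw projection: every row is off by the same row vector, mean(X)*coeff
diffRaw = X*coeff - score;
uncenteredGap = max(abs(diffRaw(:)));

% Centered projection: agrees with score up to round-off
Xc = X - mean(X);                 % implicit expansion; use bsxfun before R2016b
diffCentered = Xc*coeff - score;
centeredGap = max(abs(diffCentered(:)));

% coeff is square with orthonormal columns here, so the scores rebuild the centered data
diffRebuilt = score*coeff' - Xc;
reconstructionGap = max(abs(diffRebuilt(:)));

fprintf('raw: %.3g  centered: %.3g  reconstruction: %.3g\n', ...
        uncenteredGap, centeredGap, reconstructionGap);
```

So `score` equals `(X - mean(X))*coeff`, and `X*coeff` only matches it when X has zero column means.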

Answer by the cyclist
on 12 Dec 2015

Accepted Answer

Maybe this script will help.

```matlab
rng 'default'

M = 7; % Number of observations
N = 5; % Number of variables observed

X = rand(M,N);

% De-mean. pca centers the data by default ('Centered' is true), so score
% equals X*coeff only when X has already been centered.
X = bsxfun(@minus,X,mean(X));

% Do the PCA
[coeff,score,latent] = pca(X);

% Calculate eigenvalues and eigenvectors of the covariance matrix
covarianceMatrix = cov(X);
[V,D] = eig(covarianceMatrix);

% "coeff" are the principal component vectors. These are the eigenvectors of
% the covariance matrix. Compare ... (for a symmetric matrix, eig typically
% returns the eigenvalues in ascending order, so the columns of V appear in
% reverse order relative to coeff, and individual columns may differ in sign.)
coeff
V

% Multiply the original data by the principal component vectors to get the
% projections of the original data on the principal component vector space.
% This is also the output "score". Compare ...
dataInPrincipalComponentSpace = X*coeff
score

% The columns of X*coeff are orthogonal to each other (they have zero mean,
% so orthogonal is the same as uncorrelated). This is shown by the near-zero
% off-diagonal entries of ...
corrcoef(dataInPrincipalComponentSpace)

% The variances of these vectors are the eigenvalues of the covariance matrix,
% and are also the output "latent". Compare these three outputs
var(dataInPrincipalComponentSpace)'
latent
sort(diag(D),'descend')
```
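For comparison, a minimal sketch of the same relationships computed with `svd` instead of `pca` or `eig`, assuming the data are centered first. This mirrors the default 'svd' algorithm of `pca`; because singular vectors are only defined up to sign, columns may differ in sign from `pca`'s output. The variable names ending in `Svd` are illustrative.

```matlab
% PCA via the singular value decomposition of the centered data
rng('default')
M = 7;                            % number of observations
N = 5;                            % number of variables
X = rand(M, N);
Xc = X - mean(X);                 % center each column

[U, S, V] = svd(Xc, 'econ');      % Xc = U*S*V', singular values in decreasing order
coeffSvd  = V;                    % principal directions (compare with coeff, up to sign)
scoreSvd  = U*S;                  % equals Xc*V (compare with score, up to sign)
latentSvd = diag(S).^2/(M - 1);   % component variances (compare with latent)

% Eigenvalues of the covariance matrix, sorted to match the SVD ordering
eigVals = sort(eig(cov(Xc)), 'descend');

disp([latentSvd, eigVals])
```

The squared singular values divided by M-1 are the eigenvalues of `cov(Xc)`, which is why `latent` and `sort(diag(D),'descend')` agree in the script above.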

the cyclist
on 22 Mar 2019

Yes, bsxfun is a built-in function. It applies an element-wise binary operation, expanding singleton dimensions of either array as needed. Starting with R2016b, MATLAB performs this implicit expansion automatically for arithmetic operators, so one could actually replace that line with

```matlab
X = X - mean(X);
```
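A quick check, assuming R2016b or later so that implicit expansion is available, that the two forms produce identical results. The variable names `Xold`, `Xnew`, and `sameResult` are illustrative.

```matlab
% Both lines subtract the column means of X from every row of X
rng('default')
X = rand(7, 5);
Xold = bsxfun(@minus, X, mean(X));   % works in all releases
Xnew = X - mean(X);                  % implicit expansion, R2016b and later
sameResult = isequal(Xold, Xnew)     % same element-wise arithmetic, so exact equality
```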

Jaime de la Mota
on 24 Jul 2019

This is very interesting, but a question comes to my mind. Coeffs are the eigenvectors and scores are the projection of the data in the principal component space.

Are these then equivalent to the eigenfunctions and random variables of the Karhunen-Loève expansion?

the cyclist
on 24 Jul 2019

That's a math question, not a MATLAB question. :-)

I don't really know, but this abstract -- I did not access or read the paper itself -- suggests that KL and PCA are not strictly equivalent.


Answer by Yaser Khojah
on 17 Apr 2019

the cyclist
on 17 Apr 2019

Quoting from the first section of the documentation for the pca function.

"Each column of coeff contains coefficients for one principal component, and the columns are in descending order of component variance."

You can see that

```matlab
var(dataInPrincipalComponentSpace)
```

has values in descending order.
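A self-contained version of that check, assuming random 7-by-5 test data like the accepted answer's script. The names `componentVariances` and `isDescending` are illustrative, and the test compares adjacent differences rather than relying on any particular `issorted` syntax.

```matlab
% Variances of the principal component scores should be non-increasing
rng('default')
X = rand(7, 5);
[coeff, score, latent] = pca(X);

componentVariances = var(score);                      % 1-by-N row vector
isDescending = all(diff(componentVariances) <= 0)     % true when ordered largest first
```

The same values are returned in `latent` (as a column vector), so `var(score)'` and `latent` should agree up to round-off.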

Yaser Khojah
on 17 Apr 2019


Answer by Greg Heath
on 13 Dec 2015

Hope this helps.

Thank you for formally accepting my answer

Greg

