Covariance matrix and principal components

Asked by Jaime de la Mota

on 17 Jul 2019
Latest activity Answered by Dheeraj Singh

on 6 Aug 2019
Hello everyone. I have some questions about the use of cov and pca. According to the help (https://la.mathworks.com/help/stats/pca.html), the rows of the matrix X are observations and the columns are variables. The following code generates a Gaussian random process with 2001 observations and 500 variables:
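close all
clear
clc
% Analytic covariance: exponential kernel exp(-|x-y|) on a 2001-point grid over [0,1]
[X,Y] = meshgrid(0:0.0005:1,0:0.0005:1);
Z=exp((-1)*abs(X-Y));
tam=size(X, 1);
number_realizations=500;
cov_mat=Z;
% Eigendecomposition, reordered so that the largest eigenvalues come first
[evec, evalM]=eig(cov_mat');
eval=diag(evalM);
eval=eval(end:-1:1);
evec=evec(:,end:-1:1);
figure
plot(evec(:,1:5))
figure
plot(eval(1:5), 'X')
% Build each realization as a sum of eigenvectors scaled by
% sqrt(eigenvalue) times an independent standard normal sample
realizacion=zeros(tam,1);
evec_rand=zeros(tam,tam);
realization_mat=zeros(tam,number_realizations);
figure
hold on
for num_real=1:number_realizations
    for num_evec=1:tam
        evec_rand(:,num_evec)=sqrt(eval(num_evec))*normrnd(0,1)*evec(:,num_evec);
    end
    realizacion=sum(evec_rand,2);
    plot(realizacion)
    realization_mat(:,num_real)=realizacion; % one realization per column
end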
The result is a 2001 by 500 matrix, where the rows are observations and the columns are variables. However, if I perform PCA and plot the results:
[coeff,score,latent,tsquared,explained,mu] = pca(realization_mat,'Centered',false);
figure
plot(score(:,1:6))
for i=1:5
figure
hist(coeff(:,i))
end
The resulting figures don't match the theory. When plotted, the scores (2001 by 500) look like the analytic eigenvectors of the analytic covariance matrix, and the coeffs (500 by 500), plotted as histograms, look like Gaussian random variables. This confuses me, since it seems backwards: the random variables should be in score and the eigenvectors in coeff. Any insight on this is greatly appreciated.
If I perform the PCA as
[coeff2,score2,latent2,tsquared2,explained2,mu2] = pca(realization_mat','Centered',false);
Then the columns of coeff2 look like eigenvectors and the columns of score2, plotted with hist, look like random variables, but the dimensions are then wrong: coeff2 is 2001 by 500 and score2 is 500 by 500.
If I perform the KL by hand as
[W, EvalueMatrix] = eig(cov(realization_mat));
Evalues = diag(EvalueMatrix);
Evalues = Evalues(end:-1:1);
W = W(:,end:-1:1);
figure
plot(W(:,1:6))
Then the eigenvectors (W) don't look like the analytic eigenvectors of the analytic covariance matrix. However, if I instead calculate cov of realization_mat', the transpose, then the eigenvectors (W2) look like the analytic ones. Yet the help (https://la.mathworks.com/help/matlab/ref/cov.html) states that the rows of A should be observations and the columns variables; therefore, eig(cov(realization_mat)) should give good eigenvectors, not eig(cov(realization_mat')). Any insight on this is also welcome.
Thanks.
Jaime.

Answer by Dheeraj Singh

on 6 Aug 2019

When you are using the following code for pca:
[coeff2,score2,latent2,tsquared2,explained2,mu2] = pca(realization_mat','Centered',false);
Here realization_mat' is a 500x2001 matrix, which means the number of observations is less than the number of variables.
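As a minimal sketch of how this affects the output sizes, the snippet below runs pca on a small random matrix with fewer observations than variables. The matrix A, its dimensions, and the names n_obs, n_var, coeffA and scoreA are hypothetical stand-ins, not the data from the question. This is the same situation in which the question reports coeff2 as 2001 by 500 and score2 as 500 by 500.

```matlab
% Hypothetical stand-in data (not the matrix from the question):
% 5 observations (rows) and 20 variables (columns), so observations < variables.
rng(0);                       % fixed seed for repeatability
n_obs = 5;
n_var = 20;
A = randn(n_obs, n_var);

% Same call pattern as in the question, with centering disabled.
[coeffA, scoreA] = pca(A, 'Centered', false);

% coeffA has one row per variable and scoreA one row per observation;
% the number of columns (components) is limited by the number of observations.
disp(size(coeffA))
disp(size(scoreA))
```

Comparing the two printed sizes with n_obs and n_var shows which dimension pca treats as the variables and how many components it returns in this case.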
You can refer to the following MATLAB Answers post to learn more about PCA when the number of variables is greater than the number of observations: