Covariance matrix and principal components

Jaime de la Mota on 17 Jul 2019
Hello everyone. I have some questions about the use of cov and pca. According to the help (https://la.mathworks.com/help/stats/pca.html), the rows of the matrix X are observations and the columns are variables. The following code generates a Gaussian random process with 2001 observations and 500 variables:
close all
clear
clc
[X,Y] = meshgrid(0:0.0005:1,0:0.0005:1);
Z=exp((-1)*abs(X-Y));
tam=size(X, 1);
number_realizations=500;
cov_mat=Z;
[evec, evalM]=eig(cov_mat');
eval=diag(evalM);
eval=eval(end:-1:1);
evec=evec(:,end:-1:1);
figure
plot(evec(:,1:5))
figure
plot(eval(1:5), 'X')
realizacion=zeros(tam,1);
figure
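for num_real=1:number_realizations
    % weight each eigenvector by sqrt(eigenvalue) times an independent N(0,1) draw
    for num_evec=1:tam
        evec_rand(:,num_evec)=sqrt(eval(num_evec))*normrnd(0,1)*evec(:,num_evec);
    end
    % sum the weighted eigenvectors to obtain one realization
    realizacion=sum(evec_rand,2);
    hold on
    plot(realizacion)
    realization_mat(:,num_real)=realizacion;
end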
The result is a 2001 by 500 matrix, where the rows are observations and the columns are variables. However, if I perform PCA and plot the results:
[coeff,score,latent,tsquared,explained,mu] = pca(realization_mat,'Centered',false);
figure
plot(score(:,1:6))
for i=1:5
    figure
    hist(coeff(:,i))
end
The resulting figures don't match the theory. The scores (2001 by 500), when plotted, look like the analytic eigenvectors of the analytic covariance matrix, and the coeffs (500 by 500), when plotted as histograms, look like Gaussian random variables. This confuses me, since it seems backwards: the random variables should be in score and the eigenvectors in coeff. Any insight into this is greatly appreciated.
If I perform the PCA as
[coeff2,score2,latent2,tsquared2,explained2,mu2] = pca(realization_mat','Centered',false);
Then coeff2 looks like the eigenvectors and score2, plotted with hist, looks like random variables, but the dimensions are then wrong: coeff2 is 2001 by 500 and score2 is 500 by 500.
If I perform the KL by hand as
[W, EvalueMatrix] = eig(cov(realization_mat));
Evalues = diag(EvalueMatrix);
Evalues = Evalues(end:-1:1);
W = W(:,end:-1:1);
figure
plot(W(:,1:6))
Then the eigenvectors (W) don't look like the analytic eigenvectors of the analytic covariance matrix. However, if I instead calculate cov of the transpose, realization_mat', then the eigenvectors (W2) look like the analytic ones. Yet the help (https://la.mathworks.com/help/matlab/ref/cov.html) states that the rows of A should be observations and the columns variables; therefore eig(cov(realization_mat)) should give good eigenvectors, not eig(cov(realization_mat')). Any insight into this is also welcome.
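As a quick check of the convention quoted from the help, here is a minimal sketch with a small hypothetical matrix A (toy data, not the matrix above; the names A, C_cols, C_rows, C_manual and max_diff are illustrative only). It suggests that cov returns one row and column per column of its input, and that the result agrees with centering the columns by hand:

```matlab
% Hypothetical toy data: 6 rows, 3 columns (not the matrix from the question).
rng(0);                          % fixed seed so the run is repeatable
A = randn(6, 3);

C_cols = cov(A);                 % treats each column of A as a variable
C_rows = cov(A');                % treats each row of A as a variable

disp(size(C_cols));              % one row/column per column of A
disp(size(C_rows));              % one row/column per row of A

% The same result by hand: center each column, then average outer products.
Ac = A - mean(A, 1);             % implicit expansion, assumes R2016b or newer
C_manual = (Ac' * Ac) / (size(A, 1) - 1);
max_diff = max(abs(C_cols(:) - C_manual(:)));
```

Under the same convention, cov(realization_mat) would be 500 by 500 and cov(realization_mat') would be 2001 by 2001.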
Thanks.
Jaime.

Answers (1)

Dheeraj Singh on 6 Aug 2019
When you call pca as follows:
[coeff2,score2,latent2,tsquared2,explained2,mu2] = pca(realization_mat','Centered',false);
Here realization_mat' is a 500x2001 matrix, which means that the number of observations is smaller than the number of variables.
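The effect on the output sizes can be sketched on a small hypothetical example (toy data with 20 observations and 50 variables; the names X, n, p and rel_err are illustrative only, and pca is assumed to be available from the Statistics and Machine Learning Toolbox). With fewer observations than variables and 'Centered' set to false, pca should return no more components than there are observations:

```matlab
% Hypothetical toy data with fewer observations than variables.
rng(0);                          % fixed seed so the run is repeatable
n = 20;                          % observations (rows)
p = 50;                          % variables (columns)
X = randn(n, p);

% Same call pattern as in the question, without centering.
[coeff, score, latent] = pca(X, 'Centered', false);

disp(size(coeff));               % p rows, at most n columns
disp(size(score));               % n rows, same number of columns as coeff
disp(numel(latent));             % one variance per returned component

% Without centering, score*coeff' should reproduce X up to round-off.
rel_err = norm(score * coeff' - X, 'fro') / norm(X, 'fro');
```

With the sizes from the question (a 500 by 2001 input), the same rule is consistent with the 2001 by 500 coeff2 and the 500 by 500 score2 reported there.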
You can refer to the following MATLAB Answer to learn more about PCA when the number of variables exceeds the number of observations:
