How to find the weight of PC_1 in my measurements, after doing PCA?

Mark Golberg on 19 Sep 2022
Hi,
I'm trying to use the following code to understand PCA, SVD, and the relationship between them:
% PCA_vs_SVD (my sand_box)
%% generate fake data points
my_fake_dataPoints = [-4 0 ; -2 1 ; -1 -1 ; 1 1 ; 3 2 ; 4 2];
% remove the mean of each variable (column); rows are observations, so average along dimension 1
my_fake_dataPoints_noMean = my_fake_dataPoints - mean(my_fake_dataPoints , 1);
% do PCA
[coeff , score , latent , tsquared , explained , mu] = pca(my_fake_dataPoints);
[coeff_noMean , score_noMean , latent_noMean , tsquared_noMean , explained_noMean , mu_noMean] = pca(my_fake_dataPoints_noMean);
% do SVD
[U , S , V] = svd(my_fake_dataPoints);
[U_noMean , S_noMean , V_noMean] = svd(my_fake_dataPoints_noMean);
%% plots
figure(1)
biplot(coeff , 'scores' , score , 'MarkerSize' , 30 , 'varlabels' ,{'var_1' , 'var_2'});
figure(2)
scatter(score(:,1) , score(:,2))
axis equal
xlabel('1st Principal Component')
ylabel('2nd Principal Component')
grid on
I have some questions:
1) How can I know the weight of PC_1 in my measurements? Is it simply the first column of "score", or something else?
2) What exactly is the connection between the output of PCA and SVD? Which case should I compare: the standard one, or the one with mean subtraction?
3) Am I missing something in the following:
PC1 = alpha_1 * v1 + alpha_2 * v2, right? My alphas are the first column of the "coeff" variable, right?
So, the 1st data point projected on PC1 should be: 0.95 * (-4) + 0.28 * (0) = -3.8, right? But it doesn't match score(1,1), which is -4.23... What am I missing here?

Answers (1)

Githin George on 4 Oct 2023
Hello Mark,
I understand you have a few doubts related to PCA. To answer your queries:
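  1. The "explained" / "explained_noMean" output contains the percentage of the total variance captured by each principal component (PC_1 and PC_2 in this case); it equals 100 * latent / sum(latent). The first column of "score" is not a weight: it holds the coordinate of each data point along PC_1.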
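  2. PCA is equivalent to an SVD of the mean-centered data. pca() subtracts the column means by default, so the case to compare is svd() of the data with the column means removed: "coeff" then matches V up to the sign of each column, "score" matches U*S, and "latent" equals the squared singular values divided by (n - 1), where n is the number of observations. I suggest you refer to the following answer to know more: https://www.mathworks.com/matlabcentral/answers/774902-pca-vs-svd-or-eig-functions?s_tid=srchtitle_site_search_1_pca%20vs%20svd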
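  3. The equation "PC1 = alpha_1 * v1 + alpha_2 * v2", with the alphas taken from the first column of "coeff", does give the projection of a data point (v1, v2) onto the first principal axis, and "score" holds exactly these projections. The mismatch arises because pca() subtracts the column means (returned in "mu", here about 0.167 and 0.833) before projecting, i.e. score = (X - mu) * coeff. For the first data point, with the unrounded coefficients (about 0.958 and 0.287): 0.958 * (-4 - 0.167) + 0.287 * (0 - 0.833) ≈ -4.23, which matches score(1,1).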
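As a rough check of the three points above, here is a minimal sketch using the data from the question. It assumes the Statistics and Machine Learning Toolbox (for pca) and R2016b or later (for implicit expansion in X - mu). The variable names X, Xc, score_manual and so on are only illustrative and do not appear in the original code.

```matlab
% Toy data from the question: 6 observations (rows), 2 variables (columns)
X = [-4 0; -2 1; -1 -1; 1 1; 3 2; 4 2];
n = size(X, 1);

[coeff, score, latent, ~, explained, mu] = pca(X);

% (1) Percentage of total variance captured by each component
explained_manual = 100 * latent / sum(latent);

% (3) Scores are the mean-centered data projected onto the columns of coeff
Xc = X - mu;                      % mu is the row vector of column means
score_manual = Xc * coeff;

% (2) Economy-size SVD of the centered data
[U, S, V] = svd(Xc, 'econ');
score_svd  = U * S;
latent_svd = diag(S).^2 / (n - 1);

% Singular vectors are only defined up to sign, so the SVD comparisons
% for coeff and score use absolute values. Each line displays a logical
% flag saying whether the two quantities agree within tol.
tol = 1e-10;
disp(max(abs(explained - explained_manual)) < tol)
disp(max(abs(score(:) - score_manual(:))) < tol)
disp(max(abs(abs(score(:)) - abs(score_svd(:)))) < tol)
disp(max(abs(abs(coeff(:)) - abs(V(:)))) < tol)
disp(max(abs(latent - latent_svd)) < tol)
```

If any flag comes out false, the most likely cause is comparing pca() against an SVD of data that was not centered column-wise.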
I hope this addresses your queries.
