pca versus princomp: differences in signs of coeffs (loadings) for cities dataset
So with the latest install of MATLAB (R2019b), I no longer have access to the function princomp from the old Statistics Toolbox. I ran into a problem switching to the recommended function pca and tried to diagnose it using the cities dataset (the example dataset from the princomp documentation). The principal component loadings differ between princomp and pca. The magnitudes of the loadings are the same, but the signs are sometimes different: PC1 same, PC2-PC4 opposite signs, PC5-PC8 same, and PC9 opposite sign. The Pareto plots of variance explained by each component are the same. Has anyone else noticed this? My code follows:
--------------------
close all
clear
clc
% PCA analysis of MATLAB "cities" data
load cities
% display categories
categories
%%
% display names
names(1:3,:)
names(end-2:end,:)
%%
% display ratings
ratings(1:3,:)
%%
figure(1)
boxplot(ratings,0,'+',0)
set(gca,'Yticklabel',categories)
%%
stdr = std(ratings); % standard deviation of each rating category
sr = ratings./stdr(ones(329,1),:); % "standardize" each rating
figure(3)
boxplot(sr,0,'+',0)
set(gca,'Yticklabel',categories)
figure(4)
set(gcf, 'paperposition', [0.5 0.5 10.5 7.5])
set(gcf, 'paperorientation', 'landscape')
% use pca
subplot(2,1,1)
[pcs, newdata, variances, t2, explained] = pca(sr,'Algorithm','svd','Centered','on'); % principal components analysis
%
p3 = pcs(:,1:3); %subset the first three PCs
I = p3'*p3 % identity matrix shows that the first three PCs are orthonormal
bar(pcs) % plot the coefficients/loadings for all 9 PCs
set(gca,'Xticklabel',categories)
legend('PC1','PC2','PC3','PC4','PC5','PC6','PC7','PC8','PC9')
title('pca')
%use princomp
subplot(2,1,2)
clear pcs newdata variances t2
[pcs, newdata, variances, t2] = princomp(sr); % principal components analysis
% pcs = principal component coefficients, aka "loadings"
%
p3 = pcs(:,1:3); %subset the first three PCs
I = p3'*p3 % identity matrix shows that the first three PCs are orthonormal
bar(pcs) % plot the coefficients/loadings for all 9 PCs
set(gca,'Xticklabel',categories)
legend('PC1','PC2','PC3','PC4','PC5','PC6','PC7','PC8','PC9')
title('princomp')
%%
% [pcs, newdata, variances, t2] = pca(sr); % principal components analysis
% pcs = principal component coefficients, aka "loadings"; newdata = scores
%
figure(5)
percent_explained = 100.*variances/sum(variances);
pareto(percent_explained) % make a "scree plot"
xlabel('Principal Component')
ylabel('Percent Explained, %')
1 comment
Spencer Chen
on 28 Jan 2020
I have used both, but I haven't compared them. You should find the corresponding factors also negated, in which case, the output is alright.
Blessings,
Spencer
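To illustrate the point about the corresponding scores being negated too, here is a minimal sketch. It uses plain svd on made-up random data instead of pca or princomp, so it runs without the toolbox; the variable names (flipSigns, V2, score2) are only for illustration and are not taken from the original post.

```matlab
% Sketch: negating a loading vector together with its score column
% leaves the reconstructed (centered) data and the variances unchanged.
rng(1);                              % reproducible toy data (assumption)
X  = randn(30, 3);
Xc = X - mean(X, 1);                 % center each column

[U, S, V] = svd(Xc, 'econ');         % columns of V play the role of loadings
score = U * S;                       % principal component scores

flipSigns = [1 -1 1];                % hypothetical flip: negate PC2 only
V2     = V     .* flipSigns;         % loadings with PC2 negated
score2 = score .* flipSigns;         % scores with PC2 negated

% Both sign choices reconstruct the same centered data ...
reconErr = norm(score * V' - score2 * V2', 'fro');

% ... and give the same variance per component.
varDiff = max(abs(var(score) - var(score2)));

fprintf('reconstruction difference: %g\n', reconErr);
fprintf('variance difference:       %g\n', varDiff);
```

Both differences should come out at zero or at round-off level, which is why the variance-explained plots agree even when the loadings' signs do not.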
Answers (2)
John D'Errico
on 28 Jan 2020
Edited: John D'Errico
on 28 Jan 2020
The vectors generated are unique only up to a sign change. Any of them can arbitrarily change sign, and nothing else is affected. (Not just one element changes; the entire vector has its sign flipped.) This has been true since time began, well, at least since MATLAB began. It is not a question of anyone "noticing". This is a known characteristic of these algorithms.
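If the two sets of loadings need to be compared directly, one option is to impose the same sign convention on both before comparing. The sketch below is only an illustration: it uses svd on made-up random data so it runs without the toolbox, and the convention it picks (make the largest-magnitude entry of each column positive) is an assumption chosen for the example, not something stated in this thread.

```matlab
% Sketch: bring two loading matrices that differ only by column signs
% onto a common sign convention so they can be compared directly.
rng(0);                              % reproducible toy data (assumption)
X  = randn(50, 4);
Xc = X - mean(X, 1);                 % center each column
[~, ~, V] = svd(Xc, 'econ');         % columns of V are principal directions

A = V;                               % one valid set of loadings
B = V .* [1 -1 -1 1];                % same loadings, columns 2 and 3 negated

% Assumed convention: the largest-magnitude entry in each column is positive.
k = size(A, 2);
[~, ia] = max(abs(A), [], 1);        % row index of the largest entry per column
sa = sign(A(sub2ind(size(A), ia, 1:k)));
[~, ib] = max(abs(B), [], 1);
sb = sign(B(sub2ind(size(B), ib, 1:k)));

An = A .* sa;                        % normalized copies
Bn = B .* sb;

maxDiff = max(abs(An(:) - Bn(:)));   % how far apart the normalized loadings are
fprintf('max difference after sign normalization: %g\n', maxDiff);
```

To apply the same idea to the outputs in the question, the score columns would need to be multiplied by the same sign vector as the coefficient columns so that the two stay consistent.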