MATLAB Answers

PCA scaling and centering documentation wrong?

Asked by Ari Paul on 24 Mar 2015
Latest activity Answered by the cyclist
on 26 Jun 2019 at 21:03
The pca() documentation says that the raw data is automatically centered at the start of the process. If that is true, then pca(X) should equal pca(Y), where Y is the centered data. But they are not equal (specific data below). Additionally, when I use either eig() or svd() to compute the principal components, I can only get them to match the pca() output when I manually center the data before calling pca(). Ultimately, my question is simply: how do I correctly calculate the principal components of raw data? That is, do I need to manually center and scale it first, only center it, or only scale it?
Sample data:
X =
    1.0000   -3.0000   -1.0000
    2.0000   -2.0000   -0.5000
    3.0000   -0.5000    0.2500
    4.0000    2.0000    1.0000
    5.0000    5.0000    2.5000
Centering X -> Y =
   -2.0000   -3.3000   -1.4500
   -1.0000   -2.3000   -0.9500
         0   -0.8000   -0.2000
    1.0000    1.7000    0.5500
    2.0000    4.7000    2.0500
pca(X) =
   -0.7360   -0.6037   -0.3062
   -0.6688    0.7186    0.1907
   -0.1049   -0.3452    0.9327
pca(Y) =
    0.4058    0.8414    0.3569
    0.9124   -0.3960   -0.1036
    0.0542    0.3676   -0.9284
svd(Y) =
    0.4058    0.9124    0.0542
    0.8414   -0.3960    0.3676
    0.3569   -0.1036   -0.9284
eig(cov(Y)) =
    0.0542    0.9124    0.4058
    0.3676   -0.3960    0.8414
   -0.9284   -0.1036    0.3569
(This is the same output as svd(Y), just in a different order.)


2 Answers

Answer by Sagar
on 9 Aug 2015

You got it a little wrong. When you call pca(Y), pca centers the data again by default. So if you want to get the same values as pca(X), use the 'Centered', false name-value pair: PCA_of_Y = pca(Y, 'Centered', false); Now it will definitely be equal to pca(X).
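A minimal sketch of that comparison, assuming the pca function from Statistics and Machine Learning Toolbox and its documented 'Centered' option. The variable names (coeffX, coeffY, coeffOff, and so on) are only illustrative. Since the column means of an already centered matrix are zero up to round-off, centering it a second time should not change it, so all three calls would be expected to agree up to the sign of each column:

```matlab
% Sample data from the question (rows = observations, columns = variables)
X = [1 -3 -1; 2 -2 -0.5; 3 -0.5 0.25; 4 2 1; 5 5 2.5];
Y = bsxfun(@minus, X, mean(X, 1));      % manually centered copy of X

% Column means of Y are zero up to round-off, so re-centering Y is a no-op
recenterShift = max(abs(mean(Y, 1)));

coeffX   = pca(X);                      % default: pca centers X internally
coeffY   = pca(Y);                      % default: pca centers Y again
coeffOff = pca(Y, 'Centered', false);   % skip centering; Y is already centered

% Compare absolute values so a possible sign flip is not counted as a mismatch
diffXY   = max(max(abs(abs(coeffX) - abs(coeffY))));
diffXOff = max(max(abs(abs(coeffX) - abs(coeffOff))));

fprintf('largest column mean of Y:       %g\n', recenterShift);
fprintf('pca(X) vs pca(Y):               %g\n', diffXY);
fprintf('pca(X) vs pca(Y, no centering): %g\n', diffXOff);
```

If any of the printed differences is far from zero, the inputs are probably not the matrices you think they are (for example, a transposed or already scaled copy).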



Answer by the cyclist
on 26 Jun 2019 at 21:03

Answering a gazillion years after-the-fact, because I just turned this up in my own search.
X = [1.0000 -3.0000 -1.0000; 2.0000 -2.0000 -0.5000; 3.0000 -0.5000 0.2500; 4.0000 2.0000 1.0000; 5.0000 5.0000 2.5000];
Y = X - mean(X);
pca(X)
pca(Y)
both give the same PCA results for me, in MATLAB Online (as of when I answered this).
So, either something got fixed, or you made a mistake.
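For the manual routes mentioned in the question, here is a rough sketch using only svd, eig, and cov. It assumes rows are observations and columns are variables. The variable names are illustrative, and this is not necessarily how pca computes things internally. It centers the data once, takes the principal directions from the SVD, and checks them against the eigenvectors of the covariance matrix after sorting the eigenvalues into descending order:

```matlab
% Sample data from the question (rows = observations, columns = variables)
X = [1 -3 -1; 2 -2 -0.5; 3 -0.5 0.25; 4 2 1; 5 5 2.5];

% Center each column; bsxfun also works in releases without implicit expansion
Y = bsxfun(@minus, X, mean(X, 1));

% Route 1: SVD of the centered data; columns of V are the principal directions
[~, S, V] = svd(Y, 'econ');
varSvd = diag(S).^2 / (size(X, 1) - 1);      % variance along each direction

% Route 2: eigendecomposition of the covariance matrix (cov centers internally)
[W, D] = eig(cov(X));
[varEig, order] = sort(diag(D), 'descend');  % eig does not promise descending order
W = W(:, order);

% Eigenvectors are only defined up to sign, so compare absolute values
maxCoeffDiff = max(max(abs(abs(V) - abs(W))));
maxVarDiff   = max(abs(varSvd - varEig));

fprintf('max coefficient difference: %g\n', maxCoeffDiff);
fprintf('max variance difference:    %g\n', maxVarDiff);
```

This only centers the data. Whether to also scale each column (for example, dividing by its standard deviation) depends on whether the variables share comparable units, and it is a separate choice from centering.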
