Principal Component Analysis Reconstructing Centred Data

BOB on 19 Apr 2019
Answered: Aditya on 27 Jun 2024
Hi, I have performed PCA on the following dataset:
```
DATASET =
   10.0000    6.0000
   11.0000    4.0000
    8.0000    5.0000
    3.0000    3.0000
    2.0000    2.8000
    1.0000    1.0000
```
As I understand it, the "score" output multiplied by the "coeff" output reconstructs the centred data. I assume that "centred data" means shifting the data so that it is centred on the origin, as described in this tutorial video: "https://www.youtube.com/watch?v=FgakZw6K1QQ". If so, why does my data, when centred manually, not equal the output of score*coeff? score*coeff returns:
```
>> score*coeff
ans =
    4.7231   -0.8092
    4.2295   -2.9901
    2.5422   -0.3156
   -2.5933    1.3053
   -3.4936    1.7842
   -5.4078    1.0253
```
But when I centre the data manually, taking the mean of each column minus every value in that column (for the first and second variables respectively), I get different values, even though this is presumably how you centre the data around the origin:
```
>> CentredVariable1 = mean(DATASET(:,1))-DATASET(:,1)
CentredVariable1 =
   -4.1667
   -5.1667
   -2.1667
    2.8333
    3.8333
    4.8333
>> CentredVariable2 = mean(DATASET(:,2))-DATASET(:,2)
CentredVariable2 =
   -2.3667
   -0.3667
   -1.3667
    0.6333
    0.8333
    2.6333
```

Answers (1)

Aditya on 27 Jun 2024
The discrepancy comes from two differences between your commands and what pca does:

1. Sign of the centering: mean(DATASET(:,1)) - DATASET(:,1) subtracts the data from the mean, which flips the sign of every centered value. PCA centers the data the other way round: it subtracts the mean of each column from every element in that column, i.e. DATASET - mean(DATASET).
2. Transpose of coeff: the scores are the centered data projected onto the principal components, so the centered data is recovered with score*coeff' (note the transpose), not score*coeff.
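The role of the transpose can be written out in a few lines. This assumes, as in this example with 6 observations and 2 variables, that coeff is a square orthogonal matrix. Here X is the n-by-p data, \bar{x} the row vector of column means, \mathbf{1} an n-by-1 column of ones, V = coeff and S = score:

```latex
\begin{aligned}
X_c &= X - \mathbf{1}\,\bar{x}, \qquad S = X_c V, \qquad V^\top V = V V^\top = I,\\
S V^\top &= X_c V V^\top = X_c,\\
S V &= X_c V^2 \neq X_c \quad \text{in general.}
\end{aligned}
```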
The following script demonstrates both points:
```matlab
% Original dataset
DATASET = [
    10.0000  6.0000;
    11.0000  4.0000;
     8.0000  5.0000;
     3.0000  3.0000;
     2.0000  2.8000;
     1.0000  1.0000
];
% Step 1: Calculate the mean of each column (1-by-2 row vector)
mean_data = mean(DATASET);
% Step 2: Center the data by subtracting the column means FROM the data
% (implicit expansion, R2016b or later; use bsxfun(@minus, ...) on older releases)
centered_data = DATASET - mean_data;
% Step 3: Perform PCA (pca centers the data internally by default)
[coeff, score, ~] = pca(DATASET);
% Step 4: Reconstruct the centered data (note the transpose on coeff)
reconstructed_centered_data = score * coeff';
% Display results
disp('Original Centered Data:');
disp(centered_data);
disp('Reconstructed Centered Data:');
disp(reconstructed_centered_data);
% Verify that the two agree up to floating-point round-off
% (a tolerance is more robust than comparing rounded values with isequal)
assert(max(abs(centered_data(:) - reconstructed_centered_data(:))) < 1e-10, ...
    'Centered data does not match reconstructed data.');
```
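The same outputs also give back the original, uncentered data. A minimal sketch, assuming the Statistics and Machine Learning Toolbox and MATLAB R2016b or later for implicit expansion: the sixth output of pca, mu, holds the column means that pca subtracted, so adding it back to score*coeff' undoes the centering. The variable name reconstructed_original is illustrative only.

```matlab
% Recover the ORIGINAL (uncentered) data from the pca outputs.
DATASET = [10 6; 11 4; 8 5; 3 3; 2 2.8; 1 1];

% mu (6th output) is the 1-by-p row vector of column means that pca removed
[coeff, score, ~, ~, ~, mu] = pca(DATASET);

% score*coeff' is the centered data; adding the means back undoes the centering
reconstructed_original = score * coeff' + mu;

% Largest deviation from the original data (should be round-off sized)
disp(max(abs(reconstructed_original(:) - DATASET(:))));
```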
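In short, the correct way to center the data is:
```matlab
centered_data = DATASET - mean(DATASET);
```
This gives each column a mean of zero. You do not need to do this yourself before calling pca, because pca centers the data internally by default (its 'Centered' option defaults to true); that is why score*coeff' returns the centered data rather than the original data.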
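To see both mistakes from the question side by side, the sketch below compares the question's centering and the product score*coeff with the correct versions (same toolbox and release assumptions as above; the variable names and the helper maxabs are illustrative). Because coeff is orthogonal, score*coeff keeps the length of each row of the centered data but rotates it to a different orientation, which is why its values look plausible yet do not match.

```matlab
% Compare the computations from the question with the correct ones.
DATASET = [10 6; 11 4; 8 5; 3 3; 2 2.8; 1 1];
[coeff, score] = pca(DATASET);

maxabs = @(A) max(abs(A(:)));                 % largest absolute entry

centered_data      = DATASET - mean(DATASET); % data minus mean (what pca uses)
question_centering = mean(DATASET) - DATASET; % mean minus data (as in the question)

% Mistake 1: the question's centering is exactly the negative of the correct one
sign_error = maxabs(question_centering + centered_data);

% Mistake 2: only score*coeff' reproduces the centered data
err_without_transpose = maxabs(score * coeff  - centered_data);
err_with_transpose    = maxabs(score * coeff' - centered_data);

% score*coeff has the same row lengths as the centered data (coeff is
% orthogonal), so it is a rotated copy of the data rather than the data itself
row_lengths = @(A) sqrt(sum(A.^2, 2));
row_length_diff = maxabs(row_lengths(score * coeff) - row_lengths(centered_data));

fprintf('mean - data vs. -(data - mean): %g\n', sign_error);
fprintf('score*coeff   vs. centered data: %g\n', err_without_transpose);
fprintf('score*coeff'' vs. centered data: %g\n', err_with_transpose);
fprintf('difference in row lengths:      %g\n', row_length_diff);
```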

Categories

Find more on Dimensionality Reduction and Feature Extraction in Help Center and File Exchange
