# Problem when performing PCA on a large dataset

Mariem Harmassi on 2 May 2013
Hello everybody, I am trying to perform PCA on a large dataset (1000*1290240). I heard about iterative PCA:

1. Calculate the mean values per column.
2. Calculate the covariance matrix:
   1. Calculate all cross-products.
   2. Save those cross-products in a variable.
   3. Repeat 1-2 until the end of the file.
   4. Divide by the number of rows minus 1 to get the covariance.

I tried the cross function, cross(A,B), but what should the second argument be? A is my subset of the data. Can someone help me solve this problem?
##### 1 Comment
Mariem Harmassi on 2 May 2013
Please, can someone help me?


### Answers (1)

Ilya on 2 May 2013
If you can load the dataset into memory, the rest should be easy. Here is how you compute coefficients and scores for wide data (many variables and just a few observations):
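```matlab
X = rand(100,1000000);          % 100 observations, 1000000 variables
X = bsxfun(@minus,X,mean(X));   % center each column
[U,D] = eig(X*X');              % eigendecomposition of the 100-by-100 matrix X*X'
rank(D)                         % ans = 99
invS = diag(D).^(-1/2);         % inverse square roots of the eigenvalues
[~,imax] = max(invS);           % largest inverse <-> smallest eigenvalue
invS(imax) = 0;                 % discard that direction
coeff = X'*U*diag(invS);        % principal component coefficients
score = X*coeff;                % principal component scores
```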
Note that for 100 observations you can have at most 99 principal components after centering X. This is why you need to set the inverse of the smallest eigenvalue to zero. You may need to set more inverse values to zero for your data.
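As a sanity check on the recipe above, here is a minimal sketch that compares it with `svd` on a much smaller matrix. The sizes (20-by-500) and the variable names `keep` and `agreement` are illustrative assumptions, not part of the original answer. The idea is that if `u` is an eigenvector of `X*X'` with eigenvalue `lambda > 0`, then `X'*u/sqrt(lambda)` is a unit-length eigenvector of `X'*X`, so the columns of `coeff` should match the right singular vectors of `X` up to sign.

```matlab
% Small illustrative sizes (assumed); the same steps apply to wide data.
X = rand(20,500);
X = bsxfun(@minus,X,mean(X));            % center each column

G = X*X';
G = (G + G')/2;                          % symmetrize to guard against round-off
[U,D] = eig(G);
[d,order] = sort(diag(D),'descend');     % largest variance first

% After centering, 20 observations give at most 19 components,
% so the last (near-zero) eigenvalue is dropped.
keep  = order(1:19);
coeff = X'*U(:,keep)*diag(d(1:19).^(-1/2));

% Reference: right singular vectors of the centered data.
[~,~,V] = svd(X,'econ');
agreement = abs(sum(coeff .* V(:,1:19)));  % close to 1 for each column if they match up to sign
```

Sorting the eigenvalues in descending order is a choice made here so that the first column of `coeff` is the direction of largest variance; `eig` on its own returns them in ascending order for a symmetric matrix.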
##### 2 Comments
Mariem Harmassi on 2 May 2013
I have 10000 observations and 1290240 variables, and I cannot load the whole matrix, so I have to process the dataset in subsets. How can I calculate the covariance matrix iteratively? One way consists of calculating the cross-products iteratively, but how can I do this?
Ilya on 2 May 2013
The covariance matrix, X'*X, would be 1290240-by-1290240 and most certainly not fit into memory. Instead of computing the covariance matrix, compute X*X', which is only 10000-by-10000, and then follow the prescription above. An element (I,J) of X*X' is determined by the dot product of rows I and J in X. Load two rows, take their dot product and store somewhere. Repeat for all pairs of rows. Before you do that, you need to center X (or decide if you need centering).
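The row-pair procedure described above can be sketched as follows, loading blocks of rows rather than single rows. This is only a sketch under stated assumptions: `loadRows` is a hypothetical stand-in for whatever code reads rows from your file, and the small sizes `n`, `p`, `blk` and `k` are chosen purely for illustration. It also relies on the identity that centering the columns of `X` is the same as double-centering `X*X'`: with `H = eye(n) - ones(n)/n`, the centered data is `H*X`, so its Gram matrix is `H*(X*X')*H`. That lets the centering happen after the pass over the file.

```matlab
% Small stand-in for the data on disk (assumed sizes, for illustration only).
n = 12; p = 50; blk = 4;            % blk = number of rows loaded at a time
X = rand(n,p);
loadRows = @(idx) X(idx,:);         % hypothetical loader; replace with your file reader

% Accumulate the uncentered matrix G = X*X' one pair of row blocks at a time.
G = zeros(n);
for i = 1:blk:n
    I  = i:min(i+blk-1,n);
    Xi = loadRows(I);
    for j = i:blk:n                 % upper triangle only; G is symmetric
        J   = j:min(j+blk-1,n);
        Gij = Xi * loadRows(J)';
        G(I,J) = Gij;
        G(J,I) = Gij';
    end
end

% Column-centering X is equivalent to double-centering G.
H  = eye(n) - ones(n)/n;
Gc = H*G*H;
Gc = (Gc + Gc')/2;                  % symmetrize to guard against round-off

% Leading k components from the eigenvectors of Gc.
k = 3;
[U,D] = eig(Gc);
[d,order] = sort(diag(D),'descend');
W = H * U(:,order(1:k)) * diag(d(1:k).^(-1/2));

% coeff = Xc'*U*diag(invS), accumulated block by block without forming Xc.
coeff = zeros(p,k);
for i = 1:blk:n
    I = i:min(i+blk-1,n);
    coeff = coeff + loadRows(I)' * W(I,:);
end
score = Gc * W;                     % equals Xc*coeff
```

Keeping only the leading `k` components matters at the sizes in this thread: a full `coeff` with thousands of columns and 1290240 rows would itself be too large to hold in memory.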
