Fast Principal Component Analysis for high dimensional data

version 1.0 (2.32 KB) by Dominik Blum
When analyzing very high-dimensional data, this implementation of Principal Component Analysis is much faster than MATLAB's pca.m.

6 Downloads

Updated 08 Aug 2019

[COEFF,SCORE,LATENT,EXPLAINED] = fastpca(data)

Fast principal component analysis for very high-dimensional data (e.g. voxel-level analysis of neuroimaging data), implemented according to C. Bishop's book "Pattern Recognition and Machine Learning", p. 570. For high-dimensional data, fastpca.m is substantially faster than MATLAB's built-in function pca.m.

Following MATLAB's PCA terminology, fastpca.m expects an input matrix in which each row represents an observation (e.g. subject) and each column a dimension (e.g. voxel). fastpca.m returns the principal component (PC) loadings (COEFF), the PC scores (SCORE), and the variance explained by each PC in absolute values (LATENT) and in percent (EXPLAINED). Additionally, fastpca returns the PC loadings of the small covariance matrix (COEFFs).

The decrease in computation time results from calculating the PCs from the (smaller) covariance matrix of the transposed input matrix "data" instead of the large DxD covariance matrix of the original input matrix. The PCs of the small matrix are then used to project the observations, which yields the PCs of the large DxD covariance matrix.
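The idea can be illustrated with a minimal sketch. This is not the source of fastpca.m; it is a base-MATLAB reconstruction of the small-matrix approach described in Bishop's book, under a few assumptions: each column is centered the way pca does by default (whether fastpca centers in exactly this way is not verified here), the variances are normalized by N-1, and all variable names (X, G, V, coeff, score, latent, explained) are chosen for illustration. The last lines cross-check the result against a direct SVD of the centered data.

```matlab
% Sketch of PCA via the small N-by-N matrix (assumption: not the fastpca.m source).
rng(0);                                 % reproducible random example
N = 20; D = 500;                        % few observations, many dimensions
data = rand(N, D);

X = data - mean(data, 1);               % center each column (implicit expansion, R2016b+)
G = (X * X') / (N - 1);                 % small N-by-N matrix instead of the D-by-D covariance
[V, L] = eig((G + G') / 2);             % symmetrize to guard against round-off
[latent, order] = sort(diag(L), 'descend');
V = V(:, order);

k = N - 1;                              % centering leaves at most N-1 nonzero variances
latent = latent(1:k);
V = V(:, 1:k);

% Map the small eigenvectors back to D-dimensional loadings of unit length.
coeff = (X' * V) ./ sqrt((N - 1) * latent)';   % D-by-k
score = X * coeff;                              % N-by-k PC scores
explained = 100 * latent / sum(latent);         % percent variance per PC

% Cross-check against the direct route (SVD of the centered data).
[~, S, W] = svd(X, 'econ');
latentRef = diag(S).^2 / (N - 1);
disp(max(abs(latent - latentRef(1:k))));        % difference in variances
disp(max(abs(abs(sum(coeff .* W(:, 1:k), 1)) - 1)));  % loadings agree up to sign
```

The eigendecomposition here works on an N-by-N matrix, so its cost depends on the number of observations rather than the number of dimensions; the only operations that touch D are the two matrix products. Loadings from different routes can differ in sign, which is why the check compares absolute inner products.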

By default, fastpca removes the mean of each observation. In this first implementation of fastpca, I skipped the calculation of Hotelling’s T-squared statistic because I have not needed it so far.

Example:
In medical image analysis, datasets often have a few to several hundred observations (subjects) and hundreds of thousands of dimensions (voxels). As an example, I compare MATLAB's pca and fastpca using a random matrix with 300 rows (e.g. subjects) and 500000 columns (e.g. voxels):

data = rand(300,500000);

tic; [COEFF,SCORE,LATENT,~,EXPLAINED] = pca(data); toc
>> Elapsed time is 37.295108 seconds.

tic; [COEFF,SCORE,LATENT,EXPLAINED] = fastpca(data); toc
>> Elapsed time is 4.853614 seconds.

Version 1.0 from 08/08/2019.
Implemented by Dominik Blum.
E-Mail: dominik.blum@med.uni-tuebingen.de
Homepage: https://www.medizin.uni-tuebingen.de/de/das-klinikum/mitarbeiter/profil/284?search=dominik20Blum&mode=popup

Cite As

Dominik Blum (2019). Fast Principal Component Analysis for high dimensional data (https://www.mathworks.com/matlabcentral/fileexchange/72396-fast-principal-component-analysis-for-high-dimensional-data), MATLAB Central File Exchange. Retrieved .

MATLAB Release Compatibility
Created with R2017a
Compatible with any release
Platform Compatibility
Windows macOS Linux