## Distinguish between 2 variables using PCA

### moose

on 9 Aug 2015
Latest activity: edited by Sagar on 10 Aug 2015

Hello, I am trying to understand the PCA function. I have 6 recordings of heart rate: four from person A and two from person B. Can PCA help me distinguish between the two persons? When I run `coeff = pca(signal_matrix);` (where `signal_matrix` is the matrix of my 6 recordings), what exactly does the `coeff` matrix tell me, and how should I interpret it?

### Sagar

on 9 Aug 2015

PCA can certainly give some insight into your problem. Run PCA on your data and look at the different principal components. In your case, I would expect the four variables (recordings) from A to dominate one principal component (presumably the first), representing the characteristics of A, and the remaining two variables from B to dominate another principal component (presumably the second), representing the characteristics of B. More concretely, in the coefficients of the first principal component, the first four values should be larger in magnitude than the remaining two. Similarly, in the second principal component, the last two coefficients should be larger in magnitude than the first four. Of course, this presumes that A and B are distinguishable.
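A minimal sketch of the check described above, assuming the Statistics and Machine Learning Toolbox (which provides `pca`) is installed. The data are synthetic placeholders, not real heart-rate recordings: the hypothetical signals `patternA` and `patternB` plus noise stand in for the recordings, with columns 1 to 4 playing "person A" and columns 5 and 6 playing "person B". `pca` treats columns as variables, so `coeff` is 6x6 and column k holds the loadings of principal component k. Because the sign of a component is arbitrary, the sketch ranks recordings by absolute loading.

```matlab
% Hypothetical stand-in for signal_matrix (NOT real heart-rate data):
% 3000 samples x 6 recordings, columns 1-4 from "person A", 5-6 from "person B".
rng(0);                                      % reproducible random numbers
t = (1:3000)';
patternA = sin(2*pi*t/300);                  % hypothetical pattern shared by A's recordings
patternB = sin(2*pi*t/170);                  % hypothetical pattern shared by B's recordings
signal_matrix = [repmat(patternA, 1, 4) + 0.1*randn(3000, 4), ...
                 repmat(patternB, 1, 2) + 0.1*randn(3000, 2)];

% pca treats rows as observations and columns as variables,
% so coeff is 6x6 and column k holds the loadings of principal component k.
coeff = pca(signal_matrix);

% The sign of a component is arbitrary, so rank recordings by |loading|.
for k = 1:2
    [~, order] = sort(abs(coeff(:, k)), 'descend');
    fprintf('PC%d: recordings ordered by |loading|: %s\n', k, mat2str(order'));
end
```

With this synthetic data the first component should be dominated by columns 1 to 4 and the second by columns 5 and 6. Real recordings need not separate this cleanly; replace the stand-in `signal_matrix` with your own 3000x6 matrix to see what your loadings look like.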

### moose

on 10 Aug 2015
Thank you, Sagar. Can you please be more explicit (sorry, I am a bit new to this)? My `coeff` matrix is 6x6 (I've added a picture). What exactly should I look at? My input matrix is 3000x6, where the first 2 columns are recordings from person A and the last 4 columns are from person B.

### Sagar

on 10 Aug 2015
In your first principal component, the second element has the highest weight (0.9975), so this component mainly represents the characteristics of the second recording of A. Similarly, in the second principal component, the first value has the largest weight (0.9966), so it mainly represents the characteristics of the first recording of A. Look at the highest values in the other columns in the same way. Most importantly, though, look at the percentage of variance explained, using the fuller call `[coeff,score,latent] = pca(___)`, where `latent` holds the variance of each principal component. The first value in `latent` divided by the sum of all the values in `latent` (times 100) gives the percentage of variance explained by the first principal component. From those values you can tell which components are important and which you can choose to drop. For further understanding, read this post: https://onlinecourses.science.psu.edu/stat505/node/54
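A short sketch of the variance-explained calculation, again assuming the Statistics and Machine Learning Toolbox. The input is a hypothetical placeholder (random columns with deliberately unequal spreads), not real recordings. It computes the percentages from `latent` as described, compares them with `pca`'s fifth output `explained` (which already reports percent variance explained), and uses cumulative percentages to pick how many components to keep. The 95% threshold is an arbitrary example, not a rule.

```matlab
% Hypothetical stand-in data (NOT real recordings): 3000 samples x 6 columns
% with deliberately unequal spreads, so the components differ in variance.
rng(1);
signal_matrix = randn(3000, 6) * diag([5 4 1 1 0.5 0.5]);

% latent: variance of each principal component, in decreasing order.
% explained (5th output): the same information expressed in percent.
[~, ~, latent, ~, explained] = pca(signal_matrix);

% Each latent value divided by the sum of all of them, as a percentage.
pct = 100 * latent / sum(latent);
fprintf('PC%d explains %.1f%% of the variance\n', [1:numel(pct); pct']);
fprintf('Largest difference from "explained": %g\n', max(abs(pct - explained)));

% Cumulative percentages help decide how many components to keep.
% The 95% threshold is an arbitrary example threshold.
cumPct = cumsum(pct);
nKeep = find(cumPct >= 95, 1);
fprintf('Components needed to reach 95%%: %d\n', nKeep);
```

On your own `signal_matrix`, components whose percentage is very small carry little of the total variance and are the usual candidates to drop.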