getting a NaN in correlation coefficient

Question

Sumera Yamin on 19 Feb 2020

https://fr.mathworks.com/matlabcentral/answers/506464-getting-a-nan-in-correlation-coefficient

Commented: Adam Danz on 18 Mar 2021

Hi, I have a simple problem that unfortunately I am unable to understand.

I have matrices and I am trying to calculate the correlation coefficient between two variables. A simple example from my code is attached. Why am I getting a NaN here, and what does it imply?

x=[-7.501899598769999514e-04;-6.501899598769999514e-04;-5.501899598769999514e-04];
y=[-0.414;-0.414;-0.414];
c11=corr2(x,y)

Answer 1 (Accepted Answer)

Adam Danz on 19 Feb 2020

https://fr.mathworks.com/matlabcentral/answers/506464-getting-a-nan-in-correlation-coefficient#answer_416400

Edited: Adam Danz on 27 Aug 2020

When NaNs appear in the output but are not present in the inputs

Notice that all of the values in y are identical: y=[-0.414; -0.414; -0.414];

If you look at the equations for corr2() or Pearson's corr(), you'll notice that both have a term in the denominator that subtracts the mean of y from each y-value. When every value of y is identical, that term is a vector of 0s, so the denominator is 0. The numerator is built from the same deviations from the mean of y, so it is 0 as well, and 0/0 evaluates to NaN.

Put another way, neither x nor y can have a standard deviation of 0, and a vector of identical values always has a std of 0.
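To see the 0/0 directly, the Pearson formula can be evaluated by hand in base MATLAB using the x and y from the question. This is a minimal sketch written from the textbook formula, not the internal code of corr() or corr2(); the intermediate names (dx, dy, num, den) are my own.

```matlab
% x and y from the question; every value of y is identical
x = [-7.501899598769999514e-04; -6.501899598769999514e-04; -5.501899598769999514e-04];
y = [-0.414; -0.414; -0.414];

dx = x - mean(x);                      % deviations of x from its mean
dy = y - mean(y);                      % all zeros here, because y is constant

num = sum(dx .* dy);                   % numerator: sum of products, 0 here
den = sqrt(sum(dx.^2) * sum(dy.^2));   % denominator: 0, because sum(dy.^2) is 0
r = num / den;                         % 0/0 evaluates to NaN

fprintf('std(y) = %g, numerator = %g, denominator = %g, r = %g\n', std(y), num, den, r);
```

Both the numerator and the denominator vanish, so the NaN comes from the arithmetic itself rather than from any missing data.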

In this case, the NaN means the correlation coefficient is undefined, which can be interpreted as no correlation between the two variables. Correlation describes how much one variable changes as the other variable changes, and that requires both variables to change.

NaN values in the inputs spreading to the outputs

For r=corr2(x,y):

When there are one or more NaN values in the inputs to corr2(x,y), the output will be NaN. Fill in the missing data before computing the 2D correlation coefficient.
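As a rough illustration, the sketch below uses a hypothetical anonymous function, r2d, written from the formula given in the corr2 documentation (it is not the toolbox source), to show that one NaN turns the whole result into NaN, and then fills the gap before recomputing. Replacing NaNs with the mean of the remaining values is only an assumed example strategy; in MATLAB R2016b and later, fillmissing offers other fill methods.

```matlab
% Hypothetical stand-in for corr2: every element is treated as one observation
r2d = @(A, B) sum((A(:) - mean(A(:))) .* (B(:) - mean(B(:)))) / ...
              sqrt(sum((A(:) - mean(A(:))).^2) * sum((B(:) - mean(B(:))).^2));

A = [1 2; 3 4];
B = [2 1; 4 NaN];                  % one missing value

rWithNaN = r2d(A, B);              % NaN: mean(B(:)) is NaN, so every term is NaN

% Assumed fill strategy: replace each NaN with the mean of the remaining values
Bfilled = B;
Bfilled(isnan(Bfilled)) = mean(Bfilled(~isnan(Bfilled)));
rFilled = r2d(A, Bfilled);

fprintf('with NaN: %g, after filling: %.4f\n', rWithNaN, rFilled);
```

Which fill method is appropriate depends on why the values are missing; the point here is only that the NaN must be dealt with before the coefficient is computed.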

For r=corr(x):

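A single NaN value in position (i,j) of input matrix x will result in a full row of NaN values in row j and a full column of NaN values in column j of the output matrix r (see explanation). That is because r(a,b) is the correlation between columns a and b of x, so every coefficient that involves column j is NaN. Here the NaN is in column 2, so row 2 and column 2 of r are NaN.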

x = [
     6     5     1
     3   NaN     9
     5     3     7
     9     5     5 ];
 
 r = corr(x)
            1          NaN     -0.52699
          NaN          NaN          NaN
     -0.52699          NaN            1
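To see where the NaNs come from without the Statistics and Machine Learning Toolbox, the sketch below builds the same matrix one column pair at a time with a hand-written Pearson helper (pearson is my own anonymous function, not a MATLAB function). It also runs a hypothetical variant, xb, with the NaN moved to row 1 of column 3, to show that the NaNs follow the column of the input, not its row.

```matlab
% Hand-written Pearson correlation of two column vectors (not a built-in)
pearson = @(a, b) sum((a - mean(a)) .* (b - mean(b))) / ...
                  sqrt(sum((a - mean(a)).^2) * sum((b - mean(b)).^2));

x  = [6 5 1; 3 NaN 9; 5 3 7; 9 5 5];    % the example above
xb = [6 5 NaN; 3 4 9; 5 3 7; 9 5 5];    % hypothetical variant: NaN at (1,3)

mats = {x, xb};
results = cell(size(mats));
for k = 1:numel(mats)
    X = mats{k};
    p = size(X, 2);
    r = zeros(p);
    for a = 1:p
        for b = 1:p
            % r(a,b) uses only columns a and b, so a NaN in column j
            % spoils every entry in row j and column j
            r(a, b) = pearson(X(:, a), X(:, b));
        end
    end
    results{k} = r;
end

disp(results{1})   % NaN in row 2 and column 2
disp(results{2})   % NaN in row 3 and column 3
```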

For r=corr(x,y):

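A single NaN value in position (i,j) of y will result in a column of NaN values in column j of the output matrix r, because r(a,b) is the correlation between column a of x and column b of y. By the same logic, a NaN in column j of x would produce a row of NaN values in row j of r.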

x = [
     9     5     1
     1     4     4
     2     6     4
     2     5     9 ];
y = [
     6     5     1
     3   NaN     9   
     5     3     7
     9     5     5 ];
 
 r = corr(x,y)
       0.1623          NaN     -0.92394
       0.3266          NaN     -0.23905
      0.62312          NaN      0.32367
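The same kind of hand-written helper (pearson, an anonymous function of my own, not part of MATLAB) can rebuild corr(x,y) entry by entry, which makes the row/column rule easy to check: the NaN in column 2 of y spoils column 2 of the result, and swapping the inputs moves it to row 2. This is a sketch for illustration, not how corr() is implemented.

```matlab
% Hand-written Pearson correlation of two column vectors (not a built-in)
pearson = @(a, b) sum((a - mean(a)) .* (b - mean(b))) / ...
                  sqrt(sum((a - mean(a)).^2) * sum((b - mean(b)).^2));

x = [9 5 1; 1 4 4; 2 6 4; 2 5 9];
y = [6 5 1; 3 NaN 9; 5 3 7; 9 5 5];

rXY = zeros(size(x, 2), size(y, 2));   % plays the role of corr(x, y)
rYX = zeros(size(y, 2), size(x, 2));   % plays the role of corr(y, x)
for a = 1:size(x, 2)
    for b = 1:size(y, 2)
        rXY(a, b) = pearson(x(:, a), y(:, b));
        rYX(b, a) = pearson(y(:, b), x(:, a));
    end
end

disp(rXY)   % column 2 is NaN
disp(rYX)   % with the inputs swapped, row 2 is NaN
```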

Ignoring NaNs in corr() inputs

The 'rows' option in corr() can be set to 'complete' or 'pairwise'. Both ignore NaN values, but they do so in different ways.

'rows','complete' removes a row from both inputs if that row contains a NaN in either x or y. In other words, it will remove row 2 from both the x and y input matrices. Using the same inputs as above,

r = corr(x,y,'rows','complete')
     -0.27735          0.5     -0.94491
     -0.69338           -1      0.75593
      0.81224      0.14286      0.53995
      
      
r2 = corr(x,y) % for comparison
       0.1623          NaN     -0.92394
       0.3266          NaN     -0.23905
      0.62312          NaN      0.32367

Notice that this changes all of the correlation values, since row #2 was removed from both inputs x and y. To confirm that, we can remove that row ourselves and recompute the correlation matrix.

% Remove row 2 which contains a NaN in y
r3 = corr(x([1,3,4],:), y([1,3,4],:))
     -0.27735          0.5     -0.94491
     -0.69338           -1      0.75593
      0.81224      0.14286      0.53995

Voila! Outputs r and r3 match.
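The 'complete' behavior described above can be mimicked in base MATLAB by dropping every row that has a NaN in either input before correlating. This is a sketch of the idea using a hypothetical hand-written pearson helper, not the corr() source.

```matlab
% Hand-written Pearson correlation of two column vectors (not a built-in)
pearson = @(a, b) sum((a - mean(a)) .* (b - mean(b))) / ...
                  sqrt(sum((a - mean(a)).^2) * sum((b - mean(b)).^2));

x = [9 5 1; 1 4 4; 2 6 4; 2 5 9];
y = [6 5 1; 3 NaN 9; 5 3 7; 9 5 5];

% Keep only the rows with no NaN in either input (here: drops row 2)
keep = ~any(isnan([x y]), 2);
xc = x(keep, :);
yc = y(keep, :);

r = zeros(size(x, 2), size(y, 2));
for a = 1:size(x, 2)
    for b = 1:size(y, 2)
        r(a, b) = pearson(xc(:, a), yc(:, b));   % every pair uses the same 3 rows
    end
end
disp(r)
```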

'rows','pairwise' removes a row only if a NaN appears in one of the two columns being paired. For the same x and y inputs as above, the correlations between the columns of x and the 2nd column of y will omit row 2 and be based on the remaining 3 values. All other column-paired correlations will use all 4 rows of values.

r = corr(x,y,'rows','pairwise')
       0.1623          0.5     -0.92394
       0.3266           -1     -0.23905
      0.62312      0.14286      0.32367
      
      
r2 = corr(x,y) % for comparison
       0.1623          NaN     -0.92394
       0.3266          NaN     -0.23905
      0.62312          NaN      0.32367

Notice that values in columns 1 and 3 haven't changed since they do not involve column #2 in y. To confirm the correlation values in column 2 of r,

% Remove row 2 which contains a NaN in y
r3 = corr(x([1,3,4],:) ,y([1,3,4],:)); 
% Replace NaN column in r2 with new r values
r2(:,2) = r3(:,2)
       0.1623          0.5     -0.92394
       0.3266           -1     -0.23905
      0.62312      0.14286      0.32367

Voila! Updated output r2 matches r.
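Likewise, the 'pairwise' behavior can be sketched by choosing the usable rows separately for each column pair. pearson is again a hypothetical hand-written helper, and used is an extra matrix added here only to count the rows behind each coefficient.

```matlab
% Hand-written Pearson correlation of two column vectors (not a built-in)
pearson = @(a, b) sum((a - mean(a)) .* (b - mean(b))) / ...
                  sqrt(sum((a - mean(a)).^2) * sum((b - mean(b)).^2));

x = [9 5 1; 1 4 4; 2 6 4; 2 5 9];
y = [6 5 1; 3 NaN 9; 5 3 7; 9 5 5];

r = zeros(size(x, 2), size(y, 2));
used = zeros(size(r));                            % rows used for each pair
for a = 1:size(x, 2)
    for b = 1:size(y, 2)
        ok = ~isnan(x(:, a)) & ~isnan(y(:, b));   % drop a row only if this pair has a NaN
        r(a, b) = pearson(x(ok, a), y(ok, b));
        used(a, b) = sum(ok);
    end
end
disp(r)
disp(used)   % 3 rows for column 2 of y, 4 rows everywhere else
```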

Sumera Yamin on 19 Feb 2020

I am trying to find the correlation between a large set of variables and then represent the dependence using a correlation coefficient matrix. One strange thing I observed in my study is that if the values of a variable do not change, it gives a NaN, which effectively means no correlation. However, if there is a very small difference, the correlation coefficient can be near 1, depicting a strong correlation, although I would expect a weak correlation with values near zero, not near one. I compare four cases in the example below.

x=[-7.501899598769999514e-04;-6.501899598769999514e-04;-5.501899598769999514e-04];
y=[-0.414;-0.414;-0.414];
z=[0.571;0.57;0.571];
t=[4.54;4.59;4.55];
s=[4.39;4.40;4.42];
cy=corr2(x,y)  %no change
cz=corr2(x,z)  % very small but symmetric change w.r.t central value. central value is design value for all cases
ct=corr2(x,t)  % significant change
cs=corr2(x,s)  % small change
% However, the correlation coefficients are NaN, 0, 0.189 and 0.982, respectively.
% From the data, I would expect them to be 0, near 0, close to 1 and between 0 and 0.5, respectively.
% Any idea why there is this apparent discrepancy between what I expect from looking
% at the data and what I get from the computation?

Adam Danz on 19 Feb 2020

Edited: Adam Danz on 20 Feb 2020

Why are you using corr2() with column vectors? Use corr() instead.

[rho,pval] = corr(X,Y) returns the p-value for each rho showing the significance of the correlation.

I'm not surprised by these results. Keep in mind you only have 3 data points for each comparison.

x=[-7.501899598769999514e-04;-6.501899598769999514e-04;-5.501899598769999514e-04];
y=[-0.414;-0.414;-0.414];
z=[0.571;0.57;0.571];
t=[4.54;4.59;4.55];
s=[4.39;4.40;4.42];
[cy, py]=corr(x,y)  %no change
[cz, pz]=corr(x,z)  % very small but symmetric change w.r.t central value. central value is design value for all cases
[ct, pt]=corr(x,t)  % significant change
[cs, ps]=corr(x,s)  % small change
tiledlayout(2,2)
nexttile
plot(x,y,'ro')
lsline
title(sprintf('x y r=%.3f p=%.3f', cy, py))
nexttile
plot(x,z,'ro')
lsline
title(sprintf('x z r=%.3f p=%.3f', cz, pz))
nexttile
plot(x,t,'ro')
lsline
title(sprintf('x t r=%.3f p=%.3f', ct, pt))
nexttile
plot(x,s,'ro')
lsline
title(sprintf('x s r=%.3f p=%.3f', cs, ps))
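One way to see why a tiny spread can still give r near 1: Pearson's r only measures how well the points line up on a straight line, and it does not change when a variable is multiplied by a positive constant or shifted by an offset. The sketch below recomputes the four coefficients with a hand-written pearson helper (my own, not a built-in) and, as an assumption about what corr() reports by default, a two-tailed p-value from a Student's t test with n-2 degrees of freedom, written with base MATLAB's betainc. With n = 3 there is only 1 degree of freedom, so even r = 0.982 is not significant at the 0.05 level.

```matlab
% Hand-written Pearson correlation of two column vectors (not a built-in)
pearson = @(a, b) sum((a - mean(a)) .* (b - mean(b))) / ...
                  sqrt(sum((a - mean(a)).^2) * sum((b - mean(b)).^2));
% Assumed two-tailed p-value: t test with n-2 degrees of freedom, expressed as a
% regularized incomplete beta function (nu/(nu+t^2) simplifies to 1-r^2)
pTwoTailed = @(r, n) betainc(1 - r^2, (n - 2) / 2, 0.5);

x = [-7.501899598769999514e-04; -6.501899598769999514e-04; -5.501899598769999514e-04];
others = {[-0.414; -0.414; -0.414], [0.571; 0.57; 0.571], ...
          [4.54; 4.59; 4.55], [4.39; 4.40; 4.42]};   % y, z, t, s
names = {'y', 'z', 't', 's'};
n = numel(x);

rho = zeros(1, 4);
pval = NaN(1, 4);                 % stays NaN when r itself is NaN
for k = 1:4
    rho(k) = pearson(x, others{k});
    if ~isnan(rho(k))
        pval(k) = pTwoTailed(rho(k), n);
    end
    fprintf('x vs %s: r = %.3f, p = %.3f\n', names{k}, rho(k), pval(k));
end

% Rescaling and shifting s (positive slope) leaves r unchanged
sBig = 1000 * others{4} - 4000;
fprintf('x vs 1000*s - 4000: r = %.3f\n', pearson(x, sBig));
```

The size of the changes in s is irrelevant to r; what matters is that the three points lie close to a line, and with only three points that can easily happen by chance, which the large p-value reflects.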

Adam Danz on 24 Feb 2020

No. The first output is the correlation coefficient and the 2nd output is the p-value indicating the statistical significance of the correlation.

Check out the descriptions of the outputs here

https://www.mathworks.com/help/stats/corr.html#d118e240043

Sumera Yamin on 28 Aug 2020

Many thanks for this excellent explanation. I understand it very well now.

Answer 2

Jan Pokorny on 18 Mar 2021

https://fr.mathworks.com/matlabcentral/answers/506464-getting-a-nan-in-correlation-coefficient#answer_651537

Hi, I tried this function as shown below, but it returns all NaN. I don't know why; any ideas?

I eventually solved this by computing cov() and then corrcov(), but it shouldn't have to work that way.

Data=[3 0 2 0 3 11 7 6 6 25 31 22 25 48 109 85 67 110 205 124 158 114 126 185 291 259 373 262 160 184 307 281 269 332 282 115 235 195 295];
for Zp=0:21
    X=Data(1,1+Zp:39);
    Y=Data(1,1:39-Zp);
    RMat=corr(X,Y);
    Koef(1+Zp)=RMat(1,2);   
end
plot(0:Zp,Koef,'red*-')

Adam Danz on 18 Mar 2021

From the documentation,

rho = corr(X,Y) returns a matrix of the pairwise correlation coefficient 
between each pair of columns in the input matrices X and Y.

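Note the word columns. You're passing row vectors, so each column of X and Y holds a single observation, and a correlation computed from one observation is always NaN.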

Instead,

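RMat=corr(X(:),Y(:));   % column vectors, so RMat is a single coefficient
Koef(1+Zp)=RMat;        % index RMat directly; RMat(1,2) would now error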
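For reference, a complete version of the lagged-correlation loop from the question might look like the sketch below. It swaps in corrcoef (base MATLAB, no toolbox needed) for corr; given a two-column matrix, corrcoef returns a 2-by-2 matrix whose off-diagonal entry is the coefficient. n and maxLag are names introduced here for readability.

```matlab
Data = [3 0 2 0 3 11 7 6 6 25 31 22 25 48 109 85 67 110 205 124 158 114 126 185 ...
        291 259 373 262 160 184 307 281 269 332 282 115 235 195 295];
n = numel(Data);                    % 39 samples
maxLag = 21;
Koef = zeros(1, maxLag + 1);
for Zp = 0:maxLag
    X = Data(1+Zp:n);               % series shifted by Zp samples
    Y = Data(1:n-Zp);               % unshifted series, same length
    R = corrcoef([X(:), Y(:)]);     % columns = variables, rows = observations
    Koef(Zp + 1) = R(1, 2);         % off-diagonal entry is the correlation
end
disp(Koef)
```

plot(0:maxLag, Koef, 'r*-') then draws the same lag plot as in the question.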