Why does my mex-function for linear regression return r-squared values above 1?

In a mex-file I have a function called ols() that is supposed to calculate a linear regression for two vectors, 'x' and 'y'. Unfortunately, every now and then ols() returns r-squared values that are higher than 1, so at some point there must be an error in my calculations...but I can't figure out where. Any help would be be greatly appreciated.
std::vector<float> ols(const std::vector<float>& x, const std::vector<float>& y){
if(x.size() != y.size()){
mexErrMsgTxt("Error: The length of vector 'x' does not equal the length of vector 'y'.");
}
float sumx = 0.0;
float sumx2 = 0.0;
float sumxy = 0.0;
float sumy = 0.0;
float sumy2 = 0.0;
float n = x.size();
for (int i=0;i<n;i++){
sumx += x[i];
sumx2 += pow(x[i], 2);
sumxy += x[i] * y[i];
sumy += y[i];
sumy2 += pow(y[i], 2);
}
float denominator = (n * sumx2 - pow(sumx, 2));
float slope = (n * sumxy - sumx * sumy) / denominator;
float intercept = (sumy * sumx2 - sumx * sumxy) / denominator;
float r = (sumxy - sumx * sumy / n) /
sqrt((sumx2 - pow(sumx, 2)/n) *
(sumy2 - pow(sumy, 2)/n));
float rsquared = pow(r, 2);
std::vector<float> stats;
stats.push_back(slope);
stats.push_back(intercept);
stats.push_back(rsquared);
return stats;
}

 Réponse acceptée

James Tursa
James Tursa le 2 Oct 2018
Modifié(e) : James Tursa le 2 Oct 2018
I haven't run your code, but just looking at it you are using the worst algorithm numerically with single precision numbers, which is not a good combination. At the very least you should be using a better algorithm, and maybe doing the calculations in double precision. E.g., see this link:

5 commentaires

The reason for why I use floats instead of doubles is that I am dealing with very large 2D matrices (which are being passed to the mex Routine). If these matrices were stored as doubles I would run out of memory constantly.
James Tursa
James Tursa le 2 Oct 2018
Modifié(e) : James Tursa le 2 Oct 2018
So you've got good reasons to use single precision for your data, but you should still use a better algorithm and use double precision variables to do the accumulation etc calculations.
I will do that. Thanks for your suggestions.
James Tursa
James Tursa le 2 Oct 2018
Modifié(e) : James Tursa le 2 Oct 2018
As a start, maybe try the two pass algorithm since it is much better than the one pass algorithm you are currently using.
Thanks - I will give the two pass algorithm a go.

Connectez-vous pour commenter.

Plus de réponses (0)

Catégories

En savoir plus sur Descriptive Statistics dans Centre d'aide et File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by