How to perform OLS regression using combinations of independent variables?
Hi!
I have been struggling for a while with the following problem.
Suppose we have y as a dependent variable and x1,...,xn as exogenous variables (n>7).
What I want to do is find which combination of exogenous variables gives the best fit for y.
So if we have, for example, 3 exogenous variables, I would like to see which of the following regressions fits y best (assuming I already know which statistic I will use to tell a "good" model from a "bad" one):
y~x1 ;
y~x2 ;
y~x3 ;
y~x1+x2 ;
y~x1+x3 ;
y~x2+x3 ;
y~x1+x2+x3
For only 3 variables, it is not that complicated (2^3-1 = 7 possibilities). The problem appears when I introduce more and more exogenous variables (2^7-1 = 127 possibilities for 7 variables). How can I do this automatically for all combinations when the number of exogenous variables is large (>7)?
Thanks for your help!
Cheers!
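For reference, the brute-force enumeration described in this question could be sketched roughly as below. This is only an illustration on synthetic data: the names X, y, bestVars and bestScore are made up for the example, and adjusted R^2 is an assumed stand-in for whatever selection statistic is actually intended.

```matlab
% Exhaustive (all-subsets) OLS search: a sketch, not a tuned implementation.
% Assumptions: X is N-by-n (one column per exogenous variable), y is N-by-1,
% and adjusted R^2 stands in for the selection statistic of your choice.
rng(0);                                  % reproducible synthetic data
N = 200;  n = 5;
X = randn(N, n);
y = 1 + 2*X(:,1) - 3*X(:,3) + 0.1*randn(N, 1);   % only x1 and x3 matter

bestScore = -Inf;
bestVars  = [];
sst = sum((y - mean(y)).^2);             % total sum of squares

for mask = 1:(2^n - 1)                   % every non-empty subset, coded as bits
    vars = find(bitget(mask, 1:n));      % indices of the variables in this subset
    A    = [ones(N, 1), X(:, vars)];     % design matrix with intercept column
    beta = A \ y;                        % least-squares fit
    sse  = sum((y - A*beta).^2);         % residual sum of squares
    p    = numel(vars);
    adjR2 = 1 - (sse/(N - p - 1)) / (sst/(N - 1));
    if adjR2 > bestScore                 % keep the best subset seen so far
        bestScore = adjR2;
        bestVars  = vars;
    end
end

fprintf('Best subset: %s (adjusted R^2 = %.4f)\n', mat2str(bestVars), bestScore);
```

The loop performs 2^n - 1 fits, so the cost doubles with every added variable; at n = 20 that is already about a million regressions.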
Answers (3)
Image Analyst
on 29 Nov 2014
Why not just use all of them and let the regression figure out how to weight the different xn?
y = alpha0 + alpha1 * x1 + alpha2 * x2 + alpha3 * x3
You can't use polyfit() but you can use the standard least squares formula
alpha = inv(x' * x) * x' * y; % Get estimate of the alphas.
where x is an N-by-4 matrix:
1, x1(1), x2(1), x3(1)
1, x1(2), x2(2), x3(2)
1, x1(3), x2(3), x3(3)
1, x1(4), x2(4), x3(4)
...
1, x1(N), x2(N), x3(N)
If one of the xn is not a good predictor, it should have a small alpha weight (provided the xn are on comparable scales).
1 comment
Matt J
on 29 Nov 2014
Edited: Matt J on 29 Nov 2014
"You can't use polyfit() but you can use the standard least squares formula"
No, don't do that. Just do
alpha=x\y;
for better conditioning. However, I assume that the OP's case is really more complicated, and that the x matrix does not have full column rank.
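A minimal sketch of that suggestion, on synthetic data, might look like the following. The variable names and the rank check are assumptions added for illustration; the normal-equations line is there only for comparison.

```matlab
% Sketch: build the design matrix and solve with backslash.
% Assumption: x1, x2, x3 are N-by-1 column vectors; the data here are synthetic.
rng(1);
N  = 100;
x1 = randn(N, 1);  x2 = randn(N, 1);  x3 = randn(N, 1);
y  = 0.5 + 1.5*x1 - 2*x2 + 0.7*x3 + 0.05*randn(N, 1);

x = [ones(N, 1), x1, x2, x3];            % N-by-4, first column is the intercept

if rank(x) < size(x, 2)                  % check for linearly dependent columns
    warning('x is rank deficient; the coefficients are not unique.');
end

alpha       = x \ y;                     % least-squares solution via QR
alphaNormal = (x' * x) \ (x' * y);       % normal equations, for comparison only

disp([alpha, alphaNormal]);              % the two agree when x is well conditioned
```

Forming x'*x squares the condition number of the problem, which is why the two columns can drift apart when the regressors are nearly collinear.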
Star Strider
on 29 Nov 2014
You are describing a stepwise multiple linear regression. It is a well-known, established technique, and the statistical procedure for adding and removing variables to get the best fit is not trivial.
If you have the Statistics Toolbox, see the documentation for Stepwise Regression and specifically stepwiselm, stepwise, and stepwisefit.
With 127 possible combinations of variables, and especially if you have a large data set, it is going to take some time. Have something else to do for a few minutes while the regression runs.
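As a rough illustration of the stepwisefit route, assuming the Statistics Toolbox is installed, something like the following could be tried. The data are synthetic and the variable names are made up for the example; the entry and removal thresholds are left at their defaults.

```matlab
% Sketch of stepwise selection; requires the Statistics Toolbox.
% Assumption: X is N-by-n with no column of ones (stepwisefit adds the
% constant term itself).
rng(2);
N = 200;  n = 7;
X = randn(N, n);
y = 1 + 2*X(:,2) - 1.5*X(:,5) + 0.2*randn(N, 1);  % only x2 and x5 matter

[b, se, pval, inmodel, stats] = stepwisefit(X, y, 'Display', 'off');

selected  = find(inmodel);               % indices of the variables kept in the model
intercept = stats.intercept;             % the intercept is reported separately from b
disp(selected);
disp([intercept; b(inmodel)]);           % coefficients of the final model
```

Note that b has one entry per column of X, including columns left out of the final model, so it should be indexed with inmodel before use.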
Matt J
on 29 Nov 2014
Edited: Matt J on 29 Nov 2014
As Image Analyst says, performing an OLS regression with the entire data set should give you the unique best regression in one step, unless your x1,...,xn are over-complete (i.e. linearly dependent).
If they are over-complete, and you are looking for the sparsest solution, the Matching Pursuit algorithm seems to be the standard alternative to an exhaustive search. There are several implementations on the File Exchange, but I've never used any of them:
Also, the solution is not guaranteed to be globally sparsest - the price paid for not doing an exhaustive search, it seems.
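To give a feel for the greedy idea, here is a small sketch of orthogonal matching pursuit, one common variant of matching pursuit. It is not taken from any File Exchange submission; the data are synthetic, and the names X, y, k and support are assumptions made for the example.

```matlab
% Sketch of orthogonal matching pursuit (a common variant of matching pursuit).
% Assumptions: the columns of X are the candidate regressors, k is the number
% of terms to keep, and there is no intercept (center y and X first otherwise).
rng(3);
N = 100;  n = 20;  k = 2;
X = randn(N, n);
X = bsxfun(@rdivide, X, sqrt(sum(X.^2, 1)));   % scale columns to unit norm
y = 3*X(:,4) - 2*X(:,11) + 0.01*randn(N, 1);   % only columns 4 and 11 matter

support  = [];                           % indices selected so far
residual = y;
for iter = 1:k
    corrs = abs(X' * residual);          % correlation of each column with the residual
    corrs(support) = 0;                  % never pick the same column twice
    [~, j] = max(corrs);                 % greedy choice: best-matching column
    support = [support, j];              %#ok<AGROW>
    coef = X(:, support) \ y;            % refit on all columns selected so far
    residual = y - X(:, support) * coef; % update the residual
end

disp(sort(support));
disp(coef');
```

Each pass costs one matrix-vector product and one small least-squares solve, so k terms need only k fits instead of one fit per subset, at the price of the greedy choice possibly missing the sparsest solution.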