Defining preferences in non-linear least squares solving?
I'm trying to approximate the solution to a system of linear equations C*x=d. If I use commands like lsqlin or lsqnonneg, can I stipulate a preference for some variables? I.e., can I indicate that it is more important to me to accurately approximate some entries of x than others? Thanks!
0 comments
Answers (2)
Star Strider
on 5 Nov 2016
Edited: Star Strider
on 5 Nov 2016
The lsqlin function will allow you to constrain them.
If you want to weight them, you would likely have to write that yourself, or use the nlinfit function, which accepts a weight vector and lets you supply your own objective function. (It is intended for nonlinear parameter estimation, so it is overkill for a linear problem.)
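For example, a minimal sketch (my illustration, untested, assuming C, d, and a weight vector wv defined as in the code below):
modelfun = @(b,X) X*b; % linear model written as an nlinfit model function
b0 = zeros(size(C,2),1); % initial guess; any start works for a linear model
b = nlinfit(C, d, modelfun, b0, 'Weights', wv); % weighted fit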
Otherwise, you have to do it yourself:
C = ...; % Independent Variables
d = ...; % Dependent Variable Vector
wv = ...; % Weight Vector
W = diag(wv); % Weight Matrix
B = (C'*W*C)\C'*W*d; % Estimate Parameters
This is obviously UNTESTED CODE. The reference I used for it is the section on Weighted Least Squares in the Curve Fitting Toolbox documentation (a toolbox I do not have).
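A simpler alternative, using the same C, d, and wv: the base MATLAB function lscov accepts an observation weight vector directly, so the weighted solve becomes a one-liner:
B = lscov(C, d, wv); % weighted least squares estimate via lscov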
---------------------------------------------------------------------
EDIT — (16:30 UTC 05 November 2016)
One possibility that just now occurred to me is to use a stepwise regression. This will tell you which variables in your equation are most important to explaining the relation between your independent and dependent variables. See the documentation on stepwisefit and its friends for more information.
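For example, a quick sketch with made-up data (X and y here are hypothetical; stepwisefit is in the Statistics and Machine Learning Toolbox):
X = randn(100,3); % hypothetical candidate predictors
y = 2*X(:,1) + 0.1*randn(100,1); % response driven mainly by the first column
[b, se, pval, inmodel] = stepwisefit(X, y); % inmodel flags the terms retained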
3 comments
Star Strider
on 5 Nov 2016
Thank you, John.
I agree with your analysis and comments.
I was looking for a function that did this automatically for linear fits, similar to what nlinfit does (although it’s relatively easy to do with the cost function for fminsearch as well). The only discussion I could find in the documentation was the section I linked to. I then just copied the equation in the LaTeX markup.
I thought a weighted linear fit option existed in the Statistics and Machine Learning Toolbox functions, but couldn’t find any. I even did an Internet search and couldn’t find anything other than the page I linked to. If you know of one, please post the link. Otherwise, I’ll submit an enhancement request that a weighting option be included in future releases for regress and some other linear regression functions.
John D'Errico
on 5 Nov 2016
I had to say something, because too often others will see an answer and claim it must be so because you said it was. So I try to tell people not to use the normal equations whenever I see them used.
The thing is, I'll claim there really is no way you can control how well you estimate a variable, at the expense of the others, for a fixed set of data in a multivariate linear regression. That can be impacted by design of experiments, but not after the fact in the analysis.
John D'Errico
on 5 Nov 2016
Edited: John D'Errico
on 5 Nov 2016
If you want to weight some VARIABLES, that is, the unknowns in the problem, as being more important than others in a linear system
A*x = y
where some of the entries of the vector x are more important for you to estimate well than others, then this is difficult to do. Why?
Given the linear problem, for a non-singular matrix A, there is a unique solution that minimizes the sum of squares of residuals. That is given in MATLAB as:
x = A\y;
or
x = pinv(A)*y;
Either is acceptable, although they give you subtly different solutions for singular problems.
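For example, on a rank-deficient system the two differ (a small sketch):
A = [1 1; 2 2; 3 3]; % rank 1: the two columns are identical
y = [2; 4; 6];
x1 = A\y % warns "rank deficient"; a basic solution, zeroing one variable
x2 = pinv(A)*y % the minimum-norm solution, [1; 1]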
So now let's look at how you might achieve a variable weighting. Given a vector of weights (W) that describes the importance of one variable relative to another, we might write the problem as
Wd = diag(W); % create a diagonal matrix with the VARIABLE weights
Wdi = diag(1./W); % diagonal, but with the inverse weights
Now, re-write the problem as:
A*(Wd*Wdi)*x = y
You can see the two diagonal matrices will cancel each other out. Shuffling the parens around, we get
(A*Wd)*(Wdi*x) = y
Solve as
Wdi*x = (A*Wd)\y;
then recover x as:
x = Wdi\((A*Wd)\y);
Effectively, we can think of this as amplifying some information in the matrix A as it applies to some variables, at the expense of information on the other variables.
Will it really help? Not that much, I don't think. Suppose you decided that variable 1 was 2, or 5, or 10 times as important as variable 2? The solution for most problems has 16 digits in it. Linear least squares does not employ a tolerance. There is no need for one, as the solution is trivially obtained by solving a generally well-posed system of equations.
But suppose you decided to form a problem where the variables have a SERIOUS, SIGNIFICANT difference in importance? Thus, suppose variable 1 is 1e8 times as important as variable 2? Now you will create a fairly ill-conditioned linear system.
For example, suppose we wish to solve the two-variable problem
A*x = y
where A is:
A = rand(1000,2);
y = rand(1000,1);
Yeah, the problem is complete crap, random numbers. Who cares? :) It is an example, and a valid one here.
Solve it using backslash:
x = A\y
x =
0.45356
0.41322
Choose some serious weights on the variables:
W = [1 1e8];
Wd = diag(W);
Wdi = diag(1./W);
What does this do to the condition number of the linear system?
cond(A)
ans =
2.6453
But when we transform the system...
cond(A*Wd)
ans =
1.4951e+08
That is significantly worse. But somewhat surprisingly, the solution is identical.
format long g
x = A\y
x =
0.453559554168612
0.413215264457575
xw = Wdi\((A*Wd)\y)
xw =
0.453559554168612
0.413215264457575
xw = Wdi\pinv(A*Wd)*y
xw =
0.453559554168612
0.413215264457576
Out to 16 digits, they are all the same. The point, again, is that least squares does not use a tolerance on the variables. Unless I made the variable weights so unequal that the problem started to fail numerically, we would see little difference in the result. Had I made W = [1,1e17], the problem would have been nasty.
W = [1 1e17];
Wd = diag(W);
Wdi = diag(1./W);
xw = Wdi\pinv(A*Wd)*y
Warning: Matrix is close to singular or badly scaled. Results may be inaccurate. RCOND = 1.000000e-17.
xw =
5.73965816909132e-35
0.757075064254075
As you can see, things have gone to hell there.
Ok, so given a fixed set of data, you cannot really assign some weights to a variable in a multi-variate linear regression.
What you can do is to improve your data. If some variable is more important than another, then design your problem to estimate that variable well. This gets into design of experiments, something I always left to statisticians who know the subject better than do I.
But it is also something that you may not be able to control. The data is what it is, and you just want a better estimate. Sadly, that is not an option. The data controls the information available to estimate your variables. You cannot come in after the fact and decide that you really want to estimate one variable better than another. Sorry, but not gonna happen.
For example, consider a really bad case. Let's estimate two variables, for the problem
y = a + b*x
Simple, I know. But now let's make it a really bad set of data...
n = 100;
s = 1e-16;
x = 2 + rand(n,1)*s;
y = rand(n,1);
The classic solution in MATLAB is this:
ab = [ones(n,1),x]\y
Warning: Rank deficient, rank = 1, tol = 4.440892e-13.
ab =
0
0.257000898969274
Essentially, there is only one data point here, at x=2, with a TINY amount of variability. There is essentially no information available to estimate TWO unknowns, so backslash arbitrarily sets one of them to 0. We could just as arbitrarily have set the second variable to 0; then in theory we could have estimated variable 1.
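For instance, fixing the slope at zero and estimating only the constant term (a sketch using the same n and y):
a = ones(n,1)\y % constant-only fit; the least squares constant is just mean(y)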
Alternatively, we could have used the pinv solution, which generates a minimum norm solution, here:
ab = pinv([ones(n,1),x])*y
ab =
0.10280035958771
0.205600719175419
This answer is no better than the other in terms of information content. Sadly though, we cannot control how "well" we estimate one of the unknowns.
0 comments