Regression Technique Gives Bad Coefficients

I'm trying to perform an unfolding technique using regression, and to be blunt, my abilities with MATLAB and with regression techniques in general are basic at best. I've tried using the lsqcurvefit command, but I'm running into one problem. When fitting an equation of the form Y = x1*A1 + x2*A2 + ... + xn*An, where the A's are data sets and Y is the response I'm trying to fit the summation to, MATLAB is unwilling to return only positive coefficients. I've tried setting the lower (and even upper) bounds, as well as using options = optimset(...), but this does not seem to help. Instead of returning coefficients that make sense, MATLAB spits out several identical values of magnitude ~1e-14 plus one or two spikes, and it returns negative coefficients if the lower bounds are not set to require that x be greater than or equal to 0. Can anyone suggest a fix for this, or am I just hitting my head against a wall here?
Here is an example of the code I'm trying to get working; it's based on what I found in the help manual as well as examples online.
MDS_X = @(z_X,xdata_X)(z_X(1)*A0_121+z_X(2)*A0_167+z_X(3)*A0_217+z_X(4)*A0_293+z_X(5)*A0_360);
z0 = ones(1,5);
options = optimset('MaxIter',10,'TolFun',1e-10);
[z_X,resnorm_X,residual_X] = lsqcurvefit(MDS_X,z0,t0_121,LR_C_PuBe,zeros(1,5),ones(1,5),options);

Answers (1)

Star Strider on 11 Nov 2014
I’m slightly lost. I understand that your anonymous function:
MDS_X = @(z_X,xdata_X)(z_X(1)*A0_121+z_X(2)*A0_167+z_X(3)*A0_217+z_X(4)*A0_293+z_X(5)*A0_360);
picks up ‘A0_121’ and the others from your workspace, but where do you use your independent variable values, ‘xdata_X’? Your equation doesn't seem to be a function of it, and that could be causing problems for lsqcurvefit, which is doing its best to fit your dependent variable to a function of your independent variable.

12 comments

Jess on 11 Nov 2014
That's what I get for trying to work in MATLAB at 1 am... I think my grasp of the examples might have been confused.
The A's are constant data sets, each a unique matrix of 200x1 values. I want to find the values for the z_X matrix, which will fit the MDS_X function to another constant set of data, LR_C_PuBe.
Star Strider on 11 Nov 2014
It still has to be a function of ‘xdata_X’ for lsqcurvefit to be happy with it. How do you want to state it as a function of ‘xdata_X’?
Jess on 12 Nov 2014
Sorry, I think I'm missing the difference between z_X and xdata_X. MDS should be a function of z_X, unless I need to include an additional variable?
Star Strider on 12 Nov 2014
‘MDS_X’ must be a function of ‘xdata_X’ to work with lsqcurvefit.
Jess on 12 Nov 2014 (edited)
Yes, you said this in your previous post. What I'm not understanding is what the difference is between z_X and xdata_X in the MDS_X = @(z_X,xdata_X) command. You identified xdata_X as the independent variable which must be included in the equation, but what is z_X then? Is this also an independent variable, or can I just remove z_X from the first line and change the remaining z_X terms to xdata_X to correct the problem?
Star Strider on 12 Nov 2014
As I read your objective function, ‘z_X’ is your vector of parameters that you are estimating. (The format for MATLAB objective functions is that the parameter vector is the first argument, and the independent variable the second.) That means that, by definition, ‘xdata_X’ is your independent variable. I have no idea what you're doing, so I'm going on what I can understand from your code.
Ideally, your equation is something like:
z_X(1).*xdata_X + z_X(2);
(for instance for a linear regression with ‘z_X(1)’ the slope and ‘z_X(2)’ the intercept in this illustration) but obviously with your own constants and function definition.
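A minimal sketch of the point above, using synthetic data (all names here are illustrative, not from the thread): the model is an explicit function of its second argument, so lsqcurvefit can evaluate it at the supplied xdata.

```matlab
% Synthetic linear-regression example: z(1) is the slope, z(2) the intercept
xdata = linspace(0,10,50).';                 % independent variable
ydata = 2*xdata + 1 + 0.1*randn(50,1);       % noisy line, slope 2, intercept 1

model = @(z,xdata) z(1).*xdata + z(2);       % model IS a function of xdata
z0    = [1; 0];                              % initial parameter guess
z_est = lsqcurvefit(model, z0, xdata, ydata);
```

With this much data and little noise, z_est should come back close to [2; 1].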
Jess on 12 Nov 2014 (edited)
Yes! That is very close to what I'm attempting. I'm trying to do multiple linear regression, which I thought lsqcurvefit was capable of. The point is to do a spectrum reconstruction based on fitting the summation of several known responses (R) to a given data set (N), in order to get a set of coefficients (S):
N(i) = R(i,1)*S(1) + R(i,2)*S(2) + ... + R(i,j)*S(j)
As you said, "z_X" is the parameter vector (the S-values) which I'm estimating and trying to solve for. That should make "xdata_X" equivalent to the "A0_" data sets (the R's), but I'm not sure how to properly incorporate this into the equation.
Star Strider on 12 Nov 2014
If R(i,j) are fixed constant matrices and S(j) are parameters, this may be an optimisation problem more suited to fminsearch than lsqcurvefit (as much as I like lsqcurvefit).
Your objective function then becomes the following (with ‘xdata_X’ playing the role of N(j), and the norm function taking the Euclidean norm of the difference):
MDS_X = @(z_X,xdata_X) norm((z_X(1)*A0_121+z_X(2)*A0_167+z_X(3)*A0_217+z_X(4)*A0_293+z_X(5)*A0_360) - xdata_X);
and your call to fminsearch becomes:
[S,fv] = fminsearch(@(z_X) MDS_X(z_X,xdata_X), rand(5,1))
See if that works to estimate ‘z_X’ with reasonable accuracy.
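For reference, a self-contained sketch of the fminsearch approach above, with synthetic stand-ins for the workspace variables (A_mat here plays the role of the stacked A0_* columns, xdata_X the role of LR_C_PuBe):

```matlab
% Synthetic stand-ins for the five 200x1 response columns and the data
A_mat   = rand(200,5);              % stands in for [A0_121 ... A0_360]
xdata_X = A_mat*[1;2;3;4;5];        % stands in for LR_C_PuBe

% Objective: 2-norm of the misfit between the weighted sum and the data
MDS_X = @(z_X) norm(A_mat*z_X(:) - xdata_X);

% Note: fminsearch is unconstrained, so it cannot enforce z_X >= 0
[S,fv] = fminsearch(MDS_X, rand(5,1));
```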
Jess on 12 Nov 2014 (edited)
I've never used fminsearch before. The code you provided appears to give identical results to those found by using the following, though, so it looks like it does provide better accuracy than the lsqcurvefit command I was trying to implement.
A_mat = [A0_121 A0_167 A0_217 A0_293 A0_360];
S = A_mat\xdata_X;
Does "rand(5,1)" just provide a starting guess for the parameters collected in vector S? Also, fminsearch reports several negative values in the S-vector. Is there a way to limit the lower bounds of S so that no single value is less than zero? I've looked at using
options = optimset()
but the only value restrictions fminsearch appears to allow are NaN and complex values, through FunValCheck.
Star Strider on 12 Nov 2014
The rand call substitutes for your initial parameter estimates. If you know the ranges they should be in, substitute arbitrary values in those ranges for them, since those are likely to be closer to your optimal estimates.
The fmincon function will probably do what you want. You can use the ‘lb’ and ‘ub’ limits only and set everything else in the constraint argument list to empty arguments, for instance:
x = fmincon(fun,x0,A,b,Aeq,beq,lb,ub)
becomes for your purposes for example:
lb = zeros(5,1);
ub = inf(5,1);
x = fmincon(fun,x0,[],[],[],[],lb,ub)
I believe it uses the same optimset function, with different options.
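Since the model is linear in the parameters, one alternative worth noting (assuming, as described in the thread, that the A0_* arrays and LR_C_PuBe are 200x1 column vectors): base MATLAB's lsqnonneg solves exactly this nonnegative least-squares problem without any explicit bounds.

```matlab
% Nonnegative least squares: minimises norm(A_mat*S - y) subject to S >= 0
A_mat = [A0_121 A0_167 A0_217 A0_293 A0_360];  % 200x5 response matrix
y     = LR_C_PuBe;                              % 200x1 measured data
S     = lsqnonneg(A_mat, y);                    % every entry of S is >= 0
```

This needs no starting guess, bounds, or options structure.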
Jess on 12 Nov 2014
Yes, fmincon uses the same optimset function and a slightly different array of options, but the tolerance and number of iterations can still be set using "TolFun" and "MaxIter". I think fmincon may actually have been worse than using lsqcurvefit, though. It returns the same coefficients that lsqcurvefit did, but now the fit is horrible when comparing plots of xdata_X and the summation of R(i,j)*S(j).
xdata_X = LR_C_PuBe;
MDS_X = @(z_X,xdata_X) norm((z_X(1)*A0_121+z_X(2)*A0_167+z_X(3)*A0_217+z_X(4)*A0_293+z_X(5)*A0_360) - xdata_X);
z0 = zeros(1,5);
lb = zeros(1,5);
ub = inf(1,5);
options = optimset('MaxIter',1000,'TolFun',1e-100);
[S,fv] = fmincon(@(z_X) MDS_X(z_X,xdata_X),z0,[],[],[],[],lb,ub,[],options)
MDS_Coeff = S;
MDS_size = [0.121 0.167 0.217 0.293 0.360];
Star Strider on 12 Nov 2014
I’m still not understanding what you’re doing, but if lsqcurvefit gives the best results, go with it.
I don’t understand your regression objective function, because it does not seem to be a function of ‘xdata_X’, and lsqcurvefit is calculating your ‘z_X’ parameters as though it is.
