Fittype issue: unrealistic results and problems with upper and lower bounds

12 vues (au cours des 30 derniers jours)
Hi me and my colleague are working on a problem to solve for the a, b, and d coefficients in the following equation:
y = s*[a*exp(-x/b)+(1-a)*exp(-x/d)];
where
x (x2 in the code) = transposed vector E2=[7.0, 14.0, 28.0, 42.0, 70.0, 90.0, 112.0, 180.0, 224.0, 327.0], later called E2T;
y (y2 in the code) = transposed vector M2 = [59012.590, 38612.883, 18706.516, 11165.926, 5639.9341, 3804.1899, 2776.5547, 1011.2273, 919.47906, 829.12860 ], later called M2T;
We're trying to use fittype function, which is the following:
ft = fittype( 's*[a*exp(-x/b)+(1-a)*exp(-x/d)]');
opts = fitoptions( 'Method', 'NonlinearLeastSquares', 'StartPoint', [60000, 0.8, 100, 1000], 'Upper', [inf, 1, 2000, 2000], 'Lower', [100, 0.1, 1, 1]);
[fitresult2, gof2] = fit(x2,cast(y2, "double"),ft,opts);
f = fit(x2,cast(y2, "double"),ft,opts);
% Assign results to coefficients
t2w_1=fitresult2.b;
t2w_2=fitresult2.d;
ampw1=fitresult2.a;
ampw2=1-fitresult2.a;
Our goal is to obtain the values of ampw1, ampw2, a specifically of t2w_1 and t2w_2.
The problem is that my colleague uses the same equation, initial guesses and bounds and gets a perfect fit in Excel, while my MATLAB code fails to do so and even ignores the bounds I set for "a". I am obtaining terrible results: "a" is supposed to be between 0.1 and 1 while I get something like 590 instead. One thing we found is that MATLAB probably tries to "normalize" the data somehow before the fit which might be part of the problem, but I am not sure how to resolve the issue. I tried other functions than fittype but the issue persists. Please help me fix this problem so that I can get a realistic fit.
I'd appreciate your help. Thanks in advance.
  2 commentaires
the cyclist
the cyclist le 2 Mar 2024
It's difficult to help you debug this without being able to run your code. Since you don't tell us the values of the components of y, we can't do that. Can you upload the data?
Edham
Edham le 2 Mar 2024
I've just uploaded the data for y. I hope it is more clear now.

Connectez-vous pour commenter.

Réponse acceptée

Star Strider
Star Strider le 2 Mar 2024
Modifié(e) : Star Strider le 2 Mar 2024
The bounds may not be set correctly. Note that while ‘a’ is supposed to be between 0.1 and 1, your bounds for it are between 100 and Inf. The names appear in the order returned by the coeffnames function, and the bounds are with repsect to those names.
Try this —
x2 = [7.0, 14.0, 28.0, 42.0, 70.0, 90.0, 112.0, 180.0, 224.0, 327.0].';
y2 = [59012.590, 38612.883, 18706.516, 11165.926, 5639.9341, 3804.1899, 2776.5547, 1011.2273, 919.47906, 829.12860 ].';
ft = fittype( 's*(a*exp(-x/b)+(1-a)*exp(-x/d))');
Names = coeffnames(ft)
Names = 4×1 cell array
{'a'} {'b'} {'d'} {'s'}
opts = fitoptions( 'Method', 'NonlinearLeastSquares', 'StartPoint', randn(4,1)*1E2)%, 'Upper', [inf, 1, 2000, 2000], 'Lower', [100, 0.1, 1, 1]);
opts =
nlsqoptions with properties: StartPoint: [-81.4193 -26.5274 68.7417 -174.9159] Lower: [] Upper: [] Algorithm: 'Trust-Region' DiffMinChange: 1.0000e-08 DiffMaxChange: 0.1000 Display: 'Notify' MaxFunEvals: 600 MaxIter: 400 TolFun: 1.0000e-06 TolX: 1.0000e-06 Robust: 'Off' Normalize: 'off' Exclude: [] Weights: [] Method: 'NonlinearLeastSquares'
[fitresult2, gof2] = fit(x2,cast(y2, "double"),ft,opts)
fitresult2 =
General model: fitresult2(x) = s*(a*exp(-x/b)+(1-a)*exp(-x/d)) Coefficients (with 95% confidence bounds): a = 0.02498 (-0.01766, 0.06762) b = -8.269e+06 (-5.99e+11, 5.99e+11) d = 17.79 (14.22, 21.37) s = 8.517e+04 (7.597e+04, 9.436e+04)
gof2 = struct with fields:
sse: 1.3851e+07 rsquare: 0.9960 dfe: 6 adjrsquare: 0.9940 rmse: 1.5194e+03
f = fit(x2,cast(y2, "double"),ft,opts);
figure
plot(f, x2, y2)
grid
The fit is good, although only ‘d’ and ‘s’ are statistically different from zero, however the magnitudes of all the parameters appear to be appropriate.
.
  4 commentaires
Star Strider
Star Strider le 2 Mar 2024
@Edham — As always, my pleasure!
As @the cyclist noted, I did not intially set any boundaries, since I wanted to see what the unbounded fit looked like.
John D'Errico
John D'Errico le 2 Mar 2024
Modifié(e) : John D'Errico le 2 Mar 2024
The use of randn to generate starting values is a TERRIBLE idea. Sorry, but that is an obscenely bad thing. It will generate random numbers, which will be negative half the time. And if they are negative, they won't even have the right SIGN. And that is a truly bad idea when fitting exponential models. NEVER DO THAT.
Your poor choice of starting values also suggest why the fit you got was relatively poor.

Connectez-vous pour commenter.

Plus de réponses (1)

John D'Errico
John D'Errico le 2 Mar 2024
Modifié(e) : John D'Errico le 2 Mar 2024
Star pointed much out to you, but I want to say a few extra things, and the fit he got was relative crap due to the terrible choice of starting values, so I'll post this as as answer.
First, that a parameter in a nonlinear model is statistically different from zero is often MEANINGLESS. It matters when you are fitting polynomial models, where if a coefficient is zero, then that term could potentially be dropped from the model. But that is a meaningless statement here.
Next, lets look at your problem and model:
x2 = [7.0, 14.0, 28.0, 42.0, 70.0, 90.0, 112.0, 180.0, 224.0, 327.0].';
y2 = [59012.590, 38612.883, 18706.516, 11165.926, 5639.9341, 3804.1899, 2776.5547, 1011.2273, 919.47906, 829.12860 ].';
ft = fittype( 's*(a*exp(-x/b)+(1-a)*exp(-x/d))')
ft =
General model: ft(a,b,d,s,x) = s*(a*exp(-x/b)+(1-a)*exp(-x/d))
I pointed out in my comment to @Star Strider that you should never use randn to generate starting values, ESPECIALLY when applied to nonlinear exponential models. You never want to choose exponential rates with the wrong sign, and randn will do that randomly.
As Star points out, the parameters are listed in the order fittype shows them. So, here, they are {a,b,d,s}. I'll need to guess which order those bounds should lie in, and what intelligently chosen starting values should be.
a we know should lie in the interval [0,1]. It is a coefficient that trades the two exponential terms off between each other. A start value of 0.5 is probably reasonable.
b is an exponential rate parameter. It should ALWAYS be positive here, but because you are dividing x by b, we will expect b to be on the order of 10 to maybe 100.
d is also an exponential rate parameter. It should ALWAYS be positive here, but because you are dividing x by d, we will expect d to be on the order of 10 to maybe 100. It is VERY important to note that the starting value for b and d should NOT be the same. That will often blow many fitting tools out of the water! Never use the same rate constant for two exponential terms.
s is a multiplicative constant, scaling the entire problem. Since your y values are on the order of 10000 to 60000, a start point of 60000 seems reasonable.
opts = fitoptions( 'Method', 'NonlinearLeastSquares', 'StartPoint', [ 0.5, 100, 1000, 60000], 'Upper', [1, 2000, 2000, inf], 'Lower', [0, 1, 1, 10000]);
[fitresult2, gof2] = fit(x2,cast(y2, "double"),ft,opts)
fitresult2 =
General model: fitresult2(x) = s*(a*exp(-x/b)+(1-a)*exp(-x/d)) Coefficients (with 95% confidence bounds): a = 0.8509 (0.7982, 0.9036) b = 13.17 (11.66, 14.67) d = 71.5 (49.87, 93.13) s = 9.286e+04 (8.916e+04, 9.655e+04)
gof2 = struct with fields:
sse: 7.0338e+05 rsquare: 0.9998 dfe: 6 adjrsquare: 0.9997 rmse: 342.3891
plot(fitresult2,x2,y2)
And that fit looks eminently reasonable! The RMSE is pretty small, and the R^2 coefficient is also near 1, not that I ever give a tinker's damn about numbers like that. If the fit looks good, then it is good. Your eye is a far better measure of goodness of fit than some scalar number. Trust your eyes.
As you can see, what matters a lot is the start point, but more importantly that you have the bounds and starting values in the CORRECT ORDER for the parameters. Don't just guess which order they might be in. You can read it off directly from the fittype return.
Were I to look further into this the huge variation in y suggests that additive errors may not be the best choice. I might consider a weighted model perhaps, or even a log transformation to fit in terms of a proportional error structure. But that fit seems reasonable.

Catégories

En savoir plus sur Linear and Nonlinear Regression dans Help Center et File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by