Problems to find the right function for curve fitting

I am trying to find a function that fits good with curve obtained from a dynamic function. The dynamic model is a trajectory model with drag force and wind included. From this model I have exported 300 rows that contains Outlet velocity, launch angle, wind speed and landing spot(x) in each column. This is in the eqnarray. This is plotted for launch angle vs. landing spot(x), with a new curve for each wind setup. Outlet velocity is constant.
Then I try to run an minimization with fmincon in order to find a curve that fits well with the imported data. Im using a function like this:
length1(:,1)=B*T1.^4 + C*T1.^3 + D*T1.^2 + E*T1 + F*W1.^3 + G*W1.^2.5 + H*W1.^2 + I*W1 + J;
Where T1 is lanch angle in degree, theta. W1 is wind speed. -10,0,10. A..J are constant that is to be determined.
I discovered that the fmincon function was vulnerable for how i set the initial point for the function. Due to that I put it in a for loop, that puts randomly numbers as inital guesses. And extracts the best one.
However, this does not fit very well with my data that is to be fitted. I guess that it is the functiion that I have that not has the right format. But I dont have any more clues. Does anyone have an idea?
I am attaching the files if any one would like to have them, as well as an image that shows the fit I obtained with the function I have now.
Red is the data to be fitted, and blue is the result of the fmincon sovler.

Réponses (2)

John D'Errico
John D'Errico le 28 Fév 2015
Modifié(e) : John D'Errico le 28 Fév 2015
There is simply no need to use fmincon to fit this. That is equivalent to the use of a Mack truck to take a pea to Boston.
You could simply use my polyfitn, from the file exchange to fit that model. Or as easily, one use of backslash would suffice. There is no need to involve a routine that requires starting values, although fmincon SHOULD yield a stable result, so my initial guess is you did not code that call to fmincon properly.
So if T1 and W1 are column vectors, then all you need do is
coef = [T1.^4,T1.^3,T1.^3,T1,W1.^3,W1.^2.5,W1.^2,W1,ones(size(W1))]\length1;
It is entirely possible that there will be some numerical problems here. In fact, looking at the wind speeds in W1, I can predict that as a fact, and in fact, I can guess this is one reason why fmincon had problems.
Your file shows the 3rd column of eqnarray to be entirely one of the numbers -10, 0 or +10.
What is -10 raised to the 2.5 power? No matter how you try to do that, the number is complex. It will have an imaginary part.
(-10)^2.5
ans =
0 + 316.23i
So I would suggest you have a serious problem with that proposed model.
You MIGHT decide to modify it, so that instead of raising W1 to the 2.5 power, you raise the absolute value of W1 to that power, then multiply by the sign of W1.
So let us try that...
coef = [T1.^4,T1.^3,T1.^3,T1,W1.^3,sign(W1).*abs(W1).^2.5,W1.^2,W1,ones(size(W1))]\length1
Warning: Rank deficient, rank = 6, tol = 1.104783e-05.
coef =
2.2787e-06
-0.00046873
0
2.2945
0.020476
0
-0.024339
0
9.5561
The warning message tells us the main reason why fmincon was susceptible to starting values. Your problem is rank deficient. Even if we use backslash, which needs no starting values at all, it turns out there is NO unique solution. One reason for that singularity can be seen in the values that W1 takes on. W1 is only ever one of three numbers in your data, -10, 0, 10. But you are trying to estimate the coefficient of 4 distinct parameters that include W1. Sorry, you have insufficient data to try that.
Again, you need to rethink your model.
So next, lets actually look more closely at your model. Is it even close to being reasonable? Sorry, but not in a million years. Lets do one simple plot.
plot(T1,length1,'o')
I suppose I could have been nice and color coded those three curves you see as a function of w1. I really don't need to do so.
Those three curves show that the fundamental shape of that curve, including the location of where it has its maximum, changes according to the value of W1. This tells me that the model you have posed, where W1 and T1 are involved as completely separate things, cannot possibly be a good model for your curve.
Again, you need to change your model. I'm not sure who suggested the one you did choose, but it was a poor choice of model.
So, the next step is to look at your problem more closely yet again. The current approach I'll use is one I learned many years ago in modeling. Lets look at the curve you would get for EACH wind speed independently, then see if we can come up with a model after inspecting the coefficients we would get from each separate model.
When I do that, segmenting the data on Wind speed, I see these models arise:
W1 == 10
29.235 + 2.0699*T1 + -0.071679*[T1 - 24.576].^1.9037
W1 == 0
280.75 + 6.7572*x + -8.1935e-10*[x - -312.63].^4.6175
W1 = -10
1.7422 + 2.2603*x + -8.2398e-12*[x - -126.51].^5.7191
The idea is to see if we could have modeled those coefficients as a function of wind speed. In fact, while each of these models fit very nicely, the coefficients seem to have nothing in common across wind speeds.
So perhaps I need to make my model simpler, not more complex. That first set of models I tried may have been overfitting the curves.
W1 == 10
-8.5525 + 5.7726*T1 + -0.21358*T1.^1.7238
W1 == 0
10.663 + 2.5083*T1 + -0.0067148*T1.^2.3111
W1 == -10
-5.0297 + 1.6707*T1 + -0.00020652*T1.^2.9591
In fact, these models seem pretty reasonable. The next step could be to model the parameters in these models as a function of wind speed. But in fact, much simpler is not to do so. I'd suggest that you could get a reasonable prediction for any wind speed by simply evaluating each of those models, then interpolating those predictions as a function of wind speed.
mdl_10 = @(T1) -8.5525 + 5.7726*T1 + -0.21358*T1.^1.7238;
mdl_0 = @(T1) 10.663 + 2.5083*T1 + -0.0067148*T1.^2.3111
mdl_m10 = @(T1) -5.0297 + 1.6707*T1 + -0.00020652*T1.^2.9591
mdls = @(T1) [mdl_m10(T1), mdl_0(T1), mdl_10(T1)];
predfun = @(T1,W1) interp1([-10 0 10],mdls(T1),W1,'spline')
So now, to predict at any particular wind speed (-7.5) and elevation angle (37 degrees), you need do no more than this:
predfun(37,-7.5)
ans =
55.133
The use of the 'spline' option in interp1 here was surely massive overkill with only effectively 3 data points in wind speed. Regardless, lets look at the resulting surface.
[W,T] = meshgrid(-10:10,20:70);
L = zeros(size(W));
for i = 1:numel(W)
L(i) = predfun(T(i),W(i));
end
surf(W,T,L)
The z axis here is length, as a function of wind speed and elevation angle.
That seems eminently reasonable, and we can surely do no better without better data to model.

6 commentaires

John D'Errico
John D'Errico le 28 Fév 2015
Modifié(e) : John D'Errico le 28 Fév 2015
I hope this helps as a rather extensive tutorial on how one might go about fitting such a relationship. There are other things one might do, but every problem is a different one, and one must find a scheme that works for that specific problem.
For example, I might also have written the solution as a direct call to interp2, and avoided much of the mess we saw. But I think the modeling I did was instructive too.
Thank you for the instructive tutorial, it worked out very well. When it comes to the model I presumed, I see now that it was not a good assumption. I should at least had a product of W1 and T1 in the equation.
However, how can I get the equation for the 3D plot as a function of W1 and T1?
Exactly. There must be at the very lest an interaction term in the model, if the shape of those curves changed for different W1. I might have tried adding in interaction terms, looking for such a term that would yield a viable model. But the fact is, that might have taken a while, playing around with terms to find some combination that would yield a reasonable fit.
In fact, the modeling I did do was an indication that it would NOT have been a simple interaction term though. Why? Look at the models I built, for each level of wind speed. I'll repeat them here.
W1 == 10
-8.5525 + 5.7726*T1 + -0.21358*T1.^1.7238
W1 == 0
10.663 + 2.5083*T1 + -0.0067148*T1.^2.3111
W1 == -10
-5.0297 + 1.6707*T1 + -0.00020652*T1.^2.9591
As you can see, the constant term varies with wind speed, but it varies at least in a quadratic manner. So we would have been forced to change the model to something like...
L(T,W) = a0 + a1*W + a2*w^2 + a3*T + a4*t^a5
And that was just to deal with the previously constant term in those models. But, then look at how the coefficient of T1 varied. It too was not a simple constant across all three levels, nor was it even linear in W. So we would need to write the coefficient of T as a function of W.
L(T,W) = a0 + a1*W + a2*w^2 + a3*T + a4*W*T + a5*W^2*T + a4*t^a5
Similarly, that last term has both a coefficient and an exponent that vary with W. And the coefficient in front does not vary in a way that looks at all like a simple quadratic polynomial in W. So I'd need to find some sort of exponential model there.
So in the end, I would have needed to build a quite long and complicated model. But the problem is, you don't have sufficient data to estimate the coefficients of that model at all well. You would have needed data with more levels in W to do that at all well.
Finally, you ask for the equation of the 3-d plot. Sadly, as it was formulated there, this is essentially a spline model in the W1 dimension. As a function of T1, you have those parts. I did give you a working solution that predicts well, and can be used. It simply cannot be written down easily.
If you were willing to accept a piecewise linear function in W1, we could write something out easily enough. (With some effort, we could also compute some simple spline coefficients.)
syms W1 T1
mdl_m10 = -8.5525 + 5.7726*T1 + -0.21358*T1.^1.7238;
mdl_0 = 10.663 + 2.5083*T1 + -0.0067148*T1.^2.3111;
mdl_10 = -5.0297 + 1.6707*T1 + -0.00020652*T1.^2.9591;
So, for W1 in the range of [-10,0], a model that is LINEAR in W1 is:
L_m10_0 = (1 + (W1 + 10)/10)*mdl_m10 + ((W1 + 10)/10)*mdl_0
L_m10_0 =
(W1/10 + 1)*((25083*T1)/10000 - (967704664891757*T1^(23111/10000))/144115188075855872 + 10663/1000) - (W1/10 + 2)*((10679*T1^(8619/5000))/50000 - (28863*T1)/5000 + 3421/400)
Likewise, for W1 in the interval [0,10], the simple linear interpolant model yields
L_0_10 = (10 - W1)/10*mdl_0 + (W1/10)*mdl_10
L_0_10 =
-(W1*((7619243172204993*T1^(29591/10000))/36893488147419103232 - (16707*T1)/10000 + 50297/10000))/10 - (W1/10 - 1)*((25083*T1)/10000 - (967704664891757*T1^(23111/10000))/144115188075855872 + 10663/1000)
To be honest, I've just leave it in the interp1 form. Those models I wrote out are too messy to bother with, and they yield nothing that you can look at and understand anyway. This is the nature of piecewise functions (and splines in general).
I see. The data I wanted to make unto a function was exported from a dynamic model that I have made. I only exported three set of data, because I wanted to make it simpler to make a function that fitted. However, this means that I can get much more data, if that is necessary to get a good result.
What I need is actually a function that describes theta, as a function of wind and trajectory length. To do this I thought it smart to first find a function that fitted the three trajectories with different wind. And then invert it, to be a equation that describes theta as a function of wind and trajectory length. I am going to use this to estimate the nozzle angle (theta), based on knowledge about where to hit and the wind conditions.
The surface plot i great, but I need a function that describes it. No matter how many terms I need.
John D'Errico
John D'Errico le 4 Mar 2015
Modifié(e) : John D'Errico le 4 Mar 2015
With many terms, you will never be able to invert it, especially if your goal is an analytical solution.
And even if you could invert that model, there are multiple solutions.
HOWEVER, if you have the modeled surface as I built it (and better yet if you have more wind speed data points to fill in the model), you can generate the function you so desire, although only as a piecewise linear curve. But that will be entirely adequate.
I'll look back in later today and explain how you do this from the model you have.
Yes, if I could have the function of the surfaced model. I would have what I needed.

Connectez-vous pour commenter.

My best guess is (I had a quick glance at your codes) that your fitting model
length1(:,1)=B*T1.^4 + C*T1.^3 + D*T1.^2 + E*T1 + F*W1.^3 + G*W1.^2.5 + H*W1.^2 + I*W1 + J;
is not satisfactory. I mean, you can give MATLAB the data and tell it that it has to fit a linear model like
y= a*x + b
(which is clearly not the case for your parabolic-like graph) and it will give you the most 'perfect' linear fit. That linear fit however is not nearly a nice fitting of the data, but the best MATLAB could do with what you gave it to work with. Check if your fitting model is OK.

1 commentaire

Yes, I understand that my assumption of model was not good. But I dont really know how to include something like the peak shift in the model. Therefor I tried to set it in the power of 2.5. It didn't contribute anything though..

Connectez-vous pour commenter.

Catégories

En savoir plus sur Get Started with Curve Fitting Toolbox dans Centre d'aide et File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by