What is the best way to smooth and compute the derivatives of noisy data?

296 vues (au cours des 30 derniers jours)
Hello all,
I have a set of experimental data (temperature vs time), and have no idea of the function which satisfies the data behaviour. As a part of analysis, I need to estimate the first derivative using central differences and obtain the maximum gradient.
Because my data is too noisy, I need to filter it before taking the derivative. To keep the precision of data and minimize any distortion, I tried to remove the outliers from my data using a Savitzky–Golay filter. It shows great results, but my data is not quite smoothed as it can be seen in a picture of Savitzky–Golay filter. To compute the exact inflection point further curve fitting is required, as it can be seen in the first derivative picture.
Any suggestions or better approaches , would be highly appreciated as I’m new to Matlab and signal processing. I heard about the cubic spline for better smoothness, but I have no idea about it and not sure if it can solve my problem.
Many thanks in advance
  2 commentaires
Image Analyst
Image Analyst le 17 Mar 2019
Your derivative looks like something you'd see with an S-curve - flat at the beginning, steep in the middle, then becoming flatter again - not like what your actual data looks like (something like a quadratic). Please attach your actual data with the paper clip icon, and say what you want to do with the derivative once you have it so we know why it needs to be smoother than it is now.
Abdulaziz Gheit
Abdulaziz Gheit le 17 Mar 2019
Sorry about any misunderstanding.
It's just part of my data to present it more clear. here is an excel file contains the original and the smoothed data.

Connectez-vous pour commenter.

Réponse acceptée

John D'Errico
John D'Errico le 17 Mar 2019
Modifié(e) : John D'Errico le 17 Mar 2019
The "best" way? That will be completely dependent on your data, on what you know about your data, and on what you are willing to assume about the process that generated the data.
Looking at your data, since I don't have your data, nor do I know anything about the data or the process, I can only guess. Thus, can I assume that what looks like noise really is noise? I had one group of clients who were not willing to make that assumption. Any smoothing at all was a bad thing, because there was potentially information in the high frequencty stuff that they needed to know about. Luckily for them, the noise was not overwhelming.
In general though, numerical differentiation is a noise amplifying process. So you need to kill off the noise to whatever extent possible. You can do so by use of a spline fit (IF you use the right spline model. An interpolating spline would be generally a bad idea for you), or you could use a regression model of some sort, IF you have a viable model for this process.
Again, I don't have your data. So, at best, I can only make some data up for an example. So, let me see, this should look at least a little like what you have shown:
x = linspace(0.5,2.5,100);
y = 19 + exp((x - 0.5)/2)/10 + randn(size(x))/100;
plot(x,y,'.')
untitled.jpg
The picture you show has a fair amount of data, yet it is not that complex of a relationship. As such, a low order polynomial model might be sufficient. Then you could just differentiate the polynomial. For example...
P3 = polyfit(x,y,3)
plot(x,y,'b.',x,polyval(P3,x),'r-')
untitled.jpg
P3d = polyder(P3)
plot(x,polyval(P3d,x),'-')
untitled.jpg
A problem with such an approach is it does not build information that you may have about the underlying functional relationship. That is, do you know that temperature will ALWAYS increase with time? If so, then the first derivative must always be positive, yet a polynomial fit gives you no such assurance. In fact, you might notice that the true underlying function behind my data has an everywhere increasing derivative, yet a cubic polynomial model had a negative second derivative. Worse, what if you use too high an order for a polynomial? That is easy to do.
P7 = polyfit(x,y,7);
plot(x,y,'b.',x,polyval(P7,x),'r-')
untitled.jpg
The same fact applies to a Savitsky-Golay filter. A Savitsky-Golay derivative estimation on noisy data will need to have a long window width, and low order implicit polynomial model. However, remember that Savitsky-Golay does not produce a differentiable or continuous function/derivative estimate. So, using my own movingslope code, that builds polynomial model on a moving window, then returns the derivative, we see:
plot(x,movingslope(y,20,1),'.')
untitled.jpg
Anyway, if we look at the bottom end of the curve, the 7th degree polynomial started to get a little squirrely. So, I might try to use a regression spline, constrained to have the expected shape. Lets see,
slm = slmengine(x,y,'plot','on','increasing','on','concaveup','on','jerk','positive');
So here, a cubic spline, constrained to have first, second, and third derivatives entirely positive over the domain.
untitled.jpg
Better yet, of course, is to use a nonlinear model, IF you know a good choice of model in advance.
But really, what matters is as I said in the beginning. What do you know? Because once you can provide information to the modeling process, you will GREATLY improve the derivative estimation process. Tools like Savitsky-Golay assume almost nothing about the system. A polynomial model assumes relatively little, but somewhat more.
  5 commentaires
John D'Errico
John D'Errico le 17 Mar 2019
Modifié(e) : John D'Errico le 17 Mar 2019
As always, all data has problems. ;-) Sometimes I just hate the real world. Seriously, this is not that bad, as data sets go.
It looks like your data always has an initial transient. So, if we plot one of your curves, just the first 200 points, we see both the transient, as well as a rough picture of the noise.
plot(x(1:200),y(1:200,1),'.')
xlabel 'Time'
ylabel 'Temp'
The problem is that as I said, differentiation is a noise amplification process. First, I'll compare the results of a simple call to gradient, to a Savitsky-Golay style of filter, varying the window length, and the local polynomial order. I've just used my own movingslope utility (which can be found on the file exchange.)
plot(gradient(y(:,1),0.0125),'.')
hold on
plot(movingslope(y(:,1),10,1,0.0125),'r.')
plot(movingslope(y(:,1),100,3,0.0125),'g.')
legend('Gradient','SG: 10 points, linear','SG: 100 points, cubic')
As you can see, the tiny noise in your data was amplified heavily. Also, even a tool like Savitsky-Golay does not produce a smooth derivative estimate, because it makes no presumption of the underlying function being a smooth, differentiable curve. The same thing would apply to a running median filter.
Since all 6 curves had the same x, I packed them all into one array. I can now estimate the noise standard deviation for each curve, as:
sqrt(estimatenoise(y))
ans =
0.016051 0.015394 0.01592 0.016541 0.016655 0.017764
It seems pretty consistent, roughly 0.0165 or so on average. So a band of roughly +/-0.03. ESTIMATENOISE is another of my submissions that can be found on the File Exchange. (I'll return and add links to these codes at the end of this response.)
One simple solution, if you have the curvefitting toolbox (I think SPAPS is in there) is to use asimple smoothing spline. It makes no presumption of monotonicity.
spl = spaps(x,y(:,1),0.03);
fnplt(spl)
hold on
plot(x,y(:,1),'.')
(I've zoomed the curve to show only the first part of the data.)
As you can see, it misses that initial transient, but the remainder of the curve does quite well.
And, here, a plot of the derivative of that spline, again, using tools in the curvefitting toolbox.
fnplt(fnder(spl))
I don't dislike that result at all, and it was fairly simple to arrive at. I might decide to fit only the data after the initial transient, so just delete the data before a time of 0.25 or so, UNLESS that portion of the curve is important to you. If it is, then you need to chase after it.
Now, could I gain as good a result using other tools? Well, yes. I'd do this:
slm = slmengine(x,y(:,1),'plot','on','knots',[0 .25 .5 ,1:2:25],'increasing',[0.5 25],'rightslope',0);
The plot produced by SLMENGINE has the ability to plot the derivative curve. See that SLM did have the ability to handle the initial transient.
Could I have used a polynomial model on this? NO!!!! Absolutely not. You will never get a good result from a polynomial model of this data.
Ok, finally, could I have used a nonlinear regression fit? Well, unless you have asufficiently good nonlinear model that would fit EXACTLY that shape of curve, it is unlikely that you will be as happy as you would with one of the spline options I have discussed. A nonlinear function will almost always miss just a bit in one part or another of that curve. There will be significant lack of fit. And unless you put something in that model to specifically handle that initial transient, it will have problems there too.
So I would use either a smoothing spline, or my own SLM toolbox. Either will probably make you happy.
Links:
SLM:
movingslope:
estimatenoise:
Jim Riggs
Jim Riggs le 6 Août 2019
+1 Nice presentation and effort.

Connectez-vous pour commenter.

Plus de réponses (1)

Abdulaziz Gheit
Abdulaziz Gheit le 24 Juil 2019
Thanks John for this great tool. I figuered out how to run it but I do need to know how to extract the smoothed data.
Could you please explain to me the way to do that.
Thanks
  2 commentaires
John D'Errico
John D'Errico le 25 Juil 2019
"extract the smoothed data"? Just evaluate the spline at the x-values. The prediction will lie exactly on the spline, so the data is now smoothed.
How do you do this? Just a call to slmeval. Or, if you tell it to eturn the spline in a 'pp' form, then you can use fnval, or ppval. slmeval will still work, even if you create a pp form spline.
Abdulaziz Gheit
Abdulaziz Gheit le 6 Août 2019
That is awesome. Definitely, a great tool.
BIG THANKS JOHN!

Connectez-vous pour commenter.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by