Numerical Integration on Discrete Data Sets

87 vues (au cours des 30 derniers jours)
Andy Hall
Andy Hall le 9 Mar 2019
Commenté : John D'Errico le 10 Mar 2019
Hi All,
Im currently trying to run the following code:
close all; % close all programs
clc; % clears command window
clear all; % Clears memory
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Parameters
%Function and Interval
A= csvread('finaldata.csv');
x= A(:,2)
value= @sin
a= 0; % Lower limit of function
b= pi; % Upper limit of function
% Number of Points
N= 1250; % Number of points can be adjusted later
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Discrete Integration
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Positions of the points
dx = (b-a)/N;
x = [0.0010:N - 0.0010]*dx; % Centre positions of segments under the curve to be integrated
% Calculate Function
y = value(x) % Calculates function at midpoints
% Numerical Integration
I_Discrete = sum(y)*dx; % Numerical integration by summing and multiplying by dx
subplot(1,3,1)
plot(x,y)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Trapezoidal Integration
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Positions of the points
x = linspace(a,b,N); % To place N points on a and b exactly
dx = x(2) -x(1); % Extracts spacing between x values
% Calculate Function
y = value(x); % Caluclates function at midpoints
% Numerical Integration
W = [0.5 ones(1,N-2) 0.5];
I_Trap = sum(W.*y)*dx;
subplot(1,3,2)
plot(x,y)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Simpsons 1/3rd Integration
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Positions of the points
x = linspace(a,b,N); % To place N points on a and b exactly
dx = x(2) -x(1); % Extracts spacing between x values
% Calculate Function
y = value(x); % Caluclates function at midpoints
% Numerical Integration
I = 0;
for n = 1 : 2 : N-2
a = y(n) + 4*y(n+1) + y(n+2);
I = I + a;
end
I_simp = I*(dx/3);
subplot(1,3,3)
plot(x,y);
Im trying to feed a data set (it forms a sinewave) to integrate to remove the phase shift on the signal. The program works fine with the sin function, but cannot seem to get it working well with my data set. the data set has two collumns which when plotted together shows the sinewave over a number of periods.
Is it a case that i cannot apply the data in this way? Any help is appreicated.
Andy

Réponse acceptée

John D'Errico
John D'Errico le 9 Mar 2019
Modifié(e) : John D'Errico le 9 Mar 2019
I might first guess that your data is not uniformly spaced. Simpson's rule does not apply to non-uniformly spaced data (thus non-uniform in x.) Even trapezoidal rule needs to be constructed correctly for it to work. But, since we do not see your data, this is only a guess.
Most simply, you should just use trapz. It is designed to take a vector x and y, then compute the integral. Writing code to do what is already available to solve your problem is usually a bad idea, unless you know enough to do a better job. Here you don't. Sorry, but that is just a statement of fact. So use trapz.
x = [0,sort(rand(1,100)),1]*pi;
y = sin(x);
trapz(x,y)
ans =
1.99847987386531
We know the integral of sin(x) over [0,pi] is 2, so that is quite reasonable. Trapz has no problem with the unevenly spaced data, yet if I did what you did, it would fail miserably, as would Simpson's rule.
If you have the curve fitting toolbox, then a far better solution would come from an integral of a spline.
spl = spline(x,y);
diff(fnval(fnint(spl),[0,pi]))
ans =
1.99999912870377
But even lacking that toolbox, you can still use integral.
spl = spline(x,y);
integral(@(x) ppval(spl,x),0,pi)
ans =
1.99999912487824
Again though, unless you have been forbidden from using better tools as a homework assignment, writing code to replace existing code is a bad idea.
Could I have shown you how to fix the problem with trapezoidal rule? Well, yes. You constructed the vector of weights improperly, assuming unequally spaced data. A non-uniform trapezoidal rule would use the entire vector of differences in x to compute the weights on each data point. Could you fix Simpson's rule to work for unequally spaced data? That is not as easily doable, and the order of the rule would potentially suffer unless you were very careful. So for a more accurate integration of data, the best idea is to use the spline integral I did show you how to use.
Finally, is there a best way to solve the problem in the case where your data is noisy, thus with significant noise in y? In this case, it depends on how much noise there is in y then. For example...
y = sin(x) + randn(size(x))/5;
trapz(x,y)
ans =
1.98676569356795
spl = spline(x,y);
diff(fnval(fnint(spl),[0,pi]))
ans =
1.36442608167266
What you should see here is the spline integral did quite poorly on noisy data, whereas trapezoidal rule was still pretty good. An integral of a pchip interpolant is nearly as good as trapezoidal rule.
  2 commentaires
Andy Hall
Andy Hall le 10 Mar 2019
Thanks for your feedback.
I have attached the CSV data file, yes its not uniformly spaced which was my inital concern with trying to use it in this application as raw data.
This is the orignal code I was using is below (the screen shot attached (testresults) shows that this yields the same answers as you have above). The preivous was attemtping to feed the data into the program based as advised, although i wasnt confident it would work that way as i didnt know how to fix the program for non uniform data.
%% Ready MATLAB
close all; % close all programs
clc; % clears command window
clear all; % Clears memory
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Parameters
%Function and Interval
value= @sin
a= 0; % Lower limit of function
b= pi; % Upper limit of function
% Number of Points
N= 50; % Number of points can be adjusted later
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Discrete Integration
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Positions of the points
dx = (b-a)/N;
x = [0.5:N - 0.5]*dx; % Centre positions of segments under the curve to be integrated
% Calculate Function
y = value(x); % Calculates function at midpoints
% Numerical Integration
I_Discrete = sum(y)*dx % Numerical integration by summing and multiplying by dx
subplot(1,3,1)
plot(x,y)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Trapezoidal Integration
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Positions of the points
x = linspace(a,b,N); % To place N points on a and b exactly
dx = x(2) -x(1); % Extracts spacing between x values
% Calculate Function
y = value(x); % Caluclates function at midpoints
% Numerical Integration
W = [0.5 ones(1,N-2) 0.5];
I_Trap = sum(W.*y)*dx
subplot(1,3,2)
plot(x,y)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% Simpsons 1/3rd Integration
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Positions of the points
x = linspace(a,b,N); % To place N points on a and b exactly
dx = x(2) -x(1); % Extracts spacing between x values
% Calculate Function
y = value(x); % Caluclates function at midpoints
% Numerical Integration
I = 0;
for n = 1 : 2 : N-2
a = y(n) + 4*y(n+1) + y(n+2);
I = I + a;
end
I_simp = I*(dx/3)
subplot(1,3,3)
plot(x,y)
The cumtrapz screen shot attached shows the integrated sinewave against the original sinewave, yet it hasnt yielded the results i wanted (assuming this is down to non uniform data).
Im trying to analysis the three methods of numerical integration to see what the most accurate method is (expected 1/3rd rule), but I want to see a reconstructed sinewave so I can compare against other anlogue integrator circuits.
I was wondering whether using interpolation to formulate a function which can then be fed back into the program would work instead of using the raw data.
Sorry if i didnt explain myself that well, I posted after a long frustrating few days.
John D'Errico
John D'Errico le 10 Mar 2019
Read what I said, again. You cannot use Simpson's rule for unequally spaced data. Then read again what I said. Did I not show you how to interpolate your data with a spline, and then integrate the spline?
However, looking at your data, it is equally spaced!!!!!
xy = csvread('finaldata.csv');
x = xy(:,1);
y = xy(:,2);
[min(diff(x)),max(diff(x))]
ans =
3.99999999999984e-05 4.00000000000053e-05
std(diff(x))
ans =
2.04966491279464e-18
So as far as I care, or any numerical method would care about, x is equally spaced well enough, certainly compared to any interpolation error you would see. So where is your problem? Why do you think this did not perform well in terms of an integral?
Perhaps the answer lies in what bastardization you did to the data. If I dump out your y values to the command window, I see this:
format long g
>> y(1:10)
ans =
0.52
0.56
0.52
0.56
0.56
0.56
0.56
0.56
0.6
0.56
Is there ANY reason in the universe why you would truncate those values to 2 significant digits, and now you expect any degree of accuracy for the integral? Seriously? Using a higher order integration will not produce anything meaningful. You already threw out any useful information there when you captured only 1 or 2 significant digits for the y variable. Sorry, but that is simply fact. ARGH.
Ugh. There is a moderate amount of noise in y. But you made the noise worse, when you decided to round all numbers to 2 significant digits. Essentially you added a significant amount of noise into the process. As such, you would need to now try a smoothing algorithm to see what extent we can recover the underlying relationship.
First, I'll look to see what the noise standard deviation is. I've posted a tool called estimatenoise on the File Exchange. It canestimate the noise variance in such a series.
noisestd = sqrt(estimatenoise(y))
noisestd =
0.0258527084366784
So we can imagine noise on y as having two sources: truncation error to 2 significant digits, and then some additional noise already in the system. A simple way to understand the magnitude of truncation error is to consider this simple example:
u = rand(1,1000);ud = round(u,2);
std(u - ud)
ans =
0.00288400459876424
So, while truncation error is clearly a significant part of the noise in y, it is not all of the noise that can be seen.
Can we recover the shape of at least an approximation to the curve? Well, yes. You might use a low pass filter to try to filter out the junk, perhaps a Savitsky-Golay filter of some order. Or you might consider a smoothing spline of some sort, possibly spaps, or a regression spline. Here is the result of a regression spline, as posted on the file exchange in my SLM toolbox.
slm = slmengine(x,y,'knots',20,'result','pp','plot','on');
untitled.jpg
You can see the discrete nature of the data in the plot, and the smoothed approximation as found there.
trapz(x,y)
ans =
0.0049552
slmpar(slm,'integral')
ans =
0.00495420521660388
Is the integral of the spline that signifcantly better an approximation than trapezoidal rule? I'm not sure that it is. In the nature of trapezoidal rule is that it is the method of choice for noisy data. Perhaps this is surprising. But that is true. In fact, it is easy to show that in the presence of significant noise, higher order rules like Simpson's rule can actually have a higher variance in the integral.
For example, consider an interpolating spline integral.
spl = spline(x,y);
diff(fnval(fnint(spl),[min(x), max(x)]))
ans =
0.00495625988650244
But if we look at the interpolating spline, we see this:
fnplt(spl)
untitled.jpg
The interpolating spline passes through each noisy point exactly. Effectively, Simpson's rule does the same. If we then integrate that mess, we expect the result to be actually a bit worse than a simple trapezoidal rule integration.
This happens because we are integrating what is essentially the sum of a nice smooth curve (as found for example in my regression spline fit above) and the interpolating spline. We can see that differecne here:
fplot(@(x) fnval(slm,x) - fnval(spl,x),[min(x), max(x)])
untitled.jpg
Essentially, this tells us that again, we don't want to use higher order integrations on noisy data. We should either use the lowest order rule possible, or we might smooth the data FIRST, and only then perform an integration. Even then, you are severely limited in terms of how good of a job you can do, due to the noise.
Noise degrades the information content in any signal. In fact, once information has been tossed into the bit bucket, there are limits to what you can recover.

Connectez-vous pour commenter.

Plus de réponses (0)

Tags

Produits


Version

R2014a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by