How to compare data series?

Question

Jake on 10 May 2024


https://fr.mathworks.com/matlabcentral/answers/2117296-how-to-compare-data-series

Edited: Jake on 23 June 2024

Suppose I run five time-domain simulations, each with a different coefficient value (say beta = [0.2, 0.3, 0.4, 0.5, 0.6]). For each beta value, I obtain a complete set of results as time series (say, for one beta value, results = [speed, angle, wind force, heat]). I can then plot each of those results against time. The goal is to identify the effect of beta on each simulated parameter.

I can plot each result for each beta against time to compare them qualitatively. The problem is that minute deviations cannot be seen clearly in this type of plot.

I read about Dynamic Time Warping (DTW), but it is a bit difficult to wrap my head around. Is there any other (simpler) method that one can use to analyse this type of time series data?


Answer 1

William Rose on 10 May 2024


https://fr.mathworks.com/matlabcentral/answers/2117296-how-to-compare-data-series#answer_1455661

@Jake,

It is hard to say without having the actual data.

If you restrict your analysis to one variable at a time, then you may want to plot "deviation from the mean" as a function of time for the different values of beta, where "the mean" is the mean at each instant across all values of beta examined. This could reveal features that might be hard to see otherwise.

If you want to analyze the effects of beta on all four variables simultaneously, you may think of your system as evolving in a four-dimensional space over time (time would be a 5th dimension). Since the four variables have different units, you may want to remove the mean from each and normalize each variable by its standard deviation, in order to have a dimensionless "z-score" for each variable as a function of time.
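The z-score step described above could look roughly like the sketch below. It uses synthetic placeholder signals, since the actual data are not available here: the variable names `speed`, `angle`, `windForce`, `heat`, their formulas, and the time vector are all made up for illustration. Each variable is assumed to be a column vector sampled at the same instants.

```matlab
% Synthetic stand-ins for one simulation run (assumption: N-by-1 columns)
t         = (0:0.1:200)';               % assumed time vector
speed     = 5  + 2*sin(0.5*t);          % placeholder signals with different scales
angle     = 30 + 10*cos(0.3*t);
windForce = 1e3*(1 + 0.2*sin(0.1*t));
heat      = 0.01*t;

X = [speed, angle, windForce, heat];    % N-by-4 matrix, one column per variable

% Remove each column's mean and divide by its standard deviation
Z = (X - mean(X,1)) ./ std(X,0,1);      % dimensionless z-scores, N-by-4

% Each column of Z now has (numerically) zero mean and unit standard deviation
disp(mean(Z,1))
disp(std(Z,0,1))
```

When comparing runs with different beta, it is probably better to normalize a given variable with one common mean and standard deviation (for example, pooled over all five runs) rather than run by run, so that differences between runs are not normalized away.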


William Rose on 10 May 2024


sampleData.mat

@Jake,

load('sampleData');
% Stack the five traces as rows (assumes p1..p5 are row vectors of equal length)
p=[p1;p2;p3;p4;p5];
% pzm = (p with zero mean),
% i.e. the mean value of array pzm across the five traces is zero at each instant
pzm=p-mean(p,1);
% plot results
figure
subplot(211)
plot(time,p1,'-r',time,p2,'-g',time,p3,'-b',time,p4,'-c',time,p5,'-m')
title('Raw p(t)');
legend('\beta=1','2','3','4','5')
subplot(212)
plot(time,pzm(1,:),'-r',time,pzm(2,:),'-g',time,pzm(3,:),'-b',...
    time,pzm(4,:),'-c',time,pzm(5,:),'-m')
legend('\beta=1','2','3','4','5')
xlabel('Time'); title('p_{zm}(t)')

Now a few things are evident, which were not obvious to me in the initial post:
1. The frequency of oscillation starts out the same, but then seems to be inversely related to the value of beta. In other words, the oscillations are fastest for beta=1 and slowest for beta=5, with the others falling in between (here 1 to 5 number the traces p1 to p5, in order of increasing beta).
2. The moving-average mean value is inversely related to beta.
You can do more analyses and plotting to demonstrate points 1 and 2.

William Rose on 10 May 2024


sampleData.mat

@Jake,

load('sampleData');
beta=[.2,.3,.4,.5,.6];
p=[p1;p2;p3;p4;p5];
pzm=p-mean(p,1);
fs=(length(time)-1)/(time(end)-time(1)); % sampling rate
% Find peaks in each trace.
% findpeaks(x,fs) returns peak locations in time units, taking the first sample as time 0.
% For p1, p2: find peak heights and locations, to make an illustrative plot.
% For p3, p4, p5: find locs only, since only locs are needed to compute instantaneous freq.
[pks1,locs1]=findpeaks(p1,fs);
[pks2,locs2]=findpeaks(p2,fs);
[~,locs3]=findpeaks(p3,fs);
[~,locs4]=findpeaks(p4,fs);
[~,locs5]=findpeaks(p5,fs);
% compute instantaneous frequency (reciprocal of the time between successive peaks)
instFreq={1./diff(locs1); 1./diff(locs2); 1./diff(locs3); 1./diff(locs4); 1./diff(locs5)};
% Next: time associated with each estimate of instFreq
tInstFreq={locs1(2:end);locs2(2:end);locs3(2:end);locs4(2:end);locs5(2:end)};
% plot results
figure
subplot(311)
plot(time,p1,'-r',time,p2,'-g',time,p3,'-b',time,p4,'-c',time,p5,'-m')
title('Raw p(t)');
legstr=cell(1,5);
for i=1:5, legstr{i}=sprintf('b=%.1f',beta(i)); end
legend(legstr)
subplot(312)
plot(time,p1,'-r',locs1,pks1,'r*',time,p2,'-b',locs2,pks2,'bx')
legend('p1','p1 peaks','p2','p2 peaks')
%plot(time,pzm(1,:),'-r',time,pzm(2,:),'-g',time,pzm(3,:),'-b',time,pzm(4,:),'-c',time,pzm(5,:),'-m')
%legend('\beta=1','2','3','4','5')
title('p(t) with peaks')
subplot(313)
plot(tInstFreq{1},instFreq{1},'-r.',tInstFreq{2},instFreq{2},'-g.',tInstFreq{3},instFreq{3},'-b.',...
    tInstFreq{4},instFreq{4},'-c.',tInstFreq{5},instFreq{5},'-m.')
legend(legstr)
title('Instantaneous Frequency'); xlabel('Time');

The middle plot above shows that findpeaks() is working as we hope it will. I have defined instantaneous frequency as the reciprocal of the time between successive peaks. The plot of instantaneous frequency versus time confirms what I said in my earlier post: instFreq is initially the same for all values of beta, but then instFreq diverges, with instFreq being higher when beta is smaller. The plot also shows that instFreq oscillates slowly.

With appropriate smoothing, you will be able to show how the mean value of p(t) (mean over approximately one cycle) is different for different values of beta.

William Rose on 10 May 2024


sampleData.mat

@Jake,

Here are plots which show more about how p(t) is affected by the value of beta.

load('sampleData');
beta=[.2,.3,.4,.5,.6];
% Compute smoothed versions of p (moving average, 220-point window).
% smooth() is part of Curve Fitting Toolbox; each call returns a column vector.
ps=[smooth(p1,220),smooth(p2,220),smooth(p3,220),smooth(p4,220),smooth(p5,220)];
% Compute pzm=p_zeromean and smoothed version of pzm
p=[p1;p2;p3;p4;p5];
pzm=p-mean(p,1);
pzms=[smooth(pzm(1,:),220),smooth(pzm(2,:),220),smooth(pzm(3,:),220),...
    smooth(pzm(4,:),220),smooth(pzm(5,:),220)];
% plot results
figure
subplot(211)
plot(time,p1,'-r',time,p2,'-g',time,p3,'-b',time,p4,'-c',time,p5,'-m')
legstr=cell(1,5);
for i=1:5, legstr{i}=sprintf('b=%.1f',beta(i)); end
legend(legstr,Location='southwest'); title('Raw p(t)')
subplot(212)
plot(time,ps(:,1),'-r',time,ps(:,2),'-g',time,ps(:,3),'-b',time,ps(:,4),'-c',time,ps(:,5),'-m')
legend(legstr,Location='southwest'); title('Smoothed p(t)'); xlabel('Time')
figure
subplot(211)
plot(time,pzm(1,:),'-r',time,pzm(2,:),'-g',time,pzm(3,:),'-b',time,pzm(4,:),'-c',time,pzm(5,:),'-m')
legend(legstr,Location='southwest'); title('p_{zm}(t)');
subplot(212)
plot(time,pzms(:,1),'-r',time,pzms(:,2),'-g',time,pzms(:,3),'-b',time,pzms(:,4),'-c',time,pzms(:,5),'-m')
legend(legstr,Location='southwest'); title('Smoothed p_{zm}(t)'); xlabel('Time');

The code above uses smooth() with a width of 220 points. I chose this width because it is about two cycles long, so it computes a moving average over approximately two cycles of data. The third plot in the previous post showed that the mean frequency (mean across all times and across all five values of beta) is in the ballpark of 0.09, which means the duration of one cycle is about 11, and two cycles is about 22. The sampling rate is 10, so that is about 220 points per two cycles. This is only an approximate value. You could try to get fancier, for example by taking the mean value between successive peaks on each separate trace.

The top figure shows that the smoothed p(t) traces are together initially, then diverge, with smoothed p(t) being higher when beta is greater. The bottom figure shows the same thing, but the differences are more obvious than in the top figure, because the bottom figure shows the zero-mean version of p. In both figures, the smoothing is not perfect, because the width of the smoothing window does not exactly equal the oscillation period, which varies over time and from trace to trace. The smoothed signals are less smooth as time approaches 200, because the width of the smoothing window decreases at the edge.
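The window-width arithmetic and the "mean between successive peaks" idea mentioned above can be sketched as follows. This is only an illustration on a synthetic trace: the signal `p`, the frequency `f0 = 0.09`, and the drift term are assumptions, not the contents of sampleData.mat. The sketch uses findpeaks() (Signal Processing Toolbox) with sample-index locations.

```matlab
% Synthetic stand-in for one trace (assumption: row vector, uniform sampling)
fs   = 10;                               % sampling rate, samples per unit time
time = 0:1/fs:200;
f0   = 0.09;                             % assumed oscillation frequency
p    = 1 + 0.002*time + 0.5*sin(2*pi*f0*time);   % slow drift plus oscillation

% Window width: number of samples spanning about two cycles
nCycles = 2;
width   = round(nCycles*fs/f0);          % 2*10/0.09 rounds to 222, close to 220

% Cycle-by-cycle mean: average p between successive peaks
[~,idx] = findpeaks(p);                  % peak locations as sample indices
nCyc    = numel(idx)-1;
cycMean = zeros(1,nCyc);
cycTime = zeros(1,nCyc);
for k = 1:nCyc
    seg        = idx(k):idx(k+1)-1;      % samples from one peak up to the next
    cycMean(k) = mean(p(seg));           % mean over that cycle
    cycTime(k) = mean(time(seg));        % midpoint time of that cycle
end

plot(time,p,'-',cycTime,cycMean,'o-')
legend('p(t)','cycle mean'); xlabel('Time')
```

Because each average spans exactly one peak-to-peak interval, this approach adapts to a period that changes over time or between traces, at the cost of giving only one value per cycle.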

William Rose on 10 May 2024

@Jake,

You wrote: "this is very nice, and I can understand the differences of the approach. One question though, in the middle plot (p(t) with peaks vs Time), you have chosen p1 and p2 and not p1,p2,p3,... (all). Was there a specific reason for this, or did you simply choose 2 to convey that findpeaks() works in this context?"

Yes, I only showed p1 and p2, to show that findpeaks() is working in a reasonable way. If the data were not so smooth, then findpeaks() would probably find spurious peaks.

And you wrote: "I'm not sure if I understood what you meant by the last sentence ("With appropriate smoothing, you will be able to show how the mean value of p(t) (mean over approximately one cycle) is different for different values of beta.") though."

See my recent comment, which demonstrates smoothing by finding the moving average. I ended up using a moving average width of approximately two cycles, rather than the one cycle I had originally suggested.
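On the point above about spurious peaks in less smooth data: one possible way to handle it is to constrain findpeaks() with its 'MinPeakProminence' and 'MinPeakDistance' options. The sketch below uses a synthetic noisy sinusoid; the noise level, the frequency `f0`, and the threshold values are assumptions chosen for illustration and would need tuning on real data.

```matlab
% Synthetic noisy trace (assumed values, for illustration only)
rng(0);                                  % reproducible noise
fs   = 10;                               % sampling rate
time = 0:1/fs:200;
f0   = 0.09;                             % assumed oscillation frequency
pn   = sin(2*pi*f0*time) + 0.05*randn(size(time));

% Without constraints, noise produces many spurious local maxima
[~,locsAll] = findpeaks(pn,fs);

% Require peaks to stand out clearly and to be at least about half a cycle apart
% (with fs supplied, MinPeakDistance is expressed in time units)
[~,locsOk] = findpeaks(pn,fs,'MinPeakProminence',0.5,'MinPeakDistance',0.5/f0);

fprintf('unconstrained: %d peaks, constrained: %d peaks\n',numel(locsAll),numel(locsOk));

% Instantaneous frequency from the constrained peaks
instFreq = 1./diff(locsOk);
```

An alternative is to smooth the trace first and then call findpeaks() on the smoothed version.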
