
Detecting gaps in time series data and replacing with NaN

Brianne Zimmerman on 21 Jul 2011
Hi there! I recently asked a question about detecting gaps in time series data and producing the start and end times of the gaps. My newest task is detecting these gaps and replacing them with NaN.
The time series information is stored as a [nx1 double] array in a cell array called neptune. The corresponding data for each time point is also stored as a separate [nx1 double] array in the neptune cell array.
What I would like to be able to do is find gaps in the time data and replace these gaps with NaN. The time stamps are stored as, for example, 734556.750208611 (serial date numbers?), but gaps are not recorded as anything; the series simply skips to the next date for which data was recorded.
The code I have so far is:
sp=getSamplePeriod; %each instrument has a unique sampling period (in this example sp=0.25 seconds)
t=datenum(neptune.time);
t=t*86400;
idx=find(diff(t)>(2*sp))'+1 %to detect gaps greater than twice the sampling period
This gives me back (for my current example):
idx =
830 8949
This is correct so far and the gaps would be found between (idx-1) and (idx). From here however I am stuck. I don't know how to insert NaN into these gaps and also how to have NaN in the same locations in the neptune.data [nx1 double] array. If anyone could offer any suggestions it would be very much appreciated! Additionally, I'm not sure how easy it would be to insert NaN every 0.25 s interval in these gaps to make the data more accurately represented. If this is too complicated then I will be satisfied with simply inserting NaN into the single gap. Thanks!
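One way to get the simpler variant asked for here (a single NaN per gap, at the same position in both the time and data vectors) is sketched below. It is only an illustration under assumptions: `t` and `y` are small made-up stand-ins for `neptune.time` (already in seconds) and `neptune.data`, and `shift`, `newpos`, `tNaN` and `yNaN` are hypothetical names that do not appear in the original post.

```matlab
% Hypothetical stand-ins for neptune.time (in seconds) and neptune.data
sp = 0.25;                              % sampling period in seconds
t  = [0 0.25 0.5 2.0 2.25 5.0 5.25]';   % two gaps: after 0.5 and after 2.25
y  = (1:numel(t))';

idx = find(diff(t) > 2*sp) + 1;         % first sample after each gap

% Position of every original sample in the padded output: each gap that
% ends at or before a sample pushes that sample down by one slot.
shift = zeros(size(t));
shift(idx) = 1;
newpos = (1:numel(t))' + cumsum(shift);

% Start from all-NaN vectors and copy the measured values into place,
% which leaves exactly one NaN per gap in both vectors.
tNaN = nan(numel(t) + numel(idx), 1);
yNaN = tNaN;
tNaN(newpos) = t;
yNaN(newpos) = y;
```

Because both vectors are filled through the same `newpos` index, the NaN entries line up. When plotted with `plot(tNaN, yNaN)`, the line breaks at each gap instead of being joined across it.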

Answers (3)

Kelly Kearney on 21 Jul 2011
I assume that your measurement times are not necessarily exact, so I've built in some tolerance for error in the example below. But I'm assuming the time measurement error is smaller than the time measurement interval.
% Expected time spacing and number of data points
sp = 0.25;
n = 20;
% Expected measurement times
dntoday = datenum(date);
texact = dntoday + (0:n-1).*sp;
% Some fake measured data
maxerr = (1/24)*.5; % +/- 30 minutes
tmeasured = texact + 2*maxerr*(rand(size(texact))) - maxerr;
ismissing = rand(size(tmeasured)) > .6;
tmeasured = tmeasured(~ismissing);
ymeasured = sin(tmeasured);
%---------------
% Important part [Edited 7/22/2011 with additional comments]
%---------------
% Construct new timeseries, with NaN for missing
% Round each measured time to nearest sp
tround = round(tmeasured/sp)*sp;
% loc tells me the index of texact that matches
% each element of tround
[tf, loc] = ismember(tround, texact);
% Create a vector of NaNs the same size as texact
% for both y and t
tfinal = nan(size(texact));
yfinal = tfinal;
% Fill in the measured values where they matched an exact value
tfinal(loc) = tmeasured;
yfinal(loc) = ymeasured;
5 comments
Kelly Kearney on 22 Jul 2011
The error is probably occurring on the second to last line; if any of your tmeasured don't match a texact (i.e. if ~(all(tf))), then loc will contain some 0's, which can't work as indices.
I think you're mixing up units. Because you added in the
t = t*86400
line, your texact timeseries is in days, while your tmeasured timeseries is in seconds. I assume your spacing interval is in days, not seconds, so get rid of that line (or fix accordingly, if I assume incorrectly).
Also, you'll probably need a larger value of n when you construct texact. Remember that neptune.time has data gaps, so your final series will need to be longer than that. I was assuming you knew off the top of your head how many points to expect, but if you don't, this should get you close:
n = ceil((max(neptune.time) - startdate)/sp);
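The zero-index problem described above can be avoided by using the logical output `tf` to keep only the measurements that actually matched an expected time. The following is a minimal sketch, not the original answer's code: the sample values are made up, and everything is assumed to be in the same time unit. The exact comparison inside `ismember` is safe here only because multiples of 0.25 are exactly representable in floating point; for other spacings, comparing integer slot numbers (`round(t/sp)`) would be more robust.

```matlab
sp = 0.25;                                      % hypothetical spacing, same unit as the times
texact    = (0:9) * sp;                         % expected measurement times
tmeasured = [0.01 0.24 0.76 1.49 2.26 3.10];    % last value falls outside texact
ymeasured = 1:numel(tmeasured);

% Round each measured time to the nearest multiple of sp and look it up
tround = round(tmeasured / sp) * sp;
[tf, loc] = ismember(tround, texact);

% Fill only where a match was found; loc is 0 wherever tf is false,
% so indexing with loc(tf) never uses a zero index.
tfinal = nan(size(texact));
yfinal = tfinal;
tfinal(loc(tf)) = tmeasured(tf);
yfinal(loc(tf)) = ymeasured(tf);
```

If `~all(tf)` turns out to be true on real data, that is also a useful hint that `texact` is too short or that the units of the two time vectors differ.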
Brianne Zimmerman on 22 Jul 2011
My units were definitely mixed up; that was the problem. The interval I was using is 0.25 seconds, so I needed to change everything into seconds. Thank you for all of your help and patience!



Fangjun Jiang on 22 Jul 2011
Your time stamps are always increasing and the smallest gap is sp, so it might be worth taking a different approach.
%% Raw Data
neptune = struct('time',{'20110222T180326.761'
'20110222T180327.011'
'20110222T180844.239'
'20110222T180844.444'
'20110222T180844.665'
'20110222T180944.665'});
Temp=datenum({neptune.time}, 'yyyymmddTHHMMSS.FFF');
sp=0.25/86400;
TimeIndex=round((Temp-Temp(1))/sp)+1;
CompleteIndex=1:max(TimeIndex);
NewNeptune=struct('time',repmat({nan},max(TimeIndex),1));
SearchIndex=ismember(CompleteIndex,TimeIndex);
[NewNeptune(SearchIndex).time]=neptune.time;
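The same slot-index idea can be applied directly to numeric vectors, which is how the question describes `neptune.time` and `neptune.data`. The sketch below is an adaptation under assumptions rather than the code from this answer: the times are made-up values in seconds with a little jitter, and `k`, `tFull`, `tNaN` and `yFull` are hypothetical names. It places one NaN at every missing 0.25 s step.

```matlab
sp = 0.25;                              % sampling period in seconds
t  = [0 0.26 0.49 2.01 2.25 5.0]';      % hypothetical times in seconds, with jitter
y  = (10:10:60)';                       % hypothetical data values

% Slot of each sample on a regular grid that starts at the first sample
k = round((t - t(1)) / sp) + 1;
n = max(k);

tFull = t(1) + (0:n-1)' * sp;           % regular time axis with no gaps
tNaN  = nan(n, 1);                      % measured times, NaN where nothing was recorded
yFull = nan(n, 1);                      % data, NaN where nothing was recorded

% If two samples round to the same slot, the later one overwrites the earlier
tNaN(k)  = t;
yFull(k) = y;
```

Whether to keep the regular axis `tFull` or the NaN-padded `tNaN` depends on the use: plotting `yFull` against `tFull` shows the gaps at their true width, while `tNaN` preserves the measured time stamps.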
9 comments
Brianne Zimmerman on 22 Jul 2011
ans =
double
ans =
1 1
ans =
7.3456
7.3456
7.3456
7.3456 etc.
[NewNeptune(SearchIndex).time]=t says that there are too many output arguments, and [NewNeptune(SearchIndex).time]=neptune.time says that there are insufficient output arguments.
Fangjun Jiang on 23 Jul 2011
You are confusing me. In one comment, you confirmed that your neptune is a 1-by-4 structure. Now you say the size of neptune is 1 by 1. Please tell me the data structure of neptune before you do any of the above processing, that is, your original data structure.
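A plausible reading of the two error messages reported in this thread is that the right-hand side supplies a single value (one numeric vector) while the bracketed left-hand side expects one value per selected struct element. Assuming that is the cause, converting the vector to a cell array and expanding it as a comma-separated list is one way around it. The names `S`, `mask` and `tc` below are hypothetical stand-ins for `NewNeptune`, `SearchIndex` and the converted time vector.

```matlab
t = [734556.7502; 734556.7503; 734556.7504];   % hypothetical numeric time stamps
S = struct('time', repmat({nan}, 5, 1));       % 5-by-1 struct array of NaN placeholders
mask = logical([1 0 1 0 1]);                   % slots that have a measurement

tc = num2cell(t);            % one cell per time stamp
[S(mask).time] = tc{:};      % three targets on the left, three values on the right
```

The number of true entries in `mask` has to equal `numel(t)`; otherwise the same kind of argument-count error appears again.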



Chris Miller on 15 Sep 2011
By now you probably have a solution. I came across the same need when trying to visualize data, and have submitted my insertNaN.m function to the 'File Exchange' if you haven't built your own solution. File ID: #32897
