Filtering max values within every 3 hours of data

Question

mashtine le 11 Mai 2014

0
Lien

Utiliser le lien direct vers cette question

https://fr.mathworks.com/matlabcentral/answers/129188-filtering-max-values-within-every-3-hours-of-data

Modifié(e) : Cedric le 13 Mai 2014

Réponse acceptée : Cedric

upload.mat

Ouvrir dans MATLAB Online

Hey everyone,

I attached the first 50 rows of a sample dataset. My time data is in the format of '2000-01-01 01:00' where the last 5 characters hold the hour (HH) and minute (MM) values respectively. I would like to have a code that goes through the time column, and assigns the maximum value of the second column between the 01,02,03 hours to the 03 hour timestep. I need to then have this done for the 04 - 06 time period and so forth until 22 - 00.

At the end the cell array would not need to keep data from other columns and should just be XXXX x 2 cell array and just reassigns the maximums to the 03,06,09 ...... 00 timesteps. There may also be missing hours, so the script would need to accommodate for that.

example:

'1984-01-01 00:00'  10.80  240  20.06  231  0.8
'1984-01-01 01:00'  11.83  230  20.06  6  0.6
'1984-01-01 02:00'  11.31  230  18.51  100  0.6
'1984-01-01 03:00'  10.80  230  18.51  212  0.7
'1984-01-01 04:00'  9.77  240  18.65  348  0.8
'1984-01-01 05:00'  9.25  240  16.97  406  0.8
'1984-01-01 06:00'  9.77  230  17.48  524  0.7
'1984-01-01 07:00'  9.77  230  16.97  612  0.7

would then become:

'1984-01-01 03:00'  11.83  
'1984-01-01 06:00'  9.77

I am completely lost in writing this and any help would be appreciated!

1 commentaire
Afficher -1 commentaires plus anciensMasquer -1 commentaires plus anciens

Cedric le 11 Mai 2014

Well, not all answers below will work with missing data, but if you work for understanding them all (with their limitations), you'll learn a lot about data manipulation in MATLAB ;-)

Connectez-vous pour commenter.

Connectez-vous pour répondre à cette question.

Answer 1

Cedric le 11 Mai 2014

2
Lien

Utiliser le lien direct vers cette réponse

https://fr.mathworks.com/matlabcentral/answers/129188-filtering-max-values-within-every-3-hours-of-data#answer_136412

Modifié(e) : Cedric le 13 Mai 2014

Ouvrir dans MATLAB Online

You could build an approach along the following line:

 % - Build array of time components. 
 dv = datevec( datatoupload(:,1), 'yyyy-mm-dd hh:MM' ) ;
 % - Build logical index of relevant/boundary hours.
 hrId      = rem( dv(:,4), 3 ) == 0 ;
 hrId(end) = true ;                     % Last entry always relevant.
 % - Build vector of (time-) group IDs.
 hrGrId = cumsum( [true; hrId(1:end-1)] ) ;
 % - Get statistics per group.
 grMax = accumarray( hrGrId, cell2mat(datatoupload(:,2)), [], @max ) ;
 % - Build output.
 output = [datatoupload(hrId,1), num2cell(grMax)] ;

Applied to your MAT file, this gives:

 >> output
 output = 
    '1984-01-01 00:00'    [10.8024]
    '1984-01-01 03:00'    [11.8312]
    '1984-01-01 06:00'    [ 9.7736]
    '1984-01-01 09:00'    [ 9.7736]
    '1984-01-01 12:00'    [ 8.7448]
    '1984-01-01 15:00'    [ 8.7448]
    '1984-01-01 18:00'    [ 9.2592]
    '1984-01-01 21:00'    [ 7.2016]
    '1984-01-02 00:00'    [ 5.1440]
    '1984-01-02 03:00'    [ 4.6296]
    '1984-01-02 06:00'    [ 3.6008]
    '1984-01-02 09:00'    [ 2.5720]
    '1984-01-02 12:00'    [ 7.2016]
    '1984-01-02 15:00'    [10.2880]
    '1984-01-02 18:00'    [11.8312]
    '1984-01-02 21:00'    [12.3456]
    '1984-01-03 00:00'    [10.2880]
    '1984-01-03 01:00'    [10.8024]

EDIT 1: the following solution should work with missing data, but you'll have to understand it and fully test it. In the beginning, I am building a test data set with missing data.

 load( 'masao_1.mat' ) ;
 data = datatoupload ;
 % - Remove rows so for day 1:
 %  > Hours 4,5,6 have only data for hours 5,6.
 %  > Hours 10,11,12 have only data for hours 10, 11. 
 %  > Hours 13,14,15 have no data. 
 data([16,15,14,13,5], :) = [] ;
 % - Build array of time components. 
 dv = datevec( data(:,1), 'yyyy-mm-dd hh:MM' ) ;
 % - Build vector of (time-) group IDs.
 %  1. Absolute hour ID.
 hrId = dv(:,4:-1:1) * cumprod( [1; 24; 32; 12] ) - 1 ;
 %  2. Contiguous group start pos.
 grStart = [true; diff(floor(hrId/3)) ~= 0] ;             
 %  3. Group start pos.
 grEnd = [grStart(2:end); true] ;
 %  4. Contiguous group IDs.
 grId = cumsum( grStart ) ;
 % - Get statistics per group.
 grMax  = accumarray( grId, cell2mat(data(:,2)), [], @max ) ;
 % - Build output.
 output = [data(grEnd,1), num2cell(grMax)] ;

Looking at the output

 >> output
 output = 
    '1984-01-01 00:00'    [10.8024]
    '1984-01-01 03:00'    [11.8312]
    '1984-01-01 06:00'    [ 9.7736]
    '1984-01-01 09:00'    [ 9.7736]
    '1984-01-01 11:00'    [ 8.7448]
    '1984-01-01 18:00'    [ 9.2592]
    '1984-01-01 21:00'    [ 7.2016]
    '1984-01-02 00:00'    [ 5.1440]
    '1984-01-02 03:00'    [ 4.6296]
    '1984-01-02 06:00'    [ 3.6008]
    '1984-01-02 09:00'    [ 2.5720]
    '1984-01-02 12:00'    [ 7.2016]
    '1984-01-02 15:00'    [10.2880]
    '1984-01-02 18:00'    [11.8312]
    '1984-01-02 21:00'    [12.3456]
    '1984-01-03 00:00'    [10.2880]
    '1984-01-03 01:00'    [10.8024]

you see that it seems to be working when missing data hit boundary hours or not, and when they cover full period(s).

EDIT 2: the following solution removes data from groups/periods which don't include 3 entries/measures. Again, you have to fully understand it and test it. You may find an easier way to eliminate entries, I didin't really optimize.

load( 'masao_1.mat' ) ;
data = datatoupload ;
% - Remove rows so for day 1:
%  > Hours 4,5,6 have only data for hours 5,6.
%  > Hours 10,11,12 have only data for hours 10, 11. 
%  > Hours 13,14,15 have no data. 
data([16,15,14,13,5], :) = [] ;
% - Build array of time components. 
dv = datevec( data(:,1), 'yyyy-mm-dd hh:MM' ) ;
% - Build vector of (time-) group IDs.
%  1. Absolute hour ID.
hrId = dv(:,4:-1:1) * cumprod( [1; 24; 32; 12] ) - 1 ;
%  2. Contiguous group start pos.
grStart = [true; diff(floor(hrId/3)) ~= 0] ;             
%  3. Contiguous group IDs, 1st shot, any size taken into account.
grId = cumsum( grStart ) ;
%  4. Flag data which do not belong to a full group (= 3 entries).
grIsValid    = accumarray( grId, ones(size(grId)) ) == 3 ;
entryIsValid = ismember( grId, find(grIsValid) ) ;
%  5. Eliminate invalid entries, rebuild valid group starts and IDs.
data(~entryIsValid,:)    = [] ;
grStart(~entryIsValid,:) = [] ;
grId = cumsum( grStart ) ;
% - Get statistics per group.
grMax = accumarray( grId, cell2mat(data(:,2)), [], @max ) ;
% - Build output.
grEnd  = [grStart(2:end); true] ;
output = [data(grEnd,1), num2cell(grMax)] ;

With that, we obtain..

 >> output
 output = 
    '1984-01-01 03:00'    [11.8312]
    '1984-01-01 09:00'    [ 9.7736]
    '1984-01-01 18:00'    [ 9.2592]
    '1984-01-01 21:00'    [ 7.2016]
    '1984-01-02 00:00'    [ 5.1440]
    '1984-01-02 03:00'    [ 4.6296]
    '1984-01-02 06:00'    [ 3.6008]
    '1984-01-02 09:00'    [ 2.5720]
    '1984-01-02 12:00'    [ 7.2016]
    '1984-01-02 15:00'    [10.2880]
    '1984-01-02 18:00'    [11.8312]
    '1984-01-02 21:00'    [12.3456]
    '1984-01-03 00:00'    [10.2880]

which don't contain summaries when fewer than 3 entries are available.

8 commentaires
Afficher 6 commentaires plus anciensMasquer 6 commentaires plus anciens

mashtine le 13 Mai 2014

Where can I send your trophy my friend? I am really appreciative of the help this community gives! God knows I would still be in the ice age learning. Many thanks Cedric

Cedric le 13 Mai 2014

Modifié(e) : Cedric le 13 Mai 2014

Ouvrir dans MATLAB Online

My pleasure, I'm glad it helped!

You can favor simpler approaches when you don't know how to proceed. Basically, you know very well how to proceed (easily) by hand: you would take and discard the first entries until you find the beginning of a group, check that the group is full, take the max, associate it with the time stamp, and then repeat the process. This can quite easily be implemented with a loop and a few IF statements .. it may not be nice to see, not be too efficient either (even though, loops can be quite efficient nowadays), but you'd make it. The code would look like what follows in the great lines:

 nRows = size( data, 1 ) ;
 times = cell( nRows, 1 ) ;  % Overshoot prealloc => trunc in the end.
 maxes = zeros( nRows, 1 ) ; % Overshoot prealloc => trunc in the end.
 inId  = 0 ;
 outId = 0 ;
 while inId <= nRows
    inId = inId + 1 ;
    if "not a start hour"
       continue ;
    end
    if "next 2 entries not contiguous"
       continue ;
    end
    outId = outId + 1 ;
    % Compute max of current and next 2 entries, store in maxes(outId)
    ...
    % Store time stamp in times{outId}.
    ...
 end
 times = times(1:outId) ;    % Truncate to relevant size.
 maxes = maxes(1:outId) ;    % Truncate to relevant size.

Connectez-vous pour commenter.

Answer 2

dpb le 11 Mai 2014

1
Lien

Utiliser le lien direct vers cette réponse

https://fr.mathworks.com/matlabcentral/answers/129188-filtering-max-values-within-every-3-hours-of-data#answer_136413

Ouvrir dans MATLAB Online

An alternate approach...

>> dat=circshift(dat(:,1:2),-1);   % place the first of groups of three on first row
>> ddat=cell2mat(dat(:,2));        % convert numeric to array for convenience(+)
>> [~,imx]=max(reshape(ddat(1:fix(length(dat)/3)*3),3,[]));  % find max for each group
>> res=dat(imx+[0:3:(length(imx)-1)*3],1:2)                  % select those from full array
res = 
  '1984-01-01 01:00'    [11.8300]
  '1984-01-01 04:00'    [ 9.7700]
>>

I didn't read Cedric's answer carefully enough to unequivocally answer your question re: missing values; I'd suggest trying it for a sample dataset and see how it works.

For correct operation of the above you will need to fill in missing data "holes" if they do exist similarly as to how Cedric built a date vector from your first to last to ensure the "divide by 3" works to build the right number of columns.

0 commentaires
Afficher -2 commentaires plus anciensMasquer -2 commentaires plus anciens

Connectez-vous pour commenter.

Answer 3

Andrei Bobrov le 11 Mai 2014

0
Lien

Utiliser le lien direct vers cette réponse

https://fr.mathworks.com/matlabcentral/answers/129188-filtering-max-values-within-every-3-hours-of-data#answer_136424

Modifié(e) : Andrei Bobrov le 11 Mai 2014

Ouvrir dans MATLAB Online

load upload.mat
a = datatoupload;
[~,~,~,hh] = datevec(a(:,1));
hh(hh==0) = 24;
h2 = (rem(hh-1,3)+1)==1;
h2(1) = 1;
ii = cumsum(h2);
[~,b] = unique(ii,'last');
out = [a(b,1) accumarray(ii,cat(1,a{:,2}),[],@(x){max(x)})];

add variant

t = datenum(1984,1,1,3*(0:ceil(diff(datenum(a([1,end],1)))*24/3))',0,0);
t0 =  datenum(a(:,1));
t1=t+eps(1e6);
[~,ii] = histc(t0,t1);
out = [cellstr(datestr(t,'yyyy-mm-dd HH:MM')),...
                            accumarray(ii+1,cat(1,a{:,2}),[],@(x){max(x)})];

0 commentaires
Afficher -2 commentaires plus anciensMasquer -2 commentaires plus anciens

Connectez-vous pour commenter.

Filtering max values within every 3 hours of data

1 commentaire
Afficher -1 commentaires plus anciensMasquer -1 commentaires plus anciens

Réponse acceptée

8 commentaires
Afficher 6 commentaires plus anciensMasquer 6 commentaires plus anciens

Plus de réponses (2)

0 commentaires
Afficher -2 commentaires plus anciensMasquer -2 commentaires plus anciens

0 commentaires
Afficher -2 commentaires plus anciensMasquer -2 commentaires plus anciens

Voir également

Catégories

Tags

Community Treasure Hunt

Filtering max values within every 3 hours of data

1 commentaire Afficher -1 commentaires plus anciensMasquer -1 commentaires plus anciens

Réponse acceptée

8 commentaires Afficher 6 commentaires plus anciensMasquer 6 commentaires plus anciens

Plus de réponses (2)

0 commentaires Afficher -2 commentaires plus anciensMasquer -2 commentaires plus anciens

0 commentaires Afficher -2 commentaires plus anciensMasquer -2 commentaires plus anciens

Voir également

Catégories

Tags

Community Treasure Hunt

1 commentaire
Afficher -1 commentaires plus anciensMasquer -1 commentaires plus anciens

8 commentaires
Afficher 6 commentaires plus anciensMasquer 6 commentaires plus anciens

0 commentaires
Afficher -2 commentaires plus anciensMasquer -2 commentaires plus anciens

0 commentaires
Afficher -2 commentaires plus anciensMasquer -2 commentaires plus anciens