Improve speed of max calculation for daily maximum output
mashtine
6 Aug 2015
Commented: Peter Perkins
14 Sep 2015
Hi there,
I have tried quite a few variations of this. This code is the simplest but not the most time-efficient, and it seems that the max calculation is the slowest part.
[nrow,ncol] = size(test_data);
test_output = cell(nrow,ncol);
for i = 1:nrow
    for j = 1:ncol
        inpdata = double(test_data{i,j});
        % Build array of time components
        dv = datevec(inpdata(:,1));
        % Find all timestamps where the hour (column 4 of dv) is 0 or 6
        time_markers = find(dv(:,4)==6 | dv(:,4)==0);
        % Preallocate output array (one row per marker; unused rows are removed below)
        daily_max = zeros(numel(time_markers),4);
        % Step through (06:00, next 00:00) marker pairs, including the last complete pair
        for n = 1:2:numel(time_markers)-1
            rows = time_markers(n):time_markers(n+1);   % 06:00 through the following 00:00
            daily_max(n,1) = inpdata(time_markers(n+1),1);
            daily_max(n,2) = mean(inpdata(rows,2));
            daily_max(n,3) = mean(inpdata(rows,3));
            daily_max(n,4) = max(inpdata(rows,4));
        end
        % Remove the unused (all-zero) rows from the output
        daily_max(daily_max(:,1)==0,:) = [];
        % Store the result for this data set
        test_output{i,j} = daily_max;
    end
end
Any ideas for improving its performance? I know there are some unnecessary lines that I still need to tidy up, but what slows it down most is the calls to max and, in particular, mean. In case it helps explain what I want, the first 10 rows of the dv variable look like this:
1980 1 1 6 0 0
1980 1 1 12 0 0
1980 1 1 18 0 0
1980 1 2 0 0 0
1980 1 2 6 0 0
1980 1 2 12 0 0
1980 1 2 18 0 0
1980 1 3 0 0 0
1980 1 3 6 0 0
1980 1 3 12 0 0
I want everything from a 06:00 timestamp through the following 00:00 timestamp to count as one day. The data have already been formatted so that there are no missing time steps.
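A minimal vectorized sketch of the calculation described above, which swaps the datevec/find step and the per-day loop for reshape. It assumes each data set is a numeric matrix like inpdata (datenum in column 1, the three data columns in 2 to 4), that the first row is 06:00, and, as stated, that no time steps are missing, so every day is exactly four consecutive rows. The synthetic matrix M and the names t, nDays and daily are hypothetical.

```matlab
% Hypothetical synthetic data: 3 days of 6-hourly samples starting
% 01-Jan-1980 06:00, columns [datenum X Y Z].
t0 = datenum(1980,1,1,6,0,0);
t  = t0 + (0:11)'/4;                 % 12 timestamps, 6 hours apart
M  = [t, (1:12)', (12:-1:1)', mod((1:12)',5)];

% Keep whole days only (assumes row 1 is 06:00 and there are no gaps).
nDays = floor(size(M,1)/4);
M = M(1:4*nDays,:);

% After reshaping, each column holds one 06:00..00:00 day.
X = reshape(M(:,2), 4, nDays);
Y = reshape(M(:,3), 4, nDays);
Z = reshape(M(:,4), 4, nDays);

% Same layout as daily_max: [time of the 00:00 row, mean X, mean Y, max Z]
daily = [M(4:4:end,1), mean(X,1)', mean(Y,1)', max(Z,[],1)'];
```

Because mean and max operate down whole columns at once, there is one call per statistic instead of one per day.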
Peter Perkins
6 Aug 2015
Mashtine, you're using a triple-nested loop. That almost certainly is not the way to go.
You should attach a short example of your input data, what you want as the result, and an explanation of the calculations to create that result.
Accepted Answer
Peter Perkins
6 Aug 2015
There are lots of ways to do this. Here's one that assumes you have R2014b or later. If you only have R2013b or later, you can still use a table, but you'd have to use datenum and datestr rather than datetime, which was added in R2014b. (The repelem call used below was added in R2015a; on R2014b, ceil((1:n)'/4) builds the same grouping vector.)
First load your numeric matrix and create a table, and then convert the datenum to a datetime:
>> load test_data2.mat
>> test_data = array2table(test_data,'VariableNames',{'Time' 'X' 'Y' 'Z'});
>> test_data.Time = datetime(test_data.Time,'ConvertFrom','datenum')
test_data =
Time X Y Z
____________________ ______ _______ ______
01-Jan-1980 06:00:00 3.2872 0.34067 5.4056
01-Jan-1980 12:00:00 1.268 0.20843 2.9019
01-Jan-1980 18:00:00 2.8944 0.22515 4.5896
02-Jan-1980 00:00:00 7.9143 0.57301 10.884
02-Jan-1980 06:00:00 14.369 1.0058 18.924
02-Jan-1980 12:00:00 17.886 1.2894 23.48
[snip]
Next, create a variable that defines how you want to group the rows of that table. On the minus side, you want to group midnight tomorrow with 6 am, 12 pm, and 6 pm today, so you can't simply use the day number. On the plus side, your data are completely regular, so you can just take each consecutive group of four rows:
>> n = height(test_data);
>> test_data.Day = repelem(1:(n/4),4)'
test_data =
Time X Y Z Day
____________________ ______ _______ ______ ___
01-Jan-1980 06:00:00 3.2872 0.34067 5.4056 1
01-Jan-1980 12:00:00 1.268 0.20843 2.9019 1
01-Jan-1980 18:00:00 2.8944 0.22515 4.5896 1
02-Jan-1980 00:00:00 7.9143 0.57301 10.884 1
02-Jan-1980 06:00:00 14.369 1.0058 18.924 2
02-Jan-1980 12:00:00 17.886 1.2894 23.48 2
[snip]
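Two other ways to build a grouping variable like the Day column, sketched on hypothetical synthetic datenum timestamps (t, dayIdx, dayNum and dayIdx2 are illustrative names). The first reproduces the consecutive-blocks-of-four idea with ceil, so it does not need repelem. The second is an alternative of my own rather than part of the answer: shifting every timestamp back five hours moves each 00:00 reading onto the previous calendar day while leaving 06:00, 12:00 and 18:00 on their own day, so the day number becomes usable and the rows need not be perfectly regular. Any offset strictly between 0 and 6 hours works; 5 leaves a margin against rounding in the stored datenums.

```matlab
% Hypothetical 6-hourly timestamps (datenum), starting 01-Jan-1980 06:00.
t = datenum(1980,1,1,6,0,0) + (0:11)'/4;
n = numel(t);

% Option 1: consecutive blocks of four rows, same as repelem(1:(n/4),4)'.
dayIdx = ceil((1:n)'/4);

% Option 2: shift back 5 hours, then take the calendar day, so 00:00
% joins the previous day's 06:00, 12:00 and 18:00 readings.
% (With a datetime column, dateshift(Time - hours(5),'start','day')
% is the analogous expression.)
dayNum = floor(t - 5/24);

% Convert the day numbers to consecutive group indices 1, 2, 3, ...
[~,~,dayIdx2] = unique(dayNum);
```

Either vector (or dayNum itself) could then be used as the 'GroupingVariable' column.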
Finally, do the grouped calculation on the table, and pretty up the result:
>> dailyStats = @(x,y,z) deal(mean(x),mean(y),max(z));
>> dailies = rowfun(dailyStats,test_data, ...
'GroupingVariable','Day', 'InputVariables',{'X' 'Y' 'Z'}, ...
'OutputVariableNames',{'meanX' 'meanY' 'maxZ'});
>> dailies.Properties.RowNames = {}; % don't need these
>> dailies.Day = dateshift(test_data.Time(1:4:n),'start','day');
>> dailies.Day.Format = 'dd-MMM-yyyy'
dailies =
Day GroupCount meanX meanY maxZ
___________ __________ ______ _______ ______
01-Jan-1980 4 3.841 0.33681 10.884
02-Jan-1980 4 16.812 1.1289 24.982
03-Jan-1980 4 9.4298 0.62444 17.041
04-Jan-1980 4 14.185 0.97222 24.212
05-Jan-1980 4 12.899 0.99925 21.861
06-Jan-1980 4 6.2882 0.53728 10.743
[snip]
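For comparison, a sketch of the same grouped calculation on a plain numeric matrix, using accumarray in place of a table and rowfun. The matrix data and the group index g are hypothetical; the columns are assumed to be [datenum X Y Z], with 6-hourly rows starting at 06:00 and no gaps.

```matlab
% Hypothetical numeric data: 2 days, columns [datenum X Y Z].
t = datenum(1980,1,1,6,0,0) + (0:7)'/4;
data = [t, (1:8)', (1:8)'/10, [5 1 2 10 4 9 3 8]'];

% Group index: each 06:00..00:00 block of four rows is one day.
g = ceil((1:size(data,1))'/4);

% accumarray applies the given function to the values of each group.
meanX = accumarray(g, data(:,2), [], @mean);
meanY = accumarray(g, data(:,3), [], @mean);
maxZ  = accumarray(g, data(:,4), [], @max);

% Calendar day of each group's 06:00 row, like dateshift(...,'start','day').
dayStart = floor(data(1:4:end,1));
dailies  = [dayStart, meanX, meanY, maxZ];
```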
Hope this helps.
Peter Perkins
14 Sep 2015
I think you mean, "I have to do the same grouped calculation of 121*97 sets of data." If that's the case, it seems like you have two options:
- Loop over the data sets and do the calculation 121*97 times, or
- Somehow combine the separate data sets into one
I can't say how to do the latter, since I don't really know anything about your data.
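A sketch of the first option: loop over the data sets with cellfun and apply a vectorized per-day calculation to each. cellfun is itself just a loop, so any speedup would come from replacing the per-day loop with reshape, not from cellfun. dailyFromMatrix and the 2-by-3 test_data cell array are hypothetical stand-ins for the real 121-by-97 data sets, and each matrix is assumed to start at 06:00 and contain whole days only (a row count that is a multiple of 4).

```matlab
% Hypothetical per-data-set function: [00:00 time, mean X, mean Y, max Z]
% for a matrix with columns [datenum X Y Z], four rows per day.
dailyFromMatrix = @(M) [M(4:4:end,1), ...
    mean(reshape(M(:,2),4,[]),1)', ...
    mean(reshape(M(:,3),4,[]),1)', ...
    max(reshape(M(:,4),4,[]),[],1)'];

% Hypothetical 2-by-3 cell array of data sets, two days each.
t = datenum(1980,1,1,6,0,0) + (0:7)'/4;
test_data = cell(2,3);
for k = 1:numel(test_data)
    test_data{k} = [t, k*ones(8,1), (1:8)', (8:-1:1)'];
end

% Apply the calculation to every data set; each result is a matrix,
% so 'UniformOutput' must be false.
test_output = cellfun(dailyFromMatrix, test_data, 'UniformOutput', false);
```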