How do I obtain 8760 mean hourly data points from 10.6 years data?

4 vues (au cours des 30 derniers jours)
Chish
Chish le 2 Oct 2023
Commenté : Chish le 5 Oct 2023
I have data from 01-Jan-2013 to 23- Aug-2023. I have attached the XWinddata.xlsx file. The data comprise hourly measured wind speed, direction, temperature and pressure. I want to find hourly average values for the period so that I have 8760 hourly data corresponding to hours in a year. I have cleaned the data to remove NANs. I failed to obtain the required values. I have followed two similar questions that I came accross on this site but was getting errors. Please may you assist.
Following the codes given I created my code
XWinddata.DoY = day(XWinddata.Timestamp,'dayofyear'); % I follwed the answer in the above link, it worked
XWinddata.HoD = hour(XWinddata.Timestamp); % This also worked
head(XWinddata) % Also worked and the DoY and HoD columns were added to my timetable
t = varfun(@mean,XWinddata,'GroupingVariables',{'DoY' 'HoD'},'OutputFormat','table'); % I got an error and got stuck
Error obtained while running the code
Error in tabular/varfun>@(s,varargin)dfltErrHandler(grouped,funName,s,varargin{:}) (line 201)
errHandler = @(s,varargin) dfltErrHandler(grouped,funName,s,varargin{:});
Error in tabular/varfun (line 280)
outVals{igrp} = errHandler(s,inArg);
I also obtained errors.
Thank you in anticipation.

Réponse acceptée

dpb
dpb le 2 Oct 2023
Modifié(e) : dpb le 2 Oct 2023
tt=readtimetable('XWinddata.xlsx');
head(tt,1)
Timestamp Velocityms DirectionDegrees TemperatureDegrees PressurePa yearhour DoY ____________________ __________ ________________ __________________ __________ ____________ ___ 11-Jan-2013 04:00:00 5.1 180 16.9 0.0093189 {'01/11 04'} 11
tt=removevars(tt,{'yearhour','DoY'}); % clean so start over
tt=addvars(tt,day(tt.Timestamp,'dayofyear'),hour(tt.Timestamp),'NewVariableNames',{'DoY','HoD'},'After',{'PressurePa'});
ttHr=varfun(@mean,tt,'GroupingVariables',{'DoY','HoD'},'OutputFormat','table');
height(ttHr)
ans = 4486
[head(ttHr);tail(ttHr)]
ans = 16×7 table
DoY HoD GroupCount mean_Velocityms mean_DirectionDegrees mean_TemperatureDegrees mean_PressurePa ___ ___ __________ _______________ _____________________ _______________________ _______________ 1 0 20 4.705 135.5 21.66 0.009343 1 2 20 5.255 128.5 21.25 0.0093378 1 4 20 5.68 125.5 20.96 0.0093348 1 6 20 5.705 117.5 20.64 0.0093323 1 8 20 5.64 116.5 20.265 0.0093322 1 10 20 5.67 130 20.1 0.009336 1 12 20 5.595 126.5 20.15 0.0093402 1 14 20 5.5 125 20.255 0.0093432 366 8 4 4.325 127.5 21.4 0.00933 366 10 4 3.725 110 21.45 0.0093339 366 12 4 3.3 107.5 20.875 0.0093391 366 14 4 3.125 147.5 21.05 0.0093391 366 16 4 2.725 92.5 21.525 0.0093403 366 18 4 3.5 115 21.7 0.0093416 366 20 4 3.475 100 22 0.0093374 366 22 4 4.075 80 22.125 0.0093302
But, you are missing most of the odd hours...
[numel(unique(tt.DoY)) numel(unique(tt.HoD)) numel(unique([tt.DoY tt.HoD],'rows'))/2 366*24]
ans = 1×4
366 24 4486 8784
As the above confirms, there are only 4,486 unique DoY, HoD combinations, not the 8784 that exist in a 366-day year. That's because you have data for most days only on two-hour increments.
This also leads back to the problem that the poster in one of the earlier links had in that there are leap years and they will show up with years that do have 366 days in them and the DOY algorithm will always count them in its conversion, even if you were to remove Feb 29 data from the timetable before computing DoY.
You didn't show us the top level user code error, only the internal error trap so we can't tell for absolute certain what went wrong; I'd guess the problem was you didn't remove the 'yearhour' variable (or use 'InputVariables' named argument pair) to not try to average it as well -- being nonnumeric, you can't take the mean of it.
NOTA BENE: However, it appears as you have only a very limited number of observations for the odd hours; whether to try to interpolate to fill those first or not I'll leave to you to decide...
Says there almost two entries for each time stamp, however, and interpolation only works for unique x values so will have to average first and then fill in..
tt=removevars(tt,{'DoY','HoD'}); % clean to start over
tt=rmmissing(retime(tt,'hourly','mean')); % average remove missing hours
tt=retime(tt,'hourly','linear'); % now fill in odd hours
tt=addvars(tt,day(tt.Timestamp,'dayofyear'),hour(tt.Timestamp),'NewVariableNames',{'DoY','HoD'},'After',{'PressurePa'});
[numel(unique(tt.DoY)) numel(unique(tt.HoD))]
ans = 1×2
366 24
isleapyr=@(t)(eomday(year(t),2)==29); % leap year logical function
isleapday=@(t)((month(t)==2)&(day(t)==29)); % leap day logical function
tt(isleapday(tt.Properties.RowTimes),:)=[]; % now remove Feb29 globally
[numel(unique(tt.DoY)) numel(unique(tt.HoD))] % still 1:366 in DoY
ans = 1×2
366 24
ix=isleapyr(tt.Properties.RowTimes)&(month(tt.Properties.RowTimes)>2); % dates needing fixup
tt.DoY(ix)=tt.DoY(ix)-1; % and adjust to 365 days
[numel(unique(tt.DoY)) numel(unique(tt.HoD))] % now we're back to 365 day year w/ DoY to match
ans = 1×2
365 24
ttHr=varfun(@mean,tt,'GroupingVariables',{'DoY','HoD'},'OutputFormat','table');
height(ttHr)
ans = 8760
[head(ttHr);tail(ttHr)]
ans = 16×7 table
DoY HoD GroupCount mean_Velocityms mean_DirectionDegrees mean_TemperatureDegrees mean_PressurePa ___ ___ __________ _______________ _____________________ _______________________ _______________ 1 0 10 4.705 135.5 21.66 0.009343 1 1 10 4.98 132 21.455 0.0093404 1 2 10 5.255 128.5 21.25 0.0093378 1 3 10 5.4675 127 21.105 0.0093363 1 4 10 5.68 125.5 20.96 0.0093348 1 5 10 5.6925 121.5 20.8 0.0093336 1 6 10 5.705 117.5 20.64 0.0093323 1 7 10 5.6725 117 20.452 0.0093323 365 16 10 3.615 114 20.23 0.0093648 365 17 10 3.7075 119 20.355 0.0093652 365 18 10 3.8 124 20.48 0.0093655 365 19 10 3.8225 124.75 20.63 0.0093645 365 20 10 3.845 125.5 20.78 0.0093636 365 21 10 4 126.5 20.935 0.0093613 365 22 10 4.155 127.5 21.09 0.0093589 365 23 10 4.43 131.5 21.375 0.009351
The above does it by just throwing away Feb 29, you could alternatively perhaps collapse Feb 28-29 to Feb 28 or Feb 29/Mar 1 to Mar 1 to somehow keep the values observed in the mix, but the above is the simplest approach, certainly.
  8 commentaires
dpb
dpb le 4 Oct 2023
Moral: Have to be VERY careful with Excel and dates...<for many reasons>. See @Walter Roberson's comment to the Q? there for another nasty trait.
Chish
Chish le 5 Oct 2023
The site is very useful. Thanks once again

Connectez-vous pour commenter.

Plus de réponses (0)

Catégories

En savoir plus sur MATLAB dans Help Center et File Exchange

Produits


Version

R2023a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by