Removing a Date (Leap Year) from being Averaged
    6 vues (au cours des 30 derniers jours)
  
       Afficher commentaires plus anciens
    
    CMatlabWold
 le 13 Avr 2020
  
    
    
    
    
    Commenté : CMatlabWold
 le 3 Mai 2020
            Hi,. 
I have a code. The data takes place over 10 years. And I need the average for each day of the year: 1-365. I think everything is fine with my code.
My biggest problem are the leap years of 2012 and 2016. I don't even want to know their average. I just want them removed.
t= readtable('SSArcGIS.xls'); % read as table
t.S1 = single(strcmp(t.ComplaintLocation,'S1'));
t.S2 = single(strcmp(t.ComplaintLocation,'S2'));
t.S3 = single(strcmp(t.ComplaintLocation,'S3'));
t.S4 = single(strcmp(t.ComplaintLocation,'S4'));
t.S5 = single(strcmp(t.ComplaintLocation,'S5'));
t.S6 = single(strcmp(t.ComplaintLocation,'S6'));
t.S7 = single(strcmp(t.ComplaintLocation,'S7'));
t.S8 = single(strcmp(t.ComplaintLocation,'S8'));
t.S9 = single(strcmp(t.ComplaintLocation,'S9'));
t.S10 = single(strcmp(t.ComplaintLocation,'S10'));
t.S11 = single(strcmp(t.ComplaintLocation,'S11'));
t.S12 = single(strcmp(t.ComplaintLocation,'S12'));
t.S13 = single(strcmp(t.ComplaintLocation,'S13'));
t.S14 = single(strcmp(t.ComplaintLocation,'S14'));
% convert to date time
t.Date = datetime(t.Date, 'Format', 'MM/dd/yyyy');
% get unique dates
T1=datetime('01/01/2010');
T2=datetime('12/31/2019');
T=T1:T2;
unique_dates = T;
for ii =1: length(unique_dates)
    tmp = t(ismember(t.Date, unique_dates(ii)),:);
    % get unique dates
    data(ii).Date = unique_dates(ii);
    data(ii).S1 = sum(tmp.S1);
    data(ii).S2 = sum(tmp.S2);
    data(ii).S3 = sum(tmp.S3);
    data(ii).S4 = sum(tmp.S4);
    data(ii).S5 = sum(tmp.S5);
    data(ii).S6 = sum(tmp.S6);
    data(ii).S7 = sum(tmp.S7);
    data(ii).S8 = sum(tmp.S8);
    data(ii).S9 = sum(tmp.S9);
    data(ii).S10 = sum(tmp.S10);
    data(ii).S11 = sum(tmp.S11);
    data(ii).S12 = sum(tmp.S12);
    data(ii).S13 = sum(tmp.S13);
    data(ii).S14 = sum(tmp.S14);
end
% convert structure to table
data = struct2table(data);
temp = data;
temp.day = day(temp.Date,'dayofyear');
temp.Date = [];
% Now use varfun with @mean as your function and Day as your GroupingVariable
dailyAverage = varfun(@sum,temp,'GroupingVariables','day');
temp = data;
temp.Year = year(temp.Date);
temp.Date = [];
Any help will be appreciated.
Thanks
0 commentaires
Réponse acceptée
  Peter Perkins
    
 le 14 Avr 2020
        Not 100% sure what you are trying to end up with, but my guess is the average number of each type of complaint for each day of the year: 365 of them. You are on the right track using 'dayofyear', but there are easier ways to get to that point. First ,read the data into a timetable.
>> tt1 = readtimetable('SSArcGIS.xls');
>> tt1.ComplaintLocation = categorical(tt1.ComplaintLocation);
>> head(tt1)
ans =
  8×1 timetable
       Date        ComplaintLocation
    ___________    _________________
    02-Jan-2010           S13       
    02-Jan-2010           S11       
    02-Jan-2010           S5        
    04-Jan-2010           S5        
    05-Jan-2010           S13       
    06-Jan-2010           S12       
    06-Jan-2010           S5        
    06-Jan-2010           S3        
You have one row per complaint, across all types in one variable. You want one variable for each type. unstack is good at doing that. In this case, you want counts, so you'll need to add a dummy variable to "count up".
>> tt1.Dummy = ones(height(tt1),1);
>> tt2 = unstack(tt1,'Dummy','ComplaintLocation');
>> head(tt2)
ans =
  8×14 timetable
       Date        S1     S10    S11    S12    S13    S14    S2     S3     S4     S5     S6     S7     S8     S9 
    ___________    ___    ___    ___    ___    ___    ___    ___    ___    ___    ___    ___    ___    ___    ___
    02-Jan-2010    NaN    NaN      1    NaN      1    NaN    NaN    NaN    NaN      1    NaN    NaN    NaN    NaN
    04-Jan-2010    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN      1    NaN    NaN    NaN    NaN
    05-Jan-2010    NaN    NaN    NaN    NaN      1    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN
    06-Jan-2010    NaN    NaN    NaN      1    NaN    NaN    NaN      1    NaN      1      1    NaN    NaN    NaN
    07-Jan-2010      1    NaN      1      1    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN
    08-Jan-2010    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN      1    NaN
    09-Jan-2010    NaN    NaN    NaN      1    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN    NaN
    13-Jan-2010    NaN    NaN    NaN    NaN    NaN      1    NaN    NaN    NaN    NaN    NaN    NaN      1    NaN
Unstack uses @sum by default to "aggregate", in other words, each date might have three S1 complaints, so unstack would compute sum([1;1;1]), and fill the corresponding element of the output with 3. For empty cases, sum returns NaN. It would be easy to use the AggregationFunction parameter to fill those with 0, but it's even easier to replace the NaNs afterwards.
>> tt2 = fillmissing(tt2,'Constant',0);
>> head(tt2)
ans =
  8×14 timetable
       Date        S1    S10    S11    S12    S13    S14    S2    S3    S4    S5    S6    S7    S8    S9
    ___________    __    ___    ___    ___    ___    ___    __    __    __    __    __    __    __    __
    02-Jan-2010    0      0      1      0      1      0     0     0     0     1     0     0     0     0 
    04-Jan-2010    0      0      0      0      0      0     0     0     0     1     0     0     0     0 
    05-Jan-2010    0      0      0      0      1      0     0     0     0     0     0     0     0     0 
    06-Jan-2010    0      0      0      1      0      0     0     1     0     1     1     0     0     0 
    07-Jan-2010    1      0      1      1      0      0     0     0     0     0     0     0     0     0 
    08-Jan-2010    0      0      0      0      0      0     0     0     0     0     0     0     1     0 
    09-Jan-2010    0      0      0      1      0      0     0     0     0     0     0     0     0     0 
    13-Jan-2010    0      0      0      0      0      1     0     0     0     0     0     0     1     0 
The data skips quite a few days:
>> caldiff(tt1.Date([1 end]),'days')
ans = 
  calendarDuration
   3651d
>> height(tt2)
ans =
        3122
One wonders if maybe the complaint center is closed on weekends?
>> unique(day(tt2.Date,'dow'))
ans =
     1
     2
     3
     4
     5
     6
     7
Nope, apparently not, so all those missing days must have had no complaints. To get the true mean per day, you'll need to add them in.
>> t = datetime(2010,1,1):caldays(1):datetime(2019,12,31);
>> tt3 = retime(tt2,t,'FillWithConstant','Constant',0);
>> head(tt3)
ans =
  8×14 timetable
       Date        S1    S10    S11    S12    S13    S14    S2    S3    S4    S5    S6    S7    S8    S9
    ___________    __    ___    ___    ___    ___    ___    __    __    __    __    __    __    __    __
    01-Jan-2010    0      0      0      0      0      0     0     0     0     0     0     0     0     0 
    02-Jan-2010    0      0      1      0      1      0     0     0     0     1     0     0     0     0 
    03-Jan-2010    0      0      0      0      0      0     0     0     0     0     0     0     0     0 
    04-Jan-2010    0      0      0      0      0      0     0     0     0     1     0     0     0     0 
    05-Jan-2010    0      0      0      0      1      0     0     0     0     0     0     0     0     0 
    06-Jan-2010    0      0      0      1      0      0     0     1     0     1     1     0     0     0 
    07-Jan-2010    1      0      1      1      0      0     0     0     0     0     0     0     0     0 
    08-Jan-2010    0      0      0      0      0      0     0     0     0     0     0     0     1     0 
>> height(tt3)
ans =
        3652
Now we're in business: add day of year as a grouping variable, and compute grouped means. Once you have day of year, the actual dates are irrelevant, so convert to a table.
>> tt3.DoY = day(tt3.Date,'dayofyear');
>> tt4 = timetable2table(tt3,'ConvertRowTimes',false);
>> tt4 = varfun(@mean,tt4,'GroupingVariable','DoY');
>> head(tt4)
ans =
  8×16 table
    DoY    GroupCount    mean_S1    mean_S10    mean_S11    mean_S12    mean_S13    mean_S14    mean_S2    mean_S3    mean_S4    mean_S5    mean_S6    mean_S7    mean_S8    mean_S9
    ___    __________    _______    ________    ________    ________    ________    ________    _______    _______    _______    _______    _______    _______    _______    _______
     1         10          0.1        0.1           0         0.4         0.1           0         0.1        0.1        0.1        0.1          0        0.1        0.1          0  
     2         10            0        0.2         0.4         0.5         0.6         0.2           0        0.4        0.2        0.5        0.1        0.1        0.1          0  
     3         10          0.6        0.7         0.1         0.7         0.4         0.2         0.3        0.4        0.2          1        0.1          0          0          0  
     4         10          0.1        0.2         0.2         0.6         0.4         0.2         0.2        0.3        0.2        0.9          0        0.4        0.1        0.1  
     5         10          0.1        0.2         0.1         0.3         0.3         0.3           0        0.3        0.2        0.3        0.4        0.1        0.4        0.1  
     6         10            0        0.1         0.4         0.9         0.5         0.2         0.2        0.6        0.1        0.6        0.6        0.1        0.1        0.3  
     7         10          0.1        0.1         0.4         0.8         0.3         0.3         0.1          0          0          0          0          0        0.2          0  
     8         10          0.1        0.2         0.2         0.6         0.3         0.1         0.3        0.3        0.1        0.6        0.2        0.1        0.4          0  
You could also do that with groupsummary. Leap days? Just remove them.
>> tt3(tt3.DoY==366,:)
ans =
  2×15 timetable
       Date        S1    S10    S11    S12    S13    S14    S2    S3    S4    S5    S6    S7    S8    S9    DoY
    ___________    __    ___    ___    ___    ___    ___    __    __    __    __    __    __    __    __    ___
    31-Dec-2012    0      0      0      0      0      0     0     0     0     4     0     0     1     0     366
    31-Dec-2016    0      0      0      0      0      0     0     0     0     0     0     0     0     0     366
>> tt5 = tt4(tt4.DoY~=366,:);
There were a bunch of steps there. Several of them could be combined, but simplicity.
6 commentaires
  Adam Danz
    
      
 le 29 Avr 2020
				You can read in the data using readtable() and then convert the table to a timetable using table2timetable(). 
Plus de réponses (0)
Voir également
Catégories
				En savoir plus sur Calendar dans Help Center et File Exchange
			
	Produits
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!


