Identifying years with missing data and retaining years with maximum value
Hello,
I have a matrix of meteorological data in the following format: the 1st column is the year (1993 to 2014), the 2nd is the month, the 3rd is the day, the 4th is the hour, and the last column is the value. The 4th column runs from 0 to 23, making a full day. A winter period is defined as October of one year through March of the next. Since the data are hourly, each complete season should have 4368 rows (= 24 hr * 182 days) for non-leap years and 4392 rows (= 24 hr * 183 days) for leap years. I need to check whether the winter period of successive years has more than 25% of its data available (1092 rows for non-leap years, 1098 for leap years). If less than 25% is available, I have to check in which half of the year more data are missing (in the present case, it is the year 1993), exclude that year's rows completely from the output file, and retain the next year's rows. Likewise, I have to check all successive years from 1993 to 2014.
1993 10 1 0 2.44
1993 10 1 1 2.04
1993 10 1 2 1.79
1993 10 1 3 1.72
1993 10 1 4 1.395
1993 10 1 5 1.154
1993 10 1 6 0.913
1993 10 1 7 0.672
1993 10 1 8 0.431
1993 10 1 9 0.19
1993 10 1 10 2.44
1993 10 1 11 2.04
1993 10 1 12 Nan
1993 10 1 13 Nan
1993 10 1 14 Nan
1993 10 1 15 Nan
1993 10 1 16 Nan
1993 10 1 17 Nan
1993 10 1 18 Nan
1993 10 1 19 Nan
1993 10 1 20 Nan
1993 10 1 21 Nan
1993 10 1 22 Nan
1993 10 1 23 Nan
...................................
...................................
1994 3 31 23 3.82
1994 3 31 23 3.9
1994 3 31 23 3.66
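The expected row count per season can be checked by counting hours between the season boundaries. This is a minimal sketch in Python rather than MATLAB; the function names `expected_hours` and `min_rows_required` are hypothetical, and it assumes a season runs from 1 October through 31 March with no daylight-saving shifts in the timestamps.

```python
from datetime import datetime

def expected_hours(start_year):
    """Hourly rows in a complete winter: 1 Oct start_year through 31 Mar start_year+1."""
    start = datetime(start_year, 10, 1)
    end = datetime(start_year + 1, 4, 1)  # exclusive upper bound
    return int((end - start).total_seconds() // 3600)

def min_rows_required(start_year, fraction=0.25):
    """Rows needed to reach the given fraction of a complete season."""
    return fraction * expected_hours(start_year)

print(expected_hours(1993))     # 4368 (182 days)
print(expected_hours(1995))     # 4392 (183 days, February 1996 has 29 days)
print(min_rows_required(1993))  # 1092.0
```

Computing the count from the calendar avoids hard-coding separate leap and non-leap totals.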
4 comments
dpb
on 13 Aug 2017
"...have to check in which half of year more number of data are missing ... and exclude that year's rows completely from the output file while retaining the next year's row."
Not sure exactly which half the "half of year" refers to. Would you not simply exclude the winter season for which the 25% threshold isn't met? The question, it would seem, is whether you're naming an Oct-Mar season by the year of its October or by the year of the following March, not whether there were more or less data in one half or the other. Letting the amount of missing data determine which year a season belongs to could lead to inconsistent years being assigned.
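One way to read the suggestion above: label every Oct-Mar season by a fixed convention and drop whole seasons that fall below the threshold. This is a sketch in Python rather than MATLAB, under stated assumptions: seasons are labeled by their October year, "available" means a non-NaN value, repeated timestamps count once, and exactly 25% is treated as sufficient. All function names are hypothetical.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta

def season_start_year(year, month):
    """October year of the winter a row belongs to, or None outside Oct-Mar."""
    if month >= 10:
        return year
    if month <= 3:
        return year - 1
    return None

def expected_hours(start_year):
    """Hourly rows in a complete season: 1 Oct start_year through 31 Mar start_year+1."""
    span = datetime(start_year + 1, 4, 1) - datetime(start_year, 10, 1)
    return int(span.total_seconds() // 3600)

def seasons_to_keep(rows, min_fraction=0.25):
    """Seasons whose distinct non-NaN hours reach min_fraction of a full season."""
    valid = defaultdict(set)
    for year, month, day, hour, value in rows:
        s = season_start_year(year, month)
        if s is not None and not math.isnan(value):
            valid[s].add((year, month, day, hour))  # a set ignores duplicate timestamps
    return sorted(s for s, hours in valid.items()
                  if len(hours) >= min_fraction * expected_hours(s))

def hourly_rows(start, n_hours, value):
    """Hypothetical helper: n_hours consecutive hourly rows beginning at start."""
    rows = []
    for i in range(n_hours):
        t = start + timedelta(hours=i)
        rows.append((t.year, t.month, t.day, t.hour, value))
    return rows

rows = (hourly_rows(datetime(1993, 10, 1), 12, 2.0)              # 12 valid hours
        + hourly_rows(datetime(1993, 10, 1, 12), 12, float("nan"))
        + hourly_rows(datetime(1994, 10, 1), 2000, 1.5))         # above 25% of 4368
print(seasons_to_keep(rows))  # [1994]
```

Switching to a March-year label only changes the return value of `season_start_year`; the threshold test is unaffected.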
Poulomi Ganguli
on 13 Aug 2017
Well, actually I've been off 'spearmint-ing with the timeseries object, which while it isn't really all that new I've not ever actually used. It seemed as though it should be suited for such manipulations...
Of course, it's not terribly difficult to simply use datetime or datenum and do manipulations directly on the times, but I thought I'd see if the timeseries object actually has anything to add here. A start along the way is in the Answer, albeit not complete as yet, but I think it should lead to a solution if you pursue that path.
However, you really didn't answer the question regarding the 50% rule you gave; you did answer the "which year does the data belong?" question so I'll just presume the time during which data are missing within the season actually is immaterial -- seems to me it has to be, anyway. You can adjust however you see fit if there is some other reason/pattern that is significant.
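For the direct approach on timestamps mentioned above, a rough sketch is to build the complete hourly axis for a season and compare it with the hours actually present. It is written in Python rather than MATLAB, the names `season_hours` and `missing_by_half` are hypothetical, and it assumes rows are `(year, month, day, hour, value)` tuples with NaN marking a missing value.

```python
import math
from datetime import datetime, timedelta

def season_hours(start_year):
    """Yield every hourly timestamp from 1 Oct start_year up to 1 Apr start_year+1."""
    t = datetime(start_year, 10, 1)
    end = datetime(start_year + 1, 4, 1)
    while t < end:
        yield t
        t += timedelta(hours=1)

def missing_by_half(rows, start_year):
    """Missing hours in Oct-Dec of start_year and in Jan-Mar of start_year+1."""
    present = {datetime(y, m, d, h) for y, m, d, h, v in rows if not math.isnan(v)}
    missing = {start_year: 0, start_year + 1: 0}
    for t in season_hours(start_year):
        if t not in present:
            missing[t.year] += 1
    return missing

# First day of the sample: hours 0-11 have values, hours 12-23 are NaN.
rows = ([(1993, 10, 1, h, 1.0) for h in range(12)]
        + [(1993, 10, 1, h, float("nan")) for h in range(12, 24)])
print(missing_by_half(rows, 1993))  # {1993: 2196, 1994: 2160}
```

The per-half counts are only needed if the position of the gaps within the season matters; otherwise their sum is enough.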
dpb
on 13 Aug 2017
...
1994 3 31 23 3.82
1994 3 31 23 3.9
1994 3 31 23 3.66
Whassup w/ that? Are there really such duplicates in the file or is that just an error in the post that those are supposed to be hours 21, 22, 23?
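Whether the file really contains repeated timestamps can be checked before any counting is done. This is a minimal sketch in Python rather than MATLAB; the function name `duplicate_timestamps` is hypothetical and rows are assumed to be `(year, month, day, hour, value)` tuples.

```python
from collections import Counter

def duplicate_timestamps(rows):
    """Map each (year, month, day, hour) that occurs more than once to its count."""
    counts = Counter((y, m, d, h) for y, m, d, h, _ in rows)
    return {key: n for key, n in counts.items() if n > 1}

# The three trailing rows as posted in the question.
rows = [(1994, 3, 31, 23, 3.82), (1994, 3, 31, 23, 3.9), (1994, 3, 31, 23, 3.66)]
print(duplicate_timestamps(rows))  # {(1994, 3, 31, 23): 3}
```

An empty result means every hour appears at most once, so row counts can be compared directly against the expected season length.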
Accepted Answer