Multiple selection of data (dates,mon​ths,days,h​ours)

Hi! I have a csv file and I want to extract a range of data. The form of the data is the following(see comments). As I am a newbie in Matlab, I don't know how to use correctly the "for"loop. My aim is to select data by the year,month,day,hour and create a new table. The selected data must have the following form;
Example : [2004-01-01 00:00:00.0/2004-01-01 01:00:00.0
2005-01-01 00:00:00.0/2005-01-01 01:00:00.0
2006-01-01 00:00:00.0/2006-01-01 01:00:00.0 .....]
I tried to split the dates using the T=split(t,{'-','T','/'}); and I also selected only the start dates (only the part before the " / "sign in the example ["2004-01-01 00:00:00.0"]) as the end dates aren't necessary, but once again my problem is that I have a difficultly to extract the data.
Please Help !!!!

1 commentaire

Sofia Fourkioti
Sofia Fourkioti le 4 Mai 2020
Modifié(e) : Sofia Fourkioti le 4 Mai 2020
I forgot to mention that my data is from 2004-02-01 to 2020-01-31, including everyday hourly values . I want to group the values as specific year,specific month, specific day for the hour range of 0-23.
Thanks!

Connectez-vous pour commenter.

 Réponse acceptée

Adam Danz
Adam Danz le 4 Mai 2020
Modifié(e) : Adam Danz le 4 Mai 2020
Follow these steps to prepare in your data more efficiently and in a user-friendly format.
Use readtable to read in the data.
filePath = 'C:\Users\name\Documents\MATLAB'; % Path to your csv file
fileName = 'κοζάνη.csv';
file = fullfile(filePath, fileName);
opts = detectImportOptions(file);
T = readtable(file,opts);
Let's look at T using head() which shows us the first few rows
>> head(T)
ans =
8×11 table
x_ObservationPeriod TOA ClearSkyGHI ClearSkyBHI ClearSkyDHI ClearSkyBNI GHI BHI DHI BNI Reliability
_______________________________________________ ______ ___________ ___________ ___________ ___________ ______ ______ ______ ______ ___________
{'2004-02-01T00:00:00.0/2004-02-01T01:00:00.0'} 0 0 0 0 0 0 0 0 0 1
{'2004-02-01T01:00:00.0/2004-02-01T02:00:00.0'} 0 0 0 0 0 0 0 0 0 1
{'2004-02-01T02:00:00.0/2004-02-01T03:00:00.0'} 0 0 0 0 0 0 0 0 0 1
{'2004-02-01T03:00:00.0/2004-02-01T04:00:00.0'} 0 0 0 0 0 0 0 0 0 1
{'2004-02-01T04:00:00.0/2004-02-01T05:00:00.0'} 0 0 0 0 0 0 0 0 0 1
{'2004-02-01T05:00:00.0/2004-02-01T06:00:00.0'} 5.4901 1.8642 0.5659 1.2983 16.531 1.789 0.4964 1.2926 14.457 1
{'2004-02-01T06:00:00.0/2004-02-01T07:00:00.0'} 175.57 104.59 71.474 33.111 516.95 97.043 61.139 35.905 408.39 1
{'2004-02-01T07:00:00.0/2004-02-01T08:00:00.0'} 397.79 285.44 228.68 56.764 799.67 268.38 203.61 64.77 710.04 1
Notice that the x_ObservationPeriod show what appears to be a START / STOP period of observation but they are in character format. Yuk!
Split the x_ObservationPeriod into start / stop times and convert to datetime format.
This will take a few seconds because you've got 140256 rows of data.
% Split the start/stop times in character format
startStop = cellfun(@(c)strsplit(c, '/'), T.x_ObservationPeriod,'UniformOutput',false);
% Convert to table (2 columns)
startStopTbl = array2table(vertcat(startStop{:}));
% Convert values from char to datetime
startStopTblDt = varfun(@(v)datetime(v, 'InputFormat', 'yyyy-MM-dd''T''HH:mm:ss.S'), startStopTbl);
% Name the columns
startStopTblDt.Properties.VariableNames = {'ObservationStart', 'ObservationStop'};
Let's look at startStopTblDt
>> head(startStopTblDt)
ans =
8×2 table
ObservationStart ObservationStop
____________________ ____________________
01-Feb-2004 00:00:00 01-Feb-2004 01:00:00
01-Feb-2004 01:00:00 01-Feb-2004 02:00:00
01-Feb-2004 02:00:00 01-Feb-2004 03:00:00
01-Feb-2004 03:00:00 01-Feb-2004 04:00:00
01-Feb-2004 04:00:00 01-Feb-2004 05:00:00
01-Feb-2004 05:00:00 01-Feb-2004 06:00:00
01-Feb-2004 06:00:00 01-Feb-2004 07:00:00
01-Feb-2004 07:00:00 01-Feb-2004 08:00:00
Much better format. You can update the output format using this example,
startStopTblDt.ObservationStart.Format = 'MMM dd, yyyy HH:mm:ss';
Combine the two tables
T = [startStopTblDt, T];
View the results (first 10 rows, 6 columns)
>> T(1:10,1:6)
ans =
10×6 table
ObservationStart ObservationStop x_ObservationPeriod TOA ClearSkyGHI ClearSkyBHI
____________________ ____________________ _______________________________________________ ______ ___________ ___________
01-Feb-2004 00:00:00 01-Feb-2004 01:00:00 {'2004-02-01T00:00:00.0/2004-02-01T01:00:00.0'} 0 0 0
01-Feb-2004 01:00:00 01-Feb-2004 02:00:00 {'2004-02-01T01:00:00.0/2004-02-01T02:00:00.0'} 0 0 0
01-Feb-2004 02:00:00 01-Feb-2004 03:00:00 {'2004-02-01T02:00:00.0/2004-02-01T03:00:00.0'} 0 0 0
01-Feb-2004 03:00:00 01-Feb-2004 04:00:00 {'2004-02-01T03:00:00.0/2004-02-01T04:00:00.0'} 0 0 0
01-Feb-2004 04:00:00 01-Feb-2004 05:00:00 {'2004-02-01T04:00:00.0/2004-02-01T05:00:00.0'} 0 0 0
01-Feb-2004 05:00:00 01-Feb-2004 06:00:00 {'2004-02-01T05:00:00.0/2004-02-01T06:00:00.0'} 5.4901 1.8642 0.5659
01-Feb-2004 06:00:00 01-Feb-2004 07:00:00 {'2004-02-01T06:00:00.0/2004-02-01T07:00:00.0'} 175.57 104.59 71.474
01-Feb-2004 07:00:00 01-Feb-2004 08:00:00 {'2004-02-01T07:00:00.0/2004-02-01T08:00:00.0'} 397.79 285.44 228.68
01-Feb-2004 08:00:00 01-Feb-2004 09:00:00 {'2004-02-01T08:00:00.0/2004-02-01T09:00:00.0'} 574.58 440.45 372.5
01-Feb-2004 09:00:00 01-Feb-2004 10:00:00 {'2004-02-01T09:00:00.0/2004-02-01T10:00:00.0'} 693.92 546.61 472.8
If you want to remove the x_ObservationPeriod column,
T.x_ObservationPeriod = [];
Add row numbers to the table that correspond to row number in csv file
rowNums = (0:size(T,1)-1) + opts.DataLines(1);
T.Properties.RowNames = compose('%d',rowNums);
View results
>> T(1:10,1:6)
ans =
10×6 table
ObservationStart ObservationStop x_ObservationPeriod TOA ClearSkyGHI ClearSkyBHI
____________________ ____________________ _______________________________________________ ______ ___________ ___________
44 01-Feb-2004 00:00:00 01-Feb-2004 01:00:00 {'2004-02-01T00:00:00.0/2004-02-01T01:00:00.0'} 0 0 0
45 01-Feb-2004 01:00:00 01-Feb-2004 02:00:00 {'2004-02-01T01:00:00.0/2004-02-01T02:00:00.0'} 0 0 0
46 01-Feb-2004 02:00:00 01-Feb-2004 03:00:00 {'2004-02-01T02:00:00.0/2004-02-01T03:00:00.0'} 0 0 0
47 01-Feb-2004 03:00:00 01-Feb-2004 04:00:00 {'2004-02-01T03:00:00.0/2004-02-01T04:00:00.0'} 0 0 0
48 01-Feb-2004 04:00:00 01-Feb-2004 05:00:00 {'2004-02-01T04:00:00.0/2004-02-01T05:00:00.0'} 0 0 0
49 01-Feb-2004 05:00:00 01-Feb-2004 06:00:00 {'2004-02-01T05:00:00.0/2004-02-01T06:00:00.0'} 5.4901 1.8642 0.5659
50 01-Feb-2004 06:00:00 01-Feb-2004 07:00:00 {'2004-02-01T06:00:00.0/2004-02-01T07:00:00.0'} 175.57 104.59 71.474
51 01-Feb-2004 07:00:00 01-Feb-2004 08:00:00 {'2004-02-01T07:00:00.0/2004-02-01T08:00:00.0'} 397.79 285.44 228.68
52 01-Feb-2004 08:00:00 01-Feb-2004 09:00:00 {'2004-02-01T08:00:00.0/2004-02-01T09:00:00.0'} 574.58 440.45 372.5
53 01-Feb-2004 09:00:00 01-Feb-2004 10:00:00 {'2004-02-01T09:00:00.0/2004-02-01T10:00:00.0'} 693.92 546.61 472.8
Now you can see that the first row of data comes from line 44 of the csv file.

12 commentaires

Thank you for your help!!
Glad I could help. Please concider accepting one of the answers or let us know how we could help you further with this question.
You helped me a lot. Is there a way to categorize the data by year and by hour as in the following example? I would like to know if the data can be presented like this;
Year Month Day Hour TOA..... Year Month Day Hour TOA.....
2004 01 01 0 ... 2004 02 01 1 ...
2005 01 01 0 .... 2005 02 01 1 .....
. .
. .
. .
2020 01 01 0 ...... 2020 02 01 1 ......
And I want to create same tables for the rest days, months and hours of these years. This would be very helpful!
You can add those columns after converting the character arrays to datetime values (shown in my answer). See my answer to learn how to remove a column from the table. Use movevars() to rearrange the order of the columns within the table.
T.Year = year(T.ObservationStart);
T.Month = month(T.ObservationStart);
T.Day = day(T.ObservationStart);
T.Hour = hour(T.ObservationStart);
However, it's almost always a better idea to index the datetime values directly rather than puling the year/month/day/hour values apart.
Yes, you are right,my bad.
The reason I do all that is that I want to calculate the TOA of the average year, which is the average value of every hour of every day of every month of the timerange 2004-2020. To do that, I have to collect all the values of all the years for the 1st hour of 1st January (for example). That's why I initially posted the following form;
Example : [2004-01-01 00:00:00.0/2004-01-01 01:00:00.0
2005-01-01 00:00:00.0/2005-01-01 01:00:00.0
2006-01-01 00:00:00.0/2006-01-01 01:00:00.0 .....]
As you can see what changes is the year and the rest is stable. The struggle is that I have to do this selection many times and more particularly for 365 days * 24 hours for timerange 2004-2020.
Add the year column
T.Year = year(T.ObservationStart);
Then compute the mean TOA per year,
annualMeans = groupsummary(T,'Year', 'mean', 'TOA');
Result:
annualMeans =
17×3 table
Year GroupCount mean_TOA
____ __________ ________
2004 8040 341.05
2005 8760 327.44
2006 8760 327.44
2007 8760 327.44
2008 8784 326.98
2009 8760 327.44
2010 8760 327.44
2011 8760 327.44
2012 8784 326.97
2013 8760 327.44
2014 8760 327.44
2015 8760 327.44
2016 8784 326.98
2017 8760 327.44
2018 8760 327.44
2019 8760 327.45
2020 744 175.05
Sofia Fourkioti
Sofia Fourkioti le 5 Mai 2020
Modifié(e) : Sofia Fourkioti le 5 Mai 2020
Yes, this is a part of the final solution, but the exact thing I am trying to do is that I want to gather , for example, all Januaries (from 2004-2020), all Februaries (from 2004-2019) etc and then find the average values of each group.
Sorry I wasn't clear earlier, but it's a bit difficult for me to explain it.
I appreciate a lot your help!
Adam Danz
Adam Danz le 5 Mai 2020
Modifié(e) : Adam Danz le 5 Mai 2020
It sounds like retime()is what you're looking for. This will compute means over years, months, days, whatever. But first you need to convert your table to a timetable which is easy.
% Create a timetable
% Choose which row-time variable to use ObservationStart or ObservationStop
% and inclue any numerical variables you wish to average.
TT = table2timetable([T(:,'ObservationStart'), T(:,{'TOA','ClearSkyGHI'})]);
If your table T had row names, remove them from the timetable,
TT.Row = [];
Compute monthly, yearly, etc, means,
mu = retime(TT, 'monthly', 'mean');
mu = retime(TT, 'yearly', 'mean');
Results for monthly means (first few rows)
head(mu)
ans =
8×2 timetable
ObservationStart TOA ClearSkyGHI
________________ ______ ___________
01-Feb-2004 237.34 173.27
01-Mar-2004 322.04 235.79
01-Apr-2004 402.99 294.1
01-May-2004 459.16 341.42
01-Jun-2004 481.67 355.41
01-Jul-2004 467.79 346.46
01-Aug-2004 419.7 309.82
01-Sep-2004 346.55 250.19
Sofia Fourkioti
Sofia Fourkioti le 5 Mai 2020
Modifié(e) : Sofia Fourkioti le 6 Mai 2020
Thank you very much for your help!!I think I found the solution to my problem!!
If anything else comes up I will inform you!!
I also would like to ask you about groupsummary. I saw something relative on another problem, which looks like mine, but I don't know how to use it. Do you think that it could be a better solution?
Adam Danz
Adam Danz le 6 Mai 2020
Modifié(e) : Adam Danz le 6 Mai 2020
"Do you think that it could be a better solution"
No, because I don't think groupsummary is better. Retime is more flexible and it's designed to work with timetables. With groupsummary, you would need to create a grouping variable for every type of average (yearly, monthly, etc). Retime does that for you.
You can follow examples for groupsummary in the documentation.
Example
>> T.month = month(T.ObservationStart);
>> groupsummary(T(:,{'TOA','month'}), 'month', 'mean')
% Result (first 3 rows)
% ans =
% 12×3 table
% month GroupCount mean_TOA
% _____ __________ ________
% 1 11904 175.52
% 2 10848 237.48
% 3 11904 321.15
But this requires you to add an additional column ('month') to an already large table and the data in that column is redundant.
Thank you for your help!!

Connectez-vous pour commenter.

Plus de réponses (1)

dpb
dpb le 4 Mai 2020
Attach text that folks can do something with instead of pictures...or just attach the .csv file itself.
Use split to break the two times apart going into datetime
>> datetime(split(TSTR,'/').','InputFormat','uuuu-MM-dd''T''HH:mm:ss.S')
ans =
1×2 datetime array
01-Jan-2004 00:00:00 01-Jan-2004 01:00:00
>>
where I just copied your first line above into TSTR, split it into the two sections and returned that as the 2-column cell array to pass to datetime.
Once you have datetimes, put those into a table or timetable and use the selection on those datetime values.

1 commentaire

Sofia Fourkioti
Sofia Fourkioti le 4 Mai 2020
Modifié(e) : Sofia Fourkioti le 4 Mai 2020
Thank you!I have attached my .csv in the comments.
I have already split my data, but this is not my problem. Is there a way to locate this data and find their exact position in the .csv?

Connectez-vous pour commenter.

Catégories

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by