Multiple selection of data (dates,months,days,hours)

Question

0 votes

Hi! I have a csv file and I want to extract a range of data. The form of the data is the following(see comments). As I am a newbie in Matlab, I don't know how to use correctly the "for"loop. My aim is to select data by the year,month,day,hour and create a new table. The selected data must have the following form;

Example : [2004-01-01 00:00:00.0/2004-01-01 01:00:00.0

2005-01-01 00:00:00.0/2005-01-01 01:00:00.0

2006-01-01 00:00:00.0/2006-01-01 01:00:00.0 .....]

I tried to split the dates using the T=split(t,{'-','T','/'}); and I also selected only the start dates (only the part before the " / "sign in the example ["2004-01-01 00:00:00.0"]) as the end dates aren't necessary, but once again my problem is that I have a difficultly to extract the data.

Please Help !!!!

1 commentaire
Afficher -1 commentaires plus anciens Masquer -1 commentaires plus anciens

Sofia Fourkioti le 4 Mai 2020

Modifié(e) : Sofia Fourkioti le 4 Mai 2020

I forgot to mention that my data is from 2004-02-01 to 2020-01-31, including everyday hourly values . I want to group the values as specific year,specific month, specific day for the hour range of 0-23.

Thanks!

Connectez-vous pour commenter.

Connectez-vous pour répondre à cette question.

Follow Question

Answer 1

Adam Danz le 4 Mai 2020

Modifié(e) : Adam Danz le 4 Mai 2020

Ouvrir dans MATLAB Online

0 votes

Follow these steps to prepare in your data more efficiently and in a user-friendly format.

Use readtable to read in the data.

filePath = 'C:\Users\name\Documents\MATLAB';  % Path to your csv file
fileName = 'κοζάνη.csv'; 
file = fullfile(filePath, fileName);
opts = detectImportOptions(file);
T = readtable(file,opts);

Let's look at T using head() which shows us the first few rows

>> head(T)
ans =
  8×11 table
                  x_ObservationPeriod                   TOA      ClearSkyGHI    ClearSkyBHI    ClearSkyDHI    ClearSkyBNI     GHI       BHI       DHI       BNI      Reliability
    _______________________________________________    ______    ___________    ___________    ___________    ___________    ______    ______    ______    ______    ___________
    {'2004-02-01T00:00:00.0/2004-02-01T01:00:00.0'}         0           0              0              0              0            0         0         0         0         1     
    {'2004-02-01T01:00:00.0/2004-02-01T02:00:00.0'}         0           0              0              0              0            0         0         0         0         1     
    {'2004-02-01T02:00:00.0/2004-02-01T03:00:00.0'}         0           0              0              0              0            0         0         0         0         1     
    {'2004-02-01T03:00:00.0/2004-02-01T04:00:00.0'}         0           0              0              0              0            0         0         0         0         1     
    {'2004-02-01T04:00:00.0/2004-02-01T05:00:00.0'}         0           0              0              0              0            0         0         0         0         1     
    {'2004-02-01T05:00:00.0/2004-02-01T06:00:00.0'}    5.4901      1.8642         0.5659         1.2983         16.531        1.789    0.4964    1.2926    14.457         1     
    {'2004-02-01T06:00:00.0/2004-02-01T07:00:00.0'}    175.57      104.59         71.474         33.111         516.95       97.043    61.139    35.905    408.39         1     
    {'2004-02-01T07:00:00.0/2004-02-01T08:00:00.0'}    397.79      285.44         228.68         56.764         799.67       268.38    203.61     64.77    710.04         1 

Notice that the x_ObservationPeriod show what appears to be a START / STOP period of observation but they are in character format. Yuk!

Split the x_ObservationPeriod into start / stop times and convert to datetime format.

This will take a few seconds because you've got 140256 rows of data.

% Split the start/stop times in character format
startStop = cellfun(@(c)strsplit(c, '/'), T.x_ObservationPeriod,'UniformOutput',false);
% Convert to table (2 columns)
startStopTbl = array2table(vertcat(startStop{:})); 
% Convert values from char to datetime
startStopTblDt = varfun(@(v)datetime(v, 'InputFormat', 'yyyy-MM-dd''T''HH:mm:ss.S'), startStopTbl); 
% Name the columns
startStopTblDt.Properties.VariableNames = {'ObservationStart', 'ObservationStop'}; 

Let's look at startStopTblDt

>> head(startStopTblDt)
ans =
  8×2 table
      ObservationStart        ObservationStop   
    ____________________    ____________________
    01-Feb-2004 00:00:00    01-Feb-2004 01:00:00
    01-Feb-2004 01:00:00    01-Feb-2004 02:00:00
    01-Feb-2004 02:00:00    01-Feb-2004 03:00:00
    01-Feb-2004 03:00:00    01-Feb-2004 04:00:00
    01-Feb-2004 04:00:00    01-Feb-2004 05:00:00
    01-Feb-2004 05:00:00    01-Feb-2004 06:00:00
    01-Feb-2004 06:00:00    01-Feb-2004 07:00:00
    01-Feb-2004 07:00:00    01-Feb-2004 08:00:00

Much better format. You can update the output format using this example,

startStopTblDt.ObservationStart.Format = 'MMM dd, yyyy HH:mm:ss';

Combine the two tables

T = [startStopTblDt, T];

View the results (first 10 rows, 6 columns)

>> T(1:10,1:6)
ans =
  10×6 table
      ObservationStart        ObservationStop                     x_ObservationPeriod                   TOA      ClearSkyGHI    ClearSkyBHI
    ____________________    ____________________    _______________________________________________    ______    ___________    ___________
    01-Feb-2004 00:00:00    01-Feb-2004 01:00:00    {'2004-02-01T00:00:00.0/2004-02-01T01:00:00.0'}         0           0              0   
    01-Feb-2004 01:00:00    01-Feb-2004 02:00:00    {'2004-02-01T01:00:00.0/2004-02-01T02:00:00.0'}         0           0              0   
    01-Feb-2004 02:00:00    01-Feb-2004 03:00:00    {'2004-02-01T02:00:00.0/2004-02-01T03:00:00.0'}         0           0              0   
    01-Feb-2004 03:00:00    01-Feb-2004 04:00:00    {'2004-02-01T03:00:00.0/2004-02-01T04:00:00.0'}         0           0              0   
    01-Feb-2004 04:00:00    01-Feb-2004 05:00:00    {'2004-02-01T04:00:00.0/2004-02-01T05:00:00.0'}         0           0              0   
    01-Feb-2004 05:00:00    01-Feb-2004 06:00:00    {'2004-02-01T05:00:00.0/2004-02-01T06:00:00.0'}    5.4901      1.8642         0.5659   
    01-Feb-2004 06:00:00    01-Feb-2004 07:00:00    {'2004-02-01T06:00:00.0/2004-02-01T07:00:00.0'}    175.57      104.59         71.474   
    01-Feb-2004 07:00:00    01-Feb-2004 08:00:00    {'2004-02-01T07:00:00.0/2004-02-01T08:00:00.0'}    397.79      285.44         228.68   
    01-Feb-2004 08:00:00    01-Feb-2004 09:00:00    {'2004-02-01T08:00:00.0/2004-02-01T09:00:00.0'}    574.58      440.45          372.5   
    01-Feb-2004 09:00:00    01-Feb-2004 10:00:00    {'2004-02-01T09:00:00.0/2004-02-01T10:00:00.0'}    693.92      546.61          472.8   

If you want to remove the x_ObservationPeriod column,

T.x_ObservationPeriod = [];

Add row numbers to the table that correspond to row number in csv file

rowNums = (0:size(T,1)-1) + opts.DataLines(1);
T.Properties.RowNames = compose('%d',rowNums);

View results

>> T(1:10,1:6)
ans =
  10×6 table
            ObservationStart        ObservationStop                     x_ObservationPeriod                   TOA      ClearSkyGHI    ClearSkyBHI
          ____________________    ____________________    _______________________________________________    ______    ___________    ___________
    44    01-Feb-2004 00:00:00    01-Feb-2004 01:00:00    {'2004-02-01T00:00:00.0/2004-02-01T01:00:00.0'}         0           0              0   
    45    01-Feb-2004 01:00:00    01-Feb-2004 02:00:00    {'2004-02-01T01:00:00.0/2004-02-01T02:00:00.0'}         0           0              0   
    46    01-Feb-2004 02:00:00    01-Feb-2004 03:00:00    {'2004-02-01T02:00:00.0/2004-02-01T03:00:00.0'}         0           0              0   
    47    01-Feb-2004 03:00:00    01-Feb-2004 04:00:00    {'2004-02-01T03:00:00.0/2004-02-01T04:00:00.0'}         0           0              0   
    48    01-Feb-2004 04:00:00    01-Feb-2004 05:00:00    {'2004-02-01T04:00:00.0/2004-02-01T05:00:00.0'}         0           0              0   
    49    01-Feb-2004 05:00:00    01-Feb-2004 06:00:00    {'2004-02-01T05:00:00.0/2004-02-01T06:00:00.0'}    5.4901      1.8642         0.5659   
    50    01-Feb-2004 06:00:00    01-Feb-2004 07:00:00    {'2004-02-01T06:00:00.0/2004-02-01T07:00:00.0'}    175.57      104.59         71.474   
    51    01-Feb-2004 07:00:00    01-Feb-2004 08:00:00    {'2004-02-01T07:00:00.0/2004-02-01T08:00:00.0'}    397.79      285.44         228.68   
    52    01-Feb-2004 08:00:00    01-Feb-2004 09:00:00    {'2004-02-01T08:00:00.0/2004-02-01T09:00:00.0'}    574.58      440.45          372.5   
    53    01-Feb-2004 09:00:00    01-Feb-2004 10:00:00    {'2004-02-01T09:00:00.0/2004-02-01T10:00:00.0'}    693.92      546.61          472.8   

Now you can see that the first row of data comes from line 44 of the csv file.

12 commentaires
Afficher 10 commentaires plus anciens Masquer 10 commentaires plus anciens

Adam Danz le 5 Mai 2020

Modifié(e) : Adam Danz le 5 Mai 2020

Ouvrir dans MATLAB Online

It sounds like retime()is what you're looking for. This will compute means over years, months, days, whatever. But first you need to convert your table to a timetable which is easy.

% Create a timetable 
% Choose which row-time variable to use ObservationStart or ObservationStop
% and inclue any numerical variables you wish to average. 
TT = table2timetable([T(:,'ObservationStart'), T(:,{'TOA','ClearSkyGHI'})]);

If your table T had row names, remove them from the timetable,

TT.Row = [];

Compute monthly, yearly, etc, means,

mu = retime(TT, 'monthly', 'mean');
mu = retime(TT, 'yearly', 'mean');

See all options here: https://www.mathworks.com/help/matlab/ref/timetable.retime.html#bvdhiwr-newTimeStep

Results for monthly means (first few rows)

head(mu)
ans =
  8×2 timetable
    ObservationStart     TOA      ClearSkyGHI
    ________________    ______    ___________
    01-Feb-2004         237.34      173.27   
    01-Mar-2004         322.04      235.79   
    01-Apr-2004         402.99       294.1   
    01-May-2004         459.16      341.42   
    01-Jun-2004         481.67      355.41   
    01-Jul-2004         467.79      346.46   
    01-Aug-2004          419.7      309.82   
    01-Sep-2004         346.55      250.19   

Adam Danz le 6 Mai 2020

Modifié(e) : Adam Danz le 6 Mai 2020

Ouvrir dans MATLAB Online

"Do you think that it could be a better solution"

No, because I don't think groupsummary is better. Retime is more flexible and it's designed to work with timetables. With groupsummary, you would need to create a grouping variable for every type of average (yearly, monthly, etc). Retime does that for you.

You can follow examples for groupsummary in the documentation.

https://www.mathworks.com/help/matlab/ref/double.groupsummary.html

Example

>> T.month = month(T.ObservationStart);
>> groupsummary(T(:,{'TOA','month'}), 'month', 'mean')
% Result (first 3 rows)
% ans =
%   12×3 table
%     month    GroupCount    mean_TOA
%     _____    __________    ________
%       1        11904        175.52 
%       2        10848        237.48 
%       3        11904        321.15 

But this requires you to add an additional column ('month') to an already large table and the data in that column is redundant.

Sofia Fourkioti le 11 Mai 2020

Thank you for your help!!

Connectez-vous pour commenter.

Answer 2

dpb le 4 Mai 2020

Ouvrir dans MATLAB Online

0 votes

Attach text that folks can do something with instead of pictures...or just attach the .csv file itself.

Use split to break the two times apart going into datetime

>> datetime(split(TSTR,'/').','InputFormat','uuuu-MM-dd''T''HH:mm:ss.S')
ans = 
  1×2 datetime array
   01-Jan-2004 00:00:00   01-Jan-2004 01:00:00
>> 

where I just copied your first line above into TSTR, split it into the two sections and returned that as the 2-column cell array to pass to datetime.

Once you have datetimes, put those into a table or timetable and use the selection on those datetime values.

1 commentaire
Afficher -1 commentaires plus anciens Masquer -1 commentaires plus anciens

Sofia Fourkioti le 4 Mai 2020

Modifié(e) : Sofia Fourkioti le 4 Mai 2020

Thank you!I have attached my .csv in the comments.

I have already split my data, but this is not my problem. Is there a way to locate this data and find their exact position in the .csv?

Connectez-vous pour commenter.

Multiple selection of data (dates,months,days,hours)

1 commentaire
Afficher -1 commentaires plus anciens Masquer -1 commentaires plus anciens

Réponse acceptée

12 commentaires
Afficher 10 commentaires plus anciens Masquer 10 commentaires plus anciens

Plus de réponses (1)

1 commentaire
Afficher -1 commentaires plus anciens Masquer -1 commentaires plus anciens

Catégories

Tags

Community Treasure Hunt

Multiple selection of data (dates,mon​ths,days,h​ours)

1 commentaire Afficher -1 commentaires plus anciens Masquer -1 commentaires plus anciens

Réponse acceptée

12 commentaires Afficher 10 commentaires plus anciens Masquer 10 commentaires plus anciens

Plus de réponses (1)

1 commentaire Afficher -1 commentaires plus anciens Masquer -1 commentaires plus anciens

Catégories

Tags

Voir également

Community Treasure Hunt

Multiple selection of data (dates,months,days,hours)

1 commentaire
Afficher -1 commentaires plus anciens Masquer -1 commentaires plus anciens

12 commentaires
Afficher 10 commentaires plus anciens Masquer 10 commentaires plus anciens

1 commentaire
Afficher -1 commentaires plus anciens Masquer -1 commentaires plus anciens