Preprocessing using readtable()
14 vues (au cours des 30 derniers jours)
Afficher commentaires plus anciens
Neko Benítez
le 18 Nov 2019
Commenté : Walter Roberson
le 18 Nov 2019
I have a CSV file that I read it as a table with 'readtable'. The original CSV file contains timestamps for a wide range of days. In most of the cases, I do not need that massive amount of timestamps available, because my analyze is centred in a shorter period. Let me put an example to explain better my problem:
The CSV file contains the timestamps in column 1. It starts at timestamp=1573377305, and it starts increasing the timestamps without a determined size, the next timestamp can be at 8 seconds, the third at 10 seconds and so on. What I know are the timestamps of my analysis, the beginning and the end, but I don't know the number of rows that correspond with that interval.
For instance: my timestamps
1573377305
1573377312
1573377326
1573377334
1573377349
1573377355
1573377365
1573377373
1573377386
1573377393
1573377404
1573377416
1573377427
1573377433
1573377445
1573377455
1573377465
1573377475
1573377485
Imagine that my timestamp for analysis is from 1573377326 to 1573377433. I don't want to read all the previous and end information with readtable(). In this case, I could do DataLines = [4 15], but it is an illustrative example. Imagine that you have much more than 20 timestamps, and what you know is the timestamp from beginning to end.
If I upload all the data with 'readtable()', it is inefficient as I am loading information that I am not going to use. How can I do to select the precise interval I am going to use before using readtable()? Or how can I do this process more efficient?
2 commentaires
Turlough Hughes
le 18 Nov 2019
Can you determine which lines of the file you want to read before reading it?
Réponse acceptée
Guillaume
le 18 Nov 2019
Assuming you don't know which rows of the text file correspond to your start and end timestamps, what you want cannot be achieved just ith readtable. There's no way to tell readtable to stop once a specific value has been encountered (even using detectImportOptions). The only way to tell readtable to stop at a specific line is with the DataLines property and you have to know in advance at which line to stop.
You could determine that line by parsing the file yourself line by line with textscan for example but then there'd be no point in using readtable after that.
Unless the file is significantly larger than your range of interest, it's simpler to just read the whole file and then use isbetween to keep the required range.
Otherwise, as said, you'll have to use textscan to parse the file line by line until you encounter your end timestamp. It's significantly more work than readtable, you lose automatic format detection, nice table formating, etc. and because you'll be reading the file line by line may actually be slower than reading the whole file in one go. It would go like this:
fmt = '%D%...'; %replace by actual format of your file. Possibly, you could use detectImportOptions for help here
data = {}
fid = fopen(yourfile, 'rt', 'n', yourfileencoding); %Possibly omit the encoding. readtable detect the file encoding for you.
while ~feof(fid)
line = textscan(fid, fmt, 1); %read one line
%assuming the timestamp is in first column
if line{1} > endtimestamp
break; %end reading
end
if line{1} >= starttimestamp
data = [data; line];
end
end
fclose(fid);
3 commentaires
Guillaume
le 18 Nov 2019
If your timestamps are read as datetime, you'd use something like:
selecteddata = data(isbetween(data.timestamp, starttimestamp, endtimestamp), :);
If it's just plain numbers:
selecteddata = data(data.timestamp >= starttimestamp & data.timestamp <= endtimestamp, :);
where data.timestamp refers to your timestamp column whatever its name is.
Note that if the timestamps are indeed plain numbers you may want to convert them to datetime (for easier manipulation). To tell you how to do that, we'd need to know what these numbers actually mean (maybe seconds since a specific date?)
Walter Roberson
le 18 Nov 2019
>> datetime(1573377305,'convertfrom','posixtime')
ans =
datetime
10-Nov-2019 09:15:05
Unfortunately datetime detection for readtable() and available options with detectImportOptions do not handle posixtime, so if you want datetime objects you would need to convert them after reading.
Plus de réponses (1)
Walter Roberson
le 18 Nov 2019
Use detectImportOptions() on the csv file, and store the result to a variable. Now modify the DataLines property of the import options to be the numbers of the rows you want to read in. Now readtable() passing in those import options.
Voir également
Catégories
En savoir plus sur Whos dans Help Center et File Exchange
Produits
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!