Using textscan to skip specific rows

Question

0 votes

Right now I'm interested in using textscan to read specific rows of a .csv file which looks something like this:

 type    value    latency
 92      trigger  19593
 type    value    latency
 20      trigger  20000
 type    value    latency
 14      trigger  21000
 type    value    latency
 14      trigger  21000
 type    value    latency
 14      trigger  21000
 type    value    latency
 14      trigger  21000
 type    value    latency
 14      trigger  21000
 type    value    latency
 14      trigger  21000
 type    value    latency
 14      trigger  21000

I want to have textscan read the rows which have the values "92" and "19593", but to ignore the rows that are blank or the rows that have the words "type" "value" "latency". In addition, the documents I am scanning are of varying lengths, and I want textscan to automatically figure out which rows have the values of interest "92, 20, 14" and "19593" (latency values). How would I have textscan automatically scan the numerical values and ignore all other rows? In addition, I am not interested in scanning the column "value" at all. I only need to scan the two columns "type" and "latency".

Thank you in advance.

0 commentaires
Afficher -2 commentaires plus anciens Masquer -2 commentaires plus anciens

Connectez-vous pour commenter.

Connectez-vous pour répondre à cette question.

Follow Question

Answer 1

Stephen23 le 8 Juin 2017

Modifié(e) : Stephen23 le 8 Juin 2017

Ouvrir dans MATLAB Online

0 votes

events file.csv

Second attempt: this time we will first convert the awful file format into a standard CSV format with one header line, no empty rows, no quoted numbers, and no repeated headers. I used regexprep and simply defined appropriate regular expressions:

fnm = 'events file.csv';
str = fileread(fnm);
rgx = {'\s*\n\s*"type"[a-z",]+','"(\d+)\s*"'};
rep = {'','$1'};
str = regexprep(str,rgx,rep);

You could also save str if you want an improved CSV file. Now we can easily extract the data from this improved CSV string:

fmt = '%f%*q%f%f%f';
opt = {'HeaderLines',1, 'CollectOutput',true, 'Delimiter',','};
C = textscan(str,fmt,opt{:});
M = C{1}

giving:

M =
        92       14287           0           1
        20       16459           0           2
        14       19147           0           3
        14       20646           0           4
        14       22047           0           5
        14       24431           0           6
        14       26844           0           7
        14       28819           0           8
        14       30071           0           9
        14       31843           0          10
        14       34788           0          11
        14       36883           0          12
        14       38439           0          13
        14       39571           0          14
        14       40691           0          15
        14       41368           0          16
        14       42443           0          17
        20       42708           0          18
        14       44955           0          19
        14       46092           0          20
        14       47771           0          21
        14       48632           0          22
        14       52780           0          23
        14       54312           0          24
        14       56616           0          25
        14       58724           0          26
        14       59831           0          27
        14       61911           0          28
        14       62859           0          29
        14       64884           0          30
        14       67534           0          31
        14       68367           0          32
        14       70052           0          33
        20       70311           0          34
        14       72450           0          35
        14       73476           0          36
        14       75143           0          37
        14       76615           0          38
        20       76622           0          39

Tested on this file:

1 commentaire
Afficher -1 commentaires plus anciens Masquer -1 commentaires plus anciens

Darren Kenney le 8 Juin 2017

This worked. Thank you very much!

Connectez-vous pour commenter.

Answer 2

Stephen23 le 6 Juin 2017

Modifié(e) : Stephen23 le 7 Juin 2017

Ouvrir dans MATLAB Online

1 vote

temp2.txt

One approach is to play with the EndOfLine character (a little hacky, I know):

opt = {'MultipleDelimsAsOne',true, 'EndOfLine','p',...
    'CollectOutput',true, 'HeaderLines',1};
fmt = '%*s%*s%*s%f%*s%f';
fid = fopen('temp2.txt','rt');
C = textscan(fid,fmt,opt{:});
fclose(fid);
M = C{1}(1:2:end,:);

giving:

>> M
M =
        92       19593
        20       20000
        14       21000
        14       21000
        14       21000
        14       21000
        14       21000
        14       21000
        14       21000

As long as the file is not huge then this will be a reasonably effective method, although it does create a larger output matrix than required.

Note that your comment that the file "looks something like this" and the misleading "a .csv file" (whereas there is not a single comma anywhere in your example data) means that you are not providing an accurate enough specification. Reading files requires a very precise specification: every newline, every separator, non-printable characters, trailing spaces, trailing delimiters, repeated space characters, etc, etc, makes a huge difference to how a file is read. Because you did not provide us with a sample file I created my own one to test my code on, available here:

9 commentaires
Afficher 7 commentaires plus anciens Masquer 7 commentaires plus anciens

Stephen23 le 7 Juin 2017

Modifié(e) : Stephen23 le 7 Juin 2017

Ouvrir dans MATLAB Online

So, having looked at your file, it turns out that your description was incorrect after all. In particular:

All of the strings and some of the numeric values are double quoted, which in the world of CSV files indicates strings (and particular strings that might contain spaces or commas).
It has a different number of columns.
It has trailing commas (a trailing empty field).
Oh, and the "blank" lines do not contain any commas at all (no field delimiters).

Well, one I had the real file to work with it was easy to make my code work. Keep in mind that this is a vary hacky solution, and what you really should do is improve whatever it is that writes that awful file. The best solution is to fix the problem at the source (bad file formatting), rather than trying to mess around and fix it later with hack code.

PS: do not put spaces in the format string.

opt = {'MultipleDelimsAsOne',true, 'CollectOutput',true,...
    'EndOfLine','p', 'HeaderLines',1, 'Delimiter','\n\r\t ,"'};
fmt = '%*s%*s%*s%*s%*s%s%*s%f%f%f';
fid = fopen('events file.csv','rt');
C = textscan(fid,fmt,opt{:});
fclose(fid);
M = C{2}(1:2:end,:);
S = C{1}(1:2:end,:);
Z = [str2double(S),M]

giving:

Z =
          92       14287           0           1
          20       16459           0           2
          14       19147           0           3
          14       20646           0           4
          14       22047           0           5
          14       24431           0           6
          14       26844           0           7
          14       28819           0           8
          14       30071           0           9
          14       31843           0          10
          14       34788           0          11
          14       36883           0          12
          14       38439           0          13
          14       39571           0          14
          14       40691           0          15
          14       41368           0          16
          14       42443           0          17
          20       42708           0          18
          14       44955           0          19
          14       46092           0          20
          14       47771           0          21
          14       48632           0          22
          14       52780           0          23
          14       54312           0          24
          14       56616           0          25
          14       58724           0          26
          14       59831           0          27
          14       61911           0          28
          14       62859           0          29
          14       64884           0          30
          14       67534           0          31
          14       68367           0          32
          14       70052           0          33
          20       70311           0          34
          14       72450           0          35
          14       73476           0          36
          14       75143           0          37
          14       76615           0          38
          20       76622           0          39

Darren Kenney le 7 Juin 2017

Not large enough to crash the program (40 lines for each file), and 2012. I fixed the problem by changing the 'delimiter' from '\n\r\t ,"' to ','

Darren Kenney le 7 Juin 2017

Although, now the leftmost column displays all numbers as NaN which is what I assume you were trying to work around with '\n\r\t ,"'?

Connectez-vous pour commenter.

Using textscan to skip specific rows

0 commentaires
Afficher -2 commentaires plus anciens Masquer -2 commentaires plus anciens

Réponse acceptée

1 commentaire
Afficher -1 commentaires plus anciens Masquer -1 commentaires plus anciens

Plus de réponses (1)

9 commentaires
Afficher 7 commentaires plus anciens Masquer 7 commentaires plus anciens

Catégories

Tags

Community Treasure Hunt

Using textscan to skip specific rows

0 commentaires Afficher -2 commentaires plus anciens Masquer -2 commentaires plus anciens

Réponse acceptée

1 commentaire Afficher -1 commentaires plus anciens Masquer -1 commentaires plus anciens

Plus de réponses (1)

9 commentaires Afficher 7 commentaires plus anciens Masquer 7 commentaires plus anciens

Catégories

Tags

Voir également

Community Treasure Hunt

0 commentaires
Afficher -2 commentaires plus anciens Masquer -2 commentaires plus anciens

1 commentaire
Afficher -1 commentaires plus anciens Masquer -1 commentaires plus anciens

9 commentaires
Afficher 7 commentaires plus anciens Masquer 7 commentaires plus anciens