Right now I'm interested in using textscan to read specific rows of a .csv file which looks something like this:
type value latency
92 trigger 19593
type value latency
20 trigger 20000
type value latency
14 trigger 21000
type value latency
14 trigger 21000
type value latency
14 trigger 21000
type value latency
14 trigger 21000
type value latency
14 trigger 21000
type value latency
14 trigger 21000
type value latency
14 trigger 21000
I want to have textscan read the rows which have the values "92" and "19593", but to ignore the rows that are blank or the rows that have the words "type" "value" "latency". In addition, the documents I am scanning are of varying lengths, and I want textscan to automatically figure out which rows have the values of interest "92, 20, 14" and "19593" (latency values). How would I have textscan automatically scan the numerical values and ignore all other rows? In addition, I am not interested in scanning the column "value" at all. I only need to scan the two columns "type" and "latency".
Thank you in advance.

 Réponse acceptée

Stephen23
Stephen23 le 8 Juin 2017
Modifié(e) : Stephen23 le 8 Juin 2017
Second attempt: this time we will first convert the awful file format into a standard CSV format with one header line, no empty rows, no quoted numbers, and no repeated headers. I used regexprep and simply defined appropriate regular expressions:
fnm = 'events file.csv';
str = fileread(fnm);
rgx = {'\s*\n\s*"type"[a-z",]+','"(\d+)\s*"'};
rep = {'','$1'};
str = regexprep(str,rgx,rep);
You could also save str if you want an improved CSV file. Now we can easily extract the data from this improved CSV string:
fmt = '%f%*q%f%f%f';
opt = {'HeaderLines',1, 'CollectOutput',true, 'Delimiter',','};
C = textscan(str,fmt,opt{:});
M = C{1}
giving:
M =
92 14287 0 1
20 16459 0 2
14 19147 0 3
14 20646 0 4
14 22047 0 5
14 24431 0 6
14 26844 0 7
14 28819 0 8
14 30071 0 9
14 31843 0 10
14 34788 0 11
14 36883 0 12
14 38439 0 13
14 39571 0 14
14 40691 0 15
14 41368 0 16
14 42443 0 17
20 42708 0 18
14 44955 0 19
14 46092 0 20
14 47771 0 21
14 48632 0 22
14 52780 0 23
14 54312 0 24
14 56616 0 25
14 58724 0 26
14 59831 0 27
14 61911 0 28
14 62859 0 29
14 64884 0 30
14 67534 0 31
14 68367 0 32
14 70052 0 33
20 70311 0 34
14 72450 0 35
14 73476 0 36
14 75143 0 37
14 76615 0 38
20 76622 0 39
Tested on this file:

Plus de réponses (1)

Stephen23
Stephen23 le 6 Juin 2017
Modifié(e) : Stephen23 le 7 Juin 2017
One approach is to play with the EndOfLine character (a little hacky, I know):
opt = {'MultipleDelimsAsOne',true, 'EndOfLine','p',...
'CollectOutput',true, 'HeaderLines',1};
fmt = '%*s%*s%*s%f%*s%f';
fid = fopen('temp2.txt','rt');
C = textscan(fid,fmt,opt{:});
fclose(fid);
M = C{1}(1:2:end,:);
giving:
>> M
M =
92 19593
20 20000
14 21000
14 21000
14 21000
14 21000
14 21000
14 21000
14 21000
As long as the file is not huge then this will be a reasonably effective method, although it does create a larger output matrix than required.
Note that your comment that the file "looks something like this" and the misleading "a .csv file" (whereas there is not a single comma anywhere in your example data) means that you are not providing an accurate enough specification. Reading files requires a very precise specification: every newline, every separator, non-printable characters, trailing spaces, trailing delimiters, repeated space characters, etc, etc, makes a huge difference to how a file is read. Because you did not provide us with a sample file I created my own one to test my code on, available here:

9 commentaires

That's how it looks when I open it up in excel, the .csv file would look like
type, value, latency
92, trigger, 19593
, , ,
type, value, latency
20, trigger, 20000
, , ,
etc
Stephen23
Stephen23 le 7 Juin 2017
@Darren Kenney: if you want me to try your file then please upload a real file by clicking the paperclip button. For the reasons given in my answer, I am not going to follow some vague description of a file: I have been down that path many times before, and it just ends up wasting everyone's time.
Darren Kenney
Darren Kenney le 7 Juin 2017
Ok I have attached the .csv to this comment
I have written this code, but it isn't returning the numbers I want it to
specs = {'MultipleDelimsAsOne', true, 'delimiter', ',', 'EndOfLine, 'p', 'CollectOutput', true, 'HeaderLines', 1};
fileID = fopen(eeg_file, 'rt');
eeg_events_raw = textscan(fileID, '%*s %*s %*s %*s %*s %d %*s %d %*d %*d', specs {:});
eeg_events = eeg_events_raw{1}(1:2:end,:);
Stephen23
Stephen23 le 7 Juin 2017
Modifié(e) : Stephen23 le 7 Juin 2017
So, having looked at your file, it turns out that your description was incorrect after all. In particular:
  • All of the strings and some of the numeric values are double quoted, which in the world of CSV files indicates strings (and particular strings that might contain spaces or commas).
  • It has a different number of columns.
  • It has trailing commas (a trailing empty field).
  • Oh, and the "blank" lines do not contain any commas at all (no field delimiters).
Well, one I had the real file to work with it was easy to make my code work. Keep in mind that this is a vary hacky solution, and what you really should do is improve whatever it is that writes that awful file. The best solution is to fix the problem at the source (bad file formatting), rather than trying to mess around and fix it later with hack code.
PS: do not put spaces in the format string.
opt = {'MultipleDelimsAsOne',true, 'CollectOutput',true,...
'EndOfLine','p', 'HeaderLines',1, 'Delimiter','\n\r\t ,"'};
fmt = '%*s%*s%*s%*s%*s%s%*s%f%f%f';
fid = fopen('events file.csv','rt');
C = textscan(fid,fmt,opt{:});
fclose(fid);
M = C{2}(1:2:end,:);
S = C{1}(1:2:end,:);
Z = [str2double(S),M]
giving:
Z =
92 14287 0 1
20 16459 0 2
14 19147 0 3
14 20646 0 4
14 22047 0 5
14 24431 0 6
14 26844 0 7
14 28819 0 8
14 30071 0 9
14 31843 0 10
14 34788 0 11
14 36883 0 12
14 38439 0 13
14 39571 0 14
14 40691 0 15
14 41368 0 16
14 42443 0 17
20 42708 0 18
14 44955 0 19
14 46092 0 20
14 47771 0 21
14 48632 0 22
14 52780 0 23
14 54312 0 24
14 56616 0 25
14 58724 0 26
14 59831 0 27
14 61911 0 28
14 62859 0 29
14 64884 0 30
14 67534 0 31
14 68367 0 32
14 70052 0 33
20 70311 0 34
14 72450 0 35
14 73476 0 36
14 75143 0 37
14 76615 0 38
20 76622 0 39
Ok thanks for your help. I don't need the two columns on the right in that output. Also When I try to evaluate the line
C = textscan(fid,fmt,opt{:});
matlab does not respond until I have to kill the program myself.
Stephen23
Stephen23 le 7 Juin 2017
@Darren Kenney: how big is the actual data file? What MATLAB version are you using?
Darren Kenney
Darren Kenney le 7 Juin 2017
Not large enough to crash the program (40 lines for each file), and 2012. I fixed the problem by changing the 'delimiter' from '\n\r\t ,"' to ','
Darren Kenney
Darren Kenney le 7 Juin 2017
Although, now the leftmost column displays all numbers as NaN which is what I assume you were trying to work around with '\n\r\t ,"'?

Connectez-vous pour commenter.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by