Skip Lines (other than the Header) when Importing CSV File

I have read a couple of posts about skipping header information when importing CSV files. While I don't fully understand them yet, I know that I will also need to skip lines of text interspersed with my data. How would I import a CSV file that has a header but also includes lines of text between "blocks" of data?

For instance, in the attached file, lines 1-45 can be considered the "header" and are easily skipped. Lines 46-74 contain the actual data, lines 75-76 should be skipped, and lines 77-105 contain the next "block" of data. This pattern repeats and, depending on the length of the file, could repeat a couple of thousand times (around 2K "blocks" of data).

I would like to import only the data blocks so that I can do math (sum, average, max and min) on specific blocks. I could do this in Excel, but I don't know how to automate the process without using MATLAB. Any suggestions would be appreciated. Thank you.

Answers (2)

per isakson on 10 Feb 2014
Edited: per isakson on 11 Feb 2014
Are there any string values that can be used as the "Begin" and "End" of the blocks?
[The following day]
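Try this:
str = fileread('cssm.txt');                        % read the whole file into one character vector
look_behind = '(?<=Frame \d{1,3}\s*\n)';           % block must be preceded by a "Frame N" line
look_ahead = '(?=(\s*Frame \d{1,3}\s*)|(\s*$))';   % block must be followed by the next "Frame N" or the end of the text
expr2match = '[0-9\.\s]+?';                        % digits, dots and whitespace, as few as possible (lazy)
cac = regexp( str, [look_behind,expr2match,look_ahead], 'match' );   % one cell per data block
cac{3}                                             % display the third block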
where cssm.txt contains
Frame 1
11.1 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
Frame 2
22.2 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
Frame 99
33.3 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
returns
ans =
33.3 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
Understanding this may take hours (or more) of reading and experimenting with regular expressions, especially "Lookaround Assertions". However, it is worth the effort.
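As a smaller starting point, here is a minimal sketch of the two kinds of lookaround assertion used in the pattern above. The input strings and variable names are made up for illustration; only `regexp` with the 'match' option is used.

```matlab
% Lookbehind: match digits only when they are preceded by 'Frame '.
% The prefix itself is not part of the returned match.
frameNo = regexp('Frame 12 data', '(?<=Frame )\d+', 'match');

% Lookahead: match digits only when they are followed by 'kg'.
% The suffix itself is not part of the returned match.
weight = regexp('12kg 5m', '\d+(?=kg)', 'match');

disp(frameNo{1})
disp(weight{1})
```

In both cases the surrounding text is only tested, not consumed, which is why the block pattern above returns the numbers without the "Frame" lines.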
Use textscan to convert the text to numeric values:
buf = textscan( transpose(cac{3}), '%f%f%f%f', 'CollectOutput',true );
and
>> buf{1}
ans =
33.3000 2.0000 3.0000 13.0000
5.0000 11.0000 10.0000 8.0000
9.0000 7.0000 6.0000 12.0000
4.0000 14.0000 15.0000 1.0000
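The question asks for sums, averages, maxima and minima per block. The following is a minimal sketch of one way to get there, not the method used above: it swaps in `regexp` with the 'split' option and `sscanf` instead of the lookaround pattern and `textscan`. The sample text is built inline so the sketch runs without a file, and the names `blocks` and `stats` are illustrative.

```matlab
% Sample text in the same layout as cssm.txt (shortened to two rows per frame).
str = sprintf(['Frame 1\n11.1 2 3 13\n5 11 10 8\n' ...
               'Frame 2\n22.2 2 3 13\n5 11 10 8\n']);

% Split the text at every "Frame N" marker. The first piece is whatever
% precedes the first marker (a header, or empty here), so it is dropped.
blocks = regexp(str, 'Frame \d+\s*', 'split');
blocks(1) = [];

nBlocks = numel(blocks);
stats = zeros(nBlocks, 4);            % columns: sum, mean, max, min
for k = 1:nBlocks
    v = sscanf(blocks{k}, '%f');      % all numbers of the block as one column
    stats(k, :) = [sum(v), mean(v), max(v), min(v)];
end
disp(stats)
```

Each row of `stats` then describes one block. If the row and column layout of a block matters, `reshape(v, 4, []).'` would restore four columns per row, assuming every row has four values.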

3 Comments

joe on 10 Feb 2014
Edited: joe on 10 Feb 2014
As a "Begin" you could use "Frame <number>", like "Frame 53"; the next line would then be the data. But the "End" is just a blank row separating the block from the next "Frame <number>" header. So, maybe the blank line?
EDIT: After looking at that link, I don't understand the example.
How is the "length" variable found?
What does this loop "do"?
for ii = 1:length(flag)
    tmp(strncmp(data, flag{ii}, length(flag{ii}))) = ii;
end
per isakson on 10 Feb 2014
Edited: per isakson on 11 Feb 2014
Here is an alternative value of look_behind, which is "cleaner":
look_behind = '(?<=Frame \d{1,3}\s+)';
I had trouble making \d{1,3} match as many digits as possible, i.e. making it greedy. Next try:
look_behind = '(?<=Frame \d++\s+)';
\d++ matches all the consecutive digits there are at that position.
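To see what "all consecutive digits" means in practice, here is a minimal sketch comparing the greedy `\d+` with the possessive `\d++`. It relies on the possessive quantifier behaving as described in the MATLAB regular expression documentation (no backtracking); the test string is made up.

```matlab
s = '12345';

% Greedy: \d+ first takes all five digits, then backtracks and gives back
% the final '5' so that the literal 5 in the pattern can match.
greedy = regexp(s, '\d+5', 'match');

% Possessive: \d++ takes all five digits and never gives any back,
% so the literal 5 has nothing left to match and the pattern fails.
possessive = regexp(s, '\d++5', 'match');

disp(greedy)
disp(isempty(possessive))
```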
per isakson on 10 Feb 2014
Edited: per isakson on 11 Feb 2014
I didn't study Kelly's answer. However, length is not a variable; it is a built-in MATLAB function that returns the size of the largest dimension of an array.
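As for what the loop in joe's comment does, here is a minimal sketch. The contents of `data` and `flag` are assumed for illustration (the linked example is not reproduced in this thread), and preallocating `tmp` with zeros is also an assumption.

```matlab
% Hypothetical inputs: the lines of a file and the prefixes ("flags") to look for.
data = {'Frame 1'; '1 2 3'; 'Note: check'; '4 5 6'};
flag = {'Frame', 'Note'};

tmp = zeros(size(data));                   % 0 means "starts with none of the flags"
for ii = 1:length(flag)                    % length(flag) is the number of flags, here 2
    n = length(flag{ii});                  % number of characters in this flag
    % strncmp compares the first n characters of every line with the flag
    tmp(strncmp(data, flag{ii}, n)) = ii;  % label matching lines with the flag's index
end
disp(tmp.')
```

Here `tmp` ends up as [1; 0; 2; 0]: each line is labelled with the index of the flag it starts with, which can then be used to pick out header lines or data lines.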


I ended up doing this:
fid = fopen(filename);          % open the file (filename must already be defined)
r = 1;                          % next row of M to fill
tline = fgets(fid);             % read the first line of the file
while ischar(tline)             % fgets returns -1 (not a char) at the end of the file
    % keep only lines that start with a digit or with 'B,'
    if isstrprop(tline(1), 'digit') || (tline(1) == 'B' && tline(2) == ',')
        tline = strrep(tline, 'B', '0');   % replace every 'B' with a zero for later math
        eval(['x = {' tline '};']);        % parse the comma-separated line into cell array x
        M(r,:) = x;                        % store x as row r of cell array M
        r = r + 1;                         % move on to the next row of M
    end
    tline = fgets(fid);         % read the next line
end
fclose(fid);                    % close the file
I'll admit I got some help with the while loop and the "eval" line.
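For comparison, here is a minimal sketch of the same filtering without `eval`, swapping in `strsplit` and `str2double` to parse each line. It assumes comma-separated values with 'B' as a placeholder field and the same number of fields on every kept line; the sample lines are hypothetical and stand in for what `fgets` would return.

```matlab
% Hypothetical sample lines standing in for the lines read from the file.
lines = {'Frame 1', '1,2,3', 'B,5,6', '', 'Frame 2', '7,B,9'};

M = [];                                      % numeric matrix, grown row by row
for k = 1:numel(lines)
    tline = strtrim(lines{k});               % drop line terminators and padding
    if isempty(tline), continue; end         % skip blank separator lines
    startsWithDigit = isstrprop(tline(1), 'digit');
    startsWithB     = strncmp(tline, 'B,', 2);
    if startsWithDigit || startsWithB
        fields = strsplit(tline, ',');       % split the line at the commas
        fields(strcmp(fields, 'B')) = {'0'}; % placeholder 'B' becomes 0
        M(end+1, :) = str2double(fields);    %#ok<AGROW> append as a numeric row
    end
end
disp(M)
```

This yields a numeric matrix rather than a cell array, so functions such as `sum`, `mean`, `max` and `min` can be applied to `M` directly.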

Asked: joe on 10 Feb 2014

Answered: joe on 11 Feb 2014
