Import text file with blank lines. Matlab not replacing them with NaN

Matlab is not replacing the blank lines in my txt file with NaN; it just joins all the data together. Unfortunately I need the data in exactly the order it appears, because each line corresponds to a unique timestamp and the times themselves are not included in the txt file.
Any ideas? Tried importdata and textscan with no luck. Using R2014b

Accepted Answer

per isakson on 29 Jan 2015
Edited: per isakson on 30 Jan 2015
At least two possibilities remain:
  • a loop over fgetl
  • read the file as one string, replace empty lines by 'nan nan ... ' and parse with textscan
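The first option, a loop over fgetl, might look like the sketch below. It is not code from this thread: the function name read_with_blank_rows and the ncol argument are made up for illustration, and it assumes that every non-blank line holds exactly ncol whitespace-delimited numbers.

```matlab
function num = read_with_blank_rows(filename, ncol)
% Read a numeric text file line by line; blank or whitespace-only
% lines become rows of NaN, so the row order of the file is preserved.
% Assumption: every data line holds exactly ncol numbers.
fid = fopen(filename, 'r');
if fid == -1
    error('Cannot open %s', filename);
end
closer = onCleanup(@() fclose(fid));   % close the file on exit or error

rows  = {};
tline = fgetl(fid);
while ischar(tline)                    % fgetl returns -1 at end of file
    vals = sscanf(tline, '%f').';      % empty for a blank line
    if isempty(vals)
        vals = nan(1, ncol);
    end
    rows{end+1, 1} = vals;             %#ok<AGROW>
    tline = fgetl(fid);
end
num = vertcat(rows{:});
end
```

Called as num = read_with_blank_rows('cssm.txt', 4), this should return one row per line of the file. For large files the string-replacement approach is likely to be faster, since it avoids the per-line loop.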
Example (R2013a)
>> cac = cssm;
>> cac{:}
ans =
16 2 3 13
5 11 10 8
NaN NaN NaN NaN
9 7 6 12
4 14 15 1
where
function cac = cssm
str = fileread('cssm.txt');
str = regexprep( str, '(?<=\r?\n)[ ]*(?=\r?\n)', 'nan nan nan nan');
cac = textscan( str, '%f%f%f%f', 'CollectOutput', true );
end
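and where cssm.txt contains the following, with an "empty" line between the second and third rows (not visible in this listing)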
16 2 3 13
5 11 10 8
9 7 6 12
4 14 15 1
Replace
str = regexprep( str, '(?<=\r?\n)[ ]*(?=\r?\n)', 'nan nan nan nan');
by
str = regexprep( str, '(?<=\r?\n)[ ]*(?=\r?\n)' ...
, 'nan nan nan nan', 'emptymatch' );
to also handle lines that are completely empty (zero characters)

10 Comments

Without 'emptymatch', the expression requires at least one char(32) in "empty" lines
Hi Per,
I would replace your pattern by
'(?<=(\r?\n|^))\s*(?=(\r?\n|$))'
to capture potential tabs or spaces on empty lines as well as cases where the file starts or ends with an empty line.
Hi Cedric,
Regular expressions require thorough testing. You propose three modifications
  • add ^| to handle leading empty lines. Yes, I agree.
  • replace [ ]* by \s* to handle potential delimiter on empty lines. OK, but there is one problem, \s* matches new-line. Replacing by \s*? (lazy) solves that.
  • add $| to handle trailing empty lines. No, because one occurrence of new-line at the end of the file does not indicate an empty line. Interactively created files may or may not have new-line at the end of the last line. Automatically created files "always" have new-line at the end of the last line.
My new expression is
'(?<=\r?\n|^)\s*?(?=\r?\n)'
Cedric on 29 Jan 2015
Edited: Cedric on 29 Jan 2015
I agree about point 3, but for point 2: \s* should not match the new-line when the new-line is matched by the look-ahead. In that case the pattern still matches, because the zero occurrences allowed by the * are enough.
>> regexprep( 'ab', '(?<=a)b*(?=b)', 'z', 'emptymatch' )
ans =
azb
Hi Cedric,
To illustrate how I think, I have created three text files, cssm_0.txt, cssm_1.txt and cssm_2.txt, with zero, one and two empty lines at the end, respectively. The images are clips of the files in Notepad++.
With the expression
'(?<=\r?\n|^)\s*?(?=\r?\n)'
I get the results below
>> clear all,cac = cssm('cssm_0.txt');cac{:}
ans =
NaN NaN NaN NaN
16 2 3 13
5 11 10 8
NaN NaN NaN NaN
NaN NaN NaN NaN
9 7 6 12
4 14 15 1
>> clear all,cac = cssm('cssm_1.txt');cac{:}
ans =
NaN NaN NaN NaN
16 2 3 13
5 11 10 8
NaN NaN NaN NaN
NaN NaN NaN NaN
9 7 6 12
4 14 15 1
NaN NaN NaN NaN
>> clear all,cac = cssm('cssm_2.txt');cac{:}
ans =
NaN NaN NaN NaN
16 2 3 13
5 11 10 8
NaN NaN NaN NaN
NaN NaN NaN NaN
9 7 6 12
4 14 15 1
NaN NaN NaN NaN
NaN NaN NaN NaN
>>
\s* or \s*?
I can reproduce your example
>> regexprep( 'ab', '(?<=a)b*(?=b)', 'z', 'emptymatch' )
ans =
azb
and the lazy ? doesn't hurt
>> regexprep( 'ab', '(?<=a)b*?(?=b)', 'z', 'emptymatch' )
ans =
azb
However it doesn't work with the string from the text file. With the expression
'(?<=\r?\n|^)\s*(?=\r?\n)'
I get
>> clear all,cac = cssm('cssm_1.txt');cac{:}
ans =
NaN NaN NaN NaN
16 2 3 13
and with the expression
'(?<=\r\n|^)\s*(?=\r\n)'
I get
>> clear all, clear classes,cac = cssm('cssm_1.txt');cac{:}
ans =
NaN NaN NaN NaN
16 2 3 13
5 11 10 8
NaN NaN NaN NaN
NaN NaN NaN NaN
9 7 6 12
4 14 15 1
NaN NaN NaN NaN
The problem is with the "?" in \r?\n - I think. In this context "\s*" matches "\r" and the look-ahead is happy with "\n". With "\s*?" the "\r" goes to the look-ahead.
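The difference can be checked on a small in-memory string instead of a file. The snippet below is only a sketch: the two-row sample string and the shortened 'nan nan' replacement are made up for illustration. If the explanation above holds, the greedy version should swallow one "\r" while the lazy version leaves every "\r\n" pair intact.

```matlab
% Made-up stand-in for a CRLF file with one empty line between two rows.
str = sprintf('1 2\r\n\r\n3 4\r\n');

greedy = regexprep(str, '(?<=\r?\n|^)\s*(?=\r?\n)',  'nan nan', 'emptymatch');
lazy   = regexprep(str, '(?<=\r?\n|^)\s*?(?=\r?\n)', 'nan nan', 'emptymatch');

% Show character codes so that \r (13) and \n (10) become visible.
disp(double(greedy))
disp(double(lazy))

% Count how many \r characters survive each replacement.
fprintf('CR count: original %d, greedy %d, lazy %d\n', ...
    sum(str == 13), sum(greedy == 13), sum(lazy == 13));
```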
I used "\r?\n" in the first place to match both the Unix ("\n") and the DOS/Windows ("\r\n") style of new-line.
Thank you for the illustration, I will have a look in half an hour!
Cedric on 30 Jan 2015
Edited: Cedric on 30 Jan 2015
Hi Per,
Thank you for all the illustrations, I agree with all of your conclusions! After spending quite a bit of time working on alternative approaches based on tokens (which turned out to be a little slower in the end), I just realized that we don't need to match the possible \r in the look-behind.
Thanks a lot guys!!
Hi Cedric,
You are right, "\r" is not needed in the "look behind". And possibly, it saves on execution time to exclude it.
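Putting the conclusions of this discussion together gives something like the sketch below. The final expression is assembled here from the points agreed on above (lazy \s*?, ^ in the look-behind, no \r in the look-behind, no $ in the look-ahead) and was not posted in this form. The function name read_nan_rows is made up, and four numeric columns are assumed as in the example file.

```matlab
function cac = read_nan_rows(filename)
% Sketch: read a 4-column numeric file, turning empty lines into NaN rows.
str = fileread(filename);
% Look-behind (?<=\n|^) : start of text or the end of the previous line.
% \s*?                  : lazily match optional spaces/tabs on the line.
% Look-ahead (?=\r?\n)  : the line's own terminator, \n or \r\n.
% 'emptymatch' lets lines with zero characters be replaced as well.
str = regexprep(str, '(?<=\n|^)\s*?(?=\r?\n)', ...
    'nan nan nan nan', 'emptymatch');
cac = textscan(str, '%f%f%f%f', 'CollectOutput', true);
end
```

A single new-line at the very end of the file is not treated as an empty line, in line with point 3 above.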
