read empty line by textscan

Question

0 votes

Hi Everyone,

I am trying to organize a txt file with 12000 lines, which is too large to use readtable, so I chose to use textscan.

The problem is that textscan skips all the empty lines, but I need the exact line numbers of certain elements in the original file.

I searched a lot online, but nothing helped. I tried code like this to delete all whitespace, but it doesn't help either.

default = textscan(fid,'%s%s','Delimiter','=','whitespace', '')

Thank you for your help!


Rik on 11 Apr 2019

Did you try either suggested solution? If you still have issues, we'll be happy to help.

Jeremy Hughes on 11 Apr 2019

I know someone has already added a solution, and it's a fine solution for what you're doing. But I'm surprised that READTABLE has a problem. Can you attach a sample?

12,000 lines isn't all that large especially if there are only two columns.

If you have R2019a, you might also try:

M = readmatrix(filename,'OutputType','string','Delimiter','=','Whitespace','')


Answer 1

Rik on 10 Apr 2019

Edited: Rik on 10 Apr 2019

2 votes

If your file doesn't contain any special characters, you could try fileread (which reads a file as one long char array), then split it with regexp. If you aren't sure about the encoding of special characters, you may consider my readfile function (which returns a cell array with 1 element per line, also for empty lines).

default = fileread(filename);
default = regexp(default,'\n','split');
%or:
default = readfile(filename);

The output of those two methods is equivalent if there are no special characters encoded in the file. The allowed characters are shown below. (readfile doesn't have this restriction)

% $%&'()*+,-./0123456789:;<=>?@
% ABCDEFGHIJKLMNOPQRSTUVWXYZ
% [\]^_`abcdefghijklmnopqrstuvwxyz{|}~
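As a rough illustration of how the split output keeps the original line numbering, here is a minimal sketch. The sample file, its contents, and the variable names (lines, lineNo, keys) are assumptions made up for this example, not taken from the original file; the split pattern '\r?\n' is used so that Windows line endings are tolerated as well.

```matlab
% Write a small hypothetical sample file so the sketch is self-contained.
filename = [tempname '.txt'];
fid = fopen(filename, 'w');
fprintf(fid, 'alpha=1\n\nbeta=2\n\n\ngamma=3');
fclose(fid);

% Read the file as one char array and split it into lines.
% Empty lines become empty cells, so the cell index is the line number.
txt = fileread(filename);
lines = regexp(txt, '\r?\n', 'split');

% Original line numbers of the non-empty lines.
lineNo = find(~cellfun('isempty', lines));

% Text before the first '=' on each non-empty line.
keys = strtok(lines(lineNo), '=');

delete(filename);
```

Because nothing is dropped during the split, lineNo(k) is the line in the original file on which keys{k} was found.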


Jeremy Hughes on 11 Apr 2019

Edited: Jeremy Hughes on 11 Apr 2019

default = regexp(default,'\n','split');

This won't work cleanly if the file uses Windows \r\n newlines: you'll be left with a trailing \r character on every line.

If you're using R2016b or later, try:

https://www.mathworks.com/help/matlab/ref/splitlines.html

default = splitlines(default);

It's a little more robust, and since it has only one job to do, probably slightly faster than regexp.
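A minimal sketch of that suggestion, using a made-up sample string with Windows line endings (the text and the 'b=' search prefix are assumptions for illustration only):

```matlab
% Hypothetical sample text with \r\n line endings and one empty line.
txt = sprintf('a=1\r\n\r\nb=2\r\nc=3');

% splitlines treats \r\n as a single newline and keeps empty lines,
% returning a column cell array with one element per line.
lines = splitlines(txt);

% Line number (in the original text) of the entry starting with 'b='.
lineNo = find(startsWith(lines, 'b='));
```

Here no trailing \r is left on any line, and the empty second line still occupies its own cell.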

Rik on 11 Apr 2019

Edited: Rik on 11 Apr 2019

To make the regexp splitting more robust (which will be in my next version of readfile):

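CRLF=char([13 10]); % regexp expects a char pattern, not a numeric one
CRLF=CRLF([any(default==13) any(default==10)]); % keep only the newline characters that occur
if isempty(CRLF),CRLF=char(10);end
default = regexp(default,CRLF,'split');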

splitlines will probably be faster, while the code I showed here is backwards compatible to R14 (v7.0, which was when regexp was expanded to support outkeys).

Edit:

I just noticed I had this line already in my function:

str(str==13)='';

So readfile already splits it correctly for \r\n files.
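For illustration, the same idea outside of readfile might look like the sketch below. The sample string is an assumption, and the surrounding readfile code is not reproduced here.

```matlab
% Hypothetical sample text with \r\n line endings and one empty line.
str = sprintf('a=1\r\n\r\nb=2');

% Remove every carriage return (char 13), then split on line feeds.
str(str == 13) = '';
lines = regexp(str, '\n', 'split');
```

One caveat of this approach: a file that uses a lone \r as its line ending would collapse into a single line, since all \r characters are removed before the split.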


Answer 2

Bob Thompson on 10 Apr 2019

Edited: Rik on 10 Apr 2019

0 votes

I'm going to guess that the extra lines are not consistent?

Generally, I would suggest reading the entire file in as one string, then splitting it at the newline characters. The exact code may be a bit off in the example below, but it should put you on the right track.

default = fread(fid,'*char')'; % Read the whole file as one char row vector
default = regexp(default,'\r?\n','split'); % Split into one cell per line; empty lines are kept as empty cells


Bob Thompson on 10 Apr 2019

Yes, I do. Thank you for catching that, I was using repmat for other things recently.

zhiwen wan on 11 Apr 2019

Thank you very much Bob, problem solved:)
