Importing file with text and numbers

Question

0 votes

Hi,

I'm trying to load a text file that contains both text and numbers into MATLAB. The first few lines of the file are shown below:

<<Time = 0.0494352

Patch: waterFlow found on 1/1 processor(s)

Flux at waterFlow = -0.0125m^3/s [-750 l/min]

Patch: airFlowIn found on 1/1 processor(s)

Flux at airFlowIn = 0.0125345m^3/s [752.073 l/min]

Patch: outlet found on 1/1 processor(s)

Flux at outlet = -3.45519e-05m^3/s [-2.07311 l/min]

Time = 0.0496235

Patch: waterFlow found on 1/1 processor(s)

Flux at waterFlow = -0.0125m^3/s [-750 l/min]

Patch: airFlowIn found on 1/1 processor(s)

Flux at airFlowIn = 0.0125345m^3/s [752.073 l/min]

Patch: outlet found on 1/1 processor(s)

Flux at outlet = -3.45519e-05m^3/s [-2.07311 l/min]

Time = 0.0498117

Patch: waterFlow found on 1/1 processor(s)

Flux at waterFlow = -0.0125m^3/s [-750 l/min]

Patch: airFlowIn found on 1/1 processor(s)

Flux at airFlowIn = 0.0125345m^3/s [752.073 l/min]

Patch: outlet found on 1/1 processor(s)

Flux at outlet = -3.45519e-05m^3/s [-2.07311 l/min]>>

This is a very long file where the data is given at each time step. I need to sort the time and the values for the fluxes. I tried textscan but it was unsuccessful.

I really appreciate any ideas and suggestions.

Thanks, Hale

2 comments

dpb on 6 Jul 2013

Is the blank line between data records real or a figment of the cut'n paste operation?

Hale on 7 Jul 2013

Edited: Hale on 7 Jul 2013

http://s1286.photobucket.com/user/Hale110/media/data_zps7cabd012.png.html

There are only blank lines between the previous and the new time step, as you can see in the screenshot above.


Answer 1

dpb on 6 Jul 2013

0 votes

OK, I don't have time to work thru a clever way at the moment but the following scans your sample file ok...

    fid=fopen('hale.txt','rt');
    t=[]; fw=[]; fa=[]; fo=[];
    while ~feof(fid)
      l=fgetl(fid);
      if isempty(l),continue, end   % skip blank lines between records
      if strfind(l,'Time'),      t=[ t;sscanf(l,'Time = %f')];end
      if strfind(l,'Flux at w'),fw=[fw;sscanf(l,'Flux at waterFlow = %f')];end
      if strfind(l,'Flux at a'),fa=[fa;sscanf(l,'Flux at airFlowIn = %f')];end
      if strfind(l,'Flux at o'),fo=[fo;sscanf(l,'Flux at outlet = %f')];end
    end
    fid=fclose(fid);
    %
    dat=[t fw fa fo];   % one row per time step: time, water, air, outlet
    clear t fw fa fo;

If this is too slow on large files, preallocate a reasonable size for the accumulating arrays and fill them via an index counter. Either make the initial size larger than any file you'll want to read and truncate to the final size when done, or check the counter and reallocate whenever it exceeds the current size.

You might also help the above just a little by using the index returned by strfind() and parsing only the string pieces needed...oh! you can do that anyway since it's a fixed format: just count the location and pass the substring from the proper start point to sscanf. As an example, for Time it would look like

    if strfind(l,'Time'), t=[t;sscanf(l(7:end),'%f')];end

Looks like for the fluxes you can't count on the same number of digits so that you would need to use the location past the '=' as start and then find the 'm' of 'm^3' and use the location one shorter than that as the substring end. That would eliminate the internal error that happens now when the i/o conversion scans until it fails by giving it a fixed string to convert that is a valid fp number. I suspect that would be noticeable on large files.
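
The substring idea for the fluxes could look roughly like the sketch below. It assumes every flux line has exactly the form shown in the question, with the number sitting between the first '=' and the first 'm^3'; the variable names i1, i2 and numstr are made up for illustration.

```matlab
% One flux line in the format shown in the question.
l = 'Flux at outlet = -3.45519e-05m^3/s [-2.07311 l/min]';

i1 = strfind(l, '=');      % position(s) of the '=' sign
i2 = strfind(l, 'm^3');    % position(s) where the unit string starts

% Keep only the text strictly between the first '=' and the first 'm^3',
% so the conversion sees nothing but a valid floating point number.
numstr = l(i1(1)+1 : i2(1)-1);
flux   = sscanf(numstr, '%f');   % %f skips the leading blank
```

Searching for 'm^3' rather than a bare 'm' is a choice made here so that an 'm' elsewhere in the line cannot be picked up by mistake.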

Salt to suit... :)

regexp() can undoubtedly also be made to work; how it'll be on performance in comparison I don't know, I'm too weak w/ regexp that I'm not even agonna' try.
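
For anyone who wants to try the regexp() route, here is a minimal sketch, not timed against the loop above. It assumes the whole file fits in memory and that every time step has all four entries. The sample string stands in for str = fileread('hale.txt'), and the names num and grab are made up for this example.

```matlab
% Sample text standing in for the file contents (two time steps).
str = sprintf(['Time = 0.0494352\n\n' ...
               'Patch: waterFlow found on 1/1 processor(s)\n\n' ...
               'Flux at waterFlow = -0.0125m^3/s [-750 l/min]\n\n' ...
               'Flux at airFlowIn = 0.0125345m^3/s [752.073 l/min]\n\n' ...
               'Flux at outlet = -3.45519e-05m^3/s [-2.07311 l/min]\n\n' ...
               'Time = 0.0496235\n\n' ...
               'Flux at waterFlow = -0.0125m^3/s [-750 l/min]\n\n' ...
               'Flux at airFlowIn = 0.0125345m^3/s [752.073 l/min]\n\n' ...
               'Flux at outlet = -3.45519e-05m^3/s [-2.07311 l/min]\n']);

% Pattern for one number, with optional sign and exponent, as a token.
num = '([-+]?\d*\.?\d+(?:[eE][-+]?\d+)?)';

% Collect every number that follows "<label> = " as a column vector.
grab = @(label) reshape(cellfun(@(c) str2double(c{1}), ...
                regexp(str, [label ' = ' num], 'tokens')), [], 1);

% One row per time step: time, water, air, outlet.
dat = [grab('Time') grab('Flux at waterFlow') ...
       grab('Flux at airFlowIn') grab('Flux at outlet')];
```

If the last time step in a file is incomplete the four columns will have different lengths and the concatenation will error, so a real script would need to check for that.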

2 comments

Hale on 7 Jul 2013

Edited: Hale on 7 Jul 2013

Thanks a lot for your detailed answer. My file contains about 17000 rows and the first way you suggested works actually very well. It takes about 4 seconds to get the data sorted.

dpb on 7 Jul 2013

Good...yeah, oftentimes one finds that the "dead ahead" solution works well enough. I suspect if you were to preallocate you could get it down quite a bit more, but 4 sec is probably acceptable if that's the typical file size you'll be dealing with.

But, it's pretty simple to implement...

    ...
    N=20000;             % initial alloc size (rows = time steps)
    d=zeros(N,4);
    ix=0;
    while ~feof(fid)
      l=fgetl(fid);
      if isempty(l),continue, end
      if strfind(l,'Time')      % a Time line starts a new row
        ix=ix+1; if ix>N, d=[d; zeros(N,4)]; N=N+N; end  % grow if full
        d(ix,1)=sscanf(l,'Time = %f');
      end
      if strfind(l,'Flux at w'),d(ix,2)=sscanf(l,'Flux at waterFlow = %f');end
      ....etc...
    end
    d(ix+1:end,:)=[];  % clean up empty end...


Answer 2

the cyclist on 6 Jul 2013

0 votes

If you have a relatively recent release of MATLAB, you can use the Import Data tool that is found on the Home tab of the Command Window.

You can read about it (and all kinds of other options for importing data) here:

http://www.mathworks.com/help/matlab/import_export/recommended-methods-for-importing-data.html


Answer 3

Miroslav Balda on 6 Jul 2013

0 votes

The previous answer gives a possible solution; however, the function fgetl is rather slow. An alternative may be to use the function

ffread www.mathworks.com/matlabcentral/fileexchange/9034

The function serves for free-format reading of ascii files. The read lines can be analyzed after the file is read. Good luck.

Mira

1 comment

dpb on 7 Jul 2013

I'm not sure what that particular FEX submission actually does, but one can read the whole file in one big slurp (assuming it will all fit in memory) w/ fread() as a character array and then loop thru the records in memory if desired.
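
A rough sketch of that slurp-then-loop idea follows. It writes a small temporary file first so that it can run on its own; fname and recs are names made up for the example, and only the Time and waterFlow lines are parsed to keep it short.

```matlab
% Write a tiny sample file so the sketch is self-contained.
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'Time = 0.0494352\r\n\r\n');
fprintf(fid, 'Flux at waterFlow = -0.0125m^3/s [-750 l/min]\r\n\r\n');
fprintf(fid, 'Time = 0.0496235\r\n\r\n');
fprintf(fid, 'Flux at waterFlow = -0.0125m^3/s [-750 l/min]\r\n');
fclose(fid);

% One big slurp: the whole file as a single row of characters.
fid = fopen(fname, 'r');
txt = fread(fid, '*char')';
fclose(fid);
delete(fname);

% Split into records in memory; '\r?\n' accepts both line-ending styles.
recs = regexp(txt, '\r?\n', 'split');

t = []; fw = [];
for k = 1:numel(recs)
    l = recs{k};
    if isempty(l), continue, end      % skip blank lines
    if strncmp(l, 'Time', 4),      t(end+1,1)  = sscanf(l, 'Time = %f'); end
    if strncmp(l, 'Flux at w', 9), fw(end+1,1) = sscanf(l, 'Flux at waterFlow = %f'); end
end
```

The arrays still grow inside the loop here; for a long file the preallocation shown in Answer 1 would apply in the same way.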


Answer 4

per isakson on 6 Jul 2013

Edited: per isakson on 6 Jul 2013

0 votes

If the file fits in memory this is one way to read it.

Maybe '\r\n' needs to be replaced by '\n'; that depends on the source of the file. Or replace '\r\n' by '[\r]*\n' to handle both cases with the same code.

Next step is to decide what data shall be kept and in what data structures.

Replace disp( ca2{jj} ) by code that parses one line at a time. See dpb's answer.

Try

    function cssm()
        str = fileread( 'blocks.txt' );
        ca1 = regexp( str, '\r\n(?=Time)', 'split' );
        len = length( ca1 );
    % use len to allocate memory for variables to store data.
        for ii = 1 : length( ca1 )
           ca2 = regexp( ca1{ii}, '\r\n', 'split' );
           for jj = 1 : length( ca2 )
               disp( ca2{jj} )
           end
        end
    end

returns

    Time = 0.0494352
    Patch: waterFlow found on 1/1 processor(s)
    Flux at waterFlow = -0.0125m^3/s [-750 l/min]
    Patch: airFlowIn found on 1/1 processor(s)
    Flux at airFlowIn = 0.0125345m^3/s [752.073 l/min]
    Patch: outlet found on 1/1 processor(s)
    Flux at outlet = -3.45519e-05m^3/s [-2.07311 l/min]
    Time = 0.0496235
    Patch: waterFlow found on 1/1 processor(s)
    Flux at waterFlow = -0.0125m^3/s [-750 l/min]
    Patch: airFlowIn found on 1/1 processor(s)
    Flux at airFlowIn = 0.0125345m^3/s [752.073 l/min]
    Patch: outlet found on 1/1 processor(s)
    Flux at outlet = -3.45519e-05m^3/s [-2.07311 l/min]
    Time = 0.0498117
    Patch: waterFlow found on 1/1 processor(s)
    Flux at waterFlow = -0.0125m^3/s [-750 l/min]
    Patch: airFlowIn found on 1/1 processor(s)
    Flux at airFlowIn = 0.0125345m^3/s [752.073 l/min]
    Patch: outlet found on 1/1 processor(s)
    Flux at outlet = -3.45519e-05m^3/s [-2.07311 l/min]
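
As one possible way to fill in the disp() placeholder, the sketch below parses each line with sscanf, as in dpb's answer, and stores one row per time block. It is only an illustration: the sample string stands in for fileread('blocks.txt'), the names blocks, fmts and dat are made up, and '\r?\n' is used so that either line ending works.

```matlab
% Sample text standing in for the file contents (two time blocks).
str = sprintf(['Time = 0.0494352\r\n' ...
               'Patch: waterFlow found on 1/1 processor(s)\r\n' ...
               'Flux at waterFlow = -0.0125m^3/s [-750 l/min]\r\n' ...
               'Flux at airFlowIn = 0.0125345m^3/s [752.073 l/min]\r\n' ...
               'Flux at outlet = -3.45519e-05m^3/s [-2.07311 l/min]\r\n' ...
               'Time = 0.0496235\r\n' ...
               'Flux at waterFlow = -0.0125m^3/s [-750 l/min]\r\n' ...
               'Flux at airFlowIn = 0.0125345m^3/s [752.073 l/min]\r\n' ...
               'Flux at outlet = -3.45519e-05m^3/s [-2.07311 l/min]\r\n']);

% Split into blocks at each line break that is followed by "Time".
blocks = regexp(str, '\r?\n(?=Time)', 'split');

% One sscanf format per output column: time, water, air, outlet.
fmts = {'Time = %f', 'Flux at waterFlow = %f', ...
        'Flux at airFlowIn = %f', 'Flux at outlet = %f'};

dat = nan(numel(blocks), 4);            % allocate from the block count
for ii = 1:numel(blocks)
    rows = regexp(blocks{ii}, '\r?\n', 'split');
    for jj = 1:numel(rows)
        for kk = 1:4
            v = sscanf(rows{jj}, fmts{kk});   % empty if the line does not match
            if ~isempty(v), dat(ii,kk) = v(1); end
        end
    end
end
```

Entries that never appear in a block stay NaN, which makes incomplete time steps easy to spot afterwards.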

5 comments (3 older comments not shown)

per isakson on 8 Jul 2013

Edited: per isakson on 8 Jul 2013

The answers to a question will ideally provide a little "smorgasbord". I offer one small dish, without too much thought.

I hope that more than one reader will benefit from the "smorgasbord".

The doc of R2012a says:

    [...]To open files in text mode, attach the letter 't' to the permission,
     such as 'rt' or 'wt+'. For better performance, do not use text mode.[...]

A long time ago I ceased using the 't' because of the performance penalty. I've kind of forgotten that it exists.

dpb on 8 Jul 2013

Edited: dpb on 9 Jul 2013

Hmmm...R2012b (doc) says

To open files in text mode, attach the letter 't' to the permission, such as 'rt' or 'wt+'.

For better performance, do not use text mode. The following applies on Windows systems, in text mode: ...

This additional processing is unnecessary for most cases. All MATLAB import functions, and most text editors (including Microsoft Word and WordPad), recognize both '\r\n' and '\n' as newline sequences. However, when you create files for use in Microsoft Notepad, end each line with '\r\n'. ...

I have only recently been blessed by TMW w/ an update to R2012b from R12, whose doc doesn't have anything specific about the performance hit and has the warning

... To open in text mode, add "t" to the permission string, for example 'rt' and 'wt+'. (On Unix, text and binary mode are the same so this has no effect. But on PC systems this is critical.)

I'm of the age when it was indeed the case that much Windows software including my favorite programmers' editor didn't deal w/ the non-Windows \n sequence at all gracefully so I just continue to operate in that mode.

I guess I'll have to update my thinking/advice for Matlab specifically and let users run into their own quirks w/ other packages if they still aren't graceful.

I do see that TMW ought then to update the help text for fopen() to be more consistent as it still has the same verbiage as does R12.1 and no real indication of any real performance hit.

From R2012b session...

    >> help fopen
    fopen  Open file.
     ...
      You can open files in binary mode (the default) or in text mode.
      In binary mode, no characters get singled out for special treatment.
      In text mode on the PC, the carriage return character preceding
      a newline character is deleted on input and added before the newline
      character on output.  To open a file in text mode, append 't' to the
      permission string, for example 'rt' and 'w+t'.  (On Unix, text and
      binary mode are the same, so this has no effect.  On PC systems
      this is critical.)

So, I'll modify my warnings if TMW will fix help... :)
