Using TEXTSCAN to import an ASCII file with a header and blank lines between different data sets

Question

Kristia on 27 Mar 2013

https://fr.mathworks.com/matlabcentral/answers/68846-using-textscan-to-import-an-ascii-file-with-a-header-and-blank-lines-between-different-data-sets

Answered: Gabriel Felix on 24 May 2020

I have several text files, each representing a house. Each file contains several data sets, and each data set represents a room (zone) within that house.

The text file looks similar to the following, although most of the data has been deleted. Each zone has 1440 lines of data, and each house has a different number of zones:

 project:  House1_1                              Tue Mar 19 12:30:42 2013
 description:  
 date   time    time       Ozone
       of day    [s]     [kg/kg]
 level: firstfloor    zone: bedroom1
 Jan01 00:00:00     0  0.000e+000
 Jan01 00:01:00    60  1.487e-009
 Jan01 00:02:00   120  5.330e-009
 Jan01 00:03:00   180  1.084e-008
 Jan01 23:57:00 86220  1.575e-007
 Jan01 23:58:00 86280  1.575e-007
 Jan01 23:59:00 86340  1.575e-007
 Jan01 24:00:00 86400  1.575e-007
 level: firstfloor    zone: kitchen
 Jan01 00:00:00     0  0.000e+000
 Jan01 00:01:00    60  1.483e-009
 Jan01 00:02:00   120  5.315e-009
 Jan01 00:03:00   180  1.081e-008
 Jan01 23:57:00 86220  1.564e-007
 Jan01 23:58:00 86280  1.564e-007
 Jan01 23:59:00 86340  1.564e-007
 Jan01 24:00:00 86400  1.564e-007
 level: firstfloor    zone: bedroom2
 Jan01 00:00:00     0  0.000e+000
 Jan01 00:01:00    60  1.486e-009
 Jan01 00:02:00   120  5.321e-009
 Jan01 00:03:00   180  1.081e-008
 Jan01 23:57:00 86220  1.549e-007
 Jan01 23:58:00 86280  1.549e-007
 Jan01 23:59:00 86340  1.549e-007
 Jan01 24:00:00 86400  1.549e-007

The final goal is to generate, for each house, one graph of ozone concentration versus time that contains all of the zones for that house. At present I am having trouble importing the data. I can use the following code to read the first zone in one file. I only need the data from the fourth column. I do not need the first 9 lines (header info) or the 3 lines in between zones, but I need the data for each zone to be its own data set.

fid=fopen('House1-1.txt');
temp=textscan(fid,'%*s %*s %*d %f','Headerlines',9);
fclose(fid);

I cannot figure out how to create a loop that reads to the end of each file and puts the data for each zone into its own array. I also need the loop to read each house file within the folder. Any help would be appreciated.

Answer 1

Cedric on 27 Mar 2013

https://fr.mathworks.com/matlabcentral/answers/68846-using-textscan-to-import-an-ascii-file-with-a-header-and-blank-lines-between-different-data-sets#answer_80186

Edited: Cedric on 27 Mar 2013

An alternative could be to use REGEXP to get blocks of data, e.g. in a struct array, and then post-process the content. To illustrate using the content that you gave:

 >> buffer  = fileread('myData.txt') ;
 >> pattern = 'level:\s*(?<level>\S+)\s+zone:\s*(?<zone>\S+)\s*(?<data>.*?)(?=($|level))' ;
 >> blocks = regexp(buffer, pattern, 'names' )
 blocks = 
 1x3 struct array with fields:
    level
    zone
    data
 >> blocks(1)
 ans = 
    level: 'firstfloor'
     zone: 'bedroom1'
     data: [1x282 char]
 >> blocks(2)
 ans = 
    level: 'firstfloor'
     zone: 'kitchen'
     data: [1x282 char]
 >> blocks(3)
 ans = 
    level: 'firstfloor'
     zone: 'bedroom2'
     data: [1x277 char]

So, using a simple loop, you can process all blocks already parsed:

 for k = 1 : length(blocks)
    fprintf('Level = %s, zone = %s\n', blocks(k).level, blocks(k).zone) ;
    % do something with blocks(k).data here, e.g. parse it with textscan
 end
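
As a minimal sketch of what "do something with textscan" could look like when only the fourth column (ozone) is wanted, the loop below stores one numeric vector per zone. The sample text in buffer and the variable name ozone are assumptions made for illustration; with a real file, buffer would come from fileread as above.

```matlab
% Hypothetical two-zone sample standing in for the content of a house file
buffer = sprintf([ ...
    ' level: firstfloor    zone: bedroom1\n' ...
    ' Jan01 00:00:00     0  0.000e+000\n'    ...
    ' Jan01 00:01:00    60  1.487e-009\n'    ...
    ' level: firstfloor    zone: kitchen\n'  ...
    ' Jan01 00:00:00     0  0.000e+000\n'    ...
    ' Jan01 00:01:00    60  1.483e-009\n']);

% Same pattern as in the answer: one struct per "level ... zone ..." block
pattern = 'level:\s*(?<level>\S+)\s+zone:\s*(?<zone>\S+)\s*(?<data>.*?)(?=($|level))';
blocks  = regexp(buffer, pattern, 'names');

% One cell per zone, each holding the ozone column as a double vector
ozone = cell(numel(blocks), 1);
for k = 1 : numel(blocks)
    % %*s %*s %*d skip date, time of day and time [s]; %f keeps ozone
    C = textscan(blocks(k).data, '%*s %*s %*d %f');
    ozone{k} = C{1};
    fprintf('Level = %s, zone = %s, %d samples\n', ...
            blocks(k).level, blocks(k).zone, numel(ozone{k}));
end
```

Keeping the vectors in a cell array avoids assuming that every zone has the same number of rows.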

Answer 2

per isakson on 27 Mar 2013

https://fr.mathworks.com/matlabcentral/answers/68846-using-textscan-to-import-an-ascii-file-with-a-header-and-blank-lines-between-different-data-sets#answer_80181

Edited: per isakson on 27 Mar 2013

Here is one of many alternate solutions.

    >> [ header, block_head, block_data ] = cssm()
    header = 
        ' project:  House1_1                              Tue Mar 19 12:30:42 2013'
        ''
        ' description:  '
        ''
        ' date   time    time       Ozone'
        '       of day    [s]     [kg/kg]'
        ''
    block_head = 
        ' level: firstfloor    zone: bedroom1'
        'zone: kitchen'
        'zone: bedroom2'
    block_data = 
        [8x1 double]
        [8x1 double]
        [8x1 double]
    >>

The values of block_head are obviously corrupted.

where cssm is

    function [ header, block_head, block_data ] = cssm()
        fid = fopen( 'cssm.txt' );
    %   cac = textscan( fid, '%[^\n]' ); swallows empty lines
        cac = textscan( fid, '%s', 'Delimiter', '\n' );
        fclose( fid );
        ixs = find( strncmp( 'level:', cac{:}, 6 ) );
        fid = fopen( 'cssm.txt' );
        header = cell( ixs(1)-1, 1 ); 
        for ii = 1 : ixs(1)-1
            header{ii} = fgetl( fid );
        end
        nnblock     = numel( ixs );
        ixs(end+1)  = size( cac{:}, 1 );
        block_head  = cell( nnblock, 1 );
        block_data  = cell( nnblock, 1 );
        for iib = 1 : nnblock
            block_head{iib} = fgetl( fid );
            block_data(iib) = textscan(fid,'%*s%*s%*d%f', ixs(iib+1)-ixs(iib) );   
        end
        fclose( fid );
    end

and cssm.txt consists of the data lines in your question.

Next try without reading block_head:

    >> [ header, block_head, block_data ] = cssm()
    header = 
        ' project:  House1_1                              Tue Mar 19 12:30:42 2013'
        ''
        ' description:  '
        ''
        ' date   time    time       Ozone'
        '       of day    [s]     [kg/kg]'
    block_head = 
        []
        []
        []
    block_data = 
        [8x1 double]
        [8x1 double]
        [8x1 double]

where cssm is

    function [ header, block_head, block_data ] = cssm()
        fid = fopen( 'cssm.txt' );
    %   cac = textscan( fid, '%[^\n]' ); swallows empty lines
        cac = textscan( fid, '%s', 'Delimiter', '\n' );
        fclose( fid );
        ixs = find( strncmp( 'level:', cac{:}, 6 ) );
        fid = fopen( 'cssm.txt' );
        header = cell( ixs(1)-2, 1 ); 
        for ii = 1 : ixs(1)-2
            header{ii} = fgetl( fid );
        end
        nnblock     = numel( ixs );
        ixs(end+1)  = size( cac{:}, 1 ) + 2;
        block_head  = cell( nnblock, 1 );
        block_data  = cell( nnblock, 1 );
        for iib = 1 : nnblock
            block_data(iib) = textscan( fid, '%*s%*s%*d%f'      ...
                                    ,   ixs(iib+1)-ixs(iib)-3   ...
                                    ,   'Headerlines', 3        );   
        end
        fclose( fid );
    end

One more try:

    >> [ header, block_head, block_data ] = cssm()
    header = 
        ' project:  House1_1                              Tue Mar 19 12:30:42 2013'
        ''
        ' description:  '
        ''
        ' date   time    time       Ozone'
        '       of day    [s]     [kg/kg]'
    block_head = 
        {3x1 cell}
        {3x1 cell}
        {3x1 cell}
    block_data = 
        [8x1 double]
        [8x1 double]
        [8x1 double]
    >> block_head{1}
    ans = 
        ''
        'level: firstfloor    zone: bedroom1'
        ''
    >> block_head{2}
    ans = 
        ''
        ''
        'level: firstfloor    zone: kitchen'
    >> block_head{3}
    ans = 
        ''
        ''
        'level: firstfloor    zone: bedroom2'

block_head{2} and block_head{3} each contain two successive empty "lines". However, the data file nowhere contains two consecutive empty lines. I find this strange.

where cssm is

    function [ header, block_head, block_data ] = cssm()
        fid = fopen( 'cssm.txt' );
    %   cac = textscan( fid, '%[^\n]' ); swallows empty lines
        cac = textscan( fid, '%s', 'Delimiter', '\n' );
        fclose( fid );
        ixs = find( strncmp( 'level:', cac{:}, 6 ) );
        fid = fopen( 'cssm.txt' );
        header = cell( ixs(1)-2, 1 ); 
        for ii = 1 : ixs(1)-2
            header{ii} = fgetl( fid );
        end
        nnblock     = numel( ixs );
        ixs(end+1)  = size( cac{:}, 1 ) + 2;
        block_head  = cell( nnblock, 1 );
        block_data  = cell( nnblock, 1 );
        for iib = 1 : nnblock
            block_head(iib) = textscan( fid, '%s', 3, 'Delimiter', '\n' ); 
            block_data(iib) = textscan( fid, '%*s%*s%*d%f'      ...
                                    ,   ixs(iib+1)-ixs(iib)-3   ...
                                    ,   'Headerlines', 0        );   
        end
        fclose( fid );
    end

Discussion:

There must be a better way to handle empty lines.
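
One possible way around the empty-line bookkeeping, offered only as a sketch and not as a tested replacement for cssm above, is to split the text into lines first, drop the empty ones, and then slice the remaining lines between successive 'level:' lines. The sample text in raw and the names txt, isHead and rows are assumptions for illustration; with a real file, raw would come from fileread.

```matlab
% Hypothetical sample with an empty line after every line
raw = sprintf([ ...
    ' project:  House1_1\n\n'                   ...
    ' date   time    time       Ozone\n\n'      ...
    ' level: firstfloor    zone: bedroom1\n\n'  ...
    ' Jan01 00:00:00     0  0.000e+000\n\n'     ...
    ' Jan01 00:01:00    60  1.487e-009\n\n'     ...
    ' level: firstfloor    zone: kitchen\n\n'   ...
    ' Jan01 00:00:00     0  0.000e+000\n\n'     ...
    ' Jan01 00:01:00    60  1.483e-009\n']);

% Split into lines, trim surrounding blanks, and discard empty lines
txt = strtrim(regexp(raw, '\r?\n', 'split'));
txt = txt(~cellfun(@isempty, txt));

% Locate the block heads; the extra index marks the end of the last block
isHead  = strncmp('level:', txt, 6);
ixs     = [find(isHead), numel(txt) + 1];
nnblock = numel(ixs) - 1;

header     = txt(1 : ixs(1)-1);
block_head = txt(ixs(1:end-1));
block_data = cell(nnblock, 1);
for iib = 1 : nnblock
    % Lines strictly between this head and the next one
    rows = txt(ixs(iib)+1 : ixs(iib+1)-1);
    C = textscan(strjoin(rows, sprintf('\n')), '%*s%*s%*d%f');
    block_data{iib} = C{1};
end
```

Because the empty lines are removed before any indexing, no Headerlines offsets or "+2" corrections are needed. The cost is that the whole file is held in memory as a cell array of lines.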

Answer 3

Kristia on 27 Mar 2013

https://fr.mathworks.com/matlabcentral/answers/68846-using-textscan-to-import-an-ascii-file-with-a-header-and-blank-lines-between-different-data-sets#answer_80188

Thanks for the suggestions, but I am really confused by both of the answers posted so far. The first answer, from Per Isakson, keeps giving me error messages such as "unexpected MATLAB operator" or "Function definitions are not permitted in this context", and the second answer, from Cedric Wannaz, gives me the "unexpected MATLAB operator" error as well. All of the errors occur on the first line of code.

I am really new to MATLAB, so I am not completely sure what either of these pieces of code is doing, and I don't know what I am doing wrong when I put them into MATLAB. I can't get past the first line of either suggestion.

I did find that I can also use dlmread to import the first data set but again I don't know how to get to the rest of the data using a loop. The following is what I did using dlmread:

M=dlmread('House1-1.txt','',[9 3 1449 3]);

Doing this gives me all of the column 4 data for zone 1. What I need is a way to loop this: skip the next 3 rows, import the next 1440 rows to get the data for zone 2, and repeat until the end of the file. I am not sure whether that is possible. I only need the 4th column of data (the ozone concentration). I do not need any of the header information imported.

From everything that I have read, it seems dlmread should not work at all and that I need to use textscan, but the above line does import the data. I was also able to change the range to get the 4th column data for the other zones, but I can't put it into a loop.

Thanks for the help!

Cedric on 27 Mar 2013

Edited: Cedric on 29 Mar 2013

Don't copy the >> from our code; it represents the prompt in the command window. For my code, execute the following:

 buffer  = fileread('House1-1.txt') ;
 pattern = 'level:\s*(?<level>\S+)\s+zone:\s*(?<zone>\S+)\s*(?<data>.*?)(?=($|level))' ;
 blocks  = regexp(buffer, pattern, 'names') ;

If it works, the variable blocks is a struct array, which is an array of structs (variables with fields).

length(blocks)

will give you the number of structs present in the array, and you can do as follows for accessing e.g. the field level of struct 1:

blocks(1).level

There are two other fields: zone and data. EDIT: You can process data as follows:

D = textscan(blocks(1).data, '%s %d:%d:%d %d %f') ;

and you will see that D is a cell array that contains the data parsed.
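
To make the layout of D concrete, here is a small sketch on a two-line sample. The string sample and the names dateStr, t and ozone are assumptions for illustration; the format string is the one used above.

```matlab
% Hypothetical two-row sample in the same layout as blocks(k).data
sample = sprintf([ ...
    'Jan01 00:00:00     0  0.000e+000\n' ...
    'Jan01 00:01:00    60  1.487e-009\n']);

D = textscan(sample, '%s %d:%d:%d %d %f');

% One cell per conversion specifier in the format string
dateStr = D{1};          % cell array of strings, e.g. 'Jan01'
hh      = D{2};          % hours   (int32, because of %d)
mm      = D{3};          % minutes (int32)
ss      = D{4};          % seconds (int32)
t       = double(D{5});  % time [s], converted from int32 to double
ozone   = D{6};          % ozone [kg/kg], already double because of %f
```

The colons in the format are matched literally, which is why the time of day ends up split over three cells.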

The issue, if you are just beginning with MATLAB, is that you are dealing with a file that has a two-level structure, which is not the easiest thing to manage.

Per's solution is the standard approach I would say for files with some structure. My approach is based on pattern matching (using regular expressions); it is less standard for files with some structure, but I thought that the outcome of REGEXP would be simpler for you to process (I'm not sure about that though).

=== EDIT ===

Here is a more complete (working) example:

 buffer  = fileread('House1-1.txt') ;
 pattern = 'level:\s*(?<level>\S+)\s+zone:\s*(?<zone>\S+)\s*(?<data>.*?)(?=($|level))' ;
 blocks  = regexp(buffer, pattern, 'names') ;
 for k = 1 : length(blocks)
    D = textscan(blocks(k).data, '%s %d:%d:%d %d %f') ;
    figure(k) ;
    plot(D{5}, D{6}) ;
    grid on ;
    title(sprintf('Level = %s, zone = %s\n', blocks(k).level, blocks(k).zone));
    xlabel('Time [s]') ;
    ylabel('Ozone [kg/kg]') ;
 end

But again, it won't be simple if you just started MATLAB, as it mixes regular expressions, struct arrays, cell arrays, etc.
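
The question also asked for one graph per house containing all zones, and for a loop over every house file in a folder. A possible extension of the example above is sketched below. It writes a small hypothetical file to a temporary folder so that it can run on its own; the folder location and the 'House*.txt' file mask are assumptions and would be replaced by the real data folder.

```matlab
% Create a hypothetical house file in a temporary folder (illustration only)
folder = tempname;
mkdir(folder);
fid = fopen(fullfile(folder, 'House1-1.txt'), 'w');
fprintf(fid, ' project:  House1_1\n');
fprintf(fid, ' level: firstfloor    zone: bedroom1\n');
fprintf(fid, ' Jan01 00:00:00     0  0.000e+000\n');
fprintf(fid, ' Jan01 00:01:00    60  1.487e-009\n');
fprintf(fid, ' level: firstfloor    zone: kitchen\n');
fprintf(fid, ' Jan01 00:00:00     0  0.000e+000\n');
fprintf(fid, ' Jan01 00:01:00    60  1.483e-009\n');
fclose(fid);

pattern = 'level:\s*(?<level>\S+)\s+zone:\s*(?<zone>\S+)\s*(?<data>.*?)(?=($|level))';

% Loop over all house files in the folder, one figure per house
files = dir(fullfile(folder, 'House*.txt'));
for f = 1 : numel(files)
    buffer = fileread(fullfile(folder, files(f).name));
    blocks = regexp(buffer, pattern, 'names');
    figure;
    hold on;
    for k = 1 : numel(blocks)
        % Skip date and time of day; read time [s] and ozone as doubles
        D = textscan(blocks(k).data, '%*s %*s %f %f');
        plot(D{1}, D{2});
    end
    hold off;
    grid on;
    legend({blocks.zone});                        % one legend entry per zone
    title(files(f).name, 'Interpreter', 'none');  % show the file name as-is
    xlabel('Time [s]');
    ylabel('Ozone [kg/kg]');
end
```

Reading the time column with %f instead of %d keeps both plot arguments as doubles.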

Kristia on 1 Apr 2013

I did copy your edit and it works great! Thanks again!!

Cedric on 1 Apr 2013

Edited: Cedric on 1 Apr 2013

You're welcome! Don't forget to [ Accept ] one of the answers if it helped, and if you accept mine, don't forget to /\ vote for Per Isakson's answer as well, because he took time to write and test a quite complete answer that is indeed the standard way for processing this kind of file structure (my answer is more compact, but less standard).

Answer 4

Gabriel Felix on 24 May 2020

https://fr.mathworks.com/matlabcentral/answers/68846-using-textscan-to-import-an-ascii-file-with-a-header-and-blank-lines-between-different-data-sets#answer_437753

I had to use \n at the end of each line. Without it I couldn't make textscan() work properly, even though "HeaderLines" was set according to the lines of the text file. This was the only solution I found after struggling with the code for an entire day.

This was the text:

!
!
! alfa (graus) =  5.0
!
! Id.      x/s       z/s     alfai       cl     c*cl/cmed    cdi      cmc/4
!                           (graus)
   1      .246      .050    -1.209      .255      .332     .00538     .0170
   2      .292      .150    -1.098      .259      .319     .00496     .0545
   3      .339      .250     -.925      .254      .297     .00410     .0944
   4      .385      .350     -.741      .243      .268     .00315     .1341
   5      .432      .450     -.561      .227      .235     .00223     .1714
   6      .479      .550     -.393      .206      .199     .00141     .2034
   7      .525      .650     -.238      .181      .163     .00075     .2266
   8      .572      .750     -.101      .152      .126     .00027     .2362
   9      .619      .850      .014      .116      .089    -.00003     .2236
  10      .659      .938      .103      .074      .052    -.00013     .1693
!
! CL asa    =  .208
! CDi asa   =  .00258
! e (%)     =  88.9
! CMc/4 asa =  .1339

My code:

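% Assumed context (not shown in the post): fid comes from fopen on the text file above,
% and n is defined elsewhere (presumably the number of data rows)
%! alfa (graus) =  5.0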
P = textscan(fid,'! alfa (graus) = %f','Delimiter',' ','MultipleDelimsAsOne',true,'headerLines',2,'CollectOutput',1);
alpha(1) = P{1};
%! CL asa    =  .208
P = textscan(fid,'! CL asa = %f\n','Delimiter',' ','MultipleDelimsAsOne',true,'CollectOutput',1,'headerLines',4+n);
CL(1) = P{1};
%! CDi asa   =  .00258
P = textscan(fid,'! CDi asa   =  %f\n','Delimiter',' ','MultipleDelimsAsOne',true,'CollectOutput',1,'headerlines',0);
CDi(1) = P{1};
%! CMc/4 asa =  .1339
P = textscan(fid,'! CMc/4 asa = %f','Delimiter',' ','MultipleDelimsAsOne',true,'CollectOutput',1,'HeaderLines',2);
Cmc4(1) = P{1};
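
A different technique that avoids counting header lines altogether is to read the text into a string and pick out each "label = value" pair with regexp. This is only a sketch and not the approach used above; the sample text txt and the helper getVal are assumptions for illustration, and with a real file txt would come from fileread.

```matlab
% Hypothetical shortened copy of the text shown above
txt = sprintf([ ...
    '!\n!\n! alfa (graus) =  5.0\n!\n'                 ...
    '   1      .246      .050    -1.209      .255\n'   ...
    '!\n! CL asa    =  .208\n! CDi asa   =  .00258\n'  ...
    '! e (%%)     =  88.9\n! CMc/4 asa =  .1339\n']);

% Hypothetical helper: return the number that follows "<label> =" in txt
getVal = @(label) str2double(regexp(txt, ...
    [regexptranslate('escape', label) '\s*=\s*(\S+)'], 'tokens', 'once'));

alpha = getVal('alfa (graus)');
CL    = getVal('CL asa');
CDi   = getVal('CDi asa');
Cmc4  = getVal('CMc/4 asa');
```

regexptranslate('escape', ...) protects the parentheses in the label from being read as regular-expression syntax.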
