extracting the lines of interest to a matrix from a text

Question

Homayoon on 11 June 2015

https://fr.mathworks.com/matlabcentral/answers/223379-extracting-the-lines-of-interest-to-a-matrix-from-a-text

Edited: per isakson on 22 June 2015

Dear All, I have tried for about two hours, but I could not figure out what the problem with the code is. Sorry to repost it to the forum!

I have a huge text file in the following format:

 *********************************
 timestep           455
 No_Specs             3
 H2                  49
 H2O2                 1
 O2                  49
 *********************************
 timestep           460
 No_Specs             3
 H2                  49
 H2O2                 1
 O2                  49
 *********************************
 timestep           465
 No_Specs             2
 H2                  50
 O2                  50
 *********************************

As you can see, the text file consists of many blocks, each 4-10 lines long. What I want is simply to put the number written in front of timestep into the first column of a matrix. I also need to find 'HO2 ' (the trailing space is needed to avoid confusion with other species) in each block and put the number in front of it into the second column of that matrix. Obviously, if there is no 'HO2 ' in a block, the value in that row is zero.

Here is the code:

fid=fopen('fic.txt');
l=fgetl(fid);
k=1;
while ischar(l)
  r{k}=l;
  k=k+1;
  l=fgetl(fid);
end
fclose(fid);
idx=find(~cellfun(@isempty,regexp(r,'(?=timestep).+')));
a=regexp(r(idx),'\d+','match');
b=str2double([a{:}]);
ii=diff([idx numel(r)+1])-1;
for k=1:numel(b);
  s=r(idx(k)+1:ii(k));
  jj=find(~cellfun(@isempty,regexp(s,'(?=HO2 ).+')));
  c=regexp(s(jj),'\d+','match');
  if isempty(c)
      f(k)=0;
  else
      f(k)=str2double(c{1});
  end
end
M=[b' f']

The problem with the code is that the elements of the second column are all zero! I hope you can help me. I appreciate your help! Best

2 comments

per isakson on 11 June 2015

Edited: per isakson on 11 June 2015

"number written in front of timestep": is that the number to the right of the string timestep?
What do xlswrite and fprintf have to do with the question?

Why don't you provide a sample of how you want the result?

Homayoon on 11 June 2015

Dear Sir, okay! For the example in the question, the result for H2O2 should be in the following format:

     1
     1
     0

Answer 1

per isakson on 12 June 2015

https://fr.mathworks.com/matlabcentral/answers/223379-extracting-the-lines-of-interest-to-a-matrix-from-a-text#answer_182414

Edited: per isakson on 22 June 2015

An alternative approach. The function cssm transfers the entire content of the text file to a structure array. This structure is then used for reporting.

Regarding "a huge text file": this approach requires that the text of the file, together with the structure, fits in memory.

>> out = cssm()
out = 
3x1 struct array with fields:
    H2
    H2O2
    No_Specs
    O2
    timestep
>> for jj = 1 : 3, fprintf( '%8d%8d\n', out(jj).timestep, out(jj).H2O2 ), end
     455       1
     460       1
     465       0
>> permute( [ out.timestep; out.H2O2 ], [2,1] )
ans =
   455     1
   460     1
   465     0

where

function    out = cssm()
    % Read the whole file and return one struct element per section.
    str = fileread( 'H2O2.txt' );
    section_separator = '[\*]{30,}';    % a row of at least 30 "*"
    cac = strsplit( str, section_separator              ...
                ,   'DelimiterType', 'RegularExpression' );
    % Drop empty and whitespace-only pieces around the separators
    cac( cellfun( @(s) isempty( strtrim( s ) ), cac ) ) = [];
    len   = length( cac );
    names = create_list_of_names_( cac );
    out   = initiate_structure_( len, names, 0 );
    for jj = 1 : len
        out(jj) = parse_one_section_( cac{jj}, out(jj) );
    end
end
function    sas = parse_one_section_( str, sas )
    % Each line is "<name> <number>"; store the number in the field <name>.
    cac = textscan( str, '%s%f' );
    for jj = 1 : length( cac{1,1} )
        sas.( cac{1,1}{jj} ) = cac{1,2}(jj);
    end
end
function    cac = create_list_of_names_( sections )
    % Collect the unique names over all sections as a row cell array.
    str = cat( 2, sections{:} );
    cac = textscan( str, '%s%*f' );
    cac = permute( unique( cac{1} ), [2,1] );
end
function    sas = initiate_structure_( len, names, val )
    % Create a len-by-1 struct array with every field preset to val.
    cell_values = num2cell( val( ones(len,length(names)) ) );
    sas = cell2struct( cell_values, names, 2 );
end
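
If the file really is too large to hold in memory, the same two-column result could be built by reading one line at a time. The following is only a sketch under assumptions: the function name read_species_stream is hypothetical (it is not part of cssm), every data line is assumed to have the form "name integer", and the species name is compared exactly, so no trailing space is needed to tell HO2 from other names.

```matlab
function M = read_species_stream( filename, species )
% Hypothetical helper, not part of the answer above.
% Returns an N-by-2 matrix [timestep, value of species]; the value is 0
% when the species does not occur in a section.
    fid = fopen( filename, 'r' );
    assert( fid ~= -1, 'Could not open %s', filename );
    closer = onCleanup( @() fclose( fid ) );   % close the file even on error
    M = zeros( 0, 2 );
    tline = fgetl( fid );
    while ischar( tline )
        % Expect "<name> <integer>"; separator rows of "*" do not match.
        tok = regexp( tline, '^\s*(\S+)\s+(\d+)\s*$', 'tokens', 'once' );
        if ~isempty( tok )
            if strcmp( tok{1}, 'timestep' )
                M(end+1,:) = [ str2double( tok{2} ), 0 ];   % start a new row
            elseif strcmp( tok{1}, species ) && ~isempty( M )
                M(end,2) = str2double( tok{2} );            % fill current row
            end
        end
        tline = fgetl( fid );
    end
end
```

A call such as M = read_species_stream( 'fic.txt', 'HO2' ) would then return the matrix directly. Growing M row by row is simple but slow for very many timesteps; preallocating in chunks would be a reasonable refinement.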

Answer 2

Guillaume on 11 June 2015

https://fr.mathworks.com/matlabcentral/answers/223379-extracting-the-lines-of-interest-to-a-matrix-from-a-text#answer_182343

Edited: Guillaume on 11 June 2015

You're way overcomplicating it:

content = fileread('fic.txt'); %read all file at once:
tsteps = regexp(content, 'timestep\s+(\d+)\s+[^*]*?(HO2\s+\d+|\*)', 'tokens');
out = cell2mat(cellfun(@(tstep) [str2double(tstep{1}) str2double(regexp(tstep{2}, '\d+', 'match', 'once'))], tsteps', 'UniformOutput', false))

The first regular expression captures the first number after 'timestep', then matches anything but '*' until it finds either 'HO2' followed by a number, or a '*'. The 'HO2' with its number, or the '*', is the second capture. (Unfortunately you can't capture just the number due to limitations of the MATLAB regular expression engine. You can't have a capture within a non-capturing group.) In the end, for each timestep you get a cell containing a 1x2 cell array whose 1st cell is the timestep and whose 2nd cell is the 'HO2' line if present, or '*' if not.

The 2nd regular expression extracts the number from the 'HO2' line and passes it to str2double (along with the timestep). If there's no 'HO2' line, then regexp returns empty, which str2double converts to NaN.

Note that your example does not have an HO2 line!

2 comments

Homayoon on 11 June 2015

Thanks Guillaume! However, the code failed to capture the real values for HO2. It has no problem reporting the time steps, and you were right that if there is no HO2 the cell is empty. But whenever there is a line starting with HO2, the cell always shows the number 2, no matter what the real value is.

Guillaume on 11 June 2015

Oh! Of course, it's capturing the '2' of 'HO2'. Just replace the second regular expression with

regexp(tstep{2}, '(?<=HO2\s+)\d+', 'match', 'once')
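
Putting the pieces of this answer together, a complete version might look like the sketch below. The assumptions: the sample text and its HO2 value of 7 are made up for illustration (the data in the question has no HO2 line), a capture token is used instead of the lookbehind to skip the '2' in the name, and missing values are written as 0 rather than NaN, as the question asked. With real data, fileread('fic.txt') would replace the sprintf call.

```matlab
% Hypothetical sample; the HO2 value 7 is invented for illustration.
content = sprintf( [ ' *****\n timestep   455\n No_Specs     3\n' ...
                     ' H2          49\n HO2          7\n O2          49\n' ...
                     ' *****\n timestep   460\n No_Specs     2\n' ...
                     ' H2          50\n O2          50\n *****\n' ] );

% Token 1: the timestep number. Token 2: the HO2 line, or '*' if none.
tsteps = regexp( content, 'timestep\s+(\d+)\s+[^*]*?(HO2\s+\d+|\*)', 'tokens' );

out = zeros( numel( tsteps ), 2 );          % second column defaults to 0
for k = 1 : numel( tsteps )
    out(k,1) = str2double( tsteps{k}{1} );
    % Capture only the digits after the name, not the '2' inside 'HO2'.
    n = regexp( tsteps{k}{2}, 'HO2\s+(\d+)', 'tokens', 'once' );
    if ~isempty( n )
        out(k,2) = str2double( n{1} );
    end
end
disp( out )
```

One caveat: the pattern 'HO2' would also match inside a longer species name that contains those three characters, so a stricter pattern may be needed if such names occur in the file.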
