How to extract numeric data between string lines?

Federico Geser on 27 Jan 2021
Edited: Stephen23 on 27 Jan 2021
Hi MATLAB Community
I'm trying to solve this problem, which for sure is not new, but I haven't been able to find a proper solution.
I have a file with several headlines, and then a lot of information in the following way:
Binning n: 1, "De19 ", Event #: 150, Primary(s) weight 1.0000E+00
Number of hit cells: 0
Binning n: 1, "De19 ", Event #: 151, Primary(s) weight 1.0000E+00
Number of hit cells: 1
1 7.185244612628594E-05
Binning n: 1, "De19 ", Event #: 152, Primary(s) weight 1.0000E+00
Number of hit cells: 0
Binning n: 1, "De19 ", Event #: 153, Primary(s) weight 1.0000E+00
Number of hit cells: 0
As shown, sometimes after the "Number of hit cells" line, there are numbers. I would like to extract them in a matrix or array. Is there a way to do this?
I attached an example file. The real file usually contains much more data, which I removed to keep the file size down.
Thank you very much in advance

Accepted Answer

Stephen23 on 27 Jan 2021
Edited: Stephen23 on 27 Jan 2021
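str = fileread('02-2021-Clearance-Box005_fort72.txt'); % read the whole file into one character vector
rgx = '(?<=Number of hit cells:\s+\d+\s+)(\d+[^\n]*)'; % text starting with a digit right after "Number of hit cells: N", up to the end of that line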
tmp = regexp(str,rgx,'match')
tmp = 1x2 cell array
{'1 7.185244612628594E-05'} {'1 2.547905314713717E-04'}
vec = cellfun(@(s)sscanf(s,'%f',[1,Inf]),tmp,'uni',0) % convert to numeric
vec = 1x2 cell array
{1×2 double} {1×2 double}
mat = vertcat(vec{:}) % optional merge into one numeric matrix
mat = 2×2
    1    7.1852e-05
    1    0.00025479
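For anyone who wants to try the approach without the attached file, here is a minimal sketch that runs the same regular expression on the sample lines quoted in the question. The variable name sample is only for illustration, and joining the lines with newline assumes R2016b or later. The lookbehind (?<=...) requires the "Number of hit cells: N" line to come directly before the match, so events with no data line following them produce nothing.

```matlab
% Sample lines copied from the question, joined into one character vector.
sample = { ...
    'Binning n: 1, "De19 ", Event #: 150, Primary(s) weight 1.0000E+00', ...
    'Number of hit cells: 0', ...
    'Binning n: 1, "De19 ", Event #: 151, Primary(s) weight 1.0000E+00', ...
    'Number of hit cells: 1', ...
    '1 7.185244612628594E-05', ...
    'Binning n: 1, "De19 ", Event #: 152, Primary(s) weight 1.0000E+00', ...
    'Number of hit cells: 0'};
str = strjoin(sample, newline);

% Same pattern as in the answer: a line that starts with a digit and
% directly follows a "Number of hit cells: N" line.
rgx = '(?<=Number of hit cells:\s+\d+\s+)(\d+[^\n]*)';
tmp = regexp(str, rgx, 'match');   % cell array holding the matched data line

% Convert each matched line to a numeric row, then stack the rows.
vec = cellfun(@(s) sscanf(s, '%f', [1, Inf]), tmp, 'UniformOutput', false);
mat = vertcat(vec{:});
```

Only the line after event 151 is picked up here, because the lines after events 150 and 152 start with "Binning" rather than a digit.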
4 Comments
Federico Geser on 27 Jan 2021
Hi Stephen!
I think it works, but the test file has 12 MB of data to filter, so it might take a while. I don't know whether this will still work on the real results, which may be about 100 MB.
Nevertheless, very helpful solution! Thank you!
Stephen23 on 27 Jan 2021
Edited: Stephen23 on 27 Jan 2021
If there are always exactly two numbers on each of those lines, then this is probably more efficient:
str = fileread('02-2021-Clearance-Box005_fort72.txt');
rgx = '(?<=Number of hit cells:\s+\d+\s+)(\d+[^\n]*)'; % unchanged
tmp = regexp(str,rgx,'match'); % unchanged
mat = sscanf(sprintf(' %s',tmp{:}),'%f',[2,Inf]).'
mat = 2×2
    1    7.1852e-05
    1    0.00025479
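If memory becomes a concern with files around 100 MB, one possible alternative (not something proposed in this thread) is to read the file line by line with fgetl instead of loading it all with fileread. The sketch below is only that, a sketch under assumptions: data lines hold exactly two numbers, and they only appear directly after a "Number of hit cells" line. Unlike the regular expression above, it would also collect several consecutive data lines after one such line. The temporary file and the names sample, fname, inData and rows are hypothetical, there only to make the example runnable.

```matlab
% Write the sample from the question to a temporary file
% (a stand-in for the real data file).
sample = { ...
    'Binning n: 1, "De19 ", Event #: 150, Primary(s) weight 1.0000E+00', ...
    'Number of hit cells: 0', ...
    'Binning n: 1, "De19 ", Event #: 151, Primary(s) weight 1.0000E+00', ...
    'Number of hit cells: 1', ...
    '1 7.185244612628594E-05', ...
    'Binning n: 1, "De19 ", Event #: 152, Primary(s) weight 1.0000E+00', ...
    'Number of hit cells: 0'};
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '%s\n', sample{:});
fclose(fid);

% Stream the file one line at a time instead of loading it all at once.
fid = fopen(fname, 'r');
rows = {};          % collected 1x2 numeric rows
inData = false;     % true while we are right after a "hit cells" line
tline = fgetl(fid);
while ischar(tline)
    if contains(tline, 'Number of hit cells:')
        inData = true;                  % data lines may follow
    elseif inData
        vals = sscanf(tline, '%f');     % empty if the line is not numeric
        if numel(vals) == 2             % assumption: two numbers per data line
            rows{end+1} = vals.';       %#ok<SAGROW>
        else
            inData = false;             % e.g. the next "Binning" line
        end
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);

mat = vertcat(rows{:});   % one row per data line
```

Whether this is actually faster than the regexp version on a 100 MB file would need to be timed on the real data. The main point is that it never holds the whole file in memory.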

Version

R2019b
