regexp: Extract optional named tokens

Question

Hau Kit Yong on 2 Jul 2019

https://fr.mathworks.com/matlabcentral/answers/469930-regexp-extract-optional-named-tokens

Edited: Akira Agata on 3 Jul 2019

Attachment: trip-data.txt

I would like to extract some information from the following text:

There are 3 groups in the text. I want to extract the genders (enclosed in brackets), the group names (the text following 'Name:') and the student IDs for each group (the numbers following 'ID XX =').

My desired output is as follows:

The issue is that not every group has a header line (the lines starting with '#'); group 3, for example, has none.

My code is as follows:

str = fileread('trip-data.txt');
expr = 'Student group.+?\((?<Gender>\w+?)\).*?Name:(?<Name>.+?)\nGROUP.+?=(?<IDs>.+?(,\s*\n.+?)*)(?=(\n|$))';
groups = regexp(str, expr, 'names');

The returned struct array ignores group 3:

I have also tried making the header line optional by enclosing it in a group followed by a '?' quantifier, '()?', like so:

expr = '(Student group.+?\((?<Gender>\w+?)\).*?Name:(?<Name>.+?))?\nGROUP.+?=(?<IDs>.+?(,\s*\n.+?)*)(?=(\n|$))';

The returned struct array captures the 'IDs' field, but not the 'Gender' and 'Name' fields, for all 3 groups:

2 Comments

Rik on 2 Jul 2019

Do you absolutely need to use regexp? It might be easier with other tools (if perhaps slightly less efficient).

Hau Kit Yong on 2 Jul 2019

I would like to, yes, because the text is a small snippet of a much larger file with varying formats that I am already parsing with other expressions.
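Staying with a single regexp call, one way to make the header optional is sketched below; it is not from the thread. The header is wrapped in a non-capturing optional group '(?:...)?', and the 'lineanchors' option makes ^ and $ match at the start and end of every line, so both the header and the GROUP line must begin at a line start. When a group has no header, the named tokens inside the optional group should come back empty instead of the whole group being dropped. Because trip-data.txt is not reproduced in the thread, the sample text (including the names Alpha and Beta and the ID values) is hypothetical, and the layout is an assumption based on the expressions above: the header line sits directly above its GROUP line, and an ID list continues onto the next line after a trailing comma.

```matlab
% Hypothetical sample text (trip-data.txt is not shown in the thread);
% with the real file this would be str = fileread('trip-data.txt');
str = strjoin({'# Student group 1 (male) Name: Alpha', ...
               'GROUP 1 ID 01 = 101, 102,', ...
               '    103', ...
               '# Student group 2 (female) Name: Beta', ...
               'GROUP 2 ID 02 = 201, 202', ...
               'GROUP 3 ID 03 = 301, 302, 303'}, newline);

% Optional header: a '#' line containing 'Student group', the gender in
% parentheses and 'Name:'. Its named tokens stay empty when it is absent.
header = ['(?:^#[^\n]*?Student group[^\n]*?\((?<Gender>\w+)\)' ...
          '[^\n]*?Name:\s*(?<Name>[^\n]*?)\s*\n)?'];
% Mandatory GROUP line; the ID list may continue after a trailing comma
body = '^GROUP[^=\n]*=\s*(?<IDs>[^\n]*?(?:,\s*\n[^\n]*?)*)$';

% 'lineanchors' makes ^ and $ match at the start and end of every line
groups = regexp(str, [header body], 'names', 'lineanchors');

% Convert each captured ID list (possibly spanning two lines) to numbers
for k = 1:numel(groups)
    groups(k).IDs = str2double(regexp(groups(k).IDs, '\d+', 'match'));
end
```

With this sample, groups should be a 1-by-3 struct array whose third element has empty Gender and Name fields. Drop the final loop if the IDs should stay as text.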

Answer 1

Akira Agata on 3 Jul 2019

https://fr.mathworks.com/matlabcentral/answers/469930-regexp-extract-optional-named-tokens#answer_381868

Edited: Akira Agata on 3 Jul 2019

How about extracting 'Name', 'Gender' and 'ID' one by one?

The following is an example.

% Read the file
str = fileread('trip-data.txt');
% Remove the line breaks inside an ID list (join continuation lines)
str = regexprep(str,'\r\n\s+','');
% Join the 'Name: XX' line with the GROUP line that follows it
str = regexprep(str,'(Name:\s+\w+)\r\n','$1, ');
% Store each line as an element of a column cell array
c = strsplit(str,'\r\n')';
% Extract one by one
Name = erase(regexp(c,'Name:\s(\w+)','match','once'),'Name: ');
Gender = regexp(c,'(male|female)','match','once');
ID = strtrim(extractAfter(c,'='));
% Summarize as a table
tbl = table(Name,Gender,ID);
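The code above splits on '\r\n', so it expects Windows (CRLF) line endings, and it keeps each group on one line by joining the 'Name: XX' line onto the next. A minimal variant of the same one-by-one idea, sketched here rather than taken from the answer, first normalises line endings and then cuts the text into one block per group with a line-anchored pattern whose '#' header line is optional; blocks without a header then give '' for Name and Gender. The sample text is hypothetical, since trip-data.txt is not reproduced in the thread, and its layout (header line directly above its GROUP line, indented continuation lines, one space after 'Name:', gender as the word in parentheses) is an assumption.

```matlab
% Hypothetical sample text with CRLF line endings (the real file is not shown)
str = strjoin({'# Student group 1 (male) Name: Alpha', ...
               'GROUP 1 ID 01 = 101, 102,', ...
               '    103', ...
               '# Student group 2 (female) Name: Beta', ...
               'GROUP 2 ID 02 = 201, 202', ...
               'GROUP 3 ID 03 = 301, 302, 303'}, sprintf('\r\n'));

% Normalise line endings so the code also works on LF-only files
str = strrep(str, sprintf('\r\n'), newline);
% Join indented continuation lines onto the preceding ID line
str = regexprep(str, '\n[ \t]+', ' ');
% One element per group: an optional '#' header line plus its GROUP line
c = regexp(str, '(?:^#[^\n]*\n)?^GROUP[^\n]*', 'match', 'lineanchors')';
% Extract one by one; with 'once', a block without a header gives ''
Name   = regexp(c, '(?<=Name:\s)\w+', 'match', 'once');
Gender = regexp(c, '(?<=\()\w+(?=\))', 'match', 'once');
ID     = strtrim(extractAfter(c, '='));
% Summarize as a table
tbl = table(Name, Gender, ID);
```

To run it on the real file, replace the sample with str = fileread('trip-data.txt') and adjust the patterns if the actual layout differs from the assumed one.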
