Problems with a regex

Thomas on 9 Jul 2013
Hi.
I'm trying to create a regular expression to match and extract some information. Two examples of the source string:
Example one: 10/0/leaf.nr.0 is a Projection error - touches edge - 3D points.csv
Example two: 10/2/leaf.nr.2 is a Projection error - 3D points.csv
I want to extract the string between "is a " and either " - touches edge" or " - 3D". In both example strings this would be "Projection error", but it can be something else.
Currently I have the pattern:
'.*is\sa\s(?<type>.*)(?:\s\-\stouches\sedge)?(?:\s\-\s3D).*.csv'
For example one this returns (not what I expected):
'Projection error - touches edge'
but for example two it returns (as expected):
'Projection error'
If I change the pattern to:
'.*is\sa\s(?<type>.*)(?:\s\-\stouches\sedge)(?:\s\-\s3D).*.csv'
so that (?:\s\-\stouches\sedge) is required to match, it correctly returns:
'Projection error'
for example one, but now example two (which doesn't have the "touches edge" part) does not match, of course.
I don't understand why, with the first pattern, the result for example one also contains " - touches edge" when I ask for that group to be matched 0 or 1 times.
Any help will be highly appreciated.
Best regards, Thomas
Thomas on 9 Jul 2013
My current solution is to use this pattern instead:
'.*is\sa\s(?<type>[\w\s]*)(?:\s\-\s)?.*'
It extracts the needed information, except that an extra space character is added. So the result for both example one and example two is now:
"Projection error "

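The trailing space comes from the character class: `[\w\s]*` accepts spaces, so it also consumes the space in front of the hyphen, and the optional `(?:\s\-\s)?` then simply matches zero times. A minimal sketch, assuming it is acceptable to post-process the token: trim it with `strtrim`, or alternatively make the class lazy and require the ` - ` separator right after it (the second pattern, `lazyPattern`, is an illustrative variant, not one posted in this thread).

```matlab
% Example strings from the question
ex1 = '10/0/leaf.nr.0 is a Projection error - touches edge - 3D points.csv';
ex2 = '10/2/leaf.nr.2 is a Projection error - 3D points.csv';

% Option 1: keep the workaround pattern and strip the trailing space
workaround = '.*is\sa\s(?<type>[\w\s]*)(?:\s\-\s)?.*';
s1 = regexp(ex1, workaround, 'names');   % s1.type still ends with a space
s2 = regexp(ex2, workaround, 'names');
type1 = strtrim(s1.type);
type2 = strtrim(s2.type);

% Option 2 (illustrative variant): lazy class followed by a mandatory ' - ',
% so the space before the hyphen is never captured in the token
lazyPattern = 'is\sa\s(?<type>[\w\s]*?)\s-\s';
t1 = regexp(ex1, lazyPattern, 'names');
t2 = regexp(ex2, lazyPattern, 'names');
```

Option 1 changes nothing in the posted pattern; option 2 avoids the extra `strtrim` call but makes the ` - ` separator mandatory.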

Answers (2)

Muthu Annamalai on 9 Jul 2013
A simple solution for parsing the string with the rule
"is a " and ( " - touches edge" OR " - 3D" )
is to call regexp() sequentially.
That way the "is a" part of your source is split out first, and then you can search for which of the two alternatives is present.
Also see the 'NOT' exclusion class operators in regexp, and the 'split' mode of regexp.
http://www.mathworks.com/help/matlab/ref/regexp.html
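A minimal sketch of the sequential approach, assuming the ' - ' separator never occurs inside the part you want to extract: 'split' mode cuts the string at each separator, a second regexp() takes the text after "is a " from the first piece, and the remaining pieces tell you which alternative is present. The variable names (`examples`, `types`, `touchesEdge`) are illustrative only.

```matlab
ex1 = '10/0/leaf.nr.0 is a Projection error - touches edge - 3D points.csv';
ex2 = '10/2/leaf.nr.2 is a Projection error - 3D points.csv';
examples = {ex1, ex2};

types = cell(size(examples));
touchesEdge = false(size(examples));
for k = 1:numel(examples)
    % Step 1: split at every ' - ' separator
    parts = regexp(examples{k}, ' - ', 'split');
    % Step 2: in the first piece, keep everything after 'is a '
    types{k} = regexp(parts{1}, '(?<=is a ).*', 'match', 'once');
    % Step 3: check which of the two alternatives follows
    touchesEdge(k) = any(strcmp(parts(2:end), 'touches edge'));
end
```

For the two example strings, `types` should hold 'Projection error' twice, and `touchesEdge` is true only for the first one.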
Thomas on 9 Jul 2013
Thanks for your response.
My task is not to match either of the two cases; it's simply to extract the string between "is a " and the first " - " (this is a new, shorter formulation of my problem that I just realized).
Splitting would be one way to go, but I would like to know whether it's possible to create a regex for it.



per isakson on 9 Jul 2013
Edited: per isakson on 9 Jul 2013
Your formulation, to extract the string between "is a " and the first " - ", is already close to pseudo-code for the expression we are looking for.
ex1 = '10/0/leaf.nr.0 is a Projection error - touches edge - 3D points.csv';
ex2 = '10/2/leaf.nr.2 is a Projection error - 3D points.csv';
regexp( ex1, '(?<=is a )[^\-]+(?= \- )', 'match' )
regexp( ex2, '(?<=is a )[^\-]+(?= \- )', 'match' )
returns
ans =
'Projection error'
ans =
'Projection error'
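The same idea also works with a named token like the `(?<type>...)` in the question, if a struct field is more convenient than a cell array. A minimal sketch, swapping the lookarounds for a lazy `[^-]+?` followed by a literal ' - ' (a variant of the pattern above, not the one posted):

```matlab
ex1 = '10/0/leaf.nr.0 is a Projection error - touches edge - 3D points.csv';
ex2 = '10/2/leaf.nr.2 is a Projection error - 3D points.csv';

% Lazy [^-]+? stops at the first position that is followed by ' - ',
% so the space in front of the hyphen is not captured
pattern = 'is a (?<type>[^-]+?) - ';
s1 = regexp(ex1, pattern, 'names');
s2 = regexp(ex2, pattern, 'names');
```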
Search the documentation for "Lookaround Assertions" or just "Lookaround", for example the page "Lookahead Assertions in Regular Expressions".
PS. '\-' or just '-': one backslash (escape) too many seldom hurts, and I have trouble remembering when it's needed.
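A quick check of the PS, assuming standard regex rules apply: outside a character class '-' is an ordinary character, and directly after '^' in `[^-]` it is also taken literally, so both spellings should give the same result here.

```matlab
ex1 = '10/0/leaf.nr.0 is a Projection error - touches edge - 3D points.csv';

% Same pattern, once with escaped hyphens and once without
escaped   = regexp(ex1, '(?<=is a )[^\-]+(?= \- )', 'match');
unescaped = regexp(ex1, '(?<=is a )[^-]+(?= - )', 'match');

sameResult = isequal(escaped, unescaped);   % true for this input
```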
Or, following the original requirement of the OP:
regexp( ex1, '(?<=is a ).+?(?= ((\- touches edge)|(\- 3D)))', 'match' )
regexp( ex2, '(?<=is a ).+?(?= ((\- touches edge)|(\- 3D)))', 'match' )
The extra parentheses, (), make the expression more readable, imo.
The "?" in ".+?" makes the quantifier lazy: "match as few characters as necessary".
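The lazy quantifier also explains the behaviour asked about in the question. In the original pattern the greedy `.*` inside the type token first swallows "Projection error - touches edge"; the optional touches-edge group is satisfied by matching zero times, and the mandatory " - 3D" still matches, so the engine never needs to backtrack. A minimal sketch: the only change to the question's first pattern is `.*?` instead of `.*` in the token (checked here against the two example strings only).

```matlab
ex1 = '10/0/leaf.nr.0 is a Projection error - touches edge - 3D points.csv';
ex2 = '10/2/leaf.nr.2 is a Projection error - 3D points.csv';

% Lazy token: it grows only until the rest of the pattern can match,
% so the optional "touches edge" group gets a chance to consume its text
pattern = '.*is\sa\s(?<type>.*?)(?:\s\-\stouches\sedge)?(?:\s\-\s3D).*.csv';
s1 = regexp(ex1, pattern, 'names');
s2 = regexp(ex2, pattern, 'names');
```

With the greedy `.*`, `s1.type` would instead be 'Projection error - touches edge', as reported in the question.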
