Get unknown strings (text) out of other strings?

Hey guys, can you give me an example that shows how to get information out of a long string, if the word before the information you want is known? Example:
name: Benny age:23 (I want the info: age)
Out of this code
<HTML><FONT color="0000FF">Used Amplification(Hidden)</FONT></HTML>
I want the information: Used Amplification(Hidden)
I guess the key is regexp again... but I think I am wrong... sorry for these stupid questions.

Accepted Answer

Guillaume on 10 Sep 2014
If you don't want to learn the regex syntax, you can use strfind:
before = '<HTML><FONT color="0000FF">';
after = '</FONT></HTML>';
start = strfind(str, before) + length(before); % or just length(before)+1 if str always starts with before
finish = strfind(str, after) - 1; % 'end' is a reserved keyword in MATLAB, so it cannot be used as a variable name
result = str(start:finish); % assumes there's only ever one match
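The question's first example (a value that follows a known word, such as age: in name: Benny age:23) can be handled the same way with strfind. This is a minimal sketch, assuming the value ends at the next space or at the end of the string; the variable names are illustrative only.

```matlab
% Sketch: get the text that follows a known keyword, using strfind only.
% Assumption: the value ends at the next space or at the end of the string.
str = 'name: Benny age:23';
key = 'age:';

idx = strfind(str, key);               % start index of each occurrence of key
if isempty(idx)
    value = '';                        % key not found
else
    start = idx(1) + length(key);      % first character after the first occurrence
    rest = strtrim(str(start:end));    % inside indexing, 'end' means the last index
    stop = find(rest == ' ', 1);       % position of the next space, if there is one
    if isempty(stop)
        value = rest;
    else
        value = rest(1:stop-1);
    end
end

age = str2double(value);               % convert the text '23' to the number 23
```

With key = 'name:' the same code gives value = 'Benny', because strtrim removes the space right after the keyword before looking for the next one.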
It's of course a lot more flexible and shorter with regexes:
result = regexp(str, '<HTML><FONT color="[0-9A-F]+">(.*?)</FONT></HTML>', 'tokens', 'once'); % with the added bonus that it works with any color, not just 0000FF
result = result{1}; % 'tokens' returns a cell array; this assumes a match was found
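A self-contained version of the regexp call, run on the string from the question, might look like the sketch below. It guards against the no-match case, where the cell array returned with 'tokens' and 'once' is empty. The second part is an illustrative extension, not part of the answer: it uses MATLAB's named tokens, (?<field>...) together with the 'names' option, for the name: Benny age:23 example.

```matlab
str = '<HTML><FONT color="0000FF">Used Amplification(Hidden)</FONT></HTML>';

% With 'tokens' and 'once', regexp returns a cell array holding the captured text
tok = regexp(str, '<HTML><FONT color="[0-9A-F]+">(.*?)</FONT></HTML>', 'tokens', 'once');
if isempty(tok)
    result = '';        % no match
else
    result = tok{1};    % the text between the FONT tags
end

% Named tokens: each (?<field>...) becomes a field of the returned struct
info = 'name: Benny age:23';
s = regexp(info, 'name:\s*(?<name>\w+)\s+age:\s*(?<age>\d+)', 'names');
name = s.name;               % 'Benny'
age = str2double(s.age);     % 23 (tokens are always text, so convert)
```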

5 comments

Max Müller on 10 Sep 2014
I'm trying to learn more about regexp... but I can't figure out which symbols I have to use. /w means every letter of the alphabet... where can I find the rest?
Guillaume
Not sure what you're asking because of the typo.
\w (not /w) is a character class that matches not only every letter but also digits and the underscore. I find MATLAB's character classes ill-defined and rarely use them. To match any letter of the alphabet I would use [a-zA-Z].
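A quick way to see the difference between \w and [a-zA-Z] is to run both on a string that contains letters, digits, an underscore and punctuation. This is a small illustrative check; the sample string is made up.

```matlab
s = 'abc_12 XY!';

% \w matches letters, digits and the underscore
wordRuns = regexp(s, '\w+', 'match');          % {'abc_12', 'XY'}

% [a-zA-Z] matches ASCII letters only
letterRuns = regexp(s, '[a-zA-Z]+', 'match');  % {'abc', 'XY'}
```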
There are many ways I could have built the regex in my answer. Let's parse it:
'<HTML><FONT color="'
There are no special characters in that part, so it is matched literally.
'[0-9A-F]'
Matches any single character in the range 0-9 or A-F (basically any uppercase hex digit)
'+'
Means match the preceding expression (a hex character) one or more times. Hence it matches any run of hex characters and stops as soon as a character is not in 0-9A-F.
'">'
No special characters; matched literally.
'('
Begins the definition of a token. Anything between '(' and ')' is a token
'.'
Matches any character
'*?'
Matches the previous expression ('.') zero or more times, hence any sequence of characters, including an empty one. I use the non-greedy version of '*' here, which means it will match only as many characters as needed for the whole regex to succeed.
')'
Marks the end of the token. I extract the token with the 'tokens' option of regexp.
'</FONT></HTML>'
No special characters; matched literally.
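The effect of several of these pieces can be checked directly. The sketch below, using made-up sample strings, compares greedy .* with non-greedy .*?, shows that 'tokens' returns only the part inside the parentheses while 'match' returns the whole matched text, and shows that [0-9A-F] is case-sensitive (MATLAB's regexpi ignores case).

```matlab
s = '<b>one</b><b>two</b>';

% Greedy: .* grabs as much as possible, up to the LAST '</b>'
greedy = regexp(s, '<b>(.*)</b>', 'tokens', 'once');   % {'one</b><b>two'}

% Non-greedy: .*? stops at the FIRST '</b>' that lets the match succeed
lazy = regexp(s, '<b>(.*?)</b>', 'tokens', 'once');    % {'one'}

% 'match' returns the whole matched text instead of just the token
whole = regexp(s, '<b>(.*?)</b>', 'match', 'once');    % '<b>one</b>'

% [0-9A-F] only matches uppercase hex digits; regexpi ignores case
c = 'color="00ff00"';
caseSensitive = regexp(c, 'color="[0-9A-F]+"', 'match', 'once');    % '' (no match)
caseInsensitive = regexpi(c, 'color="[0-9A-F]+"', 'match', 'once');  % 'color="00ff00"'
```

If the color attribute may be written in lowercase, [0-9A-Fa-f]+ in the answer's pattern is another option.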
---
There are plenty of resources on the web for learning regular expressions. MATLAB's help is probably not the best reference.
Max Müller on 10 Sep 2014
Since MATLAB uses C++, can I use the C++ regular expression help?
Guillaume on 11 Sep 2014
MATLAB's regular expression engine is slightly different from C++ std::regex and other POSIX-compliant regex engines, and it's not as good at some things (e.g. captures), but the basics are the same, so yes, you can use a tutorial for C++ or any other language.
This one seems to cover the basics and is language-agnostic.
Star Strider on 11 Sep 2014
@Guillaume — That has to be one of the best explanations of regexp I’ve read!
+1