Extract regexp tokens with regexpPattern

3 vues (au cours des 30 derniers jours)
Jan Kappen
Jan Kappen le 29 Fév 2024
Commenté : Jan Kappen le 29 Fév 2024
With regexp I could extract the tokens of my capture groups via
regexp("abcd3e", "\w+(\d)+\w", "tokens")
ans = 1×1 cell array
{["3"]}
The result is a cell array. With the new regexpPattern and extract functions, the return values usually are string (arrays) which is something I prefer.
Question: Is there an analogon of the above regexp using something like extract("abcd3e", regexpPattern("\w+(\d)+\w"), "tokens")? This syntax obviously does not work in R2023b, but are there standard ways to rewrite these patterns to return my tokens?
Thanks,
Jan
EDIT: this is just a toy example, I do not only want to extract digits which could be done with digitsPattern. Ideally, I'd like to understand how directly translate the regexps.
To show a more realistic example:
str = [
"42652Z_HEX"
"42652X"
"42652Y"
"42652Z"
"42652GYRO-X_HEX"
"42652GYRO-Y_HEX"
"42652GYRO-Z_HEX"
"42351Temp_HEX"
"42652Temp_HEX"
"42652GYRO-X"
"42652GYRO-Y"
"42652GYRO-Z"
"42351Temp"
"42652Temp"
];
res = string(regexp(str, "\d+(?:GYRO-)?([XYZ])?.*", "tokens"))
res = 14×1 string array
"Z" "X" "Y" "Z" "X" "Y" "Z" "" "" "X" "Y" "Z" "" ""
% how to get the same result with matches and regexpPattern?
  2 commentaires
Dyuman Joshi
Dyuman Joshi le 29 Fév 2024
Déplacé(e) : Dyuman Joshi le 29 Fév 2024
If you just want to extract numbers between letters -
str = "abcd3e57xyz";
out = extract(str, digitsPattern)
out = 2×1 string array
"3" "57"
Jan Kappen
Jan Kappen le 29 Fév 2024
Déplacé(e) : Dyuman Joshi le 29 Fév 2024
Thanks for your answer.
No, I do not only want to extract numbers, it's a toy example. I'd like to translate the regexps which already exist into the new regexpPattern - if possible. The regexp might get more complicated than the shown one. I'll edit my question accordingly.

Connectez-vous pour commenter.

Réponses (1)

the cyclist
the cyclist le 29 Fév 2024
I realize that this is not really an answer to your question, but I just wanted to make sure you are aware that one option is to wrap the string function around the regexp:
string(regexp("abcd3e fghi4j", "\w+(\d)+\w", "tokens"))
ans = 1×2 string array
"3" "4"
Also, if you are guaranteed to have only one match, you could do
regexp("abcd3e", "\w+(\d)+\w", "tokens","once")
ans = "3"
but that's somewhat fragile coding, I would say.
I'm not yet sure if there is a more "direct" way with more recent functions.
  2 commentaires
the cyclist
the cyclist le 29 Fév 2024
Your updated question clarifies that my answer is not what you are looking for, but I'll leave it here anyway. :-)
Jan Kappen
Jan Kappen le 29 Fév 2024
Thank you very much for your answer.
Yes, I updated the question to clarify a bit, sorry.
There were cases in the past where I could not cast to string, I'll need to check why. In fact that's not a terrible solution, but I'm simply wondering how to use the new regexpPattern properly and maybe I'm missing something.

Connectez-vous pour commenter.

Catégories

En savoir plus sur Characters and Strings dans Help Center et File Exchange

Produits


Version

R2023b

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by