How to use Unicode numeric values in regexprep?
How can "Häagen-Dasz" be converted to "Haagen-Dasz" using Unicode numeric values? For example,
regexprep('Häagen-Dasz','ä','A')
works fine, but
regexprep('Häagen-Dasz','\x{C4}','a')
does not. Here, the hexadecimal \x{C4} stands for [latin capital letter a] with diaeresis, i.e. [ä].
1 comment
VBBV
on 28 Mar 2024
I am not sure I understand your question correctly, but please read the answer below.
Accepted answer
More answers (2)
inp = 'Häagen-Dasz';
baz = @(v)char(v(1)); % only need the first decomposed character.
out = arrayfun(@(c)baz(py.unicodedata.normalize('NFKD',c)),inp) % remove diacritics.
Read more:
https://docs.python.org/3/library/unicodedata.html
https://stackoverflow.com/questions/16467479/normalizing-unicode
regexprep('Häagen-Dasz','ä','A')
regexprep('Häagen-Dasz','ä','\x{C4}')
2 comments
regexprep('Häagen-Dasz','\x{e4}','a')
VBBV
on 28 Mar 2024
The Unicode code point for the lowercase ä (small a with diaeresis) is \x{e4}; \x{C4} is the uppercase Ä.
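A minimal sketch of how the code point can be looked up rather than guessed, assuming the text is ordinary MATLAB char data. The variable names str, codes, and out are illustrative only and do not come from the thread:

```matlab
% Build 'Häagen-Dasz' from code point 228 so the example does not depend on file encoding.
str = ['H' char(228) 'agen-Dasz'];

% double() returns the numeric code point of a character; dec2hex() shows it in hexadecimal.
codes = dec2hex(double(str(2)));     % hexadecimal code point of the second character (ä)

% Put that hexadecimal value into the pattern to replace ä with a plain a.
out = regexprep(str, ['\x{' codes '}'], 'a');
disp(out)
```

Checking the value with double() first avoids mixing up the uppercase and lowercase code points, which is what made the \x{C4} pattern in the question fail to match.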