Matlab does not recognise hyphen
Philipp Braeuninger
on 2 Dec 2018
Commented: Walter Roberson
on 4 Dec 2018
Hi all,
I'm trying to remove a simple hyphen "-" from a string array, but MATLAB does not seem to recognise the hyphen. I'm sourcing the text from a website and storing it in a string array. I then call strrep(myStringArray,'-','_') to replace the hyphen. The weird thing is that MATLAB does not replace it, but when I stop the program in the debugger and execute the same command locally, it works. Any thoughts on this are highly appreciated!
9 comments
Christopher Creutzig
on 4 Dec 2018
MATLAB uses UTF-16, correct. (And I was assuming the string was read as a string. native2unicode is only useful if you read binary data or otherwise got raw numbers. MATLAB strings are in Unicode, in UTF-16 encoding.)
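To illustrate the distinction between raw bytes and text that is already a string, here is a minimal sketch. The byte values are a made-up sample (the UTF-8 encoding of "a", an en dash U+2013, and "b"), standing in for data read with something like fread. It shows native2unicode decoding the bytes, and why a strrep on the ASCII hyphen then leaves the text untouched:

```matlab
% Hypothetical raw UTF-8 bytes: 'a', EN DASH (U+2013), 'b'.
rawBytes = uint8([97 226 128 147 98]);

% native2unicode is for raw bytes like these, not for text that is
% already a char vector or string.
txt = native2unicode(rawBytes, 'UTF-8');

% Inspect the numeric value of each character.
codes = double(txt);                     % 97 8211 98

% No character has the value 45 (ASCII hyphen), so this changes nothing.
unchanged = strrep(txt, '-', '_');

% Replacing the character that is actually present does work.
fixed = strrep(txt, char(8211), '_');    % 'a_b'
```

Looking at double(txt) is usually the quickest way to see which dash character a web page really delivered.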
Walter Roberson
on 4 Dec 2018
For Unicode code points U+10000 and above, ideally it would be nice to see the code point itself, perhaps as a uint32, but uint16(char(s)) and char(s)+0 and s+0 cannot give that to you.
It gets kinda confusing... if you see 55296 (hex D800), are you seeing an actual code point U+D800, or are you seeing the first high surrogate, i.e. the lead half of a surrogate pair? According to the documentation for char(), numeric inputs are treated as Unicode code points, so char(55296) would have to be encoded into multiple positions in UTF16. But if you are going to bother doing that, then why restrict inputs to 65535? The user-visible interface behaves as if UTF16 is not used internally, and as if a "character" header is instead tossed onto uint16() of the numeric values.
>> foo = char(55296)
foo =
'?'
>> whos foo
Name Size Bytes Class Attributes
foo 1x1 2 char ?
(It is not a ? that shows up, it is an empty box)
Evidence that UTF16 was not used: look at bytes: UTF16 encoding of U+D800 is more than 2 bytes.
>> D800DC00 = uint8([216 0 220 0])
D800DC00 =
1×4 uint8 row vector
216 0 220 0
>> bar = native2unicode(D800DC00, 'UTF16')
bar =
'?'
>> bar+0
ans =
55296 56320
>> whos bar
Name Size Bytes Class Attributes
bar 1x2 4 char
Actual Unicode code point: U+10000.
This all tends to suggest that UTF16 is not the internal representation in MATLAB, and that uint16(char(s)) will not show the unicode code points.
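For illustration only: if the numbers returned by bar+0 are read as UTF-16 code units, the code point can be recovered by hand with the standard surrogate-pair arithmetic. The sketch below makes that assumption explicit, uses made-up variable names, and says nothing about what MATLAB stores internally:

```matlab
% Numeric values as seen from MATLAB: 'A' followed by the pair
% 55296 56320 (hex D800 DC00) from the example above.
units = double(['A' char(55296) char(56320)]);

codePoints = [];
k = 1;
while k <= numel(units)
    u = units(k);
    isHigh = u >= 55296 && u <= 56319;                 % D800..DBFF
    hasLow = k < numel(units) && ...
             units(k+1) >= 56320 && units(k+1) <= 57343;  % DC00..DFFF
    if isHigh && hasLow
        % Combine the high and low surrogate into one code point.
        codePoints(end+1) = (u - 55296) * 1024 + (units(k+1) - 56320) + 65536;
        k = k + 2;
    else
        % Ordinary value (or an unpaired surrogate): keep it as is.
        codePoints(end+1) = u;
        k = k + 1;
    end
end

hexPoints = dec2hex(codePoints);   % rows '00041' and '10000'
```

An unpaired 55296 is passed through unchanged here, which is exactly the ambiguity described above.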
Accepted Answer
More Answers (1)
Jan
on 3 Dec 2018
Edited: Jan
on 3 Dec 2018
if I stop in the debugger and execute the command it does work
Then there must be another problem. The debugger can influence the result if you create variables dynamically with eval, e.g. in called scripts. Otherwise the code must behave exactly the same in debug and non-debug mode. So if a command that runs successfully during debugging appears to have no effect in the normal run, maybe its result is overwritten somewhere in the following code. Perhaps you use myStringArray instead of myNewArray after this line:
% Replace the ASCII hyphen, en dash (U+2013) and em dash (U+2014) with '_'.
myNewArray = replace(myStringArray, ...
    ["-" char(8211) char(8212)], '_');
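A small self-contained sketch of both points, with made-up sample data standing in for the scraped text: only some elements contain the ASCII hyphen, and replace returns a new array rather than modifying its input, so later code has to use the result.

```matlab
% Hypothetical sample data: one ASCII hyphen, one en dash, one em dash.
myStringArray = ["well-known", ...
                 "2010" + char(8211) + "2018", ...
                 "wait" + char(8212) + "what"];

% Only the first element contains the ASCII hyphen (code 45).
hasAsciiHyphen = contains(myStringArray, "-");   % true false false

% Replace all three dash variants; the input array is left unchanged.
myNewArray = replace(myStringArray, ["-" char(8211) char(8212)], "_");

% Continuing with myStringArray here would make it look as if the
% replacement never happened; myNewArray holds the cleaned text.
stillHasDash = contains(myStringArray, "-");     % still true for element 1
```

If the result should overwrite the original, assign it back instead: myStringArray = replace(myStringArray, ...).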
0 comments