Issue with native2unicode and windows-1252 encoding

Borja Heriz on 14 Jan 2022
Commented: Walter Roberson on 17 Jan 2022
Hi all,
I'm trying to encode some bytes into a character set using the windows-1252 encoding and I've checked that native2unicode
  1 comment
Rik on 14 Jan 2022
Most of your question seems to be missing.


Answers (3)

Walter Roberson on 14 Jan 2022
source = char(0:511)
source =
' !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿĀāĂ㥹ĆćĈĉĊċČčĎďĐđĒēĔĕĖėĘęĚěĜĝĞğĠġĢģĤĥĦħĨĩĪīĬĭĮįİıIJijĴĵĶķĸĹĺĻļĽľĿŀŁłŃńŅņŇňʼnŊŋŌōŎŏŐőŒœŔŕŖŗŘřŚśŜŝŞşŠšŢţŤťŦŧŨũŪūŬŭŮůŰűŲųŴŵŶŷŸŹźŻżŽžſƀƁƂƃƄƅƆƇƈƉƊƋƌƍƎƏƐƑƒƓƔƕƖƗƘƙƚƛƜƝƞƟƠơƢƣƤƥƦƧƨƩƪƫƬƭƮƯưƱƲƳƴƵƶƷƸƹƺƻƼƽƾƿǀǁǂǃDŽDždžLJLjljNJNjnjǍǎǏǐǑǒǓǔǕǖǗǘǙǚǛǜǝǞǟǠǡǢǣǤǥǦǧǨǩǪǫǬǭǮǯǰDZDzdzǴǵǶǷǸǹǺǻǼǽǾǿ'
bytes = unicode2native(source, 'windows-1252')
bytes = 1×512
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
backport = char(bytes)
backport =
' !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~ ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ'
whichdiffer = find(source(1:256) ~= backport(1:256) )
whichdiffer = 1×27
129 131 132 133 134 135 136 137 138 139 140 141 143 146 147 148 149 150 151 152 153 154 155 156 157 159 160
source(whichdiffer)
ans = ''
bytes(whichdiffer)
ans = 1×27
26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26 26
backport(whichdiffer)
ans = ''
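What this is telling us is that most Unicode code points from 128 to 159 (whichdiffer holds 1-based indices, so subtract 1 to get the code point) cannot be represented in Windows 1252 and are encoded as the substitute byte 26. Going the other way, from bytes to characters: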
bytes2 = uint8(129:141)
bytes2 = 1×13
129 130 131 132 133 134 135 136 137 138 139 140 141
encodes_as = native2unicode(bytes2, 'windows-1252')
encodes_as = '‚ƒ„…†‡ˆ‰Š‹Œ'
double(encodes_as)
ans = 1×13
129 8218 402 8222 8230 8224 8225 710 8240 352 8249 338 141
Looks about right.
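To see the whole 128 to 159 range at once, here is a minimal sketch along the same lines (the variable names are illustrative). It decodes each byte with windows-1252 and lists the bytes that come back unchanged, which are the ones with no character assigned in that code page:

```matlab
% Decode every byte in the range 128..159 as windows-1252.
bytes3 = uint8(128:159);
decoded = double(native2unicode(bytes3, 'windows-1252'));

% Bytes whose decoded code point equals the byte value have no
% character assigned in windows-1252; they pass through as control
% characters, which the display typically renders as a box.
passthrough = double(bytes3(decoded == double(bytes3)))

% Byte -> code point pairs for the bytes that do map to a character.
mapped = decoded ~= double(bytes3);
[double(bytes3(mapped)); decoded(mapped)]
```

Judging from whichdiffer above, passthrough should list 129, 141, 143, 144 and 157, but I have only confirmed 129 and 141 in the output shown here.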
  2 comments
Borja Heriz on 17 Jan 2022
Thanks for the answer.
But what about Unicode 26 and 157? These are also shown as the square symbol in Windows 1252.
Thanks
Walter Roberson on 17 Jan 2022
Code point 26 is the standard value substituted for code points that cannot be represented.
https://en.m.wikipedia.org/wiki/Substitute_character
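
A small sketch of that substitution, reusing values from the output above: code point 128 has no windows-1252 byte, so it is encoded as 26, the same byte that the SUB control character itself encodes to. The variable names are illustrative.

```matlab
% Code point 128 is not representable in windows-1252,
% so unicode2native falls back to the substitute byte 26.
b_missing = unicode2native(char(128), 'windows-1252')

% Code point 26 (SUB) is representable and also encodes to byte 26.
b_sub = unicode2native(char(26), 'windows-1252')

% Both results are the same byte, so the encoding is lossy:
% decoding cannot tell which character was the original.
isequal(b_missing, b_sub)
roundtrip = double(native2unicode(b_missing, 'windows-1252'))
```

So a 26 in the encoded bytes can mean either "this really was SUB" or "this character could not be encoded".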



Borja Heriz on 17 Jan 2022
Hi,
Sorry for not having completed the post...
My question is about why native2unicode returns the same symbol for different numerical values.
native2unicode(26,'windows-1252')
native2unicode(157,'windows-1252')
native2unicode(129,'windows-1252')
All of them return the square symbol in R2020b.

Borja Heriz on 17 Jan 2022
Hi there,
Definitely, there must be something I'm missing. I don't understand why the numerical values 153 and 156 are encoded identically regardless of the method I use.
char(153)
ans = ''
char(156)
ans = ''
native2unicode(153,'ISO-8859-1')
ans = ''
native2unicode(156,'ISO-8859-1')
ans = ''
native2unicode(153,'utf-8')
ans = '�'
native2unicode(156,'utf-8')
ans = '�'
native2unicode(153,'US-ASCII')
ans = '�'
native2unicode(156,'US-ASCII')
ans = '�'
native2unicode(153,'latin1')
ans = ''
native2unicode(156,'latin1')
ans = ''
What am I doing wrong?
Thanks,
  1 comment
Rik on 17 Jan 2022
This is an answer, but it looks like a comment. Please use the comment sections to post comments. The order of answers can change, which will make reading back confusing.
Please post this as a comment and delete the answer.
When you do, I (or Walter) will post something along these lines:
Why do you think 153 and 156 are encoded as the same character? They are displayed as the same character, but that is probably due to a limitation in the display, as this could very well encode a control character without a proper symbol.
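A quick way to check this, as a sketch (variable names are illustrative): convert the results to double and compare the code points rather than the rendered glyphs.

```matlab
% The two characters render identically but hold different code points.
c153 = native2unicode(uint8(153), 'ISO-8859-1');
c156 = native2unicode(uint8(156), 'ISO-8859-1');
double([c153 c156])      % numeric code points, not glyphs
isequal(c153, c156)      % compares the characters themselves

% For comparison, decode the same two bytes with windows-1252.
double(native2unicode(uint8([153 156]), 'windows-1252'))
```

With ISO-8859-1 the code points equal the byte values, so the first result is 153 and 156 and isequal returns false. Under the standard windows-1252 mapping the last line should give 8482 and 339, which are printable characters rather than controls.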
