
textscan or import of unicode encoded textfile

Hyung-Sik Kim on 22 Sep 2011
Question 1: Are textscan and importdata supposed to work with Unicode-encoded text files?
Question 2: After a UTF-8 encoded file is opened with the correct encoding specified in the fopen arguments, the textscan output places three unexpected characters before the very first valid data in the file. Is this expected but undocumented behavior?

Answers (2)

Anne on 5 Dec 2011
I have the same problem with my old MATLAB 7.3.0. textscan won't correctly read Unicode files, but it can handle Unicode strings that are already in memory.
Thus a simple (but slow) workaround is to read the text first with fscanf and then run textscan on that text.
[f,msg]=fopen(nomfic,'r','n','UTF-8');
LIGNES=textscan(f,'%[^\n]','delimiter','\n');
won't work with unicode encoded characters but
[f,msg]=fopen(nomfic,'r','n','UTF-8');
txt=fscanf(f,'%c');   % read the whole file as decoded characters
fclose(f);
LIGNES=textscan(txt,'%[^\n]','delimiter','\n');
will.
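As a self-contained way to try this workaround, here is a minimal sketch. It assumes a MATLAB release whose fopen accepts an encoding argument; the temporary file, its contents, and the variable names are made up for illustration. It uses '%s' with a newline delimiter rather than the '%[^\n]' format above.

```matlab
% Create a small UTF-8 file containing a non-ASCII character.
% (File name and contents are hypothetical test data.)
fname = [tempname '.txt'];
fid = fopen(fname, 'w', 'n', 'UTF-8');
fprintf(fid, '%s\n%s\n', ['caf' char(233)], 'abc');   % char(233) is e-acute
fclose(fid);

% Step 1: read the whole file into memory as decoded characters.
[fid, msg] = fopen(fname, 'r', 'n', 'UTF-8');
if fid < 0
    error('Could not open %s: %s', fname, msg);
end
txt = fscanf(fid, '%c');
fclose(fid);
delete(fname);

% Step 2: run textscan on the in-memory text, one cell per line.
C = textscan(txt, '%s', 'Delimiter', '\n');
lines = C{1};
```

The whole file is held in memory at once, which is why this is slower and heavier than scanning the file handle directly.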

Walter Roberson on 22 Sep 2011
Answer 1: textscan() is supposed to work with them; I do not know about importdata.
Answer 2: When you explicitly specify one of the UTF-* as the encoding, the MATLAB code will not look for a Byte Order Mark, and will leave any Byte Order Mark in the file stream. If you do not explicitly specify the encoding, then the byte stream will be examined for a Byte Order Mark and if found the encoding will be determined by that.
It is not recommended that a Byte Order Mark be used with UTF-8, but some Windows editors insert it anyhow. The Byte Order Mark represented in UTF-8 is the byte sequence 0xEF, 0xBB, 0xBF, which shows up as exactly the characters you noticed. See reference
I have not checked whether it makes a difference whether you open the file with 'r' or 'rt'. I use 'rt' when referring to text files, as it can make a difference in some instances.
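One way to sidestep the stray Byte Order Mark is to read the raw bytes, drop the three BOM bytes if they are present, and only then decode and parse. The following is a minimal sketch under that assumption; the temporary file and its contents are invented for the example, and this is not necessarily how MATLAB handles the BOM internally.

```matlab
% Write a small UTF-8 test file that starts with a Byte Order Mark.
% (File name and contents are hypothetical test data.)
fname = [tempname '.txt'];
bom  = uint8([239 187 191]);                          % 0xEF 0xBB 0xBF
body = unicode2native(sprintf('1,2\n3,4\n'), 'UTF-8');
fid = fopen(fname, 'w');
fwrite(fid, [bom body], 'uint8');
fclose(fid);

% Read the raw bytes so that no decoding happens yet.
fid = fopen(fname, 'r');
bytes = fread(fid, Inf, '*uint8')';                   % row vector of bytes
fclose(fid);
delete(fname);

% Drop the BOM bytes if they are present, then decode as UTF-8.
if numel(bytes) >= 3 && isequal(bytes(1:3), bom)
    bytes = bytes(4:end);
end
txt = native2unicode(bytes, 'UTF-8');

% Parse the cleaned text from memory.
C = textscan(txt, '%f %f', 'Delimiter', ',');
```

Checking for the BOM at the byte level avoids depending on how a particular MATLAB version decodes it when an encoding is passed to fopen.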
