Encoding problem reading data using fread

Michael Liedlgruber on 24 May 2023
Hi,
I'm using the following code to read in data from a file which contains text as well as binary data (European Data Format, to be more specific):
fid = fopen('test.edf', 'r', 'l');
fileType = fread(fid, 1, 'uint8');
id = char(fread(fid, [1 7], 'char'));
fclose(fid);
On my machine (Windows 10, MATLAB R2020a Update 6) this code runs fine and the values returned (i.e. fileType and id) are correct.
However, when this code is run on a different machine (one of our customers'; also running Windows 10, but using MATLAB R2020a Update 1) with the same input file, the value of id seems to be read incorrectly (the encoding used seems to be UTF-16BE). In fact, I get the same incorrect results on my machine if I specify UTF-16BE as the file encoding in the fopen call.
More interestingly, if I open the file on my machine without specifying an encoding and query the encoding in use with
[filename, permission, machineformat, encoding] = fopen(fid);
then the encoding UTF-16BE is returned.
And the default encoding in Windows is the same across the machines compared.
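To compare the two machines, one could print what fopen reports next to the raw header bytes. The following is only a minimal sketch: the temporary file and its eight bytes are made up for illustration and are not taken from the original test.edf, and the variable names (fname, enc, raw) are hypothetical.

```matlab
% Hypothetical stand-in for test.edf: 8 made-up header bytes
% (one non-ASCII byte followed by 7 ASCII characters).
fname = [tempname '.edf'];
fid = fopen(fname, 'w');
fwrite(fid, uint8([255, double('EXAMPLE')]), 'uint8');
fclose(fid);

% Open without an encoding argument, as in the question.
fid = fopen(fname, 'r', 'l');
[~, ~, machinefmt, enc] = fopen(fid);        % what fopen reports for this file
raw = fread(fid, [1 8], 'uint8=>uint8');     % raw bytes, no character decoding
fclose(fid);
delete(fname);

% Print everything that is worth comparing between the two installations.
fprintf('MATLAB version : %s\n', version);
fprintf('Machine format : %s\n', machinefmt);
fprintf('Encoding       : %s\n', enc);
fprintf('Raw bytes (hex): %s\n', sprintf('%02X ', raw));
```

Running the same script on both machines with the real file substituted for fname would show whether the reported encoding or the raw bytes differ, or only the decoded characters.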
So, to me it seems that MATLAB on my machine detects an incorrect encoding, because the file contains the BOM somewhere in the data, but nevertheless returns the correct values. On the customer's machine, however, it seems that the detected encoding is actually used, yielding different results.
My question is now: how is it possible that MATLAB obviously detects a wrong encoding but reads in the data correctly on my machine? And why do I get incorrect data if I explicitly specify the incorrect encoding (which is detected by MATLAB)? And why does the customer get different results although the same input file is used and although MATLAB detects the same (incorrect) encoding?
Is it possible that something has changed between Update 1 and Update 6 of MATLAB R2020a which causes MATLAB to behave differently? Unfortunately, I did not find any hint in the release notes of the updates with respect to the behavior of fopen.
Best,
Michael
2 comments
Mathieu NOE on 25 May 2023
You may want to contact TMW support about that.
Michael Liedlgruber on 25 May 2023
Thank you. Yes, if nobody in the community has an idea what may cause these inconsistencies, I will contact TMW support.
Fortunately, a fix is quite easy: by specifying UTF-8 encoding explicitly, everything works as expected on all machines.
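The workaround can be sketched as follows. This is a hedged example, not the exact code used in production: the temporary file and its bytes are made up, and the second variant (reading with the 'uint8=>char' precision so that no character decoding takes place at all) is an additional assumption that was not part of the original fix.

```matlab
% Hypothetical stand-in for test.edf: 8 made-up header bytes.
fname = [tempname '.edf'];
fid = fopen(fname, 'w');
fwrite(fid, uint8([255, double('EXAMPLE')]), 'uint8');
fclose(fid);

% Variant 1: pass the encoding explicitly as the fourth fopen argument,
% so that no automatic detection is involved.
fid = fopen(fname, 'r', 'l', 'UTF-8');
fileType = fread(fid, 1, 'uint8');
id = char(fread(fid, [1 7], 'char'));

% Variant 2 (assumption): rewind to byte 1 and read the same 7 bytes as
% uint8, converting each byte directly to a char without decoding.
fseek(fid, 1, 'bof');
idRaw = fread(fid, [1 7], 'uint8=>char');
fclose(fid);
delete(fname);

fprintf('fileType = %d, id = "%s", idRaw = "%s"\n', fileType, id, idRaw);
```

For a header that holds plain ASCII text, both variants should give the same characters; the second one does not depend on the file encoding, which may make it the more robust choice for mixed text/binary formats.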
But I'm still curious what's going on here.
Best,
Michael


Answers (1)

Ayush on 4 Sep 2023
1 comment
Michael Liedlgruber on 6 Sep 2023
Thank you. But this does not really answer my question. And, funnily, the page you linked to says "For more information, see ."
So, I already know that MATLAB defaults to UTF-8. But as you can see in my original post, the behavior is inconsistent between Update 1 and Update 6. And I have no explanation for why, on my machine, fopen(fid) returns the incorrect encoding while the correct encoding is used when reading the data.
