How to write ASCII to byte file?

Question

Jan on 4 Oct 2024


https://fr.mathworks.com/matlabcentral/answers/2157400-how-to-write-ascii-to-byte-file

Edited: Stefanie Schwarz on 21 Oct 2024

In MATLAB versions up to R2021a, I could write ASCII values stored in a CHAR array to a file as unsigned chars with:

data = char([126, 129]);     % Exceeds 7 bit ASCII
file = fullfile(tempdir, 'test.dat');
[fid, msg] = fopen(file, 'w');
assert(fid ~= -1, msg);
fwrite(fid, data);           % Wrote uchar before R2021b
fclose(fid);

Before R2021b, this wrote the unsigned chars [126, 129]:

[fid, msg] = fopen(file, 'r');
assert(fid ~= -1, msg);
bytes      = fread(fid, [1, inf], 'uint8');  % < R2021b: [126, 129]
fclose(fid);

Since R2021b the characters are converted to UTF-8 and the file contains [126, 194, 129], so an explicit precision is required:

fwrite(fid, data, 'uchar'); % < R2024b: [126, 129], R2024b: [126, 194, 129]

In R2024b this writes [126, 194, 129] again. Intuitively, I then tried:

fwrite(fid, data, 'uint8'); % R2024b: [126, 194, 129] also
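The extra byte can be reproduced without touching a file. This is a minimal sketch, assuming that `unicode2native` performs the same character-set conversion that fwrite applies to char input (the thread does not confirm this). Code point 129 (U+0081) lies outside 7-bit ASCII, so UTF-8 encodes it as the two bytes 194 and 129, whereas ISO-8859-1 maps it to the single byte 129.

```matlab
data = char([126, 129]);                           % same test data as in the question

utf8Bytes   = unicode2native(data, 'UTF-8');       % uint8 [126 194 129]
latin1Bytes = unicode2native(data, 'ISO-8859-1');  % uint8 [126 129]

disp(utf8Bytes);
disp(latin1Bytes);
```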

Changing the encoding argument of fopen does not help either. My questions:

Is there a low-level method to write UCHARs stored in a CHAR vector using fopen / fwrite without a Unicode conversion?
This change of behaviour breaks a lot of my code. Is it considered to be useful?
What is the best way to write unsigned bytes without depending on the platform?

A workaround is casting to UINT8:

fwrite(fid, uint8(data), 'uint8');

But duplicating data in memory without a reason is a waste of time.


Answer 1

Bruno Luong on 4 Oct 2024


https://fr.mathworks.com/matlabcentral/answers/2157400-how-to-write-ascii-to-byte-file#answer_1526795

Edited: Bruno Luong on 6 Oct 2024


bytes = uint8([126, 129])
data = char(bytes);     % Exceeds 7 bit ASCII

I'm not sure about the encoding standards, but this seems to do what you want:

file = fullfile(tempdir, 'test.dat');
[fid, msg] = fopen(file, 'w', 'n', "windows-874"); % "ISO-8859-1" is even better, see Andres's comment
assert(fid ~= -1, msg);
fwrite(fid, data);           % Chars are converted using the encoding given to fopen
fclose(fid);
[fid, msg] = fopen(file, 'r');
assert(fid ~= -1, msg);
bytes      = fread(fid, [1, inf], 'uint8');
fclose(fid);
bytes % [126, 129]
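Why ISO-8859-1 is the safer choice can be checked directly. ISO-8859-1 assigns the byte values 0 to 255 one-to-one to the Unicode code points 0 to 255, whereas Windows code pages such as windows-1252 and windows-874 use part of the range 128 to 159 for other characters (for example the euro sign). The sketch uses `unicode2native` as a stand-in for the conversion done on writing, which is an assumption, and counts the code points whose byte value differs under each encoding. Which encoding names are available may depend on the installation, hence the try/catch.

```matlab
codes = 0:255;
txt   = char(codes);                       % one char per code point 0..255
encodings = {'ISO-8859-1', 'windows-1252', 'windows-874'};
altered   = cell(size(encodings));         % altered code points per encoding

for k = 1:numel(encodings)
    try
        bytes = double(unicode2native(txt, encodings{k}));
    catch
        fprintf('%s: encoding not available\n', encodings{k});
        continue
    end
    if numel(bytes) == numel(codes)
        altered{k} = codes(bytes ~= codes);    % code points whose byte differs
        fprintf('%s: %d of 256 code points altered\n', ...
            encodings{k}, numel(altered{k}));
    else
        fprintf('%s: output length differs from input\n', encodings{k});
    end
end
```

Any code point listed in `altered` for an encoding would not arrive in the file as the same byte value if that encoding were passed to fopen.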

5 comments (3 older comments are not shown)

Jan on 5 Oct 2024

@Bruno Luong: You wrote "fopen defaults to using UTF-8 in order to provide interoperability between all platforms". I understand this and would expand it: "all platforms except older MATLAB versions".

'ISO-8859-1' also works with older MATLAB versions, back to the release in which the encoding argument was added to fopen. Therefore this is my preferred solution.

Bruno Luong on 5 Oct 2024

Yes, and there are fewer commands to change in the code with fopen than with fwrite, so there is less risk of error.

ASCII encoding always seems messy and mysterious to me. For example, one has to specify the encoding when writing but not when reading.

Please let us know what causes the change of behavior.
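One reason for the asymmetry in the examples of this thread: they read the file back with the 'uint8' precision, which returns raw byte values, so no character decoding takes place on reading. Turning bytes back into characters is then a separate, explicit step. A minimal sketch, using the byte sequence reported in the question; `native2unicode` is used for illustration only and is not mentioned in the thread:

```matlab
rawBytes = uint8([126, 194, 129]);                  % file content reported for R2021b and later

asUtf8   = native2unicode(rawBytes, 'UTF-8');       % decodes to 2 characters
asLatin1 = native2unicode(rawBytes, 'ISO-8859-1');  % decodes to 3 characters

disp(double(asUtf8));     % code points 126 129
disp(double(asLatin1));   % code points 126 194 129
```

The same bytes yield different text depending on the encoding assumed by the reader, which is why the encoding matters on both sides as soon as the data is treated as characters rather than as bytes.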


Answer 2

Andres on 4 Oct 2024


https://fr.mathworks.com/matlabcentral/answers/2157400-how-to-write-ascii-to-byte-file#answer_1526915

Edited: Andres on 4 Oct 2024


Seemingly you have two options:

Either use the 'ISO-8859-1' encoding with fopen (see my comment on Bruno's answer) or use the 'ubit8' precision with fwrite:

bytes_write = uint8([0, 126, 129, 150, 215, 255]);
data = char(bytes_write);
file = fullfile(tempdir, 'test.dat');
fid = fopen(file, 'w');
fwrite(fid, data, 'ubit8'); 
fclose(fid);
fid = fopen(file, 'r');
bytes_read = fread(fid, [1, inf], '*uint8');
fclose(fid);
l = dir(file);
l.bytes % 6
isequal(bytes_write, bytes_read) % true

Both variants did the job in R2024a and R2024b.

3 comments (1 older comment is not shown)

Jan on 5 Oct 2024

@Bruno Luong: I've asked the support about this difference now.

@Andres: 'ubit8' works. Setting the encoding type seems to be more intuitive. Thank you.

Stefanie Schwarz on 21 Oct 2024

ubitn is meant to write values of "any" bit length n, so 'ubit8' should behave like 'uint8' as long as the data array is uint8. We will look into this, but the recommendation is to specify the encoding when working with char, as pointed out by Bruno.


Answer 3

Stefanie Schwarz on 21 Oct 2024


https://fr.mathworks.com/matlabcentral/answers/2157400-how-to-write-ascii-to-byte-file#answer_1534945

Edited: Stefanie Schwarz on 21 Oct 2024


To write exact bytes, create the data using the uint8 data type and write the bytes with fwrite and the uint8 output type:

data = uint8([126, 129]);
file = fullfile(tempdir, 'test.dat');
[fid, msg] = fopen(file, 'w');
assert(fid ~= -1, msg);
fwrite(fid, data, 'uint8');
fclose(fid);

char data is subject to Unicode conversion in fwrite. You can write specific byte patterns for a specific encoding, like:

data = char([126, 129]);
file = fullfile(tempdir, 'test.dat');
[fid, msg] = fopen(file, 'w', 'n', 'windows-1252');
assert(fid ~= -1, msg);
fwrite(fid, data);
fclose(fid);
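If the data has to stay in a char vector, one way to make the conversion explicit and independent of the encoding passed to fopen is to encode first and then write the resulting bytes. This is a sketch and not something proposed in the thread: it assumes that every character has a code point of at most 255 and uses `unicode2native` with ISO-8859-1 to obtain one byte per character. The file name is hypothetical. Note that this still creates a temporary uint8 copy, like the cast in the question.

```matlab
data  = char([0, 126, 129, 150, 215, 255]);    % characters with code points <= 255
bytes = unicode2native(data, 'ISO-8859-1');    % explicit char -> uint8 conversion

file = fullfile(tempdir, 'test_bytes.dat');    % hypothetical file name
[fid, msg] = fopen(file, 'w');
assert(fid ~= -1, msg);
count = fwrite(fid, bytes, 'uint8');           % uint8 input is written byte for byte
fclose(fid);

[fid, msg] = fopen(file, 'r');
assert(fid ~= -1, msg);
readBack = fread(fid, [1, inf], '*uint8');     % raw bytes, no decoding
fclose(fid);

fprintf('wrote %d bytes, round trip ok: %d\n', count, isequal(bytes, readBack));
```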

In older versions of MATLAB, there was a bug where we weren't doing the correct Unicode conversions on char values above the ASCII range. However, the bug wasn't noticed because the default encoding allowed characters in the extended ASCII range [128-255], and so the output was the same for all values in the [0-255] range.

We fixed this bug in R2021b and backported the fix as far back as R2019b. Here is the official bug report:

fwrite does not write UTF-8 character data correctly (2443431)

The default output encoding of fopen changed to UTF-8 in R2020a which should have resulted in different output file results, however the bug in fwrite meant that it continued to produce the same file. After fixing the bug, any time the input type is a text type (string/char), it's converted to the output encoding before being written to the file. Most users wouldn't have noticed the bug unless they specified the encoding as UTF-8 (or were on Linux). The bug remained undetected for a long time since, despite the code being wrong, it produced the correct output for the default encoding for extended ASCII range characters.

So using the char datatype to store bytes was never correct, but in many cases it coincidentally produced the expected file output, first because of the default encoding and then because of the bug in fwrite.
