Read a specified range of data from a large binary file with fread

Jun-Hui Huang on 8 Sep 2023
I have a large binary data file that’s around 70 GB. Unfortunately, my laptop doesn’t have enough RAM to read all the elements in this data file. The data L is a matrix with dimensions 28500 x 8031. To mitigate the RAM usage, I’m wondering if it’s possible to just read a specific range of data instead of the whole file. Specifically, I’d like to read only the 1338th, 1339th, and 1340th columns.
Here is my function,
%~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~%
fclose all;
ncol = 8031;
formatype = 'float32'; % format type: 'float32' for .bin file and 'single' for .tda file
% Load the whole data and then transpose it
fileID = fopen(filename);
if fileID < 0
    error('This result file does not exist.');
else
    L = fread(fileID, [ncol, inf], formatype);
    L = transpose(L);
end
%~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~%
Thank you.
Best,
  1 comment
Stephen23 on 8 Sep 2023
Edited: Stephen23 on 8 Sep 2023
"The data L is a matrix with dimensions 28500 x 8031"
You forgot to tell us the most important information: is it saved column-major or row-major?
There are a few approaches you could use, but the details depend on the data format.


Answers (2)

recent works on 8 Sep 2023
Yes, it is possible to read a specific range of data from a binary file in MATLAB. You can use the fread() function to read data from a file, and specify the start and end indices of the range you want to read.
In your case, you want to read the 1338th, 1339th, and 1340th columns.
fileID = fopen(filename);
L = fread(fileID, [ncol, 3], formatype, 1338, 1340);
This code will open the file, read the data from the 1338th to the 1340th column, and store it in the variable L.
The fread() function has a number of other parameters that you can use to control how the data is read.
  3 comments
recent works on 8 Sep 2023
yes
Walter Roberson on 8 Sep 2023
This answer is incorrect.
The full syntax for fread is
A = fread(FID,SIZE,PRECISION,SKIP,MACHINEFORMAT)
There is no option at all for specifying indices.
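As a small illustration of what the fourth argument actually does, the sketch below writes twelve float32 values to a temporary file and reads every third one. The file name from tempname and the variable names are made up for this demo; the point is only that SKIP is a byte count applied after each value read, not an index.

```matlab
% Demo: the 4th argument of fread is a number of bytes to skip after each
% value read, not an index. The temporary file exists only for this demo.
tmpfile = [tempname '.bin'];
fid = fopen(tmpfile, 'w');
fwrite(fid, 1:12, 'float32');              % file holds 1, 2, ..., 12
fclose(fid);

fid = fopen(tmpfile, 'r');
skipBytes = 2 * 4;                         % skip two float32 values (4 bytes each)
v = fread(fid, 4, 'float32', skipBytes);   % read a value, skip 8 bytes, repeat
fclose(fid);
delete(tmpfile);
```

Here v holds the 1st, 4th, 7th and 10th stored values. Choosing where reading starts is a separate step done with fseek.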



Walter Roberson on 8 Sep 2023
There are two common ways that blocks of binary data can be arranged in a file.
If the data is stored so that, for a given location in the file, the next location generally holds the same column but the next row in that column, then that arrangement is called "Column Major Order". This is the arrangement that MATLAB uses internally for its arrays, and it is the order that MATLAB uses when asked to write binary files.
If the data is stored so that, for a given location in the file, the next location generally holds the same row but the next column in that row, then that arrangement is called "Row Major Order". This is the arrangement that C and C++ and a number of other programming languages use internally, so it is common to find files that are stored this way.
If the file is in Column Major Order, then the steps to read a single column are:
  • [Column Major Only!] Use fseek() to seek to the file location of the beginning of the column. Multiply (column number minus 1) by the number of rows in the array, and multiply that by the number of bytes per entry (4 bytes for float32) to get the byte offset relative to the beginning of the file. Then use fread() with size [number of rows in array, number of columns to read now] and precision 'single'. If the columns are not adjacent, repeat this, using fseek() to get to the beginning of each non-contiguous column.
If the file is in Row Major Order, then reading a column is a bit more of a nuisance:
  • [Row Major Only!] Use fseek() to seek to the first entry of the column, which sits in the first row. Multiply (column number minus 1) by the number of bytes per entry (4 bytes for float32) to get the byte offset relative to the beginning of the file. Then use fread() with size [number of rows in array, 1] and precision 'single'. Use a skip equal to (number of columns in file minus 1) times the number of bytes per entry (4 bytes for float32), because the skip is applied after each value that is read. Done this way, each call reads one column; to get the other columns you will need to fseek() again. (If the number of adjacent columns were to increase relative to the number of rows in the file, then a different reading strategy would become viable.)
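The two procedures can be sketched on a tiny stand-in matrix. Everything here is an assumption made for the demo: the 5 x 4 size, the temporary file names, and the choice of columns 2 and 3. The first case covers a file whose consecutive values run down a column (what fwrite produces for a MATLAB matrix); the second covers a file whose consecutive values run along a row (what fwrite produces for the transpose). In the second case the sketch uses a precision of the form 'N*float32', which makes fread read N values before each skip, so a few adjacent columns come back from one call.

```matlab
% Stand-in for the real data; sizes and columns are assumptions for the demo.
nrow = 5;  ncol = 4;  bytesPer = 4;          % float32 = 4 bytes per entry
A = reshape(1:nrow*ncol, nrow, ncol);        % A(r,c) = r + (c-1)*nrow
wanted = 2:3;                                % adjacent columns to extract
nw = numel(wanted);

% Case 1: consecutive file values run down a column (fwrite of A itself).
f1 = [tempname '.bin'];
fid = fopen(f1, 'w');  fwrite(fid, A, 'float32');  fclose(fid);

fid = fopen(f1, 'r');
offset = (wanted(1) - 1) * nrow * bytesPer;  % start of first wanted column
fseek(fid, offset, 'bof');
C1 = fread(fid, [nrow, nw], 'float32');      % the columns are contiguous
fclose(fid);

% Case 2: consecutive file values run along a row (fwrite of A.').
f2 = [tempname '.bin'];
fid = fopen(f2, 'w');  fwrite(fid, A.', 'float32');  fclose(fid);

fid = fopen(f2, 'r');
offset = (wanted(1) - 1) * bytesPer;         % first wanted value, in row 1
fseek(fid, offset, 'bof');
skipBytes = (ncol - nw) * bytesPer;          % remainder of each row
prec = sprintf('%d*float32', nw);            % read nw values, then skip
C2 = fread(fid, [nw, nrow], prec, skipBytes).';  % transpose to nrow-by-nw
fclose(fid);

delete(f1);  delete(f2);
```

Both C1 and C2 should equal A(:, wanted). For the file in the question one would presumably set nrow = 28500, ncol = 8031 and wanted = 1338:1340. The question's own code reads [ncol, inf] and then transposes, which suggests the second case, but that is an inference and worth checking against a few known values.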
But... you should have a look to see whether multibandread can do what you want.
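A minimal sketch of the multibandread route follows, under several assumptions: the file is treated as a single-band image with no header (offset 0), little-endian byte order, and consecutive values running along each row. The 5 x 4 test matrix and the temporary file name are invented for the demo.

```matlab
% Sketch: extract columns with multibandread from a headerless float32 file.
% Assumptions: one band, offset 0, little-endian, values stored row by row.
nrow = 5;  ncol = 4;
A = reshape(1:nrow*ncol, nrow, ncol);
tmpfile = [tempname '.bin'];
fid = fopen(tmpfile, 'w', 'ieee-le');
fwrite(fid, A.', 'float32');                 % write so values run along rows
fclose(fid);

% Size is [rows, columns, bands]; the last argument selects columns 2 and 3.
cols = multibandread(tmpfile, [nrow, ncol, 1], 'float32', 0, ...
                     'bsq', 'ieee-le', {'Column', 'Direct', [2 3]});
delete(tmpfile);
```

If the layout assumption holds, cols matches A(:, [2 3]). How well this performs on a 70 GB file is not something the sketch demonstrates, so it is worth timing on the real data before relying on it.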

Version

R2022b
