Is matfile read speed affected by how file is constructed?

Question

Cameron Lee on 2 Nov 2018

https://fr.mathworks.com/matlabcentral/answers/427628-is-matfile-read-speed-affected-by-how-file-is-constructed

Edited: Cameron Lee on 5 Dec 2018

I have a dataset that is 259000x94000x6 of int16 data. Obviously, this is way too big to fit into memory (about 276 GB) or load at once. The main issue is that the data can only be downloaded in 94000 separate chunks that are 259000x6 each, but I need to analyze the data in 259000 separate chunks of 94000x6 arrays.
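As a quick check on the scale involved, the raw size follows directly from the dimensions and the 2 bytes that an int16 element occupies. This is only a back-of-the-envelope sketch; the variable names are made up for illustration.

```matlab
% Back-of-the-envelope size of a 259000x94000x6 int16 array
dims = [259000 94000 6];
bytesPerElement = 2;                        % int16 uses 2 bytes per element
totalBytes = prod(dims) * bytesPerElement;  % exact in double precision at this size
sizeGB  = totalBytes / 1e9;                 % decimal gigabytes
sizeGiB = totalBytes / 2^30;                % binary gibibytes
fprintf('%.0f GB (%.0f GiB)\n', sizeGB, sizeGiB);
```

That works out to roughly 292 GB in decimal units, or about 272 GiB, which is in the same range as the figure quoted above and far beyond what fits in memory on a typical workstation.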

For the past two weeks I have been trying various big-data techniques in MATLAB to find the best way to read all of this data. The fastest approach seems to be turning it into one large file containing all the data, which MUST be built by appending 94000 files of 259000x6 arrays (and not the other way around, due to the native structure of the data). However, one very peculiar thing I have found is that no matter how I build my giant .mat file (e.g. 259000x94000x6 or 94000x259000x6), the read speed using matfile is ALWAYS an order of magnitude faster when reading it in 259000x6 chunks than in 94000x6 chunks. I've tried using '-v7.3' with and without compression, I've tried splitting it into smaller files of 3 GB each and for-looping through them, I've tried turning it into a fileDatastore, and nothing lets me read the data in 94000x6 chunks as fast as I can in 259000x6 chunks! Has anyone else experienced this, and does anyone know why it happens and/or know a workaround?
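One possible factor, offered here as an assumption rather than something established in this thread, is that MATLAB stores arrays in column-major order and that v7.3 MAT-files are HDF5-based, so a slice along one dimension can touch far less of the file than a slice along another. A small experiment along these lines can measure the difference on a given machine. The array size, file location, and loop counts are arbitrary choices for illustration.

```matlab
% Compare matfile read time for column slices vs. row slices of a small array
A = randi([-100 100], 2000, 1500, 'int16');   % small stand-in array
fname = [tempname '.mat'];                    % scratch file in the temp folder
save(fname, 'A', '-v7.3');                    % v7.3 allows partial loading
m = matfile(fname);

nReads = 50;

tic;
for j = 1:nReads
    col = m.A(:, j);      % one column: contiguous in column-major order
end
tCol = toc;

tic;
for i = 1:nReads
    row = m.A(i, :);      % one row: elements spread across all columns
end
tRow = toc;

fprintf('column reads: %.3f s, row reads: %.3f s\n', tCol, tRow);
```

The timings will vary with disk, MATLAB version, and file layout, so the numbers are only meaningful relative to each other on the same system.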

Thanks!

1 Comment
Rik on 2 Nov 2018

Is it possible to either share some of the data or to write some code that generates representative data?
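A scaled-down stand-in for the data described in the question could be generated along the following lines. This is a sketch only: the reduced sizes, the random values, and the scratch file name are assumptions, and the only parts taken from the question are the int16 type, the variable name alldata, and the fact that the file is grown one 2-D slab at a time.

```matlab
% Scaled-down representative data, appended slab by slab via matfile
nRows = 2590;    % stands in for 259000
nCols = 94;      % stands in for 94000 (one slab per "download")
nVars = 6;

fname = [tempname '.mat'];                      % hypothetical scratch file
alldata = zeros(nRows, 1, nVars, 'int16');      % seed the variable with one slab
save(fname, 'alldata', '-v7.3');                % v7.3 is needed for partial writes
m = matfile(fname, 'Writable', true);

for c = 1:nCols
    slab = randi([-1000 1000], nRows, nVars, 'int16');   % one fake downloaded chunk
    % Writing past the current extent of dimension 2 grows the stored array
    m.alldata(1:nRows, c, 1:nVars) = reshape(slab, nRows, 1, nVars);
end

disp(size(m, 'alldata'));
```

Scaling nRows and nCols up gradually makes it possible to see at what size the difference in read speed between the two slicing directions starts to show.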

Answer 1

Cameron Lee on 5 Dec 2018

https://fr.mathworks.com/matlabcentral/answers/427628-is-matfile-read-speed-affected-by-how-file-is-constructed#answer_350757

Edited: Cameron Lee on 5 Dec 2018

I thought I'd follow this up. The short answer to the question is that read speed must be affected by the way the file is constructed. However, I found a way around this. First, I built the data files in chunks (about 80 chunks/files) of 1175x259000x6 each. Once all of these were finished, I used the matfile command in a for-loop to bring the data in and permute the dimensions:

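% Run permute function on the 80 chunk files (takes some time, cannot parfor).
% Each pass reads a whole chunk into memory and writes it back.
for x=1:80
    xstr=num2str(x);
    filename=strcat('location\ChunkFolder\AlldataCH',xstr,'.mat');
    m=matfile(filename,'Writable',true);
    % Swap the first two dimensions: 1175x259000x6 becomes 259000x1175x6
    m.alldata=permute(m.alldata,[2 1 3]);
end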

I was then able to read it in and analyze it in a more timely fashion:

%% Build m (cell array of matfile connections to use repeatedly below)
m=cell(1,80);   % preallocate one slot per chunk file
for x=1:80
    xstr=num2str(x);
    filename=strcat('location\ChunkFolder\AlldataCH',xstr,'.mat');
    m{x}=matfile(filename);
end
% Read data into MATLAB in 94000x6 form & in optimized time, and analyze
parfor y=1:259920
    newdata=cell(80,1);   % one 1175x6 piece per chunk file
    for x=1:80
        newdata{x}=squeeze(m{x}.alldata(y,:,:));
    end
    % Stack the 80 pieces into 94000x6; the trailing ' transposes to 6x94000
    finaldata=vertcat(newdata{:})';
    
    %%%% DO ALL ANALYSIS HERE %%%%
    
end

For whatever reason, this is the only way I could find that allowed me to read the data into MATLAB in the layout I needed and in a timely manner (about a 30x improvement vs. reading it without permuting the dimensions).
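The workflow above can be tried end to end on a toy-sized dataset to confirm that the reassembled slice matches the original data. This is a minimal sketch under stated assumptions: the sizes, the scratch folder, and the chunk%d.mat file names are hypothetical, a plain for-loop replaces parfor, and only the variable name alldata and the permute order [2 1 3] come from the code above.

```matlab
% Toy-sized version of the chunk, permute, then slice workflow
nChunks = 4;  chunkLen = 5;  nLong = 30;  nVars = 6;   % assumed small sizes
folder = tempname;  mkdir(folder);                     % hypothetical scratch folder
ref = zeros(nChunks*chunkLen, nLong, nVars, 'int16');  % in-memory copy for checking

% 1) Write chunk files shaped chunkLen x nLong x nVars (like 1175x259000x6)
for k = 1:nChunks
    alldata = randi([-100 100], chunkLen, nLong, nVars, 'int16');
    ref((k-1)*chunkLen + (1:chunkLen), :, :) = alldata;
    save(fullfile(folder, sprintf('chunk%d.mat', k)), 'alldata', '-v7.3');
end

% 2) Reopen each saved file and swap its first two dimensions
m = cell(1, nChunks);
for k = 1:nChunks
    mk = matfile(fullfile(folder, sprintf('chunk%d.mat', k)), 'Writable', true);
    mk.alldata = permute(mk.alldata, [2 1 3]);   % now nLong x chunkLen x nVars
    m{k} = mk;
end

% 3) Read one slice across all chunks and stack the pieces
y = 7;
parts = cell(nChunks, 1);
for k = 1:nChunks
    mk = m{k};
    parts{k} = squeeze(mk.alldata(y, :, :));     % chunkLen x nVars piece
end
slice = vertcat(parts{:});                       % (nChunks*chunkLen) x nVars

expected = squeeze(ref(:, y, :));
disp(isequal(slice, expected));
```

Wrapping step 3 in tic/toc, and repeating it against unpermuted files with the index order changed to match, gives a direct way to check whether the speedup reported here carries over to another machine.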

As a side note, I tried doing the permute BEFORE saving the original chunks, and that still did not work. As I mentioned in my original post, I also tried saving the data as a single 259000x94000x6 array (and in 259000x1175x6 chunks), and that did not work either. Only after I made the chunks, closed the file, brought the file back into MATLAB, and permuted it did it work. Anyway, I hope this helps anyone out there with a similar problem. Also, if anyone can find an even speedier way to do this, please let me know.
