Is matfile read speed affected by how the file is constructed?
I have a dataset of 259000x94000x6 int16 values. Obviously, this is far too big to fit into memory (about 292 GB, or 272 GiB) or to load at once. The main issue is that the data can only be downloaded as 94000 separate 259000x6 chunks, but I need to analyze it as 259000 separate 94000x6 arrays.
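The sizes above can be checked with a few lines of arithmetic. This is a minimal sketch assuming 2 bytes per int16 element and ignoring compression and file-format overhead; the constant and function names are illustrative, not from the original post.

```python
# Dimensions from the question (names are illustrative, not from the original post).
DIM1 = 259_000      # length of each downloaded piece
DIM2 = 94_000       # number of downloaded pieces
DIM3 = 6            # trailing dimension
BYTES_PER_ELEM = 2  # int16 is 2 bytes


def int16_nbytes(*dims: int) -> int:
    """Uncompressed size in bytes of an int16 array with the given dimensions."""
    total = BYTES_PER_ELEM
    for d in dims:
        total *= d
    return total


total = int16_nbytes(DIM1, DIM2, DIM3)     # whole dataset
download_piece = int16_nbytes(DIM1, DIM3)  # one 259000x6 download
analysis_piece = int16_nbytes(DIM2, DIM3)  # one 94000x6 analysis array

if __name__ == "__main__":
    print(f"total:          {total / 1e9:.1f} GB ({total / 2**30:.1f} GiB)")
    print(f"download piece: {download_piece / 1e6:.2f} MB")
    print(f"analysis piece: {analysis_piece / 1e6:.2f} MB")
```

This gives about 292.2 GB (272.1 GiB) in total, 3.11 MB per downloaded piece and 1.13 MB per analysis array, so each individual read is small; the cost lies in where those bytes sit on disk.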
For the past two weeks I have been trying various big-data techniques in MATLAB to find the fastest way to read all of this data. The fastest approach so far is to combine everything into one large file, which MUST be built by appending 94000 arrays of size 259000x6 (not the other way around, because of how the data are natively structured).

However, I have found something very peculiar: no matter how I build the large .mat file (e.g. 259000x94000x6 or 94000x259000x6), reading it with matfile is ALWAYS about an order of magnitude faster in 259000x6 chunks than in 94000x6 chunks. I've tried '-v7.3' with and without compression, I've tried splitting it into smaller files of 3 GB each and looping over them, and I've tried wrapping it in a fileDatastore, but nothing lets me read the data in 94000x6 chunks as fast as in 259000x6 chunks. Has anyone else run into this? Does anyone know why it happens, or know of a workaround?
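One plausible explanation, offered as an assumption rather than a confirmed diagnosis: '-v7.3' MAT-files are HDF5 files, and HDF5 typically stores large, growable variables in fixed-size chunks. A partial read has to visit every chunk its selection intersects, and a compressed chunk must be read and decompressed in full. The Python sketch below models only that chunk-counting cost. The chunk shape MATLAB actually chooses is not stated in this thread, so `CHUNK` is hypothetical, as are the helper names `chunks_touched` and `bytes_if_whole_chunks`.

```python
from math import prod

BYTES_PER_ELEM = 2  # int16


def chunks_touched(chunk, start, count):
    """Number of chunks a hyperslab intersects.

    chunk, start and count are per-dimension tuples; start is 0-based.
    A chunk is touched if any element of the slab falls inside it.
    """
    n = 1
    for c, st, ct in zip(chunk, start, count):
        first = st // c            # index of first chunk along this dim
        last = (st + ct - 1) // c  # index of last chunk along this dim
        n *= last - first + 1
    return n


def bytes_if_whole_chunks(chunk, start, count):
    """Bytes read if every touched chunk is read in full (as compressed chunks must be)."""
    return chunks_touched(chunk, start, count) * prod(chunk) * BYTES_PER_ELEM


SHAPE = (259_000, 94_000, 6)
# HYPOTHETICAL chunk shape: one chunk per appended 259000x6 piece.
CHUNK = (259_000, 1, 6)

j, i = 123, 4567  # arbitrary 0-based indices
piece = ((0, j, 0), (SHAPE[0], 1, SHAPE[2]))     # like A(:, j+1, :) in MATLAB
row_slab = ((i, 0, 0), (1, SHAPE[1], SHAPE[2]))  # like A(i+1, :, :) in MATLAB

if __name__ == "__main__":
    for name, (start, count) in [("259000x6 piece", piece), ("94000x6 slab", row_slab)]:
        n = chunks_touched(CHUNK, start, count)
        mb = bytes_if_whole_chunks(CHUNK, start, count) / 1e6
        print(f"{name}: {n} chunks, {mb:,.1f} MB read")
```

With this hypothetical chunk shape, a 259000x6 read touches 1 chunk while a 94000x6 read touches all 94000. A more balanced hypothetical shape such as (4096, 64, 6) still gives 1469 chunks for the 94000x6 read. If the real file is chunked along the append direction, this would be consistent with swapping the dimension order not helping.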
Thanks!
1 Comment
Rik
on 2 Nov 2018
Is it possible to either share some of the data or to write some code that generates representative data?
Answers (1)