MATLAB Answers

accessing large MAT file

Asked by Sean Little on 11 Nov 2019
Latest activity Commented on by Sean Little on 11 Nov 2019
I am trying to access data stored in a large MAT file. The file is 72 GB of Simulink simulation data.
Obviously, I cannot use the LOAD command on my laptop, which has 16 GB of RAM. I thought the reason MathWorks provides the MATFILE command was to allow access to large MAT files without loading them.
But that doesn't seem to be the case.
When I attempt to access the file using the MATFILE command, MATLAB behaves as if it were loading all that data into memory. My memory utilization goes to 98%, I get an out-of-memory error, and then MATLAB silently crashes and exits.
So I went back to the big Linux machine that I used to run Simulink and create this file, and ran the MATFILE command there. Indeed, it looks like MATLAB is loading the whole file into RAM. I am hoping to divide the file up there into separate MAT files, but it is taking a really long time to load this data, and it is also using all available RAM.
Which leads to my questions: What is the MATFILE command doing? Is this expected behavior? Am I stuck rerunning my simulations and putting all results into separate MAT files? How are truly huge datasets stored and manipulated in MATLAB? Evidently it is not with MAT files...
Thanks.

  1 Comment

Sean Little on 11 Nov 2019
After about an hour and a half of waiting, the MATFILE command returned an object without any memory errors on my Linux machine. Now when I try to access individual elements in the MAT file, it appears that the whole file has to be loaded into memory again, even for small informational data fields in the file. I am going to need a different approach. Unless someone can suggest a workaround, I am going to have to abandon using the MATFILE command.


2 Answers

Answer by Sara Nadeau on 11 Nov 2019
 Accepted Answer

I believe you are having trouble with the matfile function because of the format of the logged data.
If you logged the data in Simulink using the Dataset format (the default format for several releases), you can create Simulink.SimulationData.DatasetRef objects that reference the data in the file without loading it into memory. To access and manipulate data for individual signals, you can create matlab.io.datastore.SimulationDatastore objects.
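As a rough sketch of that workflow, the snippet below references a logged Dataset and streams one signal in chunks. The file name `simout.mat` and the variable name `logsout` are assumptions for illustration only; substitute the names used in your own file. The guard simply skips the body if the file is not present.

```matlab
% Hypothetical names: replace with your own MAT file and Dataset variable.
matPath = 'simout.mat';   % assumed name of the logged MAT file
varName = 'logsout';      % assumed name of the Dataset variable in that file

if isfile(matPath)
    % Reference the Dataset in the file without loading it into memory.
    dsr = Simulink.SimulationData.DatasetRef(matPath, varName);
    disp(dsr.getElementNames());   % list the logged element names

    % Curly-brace indexing returns the element with its Values
    % represented as a matlab.io.datastore.SimulationDatastore.
    sig = dsr{1};
    ds = sig.Values;
    ds.ReadSize = 1000;            % number of samples returned per read

    while hasdata(ds)
        chunk = read(ds);          % timetable with up to ReadSize samples
        % ... process chunk here, then let it go out of scope ...
    end
end
```

Processing the signal one chunk at a time keeps memory use bounded by `ReadSize` rather than by the size of the file.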
These additional topics may be helpful for guiding you through creating and using DatasetRef and SimulationDatastore objects:
I hope this helps!

  1 Comment

Sean Little on 11 Nov 2019
I am glad there is a way to do this. I will take a look at those doc links and try this out. Thanks for the help!



Answer by Guillaume on 11 Nov 2019

See the limitations section of the matfile documentation to see what it can and can't do. In particular, the granularity of matfile is typically at the variable level. That is, you can select which variables to load, but apart from numerical matrices, if you load a variable you load all of it.
It's unclear what's in your MAT file, but it sounds like it's objects, perhaps just one object, in which case you won't benefit much from matfile.
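To illustrate the case matfile does handle well, here is a minimal, self-contained sketch that writes a small numeric matrix to a temporary version 7.3 MAT file and then reads back only part of it. The variable name `A` and the temporary file are made up for the example.

```matlab
% Build a small v7.3 MAT file in a temporary location (illustrative data only).
matPath = [tempname '.mat'];
A = reshape(1:200, 20, 10);        % numeric matrix: supports partial loading
save(matPath, 'A', '-v7.3');

% List the variables stored in the file without loading their data.
info = whos('-file', matPath);
disp({info.name});

% Create the matfile object and query the size of A from the file.
m = matfile(matPath);
[nRows, nCols] = size(m, 'A');

% Read only rows 1 to 5 of columns 2 to 3 from disk.
block = m.A(1:5, 2:3);

delete(matPath);                   % clean up the temporary file
```

Running `whos('-file', ...)` on the real file first is a cheap way to check whether it holds numeric arrays or a single large object.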

  2 Comments

Sean Little on 11 Nov 2019
When I read the limitations section, I was assuming that "user defined objects" did not include MathWorks-defined Simulink objects. Obviously that was a bad assumption on my part.
It is really surprising to me that there is so much overhead required to access data in a large MAT file. I thought that is exactly what this command was supposed to avoid.
Guillaume on 11 Nov 2019
Unfortunately, it's not designed for objects; it's designed for accessing large numerical matrices.
Since you have such a large MAT file, I assume you're using the 7.3 format. This format is based on HDF5, which you can read using various functions. I've no idea if that would make reading the file easier, and you'd have to figure out the data structure yourself, as MathWorks does not document its format.
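For what it's worth, here is a small sketch of inspecting a version 7.3 file with MATLAB's high-level HDF5 functions. It uses a made-up numeric variable `A` in a temporary file; the layout used for Simulink objects is not covered here and would have to be worked out from the `h5info` output.

```matlab
% Write a small numeric matrix to a temporary v7.3 file (illustrative data only).
matPath = [tempname '.mat'];
A = magic(6);
save(matPath, 'A', '-v7.3');

% Inspect the HDF5 structure: top-level datasets and groups.
info = h5info(matPath);
disp({info.Datasets.Name});

% Read a 2-by-3 block starting at element (1,1) of the dataset for A.
block = h5read(matPath, '/A', [1 1], [2 3]);

delete(matPath);                   % clean up the temporary file
```

Calling `h5info` only reads metadata, so it should be a relatively cheap way to see how a large file is organized before attempting to read anything from it.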
