Iterating Through Data to Create Tall Table

Shane Smith on 30 Jul 2018
I have a script that is working through files, comparing signals in those files, generating 'hits' where those signals are different from each other, then trying to analyze those hits to explain them. The results are written to a table, which looks like it will have ~2000 results per file. I'd like to append each file's results to the table, so that I can generate a more overall report and have better access to all the results afterwards. However, I am running these scripts on about 500 files, so this will be a very large table.
I've tried to read up on using tall arrays in MATLAB to work with larger-than-memory data, but all the examples seem to be based on reading from large input files using a datastore. In my case, the input files aren't large; there are just a lot of them. I thought tall would be an excellent fit for my application, because during the analysis I only care about the specific file I'm working on, so I don't need access to all the previous results.
Is there a way to initially create a tall table, then just append as needed during the iteration? So I'm creating a table like this:
% Create initial table from columns in procReport
reportTable = cell2table(procReport(2:end,:));
reportTable.Properties.VariableNames = {'signalName','genesisSignalIndex',...
    'motohawkSignalIndex','testNumber','testResults','genesisHitIndex',...
    'motohawkHitIndex','genesisAvgInterval','motohawkAvgInterval'};
I pass that out from this function to the main function, and was hoping to append each file's results as they are done. But I know that it will be too large for memory, so I was hoping to use a tall table instead.
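One way to keep memory bounded during the loop, sketched here under assumptions (`fileNames` and `processFile` are hypothetical stand-ins for your file list and your per-file comparison code), is to write each file's ~2000-row result table to its own CSV rather than appending in memory:

```matlab
% Sketch only: fileNames and processFile are placeholders for your
% own file list and per-file analysis that returns a results table.
outDir = 'results';
if ~exist(outDir, 'dir'), mkdir(outDir); end
for k = 1:numel(fileNames)
    reportTable = processFile(fileNames{k});   % ~2000-row table per file
    [~, base] = fileparts(fileNames{k});
    writetable(reportTable, fullfile(outDir, [base '_report.csv']));
end
```

Each iteration then only holds one file's results, and the folder of CSVs can later serve as a read-only backend for a datastore.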

Accepted Answer

Guillaume on 30 Jul 2018
Since a tall table/array can't fit in memory, it needs a backend from which to fetch its content as required. That backend is a datastore of some form. MATLAB is fairly flexible about where the data is physically stored (local disk, Amazon S3, Hadoop, etc.) and in which format (text, Excel, etc.), but in all cases that backend must already exist and is read-only.
So, no, you can't write to a tall array.
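The usual workflow is therefore the reverse: write results to disk first, then build a tall table over them. A minimal sketch, assuming the per-file results have already been saved as CSVs with identical columns in a `results` folder (the column name `genesisAvgInterval` is taken from the question's table):

```matlab
% Assumes per-file result CSVs with identical variables already
% exist in the 'results' folder (sketch, not your exact layout).
ds = tabularTextDatastore(fullfile('results', '*.csv'));
tt = tall(ds);                              % lazy tall table over all files
avgInterval = mean(tt.genesisAvgInterval);  % deferred computation
avgInterval = gather(avgInterval);          % evaluates in chunks, fits memory
```

Nothing is read until `gather` runs, so the full 500-file table never has to fit in memory at once.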
Depending on your ultimate goal, what you can do is create a custom datastore that fetches and processes the data as appropriate from your 500 files. This would let you get at the processed data easily without having to write the file-handling code yourself. It still won't help you save the results into a different backend.
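One lightweight way to get that custom-datastore behaviour is `fileDatastore` with a custom read function. This is a sketch under assumptions: `processOneFile` is a hypothetical wrapper around your existing per-file analysis, and each call must return a table with the same variables so the chunks can be concatenated into a tall table:

```matlab
% Sketch: processOneFile is a hypothetical function wrapping the
% existing per-file analysis; it must return a table with the same
% variables for every file.
ds = fileDatastore(fullfile('data', '*.mat'), 'ReadFcn', @processOneFile);
tt = tall(ds);                  % each chunk is one file's result table
firstRows = gather(head(tt, 10));   % fetch a few rows on demand
```

With this, the expensive per-file processing happens lazily inside the datastore reads, so only the files needed for a given `gather` are ever touched.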

More Answers (0)

Categories

Find more on Large Files and Big Data in Help Center and File Exchange

Version

R2017a

