Help reducing huge time overhead when executing system() or unix() commands from Matlab function
I have a data processing program that goes through gigabytes of .csv text. The approach I've been taking is to use grep (Mac or Linux) to break up my massive data files into digestible files.
Here's the catch: I have to do it a few hundred times. No biggie, grep is fast, I'll knock this out in no time. A test from bash:

Sweet! 1.6s per file and I'll have this done before I finish my coffee break.
Well....

Holy Toledo, that makes a huge difference. My pre-processing now takes multiple hours per data file. I've tried poking at MATLAB environment variables and even dipped my toe into calling through Java (check out jsystem on the File Exchange!).
Still no luck. I could have my function generate a bash script and then run it manually, I suppose, but I need this to work for other people in the office who are less command-line savvy (and that's saying something!).
Can anyone shed some light on what's going on, or point me toward a way to cut the overhead?
Thanks a bunch!
% I ran this example on a 3.23 gig text file.
% 16,706,909 lines
% 3,225,789,919 characters
[~, r] = unix('time LC_ALL=C grep -F "My Data Tag String" "/Path/To/Giant/Data/File.csv" > output.csv')
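For reference, the Java route the question alludes to (and which jsystem wraps) can be sketched roughly like this; the command string and file names are placeholders, not from the question:

```matlab
% Hedged sketch: launch the command through Java's Runtime, skipping the
% shell setup work that system()/unix() repeats on every call.
cmd = {'/bin/bash', '-c', ...
       'LC_ALL=C grep -F "My Data Tag String" in.csv > out.csv'};

rt   = java.lang.Runtime.getRuntime();
proc = rt.exec(cmd);      % MATLAB converts the cell array to a String[]
proc.waitFor();           % block until grep finishes
exitCode = proc.exitValue();
```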
2 Comments
Walter Roberson
on 17 Nov 2017
You could have it generate a script that had all of the commands in it to do all of the splitting, and then do a single system() -- thus getting the overhead only once.
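This suggestion could be sketched roughly as follows; the tags, paths, and script name are placeholders rather than anything from the question:

```matlab
% Hedged sketch of the one-script approach: write every grep command into a
% single bash script, then pay the system() overhead exactly once.
tags = {'Tag A', 'Tag B', 'Tag C'};        % placeholder data tags
src  = '/Path/To/Giant/Data/File.csv';     % placeholder input file

fid = fopen('split_all.sh', 'w');
fprintf(fid, '#!/bin/bash\n');
for k = 1:numel(tags)
    fprintf(fid, 'LC_ALL=C grep -F "%s" "%s" > "output_%03d.csv"\n', ...
            tags{k}, src, k);
end
fclose(fid);

% One call to system() instead of a few hundred calls to unix()
status = system('bash split_all.sh');
```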
Nick Counts
on 17 Nov 2017
Accepted Answer
More Answers (0)
