Code for saving files as .mat is very very slow

Hi,
I am loading multiple .csv files into MATLAB and re-saving them as .mat files. The code takes far too long (hours) to save each file as a .mat file. The code is below:
myfiles = dir('*.txt') ;
p = length(myfiles) ;
for i = 1:p
    thisfile = myfiles(i).name ;
    y = importdata(thisfile) ;
    matfile = strcat(thisfile,'.mat')
    save(matfile,'y') ;
    clearvars y
end
Any hint or help?
Thank you

10 comments

Rik on 30 June 2020
Are these many tiny files on a hard drive? For small files the file system overhead is relatively large per file, and hard drives are fairly slow compared to SSDs.
In other words: how did you determine the save function is to blame? And why are you clearing a variable that you overwrite anyway?
Curious Mind on 30 June 2020
For a small number of files (say 100), it takes a few seconds to save each file as a .mat file. If the number of csv files increases to, say, 1000, it takes much, much longer.
Rik on 30 June 2020
Did you use the profiler? Did you check your Task Manager (Activity Monitor on Mac, System Monitor on Ubuntu, if I recall correctly) to see if the drive is working hard?
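One way to answer that question without the profiler is to time the two steps separately. This is only a rough sketch built on the loop from the question; the variable names tImport and tSave are made up for illustration.

```matlab
% Time importdata and save separately for every file in the folder.
myfiles = dir('*.txt');
n = numel(myfiles);
tImport = zeros(n, 1);   % seconds spent in importdata, per file
tSave   = zeros(n, 1);   % seconds spent in save, per file

for k = 1:n
    thisfile = myfiles(k).name;

    t0 = tic;
    y = importdata(thisfile);
    tImport(k) = toc(t0);

    t0 = tic;
    save(strcat(thisfile, '.mat'), 'y');
    tSave(k) = toc(t0);
end

fprintf('importdata: %.2f s total, save: %.2f s total\n', ...
        sum(tImport), sum(tSave));
```

Comparing the first and last entries of tImport and tSave should also show whether either step gets slower as the loop progresses, which is what the 100 versus 1000 files observation suggests.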
Curious Mind on 30 June 2020
It takes lots of memory.
dpb on 30 June 2020
What is "it"?
What is a typical value for p?
As another said, eliminate the clearvars line; it does nothing.
How large are the .csv files?
What is the content of the .csv files? The bottleneck just might be in importdata instead.
Curious Mind on 30 June 2020
Thanks for your response. Each file is about 40 KB and contains 3 headers and numeric data.
If they all have the same structure, then if I recall correctly, importdata() can be used to generate code, and the generated code should be faster than importdata by itself.
If you have a header, is importdata() returning a struct with one field per column? Have you tried save -struct? Especially if you are defaulting to -v7.3, saving compound variables such as structs is slower than saving numeric variables.
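If the import step turns out to be the slow part, one alternative to importdata (a different technique from the generated-code suggestion above) is readmatrix, which is available in R2019b. The sketch below compares the two on the first file only. It assumes that "3 headers" means exactly three header lines followed by purely numeric columns, which has not been confirmed in this thread, and it makes no promise that readmatrix is actually faster on these files.

```matlab
% Compare importdata with readmatrix on the first file in the folder.
% ASSUMPTION: each file starts with exactly 3 header lines.
myfiles = dir('*.txt');
if ~isempty(myfiles)
    thisfile = myfiles(1).name;

    t0 = tic;
    s = importdata(thisfile);        % may return a struct (data, textdata, ...)
    tOld = toc(t0);

    t0 = tic;
    data = readmatrix(thisfile, 'NumHeaderLines', 3);   % plain numeric matrix
    tNew = toc(t0);

    fprintf('%s: importdata %.3f s, readmatrix %.3f s, %d rows read\n', ...
            thisfile, tOld, tNew, size(data, 1));
end
```

Because readmatrix returns a plain numeric matrix, saving data instead of the struct also avoids writing a compound variable.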
Curious Mind on 30 June 2020
Yes, I get a struct like you described. How would I incorporate save -struct into the code? Will this make it faster?
dpb on 30 June 2020
"it takes a few seconds to save each file as .mat file. If the number of csv files increases to say 1000, it takes much much longer"
Yes, overall time, but have you profiled to prove it's actually the save operation that's the culprit?
Which OS? In the old FAT32 days, large numbers of files in a single directory would really bog things down. NTFS is better in that regard, but I don't know whether the problem really goes away or is mostly just not observed because people rarely run workloads that stress it.
I wonder if there's any chance it has to do with accessing/releasing system resources like file handles, so that the slowness comes from waits inside the system calls in the I/O routines?
All just conjecture...
save(matfile, '-struct', 'y') ;
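Put back into the original loop, that line could look like the sketch below. The isstruct check and the explicit '-v7' flag are additions for illustration, not something confirmed in this thread: the check covers files where importdata returns a plain matrix, and '-v7' simply avoids the -v7.3 format mentioned above if that happens to be the configured default. Whether this removes the slowdown is untested.

```matlab
% Original loop with save -struct, so each field of y is stored
% as its own variable instead of one compound struct.
myfiles = dir('*.txt');
for k = 1:numel(myfiles)
    thisfile = myfiles(k).name;
    y = importdata(thisfile);
    matfile = strcat(thisfile, '.mat');   % same naming as in the question

    if isstruct(y)
        % Fields such as y.data become top-level variables in the MAT-file.
        save(matfile, '-struct', 'y', '-v7');
    else
        % importdata returned a plain array; save it as before.
        save(matfile, 'y', '-v7');
    end
end
```

Note that loading such a file then gives the individual fields (for example data and textdata) rather than a single variable y.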


Answers (0)


Version

R2019b
