Optimising my data importer for large datasets

1 vue (au cours des 30 derniers jours)
HC98
HC98 le 26 Mar 2023
Modifié(e) : Matt J le 26 Mar 2023
So I have this
txtFiles = dir('*.txt') ; %loads txt files
N = length(txtFiles) ;
Numit = N;
[~, reindex] = sort( str2double( regexp( {txtFiles.name}, '\d+', 'match', 'once' ))); % sorts files
txtFiles = txtFiles(reindex);
for i = 1:N
data = importdata(txtFiles(i).name);
x = data(:,1);
udata(:,i) = data(:,2) ;
end
I have quite a large dataset (well over 200 files) and it takes ages to load things. How can I speed this up? Is there some sort of prepocessing I can do like merge all the files into one or something? I don't know...
  1 commentaire
Walter Roberson
Walter Roberson le 26 Mar 2023
I wonder if using a datastore would be appropriate for your work?

Connectez-vous pour commenter.

Réponses (1)

Matt J
Matt J le 26 Mar 2023
Modifié(e) : Matt J le 26 Mar 2023
I don't see any pre-allocation of udata. Also, nothing is being done with x, so it will cut down on time if you don't create it.
udata=cell(1,N);
for i = 1:N
data = importdata(txtFiles(i).name);
%x = data(:,1);
udata{i} = data(:,2) ;
end
udata=cell2mat(udata);
  1 commentaire
Matt J
Matt J le 26 Mar 2023
Modifié(e) : Matt J le 26 Mar 2023
If the data files have many columns, it will also go faster if you read in only the first two columns, maybe using textscan.
udata=cell(1,N);
for i = 1:N
fileID = fopen(txtFiles(i).name);
data = textscan(fileID,'%f %f %*[^\n]');
fclose(fileID);
udata{i} = data(:,2) ;
end
udata=cell2mat(udata);

Connectez-vous pour commenter.

Catégories

En savoir plus sur Large Files and Big Data dans Help Center et File Exchange

Produits


Version

R2023a

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by