Save a large array into equal-length .csv files?
Hi guys, I am trying to save a very large, adjusted data set into .csv files that each contain the same number of rows. I am using the following script from this link with my own database:
%% Step 1 - create a tall table
varnames = {'ArrDelay', 'DepDelay', 'Origin', 'Dest'};
ds1 = datastore('airlinesmall.csv', 'TreatAsMissing', 'NA', ...
    'SelectedVariableNames', varnames);
tt = tall(ds1);
%% Step 2 - operate on tall table
tt.TotalDelay = tt.ArrDelay + tt.DepDelay;
%% Step 3 - use tall/write to emit .mat files
writeDir = tempname
mkdir(writeDir);
write(writeDir, tt);
%% Step 4 - use parfor to parallelise the writetable loop
ds = datastore(writeDir);
N = numpartitions(ds, gcp);
csvDir2 = tempname
mkdir(csvDir2);
parfor idx1 = 1 : N
    idx2 = 0;
    subds = partition(ds, N, idx1);
    while hasdata(subds)
        idx2 = 1 + idx2;
        fname = fullfile(csvDir2, sprintf('out_%06d_%06d.csv', idx1, idx2));
        writetable(read(subds), fname);
    end
end
I adapted Step 4 as follows so that each .csv file should contain 20000 rows:
RequiredDataRowsPerFile = 20000;
ds = datastore(writeDir,'ReadSize',RequiredDataRowsPerFile);
This has some effect, but the resulting .csv files still do not all contain the same number of rows (apart from the last file, which will naturally be shorter).
I would appreciate any help. Thanks
Tim
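One possible explanation, offered as an assumption rather than something confirmed in this thread: `ReadSize` is an upper bound on the rows returned per `read`, so a read that reaches the end of an underlying file (or of a `parfor` partition) can return fewer rows, and each short read then becomes a short .csv file. Below is a minimal serial sketch that works around this by buffering rows between reads and writing only full blocks. The demo folder `srcDir`, the names `rowsPerFile`, `buffer`, `fileIdx` and `csvDir`, and the tiny sizes (95 rows, 40 per file) are illustrative and not part of the original script.

```matlab
% Demo input (hypothetical): two small CSV files standing in for the
% folder of files that Step 3 writes.
srcDir = tempname;
mkdir(srcDir);
T = table((1:95)', (1:95)' * 2, 'VariableNames', {'Id', 'Value'});
writetable(T(1:50, :),  fullfile(srcDir, 'part1.csv'));
writetable(T(51:95, :), fullfile(srcDir, 'part2.csv'));

rowsPerFile = 40;              % use 20000 for the real data
ds = datastore(srcDir);        % for the real data: datastore(writeDir)
csvDir = tempname;
mkdir(csvDir);

buffer = [];                   % rows read but not yet written
fileIdx = 0;
while hasdata(ds)
    chunk = read(ds);          % chunk height may vary from call to call
    if isempty(buffer)
        buffer = chunk;
    else
        buffer = [buffer; chunk]; %#ok<AGROW>
    end
    % Write as many full files as the buffer currently holds.
    while height(buffer) >= rowsPerFile
        fileIdx = fileIdx + 1;
        fname = fullfile(csvDir, sprintf('out_%06d.csv', fileIdx));
        writetable(buffer(1:rowsPerFile, :), fname);
        buffer(1:rowsPerFile, :) = [];
    end
end
% Whatever is left becomes the final, shorter file.
if ~isempty(buffer)
    fileIdx = fileIdx + 1;
    fname = fullfile(csvDir, sprintf('out_%06d.csv', fileIdx));
    writetable(buffer, fname);
end
```

For the real data, point `datastore` at `writeDir` and set `rowsPerFile = 20000`. This version gives up the `parfor` loop in exchange for a simple guarantee that every file except the last has the same number of rows.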