How can I pull only a portion of a CSV into memory?

MathWorks Support Team on 29 May 2019
I have a very large CSV file that is continually updating to add new rows of data.
How can I pull only the last 300 lines of the CSV into memory so that they can be plotted?

Accepted Answer

MathWorks Support Team on 13 June 2019
There are a couple of ways to achieve this goal.
The first approach is to use the "SelectedVariableNames" and "DataLines" import options with "readtable" to read only the desired columns and rows. This approach runs relatively quickly, but it has two limitations. First, you need to know how many rows the CSV file contains in order to set "DataLines" correctly. Second, the "DataLines" option for "readtable" was added in MATLAB R2018a, so you must be running that release or a later one.
Altogether, this approach looks something like the following:
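% Detect the import options for the file
opts = detectImportOptions('airlinesmall.csv');
% Import only the two columns of interest
opts.SelectedVariableNames = {'ArrDelay', 'DepDelay'};
% Start reading at file line 123225 and continue to the end of the file
opts.DataLines = [123225 inf];
T = readtable('airlinesmall.csv', opts);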
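If the number of rows is not known in advance, one way to work around the first limitation is to count the lines in the file before setting "DataLines". The following is only a sketch under some assumptions: the variable names (numRows, numLines, firstLine) and the chunk size are illustrative, lines are assumed to end in a line feed, and no quoted field is assumed to contain an embedded newline. Because the file keeps growing, rows appended between the count and the read can also be picked up.

```matlab
filename = 'airlinesmall.csv';   % example file used in this answer
numRows  = 300;                  % number of trailing rows to import

% Count line feeds in fixed-size chunks so that the whole file is
% never held in memory at once.
fid = fopen(filename, 'r');
assert(fid ~= -1, 'Could not open %s', filename);
numLines = 0;
lastByte = 10;                   % treat an empty file as newline-terminated
while true
    chunk = fread(fid, 2^20, '*uint8');
    if isempty(chunk)
        break
    end
    numLines = numLines + nnz(chunk == 10);
    lastByte = chunk(end);
end
fclose(fid);

% A final line without a trailing line feed still counts as a line.
if lastByte ~= 10
    numLines = numLines + 1;
end

opts = detectImportOptions(filename);
opts.SelectedVariableNames = {'ArrDelay', 'DepDelay'};

% Never start before the first data line that was detected, so the
% header is not read as data when the file has fewer than numRows rows.
firstLine = max(opts.DataLines(1), numLines - numRows + 1);
opts.DataLines = [firstLine Inf];
T = readtable(filename, opts);
```

Counting bytes this way avoids parsing the file twice with "readtable", but it still reads the whole file once, so it may be slow on very large files.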
The second approach uses datastores and tall arrays to pull into memory only the parts of the CSV that you want to work with. With this approach, you do not need to know how many rows the CSV contains before running the script, but creating and evaluating the tall array can take slightly longer to run. It looks something like the following:
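% Create a datastore that reads the file in chunks
ttds = tabularTextDatastore('airlinesmall.csv');
ttds.SelectedVariableNames = {'ArrDelay', 'DepDelay'};
% Treat the text 'NA' as a missing value
ttds.TreatAsMissing = 'NA';
tt = tall(ttds);
% Number of rows to take from the end of the data
val = 300;
% Evaluate the tall array and bring the last rows into memory
TT = gather(tail(tt, val));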
Note: If you have Parallel Computing Toolbox, the "tall" function starts a parallel pool (parpool) by default. You can turn this behavior off in the Parallel Computing Toolbox preferences by deselecting the option:
"Automatically create a parallel pool (if one doesn't already exist) when parallel keywords (e.g. parfor) are executed."
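As an alternative to changing the preference, calling "mapreducer(0)" before creating the tall array should keep the tall-array evaluation in the local MATLAB session. The sketch below combines that call with the datastore approach and then plots the result, since plotting was the original goal. The axis labels and plot layout are illustrative choices, not part of the original answer.

```matlab
% Run tall-array calculations in the local MATLAB session only,
% without using a parallel pool.
mapreducer(0);

ttds = tabularTextDatastore('airlinesmall.csv');
ttds.SelectedVariableNames = {'ArrDelay', 'DepDelay'};
ttds.TreatAsMissing = 'NA';

% Bring only the last 300 rows into memory as an ordinary table.
TT = gather(tail(tall(ttds), 300));

% Plot both delay columns against the row index within those 300 rows.
plot(TT.ArrDelay);
hold on
plot(TT.DepDelay);
hold off
legend('ArrDelay', 'DepDelay');
xlabel('Row (within the last 300)');
ylabel('Delay');
```

The "mapreducer(0)" setting applies to subsequent tall-array calculations in the session, so it only needs to be called once before the first tall array is created.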

Version

R2017b