Reading large .csv data with 12 million rows

Ganesh Naik on 10 Nov 2021
Commented: Mathieu NOE on 16 Dec 2021
Hi all, I have a .csv file with 14 columns and around 12 million rows. I would like to read the data and use it for further analysis. When I use the "readtable" command, it reads only the first two columns with all their information but fails to read the data in the other columns. Is there any way I can read the entire dataset at once?
Thanks in advance.
  3 comments
Ganesh Naik on 16 Dec 2021
Hi Mathieu, thanks for your message. I solved the problem as follows:
1) I used the "Split CSV" program to split the original CSV file into 12 smaller CSV files (one million rows each).
2) Read each CSV file:
data1 = readtable('Data-1.csv');
data2 = readtable('Data-2.csv');
.
.
.
data12 = readtable('Data-12.csv');
3) Combine the pieces into the final table:
Final_data = [data1; data2; data3; data4; data5; data6; data7; data8; data9; data10; data11; data12];
This built one large table in the workspace, and loading the files one at a time does not take much memory. It worked for me, but I believe there may be better alternatives.
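The three steps above can also be written as a loop, which avoids twelve numbered variables. The following is only a minimal sketch, not the method used in the thread: the names `folder`, `nFiles`, and `parts` are illustrative, and all files are assumed to share the same column names and types. So that it runs on its own, it first writes three tiny stand-in files to a temporary folder.

```matlab
% Stand-in data so the sketch runs on its own: three tiny CSV files in a
% temporary folder. With the real data, set folder to the directory that
% holds Data-1.csv ... Data-12.csv, set nFiles = 12, and skip this loop.
folder = tempname;
mkdir(folder);
nFiles = 3;
for k = 1:nFiles
    T = table((1:4)' + 4*(k-1), rand(4,1), 'VariableNames', {'id','value'});
    writetable(T, fullfile(folder, sprintf('Data-%d.csv', k)));
end

% Read every split file into a cell array, one table per file.
parts = cell(nFiles, 1);
for k = 1:nFiles
    parts{k} = readtable(fullfile(folder, sprintf('Data-%d.csv', k)));
end

% Stack the pieces vertically; equivalent to [data1; data2; ...].
Final_data = vertcat(parts{:});
```

Collecting the tables in a cell array and concatenating once at the end avoids growing a large table inside the loop.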
Mathieu NOE on 16 Dec 2021
OK, glad you have found a workaround! :)


Answers (1)

Sivani Pentapati on 1 Dec 2021
Hi Ganesh,
You can try using readmatrix in place of readtable. Another workaround is to convert the CSV files into MAT-files and save them with the '-v7.3' option. Please refer to this answer for more information.
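As a rough illustration of both suggestions, the sketch below reads a CSV file with readmatrix and then stores the result in a v7.3 MAT-file. It assumes the data is numeric (readmatrix returns a numeric array, and non-numeric fields come back as NaN); the names `csvFile`, `matFile`, and `M` are illustrative, and a small stand-in file is written first so the example runs on its own.

```matlab
% Stand-in file so the sketch runs on its own; replace csvFile with the
% path to the real CSV file and skip the writematrix call.
csvFile = [tempname '.csv'];
writematrix(magic(4), csvFile);

% Read the whole file as a numeric array instead of a table.
M = readmatrix(csvFile);

% Save in the v7.3 MAT-file format, which supports variables over 2 GB.
matFile = [tempname '.mat'];
save(matFile, 'M', '-v7.3');

% Later sessions can load the MAT-file instead of parsing the CSV again.
S = load(matFile);
```

The conversion costs one full pass over the CSV file, but later loads read the binary MAT-file directly.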
  1 comment
Ganesh Naik on 16 Dec 2021
Dear Sivani, I solved the problem with the Split CSV method: reading each file and combining the data (I have explained my method above). It worked for me.
