How can I load a large CSV file?

I have a large CSV file (6 GB) and am trying to load it into MATLAB and save it into a structure file.
I am currently using textscan, but MATLAB freezes and the computer stops responding after a certain time.
The file has 54,200,000 lines with 10 values in each line. I tried loading only a few columns at a time, and it is still not working.
Is there a way I can load them all at once?
Thank you in advance!

Answers (1)

Cedric
Cedric on 16 Apr 2013
Edited: Cedric on 16 Apr 2013

0 votes

Did you try using CSVREAD and DLMREAD? The latter would allow you to load the file block by block.
Also, what type of data is stored in the file? Could you copy/paste the first two rows here? Storing an array of size 54,200,000 x 10 as double requires a little more than 4 GB of RAM. What kind of system are you working with? If it can't handle this, you could read block by block and convert into a smaller type/class for storing.
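A minimal sketch of the block-by-block approach with DLMREAD (assuming a purely numeric file; the file name `data.csv` and block size are illustrative, and the row/column counts are taken from the question above):

```matlab
% Read a numeric CSV in blocks of 1e6 rows and store as single to halve memory.
blockSize = 1e6 ;           % rows per block (illustrative)
nRows     = 54200000 ;      % total rows, per the question
nCols     = 10 ;
data      = zeros( nRows, nCols, 'single' ) ;   % preallocate in a smaller class

for r0 = 0 : blockSize : nRows-1
   r1 = min( r0+blockSize, nRows ) - 1 ;        % last row of block (0-based)
   block = dlmread( 'data.csv', ',', [r0, 0, r1, nCols-1] ) ;  % [R1 C1 R2 C2]
   data( r0+1 : r1+1, : ) = single( block ) ;   % convert before storing
end
```

Note that storing as single instead of double cuts the memory footprint from ~4 GB to ~2 GB, at the cost of precision.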

6 comments

zheng
zheng on 16 Apr 2013
the file contains only numbers like:
1357014085000 58609 41.4002021500 -88.0114677748 10035.88200 196.20920 151.66576 227.40741 151.66576 ZAU B737 1357014090000 58609 41.3954721678 -88.0080549443 10141.70400 203.76750 151.66603 236.45369 151.66603 ZAU B737 1357014095000 58609 41.3905620328 -88.0045124423 10255.33700 211.04308 151.66644 245.22408 151.66644 ZAU B737
I have divided the file into several smaller .csv files, but then I need to do comparisons between the data in different files.
I am working on a Mac with 12 GB RAM.
I will try csvread for the time being.
Thank you!
Cedric
Cedric on 16 Apr 2013
Edited: Cedric on 16 Apr 2013
I see; the fact that you have text in the last columns will prevent you from using CSVREAD or DLMREAD efficiently.
Have you looked at this thread yet?
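Since CSVREAD/DLMREAD can't handle the text columns, here is a sketch using TEXTSCAN with a bounded number of lines per call, so only one chunk is in memory at a time. The format string is an assumption based on the sample rows above (nine numeric fields followed by two text fields per record), and the file name and delimiter are illustrative — adjust the delimiter if the file is whitespace-separated rather than comma-separated:

```matlab
% Read the file in chunks of 1e6 records with TEXTSCAN.
fid = fopen( 'data.csv' ) ;
fmt = [repmat( '%f', 1, 9 ), '%s%s'] ;   % 9 numeric + 2 text fields (assumed)
chunkSize = 1e6 ;

while ~feof( fid )
   C = textscan( fid, fmt, chunkSize, 'Delimiter', ',' ) ;
   numData = single( [C{1:9}] ) ;   % numeric columns, stored as single
   txtData = [C{10}, C{11}] ;       % cell arrays of strings
   % ... process/compare/save this chunk before reading the next one ...
end

fclose( fid ) ;
```

Because the file handle stays open across iterations, each textscan call resumes where the previous one stopped, so the file is scanned only once.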
zheng
zheng on 30 Apr 2013
Sorry for getting back to you this late. I have looked at that thread, but my MATLAB still goes into a frozen state, and I can't really track its status or whether it will eventually die.
Each .csv piece translated into a structure file gives me 1.2 GB per file, and I have 20 of them in total. That means that even if I loaded them all at once and translated them into a structure file, I would still probably run into the huge-file problem with my structure file.
Is there a way I can fix this?
Thank you so much.
Cedric
Cedric on 30 Apr 2013
No problem! What structure does your structure file have? Do you need all the data from all the files in memory before you can start building this file, or could you process the whole thing in smaller chunks (i.e. import a CSV file, export part of the structure file, import the second CSV file, export the next part of the structure file, etc.)? Also, do you need all the columns of the input files, or only a few of them?
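The chunked import/export strategy above could be sketched with MATFILE, which writes to a v7.3 .mat file incrementally so only one chunk is ever in memory. The file names, the 20-part split, and the single `data` variable are illustrative, and CSVREAD here assumes the per-part files contain only the numeric columns:

```matlab
% Append each CSV part's data to one .mat file via a matfile handle
% (v7.3 format supports files larger than 2 GB).
m = matfile( 'all_data.mat', 'Writable', true ) ;
row = 0 ;

for k = 1 : 20
   chunk = single( csvread( sprintf( 'part_%02d.csv', k ) ) ) ;  % numeric-only parts
   m.data( row+1 : row+size(chunk,1), 1:size(chunk,2) ) = chunk ;
   row = row + size( chunk, 1 ) ;
   clear chunk ;    % free the chunk before loading the next part
end
```

Later comparisons can then index into `m.data` by row range without loading the whole array.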
zheng
zheng on 30 Apr 2013
The problem is that I need to do a comparison among all the data in that big CSV file, so the smaller-chunks strategy would require me to do cross-file comparisons. I tried loading only a few columns of the entire 6 GB, but my MATLAB doesn't seem happy and stopped working :[
Cedric
Cedric on 30 Apr 2013
But which columns do you need? All of them? And what kind of processing/comparison do you have to perform?


Asked: 16 Apr 2013
