Read in specific range of large .csv
75 views (last 30 days)
I have very large .csv files that I am trying to work with, e.g. 7000 * 72000.
In each file the first column is a time vector. By saving these time vectors in separate files, I can load them in, get the row range of the dates of interest, and then use that to read in the rows of interest from the larger .csv?
However, I can't figure out how to apply this last step. Here is what I have so far...
%get time period of interest
%load in time vector
PSD_tvec=readtable(fullfile(tvec_folder,tvecfile1)); %read tvec and get times
%get row range of interest
idx=PSD_tvec_t>timeperiod(1) & PSD_tvec_t<timeperiod(end); %find rows in tvec
%which correspond to date range of interest
x=find(idx(:,1)>0); %get row numbers for reading in PSD
PSDfolder=('H:\SoundTrap\Boats\PSD Output\Duty cycle data'); %folder where PSD output files are
%PSDfile1=readtable(fullfile(PSDfolder,PSDfile1)); %read in PSD file
How can I select a range of interest as I read the .csv?
In addition to selecting specific rows, I could also cut the data down by selecting different columns. I have tried that this way:
...but for some reason, whilst this does select the desired column range, it doesn't read the full number of rows in the file and there are no error messages.
Alternative ways of solving the problem would be equally appreciated. I need to read in these large files but since it is time consuming and I don't always need all of the data, I am looking to be more efficient. Thanks
Raunak Gupta on 14 Nov 2020
From the question I understand that you want to import only a chunk of ‘.csv’ file for analyzing. The readmatrix with ‘Range’ should return the mentioned range of values for you. The only crux here is the number of columns that you want to import is greater than what a normal csv file can be displayed with in Excel, which is XFD and that corresponds to 16384 columns. Since you are using columns from 1 to 24000 the specified error is thrown.
So, if you can somehow store the transpose of the original matrix to the csv file that will resolve the current error as ranges for row and columns will reverse and fall into the limits.
If that is not possible you can use tall array to import the whole file but not in memory. And then you can choose the size using logical indexing and then finally get the desired matrix using gather.
Hope this helps!