Sift through large amounts of data from csv
2 views (last 30 days)
Show older comments
Diango GOo on 6 May 2021
Answered: Shadaab Siddiqie on 17 Jun 2021
How am I suppose to open a csv with large amount of data?
By the way I need to randomly select by certain options
Can anyone help me open the csv and teach me how to select randomly?
Shadaab Siddiqie on 17 Jun 2021
From my understanding you want to import and operate on large csv file. If the csv file too large for your system to handle, I workaround would be that you can break it down into block, e.g. 500MB. Here is the code which might help you:
blockSize = 500e6 ; % Choose large enough so not too many blocks.
tailSize = 100 ; % Choose large enough so larger than one value representation.
% - Open file.
fId = fopen( 'largeFile.csv' ) ;
% - Read first line, convert to double, determine #columns.
line = fgetl( fId ) ;
data = sscanf( line, '%f,' ) ;
nCols = numel( data ) ;
lastBit = '' ;
while ~feof( fId )
% - Read and pre-process block.
buffer = fread( fId, blockSize, '*char' ) ;
isLast = length( buffer ) < blockSize ;
buffer(buffer==10) = ',' ;
buffer(buffer==13) = '' ;
% - Pre-pend last bit of last block.
if ~isempty( lastBit )
buffer = [lastBit; buffer] ; %#ok<AGROW>
% - Truncate to last ',' and keep last bit for next iteration.
n = find( buffer(end-tailSize:end)==',', 1, 'last' ) ;
cutAt = length(buffer) - tailSize + n ;
lastBit = buffer(cutAt:end) ;
buffer(cutAt:end) =  ;
% - Parse.
data = [data; sscanf( buffer, '%f,' )] ; %#ok<AGROW>
% - Close file.
fclose( fId ) ;
% - Reshape data vector -> array.
data = reshape( data, nCols,  )' ;
There might be many other approaches, you can refer readmatrix and testscan for more information.
Find more on Low-Level File I/O in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!Start Hunting!