Sift through large amounts of data from csv

2 vues (au cours des 30 derniers jours)
Diango GOo
Diango GOo le 6 Mai 2021
How am I suppose to open a csv with large amount of data?
By the way I need to randomly select by certain options
Can anyone help me open the csv and teach me how to select randomly?

Réponses (1)

Shadaab Siddiqie
Shadaab Siddiqie le 17 Juin 2021
From my understanding you want to import and operate on large csv file. If the csv file too large for your system to handle, I workaround would be that you can break it down into block, e.g. 500MB. Here is the code which might help you:
blockSize = 500e6 ; % Choose large enough so not too many blocks.
tailSize = 100 ; % Choose large enough so larger than one value representation.
% - Open file.
fId = fopen( 'largeFile.csv' ) ;
% - Read first line, convert to double, determine #columns.
line = fgetl( fId ) ;
data = sscanf( line, '%f,' ) ;
nCols = numel( data ) ;
lastBit = '' ;
while ~feof( fId )
% - Read and pre-process block.
buffer = fread( fId, blockSize, '*char' ) ;
isLast = length( buffer ) < blockSize ;
buffer(buffer==10) = ',' ;
buffer(buffer==13) = '' ;
% - Pre-pend last bit of last block.
if ~isempty( lastBit )
buffer = [lastBit; buffer] ; %#ok<AGROW>
end
% - Truncate to last ',' and keep last bit for next iteration.
if ~isLast
n = find( buffer(end-tailSize:end)==',', 1, 'last' ) ;
cutAt = length(buffer) - tailSize + n ;
lastBit = buffer(cutAt:end) ;
buffer(cutAt:end) = [] ;
end
% - Parse.
data = [data; sscanf( buffer, '%f,' )] ; %#ok<AGROW>
end
% - Close file.
fclose( fId ) ;
% - Reshape data vector -> array.
data = reshape( data, nCols, [] )' ;
There might be many other approaches, you can refer readmatrix and testscan for more information.

Catégories

En savoir plus sur Large Files and Big Data dans Help Center et File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by