Fastest way to add string

Alan on 23 Sep 2014
Answered: Alan on 23 Sep 2014
I'm dealing with very large CSV files. I have little to no problem with the speed of reading them with readtable. However, I have found (and reported) a bug in readtable where a blank value in the first column (the line starts with the delimiter, e.g. ',') throws off all the data. A lot of my files have blank values in the first column (due to the way the equipment I'm using records the data).
So I have to "preprocess" the files and look for these blank first-column values in the CSV file. The most efficient method I've found is the following:
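% YGID: identifier of the input file, HDID: identifier of the header output file
fprintf('Reading File...\n');
% uint8=>char keeps bytes above 127 intact (int8 would turn them negative)
ch = fread(YGID, [1,chunksize], 'uint8=>char');
fprintf('Getting Number Of Lines...');
nol = sum(ch == sprintf('\n')); % number of lines
fprintf('%i\n',nol);
fprintf('Replacing final commas...\n');
% drop a comma that sits directly before a line break
cch = regexprep(ch,',(\r|\n)+','$1');
clear ch;
fprintf('Getting line locations...\n');
hlocs = regexp(cch,'\n'); % index of every newline character
fprintf('Writing Header File...\n');
% lines 3 through 10 of the file form the header
fwrite(HDID,cch(hlocs(2)+1:hlocs(10)));
fprintf('Replacing Initial Commas\n');
% put a space in front of a comma that starts a line (blank first column)
ccch = regexprep(cch,'(\r|\n)+,','$1 ,');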
YGID is the file identifier returned by fopen. Note that I'm purposely making new variables (not memory efficient) because I have 16 GB of RAM available on my machine and I find that making a completely new variable is faster. However, once the file reaches a sufficient size (>20 MB; I have some over 200 MB), even this becomes very slow. The line it gets stuck on is "ccch = regexprep(cch,'(\r|\n)+,','$1 ,');". I suspect that's because each additional space being added (there are hundreds of thousands) forces the variable's memory to be reallocated. I tried to "preallocate" the new variable with "ccch = blanks(chunksize + nol);" before that line, and it didn't seem to make a difference.
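For what it's worth, assigning a whole new value to ccch replaces whatever blanks() allocated, which would explain why preallocating made no difference. A rough sketch of what a preallocated single-pass version could look like is below. It is not from the original post: the sample string and the names isEol, isLeadComma, shift and newPos are made up for illustration, and it assumes cch is a non-empty row vector.

```matlab
% Hypothetical sample standing in for the file contents (cch in the post).
cch = sprintf('a,b,c\n,1,2\n,3,4\n5,6,7\n');

% True wherever the character is a line terminator (\n or \r).
isEol = (cch == char(10)) | (cch == char(13));

% True at every comma that immediately follows a line terminator.
isLeadComma = [false, (cch(2:end) == ',') & isEol(1:end-1)];

% Every leading comma pushes itself and all later characters one slot right.
shift  = cumsum(isLeadComma);
newPos = (1:numel(cch)) + shift;   % destination index of each original char

% Allocate the output once, filled with spaces, then scatter the original
% characters into it. The slots left untouched are the inserted spaces.
ccch = blanks(numel(cch) + shift(end));
ccch(newPos) = cch;
```

On this sample the result matches what the regexprep call in the question produces (a space in front of each line-initial comma), but the output is allocated exactly once. Whether it is actually faster on a 200 MB file would need timing on real data.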
Is there any more efficient way to do this task?

Accepted Answer

Alan on 23 Sep 2014
Found my own answer: strrep is surprisingly faster than regexprep. I had to add a conditional to check the OS, though:
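% Note: isunix is also true on macOS, so on current platforms the
% first branch is always the one taken.
if ispc || isunix
    fpatt = sprintf('\n,');   % line-initial comma
    rpatt = sprintf('\n, ');  % same, with a space after the comma
else
    fpatt = sprintf('\r,');
    rpatt = sprintf('\r, ');
end
% literal (non-regex) replacement of every occurrence
ccch = strrep(cch,fpatt,rpatt);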
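Since the line ending is a property of the file rather than of the machine running MATLAB, one possible variation is to pick the pattern from the data itself. This is only a sketch: the sample string and the variable eol are invented for illustration and are not part of the answer above.

```matlab
% Hypothetical sample with Windows-style (\r\n) line endings.
cch = sprintf('h1,h2\r\n,1\r\n,2\r\n');

% If the text contains any \n, the character before a line-initial comma is
% \n (this covers both \n and \r\n files). Otherwise assume \r-only endings.
if any(cch == char(10))
    eol = char(10);
else
    eol = char(13);
end

% Same literal replacement as in the answer, with the terminator taken from
% the file contents instead of from ispc/isunix.
ccch = strrep(cch, [eol ','], [eol ', ']);
```

The extra any() scan is a single pass over the data, so it should cost little next to the replacement itself.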

More Answers (0)
