Make this script faster

samy rima on 9 Dec 2015
Edited: Colin Edgar on 17 Dec 2015
Dear all,
I have a text file (an eye-tracker log) with 12 columns and 2,398,068 rows, and I import it with the code below.
The first line is a header with the variable names. Only column 9 contains strings; the rest are doubles.
Is there a way to make this script run faster?
Thanks for any insight.
filename = 'file.txt' ;
% - Get structure from first line.
fid = fopen( filename, 'r' ) ;
line = fgetl( fid ) ;
fclose( fid ) ;
% - Build formatSpec for TEXTSCAN.
fmt = {'%f%f%f%f%f%f%f%f%s%f%f%f'} ;
% - Read full file.
fid = fopen( filename, 'r' ) ;
data = textscan( fid, fmt, Inf, 'Delimiter', ';' ) ;
fclose( fid ) ;
data = ([data{:}]) ;
data(2:end,9)=num2cell((strcmp(data(2:end,9),'Event 1 > Stimulation')));
data=cellfun(@str2double,data(2:end,[1:8 10:end]),'un',0);
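For comparison, here is a minimal sketch of one way the import could be sped up, assuming the layout described in the question (one header line, ';' delimiter, strings only in column 9). The sample file and the names C, num and isStim are made up for illustration. The idea is to pass textscan a char-vector format with %f for the numeric columns, so they arrive as doubles and no cellfun/str2double pass is needed.

```matlab
% Hypothetical sample file standing in for the eye-tracker log (assumed layout).
filename = [tempname '.txt'];
fid = fopen(filename, 'w');
fprintf(fid, 'c1;c2;c3;c4;c5;c6;c7;c8;event;c10;c11;c12\n');
fprintf(fid, '1;2;3;4;5;6;7;8;Event 1 > Stimulation;10;11;12\n');
fprintf(fid, '2;3;4;5;6;7;8;9;Other event;11;12;13\n');
fclose(fid);

% Format as a char vector: 8 numeric columns, 1 string column, 3 numeric columns.
fmt = [repmat('%f', 1, 8) '%s' repmat('%f', 1, 3)];

% Read everything in one call, skipping the header line.
fid = fopen(filename, 'r');
C = textscan(fid, fmt, 'Delimiter', ';', 'HeaderLines', 1);
fclose(fid);

% Numeric columns are already double, so no str2double pass is needed.
num = [C{[1:8 10:12]}];                          % N-by-11 double
isStim = strcmp(C{9}, 'Event 1 > Stimulation');  % N-by-1 logical

delete(filename);
```

Keeping the numeric block and the logical event flag in separate variables avoids building one large cell array, which is likely where most of the time goes in the string-based version.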
5 comments
jgg on 17 Dec 2015
I had a similar issue. I ended up doing the initial data cleaning in Stata or R since it was easier to reformat the columns.
Colin Edgar on 17 Dec 2015
I can't make fscanf ignore the first "" string, for example:
frmt = '%*s%s%s%s%s%s%s%s%s%s%s%s%s%[^\n\r]';
A = fscanf(fid, frmt, [12, inf]);
A = "
Unless I do this:
A = fscanf(fid, '%s', [12, inf]);
A = 12 x 16833 (Char)
What I want is:
A = 12 x 16833 double


Answers (1)

Colin Edgar on 17 Dec 2015
Edited: Colin Edgar on 17 Dec 2015
Here is my solution. It takes only ~1 s to run per file (~2 MB, 12 x 18000). This is for the example data I posted above, but with the initial "timestamp" removed. I believe it answers the OP's issue as well, since the data was very similar.
formatSpec = '%f,%f,%f,%f,%f,%f,%f,%f,%f,%f,%f,%f\n';
fid = fopen(flnm,'r');
% Read past the 4 heading lines. I know it's a hack but...
for k = 1:4
    fgetl(fid);
end
mat = fscanf(fid, formatSpec, [12,inf]); % fscanf fills column-wise: 12 x N
mat = mat'; % transpose to correct layout (N x 12)
fclose(fid);
Versus my old version, which took ~15 s (similar to the OP's approach):
formatSpec = '%s%s%s%s%s%s%s%s%s%s%s%s';
fid = fopen(flnm,'r');
C = textscan(fid,formatSpec,'HeaderLines',4,'Delimiter',',');
mat = cell2mat(cellfun(@str2double,C,'UniformOutput',false));
fclose(fid);
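To try the two approaches side by side, something like the following sketch could be used. It assumes a synthetic file with 4 header lines followed by rows of 12 comma-separated numbers; the file contents and the names expected, mat1, mat2, tNumeric and tString are made up for illustration. The fscanf format here leaves out the trailing \n, since %f skips leading whitespace (including line breaks) anyway. Timings will depend on the machine and the file size.

```matlab
% Synthetic file (assumed layout): 4 header lines, then 12 comma-separated numbers per row.
flnm = [tempname '.csv'];
nRows = 2000;
expected = rand(nRows, 12);
fid = fopen(flnm, 'w');
fprintf(fid, 'header line %d\n', 1:4);
% fprintf walks the data column-wise, so pass the transpose to emit one row per line.
fprintf(fid, [repmat('%.6f,', 1, 11) '%.6f\n'], expected');
fclose(fid);

% Approach 1: skip the header with fgetl, then read numbers directly with fscanf.
tic;
fid = fopen(flnm, 'r');
for k = 1:4
    fgetl(fid);
end
mat1 = fscanf(fid, [repmat('%f,', 1, 11) '%f'], [12, Inf])';  % N-by-12 after transpose
fclose(fid);
tNumeric = toc;

% Approach 2: read every field as a string, then convert with str2double.
tic;
fid = fopen(flnm, 'r');
C = textscan(fid, repmat('%s', 1, 12), 'HeaderLines', 4, 'Delimiter', ',');
fclose(fid);
mat2 = cell2mat(cellfun(@str2double, C, 'UniformOutput', false));
tString = toc;

fprintf('fscanf: %.3f s, textscan + str2double: %.3f s\n', tNumeric, tString);
delete(flnm);
```

Both approaches should produce the same N-by-12 double matrix; the difference is only in how much string handling happens along the way.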
