Question about optimizing reading data from text file

2 vues (au cours des 30 derniers jours)
Brian
Brian le 23 Mai 2013
Hello, thanks for reading this,
I currently have a reader that reads in mesh files, and it works, but depending on the size of the file it can take a very long time. I was hoping I can optimize it for speed.
What I do first is read in a text file and change every line into a matrix of characters using the lines:
cac = textscan( fid, '%[^\n]' );
fclose(fid);
A = char( cac{1} );
where A is my character matrix. I then search through the text file for identifiers for data I need. How I accomplish this is by setting start of data indices and end of data indices. I basically read this line by line, and at the moment, I assume it will always be formatted in a certain way.
After I have these indices, I use sscanf functions to read the characters as %f or %x numbers and store them into matrices. This is the part where the profiler says it takes the longest to complete.
I posted the MATLAB reader function here: http://pastebin.com/FFtgXzg4, since it is a bit long to post here. My specific questions are: do I have to convert the whole text import into a character matrix, and is there any way I can do this without needing a for loop? The loops using sscanf take a very long time.
It works, but just barely so. I can send a test import file if needed.
  1 commentaire
Cedric
Cedric le 24 Mai 2013
Could you post e.g. 20 lines of your data file, and define these identifiers that are are referring to?

Connectez-vous pour commenter.

Réponses (1)

Jonathan Sullivan
Jonathan Sullivan le 23 Mai 2013
Modifié(e) : Jonathan Sullivan le 23 Mai 2013
You may want to use fread and regexp.
Without seeing your file, I can't say for sure this will produce the same result, but it should give you a good starting point.
% Using regexp and fread
fid = fopen(filename,'r');
tic;
A = regexp(fread(fid,'*char')','\n','split');
A = char( A{:} );
toc
fclose(fid);
% Using textscan
fid = fopen(filename,'r');
tic;
B = textscan(fid,'%[^\n]');
B2 = char(B{1});
toc
fclose(fid);
  1 commentaire
Brian
Brian le 23 Mai 2013
It seems that the text scan I have goes slightly faster than the regexp/fread combination. There is one last part of the code that seems to be giving me problems:
When I have my start and end indices, I use sscanf line by line to give me the real data I need. However, some of my character matrices can be very large: sometimes spanning hundreds of thousands of rows (depending on the number of tetrahedra I have).
Is it possible to read this in any kind of intelligent fashion using sscanf line by line, or use it as a vector component, or should I look into exporting the matrix to a formatted text file and re-importing it using textread and hex2dec?
In these areas, I will always have the following combination of characters:
xxx xxx xxxx x x,
where I believe it can be split by a space delimiter. That leaves me with five hexadecimal values per row.

Connectez-vous pour commenter.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by