Question about reading in text files: alternatives

Hello, thanks for reading this.
I wrote a reader for importing ANSYS mesh files, but in my opinion it's a bit inelegant. What I do is read the file, store every line as a string, and then parse through the stored lines for identifiers (like point and connectivity information). It works, but it is slow. Any file around 1 MB loads slowly, and load time grows much faster than file size for anything larger.
Is there a better way of doing this? I currently open the file and read every line into a string with this code:
function [Points, vFaceMx] = getPointsAndFacesforMESH(fileName)
    wb2 = waitbar(0,'Loading Mesh');
    filename=fileName;
    fid = fopen(filename, 'rt');
    nLines = 0;
    while (fgets(fid) ~= -1),
        nLines = nLines+1;
    end
    fclose(fid);
    fid = fopen(filename, 'rt');
    A=[];
    ct = 0;
    % Write all lines as strings
    while feof(fid) == 0
        tline = fgetl(fid);
        A_c=size(A, 2);
        t_c=size(tline, 2);
        if A_c > t_c
            tline=[tline, NaN(size(tline, 1), A_c-t_c)];
        end
        if A_c < t_c
            A=[A, NaN(size(A, 1), t_c-A_c)];
        end
        A = [A; tline];
    end
    fclose(fid);
And from there, I parse through using strcmp commands. I load the data I want into data arrays of strings, then I use sscanf commands to bring it back into numerical data.
Any advice would be appreciated.

6 comments

Cedric on 21 Feb 2013
Edited: Cedric on 21 Feb 2013
Could you give us a sample of the content of your ANSYS file please? If there isn't already a function on FEX for importing this kind of mesh, your options will be to read the file char by char, line by line, or the whole file in one shot, and then process the content using sscanf or regexp. sscanf is good when there is a regular structure, and regexp is good when you have to match patterns.
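To illustrate the difference between the two tools, here is a minimal sketch. The two input lines are copied from the sample file shown in the accepted answer further down, which its author only assumes resembles the real mesh; the variable names are made up for this example.

```matlab
% Regular structure: sscanf converts a line of numbers directly.
coordLine = '1.0000000000e+000 3.3333333333e-001';   % taken from the sample file
xy = sscanf(coordLine, '%f').';                      % 1-by-2 double row vector

% Pattern matching: regexp extracts the section id from a header line.
headerLine = '(13(3 1 9 3 0)';                       % taken from the sample file
tok = regexp(headerLine, '^\((\d+)\s*\(', 'tokens', 'once');
sectionId = str2double(tok{1});                      % 13 for a face section
```

sscanf is enough as long as a line holds nothing but numbers; the regular expression earns its keep on header lines, where parentheses and numbers are mixed.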
Brian on 21 Feb 2013
Edited: Brian on 21 Feb 2013
It is the ANSYS .msh format. I'm not sure if there's online documentation for it, but it's ANSYS's version of GAMBIT's .msh format. I checked, and there isn't an importer for it.
It is an ASCII file format that has a point section, preceded by a (10(xxxx)) header, which is an x, y and z coordinate list of all the points. Next is the connectivity (face) section, denoted by a (13(xxx)) header, which has the hexadecimal point indices of the tetrahedral and surface triangles. Each line denotes a triangular or tetrahedral element. That's all I'm worried about at the moment; there are other parts of the file, but I don't need them yet.
Is it possible to read it line by line, and look for file headers without converting it to strings?
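As a rough sketch of what header detection and hexadecimal conversion could look like on single lines: the two lines below are hypothetical, written to match the format described above and not taken from a real mesh file, and the variable names are illustrative only.

```matlab
% Hypothetical lines, made up for illustration only.
headerLine = '(13(3 1 9 3 0)(';
faceLine   = '2 d c 1 2';

% A face-section header starts with '(13'; strncmp compares only the first 3 chars.
isFaceHeader = strncmp(headerLine, '(13', 3);

% '%x' makes sscanf read each token as a hexadecimal integer.
nodeIdx = sscanf(faceLine, '%x').';   % [2 13 12 1 2]
```

The line is already a char array when it is read, so the only conversion is from text to numbers, and '%x' handles the hexadecimal indices without a separate hex2dec step.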
Cedric on 21 Feb 2013
You are reading lines as strings. The only conversion that you have to do is towards whatever numerical type you need. The easiest way to get an answer is probably to open your .msh file using a text editor, and to cut and paste e.g. the first 50 lines under your question.
Brian on 22 Feb 2013
This is a possibility, especially since I'm only manipulating numerical data at the moment. It may change in the future, though.
Is the only way to do this to translate every line into a string and parse through the strings? Or is there another way?
Cedric on 25 Feb 2013
Edited: Cedric on 25 Feb 2013
As mentioned above, the best way to discuss the method is certainly to paste part of the file (e.g. the first 20-40 rows) below the original question. When you read a text file, what you get is most often strings, so there is no need to perform a translation to string. If you look at the class of tline right after the call to fgetl(), you will see that it is char. In principle, the only thing you need to do is parse the lines that you read and extract their content as string/integer/double/etc. There are several ways to achieve this. As mentioned, for most simple cases where lines have a simple, regular structure, fscanf()/sscanf() will be fine; for more complicated cases, regular expressions [regexp()] are usually an invaluable tool when available.
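A small sketch of that check, assuming a throwaway one-line file written to a temporary location (the file name and its content are made up for this example):

```matlab
% Write a one-line temporary file so the example is self-contained.
tmpFile = [tempname, '.msh'];
fid = fopen(tmpFile, 'w');
fprintf(fid, '%s\n', '0.0000000000e+000 1.0000000000e+000');
fclose(fid);

% Read the line back: fgetl already returns a char array.
fid = fopen(tmpFile, 'r');
tline = fgetl(fid);
fclose(fid);
delete(tmpFile);

lineIsChar = ischar(tline);     % true: no "translation to string" is needed
vals = sscanf(tline, '%f');     % the only conversion: char -> double
```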
Morteza on 25 Feb 2013
Edited: Morteza on 25 Feb 2013
str2doubleq.cpp
This function is really fast at converting string data to numerical data. You can download it here and use it according to its description.


Accepted Answer

per isakson on 25 Feb 2013
Edited: per isakson on 25 Feb 2013
Some comments:
  • I assume it is a text file that resembles the example below
  • I guess that line-breaks are not really significant
  • the first while-loop counts the lines - is that needed?
  • in the second while-loop A is growing, which is bad for performance
  • the lines are padded with char(0) - space char(32) is "more standard"
  • I assume your file fits in memory (ram)
  • the example code below with textscan returns A, which is identical to A returned by getPointsAndFacesforMESH - with the exception of padding with char(32).
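% 1) Read the whole file into one string
tic,
str = fileread( filespec );
et = toc;
% 2) Read every line into a cell array with textscan, then convert it to a padded char matrix
tic,
fid = fopen( filespec, 'r' );
cac = textscan( fid, '%[^\n]' );
fclose(fid);
A1 = char( cac{1} );
et = [ et, toc ];
% 3) The original line-by-line reader, for comparison
tic,
[ A2, ~ ] = getPointsAndFacesforMESH( filespec );
et = [ et, toc ];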
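The padding with char(32) mentioned in the comments comes from char() itself. A minimal sketch, using two lines from the sample file and made-up variable names:

```matlab
% Two lines of different length, as textscan would return them in a cell array.
lines = {'(0 "Faces:")'; '(13(3 1 9 3 0)'};

% char() builds a rectangular char matrix and pads shorter rows with spaces.
A1 = char(lines);
padCode = double(A1(1, end));    % 32, the code of the space character

% strtrim removes the padding again when a single row is needed.
firstLine = strtrim(A1(1, :));
```

If the rectangular matrix is not needed, working directly on the cell array avoids both the padding and the trimming.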
Sample text file
(0 "GAMBIT to Fluent File")
(0 "Dimension:") (2 2)
(10 (0 1 10 1 2)) (10 (1 1 10 1 2)(
0.0000000000e+000 1.0000000000e+000
1.0000000000e+000 1.0000000000e+000
0.0000000000e+000 0.0000000000e+000
1.0000000000e+000 0.0000000000e+000
1.0000000000e+000 3.3333333333e-001
1.0000000000e+000 6.6666666667e-001
0.0000000000e+000 6.6666666667e-001
0.0000000000e+000 3.3333333333e-001
3.3333333333e-001 1.0000000000e+000
6.6666666667e-001 1.0000000000e+000
3.3333333333e-001 0.0000000000e+000
6.6666666667e-001 0.0000000000e+000
6.6666666667e-001 3.3333333333e-001
6.6666666667e-001 6.6666666667e-001
3.3333333333e-001 3.3333333333e-001
3.3333333333e-001 6.6666666667e-001 ))
(0 "Faces:") (13(0 1 18 0))
(13(3 1 9 3 0)
( 2 1 7 9 0 2 7 8 6 0 2 8 3 3 0 2 3 b 3 0 2 b c 2 0 2 c ... 6 4 0 2 6 2 7 0 ))
(13(4 a c 14 0)( 2 1 9 0 9 2 9 a 0 8 2 a 2 0 7 ))
(13(6 d 18 2 0)
( 2 d c 1 2 2 5 d 1 4 2 f b 2 3 2 d f 2 5 2 f 8 3 6 2 e ... 7 8 2 9 10 8 9 ))
(0 "Cells:") (12 (0 1 9 0)) (12 (2 1 9 1 3))
(0 "Zones:") (45 (2 fluid fluid)())
(45 (3 wall new_wall.4)())
(45 (4 mass-flow-inlet wall.4)())
(45 (6 interior default-interior)())

3 comments

Brian on 25 Feb 2013
Edited: Brian on 25 Feb 2013
I like this. I'll try it out in a few minutes.
Just to answer a few of your questions:
The first loop is not needed. It was used for error checking, but I can get rid of it at this point. As for the other points, I understand.
What do you mean by: A in the second while-loop A is growing, which is bad for performance?
I have enough RAM (2 GB right now allocated to java heap memory).
I will also try your example.
Thanks a lot!
Brian on 25 Feb 2013
Wow, I just tried this, and it's amazing how much faster it is. Thanks a lot. I'm going to look more into these lines in my own time:
fid = fopen( filespec, 'r' );
cac = textscan( fid, '%[^\n]' );
fclose(fid);
A = char( cac{1} );
because these seem to contain all the magic. The bottleneck in my code is now the visualization of the mesh, which is to be expected of MATLAB.
Thanks a lot!
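For anyone wondering what the textscan call does, here is a small self-contained sketch. It writes a three-line temporary file (the file name and content are made up for the example) and reads it back the same way:

```matlab
% Create a small temporary text file.
tmpFile = [tempname, '.txt'];
fid = fopen(tmpFile, 'w');
fprintf(fid, '%s\n', '(0 "Dimension:") (2 2)', '0.0 1.0', '1.0 1.0');
fclose(fid);

% '%[^\n]' means "read every character that is not a newline",
% so each line of the file becomes one string.
fid = fopen(tmpFile, 'r');
cac = textscan(fid, '%[^\n]');
fclose(fid);
delete(tmpFile);

lines = cac{1};            % N-by-1 cell array, one line per cell
nLines = numel(lines);     % number of lines read, no counting loop needed
```

textscan reads the whole file in one call and sizes its output itself, which is why it avoids the array growth of the original loop.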
per isakson on 26 Feb 2013
Edited: per isakson on 26 Feb 2013
"A in the second while-loop A is growing," [sic]
Search for "preallocating memory" in the help. Doc says:
Preallocating Memory
Repeatedly expanding the size of an array over time, (for example, adding more elements to it each time through a programming loop), can adversely affect the performance of your program. This is because
  • MATLAB has to spend time allocating more memory each time you increase the size of the array.
  • This newly allocated memory is likely to be noncontiguous, thus slowing down any operations that MATLAB needs to perform on the array.
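A minimal sketch of the two patterns side by side; the loop body is a stand-in for reading a line, and all names are illustrative:

```matlab
n = 1000;

% Growing: the array is reallocated and copied on every pass.
grown = {};
for k = 1:n
    grown = [grown; {sprintf('line %d', k)}];
end

% Preallocated: the cell array is created once and filled in place.
prealloc = cell(n, 1);
for k = 1:n
    prealloc{k} = sprintf('line %d', k);
end

sameResult = isequal(grown, prealloc);   % both loops build the same content
```

Both loops produce the same result; only the second avoids the repeated reallocation, which is what matters once the file has many thousands of lines.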
"enough RAM"
When working with files, it makes a big difference whether the file fits in the system cache. See the Windows Task Manager.
