Find line containing word in a mixed format txt file

I have FORTRAN code that generates a mixed-format output text file of varying length (depending on the run parameters). I am trying to write a MATLAB script that finds a table located somewhere in that text file; the table always has the same number of columns and the same column variables.
The data might look something like this (representative of the formatting):
...
xourbwofigefifhtiryefsldtldcrpewicbqocyttbwloiopauyveapmwvkylxepftamjocccgpnybtubzhnqqnacyihnyhdxcvshiyusemrerqceebxylcqtpgksivchcrsxbcnggypsysnjdwtkbdzptffyvsesvqwpsnzhkphrftuykjogzpyqhqmuhenqulupujukqsyfqrxmtzxomaojjerczyrqmqhyoihhcoixxkyouqxumkefltqsuraaapoxourbwofigefifhtiryefsldtldcrpewicbqocyttbwloiopauyveapmwvkylxepftamjocccgpnybtubzhnqqnacyihnyhdxcvshiyusemrerqceebxylcqtpgksivchcrsxbcnggypsysnjdwtkbdzptffyvsesvqwpsnzhkphrftuykjogzpyqhqmuhenqulupujukqsyfqrxmtzxomaojjerczyrqmqhyoihhcoixxkyouqxumkefltqsuraaapo
$$$$$$$$$$$$$$$$$$$$$$$$$$$ output $$$$$$$$$$$$$$$$$$$$$$$$$$$
A B C D E F G H I J K
------ --------- ------ ------------------------------------------------------------------------------ ---------------------------------- --------------
s m kg A K mol
2000.0 7.744E-02 5.000 3.098E+01 3.098E+01 0.000E+00 0.000E+00 3.098E+01 3.87194E-01 3.87194E-01 1.000E+00
2005.0 4.646E-02 4.988 1.868E+01 1.868E+01 0.000E+00 0.000E+00 1.868E+01 6.19481E-01 6.19481E-01 1.000E+00
2010.0 6.827E-02 4.975 2.758E+01 2.758E+01 0.000E+00 0.000E+00 2.758E+01 9.60817E-01 9.60817E-01 1.000E+00
2015.0 1.038E-01 4.963 4.213E+01 4.213E+01 0.000E+00 0.000E+00 4.213E+01 1.47961E+00 1.47961E+00 1.000E+00
2020.0 9.099E-02 4.950 3.713E+01 3.713E+01 0.000E+00 0.000E+00 3.713E+01 1.93456E+00 1.93456E+00 1.000E+00
2025.0 9.283E-02 4.938 3.806E+01 3.806E+01 0.000E+00 0.000E+00 3.806E+01 2.39869E+00 2.39869E+00 1.000E+00
2030.0 5.814E-02 4.926 2.396E+01 2.396E+01 0.000E+00 0.000E+00 2.396E+01 2.68937E+00 2.68937E+00 1.000E+00
(the table may continue for an arbitrary number of rows)
....
xourbwofigefifhtiryefsldtldcrpewicbqocyttbwloiopauyveapmwvkylxepftamjocccgpnybtubzhnqqnacyihnyhdxcvshiyusemrerqceebxylcqtpgksivchcrsxbcnggypsysnjdwtkbdzptffyvsesvqwpsnzhkphrftuykjogzpyqhqmuhenqulupujukqsyfqrxmtzxomaojjerczyrqmqhyoihhcoixxkyouqxumkefltqsuraaapoxourbwofigefifhtiryefsldtldcrpewicbqocyttbwloiopauyveapmwvkylxepftamjocccgpnybtubzhnqqnacyihnyhdxcvshiyusemrerqceebxylcqtpgksivchcrsxbcnggypsysnjdwtkbdzptffyvsesvqwpsnzhkphrftuykjogzpyqhqmuhenqulupujukqsyfqrxmtzxomaojjerczyrqmqhyoihhcoixxkyouqxumkefltqsuraaapo
....
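One way to do this without hard-coding any line numbers is to scan the file line by line: find the marker line, skip the header lines, then read rows until one no longer parses as a full row of numbers. The sketch below does this with plain file I/O on a small demo file. The marker text '$ output $', the three header lines (names, dashes, units) and the 11 columns are assumptions taken from the sample above and should be checked against the real file.

```matlab
% Minimal sketch (plain file I/O, no readtable).
% Assumptions (hypothetical, adjust to the real file): the marker line
% contains '$ output $', three header lines follow it, 11 numeric columns.

% --- demo file standing in for the real FORTRAN output ---
fname = [tempname() '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'free text before the table\n');
fprintf(fid, '$$$$$$$$ output $$$$$$$$\n');
fprintf(fid, 'A B C D E F G H I J K\n');
fprintf(fid, '------ --------- ------\n');
fprintf(fid, 's m kg A K mol\n');
fprintf(fid, '2000.0 7.744E-02 5.000 3.098E+01 3.098E+01 0.000E+00 0.000E+00 3.098E+01 3.87194E-01 3.87194E-01 1.000E+00\n');
fprintf(fid, '2005.0 4.646E-02 4.988 1.868E+01 1.868E+01 0.000E+00 0.000E+00 1.868E+01 6.19481E-01 6.19481E-01 1.000E+00\n');
fprintf(fid, 'free text after the table\n');
fclose(fid);

% --- scan the file line by line ---
nCols = 11;                 % assumed number of table columns
nSkip = 3;                  % assumed header lines after the marker
data = zeros(0, nCols);
found = false;
fid = fopen(fname, 'r');
tline = fgetl(fid);
while ischar(tline)                         % fgetl returns -1 at end of file
    if ~found
        found = ~isempty(strfind(tline, '$ output $'));
        if found
            for k = 1:nSkip
                fgetl(fid);                 % skip names, dashes and units
            end
        end
    else
        row = sscanf(tline, '%f').';        % all numbers on the line, as a row
        if numel(row) ~= nCols
            break                           % first non-table line ends the table
        end
        data(end+1, :) = row;               %#ok<AGROW>
    end
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);
```

For the real file, replace the demo-file section with fname = 'file.txt'; (and drop the delete call). The table length never has to be known in advance, because the loop stops at the first line that does not contain exactly nCols numbers.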

2 comments

Please give an example of what the data looks like, and of what the table to be found looks like.
Even better if you can attach a sample data file.
Dyuman, I've edited my inquiry to include representative txt file data with table format.


Accepted Answer

Star Strider on 11 Dec 2023


I am not certain how the table actually exists in the file (it would help significantly to have the actual file rather than an imitation of it to work with). That aside, for FORTRAN files, using readtable with fixedWidthImportOptions would likely work. You will probably need to experiment.

6 comments

I've attached the txt file, and I'm looking to extract this table located about midway through:
This seems to work:
% type('file.txt')                 % examine the file (optional)
VT = repmat({'double'}, 1, 11);    % 'VariableTypes': one 'double' per column
opts = fixedWidthImportOptions('NumVariables',11, ...
    'VariableWidths',[8 11 9 10 11 11 11 11 13 13 13], ...  % field widths in characters
    'DataLines',223, 'VariableTypes',VT, ...                 % read from line 223 to end of file
    'VariableNamesLine',220, 'VariableUnitsLine',222);       % names and units lines
T1 = readtable('file.txt', opts)
T1 = 668×11 table
m         rip       lam       ripl      riapl     riahl     rial      ril       riapi     rii       taua
____      _______   _____     _____     _____     _____     ____      _____     _______   _______   ____
2000      0.07744   5         30.98     30.98     0         0         30.98     0.38719   0.38719   1
2005      0.04646   4.988     18.68     18.68     0         0         18.68     0.61948   0.61948   1
2010      0.06827   4.975     27.58     27.58     0         0         27.58     0.96082   0.96082   1
2015      0.1038    4.963     42.13     42.13     0         0         42.13     1.4796    1.4796    1
2020      0.09099   4.95      37.13     37.13     0         0         37.13     1.9346    1.9346    1
2025      0.09283   4.938     38.06     38.06     0         0         38.06     2.3987    2.3987    1
2030      0.05814   4.926     23.96     23.96     0         0         23.96     2.6894    2.6894    1
2035      0.04384   4.914     18.15     18.15     0         0         18.15     2.9085    2.9085    1
2040      0.1015    4.902     42.24     42.24     0         0         42.24     3.4161    3.4161    1
2045      0.0615    4.89      25.72     25.72     0         0         25.72     3.7236    3.7236    1
2050      0.07926   4.878     33.31     33.31     0         0         33.31     4.1199    4.1199    1
2055      0.03941   4.866     16.64     16.64     0         0         16.64     4.3169    4.3169    1
2060      0.05896   4.854     25.02     25.02     0         0         25.02     4.6117    4.6117    1
2065      0.1107    4.843     47.22     47.22     0         0         47.22     5.1654    5.1654    1
2070      0.0512    4.831     21.94     21.94     0         0         21.94     5.4214    5.4214    1
2075      0.05672   4.819     24.42     24.42     0         0         24.42     5.705     5.705     1
VN = T1.Properties.VariableNames;
figure
plot(T1.m, real(T1{:,2:end}), '-')
hold on
% plot(T1.m, imag(T1{:,2:end}), '--')
hold off
grid
xlabel(VN{1})
set(gca, 'YScale','log')
legend(VN{2:end}, 'Location','bestoutside')
For fun (and to get some idea of what is in the file), I plotted it as well.
T1 has 668 rows, but the table in the file has 268 rows.
VT = cellstr(repmat("double", 1, 11)); % 'VariableTypes' Cell Array
opts = fixedWidthImportOptions('NumVariables',11, 'VariableWidths',[8 11 9 10 11 11 11 11 13 13 13], 'DataLines',223, 'VariableTypes',{VT{:}}, 'VariableNamesLine',220, 'VariableUnitsLine',222);
T1 = readtable('file.txt', opts);
size(T1)
ans = 1×2
668 11
T1(264:273,:) % mostly NaNs after row 268
ans = 10×11 table
m         rip       lam       ripl      riapl     riahl     rial      ril       riapi     rii       taua
____      _______   _____     _____     ________  _____     ____      _____     ______    ______    ____
3315      0.07032   3.017     77.28     77.28+0i  0         0         77.28     71.353    71.353    1
3320      0.1078    3.012     118.8     118.8+0i  0         0         118.8     71.892    71.892    1
3325      0.1412    3.008     156.1     156.1+0i  0         0         156.1     72.599    72.599    1
3330      0.08589   3.003     95.24     95.24+0i  0         0         95.24     73.028    73.028    1
3335      0.1272    2.999     141.5     141.5+0i  0         0         141.5     73.664    73.664    1
NaN       NaN       NaN       NaN       0+1i      NaN       NaN       NaN       NaN       NaN       NaN
NaN       NaN       NaN       NaN       NaN+0i    NaN       NaN       NaN       NaN       NaN       NaN
NaN       NaN       NaN       NaN       NaN+0i    NaN       NaN       NaN       NaN       NaN       NaN
NaN       NaN       NaN       NaN       NaN+0i    NaN       NaN       NaN       NaN       NaN       NaN
NaN       NaN       NaN       NaN       NaN+0i    NaN       NaN       NaN       NaN       NaN       NaN
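Because 'DataLines',223 is a single start line, readtable reads from line 223 to the end of the file, which is where the extra NaN rows come from. A possible fix, sketched below under assumptions, is to compute both the first and the last data line from the file and pass them as a [first last] pair. The marker text '$ output $', the four-line offset from the marker to the first data row and the 11 columns are hypothetical and should be checked against file.txt.

```matlab
% Sketch: find [first last] data-line numbers for 'DataLines'.
% Assumptions (hypothetical): marker '$ output $'; names, dashes and units
% occupy the 3 lines after it; the table has 11 numeric columns.

% --- demo file standing in for file.txt ---
fname = [tempname() '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'preamble\n$$$ output $$$\nA B\n-- --\ns m\n');
for t = 2000:5:2010
    fprintf(fid, '%.1f 1 2 3 4 5 6 7 8 9 10\n', t);   % 11 numbers per row
end
fprintf(fid, 'trailing text\n');
fclose(fid);

% --- read all lines as text ---
txtLines = {};
fid = fopen(fname, 'r');
tline = fgetl(fid);
while ischar(tline)                          % fgetl returns -1 at end of file
    txtLines{end+1} = tline; %#ok<AGROW>
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);

% --- locate the table ---
nCols = 11;
markerLine = find(~cellfun(@isempty, strfind(txtLines, '$ output $')), 1);
firstData = markerLine + 4;                  % marker + names + dashes + units
lastData = firstData - 1;
while lastData < numel(txtLines) && numel(sscanf(txtLines{lastData+1}, '%f')) == nCols
    lastData = lastData + 1;                 % extend while rows still parse
end
dataLines = [firstData lastData];
```

With the real file, setting opts.DataLines = dataLines; before calling readtable('file.txt', opts) should then limit T1 to the table rows, since DataLines accepts a [first last] pair as well as a single start line.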
This works, thank you!
Voss on 14 Dec 2023
@Matthew: It's not a problem for you that the table T1 contains 400 extra rows relative to the table in the file? See my answer for a method that produces a table without any extra rows.
@Matthew — As always, my pleasure!


More Answers (2)

Hello, you could do this:
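out = readcell('Doc1.txt');                  % read the file into a cell array
eof = size(out,1);                           % number of rows read
ind1 = find(contains(out,'output'));         % row of the '$$$ output $$$' marker
% extract valid portion of cell array (after the marker, dropping the last row)
out = out(ind1+1:eof-1,:)
header = (split(out(1,:)))'                  % column names
units = (split(out(3,:)))'                   % units (blank units are skipped)
values = str2num(char(out(4:end,:)))         % numeric rows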
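A possible variant of this approach reads the file as plain text, so every line stays a char vector and the text search cannot run into a non-text cell (readcell can return numbers or missing values for some lines, which is a likely cause of a "First argument must be text" error from contains). This is a sketch under assumptions: the marker line contains 'output', the names, dashes and units lines follow it, and the table ends at the first line that does not parse as a full numeric row.

```matlab
% Sketch: text-only variant of the readcell approach.
% Assumptions (hypothetical): marker contains 'output'; names, dashes and
% units are the 3 non-blank lines after it; then numeric rows follow.

% --- demo file standing in for Doc1.txt ---
fname = [tempname() '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'free text\n$$$ output $$$\nA B C\n--- --- ---\ns m kg\n');
fprintf(fid, '2000.0 7.744E-02 5.000\n2005.0 4.646E-02 4.988\nmore text\n');
fclose(fid);

txt = fileread(fname);
delete(fname);
out = regexp(txt, '\r?\n', 'split');              % one char vector per line
out = out(~cellfun(@isempty, strtrim(out)));     % drop blank lines
ind1 = find(~cellfun(@isempty, strfind(out, 'output')), 1);   % marker line
header = strsplit(strtrim(out{ind1+1}));         % column names
units  = strsplit(strtrim(out{ind1+3}));         % non-blank units only
parsed = cellfun(@(s) sscanf(s, '%f').', out(ind1+4:end).', 'UniformOutput', false);
nCols = numel(header);
last = find(cellfun(@numel, parsed) ~= nCols, 1) - 1;   % stop before first non-table line
if isempty(last)
    last = numel(parsed);                        % table runs to the end of the file
end
values = cell2mat(parsed(1:last));               % numeric table, one row per line
```

Unlike the eof-1 cut in the original, this version stops wherever the table ends, so free text after the table does not need to be a single line.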

7 comments

This is the result in my command window, based on the attached txt file (copied and pasted from your post):
out =
10×1 cell array
{'A B C D E F G H I J K' }
{'------ --------- ------ ------------------------------------------------------------------------------ ---------------------------------- --------------' }
{'s m kg A K mol'}
{'2000.0 7.744E-02 5.000 3.098E+01 3.098E+01 0.000E+00 0.000E+00 3.098E+01 3.87194E-01 3.87194E-01 1.000E+00' }
{'2005.0 4.646E-02 4.988 1.868E+01 1.868E+01 0.000E+00 0.000E+00 1.868E+01 6.19481E-01 6.19481E-01 1.000E+00' }
{'2010.0 6.827E-02 4.975 2.758E+01 2.758E+01 0.000E+00 0.000E+00 2.758E+01 9.60817E-01 9.60817E-01 1.000E+00' }
{'2015.0 1.038E-01 4.963 4.213E+01 4.213E+01 0.000E+00 0.000E+00 4.213E+01 1.47961E+00 1.47961E+00 1.000E+00' }
{'2020.0 9.099E-02 4.950 3.713E+01 3.713E+01 0.000E+00 0.000E+00 3.713E+01 1.93456E+00 1.93456E+00 1.000E+00' }
{'2025.0 9.283E-02 4.938 3.806E+01 3.806E+01 0.000E+00 0.000E+00 3.806E+01 2.39869E+00 2.39869E+00 1.000E+00' }
{'2030.0 5.814E-02 4.926 2.396E+01 2.396E+01 0.000E+00 0.000E+00 2.396E+01 2.68937E+00 2.68937E+00 1.000E+00' }
header =
1×11 cell array
Columns 1 through 10
{'A'} {'B'} {'C'} {'D'} {'E'} {'F'} {'G'} {'H'} {'I'} {'J'}
Column 11
{'K'}
units =
1×6 cell array
{'s'} {'m'} {'kg'} {'A'} {'K'} {'mol'}
values =
1.0e+03 *
Columns 1 through 9
2.0000 0.0001 0.0050 0.0310 0.0310 0 0 0.0310 0.0004
2.0050 0.0000 0.0050 0.0187 0.0187 0 0 0.0187 0.0006
2.0100 0.0001 0.0050 0.0276 0.0276 0 0 0.0276 0.0010
2.0150 0.0001 0.0050 0.0421 0.0421 0 0 0.0421 0.0015
2.0200 0.0001 0.0050 0.0371 0.0371 0 0 0.0371 0.0019
2.0250 0.0001 0.0049 0.0381 0.0381 0 0 0.0381 0.0024
2.0300 0.0001 0.0049 0.0240 0.0240 0 0 0.0240 0.0027
Columns 10 through 11
0.0004 0.0010
0.0006 0.0010
0.0010 0.0010
0.0015 0.0010
0.0019 0.0010
0.0024 0.0010
0.0027 0.0010
Someone very clever may find a readtable-based solution, but you may have to play with a lot of parameters to get it to work. I preferred to load everything into a cell array and then do a bit of simple post-processing, but that's a personal opinion.
Mathieu,
When I try to run this, I get "Error using contains; First argument must be text." Do you have any advice?
Thanks.
Hello Matthew,
maybe I should work with your data file rather than something I generated myself.
Can you share one file?
Hello Mathieu,
I have uploaded my file. I am looking to extract this table data, located about midway through the txt file.
Can you help?
Hello again,
I have a one-line solution for you (with the help of the attached function, of course):
[outdata,head] = readclm('file.txt',11,219);
outdata 268x11 double
head = 4×119 char array
' m rip lam ripl riapl riahl rial ril riapi rii taua '
' ------ --------- ------ ----------------------------------------------------- ------------------------ ---------'
' m m/J/m A m/J/A m/J t '
' '
If you want to access the individual labels and units from head, you can do this:
header = (split(strtrim(head(1,:))))'
units = (split(strtrim(head(3,:))))'
header = 1×11 cell array
{'m'} {'rip'} {'lam'} {'ripl'} {'riapl'} {'riahl'} {'rial'} {'ril'} {'riapi'} {'rii'} {'taua'}
units = 1×6 cell array
{'m'} {'m/J/m'} {'A'} {'m/J/A'} {'m/J'} {'t'}


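filename = 'file.txt';
fid = fopen(filename,'r');
data = fread(fid,[1 Inf],'*char');            % whole file as one char row vector
fclose(fid);
idx = strfind(data,'$ output $');              % character offsets of the marker
idx = idx(1);                                  % keep the first occurrence
n = 1 + nnz(data(1:idx) == newline());         % line number of the marker line
T = readtable(filename,'NumHeaderLines',n+2);  % skip lines 1 to n+2 (through 2 lines past the marker)
T(any(isnan(T{:,:}),2),:) = [];                % remove rows containing any NaN (non-table lines)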
head(T)
m         rip       lam       ripl      riapl     riahl     rial      ril       riapi     rii       taua
____      _______   _____     _____     _____     _____     ____      _____     _______   _______   ____
2000      0.07744   5         30.98     30.98     0         0         30.98     0.38719   0.38719   1
2005      0.04646   4.988     18.68     18.68     0         0         18.68     0.61948   0.61948   1
2010      0.06827   4.975     27.58     27.58     0         0         27.58     0.96082   0.96082   1
2015      0.1038    4.963     42.13     42.13     0         0         42.13     1.4796    1.4796    1
2020      0.09099   4.95      37.13     37.13     0         0         37.13     1.9346    1.9346    1
2025      0.09283   4.938     38.06     38.06     0         0         38.06     2.3987    2.3987    1
2030      0.05814   4.926     23.96     23.96     0         0         23.96     2.6894    2.6894    1
2035      0.04384   4.914     18.15     18.15     0         0         18.15     2.9085    2.9085    1
tail(T)
m         rip       lam       ripl      riapl     riahl     rial      ril       riapi     rii       taua
____      _______   _____     _____     _____     _____     ____      _____     ______    ______    ____
3300      0.1136    3.03      123.7     123.7     0         0         123.7     70.035    70.035    1
3305      0.07579   3.026     82.79     82.79     0         0         82.79     70.414    70.414    1
3310      0.1176    3.021     128.9     128.9     0         0         128.9     71.002    71.002    1
3315      0.07032   3.017     77.28     77.28     0         0         77.28     71.353    71.353    1
3320      0.1078    3.012     118.8     118.8     0         0         118.8     71.892    71.892    1
3325      0.1412    3.008     156.1     156.1     0         0         156.1     72.599    72.599    1
3330      0.08589   3.003     95.24     95.24     0         0         95.24     73.028    73.028    1
3335      0.1272    2.999     141.5     141.5     0         0         141.5     73.664    73.664    1
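The key step in this answer is turning the character offset of the marker into a line number: a character lies on line 1 plus the number of newline characters before it. A small check of that arithmetic on a made-up string (hypothetical demo text; char(10) stands in for newline() so the snippet does not depend on that function):

```matlab
% Sketch: character offset -> line number, as used in the answer above.
% The demo text is hypothetical; char(10) is the newline character.
data = sprintf('first line\nsecond line\n$$$ output $$$\nm rip lam\n');
idx = strfind(data, '$ output $');
idx = idx(1);                              % first occurrence of the marker
n = 1 + nnz(data(1:idx) == char(10));      % line number of the marker line
lineText = regexp(data, '\n', 'split');    % split only to check the result
markerText = lineText{n};                  % should be the marker line itself
```

Knowing n, the remaining choice is how many lines past the marker to skip, which depends on the exact header layout of the real file.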

Version: R2023b
