MATLAB Answers

Using Read Table with a predefined Table size

2 views (last 30 days)
Lorcan Conlon
Lorcan Conlon on 10 Jan 2020
Edited: Stephen Cobeldick on 12 Jan 2020
Hi there,
I am trying to read data into a table using MATLAB. However the number of columns is not consistent in each row, i.e. some rows have 9 elements whereas others have 11 in this first example:
617803209 150000 C 51 9 0011000000000000 00000000 -95377380.68704939 -7906875.675565993
617803209 450000 C 51 9 0011000000000000 00000000 -95528144.22045627 -8107893.719792397
617803209 550000 C 51 9 1111000000000000 00000000 -95578398.73087561 -8174899.733608363 780 771
617803209 650000 C 51 9 0011000000000000 00000000 -95628653.24111354 -8241905.747220647
617803210 150000 C 51 9 0011000000000000 00000000 -95879925.78519529 -8576935.8062835
or in the second data set I want to read most of the rows have 16 elements but there are groups of rows with 19 elements.
617803209 0 C 19 0 0000111111111000 00000000 -1697651.517474127 -1697651.422455324 -1862428.491510178 679 412 544 15 16 17
617803209 0 C 22 0 0000111111111000 00000000 -411866.3831508817 -411866.2891460223 -576312.996122811 80 21 16 27 28 29
617803210 0 C 3 0 0000111111111111 00000000 22999470.13951635 22999469.86291851 22999471.60374566 -1075717.26276794 -1075717.167502564 -1240317.743928786 208 35 96 3 4 5-578717.5758491368 80 21 16 27 28 29
617803210 0 C 23 0 0000111111111111 00000000 21118070.54289924 21118069.80844691 21118067.11885317 -3174781.317712049 -3174781.22065765 -3339359.833580616 589 202 323 9 10 11
617803211 0 C 2 0 0000111111111000 00000000 -1897991.062354102 -1897990.966750627 -2062516.821723463 216 41 92 18 19 20
617803211 0 C 3 0 0000111111111000 00000000 -1078335.18024887 -1078335.090744338 -1242935.661877398 208 35 96 3 4 5
The problem is that when I use read table on this data it creates a table which has always 16/9 elements per row, i.e. it always picks the lower number of elements per row. When MATLAB then reads in rows with the higher number of elements it creates two rows, one with the first 16/9 elements and a second row with the remaining 2/3 elements of the previous row. The rest of the row is filled out with NAN, e.g.:
617803220 0 'C' 2 0 111111111111.000 0 22859262.8404678 22859262.2050950 22859259.3977525 -1929159.97480669 -1929159.87880769 -2093685.71832130 221 42 97
18 19 '20' NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
I need some way to avoid this. So I was wondering if it possible that I specify to MATLAB beforehand that I want to force the table to have 11/19 elements per row, so that when MATLAB comes across the rows with less elements than this it simply fills them in with zeros. Is something like this possible?
Many Thanks,
Lorcan
p.s. it is not possible for me to go through and do this manually as the data sets are very very large.

  0 Comments

Sign in to comment.

Accepted Answer

Meg Noah
Meg Noah on 12 Jan 2020
Processing difficult text files - here the text file is space delimited with one or more spaces, but has a variable number of columns. The processing first creates a csv file with nans for missing columns which are presumed at the end of the line. This process does not work for data that have commas in the text file.
% first reformat the data as a csv file
filename = 'challengingData.txt';
str = fileread(filename);
lines = regexp(str, '\r\n|\r|\n', 'split');
lines = lines';
maxCommas = 0;
for iline = 1:length(lines)
lines{iline} = strrep(regexprep(lines{iline},'\s+',' '),' ',',');
if ( length(strfind(lines{iline},',')) > maxCommas)
maxCommas = length(strfind(lines{iline},','));
end
end
for iline = 1:length(lines)
if ( length(strfind(lines{iline},',')) < maxCols)
nAdd = maxCols - length(strfind(lines{iline},','));
lines{iline} = horzcat(lines{iline},repmat(',nan',1,nAdd));
end
end
% output as a csv file
fid = fopen(strrep(filename,'.txt','.csv'),'w');
fprintf(fid,'%s\n',lines{:});
fclose(fid);
myTable = readtable(strrep(filename,'.txt','.csv'));

  1 Comment

Sign in to comment.

More Answers (1)

Stephen Cobeldick
Stephen Cobeldick on 12 Jan 2020
Edited: Stephen Cobeldick on 12 Jan 2020
Accessing files via the OS is slow, note that creating an intermediate file requires accessing a file minimum three times. A simpler and more efficient solution is to only import the file just once, adjust the columns in MATLAB memory, and then use cell2table or similar to convert to table.
For example:
>> str = fileread('test1.txt');
>> spl = regexp(str,'[\r\n]+','split');
>> spl = regexp(spl,'\s+','split');
>> mxl = max(cellfun('length',spl'));
>> spl = cellfun(@(c)[c,repmat({'0'},1,mxl-numel(c))],spl,'uni',0);
>> spl = vertcat(spl{:});
>> idx = [1:2,4:5,8:mxl]; % optional: convert to numeric
>> spl(:,idx) = num2cell(str2double(spl(:,idx)));
>> tbl = cell2table(spl)
tbl =
spl1 spl2 spl3 spl4 spl5 spl6 spl7 spl8 spl9 spl10 spl11
_________ _______ ____ ____ ____ __________________ __________ ___________ ___________ _____ _____
6.178e+08 1.5e+05 'C' 51 9 '0011000000000000' '00000000' -9.5377e+07 -7.9069e+06 0 0
6.178e+08 4.5e+05 'C' 51 9 '0011000000000000' '00000000' -9.5528e+07 -8.1079e+06 0 0
6.178e+08 5.5e+05 'C' 51 9 '1111000000000000' '00000000' -9.5578e+07 -8.1749e+06 780 771
6.178e+08 6.5e+05 'C' 51 9 '0011000000000000' '00000000' -9.5629e+07 -8.2419e+06 0 0
6.178e+08 1.5e+05 'C' 51 9 '0011000000000000' '00000000' -9.588e+07 -8.5769e+06 0 0
And the output of your other sample file (note that '5-578717.5758491368' is converted to NaN):
tbl =
spl1 spl2 spl3 spl4 spl5 spl6 spl7 spl8 spl9 spl10 spl11 spl12 spl13 spl14 spl15 spl16 spl17 spl18 spl19 spl20 spl21 spl22 spl23 spl24 spl25
_________ ____ ____ ____ ____ __________________ __________ ___________ ___________ ___________ ___________ ___________ ___________ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____ _____
6.178e+08 0 'C' 19 0 '0000111111111000' '00000000' -1.6977e+06 -1.6977e+06 -1.8624e+06 679 412 544 15 16 17 0 0 0 0 0 0 0 0 0
6.178e+08 0 'C' 22 0 '0000111111111000' '00000000' -4.1187e+05 -4.1187e+05 -5.7631e+05 80 21 16 27 28 29 0 0 0 0 0 0 0 0 0
6.178e+08 0 'C' 3 0 '0000111111111111' '00000000' 2.2999e+07 2.2999e+07 2.2999e+07 -1.0757e+06 -1.0757e+06 -1.2403e+06 208 35 96 3 4 NaN 80 21 16 27 28 29
6.178e+08 0 'C' 23 0 '0000111111111111' '00000000' 2.1118e+07 2.1118e+07 2.1118e+07 -3.1748e+06 -3.1748e+06 -3.3394e+06 589 202 323 9 10 11 0 0 0 0 0 0
6.178e+08 0 'C' 2 0 '0000111111111000' '00000000' -1.898e+06 -1.898e+06 -2.0625e+06 216 41 92 18 19 20 0 0 0 0 0 0 0 0 0
6.178e+08 0 'C' 3 0 '0000111111111000' '00000000' -1.0783e+06 -1.0783e+06 -1.2429e+06 208 35 96 3 4 5 0 0 0 0 0 0 0 0 0

  0 Comments

Sign in to comment.

Sign in to answer this question.

Tags


Translated by