How to read variables from multiple '.tab' files?

Tez on 18 Jul 2017
Commented: Yuanzheng Wen on 18 Jun 2021
I have a sequence of data files ('.tab' files), each with 11900 rows and 236 columns. I need to read some of the variables from each file. To do this I opened some of the files from the folder, but I cannot read the variables. The variable columns contain both NaN and numerical values, yet only NaN values are shown instead of the numerical values.
clear all;
clc;
files=dir(fullfile('C:\Users\Documents\2015\02\*.tab'));
for i=1:2
fid(i)=fopen(files(i).name);
files(i).values=textscan(fid(i), '%s','delimiter','','HeaderLines',296,'MultipleDelimsAsOne',1);
formatSpec = '%19s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%16s%s%[^\n\r]';
dataArray = textscan(files(i).name,formatSpec, 'Delimiter', '', 'WhiteSpace', '', 'ReturnOnError', false);
raw = repmat({''},length(dataArray{1}),length(dataArray)-1);
for k=1:11900;l=1:236;
raw(k,l) = raw(length(dataArray{1}),length(dataArray)-1);
n(k,l) = str2double(raw(:, 2));
h(k,l) = str2double(raw(:, 197));
end
fclose('all');
end
How can I read multiple files, and the variables in each file, in MATLAB? I am attaching one of the files here.
  3 comments
Stephen23 on 18 Jul 2017
Edited: Stephen23 on 18 Jul 2017
@Tez: please edit your question and upload a sample file by clicking the paperclip button.
Based on such a vague description the best advice you could expect is something like "try the file import tools, e.g. dlmread or importdata, reading their help carefully". Once you give us a sample file then we can test it ourselves.
You should also read these:
Tez on 18 Jul 2017
I tried to read the data using the import tool, but all of the data, including the header lines, is read as a single variable in a single cell instead of 236 columns.


Answers (2)

Stephen23 on 18 Jul 2017
Edited: Stephen23 on 24 Oct 2019
This code automatically locates the header lines by checking for the # character at the start of each line. The variable mywant specifies which columns you want from the matrix; all other numeric columns are ignored and are not read into MATLAB memory. The code also ignores all string columns, although you could easily extend it to import them too.
The variable mywant lets you request any numeric column or columns (e.g. cX, cY, etc.) by entering between one and six rows of that column's header text, starting from the top (i.e. row1, row2, etc.):
mywant = {{cXrow1text,cXrow2text,...}, {cYrow1text,cYrow2text,...}, ...}
If you are going to call this in a loop I would strongly suggest that you put this code into a function and call the function in the loop.
mywant = {{'Electron','Density',''},{'Spacecraft','Altitude','Aeroid'}};
mypath = '';
myname = 'mvn_kp_insitu_20150202_v10_r01.tab';
%myname = 'mvn_kp_insitu_20150512_v15_r01.tab';
myfull = fullfile(mypath,myname);
%
[fid,msg] = fopen(myfull,'rt');
assert(fid>=3,msg);
%
% Read lines until last '#':
vec = NaN(1,8);
str = '#';
while ~feof(fid) && strncmp(str,'#',1)
vec([2:end,1]) = vec;
vec(1) = ftell(fid);
str = fgetl(fid);
end
fseek(fid,vec(end),'bof');
str = fgetl(fid);
soh = ftell(fid); % start of header
vec = regexp(str,'\d+','end');
num = numel(vec); % number of columns
vec = diff([0,vec]); % column widths
%
% Read header lines:
fmt = sprintf('%%%dc',vec); % fixed-width columns
hdr = textscan(fid,fmt,6,'Whitespace','');
hdr = cellfun(@cellstr,hdr,'uni',0);
hdr = cellfun(@strtrim,hdr,'uni',0);
%
% Locate the requested headers:
fun = @(h)any(cellfun(@(w)all(strcmp(w(:),h(1:numel(w)))),mywant));
idx = cellfun(fun,hdr);
% Identify any string columns:
opt = {'MultipleDelimsAsOne',true, 'CollectOutput',true, 'HeaderLines',6};
fmt = repmat('%s',1,num);
fseek(fid,soh,'bof');
dat = textscan(fid,fmt,1,opt{:});
dat = dat{1};
ids = isnan(str2double(dat)) & ~strcmpi('NaN',dat);
%
% Generate format string:
fmt = repmat({'%*f'},1,num);
fmt(idx) = {'%f'}; % requested columns
fmt(ids) = {'%*s'}; % string columns all ignored, but this could be changed...
fmt = horzcat(fmt{:});
% Read requested data:
fseek(fid,soh,'bof');
dat = textscan(fid,fmt,opt{:});
dat = dat{1};
%
fclose(fid);
It produces this output:
>> size(dat)
ans =
4359 2
>> dat
dat =
NaN 222.850
196.000 224.600
186.000 226.370
170.000 228.170
142.000 229.990
99.300 231.830
58.100 233.700
102.000 235.580
103.000 237.490
162.000 239.420
101.000 241.380
148.000 243.350
185.000 245.350
151.000 247.370
178.000 249.420
118.000 251.480
128.000 253.570
145.000 255.670
156.000 257.800
166.000 259.950
171.000 262.130
173.000 264.320
148.000 266.540
166.000 268.780
171.000 271.030
190.000 273.310
134.000 275.610
162.000 277.940
105.000 280.280
113.000 282.640
109.000 285.030
etc
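To illustrate how the column selection works, here is a minimal sketch that applies the same matching expression to a toy header. The three header columns below are invented for the illustration (they are not taken from the data file), and only three header rows are used instead of six.

```matlab
% Toy header: one cell per column, each holding that column's header rows.
% (Invented for illustration only, not copied from the data file.)
hdr = { {'Time';'(UTC)';''}, ...
        {'Electron';'Density';''}, ...
        {'Spacecraft';'Altitude';'Aeroid'} };
mywant = {{'Electron','Density',''},{'Spacecraft','Altitude','Aeroid'}};

% A column matches when its leading header rows equal every row of one request.
fun = @(h)any(cellfun(@(w)all(strcmp(w(:),h(1:numel(w)))),mywant));
idx = cellfun(fun,hdr);            % idx is [false true true]

% Skip every column by default, then read only the requested ones.
fmt = repmat({'%*f'},1,numel(hdr));
fmt(idx) = {'%f'};
fmt = horzcat(fmt{:});             % fmt is '%*f%f%f'

disp(idx)
disp(fmt)
```

Because each request is compared only against the first numel(w) header rows, giving fewer rows makes a request less specific, so it may match more than one column.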
  5 comments
Stephen23 on 18 Jun 2021
"How can I rewrite your code to read a sequence of the tab files at a time instead of one at a time? "
Run the code inside a loop, just as the documentation shows:
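A minimal sketch of such a loop, assuming the single-file code has been wrapped in a function. The handle readOne is a hypothetical stand-in for that function (here it only reads two whitespace-separated numeric columns and skips lines starting with #), and the temporary folder and demo_*.tab files exist only so that the sketch runs on its own.

```matlab
% Hypothetical stand-in for a function that reads one file.
readOne = @(f) cell2mat(textscan(fileread(f),'%f %f','CommentStyle','#'));

% Demo setup (assumption): two small files in a temporary folder
% stand in for the real '.tab' files.
mypath = tempname;
mkdir(mypath);
for k = 1:2
    fid = fopen(fullfile(mypath,sprintf('demo_%d.tab',k)),'wt');
    fprintf(fid,'# header line\n');
    fprintf(fid,'%g %g\n',[NaN,222.85; 196,224.6].');
    fclose(fid);
end

% Loop over all matching files and keep each result in its own cell,
% so that one file's data does not overwrite another's.
S = dir(fullfile(mypath,'*.tab'));
C = cell(1,numel(S));
for k = 1:numel(S)
    C{k} = readOne(fullfile(mypath,S(k).name));
end

rmdir(mypath,'s');   % remove the demo folder again
```

Storing the results in a preallocated cell array keeps the data from each file separate, even when the files have different numbers of rows.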
Yuanzheng Wen on 18 Jun 2021
Thanks for your reply, I will try it!



Jan on 18 Jul 2017
Notes: There is no need for fullfile if you have only one argument. Storing the file IDs in a vector with fid(i) is not useful if you close all files with fclose('all') in each iteration. It is better to use fid=fopen(...) and fclose(fid).
fid(i)=fopen(files(i).name) does not consider the folder. Better:
fid(i)=fopen(fullfile(files(i).folder, files(i).name))
You import the file at first by textscan(fid(i)) and then again with textscan(files(i).name). Why do you do this? I'd expect this to fail.
I do not understand the purpose of
raw(k,l) = raw(length(dataArray{1}),length(dataArray)-1);
n(k,l) = str2double(raw(:, 2));
h(k,l) = str2double(raw(:, 197));
All three assignments access the same elements of raw in each iteration. The expression does not depend on k, so the loop is a waste of time. As long as it is not clear what you want to achieve, suggesting a modification would be based on guessing only.
Finally the output raw, n, h is overwritten in the iterations of the for i loop.
  1 comment
Tez on 18 Jul 2017
The folder has 30 files. What would be the code to read the second and 197th columns of each file?
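One possible sketch for a single file, under several assumptions that should be checked against the real data: the fields are separated by whitespace, no field contains blanks, and every header line starts with #. The demo file written below is invented so that the sketch runs on its own. The same reading lines could be placed inside a loop over the output of dir.

```matlab
ncol = 236;          % number of columns per row (from the question)
want = [2,197];      % columns to import

% Skip every field as text, except the requested numeric columns.
fmt = repmat({'%*s'},1,ncol);
fmt(want) = {'%f'};
fmt = [fmt{:}];

% Demo file (assumption): a timestamp followed by 235 numbers per row.
myfile = [tempname,'.tab'];
fid = fopen(myfile,'wt');
fprintf(fid,'# demo header\n');
for r = 1:2
    fprintf(fid,'2015-02-02T00:00:0%d',r);
    fprintf(fid,' %g',r*(1:ncol-1));
    fprintf(fid,'\n');
end
fclose(fid);

% Open, read, and close the file using a single file identifier.
[fid,msg] = fopen(myfile,'rt');
assert(fid>=3,msg);
tmp = textscan(fid,fmt,'CommentStyle','#','CollectOutput',true);
fclose(fid);
dat = tmp{1};        % one row per data line, one column per requested column

delete(myfile);      % remove the demo file again
```

If the real files use fixed-width fields that can contain blanks, this whitespace-based format would not apply, and a fixed-width format would be needed instead.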
