Import data from multiple .dat files, remove headerlines, and read columns into array - but the number of headerlines differs across each .dat file

I have a large number of .dat files in a folder sorted under names in the format "author_energy_radiationtype_cellline", and I am using the "dir" command to select the files that apply to particular energies, cell lines etc. Each .dat file has between 2 and 7 headerlines I want to skip. The files have the following format.
"SF(Dose (Gy)) created by Plot Digitizer 2.6.8"
"Date: 1/17/19 8:41:06 AM"
author year mod cell_line energy let
**** 2008 protons *** 6MV -$\mu$m
alpha alpha_err beta beta_err alpha_X alpha_X_err beta_X beta_X_err
0.291 0.000 0.041 0.000 0.291 0.000 0.041 0.000
Dose (Gy) SF Error
0.8312 0.7674 0.9121
1.8470 0.4560 0.5615
2.8600 0.2924 0.3457
4.8985 0.0761 0.1244
6.9425 0.0218 0.0344
I want to read the data under the 'Dose', 'SF', and 'Error' columns into arrays, and I also need to extract the first and third values in the 6th row of this example (the numeric row below the 'alpha' header line). Is there any way to do this when the number of headerlines changes from file to file?
This is my code so far. I can pick out the files with certain energies, etc. I can't seem to figure out how to actually extract the data in the way I described above.
% Specify the folder where files are located
myFolder = 'C:\Users\..\Desktop\CellSurvivalData';
% Check to make sure that folder actually exists. Warn user if it doesn't.
if ~isdir(myFolder)
    errorMessage = sprintf('Error: The following folder does not exist:\n%s', myFolder);
    uiwait(warndlg(errorMessage));
    return;
end
% Get a list of all files in the folder with the desired file name pattern.
filePattern = fullfile(myFolder, '*235MeV*HSG*'); % Define desired parameters
theFiles = dir(filePattern); % List the files which satisfy these parameters
for k = 1 : length(theFiles)
    baseFileName = theFiles(k).name;
    fullFileName = fullfile(myFolder, baseFileName);
    fprintf(1, 'Now reading %s\n', fullFileName);
    % Read the data from each file
    file{k} = readtable( fullFileName );
end
  1 Comment
Stephen23 on 13 Nov 2019
"Is there any way to do this when the number of headerlines changes from file to file?"
Probably.
As long as you can provide rules that identify the lines that you want, it is likely possible to read them in regardless of the other lines. The typical approach would be fopen, fgets with checks on the line content, then textscan for the table of data, and finally fclose.
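The approach described above could look roughly like the following. This is only a minimal sketch: the function name readDoseTable is hypothetical, it uses fgetl rather than fgets so that the newline is stripped before the check, and it assumes the table header is the only line that starts with 'Dose' and that the table has exactly three numeric columns.

```matlab
function data = readDoseTable(fileName)
% READDOSETABLE  Hypothetical helper: return the numeric block below the
% line starting with 'Dose', however many header lines precede it.
fid = fopen(fileName, 'r');
if fid == -1
    error('Could not open %s', fileName);
end
closeFile = onCleanup(@() fclose(fid));   % close the file even if an error occurs

found = false;
while ~found
    line = fgetl(fid);                    % numeric -1 at end of file
    if ~ischar(line)
        error('No line starting with ''Dose'' in %s', fileName);
    end
    found = strncmp(line, 'Dose', 4);     % rule that identifies the table header
end

% textscan continues from the current file position (the first data row)
cols = textscan(fid, '%f %f %f', 'CollectOutput', true);
data = cols{1};                           % N-by-3: dose, SF, error
end
```

Inside the loop over theFiles this would be called as file{k} = readDoseTable(fullFileName);.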


Accepted Answer

Turlough Hughes on 13 Nov 2019
If 'Dose' appears as the first four elements of this line only, and similarly 'alpha' as the first five of the other line, you could do the following:
for k = 1 : length(theFiles)
    baseFileName = theFiles(k).name;
    fullFileName = fullfile(myFolder, baseFileName);
    fprintf(1, 'Now reading %s\n', fullFileName);
    % Read the data from each file
    fid = fopen(fullFileName);
    c = 0; store = false;
    while true
        l = fgetl(fid);
        if l == -1 % Break loop at end of file
            break
        end
        % Read Alpha and Beta (indexing assumes every line has at least 5 characters)
        if strcmp(l(1:5),'alpha') % If first 5 characters of line are alpha
            l = fgetl(fid); % get next line
            first_data = str2num(l);
            alpha_beta = [first_data(1) first_data(3)]; % store alpha and beta
        end
        % Read matrix of data below 'Dose'
        store = strcmp(l(1:4),'Dose');
        while store
            l = fgetl(fid);
            c = c+1;
            if l == -1 % Exit and set store back to false
                store = false;
                break
            end
            data(c,:) = str2num(l);
        end
    end
    fclose(fid); % release the file handle before moving to the next file
    file{k} = data;
    clear data
end
  2 Comments
Stephen23 on 14 Nov 2019
Rather than
    while true
        l = fgetl(fid);
        if l == -1 % Break loop
            break
        end
simply
    while ~feof(fid)
Use strncmp rather than indexing and strcmp.
Reading the matrix with textscan would be more efficient than doing it line-by-line.
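Taken together, these three suggestions might be applied along the following lines. This is a sketch rather than a tested replacement for the answer's loop: the function name readSurvivalFile is hypothetical, sscanf stands in for str2num, and it assumes a numeric row directly follows the 'alpha' line and that the table has three numeric columns.

```matlab
function [alpha_beta, data] = readSurvivalFile(fullFileName)
% Hypothetical helper: feof, strncmp and textscan applied to one file.
alpha_beta = [];
data = [];
fid = fopen(fullFileName, 'r');
if fid == -1
    error('Could not open %s', fullFileName);
end
while ~feof(fid)
    l = fgetl(fid);
    if strncmp(l, 'alpha', 5)             % no indexing, so short or empty lines are safe
        vals = sscanf(fgetl(fid), '%f');  % numeric row below the alpha header
        alpha_beta = [vals(1) vals(3)];   % first and third values
    elseif strncmp(l, 'Dose', 4)
        % Read the whole numeric block in one call instead of line by line
        c = textscan(fid, '%f %f %f', 'CollectOutput', true);
        data = c{1};
    end
end
fclose(fid);
end
```

In the original loop the call would be [alpha_beta, file{k}] = readSurvivalFile(fullFileName);, which also removes the need for the counter c and for clear data.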

Version

R2019a
