
Remove intermittent text when reading in a table from a .dat file

L. Borealis on 19 Feb 2021
Commented: L. Borealis on 25 Feb 2021
Hi,
I am trying to use readtable to read in a .dat file. The file looks like this, where each block between the ---- lines can contain anywhere from one to very many data rows (the rows that start with "1" here).
# NetMHCIIpan version 4.0
# Input is in PEPTIDE format
# Prediction Mode: EL+BA
# Threshold for Strong binding peptides (%Rank) 2%
# Threshold for Weak binding peptides (%Rank) 10%
# Allele: HLA-DPA10103-DPB10101
--------------------------------------------------------------------------------------------------------------------------------------------
Pos MHC Peptide Of Core Core_Rel Identity Score_EL %Rank_EL Exp_Bind Score_BA Affinity(nM) %Rank_BA BindLevel
--------------------------------------------------------------------------------------------------------------------------------------------
1 HLA-DPA10103-DPB10101 AAAAAAAAAAAAAAA 3 AAAAAAAAA 0.380 Sequence 0.020745 81.44 NA 0.366182 951.24 32.45
--------------------------------------------------------------------------------------------------------------------------------------------
Number of strong binders: 2 Number of weak binders: 0
--------------------------------------------------------------------------------------------------------------------------------------------
# Allele: HLA-DPA10103-DPB10201
--------------------------------------------------------------------------------------------------------------------------------------------
Pos MHC Peptide Of Core Core_Rel Identity Score_EL %Rank_EL Exp_Bind Score_BA Affinity(nM) %Rank_BA BindLevel
--------------------------------------------------------------------------------------------------------------------------------------------
1 HLA-DPA10103-DPB10201 BBBBBBBBBBBBBBBB 2 BBBBBBBBB 0.960 Sequence 0.491911 1.02 NA 0.712020 22.55 0.27 <=SB
--------------------------------------------------------------------------------------------------------------------------------------------
Number of strong binders: 2 Number of weak binders: 0
--------------------------------------------------------------------------------------------------------------------------------------------
# Allele: HLA-DPA10103-DPB10202
--------------------------------------------------------------------------------------------------------------------------------------------
Pos MHC Peptide Of Core Core_Rel Identity Score_EL %Rank_EL Exp_Bind Score_BA Affinity(nM) %Rank_BA BindLevel
--------------------------------------------------------------------------------------------------------------------------------------------
1[.......]
Further rows in a block would then start with 2, 3, 4, [...]. I successfully use
opts = detectImportOptions('filename.dat');
opts.DataLines = [16 Inf];
opts.VariableNamesLine = 14;
readtable(fullfile('path','filename.dat'), opts, 'ReadVariableNames', true);
for files with a large number of rows between the ---- lines, e.g.
# Allele: HLA-DPA10103-DPB10101
--------------------------------------------------------------------------------------------------------------------------------------------
Pos MHC Peptide Of Core Core_Rel Identity Score_EL %Rank_EL Exp_Bind Score_BA Affinity(nM) %Rank_BA BindLevel
--------------------------------------------------------------------------------------------------------------------------------------------
1 HLA-DPA10103-DPB10101 AAAAAAAAAAAAAAA 3 AAAAAAAAA 0.380 Sequence 0.020745 81.44 NA 0.366182 951.24 32.45
2 HLA-....
3 ....
....
....
50 HLA....
--------------------------------------------------------------------------------------------------------------------------------------------
Number of strong binders: 2 Number of weak binders: 0
--------------------------------------------------------------------------------------------------------------------------------------------
However, this does not work when a block contains only a few rows, and my code very much depends on being robust in either scenario.
I tried playing with the opts but did not get it to work. I would be very grateful for any advice! Maybe a method other than readtable (readtext?) is needed and then a conversion to a table? In the end I will need a table like this:
Thank you very much for your advice! I have spent a long time developing the code around this and this is the final part that keeps breaking...

Accepted Answer

Vimal Rathod on 22 Feb 2021
1 Comment
L. Borealis on 25 Feb 2021
Thanks, Vimal!
I had actually seen that question but even the question description was not particularly clear to me. So I had left it. Thanks for pointing me back to it. I did use part of it in the end to come up with a working (yet not elegant) solution. Maybe it is useful to someone in the future:
% Read the whole file and split it into lines
S = regexp(fileread('S:\scratch\cdr1pool2pep1\out_00.dat'), '\r?\n', 'split');
S = S(~cellfun('isempty',S)); % drop empty lines, including the trailing one left when the file ends in \n
% Data rows are the lines that do NOT start with #, -, P ("Pos ...") or N ("Number of ...")
nonheader = cellfun(@isempty, regexp(S, '^\s*#|^\s*-|^\s*P|^\s*N' )); %permit space before #
% First and last line index of each run of consecutive data rows
starts = strfind([false nonheader], [false true]);
stops = strfind([nonheader false], [true false]);
num_blocks = length(starts);
lenRows = stops(1) - starts(1) + 1; % assumes every block has as many rows as the first one
S_temp = cell(num_blocks,lenRows);
for K = 1 : num_blocks
S_temp(K,:) = S(starts(K):stops(K));
end
S = reshape(S_temp,[num_blocks*lenRows,1]); % column-major reshape: rows of different blocks end up interleaved
writecell(S,'data.dat') % write only the data rows to a temporary file
opts = detectImportOptions('data.dat');
tbl = readtable('data.dat',opts);
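
For anyone who needs this to work when the blocks do not all have the same number of rows, here is a minimal sketch of a variant that picks out each data row individually and builds the table in memory, without a temporary file. It is only an illustration under some assumptions: data rows are taken to be the lines that start with an integer (the Pos value), fields contain no spaces, BindLevel is the only field that may be missing, and the variable names Rank_EL, Affinity_nM and Rank_BA are my own renamings of %Rank_EL, Affinity(nM) and %Rank_BA to get valid MATLAB identifiers. The sample text is a shortened stand-in for fileread('filename.dat').

```matlab
% Shortened sample standing in for txt = fileread('filename.dat');
% the separator lines are abbreviated here.
txt = strjoin({ ...
    '# Allele: HLA-DPA10103-DPB10101', ...
    '----------', ...
    'Pos MHC Peptide Of Core Core_Rel Identity Score_EL %Rank_EL Exp_Bind Score_BA Affinity(nM) %Rank_BA BindLevel', ...
    '----------', ...
    '1 HLA-DPA10103-DPB10101 AAAAAAAAAAAAAAA 3 AAAAAAAAA 0.380 Sequence 0.020745 81.44 NA 0.366182 951.24 32.45', ...
    '----------', ...
    'Number of strong binders: 2 Number of weak binders: 0', ...
    '----------', ...
    '# Allele: HLA-DPA10103-DPB10201', ...
    '----------', ...
    'Pos MHC Peptide Of Core Core_Rel Identity Score_EL %Rank_EL Exp_Bind Score_BA Affinity(nM) %Rank_BA BindLevel', ...
    '----------', ...
    '1 HLA-DPA10103-DPB10201 BBBBBBBBBBBBBBBB 2 BBBBBBBBB 0.960 Sequence 0.491911 1.02 NA 0.712020 22.55 0.27 <=SB', ...
    '----------'}, newline);

% Split into lines, drop empty ones, trim surrounding whitespace
txtLines = regexp(txt, '\r?\n', 'split');
txtLines = strtrim(txtLines(~cellfun('isempty', txtLines)));

% Assumption: a data row starts with the integer Pos value followed by whitespace
isData = ~cellfun('isempty', regexp(txtLines, '^\d+\s', 'once'));
rows = txtLines(isData);

% Column names (renamed where the file header is not a valid identifier)
names = {'Pos','MHC','Peptide','Of','Core','Core_Rel','Identity','Score_EL', ...
         'Rank_EL','Exp_Bind','Score_BA','Affinity_nM','Rank_BA','BindLevel'};

% Split each row on whitespace; BindLevel stays '' when the row has no such field
C = repmat({''}, numel(rows), numel(names));
for k = 1:numel(rows)
    tok = strsplit(rows{k});
    C(k, 1:numel(tok)) = tok;
end
tbl = cell2table(C, 'VariableNames', names);

% Convert the numeric columns; text such as 'NA' becomes NaN
numCols = {'Pos','Of','Core_Rel','Score_EL','Rank_EL','Exp_Bind', ...
           'Score_BA','Affinity_nM','Rank_BA'};
for k = 1:numel(numCols)
    tbl.(numCols{k}) = str2double(tbl.(numCols{k}));
end
```

Because the rows are selected one by one, the number of rows per block does not matter, and the rows stay in file order. The MHC column still tells you which allele block each row came from.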
