How can I save the beginning and end positions of each sequence in a cell array?
I am looping through codons and writing them to a .txt file. The script runs, but I need each sequence to begin at a start codon position and stop at the next end (stop) codon, then continue through the cell array, recording every following start/end codon pair. What is the best way to tweak my code to do this? Thanks in advance!
fid = fopen("sequence_long2.txt",'r');
C = textscan(fid,'%3s');   % read the sequence three letters (one codon) at a time
x = C{1};                  % cell array of codons
fclose(fid);
% Start sequence (first codon index)
ss = 1;
% End sequence (last codon index)
es = 183479;
seq_id = long_codon(x(ss:es));

function seq = long_codon(v)
    seq = v;
    for pos = 1:length(seq)
        if strcmp(seq{pos},'TAC')
            % find returns the index of EVERY 'TAC' in v, not only pos
            index = find(strcmp(v,seq{pos}));
            StartPos = index;
        elseif (strcmp(seq{pos},'ACT') || strcmp(seq{pos},'ATT') || strcmp(seq{pos},'ATC'))
            % likewise every occurrence of this stop codon; overwritten on each match
            index = find(strcmp(v,seq{pos}));
            EndPos = index;
        end
    end
    fid2 = fopen('report_long.txt','w+');
    fprintf(fid2,'Name: OP \n');
    fprintf(fid2,'Lab 13: DNA Pattern Matching\n \n');
    fprintf(fid2,'Start Position of Gene is: %d \n',StartPos);
    fprintf(fid2, 'End Position of Gene is: %d \n',EndPos);
    fclose(fid2);
end
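For comparison with the loop above: `find(strcmp(v,seq{pos}))` returns the index of every codon in `v` that equals `seq{pos}`, not just `pos`, and `StartPos`/`EndPos` are overwritten on each match. A minimal sketch of a codon-by-codon alternative that appends each start/end pair to arrays instead, assuming the same start codon ('TAC') and stop codons ('ACT', 'ATT', 'ATC'). The short codon list `v` and the names `startCodon`, `stopCodons` and `inGene` are illustrative only:

```matlab
% Hypothetical codon list standing in for x = C{1} from textscan
v = {'GGA';'TAC';'GCT';'ACT';'TTT';'TAC';'CCC';'GGG';'ATC';'TAC'};

startCodon = 'TAC';
stopCodons = {'ACT','ATT','ATC'};

StartPos = [];      % codon index of each start codon
EndPos   = [];      % codon index of the matching stop codon
inGene   = false;   % true while between a start codon and its stop codon

for pos = 1:numel(v)
    if ~inGene && strcmp(v{pos}, startCodon)
        % A new gene begins at this codon (start codons inside a gene are ignored)
        StartPos(end+1) = pos;
        inGene = true;
    elseif inGene && any(strcmp(v{pos}, stopCodons))
        % The current gene ends at this codon
        EndPos(end+1) = pos;
        inGene = false;
    end
end

% A start codon with no later stop codon does not form a complete gene
StartPos = StartPos(1:numel(EndPos));

for k = 1:numel(StartPos)
    fprintf('Gene %d: start codon %d, end codon %d\n', k, StartPos(k), EndPos(k));
end
```

Here the positions are codon indices; assuming the file holds one unbroken sequence, codon `k` starts at character `3*(k-1)+1`. Unlike the `strfind` approach in the accepted answer, this only looks at the single reading frame produced by `textscan(fid,'%3s')`.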
Rik
on 28 Nov 2020
I would urge you to switch to strfind first. Then you can loop through all start codons, removing any later start codon that lies inside the gene currently being read.
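A small illustration of the `strfind` suggestion: on a character vector, `strfind` returns the starting index of every match, in any reading frame, so a stop codon only ends the gene if its offset from the start codon is a multiple of 3. The sequence `s` is made up for this example:

```matlab
s = 'GTACGATCAAACTTAG';   % hypothetical sequence, not the data from the question

startIdx = strfind(s, 'TAC');                                             % 2
stopIdx  = sort([strfind(s,'ATC'), strfind(s,'ACT'), strfind(s,'ATT')]);  % [6 11]

% Keep the stop codons after the start codon that share its reading frame
first   = startIdx(1);
inFrame = stopIdx(stopIdx > first & mod(stopIdx - first, 3) == 0);        % 11

% The gene runs from the start codon through the last letter of the stop codon
gene = s(first:inFrame(1)+2)                                              % 'TACGATCAAACT'
```

The 'ATC' at position 6 is skipped because it straddles the codons 'GAT' and 'CAA' of this gene.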
Accepted Answer
Rik
on 29 Nov 2020
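% Since your file-reading code works fine, you can keep it as is.
% I just used my own function (readfile) to load your data.
x_conv = readfile('https://www.mathworks.com/matlabcentral/answers/uploaded_files/430218/sequence_long2.txt');
x_conv = x_conv{1};   % take the first line, which holds the sequence

% Find all possible start codons and stop codons (character positions)
Start_loc = strfind(x_conv,'TAC');
End_loc = cellfun(@(stopcodon)strfind(x_conv,stopcodon),{'ATC','ACT','ATT'},'UniformOutput',false);
% Concatenate and sort, so the first candidate below is the nearest stop codon
End_loc = sort(horzcat(End_loc{:}));

Gene_end = zeros(size(Start_loc));   % end position of each gene
n = 0;
while n < numel(Start_loc)
    n = n+1;
    this_start = Start_loc(n);
    % Select the stop codons after this start that are in the same reading
    % frame (offset is a multiple of 3); the first one ends the gene
    this_end = End_loc(End_loc>this_start & mod(End_loc-this_start,3)==0);
    if isempty(this_end)
        % No in-frame stop codon after this start codon: discard it
        Start_loc(n) = [];
        n = n-1;
        continue
    end
    this_end = this_end(1);
    % Remove the start codons that lie inside the current gene
    Start_loc(Start_loc>this_start & Start_loc<this_end) = [];
    % Store the end separately so End_loc (the candidate list) is not overwritten
    Gene_end(n) = this_end;
end
% Keep one end position per remaining start codon
Gene_end = Gene_end(1:n);

genes = cell(size(Gene_end));
for n = 1:numel(Gene_end)
    % +2 includes all three letters of the stop codon
    genes{n} = x_conv(Start_loc(n):Gene_end(n)+2);
end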
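To record every gene in the report file from the question, `fprintf` can take all positions at once, because it reuses the format string for each column of a matrix argument. A sketch assuming the loop has produced one start and one end position per gene; `geneStart` and `geneEnd` are hypothetical names with made-up values, and a temporary file is used in place of `report_long.txt`:

```matlab
% Hypothetical per-gene positions standing in for the loop's results
geneStart = [2 20 41];
geneEnd   = [11 32 56];

reportFile = [tempname '.txt'];   % temporary file instead of 'report_long.txt'
fid2 = fopen(reportFile, 'w');
fprintf(fid2, 'Name: OP\n');
fprintf(fid2, 'Lab 13: DNA Pattern Matching\n\n');
% fprintf consumes the matrix column by column: [gene number; start; end]
fprintf(fid2, 'Gene %d: start position %d, end position %d\n', ...
        [1:numel(geneStart); geneStart; geneEnd]);
fclose(fid2);

report = fileread(reportFile);   % read the report back to check it
disp(report)
delete(reportFile)
```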