Effacer les filtres
Effacer les filtres

How do I make a cell with the following contents?

1 vue (au cours des 30 derniers jours)
Mohannad Abboushi
Mohannad Abboushi le 15 Jan 2017
Commenté : Guillaume le 16 Jan 2017
I am making a program that basically takes a string s as a single strand of DNA and returns the amino acid sequence of the longest gene it finds. Whereby, a gene is defined as a reading frame that: starts with AUG codon, ends with one of UAA,UAG, or UGA codon.
I tried making a cell of different "frames" but since they are not the same length I can't put them into an array. How do i work around this? Here's my code:
function [ptn]=Seq_transcribe2(x)
y=seq_transcribe1(x);
frames={};
frames={x(1:end) x(2:end) x(3:end) y(1:end) y(2:end)
y(3:end)};
starts=[];
stops=[];
allorfs={};
for i=1:3:numel(frames)-2
codon= frames([i i+1 i+2])
if codon=='AUG'
starts(end+1)=codon;
if strcmp(codon,'UAA') || strcmp(codon,'UAG') || strcmp(codon,'UGA')
stops(end+1)=codon;
end
stops= find(stops>starts,1)
lengthofthisstart=stops-starts
allorfs{end+1}=frame(starts:stops-1)
  2 commentaires
Niels
Niels le 15 Jan 2017
would be helpfull if you add the error message
Mohannad Abboushi
Mohannad Abboushi le 15 Jan 2017
Error using vertcat Dimensions of matrices being concatenated are not consistent.

Connectez-vous pour commenter.

Réponse acceptée

Guillaume
Guillaume le 15 Jan 2017
If I understood correctly, a simple way to find all genes would be:
[genesequences, starts, stops] = regexp(x, 'AUG.*?(UAA|UAG|UGA)', 'match', 'start', 'end');
And the longest sequence is of course:
[~, longestidx] = max(stops - starts);
longestsequence = genesequences{longestidx}
  2 commentaires
Arthur Goldsipe
Arthur Goldsipe le 16 Jan 2017
I think you need a slight change to account for the fact that all codons are 3 characters long:
[genesequences, starts, stops] = regexp(x, 'AUG(...)*?(UAA|UAG|UGA)', 'match', 'start', 'end');
Guillaume
Guillaume le 16 Jan 2017
Oh yes, as I know nothing about genes and codons, I didn't know that the number of characters between the start and end codon must be a multiple of three, but I should have inferred that from the original code.
Thanks.

Connectez-vous pour commenter.

Plus de réponses (1)

Niels
Niels le 15 Jan 2017
if i understood you right your problem is in one of the following lines:
frames={x(1:end) x(2:end) x(3:end) y(1:end) y(2:end)
allorfs{end+1}=frame(starts:stops-1)
if so, i cant replicate your problem, in cell arrays the length of the elements is irrelevant
>> a=1:3;
>> b=1:4;
>> c=1:5;
>> cell={a b c}
cell =
1×3 cell array
[1×3 double] [1×4 double] [1×5 double]
%=================================
>> a={}
a =
0×0 empty cell array
>> a{end+1}=1
a =
cell
[1]
>> a{end+1}=2
a =
1×2 cell array
[1] [2]
>> a{end+1}=[2 1]
a =
1×3 cell array
[1] [2] [1×2 double]
  2 commentaires
Mohannad Abboushi
Mohannad Abboushi le 15 Jan 2017
I feel like there's got to be an easier way to do this
Guillaume
Guillaume le 15 Jan 2017
If the following line
frames={x(1:end) x(2:end) x(3:end) y(1:end) y(2:end)
y(3:end)};
is indeed written on two lines, then yes matlab is going to issue a concatenation error since the line return is interpreted as a vertical concatenation.
frames={x(1:end) x(2:end) x(3:end) y(1:end) y(2:end) y(3:end)};
or
frames={x(1:end) x(2:end) x(3:end) y(1:end) y(2:end) ...
y(3:end)};
would fix the error

Connectez-vous pour commenter.

Catégories

En savoir plus sur Bioinformatics Toolbox dans Help Center et File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by