I want to convert a character series into a numerical series using a for loop

S Kar on 8 Jun 2022
Commented: S Kar on 8 Jun 2022
I have a character sequence stored in the variable DNA_SEQS = 'AGGTAT.....'. The sequence consists of four types of character, 'A', 'C', 'T' and 'G', so I have used a switch/case block to generate the numerical sequence. The code I have written is:
seqs = fastaread('AF0071891.fasta');
DNA_SEQS = seqs.Sequence;
len = length(DNA_SEQS);
for j = 1:5
x = [];
a = DNA_SEQS(j);
switch a
case 'A'
v = 0;
case 'C'
v = 1;
case 'G'
v = 2;
case 'T'
v = 3;
end
x(j+1) = [x(j) v];
end
With this code I expected to get a numerical array like [0,2,2,3,0], but I got the error: Index exceeds matrix dimensions.
Please help

Accepted Answer

dpb on 8 Jun 2022
Edited: dpb on 8 Jun 2022
for j = 1:5
x = [];
a = DNA_SEQS(j);
...
You reset x to empty at the start of every pass through the loop, which wipes out whatever you stored in it on the previous pass...don't do that!!! :)
x = [];
for j = 1:5
a = DNA_SEQS(j);
...
instead, although you should
  1. preallocate and assign into the array instead
  2. size x() based on the length of the string, not hardcode the loop count
N=strlength(DNA_SEQS);
x=zeros(1,N);
for j = 1:N
a = DNA_SEQS(j);
...
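Putting the two points above together with the switch block from the question, a complete loop might look like the sketch below. The `otherwise` branch and the choice of NaN for unexpected characters are assumptions added for illustration, not part of the original code. The key change is the last line of the loop: `x(j) = v` stores one scalar in one element, whereas `x(j+1) = [x(j) v]` tries to put two values into a single slot.

```matlab
DNA_SEQS = 'AGGTAT';            % sample sequence from the question (char vector)
N = strlength(DNA_SEQS);        % number of characters in the sequence
x = zeros(1, N);                % preallocate once, outside the loop
for j = 1:N
    switch DNA_SEQS(j)
        case 'A'
            v = 0;
        case 'C'
            v = 1;
        case 'G'
            v = 2;
        case 'T'
            v = 3;
        otherwise
            v = NaN;            % assumption: flag any other character as NaN
    end
    x(j) = v;                   % assign one scalar to one element
end
disp(x)
```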
However, in MATLAB you don't need a loop; use a lookup table instead. One way (not necessarily the fastest, but pretty easy to code) would be
DNA_VALS=interp1(double('ACGT'),0:3,double(DNA_SEQS));
This would return for your sample above...
>> DNA_SEQS = 'AGGTAT';
DNA_VALS=interp1(double('ACGT'),0:3,double(DNA_SEQS))
DNA_VALS =
0 2 2 3 0 3
>>
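Another way to build the lookup table, sketched here as an alternative to interp1 rather than as the method used above, is to index a small array directly by character code. The table name `lut` and its size of 128 (one slot per 7-bit ASCII code) are assumptions for illustration; any character not in 'ACGT' maps to NaN.

```matlab
DNA_SEQS = 'AGGTAT';
lut = nan(1, 128);                  % one slot per 7-bit ASCII code; NaN = unmapped
lut(double('ACGT')) = 0:3;          % 'A'->0, 'C'->1, 'G'->2, 'T'->3
DNA_VALS = lut(double(DNA_SEQS));   % look up every character at once
disp(DNA_VALS)
```

This assumes every character code in the sequence lies between 1 and 128; anything outside that range would raise an indexing error.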
  1 Comment
S Kar on 8 Jun 2022
Thank you but still got this error using the first method:
In an assignment A(:) = B, the number of elements in A and B must be the same.
The second method is working fine


More Answers (1)

DGM on 8 Jun 2022
You can use ismember():
thisstr = 'AGGATATC';
charmap = 'ACGT';
[~,idx] = ismember(thisstr,charmap);
idx = idx-1
idx = 1×8
0 2 2 0 3 0 3 1
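One thing to watch with this approach: ismember() returns an index of 0 for any character that is not in the map, so after subtracting 1 such characters come out as -1. The sketch below uses a hypothetical input containing 'N' and marks unmatched positions as NaN instead; whether NaN is the right placeholder depends on what you do with the result.

```matlab
thisstr = 'AGNTAC';                         % hypothetical input; 'N' is not in the map
charmap = 'ACGT';
[found, idx] = ismember(thisstr, charmap);  % idx is 0 wherever found is false
vals = idx - 1;                             % unmatched characters are -1 at this point
vals(~found) = NaN;                         % assumption: mark them as NaN instead
disp(vals)
```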
  4 Comments
dpb on 8 Jun 2022
For exactly the reason I outlined above as a possibility -- it isn't a char() array --
>> DNA_SEQS='AGGTAT'; % assign as char() string (and array of char())
>> N=strlength(DNA_SEQS) % strlength() is same as size(x,2) here...
N =
6
>> for i=1:N,disp(DNA_SEQS(i));end % works fine for a char() array with () addressing
A
G
G
T
A
T
>> DNA_SEQS = cellstr('AGGTAT'); % redefine as a cellstr() instead...
>> N=strlength(DNA_SEQS) % strlength knows about what is in the cell
N =
6
>> for i=1:N,disp(DNA_SEQS(i));end % but it fails as you see...
{'AGGTAT'}
Index exceeds the number of array elements (1).
>>
WHY!!!???
>> size(DNA_SEQS) % because now the cellstr is a 1x1 CELL array, NOT 1x6 char() array...
ans =
1 1
>>
How to make work???
"Use the curlies, Luke!!!"
>> for i=1:N,disp(DNA_SEQS{1}(i));end
A
G
G
T
A
T
>>
NB: note the use of {1} above to "dereference" the cell array back to the char() array inside it; the subsequent "smooth" parentheses (i) then pick the ith element from that vector, just as they did directly when it was "only" a char() array rather than a char() array in a cell.
Strings behave similarly to cellstr(); you have to use {} (the "curlies") to reach inside a string array element to the individual characters that make it up.
See cellstr and links there for addressing cell strings and cells in general.
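If you don't know in advance which of the three types you will be handed, one option (a sketch, assuming a single sequence rather than a list of them) is to convert to a plain char vector with char() before indexing. The variable names below are made up for illustration, and the double-quoted string literal needs R2017a or later.

```matlab
asChar   = 'AGGTAT';              % char vector
asCell   = cellstr('AGGTAT');     % 1x1 cell containing a char vector
asString = "AGGTAT";              % string scalar
inputs = {asChar, asCell, asString};
out = cell(size(inputs));
for k = 1:numel(inputs)
    seq = char(inputs{k});        % 1x6 char vector in all three cases
    [~, idx] = ismember(seq, 'ACGT');
    out{k} = idx - 1;             % same numeric coding as in the answers above
    disp(out{k})
end
```

With a cell or string array holding several sequences, char() would instead return a padded character matrix, so this shortcut only fits the single-sequence case.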
S Kar on 8 Jun 2022
Thank you so much for the elaboration.
