
codoncount

Count codons in nucleotide sequence

Syntax

Codons = codoncount(SeqNT)
[Codons, CodonArray] = codoncount(SeqNT)
... = codoncount(SeqNT, ...'Frame', FrameValue, ...)
... = codoncount(SeqNT, ...'Reverse', ReverseValue, ...)
... = codoncount(SeqNT, ...'Ambiguous', AmbiguousValue, ...)
... = codoncount(SeqNT, ...'Figure', FigureValue, ...)
... = codoncount(SeqNT, ...'GeneticCode', GeneticCodeValue, ...)

Input Arguments

SeqNT

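One of the following:

  • Character vector or string of nucleotide letters, such as A, C, G, T, and U

  • Row vector of integers that encode nucleotides, for example 1 = A, 2 = C, 3 = G, and 4 = T

  • MATLAB structure containing a Sequence field that contains a nucleotide sequence, such as one returned by fastaread or genbankread

Examples: 'ACGT' or [1 2 3 4]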
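The integer form uses the same mapping as the indices of CodonArray (1 = A, 2 = C, 3 = G, 4 = T). The following base-MATLAB sketch converts such a vector back to letters. It is only an illustration that assumes the values 1 through 4 appear; the Bioinformatics Toolbox function int2nt also handles the integer codes for ambiguous nucleotides.

```matlab
% Integer-encoded nucleotide sequence, as in the SeqNT example [1 2 3 4]
seqInt = [1 2 3 4];

% Lookup string: position k holds the letter encoded by integer k
% (this sketch covers only 1 = A, 2 = C, 3 = G, 4 = T)
alphabet = 'ACGT';
seqChar = alphabet(seqInt);   % letters corresponding to seqInt
```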

FrameValue

Integer specifying a reading frame in the nucleotide sequence. Choices are 1 (default), 2, or 3.
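To illustrate what a reading frame selects, here is a minimal base-MATLAB sketch that splits a sequence into codons starting at a given frame. The sequence and variable names are hypothetical, and the sketch assumes that trailing nucleotides that do not fill a complete codon are not counted.

```matlab
% Hypothetical example sequence (not from the documentation)
seq = 'ATGGCCTTAGCA';
frame = 2;                          % reading frame: 1, 2, or 3

% Number of complete codons that fit after the frame offset
nCodons = floor((numel(seq) - frame + 1) / 3);

% Take the in-frame nucleotides and split them into rows of three
inFrame = seq(frame : frame + 3*nCodons - 1);
codons = reshape(inFrame, 3, []).'; % nCodons-by-3 character matrix
```

For a 1000-nucleotide sequence, this arithmetic gives 333 codons in frames 1 and 2 and 332 codons in frame 3. The two structure outputs in the Examples section each total 333 codons, which is consistent with that assumption.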

ReverseValue

Logical value that controls whether codoncount counts the codons in the reverse complement of the sequence specified by SeqNT. Choices are true or false (default).
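A reverse complement reverses the sequence and replaces each base with its complement (A with T, C with G, and vice versa). The following sketch shows that transformation in base MATLAB for A, C, G, and T only. The lookup-table approach is an illustrative choice; the Bioinformatics Toolbox function seqrcomplement provides this operation for general nucleotide sequences.

```matlab
seq = 'ATGGCC';   % hypothetical example sequence

% Complement lookup indexed by character code; unset entries stay char(0)
complementOf = char(zeros(1, 128));
complementOf(double('ACGT')) = 'TGCA';

% Reverse complement: complement each base, then reverse the order
revComp = fliplr(complementOf(double(seq)));
```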

AmbiguousValue

Character vector or string specifying how to treat codons containing ambiguous nucleotide characters (R, Y, K, M, S, W, B, D, H, V, or N). Choices are:

  • 'ignore' (default) — Skips codons containing ambiguous characters.

  • 'bundle' — Counts codons containing ambiguous characters and reports the total count in the Ambiguous field of the Codons output structure.

  • 'prorate' — Counts codons containing ambiguous characters and distributes them proportionately in the appropriate codon fields containing standard nucleotide characters. For example, the counts for the codon ART are distributed evenly between the AAT and AGT fields.

  • 'warn' — Skips codons containing ambiguous characters and displays a warning.
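The 'prorate' idea can be sketched for a single codon in base MATLAB. This sketch assumes that one occurrence of an ambiguous codon is split evenly across every standard codon it can represent, which matches the ART example above; the IUPAC lookup structure and variable names are illustrative and are not the toolbox internals. With 'bundle', the same occurrence would instead be added to the Ambiguous field.

```matlab
% IUPAC nucleotide codes: each letter maps to the standard bases it can represent
iupac = struct('A','A', 'C','C', 'G','G', 'T','T', ...
               'R','AG', 'Y','CT', 'K','GT', 'M','AC', 'S','CG', ...
               'W','AT', 'B','CGT', 'D','AGT', 'H','ACT', 'V','ACG', ...
               'N','ACGT');

codon = 'ART';   % ambiguous codon from the example above

% Expand the codon into every standard codon it could stand for
expansions = {};
for x = iupac.(codon(1))
    for y = iupac.(codon(2))
        for z = iupac.(codon(3))
            expansions{end+1} = [x y z];
        end
    end
end

% Assumption: split a single count evenly across the expansions
weight = 1 / numel(expansions);
prorated = struct();
for k = 1:numel(expansions)
    prorated.(expansions{k}) = weight;
end
```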

FigureValue

Controls the display of a heat map of the codon counts. Choices are true or false (default).

GeneticCodeValue

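Integer, character vector, or string specifying a genetic code number or code name from the table Genetic Code. Default is 1 or 'Standard'. You can also specify 'None'. This property controls the grid overlaid on the heat map, so it has an effect only when FigureValue is true.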

Tip

If you use a code name, you can truncate the name to the first two letters of the name.

Output Arguments

Codons

MATLAB structure containing fields for the 64 possible codons (AAA, AAC, AAG, ..., TTG, TTT), which contain the codon counts in SeqNT.

CodonArray

A 4-by-4-by-4 array containing the raw count data for each codon. The three dimensions correspond to the three positions in the codon, and along each dimension the indices 1, 2, 3, and 4 represent A, C, G, and T. For example, the element (2,3,4) in the array contains the number of CGT codons.
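The correspondence between CodonArray and the Codons structure can be sketched as a triple loop over the index values. The array here holds placeholder values rather than real counts; the loop only spells each codon from its three indices, so that, for example, element (2,3,4) ends up in the CGT field.

```matlab
% Placeholder 4-by-4-by-4 array (values 1..64, not real codon counts)
codonArray = reshape(1:64, 4, 4, 4);

bases = 'ACGT';   % index 1 = A, 2 = C, 3 = G, 4 = T
codons = struct();
for i = 1:4
    for j = 1:4
        for k = 1:4
            % Field name is the codon spelled by the three indices
            codons.([bases(i) bases(j) bases(k)]) = codonArray(i, j, k);
        end
    end
end
```

Because the third index varies fastest, the fields are created in the same alphabetical order (AAA, AAC, AAG, ..., TTT) as in the Codons output.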

Description

Codons = codoncount(SeqNT) counts the codons in SeqNT, a nucleotide sequence, and returns the codon counts in Codons, a MATLAB structure containing fields for the 64 possible codons (AAA, AAC, AAG, ..., TTG, TTT).

  • Codons containing the character U are counted in the fields of the corresponding codons containing T. For example, an AUG codon is counted in the ATG field.

  • If the sequence contains gaps indicated by a hyphen (-), then codons containing gaps are ignored.

  • If the sequence contains unrecognized characters, then codons containing these characters are ignored, and the following warning message appears:

    Warning: Unknown symbols appear in the sequence. These will be ignored.
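The rules above can be sketched in base MATLAB for reading frame 1. This is an illustration under stated assumptions, not the toolbox implementation: it silently drops codons that contain gaps, unknown symbols, or ambiguous letters (similar to the default 'ignore' behavior, but without the warning), and it uses accumarray to fill an array laid out like CodonArray. The input sequence is hypothetical.

```matlab
% Hypothetical input: contains U, a gap (-), and an unknown symbol (X)
seq = 'AUGCC-AXGTTTAA';

% Treat U as T, then split frame 1 into complete codons
seq = upper(seq);
seq(seq == 'U') = 'T';
nCodons = floor(numel(seq) / 3);
codons = reshape(seq(1:3*nCodons), 3, []).';

% Map each letter to 1-4 (A, C, G, T); 0 marks a gap or any other symbol
[~, idx] = ismember(codons, 'ACGT');

% Keep only codons whose three letters are all standard bases
idx = idx(all(idx > 0, 2), :);

% Accumulate one count per codon into a 4-by-4-by-4 array
codonArray = accumarray(idx, 1, [4 4 4]);
```

Here codonArray(1,4,3) counts ATG (from the U-containing AUG) and codonArray(4,4,4) counts TTT; the codons CC- and AXG and the trailing AA are not counted.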

[Codons, CodonArray] = codoncount(SeqNT) also returns CodonArray, a 4-by-4-by-4 array containing the raw count data for each codon. The three dimensions correspond to the three positions in the codon, and along each dimension the indices 1, 2, 3, and 4 represent A, C, G, and T. For example, the element (2,3,4) in the array contains the number of CGT codons.

... = codoncount(SeqNT, ...'PropertyName', PropertyValue, ...) calls codoncount with optional properties specified as property name/property value pairs. You can specify one or more properties in any order. Enclose each PropertyName in single quotation marks; property names are case insensitive. The property name/property value pairs are as follows:

... = codoncount(SeqNT, ...'Frame', FrameValue, ...) counts the codons in the reading frame specified by FrameValue, which can be 1 (default), 2, or 3.

... = codoncount(SeqNT, ...'Reverse', ReverseValue, ...) controls whether codoncount counts the codons in the reverse complement of SeqNT. Choices are true or false (default).

... = codoncount(SeqNT, ...'Ambiguous', AmbiguousValue, ...) specifies how to treat codons containing ambiguous nucleotide characters. Choices are:

  • 'ignore' (default)

  • 'bundle'

  • 'prorate'

  • 'warn'

... = codoncount(SeqNT, ...'Figure', FigureValue, ...) controls the display of a heat map of the codon counts. Choices are true or false (default).

... = codoncount(SeqNT, ...'GeneticCode', GeneticCodeValue, ...) controls the overlay of a grid on the heat map figure. The grid groups the synonymous codons according to GeneticCodeValue.

Examples

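Create a random 1000-nucleotide sequence and count its codons in the first (default) reading frame. Because randseq generates a random sequence, your counts will differ from the output shown.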

seq = randseq(1000);
codons = codoncount(seq)
codons = struct with fields:
    AAA: 11
    AAC: 5
    AAG: 8
    AAT: 6
    ACA: 6
    ACC: 7
    ACG: 4
    ACT: 7
    AGA: 6
    AGC: 9
    AGG: 5
    AGT: 2
    ATA: 6
    ATC: 4
    ATG: 4
    ATT: 6
    CAA: 3
    CAC: 5
    CAG: 7
    CAT: 10
    CCA: 5
    CCC: 4
    CCG: 8
    CCT: 5
    CGA: 7
    CGC: 6
    CGG: 5
    CGT: 5
    CTA: 4
    CTC: 7
    CTG: 4
    CTT: 5
    GAA: 5
    GAC: 6
    GAG: 5
    GAT: 4
    GCA: 3
    GCC: 2
    GCG: 8
    GCT: 5
    GGA: 6
    GGC: 7
    GGG: 10
    GGT: 4
    GTA: 2
    GTC: 6
    GTG: 5
    GTT: 2
    TAA: 2
    TAC: 4
    TAG: 1
    TAT: 4
    TCA: 6
    TCC: 2
    TCG: 5
    TCT: 5
    TGA: 4
    TGC: 1
    TGG: 5
    TGT: 8
    TTA: 6
    TTC: 1
    TTG: 8
    TTT: 5

Count the codons in the second frame for the reverse complement of a sequence.

r2codons = codoncount(seq,'Frame',2,'Reverse',true)
r2codons = struct with fields:
    AAA: 5
    AAC: 2
    AAG: 5
    AAT: 6
    ACA: 8
    ACC: 4
    ACG: 5
    ACT: 2
    AGA: 5
    AGC: 5
    AGG: 5
    AGT: 7
    ATA: 4
    ATC: 4
    ATG: 10
    ATT: 6
    CAA: 8
    CAC: 5
    CAG: 4
    CAT: 4
    CCA: 5
    CCC: 10
    CCG: 5
    CCT: 5
    CGA: 5
    CGC: 8
    CGG: 8
    CGT: 4
    CTA: 1
    CTC: 5
    CTG: 7
    CTT: 8
    GAA: 1
    GAC: 6
    GAG: 7
    GAT: 4
    GCA: 1
    GCC: 7
    GCG: 6
    GCT: 9
    GGA: 2
    GGC: 2
    GGG: 4
    GGT: 7
    GTA: 4
    GTC: 6
    GTG: 5
    GTT: 5
    TAA: 6
    TAC: 2
    TAG: 4
    TAT: 6
    TCA: 4
    TCC: 6
    TCG: 7
    TCT: 6
    TGA: 6
    TGC: 3
    TGG: 5
    TGT: 6
    TTA: 2
    TTC: 5
    TTG: 3
    TTT: 11

Create a heat map of the codons and overlay a grid that groups the synonymous codons according to the standard genetic code.

codoncount(seq,'Figure', true);
AAA - 11     AAC -  5     AAG -  8     AAT -  6     
ACA -  6     ACC -  7     ACG -  4     ACT -  7     
AGA -  6     AGC -  9     AGG -  5     AGT -  2     
ATA -  6     ATC -  4     ATG -  4     ATT -  6     
CAA -  3     CAC -  5     CAG -  7     CAT - 10     
CCA -  5     CCC -  4     CCG -  8     CCT -  5     
CGA -  7     CGC -  6     CGG -  5     CGT -  5     
CTA -  4     CTC -  7     CTG -  4     CTT -  5     
GAA -  5     GAC -  6     GAG -  5     GAT -  4     
GCA -  3     GCC -  2     GCG -  8     GCT -  5     
GGA -  6     GGC -  7     GGG - 10     GGT -  4     
GTA -  2     GTC -  6     GTG -  5     GTT -  2     
TAA -  2     TAC -  4     TAG -  1     TAT -  4     
TCA -  6     TCC -  2     TCG -  5     TCT -  5     
TGA -  4     TGC -  1     TGG -  5     TGT -  8     
TTA -  6     TTC -  1     TTG -  8     TTT -  5     

Version History

Introduced before R2006a