codonbias

Calculate codon frequency for each amino acid coded for in nucleotide sequence

Syntax

CodonFreq = codonbias(SeqNT)
CodonFreq = codonbias(SeqNT, ...'GeneticCode', GeneticCodeValue, ...)
CodonFreq = codonbias(SeqNT, ...'Frame', FrameValue, ...)
CodonFreq = codonbias(SeqNT, ...'Reverse', ReverseValue, ...)
CodonFreq = codonbias(SeqNT, ...'Ambiguous', AmbiguousValue, ...)
CodonFreq = codonbias(SeqNT, ...'Pie', PieValue, ...)

Input Arguments

SeqNT

One of the following:

  • Character vector or string specifying a nucleotide sequence

  • Row vector of integers specifying a nucleotide sequence

  • MATLAB® structure containing a Sequence field that contains a nucleotide sequence, such as returned by fastaread, fastqread, emblread, getembl, genbankread, or getgenbank

Valid characters include A, C, G, T, and U.

codonbias does not count ambiguous nucleotides or gaps.

GeneticCodeValue

Integer, character vector, or string specifying a genetic code number or code name from the table Genetic Code. Default is 1 or 'Standard'.

Tip

If you use a code name, you can truncate the name to the first two letters of the name.

FrameValue

Integer specifying a reading frame in the nucleotide sequence. Choices are 1 (default), 2, or 3.

ReverseValue

Controls the return of the codon frequency for the reverse complement sequence of the nucleotide sequence specified by SeqNT. Choices are true or false (default).

AmbiguousValue

Character vector or string specifying how to treat codons containing ambiguous nucleotide characters (R, Y, K, M, S, W, B, D, H, V, or N). Choices are:

  • 'ignore' (default) — Skips codons containing ambiguous characters

  • 'prorate' — Counts codons containing ambiguous characters and distributes them proportionately in the appropriate codon fields. For example, the counts for the codon ART are distributed evenly between the AAT and AGT fields.

  • 'warn' — Skips codons containing ambiguous characters and displays a warning.

PieValue

Controls the creation of a figure of 20 pie charts, one for each amino acid. Choices are true or false (default).

Output Arguments

CodonFreq

MATLAB structure containing a field for each amino acid, each of which contains the associated codon frequencies as percentages.

Description

Many amino acids are encoded by two or more synonymous codons. The probability that a specific codon (from all possible codons for an amino acid) is used to code that amino acid varies between sequences, so the frequency of each codon for each amino acid in a protein-coding sequence is a useful statistic.
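To make the statistic concrete, the following minimal sketch computes the codon frequencies for a single amino acid (alanine) by hand, using only base MATLAB. The sequence and the variable names are hypothetical and chosen for illustration; this is not how codonbias is implemented, only the same idea on a small scale.

```matlab
% Hypothetical in-frame coding sequence, used only for illustration
seq = 'GCAGCCGCCGCGGCT';

% Split the sequence into consecutive triplets (reading frame 1)
codons = cellstr(reshape(seq, 3, [])');

% Synonymous codons for alanine in the standard genetic code
alaCodons = {'GCA', 'GCC', 'GCG', 'GCT'};

% Count each alanine codon, then normalize by the total alanine count
counts = cellfun(@(c) sum(strcmp(codons, c)), alaCodons);
freq = counts / sum(counts);
```

Here freq gives the share of each synonymous codon among all alanine codons in the sequence, which is the per-amino-acid quantity that codonbias reports.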

CodonFreq = codonbias(SeqNT) calculates the codon frequency in percent for each amino acid coded for in SeqNT, a nucleotide sequence, and returns the results in CodonFreq, a MATLAB structure containing a field for each amino acid.
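As a small sketch of the basic call, the example below passes a short character vector directly to codonbias. The sequence is a hypothetical one made up for illustration, and the field names used (Ala, Codon, Freq) follow the output shown in the Examples section.

```matlab
% Hypothetical short coding sequence (not taken from a database)
seq = 'ATGGCAGCCGCCGCGGCTTAA';

% Codon frequencies for every amino acid, default options
cb = codonbias(seq);

% Inspect the synonymous codons and their frequencies for alanine
cb.Ala.Codon
cb.Ala.Freq
```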

CodonFreq = codonbias(SeqNT, ...'PropertyName', PropertyValue, ...) calls codonbias with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:

CodonFreq = codonbias(SeqNT, ...'GeneticCode', GeneticCodeValue, ...) specifies a genetic code. Choices for GeneticCodeValue are an integer, character vector, or string specifying a code number or code name from the table Genetic Code. If you use a code name, you can truncate the name to the first two characters of the name. Default is 1 or 'Standard'.

Tip

If you use a code name, you can truncate the name to the first two letters of the name.
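The sketch below shows the two ways of selecting a genetic code described above, by number and by name, on a hypothetical sequence. In the vertebrate mitochondrial code the triplet TGA codes for tryptophan instead of acting as a stop codon, so it would be expected to be counted under the Trp field when code 2 is selected.

```matlab
% Hypothetical sequence ending in TGA (a stop codon in the standard code)
seq = 'ATGGCAGCCGCCGCGGCTTGA';

% Select the genetic code by number ...
cbByNumber = codonbias(seq, 'GeneticCode', 2);

% ... or by name from the Genetic Code table
cbByName = codonbias(seq, 'GeneticCode', 'Vertebrate Mitochondrial');

% Tryptophan codons and frequencies under the mitochondrial code
cbByName.Trp
```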

CodonFreq = codonbias(SeqNT, ...'Frame', FrameValue, ...) calculates the codon frequency in the reading frame specified by FrameValue, which can be 1 (default), 2, or 3.
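A minimal sketch of the Frame option follows, using a hypothetical sequence with one extra leading nucleotide. It assumes that reading frame 2 starts at the second nucleotide, so the result would be expected to match a frame-1 analysis of the sequence with its first nucleotide removed.

```matlab
% Hypothetical sequence with one leading nucleotide before the first codon
seq = 'AATGGCAGCCGCCGCGGCTTAAG';

% Codon frequencies in reading frame 2
cbFrame2 = codonbias(seq, 'Frame', 2);

% Frame-1 analysis of the sequence without its first nucleotide,
% shown for comparison (assumed equivalent, see the note above)
cbShifted = codonbias(seq(2:end));

cbFrame2.Ala
cbShifted.Ala
```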

CodonFreq = codonbias(SeqNT, ...'Reverse', ReverseValue, ...) controls the return of the codon frequency for the reverse complement of the nucleotide sequence specified by SeqNT. Choices are true or false (default).
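The following sketch illustrates the Reverse option on a hypothetical sequence and compares it with explicitly reverse-complementing the input using seqrcomplement (also part of Bioinformatics Toolbox). The two results would be expected to agree; the comparison is included as a sanity check, not as a statement about the internal implementation.

```matlab
% Hypothetical sequence whose reverse complement contains alanine codons
seq = 'TTAAGCCGCGGCGGCTGCCAT';

% Codon frequencies for the reverse complement strand
cbRev = codonbias(seq, 'Reverse', true);

% Same analysis after reverse-complementing the sequence explicitly
cbManual = codonbias(seqrcomplement(seq));

% Compare the alanine frequencies from the two approaches
isequal(cbRev.Ala.Freq, cbManual.Ala.Freq)
```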

CodonFreq = codonbias(SeqNT, ...'Ambiguous', AmbiguousValue, ...) specifies how to treat codons containing ambiguous nucleotide characters. Choices are 'ignore' (default), 'prorate', and 'warn'.
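As a sketch of how the Ambiguous option changes the result, the example below uses a hypothetical sequence containing the ambiguous codon ART (R stands for A or G), which resolves to AAT (asparagine) or AGT (serine). With 'ignore' the ART codon is skipped; with 'prorate' its count is split between the AAT and AGT entries, so those codons would be expected to receive a nonzero share.

```matlab
% Hypothetical sequence: AAC (Asn), ART (ambiguous), AGC (Ser)
seq = 'AACARTAGC';

% Default behavior: the ambiguous codon ART is skipped
cbIgnore = codonbias(seq, 'Ambiguous', 'ignore');

% Prorated behavior: ART is distributed between AAT and AGT
cbProrate = codonbias(seq, 'Ambiguous', 'prorate');

% Compare asparagine codon frequencies under the two settings
cbIgnore.Asn
cbProrate.Asn
```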

CodonFreq = codonbias(SeqNT, ...'Pie', PieValue, ...) controls the creation of a figure of 20 pie charts, one for each amino acid. Choices are true or false (default).

Genetic Code

Code Number    Code Name
1              Standard
2              Vertebrate Mitochondrial
3              Yeast Mitochondrial
4              Mold, Protozoan, Coelenterate Mitochondrial, and Mycoplasma/Spiroplasma
5              Invertebrate Mitochondrial
6              Ciliate, Dasycladacean, and Hexamita Nuclear
9              Echinoderm Mitochondrial
10             Euplotid Nuclear
11             Bacterial and Plant Plastid
12             Alternative Yeast Nuclear
13             Ascidian Mitochondrial
14             Flatworm Mitochondrial
15             Blepharisma Nuclear
16             Chlorophycean Mitochondrial
21             Trematode Mitochondrial
22             Scenedesmus Obliquus Mitochondrial
23             Thraustochytrium Mitochondrial

Examples

Import a nucleotide sequence from the GenBank® database into the MATLAB software. For example, retrieve the DNA sequence that codes for a human insulin receptor.

S = getgenbank('M10051');

Calculate the codon frequency for each amino acid coded for by the DNA sequence, and then plot the results.

cb = codonbias(S.Sequence,'Pie',true)

Get the codon frequency for the alanine (A) amino acid.

cb.Ala
ans = 

    Codon: {'GCA' 'GCC' 'GCG' 'GCT'}
     Freq: [0.1600 0.3867 0.2533 0.2000]

Version History

Introduced before R2006a