swalign

Locally align two sequences using the Smith-Waterman algorithm

Syntax

Score = swalign(Seq1, Seq2)
[Score, Alignment] = swalign(Seq1, Seq2)
[Score, Alignment, Start] = swalign(Seq1, Seq2)
... = swalign(Seq1,Seq2, ...'Alphabet', AlphabetValue)
... = swalign(Seq1,Seq2, ...'ScoringMatrix', ScoringMatrixValue, ...)
... = swalign(Seq1,Seq2, ...'Scale', ScaleValue, ...)
... = swalign(Seq1,Seq2, ...'GapOpen', GapOpenValue, ...)
... = swalign(Seq1,Seq2, ...'ExtendGap', ExtendGapValue, ...)
... = swalign(Seq1,Seq2, ...'Showscore', ShowscoreValue, ...)

Input Arguments

Seq1, Seq2

Amino acid or nucleotide sequences. Enter any of the following:

  • Character vector or string of letters representing amino acids or nucleotides, such as returned by int2aa or int2nt

  • Vector of integers representing amino acids or nucleotides, such as returned by aa2int or nt2int

  • Structure containing a Sequence field

Tip

For help with letter and integer representations of amino acids and nucleotides, see Amino Acid Lookup or Nucleotide Lookup.
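
As a rough illustration of the three accepted input forms, the sketch below passes the same sequences as character vectors, as integer vectors produced by aa2int, and as structures with a Sequence field. It assumes Bioinformatics Toolbox is installed. The variable names are illustrative only, and the three scores are expected to agree because the underlying sequences are identical.

```matlab
% Same amino acid sequences in the three accepted representations.
seqChar   = 'VSPAGMASGYD';                 % character vector
seqInt    = aa2int(seqChar);               % integer codes
seqStruct = struct('Sequence', seqChar);   % structure with a Sequence field

target = 'IPGKASYD';

scoreFromChar   = swalign(seqChar, target);
scoreFromInt    = swalign(seqInt, aa2int(target));
scoreFromStruct = swalign(seqStruct, struct('Sequence', target));
```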

AlphabetValue

Character vector or string specifying the type of sequence. Choices are 'AA' (default) or 'NT'.

ScoringMatrixValue

Either of the following:

  • Character vector or string specifying the scoring matrix to use for the local alignment. Choices for amino acid sequences are:

    • 'BLOSUM62'

    • 'BLOSUM30' increasing by 5 up to 'BLOSUM90'

    • 'BLOSUM100'

    • 'PAM10' increasing by 10 up to 'PAM500'

    • 'DAYHOFF'

    • 'GONNET'

    Default is:

    • 'BLOSUM50' — When AlphabetValue equals 'AA'

    • 'NUC44' — When AlphabetValue equals 'NT'

    Note

    The above scoring matrices, provided with the software, also include a structure containing a scale factor that converts the units of the output score to bits. You can also use the 'Scale' property to specify an additional scale factor to convert the output score from bits to another unit.

  • Matrix representing the scoring matrix to use for the local alignment, such as returned by the blosum, pam, dayhoff, gonnet, or nuc44 function.

    Note

    If you use a scoring matrix that you created yourself or that was returned by one of the above functions, the matrix does not include a scale factor. swalign returns the output score in the same units as the scoring matrix. You can use the 'Scale' property to specify a scale factor that converts the output score to another unit.

Note

If you need to compile swalign into a stand-alone application or software component using MATLAB® Compiler™, use a matrix instead of a character vector or string for ScoringMatrixValue.
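
The following sketch shows one way to pass a numeric scoring matrix rather than a name, which is the form the note above recommends for compiled applications. It assumes that the second output of blosum is a structure with a Scale field. Passing that value through 'Scale' is an assumption about how to recover a score in bits, not a documented recipe.

```matlab
seq1 = 'VSPAGMASGYD';
seq2 = 'IPGKASYD';

% The second output is assumed to be a structure that includes a Scale field.
[matrix, matrixInfo] = blosum(62);

% A numeric matrix carries no scale factor, so this score is in matrix units.
scoreRaw = swalign(seq1, seq2, 'ScoringMatrix', matrix);

% Supplying the matrix's own scale factor should rescale the score to bits.
scoreBits = swalign(seq1, seq2, 'ScoringMatrix', matrix, ...
                    'Scale', matrixInfo.Scale);
```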

ScaleValue

Positive value that specifies a scale factor that is applied to the output score.

For example, if the output score is initially determined in bits, and you enter log(2) for ScaleValue, then swalign returns Score in nats.

Default is 1, which does not change the units of the output score.

Note

If the 'ScoringMatrix' property also specifies a scale factor, then swalign uses it first to scale the output score, then applies the scale factor specified by ScaleValue to rescale the output score.

Tip

Before comparing alignment scores from multiple alignments, ensure the scores are in the same units. You can use the 'Scale' property to control the units of the output scores.
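
The two-stage scaling described above can be checked with plain arithmetic. In this sketch the raw score of 28 and the matrix scale factor of 1/3 are assumed values chosen for illustration; only the log(2) conversion from bits to nats comes from the text.

```matlab
rawScore    = 28;       % assumed raw score in scoring-matrix units
matrixScale = 1/3;      % assumed scale factor supplied with the matrix
scaleValue  = log(2);   % 'Scale' value that converts bits to nats

scoreBits = rawScore * matrixScale;   % first scaling: matrix units to bits
scoreNats = scoreBits * scaleValue;   % second scaling: bits to nats

fprintf('%.4f bits = %.4f nats\n', scoreBits, scoreNats);
```

With these assumed inputs the result is about 9.3333 bits, or 6.4694 nats.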

GapOpenValue

Positive value specifying the penalty for opening a gap in the alignment. Default is 8.

ExtendGapValue

Positive value specifying the penalty for extending a gap using the affine gap penalty scheme.

Note

If you specify this value, swalign uses the affine gap penalty scheme: it scores the first position of a gap using GapOpenValue and each subsequent position of that gap using ExtendGapValue. If you do not specify this value, swalign scores all gap positions equally, using the GapOpenValue penalty.
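
To make the difference between the two schemes concrete, the sketch below computes the penalty for a single gap spanning three positions. The ExtendGap value of 1 and the gap length are hypothetical, and the formulas are a reading of the note above rather than code taken from swalign.

```matlab
gapOpen   = 8;   % default GapOpen value
extendGap = 1;   % hypothetical ExtendGap value chosen for illustration
gapLength = 3;   % number of consecutive gap positions

% Without ExtendGap: every gap position costs gapOpen.
linearPenalty = gapLength * gapOpen;

% With ExtendGap: the first position costs gapOpen, the rest cost extendGap.
affinePenalty = gapOpen + (gapLength - 1) * extendGap;

fprintf('Linear penalty: %d, affine penalty: %d\n', linearPenalty, affinePenalty);
```

For these values the linear scheme charges 24 and the affine scheme charges 10, so long gaps are penalized much less once ExtendGap is set below GapOpen.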

ShowscoreValue

Controls the display of the scoring space and the winning path of the alignment. Choices are true or false (default).

Output Arguments

Score

Optimal local alignment score in bits.

Alignment

3-by-N character array showing the two sequences, Seq1 and Seq2, in the first and third rows, and symbols representing the optimal local alignment between them in the second row.

Start

2-by-1 vector of indices indicating the starting point in each sequence for the alignment.

Description

Score = swalign(Seq1, Seq2) returns the optimal local alignment score in bits. The scale factor used to calculate the score is provided by the scoring matrix.

[Score, Alignment] = swalign(Seq1, Seq2) returns a 3-by-N character array showing the two sequences, Seq1 and Seq2, in the first and third rows, and symbols representing the optimal local alignment between them in the second row. The symbol | indicates amino acids or nucleotides that match exactly. The symbol : indicates amino acids or nucleotides that are related as defined by the scoring matrix (nonmatches with a zero or positive scoring matrix value).

[Score, Alignment, Start] = swalign(Seq1, Seq2) returns a 2-by-1 vector of indices indicating the starting point in each sequence for the alignment.
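
One way to see what Start refers to is to recover the aligned portion of Seq1 directly from the original sequence. The sketch below assumes Bioinformatics Toolbox is installed and that gaps appear as '-' in the alignment rows, as in the examples on this page. The variable names are illustrative.

```matlab
seq1 = 'VSPAGMASGYD';
seq2 = 'IPGKASYD';

[score, alignment, start] = swalign(seq1, seq2);

% Residues of seq1 that take part in the alignment (gap characters removed).
aligned1 = alignment(1, alignment(1, :) ~= '-');

% The same residues, read from seq1 beginning at the reported start index.
segment1 = seq1(start(1) : start(1) + numel(aligned1) - 1);

disp(strcmp(aligned1, segment1))   % expected to display 1 (true)
```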

... = swalign(Seq1,Seq2, ...'PropertyName', PropertyValue, ...) calls swalign with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:

... = swalign(Seq1,Seq2, ...'Alphabet', AlphabetValue) specifies the type of sequences. Choices are 'AA' (default) or 'NT'.

... = swalign(Seq1,Seq2, ...'ScoringMatrix', ScoringMatrixValue, ...) specifies the scoring matrix to use for the local alignment. Default is:

  • 'BLOSUM50' — When AlphabetValue equals 'AA'

  • 'NUC44' — When AlphabetValue equals 'NT'

... = swalign(Seq1,Seq2, ...'Scale', ScaleValue, ...) specifies a scale factor that is applied to the output score, thereby controlling the units of the output score. Choices are any positive value.

... = swalign(Seq1,Seq2, ...'GapOpen', GapOpenValue, ...) specifies the penalty for opening a gap in the alignment. Choices are any positive value. Default is 8.

... = swalign(Seq1,Seq2, ...'ExtendGap', ExtendGapValue, ...) specifies the penalty for extending a gap using the affine gap penalty scheme. Choices are any positive value.

... = swalign(Seq1,Seq2, ...'Showscore', ShowscoreValue, ...) controls the display of the scoring space and winning path of the alignment. Choices are true or false (default).

The scoring space is a heat map displaying the best scores for all the partial alignments of two sequences. The color of each (n1,n2) coordinate in the scoring space represents the best score for the pairing of subsequences Seq1(s1:n1) and Seq2(s2:n2), where n1 is a position in Seq1, n2 is a position in Seq2, s1 is any position in Seq1 from 1 through n1, and s2 is any position in Seq2 from 1 through n2. The best score for a pairing of specific subsequences is determined by scoring all possible alignments of the subsequences, summing the match scores and gap penalties of each.

The winning path is represented by black dots in the scoring space, and it illustrates the pairing of positions in the optimal local alignment. The color of the last point (lower right) of the winning path represents the optimal local alignment score for the two sequences and is the Score output returned by swalign.

Note

The scoring space visually shows tandem repeats, small segments that potentially align, and partial alignments of domains from rearranged sequences.
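
The scoring space can be sketched in a few lines of plain MATLAB. The code below is a simplified illustration, not the implementation used by swalign: it assumes a toy scoring scheme (+2 for a match, -1 for a mismatch) and a linear gap penalty of 2, all of which are hypothetical values. Each cell holds the best score of any local alignment ending at that pair of positions, and the largest cell is the optimal local alignment score.

```matlab
% Toy inputs and scoring values, chosen only for illustration.
seq1 = 'GAWGHE';
seq2 = 'PAWHE';
matchScore    = 2;
mismatchScore = -1;
gapPenalty    = 2;

n1 = numel(seq1);
n2 = numel(seq2);
F  = zeros(n1 + 1, n2 + 1);   % first row and column represent empty prefixes

for i = 1:n1
    for j = 1:n2
        if seq1(i) == seq2(j)
            s = matchScore;
        else
            s = mismatchScore;
        end
        % Best of: start fresh (0), extend diagonally, or open a gap.
        F(i+1, j+1) = max([0, ...
                           F(i, j) + s, ...
                           F(i, j+1) - gapPenalty, ...
                           F(i+1, j) - gapPenalty]);
    end
end

% The optimal local score is the maximum over the whole scoring space.
[bestScore, idx]   = max(F(:));
[endRow, endCol]   = ind2sub(size(F), idx);
endPositions       = [endRow - 1, endCol - 1];   % positions in seq1 and seq2
```

For these toy inputs the maximum is 6, reached where the final E of both sequences is paired, which corresponds to aligning AWGHE with AW-HE. Displaying F with a function such as imagesc gives a rough equivalent of the heat map described above.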

Examples

  1. Locally align two amino acid sequences using the BLOSUM50 (default) scoring matrix and the default values for the GapOpen and ExtendGap properties. Return the optimal local alignment score in bits and the alignment character array.

    [Score, Alignment] = swalign('VSPAGMASGYD','IPGKASYD')
    
    Score =
    
         8.6667
    
    Alignment =
    
    PAGMASGYD
    | | || ||
    P-GKAS-YD
    
  2. Locally align two amino acid sequences specifying the PAM250 scoring matrix and a gap open penalty of 5.

    [Score, Alignment] = swalign('HEAGAWGHEE','PAWHEAE',...
                                 'ScoringMatrix', 'pam250',...
                                 'GapOpen',5)
    
    Score =
    
         8
    
    Alignment =
    
    GAWGHE
    :|| ||
    PAW-HE
    

  3. Locally align two amino acid sequences returning the Score in nat units (nats) by specifying a scale factor of log(2).

    [Score, Alignment] = swalign('HEAGAWGHEE','PAWHEAE','Scale',log(2))
                                 
    Score =
    
        6.4694
    
    Alignment =
    
    AWGHE
    || ||
    AW-HE
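
The examples above all use amino acid sequences. As a hedged sketch of the same call pattern for nucleotides, the code below sets 'Alphabet' to 'NT' and combines it with gap properties. The two sequences and the penalty values are made up for illustration, and no output is shown because it has not been verified here.

```matlab
% Made-up nucleotide sequences and hypothetical penalty values.
[score, alignment, start] = swalign('ACGTTGCAAC', 'CGTAGCAA', ...
                                    'Alphabet', 'NT', ...
                                    'GapOpen', 5, ...
                                    'ExtendGap', 1);
```

Because 'ScoringMatrix' is not specified, this call relies on the default matrix for nucleotide sequences, 'NUC44'.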

References

[1] Durbin, R., Eddy, S., Krogh, A., and Mitchison, G. (1998). Biological Sequence Analysis (Cambridge University Press).

[2] Smith, T., and Waterman, M. (1981). Identification of common molecular subsequences. Journal of Molecular Biology 147, 195–197.

Version History

Introduced before R2006a