Main Content

profalign

Align two profiles using Needleman-Wunsch global alignment

Syntax

Prof = profalign(Prof1, Prof2)
[Prof, H1, H2] = profalign(Prof1, Prof2)
profalign(..., 'ScoringMatrix', ScoringMatrixValue, ...)
profalign(..., 'GapOpen', {G1Value, G2Value}, ...)
profalign(..., 'ExtendGap', {E1Value, E2Value}, ...)
profalign(..., 'ExistingGapAdjust', ExistingGapAdjustValue, ...)
profalign(..., 'TerminalGapAdjust', TerminalGapAdjustValue, ...)
profalign(..., 'ShowScore', ShowScoreValue, ...)

Description

Prof = profalign(Prof1, Prof2) returns a new profile (Prof) for the optimal global alignment of two profiles (Prof1, Prof2). The profiles (Prof1, Prof2) are numeric arrays of size [(4 or 5 or 20 or 21) x Profile Length] with counts or weighted profiles. Weighted profiles are used to down-weight similar sequences and up-weight divergent sequences. The output profile is a numeric matrix of size [(5 or 21) x New Profile Length] where the last row represents gaps. Original gaps in the input profiles are preserved. The output profile is the result of adding the aligned columns of the input profiles.

[Prof, H1, H2] = profalign(Prof1, Prof2) returns pointers that indicate how to rearrange the columns of the original profiles into the new profile.

profalign(..., 'PropertyName', PropertyValue, ...) calls profalign with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:

profalign(..., 'ScoringMatrix', ScoringMatrixValue, ...) defines the scoring matrix to be used for the alignment.

ScoringMatrixValue can be either of the following:

  • Character vector or string specifying the scoring matrix to use for the alignment. Choices for amino acid sequences are:

    • 'BLOSUM62'

    • 'BLOSUM30' increasing by 5 up to 'BLOSUM90'

    • 'BLOSUM100'

    • 'PAM10' increasing by 10 up to 'PAM500'

    • 'DAYHOFF'

    • 'GONNET'

    Default is:

    • 'BLOSUM50' — When AlphabetValue equals 'AA'

    • 'NUC44' — When AlphabetValue equals 'NT'

    Note

    The above scoring matrices, provided with the software, also include a structure containing a scale factor that converts the units of the output score to bits. You can also use the 'Scale' property to specify an additional scale factor to convert the output score from bits to another unit.

  • Matrix representing the scoring matrix to use for the alignment, such as returned by the blosum, pam, dayhoff, gonnet, or nuc44 function.

    Note

    If you use a scoring matrix that you created or was created by one of the above functions, the matrix does not include a scale factor. The output score will be returned in the same units as the scoring matrix.

Note

If you need to compile profalign into a stand-alone application or software component using MATLAB® Compiler™, use a matrix instead of a character vector or string for ScoringMatrixValue.

profalign(..., 'GapOpen', {G1Value, G2Value}, ...) sets the penalties for opening a gap in the first and second profiles respectively. G1Value and G2Value can be either scalars or vectors. When using a vector, the number of elements is one more than the length of the input profile. Every element indicates the position specific penalty for opening a gap between two consecutive symbols in the sequence. The first and the last elements are the gap penalties used at the ends of the sequence. The default gap open penalties are {10,10}.

profalign(..., 'ExtendGap', {E1Value, E2Value}, ...) sets the penalties for extending a gap in the first and second profile respectively. E1Value and E2Value can be either scalars or vectors. When using a vector, the number of elements is one more than the length of the input profile. Every element indicates the position specific penalty for extending a gap between two consecutive symbols in the sequence. The first and the last elements are the gap penalties used at the ends of the sequence. If ExtendGap is not specified, then extensions to gaps are scored with the same value as GapOpen.

profalign(..., 'ExistingGapAdjust', ExistingGapAdjustValue, ...), if ExistingGapAdjustValue is false, turns off the automatic adjustment based on existing gaps of the position-specific penalties for opening a gap. When ExistingGapAdjustValue is true (default), for every profile position, profalign proportionally lowers the penalty for opening a gap toward the penalty of extending a gap based on the proportion of gaps found in the contiguous symbols and on the weight of the input profile.

profalign(..., 'TerminalGapAdjust', TerminalGapAdjustValue, ...), when TerminalGapAdjustValue is true, adjusts the penalty for opening a gap at the ends of the sequence to be equal to the penalty for extending a gap. Default is false.

profalign(..., 'ShowScore', ShowScoreValue, ...), when ShowScoreValue is true, displays the scoring space and the winning path.

Examples

  1. Read in sequences and create profiles.

    ma1 = ['RGTANCDMQDA';'RGTAHCDMQDA';'RRRAPCDL-DA'];
    ma2 = ['RGTHCDLADAT';'RGTACDMADAA'];
    p1  = seqprofile(ma1,'gaps','all','counts',true);
    p2  = seqprofile(ma2,'counts',true);
  2. Merge two profiles into a single one by aligning them.

    p = profalign(p1,p2);
    seqlogo(p)
  3. Use the output pointers to generate the multiple alignment.

    [p, h1, h2] = profalign(p1,p2);
    ma = repmat('-',5,12);
    ma(1:3,h1) = ma1;
    ma(4:5,h2) = ma2;
    disp(ma)
  4. Increase the gap penalty before cysteine in the second profile.

    gapVec = 10 + [p2(aa2int('C'),:) 0] * 10
    p3 = profalign(p1,p2,'gapopen',{10,gapVec});
    seqlogo(p3)
  5. Add a new sequence to a profile without inserting new gaps into the profile.

    gapVec = [0 inf(1,11) 0];
    p4 = profalign(p3,seqprofile('PLHFMSVLWDVQQWP'),...
                   'gapopen',{gapVec,10});
    seqlogo(p4)

Version History

Introduced before R2006a