Bowtie2AlignOptions

Options to map reads to reference sequence

Description

A Bowtie2AlignOptions object contains options to run the bowtie2 function, which aligns reads to a reference sequence.

Creation

Description

alignOptions = Bowtie2AlignOptions creates a Bowtie2AlignOptions object with default property values.

Bowtie2AlignOptions requires the Bowtie 2 Support Package for Bioinformatics Toolbox™. If this support package is not installed, then the function provides a download link. For details, see Bioinformatics Toolbox Software Support Packages.

alignOptions = Bowtie2AlignOptions(Name,Value) sets properties using one or more name-value pair arguments. Enclose each property name in quotes. For example, alignOptions = Bowtie2AlignOptions('Trim5',10) specifies to trim 10 residues from the 5' end.
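
As a minimal sketch of the name-value syntax, the following sets two trimming properties at construction and then changes one of them afterward. It assumes the Bowtie 2 Support Package is installed; the specific values are illustrative only.

```matlab
% Requires the Bowtie 2 Support Package for Bioinformatics Toolbox.
% Trim 10 residues from the 5' end and 4 residues from the 3' end.
alignOptions = Bowtie2AlignOptions('Trim5',10,'Trim3',4);

% Properties remain editable after construction.
alignOptions.Trim3 = 6;
disp(alignOptions.Trim5)
```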

alignOptions = Bowtie2AlignOptions(S) specifies optional parameters in a character vector S.

Input Arguments

Alignment parameters, specified as a character vector. S must be in the Bowtie 2 option syntax (prefixed by one or two dashes) [1].
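
For illustration, a character vector in Bowtie 2 option syntax might look like the following. The options --trim5 and --trim3 are Bowtie 2 command-line options; how the object stores the parsed values internally is not described here, so this sketch only constructs the object.

```matlab
% Requires the Bowtie 2 Support Package for Bioinformatics Toolbox.
% Options are written exactly as on the Bowtie 2 command line.
S = '--trim5 10 --trim3 4';
alignOptions = Bowtie2AlignOptions(S);
```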

Properties

Since R2023b

Flag to allow unpaired reads to be aligned to the forward (Watson) reference strand, specified as a numeric or logical 1 (true) or 0 (false). Set this option to false to prevent bowtie2 from aligning reads to the forward reference strand.

Data Types: double | logical

Since R2023b

Flag to allow unpaired reads to be aligned to the reverse (Crick) reference strand, specified as a numeric or logical 1 (true) or 0 (false). Set this option to false to prevent bowtie2 from aligning reads to the reverse reference strand.

Data Types: double | logical

Since R2023b

Base name of files where aligned paired reads are saved, specified as a character vector or string scalar. Paired reads that align at least one time are saved to the files. bowtie2 creates two files, one for each mate of the read pair. The files have the same format as the input data.

The function appends ".1" or ".2" to the base file name to specify each read pair file. If the base file name includes the % symbol, bowtie2 inserts 1 or 2 at this % position instead of appending ".1" or ".2". Use ReadSupplementFileCompression to compress these supplement files.

By default, bowtie2 does not create these supplement files.

Data Types: char | string
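
The naming rule above can be sketched in plain MATLAB. This is only an illustration of the rule as described, not the code bowtie2 runs; both base names are hypothetical and deliberately have no file extension.

```matlab
% Hypothetical base names, chosen only for illustration.
baseNames = ["aligned_pairs", "aligned_mate%_pairs"];

for base = baseNames
    if contains(base, "%")
        % Replace the % placeholder with the mate number.
        files = [strrep(base, "%", "1"), strrep(base, "%", "2")];
    else
        % Otherwise append ".1" and ".2" to the base name.
        files = [base + ".1", base + ".2"];
    end
    disp(files)
end
```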

Since R2023b

Name of a file where aligned unpaired reads are saved, specified as a character vector or string scalar. Unpaired reads that align at least one time are saved to the file. The file has the same format as the input data. Use ReadSupplementFileCompression to compress these supplement files.

By default, bowtie2 does not create the file.

Data Types: char | string

Flag to allow dovetail configurations of input reads, specified as a numeric or logical 1 (true) or 0 (false). This property specifies whether the alignment of one mate can extend past the beginning of the alignment of the other mate and still be considered concordant.

This property applies to paired-end reads only.

Data Types: double | logical

Penalty for positions with ambiguous characters on the read sequence, reference sequence, or both, specified as a nonnegative integer.

Data Types: double

Since R2023b

Flag to append FASTQ or FASTA comments to the output SAM file, specified as a numeric or logical 1 (true) or 0 (false). A comment is any text after the first space in the read name.

Data Types: double | logical

Since R2023b

Flag to align the paired-end BAM reads, specified as a numeric or logical 1 (true) or 0 (false). This flag is functional only if you also set ReadFormat="BAM".

By default, bowtie2 attempts to align unpaired BAM reads only. Set the value to true to align paired-end reads instead.

Data Types: double | logical

Since R2023b

Flag to preserve tags from the input BAM file by appending them to the SAM output, specified as a numeric or logical 1 (true) or 0 (false). Set the value to true to add the tags to the end of the corresponding SAM output file.

Data Types: double | logical

Encoding format of the base quality in the input files, specified as one of the following: 'Phred33', 'Phred64', or 'Solexa'.

Data Types: char | string

Flag to allow one mate alignment to contain the alignment of the other mate and to be considered concordant, specified as a numeric or logical 1 (true) or 0 (false).

This property applies to paired-end reads only.

Data Types: double | logical

Flag to include discordant alignments, specified as a numeric or logical 1 (true) or 0 (false). A discordant alignment is an alignment where both mates align uniquely, but not in a way that satisfies the paired-end constraints.

Data Types: double | logical

Flag to exclude mixed alignments, specified as a numeric or logical 1 (true) or 0 (false). A mixed alignment consists of mate reads that are not concordant or discordant, but align individually.

This property applies to paired-end reads only.

Data Types: double | logical

Flag to allow the alignment of one mate to overlap with the alignment of the other mate and to be considered concordant, specified as a numeric or logical 1 (true) or 0 (false).

Data Types: double | logical

Since R2023b

Flag to exclude SAM headers, specified as a numeric or logical 1 (true) or 0 (false). A SAM header starts with the @ symbol.

Data Types: double | logical

Since R2023b

Flag to exclude SAM reference sequence header lines in the output SAM file, specified as a numeric or logical 1 (true) or 0 (false). A reference sequence header line starts with @SQ.

Data Types: double | logical

Flag to exclude reads that failed to align, specified as a numeric or logical 1 (true) or 0 (false).

Data Types: double | logical

Additional options not included in the object properties, specified as a character vector. The character vector must be in the Bowtie 2 option syntax (prefixed by one or two dashes). The default value is an empty character vector ''.

Example: 'ExtraBowtie2Command','--version'

Data Types: char | string

Since R2023b

K-mer length and step size to use when you set ReadFormat="FASTAKMer", specified as a two-element vector of positive integers.

Data Types: double

Since R2023b

Flag to filter reads with a nonzero QSEQ filter field, specified as a numeric or logical 1 (true) or 0 (false). This flag is functional only if you also set ReadFormat="QSeq".

Data Types: double | logical

Flag to ignore the actual read position quality when a mismatch occurs, specified as a numeric or logical 1 (true) or 0 (false). When this property is true, the quality value at the mismatched position is treated as the highest possible value, regardless of the actual value.

Data Types: double | logical

Since R2023b

Flag to consider soft-clipped bases as unmapped when calculating TLEN in the output SAM file, specified as a numeric or logical 1 (true) or 0 (false). This flag is functional only if you also set Mode="Local". TLEN stands for signed observed template length.

Data Types: double | logical

Since R2023b

Flag to specify quality values in the input reads as space-separated integers rather than ASCII characters, specified as a numeric or logical 1 (true) or 0 (false).

Data Types: double | logical

Reward added to the alignment score when a position in the read matches a position in the reference, specified as a nonnegative integer.

Data Types: double

Since R2023b

Orientation of mate pairs for paired-end alignment, specified as one of the following:

  • "ForwardReverse" — Aligned pairs are derived from a forward-oriented mate upstream of a reverse-oriented complement mate.

  • "ReverseForward" — Aligned pairs are derived from a reverse-oriented complement mate upstream of a forward-oriented mate.

  • "ForwardForward" — Aligned pairs are derived from a forward-oriented mate upstream of a forward-oriented mate.

Data Types: char | string

Function governing the maximum number of ambiguous characters allowed in a read, specified as a character vector or string scalar.

The function has the format 'f,B,A', where f is a function type, B is a constant term, and A is a coefficient. Available function types are:

  • 'C'– Constant

  • 'L'– Linear

  • 'S'– Square root

  • 'G'– Natural log

The resulting function is H(x) = B + A * f(x), where x is the read length.

The default function is 'L,0,0.15', that is, H(x) = 0 + 0.15 * x.

Example: 'MaxAmbiguousFunction','L,-0.4,-0.6'

Data Types: char | string
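
To make the 'f,B,A' format concrete, the following sketch parses a specification and evaluates H(x) for one read length. It is an illustration in plain MATLAB, not the toolbox implementation. The read length is hypothetical, the 'C' type is left out, and whether bowtie2 rounds the result is not stated here.

```matlab
% Parse a function specification of the form 'f,B,A'.
spec  = 'L,0,0.15';              % default from the text above
parts = strsplit(spec, ',');
B = str2double(parts{2});        % constant term
A = str2double(parts{3});        % coefficient

% Map the function type to f(x). 'C' is omitted from this sketch.
switch parts{1}
    case 'L', f = @(x) x;
    case 'S', f = @(x) sqrt(x);
    case 'G', f = @(x) log(x);
    otherwise, error('Unsupported function type in this sketch: %s', parts{1});
end

readLength = 100;                % hypothetical read length
H = B + A * f(readLength);       % 0 + 0.15 * 100 = 15
```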

Since R2023b

Maximum fragment length for the paired-end alignment, specified as a positive integer.

The larger the difference between MaxFragmentLength and MinFragmentLength is, the slower bowtie2 runs.

This option does not take trimming into account. That is, if you specify trimming options, such as Trim3 or Trim5, MaxFragmentLength is applied to the untrimmed mates.

Data Types: double

Flag to use memory mapping (instead of file I/O) when loading the index, specified as a numeric or logical 1 (true) or 0 (false). Memory mapping allows many concurrent processes to share the memory image of the index, resulting in a more efficient parallelization of the task.

Data Types: double | logical

Since R2023b

Name of the metrics file, specified as a character vector or string scalar. This file contains performance metrics for the alignment generated by bowtie2. By default, bowtie2 does not generate a metrics file.

Data Types: char | string

Since R2023b

Time interval in seconds for writing to the metrics file, specified as a positive integer. This option is functional only if you also specify MetricsFile. If so, by default, bowtie2 writes a new metrics record every second.

Data Types: double

Since R2023b

Minimum fragment length for the paired-end alignment, specified as a nonnegative integer.

The larger the difference between MaxFragmentLength and MinFragmentLength is, the slower bowtie2 runs.

This option does not take trimming into account. That is, if you specify trimming options, such as Trim3 or Trim5, MinFragmentLength is applied to the untrimmed mates.

Data Types: double

Function governing the minimum score threshold of an alignment, specified as a character vector or string scalar.

The function has the format 'f,B,A', where f is a function type, B is a constant term, and A is a coefficient. Available function types are:

  • 'C'– Constant

  • 'L'– Linear

  • 'S'– Square root

  • 'G'– Natural log

The resulting function is H(x) = B + A * f(x), where x is the read length.

For the 'EndToEnd' alignment mode, the default function is 'L,-0.6,-0.6'. For the 'Local' mode, the default function is 'G,20,8'.

Example: 'MinScoreFunction','L,-0.4,-0.6'

Data Types: char | string
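
As a worked example, the two default functions can be evaluated directly for a hypothetical read length of 100. This is plain arithmetic on the formula above, not a call into bowtie2.

```matlab
readLength = 100;                                  % hypothetical read length

% 'L,-0.6,-0.6': H(x) = -0.6 + (-0.6) * x
minScoreEndToEnd = -0.6 + (-0.6) * readLength;     % -60.6

% 'G,20,8': H(x) = 20 + 8 * ln(x)
minScoreLocal = 20 + 8 * log(readLength);          % about 56.84
```

The sign difference reflects the two defaults themselves: the 'EndToEnd' default yields a negative threshold for this read length, and the 'Local' default yields a positive one.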

Maximum and minimum values to compute the mismatch penalty during alignment, specified as a two-element vector. The first element is the maximum value and the second element is the minimum value.

A number less than or equal to the maximum value, and greater than or equal to the minimum value is subtracted from the alignment score for each position where a read character aligns to a reference character, the characters do not match, and neither is an N character.

Example: 'MismatchPenalty',[5 3]

Data Types: double
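
The Bowtie 2 manual describes the subtracted value as scaling with the Phred quality Q of the mismatched base: MN + floor((MX - MN) * min(Q, 40) / 40). The sketch below evaluates that rule in plain MATLAB. Treat the formula as an assumption carried over from the Bowtie 2 manual rather than something stated on this page; the quality values are hypothetical.

```matlab
MismatchPenalty = [5 3];         % [maximum minimum], as in the example above
MX = MismatchPenalty(1);
MN = MismatchPenalty(2);

Q = [0 10 20 30 40];             % hypothetical Phred quality values

% Assumed rule from the Bowtie 2 manual: higher quality, higher penalty.
penalty = MN + floor((MX - MN) .* (min(Q, 40) ./ 40));
disp(penalty)
```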

Alignment mode, specified as 'EndToEnd' or 'Local'.

In the 'Local' mode, only part of the read must align to the reference, and some residues can be omitted (soft-clipped) to achieve the best alignment score. In the 'EndToEnd' mode, the entire read must align without any soft-clipping.

Data Types: char | string

Flag to reinitialize the pseudo-random generator for each read using the current time, specified as a numeric or logical 1 (true) or 0 (false). If true, the alignments reported for two identical reads can be different. The default value is false, that is, the pseudo-random generator is reinitialized using a seed derived from read information and the seed number.

Data Types: double | logical

Number of positions at the beginning or end of each read where gaps are not allowed, specified as a nonnegative integer.

Data Types: double

Maximum number of valid alignments to report before terminating the search, specified as a positive integer, 'Best', or 'All'. If you specify a positive integer N, the function searches for up to N distinct, valid alignments for each read. 'Best' reports the best alignment for each read. 'All' reports all the valid alignments for each read sorted by alignment scores.

The alignment score for a paired-end alignment equals the sum of the alignment scores of individual mates.

Data Types: double | char | string

Maximum number of reseeding attempts with repetitive seeds, specified as a nonnegative integer. During reseeding, the function chooses a new set of seeds at different offsets to find more alignments.

Data Types: double

Maximum number of consecutive seed extension attempts before getting a new seed, specified as a nonnegative integer. A seed extension fails if it does not yield an alignment with the best (or second-best) score.

Data Types: double

Number of allowed mismatches in a seed alignment during the multiseed alignment, specified as 0 or 1.

Data Types: double

Number of parallel threads to perform the alignment, specified as a positive integer. Threads run on separate processors or cores. Increasing the number of threads provides a significant increase in speed (close to linear) but also increases the memory footprint.

Data Types: double

Offrate to use when reading the index to reduce the memory footprint, specified as a positive integer. The offrate must be greater than the offrate used to build the index.

Data Types: double

Since R2023b

Flag to omit SEQ and QUAL fields, specified as a numeric or logical 1 (true) or 0 (false). When this option is true, bowtie2 prints an asterisk "*" for these fields in the output SAM file.

Data Types: double | logical

Position in the reference sequence where the alignment for each sequence begins, specified as a nonnegative integer.

Data Types: double

Since R2023b

File format for the input reads, specified as one of the following strings.

  • "" — Uses the extensions of the input files to determine the file format. All the input files must have the same file extension.

  • "FASTQ" — FASTQ file format.

  • "FASTA" — FASTA file format.

  • "FASTAKMer" — FASTA file format, with k-mers extracted from the input files and aligned. You must also specify FASTAKMerParameters, which defines the k-mer length and step size.

  • "Interleaved" — Interleaved FASTQ files, where the first two records represent a mate pair.

  • "BAM" — Sorted and unaligned BAM files.

  • "RawSequences" — Input files contain a single sequence per line.

  • "QSeq" — QSEQ file format.

  • "Tab5" — TAB5 file format, where each read or pair is on a single line. An unpaired read line is [name]\t[seq]\t[qual]\n. A paired-end read line is [name]\t[seq1]\t[qual1]\t[seq2]\t[qual2]\n. An input file can contain a mix of unpaired and paired-end reads, and the function can distinguish and handle both read types.

  • "Tab6" — TAB6 file format, where an unpaired read line is [name]\t[seq]\t[qual]\n and a paired read line is [name1]\t[seq1]\t[qual1]\t[name2]\t[seq2]\t[qual2]\n.

Data Types: char | string

Gap costs for opening and extending a gap on the read, specified as a two-element vector of nonnegative integers. The first element is the cost of opening a gap, and the second element is the cost of extending a gap. Given the cost vector [GO GE], a read gap of length N is assigned a penalty of GO + N * GE.

Example: 'ReadGapCosts',[4 2]

Data Types: double
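
As a short worked example, the penalty GO + N * GE can be computed for a few gap lengths. The gap lengths are hypothetical; the cost vector is the one from the example above.

```matlab
ReadGapCosts = [4 2];            % [GO GE], as in the example above
GO = ReadGapCosts(1);            % cost of opening a gap
GE = ReadGapCosts(2);            % cost of extending a gap

N = 1:4;                         % hypothetical gap lengths
penalty = GO + N .* GE;          % [6 8 10 12]
```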

Read group information to add as a field on the @RG header line in the output SAM report, specified as a character vector or string. This property applies only if you specify 'ReadGroupID'.

Data Types: char | string

Read group ID to add on the @RG header line in the output SAM report, specified as a character vector or string. If you specify any read group ID, the function prints the @RG header line with the tag ID: followed by the specified group ID.

Data Types: char | string

Since R2023b

Compression type to use for the supplement files, specified as "None", "gz", "bz2", or "lz4". Use the following options to specify supplement files: AlignedPairedReadSupplementFile, AlignedUnpairedReadSupplementFile, UnalignedPairedReadSupplementFile, UnalignedUnpairedReadSupplementFile.

Data Types: char | string

Gap costs for opening and extending a gap on the reference, specified as a two-element vector of nonnegative integers. The first element is the cost of opening a gap, and the second element is the cost of extending a gap. Given the cost vector [GO GE], a reference gap of length N is assigned a penalty of GO + N * GE.

Example: 'RefGapCosts',[4 2]

Data Types: double

Flag to reorder SAM records to maintain the same order as in the input files, specified as a numeric or logical 1 (true) or 0 (false). This property applies only when the number of parallel threads is greater than one. When you use one thread, the order of the records in the output is the same as the order of the input.

Data Types: double | logical

Number to set the seed in the pseudo-random number generator, specified as a nonnegative integer.

Example: 'Seed',3

Data Types: double

Function governing the distance between seed substrings during the multiseed alignment, specified as a character vector or string scalar.

The function has the format 'f,B,A', where f is a function type, B is a constant term, and A is a coefficient. Available function types are:

  • 'C'– Constant

  • 'L'– Linear

  • 'S'– Square root

  • 'G'– Natural log

The resulting function is H(x) = B + A * f(x), where x is the read length.

For the 'EndToEnd' alignment mode, the default function is 'S,1,1.15'. For the 'Local' mode, the default function is 'S,1,0.75'.

Example: 'SeedIntervalFunction','S,2,2.15'

Data Types: char | string

Seed substring length to align during the multiseed alignment, specified as a positive integer.

Data Types: double

Number of reads to ignore from the beginning of the input files, specified as a nonnegative integer.

Data Types: double

Number of residues to trim from the 3' end of each read before aligning, specified as a nonnegative integer.

Data Types: double

Number of residues to trim from the 5' end of each read before aligning, specified as a nonnegative integer.

Data Types: double

Since R2023b

Threshold to trim reads exceeding a given number of bases, specified as a nonnegative integer or two-element array. By default, no reads are trimmed.

If the value is a nonnegative integer N, reads that contain more than N bases are trimmed from the 3' end.

If the value is a two-element array [M,N], the first number M must be either 3 or 5, which indicates either the 3' or 5' end to trim from. The second number specifies the maximum read length and any reads containing more bases than N are trimmed.

Data Types: double
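
The effect of the two forms can be sketched on a single sequence in plain MATLAB. This only mimics the trimming rule described above on a hypothetical read; it does not call bowtie2.

```matlab
read = 'AACCGGTTACGT';           % hypothetical 12-base read, written 5' to 3'
N = 8;                           % maximum read length

% Scalar N, or [3 N]: drop the excess bases from the 3' end.
trimmed3 = read(1:min(N, numel(read)));              % 'AACCGGTT'

% [5 N]: drop the excess bases from the 5' end.
trimmed5 = read(max(1, numel(read) - N + 1):end);    % 'GGTTACGT'
```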

Since R2023b

Flag to truncate read names, specified as a numeric or logical 1 (true) or 0 (false). By default, bowtie2 truncates the read name after the first white space.

Data Types: double | logical

Since R2023b

Base name of files where paired reads that are not aligned are saved, specified as a character vector or string scalar. bowtie2 creates two files, one for each read pair. The files have the same format as the input data.

The function appends ".1" or ".2" to the base file name to specify each read pair file. If the base file name includes the % symbol, bowtie2 inserts 1 or 2 at this % position instead of appending ".1" or ".2". Use ReadSupplementFileCompression to compress these supplement files.

By default, bowtie2 does not create these supplement files.

Data Types: char | string

Since R2023b

Name of a file where unpaired reads that are not aligned are saved, specified as a character vector or string scalar. The file has the same format as the input data. Use ReadSupplementFileCompression to compress these supplement files.

By default, bowtie2 does not create the file.

Data Types: char | string

Number of reads to consider from the beginning of input files, specified as a positive integer. The default value is Inf, that is, all reads are considered.

Data Types: double

Since R2023b

Flag to indicate the prioritization of 1-mismatch alignments over the multiseed alignment, specified as a numeric or logical 1 (true) or 0 (false). By default, bowtie2 attempts to find the exact matches or matches with a single mismatch before trying a multiseed alignment.

Data Types: double | logical

Object Functions

getBowtie2Command    Translate object properties to Bowtie 2 options
getBowtie2Table      Retrieve table with object properties and equivalent Bowtie 2 options
preset               Set combination of alignment options
run                  Map sequence reads to reference sequence using Bowtie 2
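
A possible way to call the first two object functions is sketched below. The function names come from the list above, but the calling form (the object as the only input) and the exact shape of the outputs are assumptions; check each function's reference page before relying on them.

```matlab
% Requires the Bowtie 2 Support Package for Bioinformatics Toolbox.
alignOpt = Bowtie2AlignOptions('Trim3',4);

% Assumed calling form: the options object is the only input.
cmd = getBowtie2Command(alignOpt);   % options in Bowtie 2 syntax
tbl = getBowtie2Table(alignOpt);     % properties and equivalent Bowtie 2 options
```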

Examples

Build a set of index files for the Drosophila genome. An error message appears if you do not have the Bowtie 2 Support Package for Bioinformatics Toolbox installed when you run the function. Click the provided link to download the package from the Add-on menu.

For this example, the reference sequence Dmel_chr4.fa is already provided with the toolbox.

status = bowtie2build('Dmel_chr4.fa', 'Dmel_chr4_index');

If the index build is successful, the function returns 0 and creates the index files (*.bt2) in the current folder. The files have the prefix 'Dmel_chr4_index'.

Sometimes the index files exist, and you want to know the reference sequence used to build the index. In this case, use the bowtie2inspect function to get more information about the reference.

bowtie2inspect('Dmel_chr4', 'Dmel_chr4_retrieved.fa');

By default, the output file Dmel_chr4_retrieved.fa contains the sequence of the reference. You can also get summary information about the reference names and lengths instead of the actual sequence. For details on the available options, see Bowtie2InspectOptions.

Once the index is ready, map the read sequences to the reference using the bowtie2 function. The paired-end read files (SRR6008575_10k_1.fq and SRR6008575_10k_2.fq) are already provided with the toolbox.

bowtie2('Dmel_chr4','SRR6008575_10k_1.fq','SRR6008575_10k_2.fq','SRR6008575_10k_chr4.sam');

The output is a SAM-formatted file that contains the mapping results.

You can specify different alignment options by passing in a Bowtie 2 syntax string or using a Bowtie2AlignOptions object.

Suppose you want to trim some residues from the 3' end before aligning. First, create a Bowtie2AlignOptions object.

alignOpt = Bowtie2AlignOptions;

Trim four residues from the 3' end before aligning.

alignOpt.Trim3 = 4;

Map reads to the reference using the specified alignment option.

flag = bowtie2('Dmel_chr4','SRR6008575_10k_1.fq','SRR6008575_10k_2.fq','SRR6008575_10k_chr4_trimmed.sam',alignOpt);
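
The same trimming can also be requested with a Bowtie 2 syntax string in place of the object. The following is a sketch under the assumption that the string is accepted in the same argument position as the options object; the output file name is hypothetical.

```matlab
% Requires the Bowtie 2 Support Package for Bioinformatics Toolbox.
% '--trim3 4' is the Bowtie 2 command-line form of trimming four
% residues from the 3' end.
flag2 = bowtie2('Dmel_chr4','SRR6008575_10k_1.fq','SRR6008575_10k_2.fq', ...
    'SRR6008575_10k_chr4_trimmed2.sam','--trim3 4');
```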

References

[1] Langmead, B., and S. Salzberg. "Fast gapped-read alignment with Bowtie 2." Nature Methods. 9, 2012, 357–359.

Version History

Introduced in R2018a