bioinfo.blastplus.TBLASTNOptions

Specify options for tblastn query program

Since R2024a

Description

A bioinfo.blastplus.TBLASTNOptions object contains options for the tblastn query program [1][2], which searches a protein query against a translated nucleotide database.

Creation

Description

optionsObj = bioinfo.blastplus.TBLASTNOptions creates a TBLASTNOptions object with default property values. Alternatively, you can use the blastplusoptions function to create the object.

optionsObj = bioinfo.blastplus.TBLASTNOptions(Name=Value) sets the object properties using one or more name-value arguments. Name is the property name and Value is the property value. For example, set ExpectValue=0.01 to use the expect value of 0.01.

optionsObj = bioinfo.blastplus.TBLASTNOptions(S) specifies optional parameters using a string scalar or character vector S. S must be in the native syntax (prefixed by one dash). For example, optionsObj = bioinfo.blastplus.TBLASTNOptions("-dbsize 50") sets the effective database size to 50.
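
The creation syntaxes above can be sketched side by side. This is a minimal illustration, not a definitive reference: the call blastplusoptions("tblastn") is an assumption made by analogy with blastplusoptions("blastn") in the Examples section, and the string syntax assumes that the native BLAST+ flag -evalue corresponds to the ExpectValue property.

```matlab
% Sketch: several ways to create a tblastn options object.
% Requires Bioinformatics Toolbox with BLAST+ support.

% 1. Default object, then assign a property.
optsA = bioinfo.blastplus.TBLASTNOptions;
optsA.ExpectValue = 0.01;

% 2. Name-value argument at construction.
optsB = bioinfo.blastplus.TBLASTNOptions(ExpectValue=0.01);

% 3. Native BLAST+ syntax (one dash). -evalue is the BLAST+ flag for the
%    expect value; its mapping to ExpectValue is assumed here.
optsC = bioinfo.blastplus.TBLASTNOptions("-evalue 0.01");

% 4. Assumption: blastplusoptions accepts the program name "tblastn".
optsD = blastplusoptions("tblastn");
```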

Properties

Effective database size, specified as a nonnegative integer. The default value is a bioinfo.blastplus.Default object, which means that the corresponding BLAST task or query program sets the default value.

Data Types: double

Expect value for saving hits, specified as a positive scalar.

This value describes the expected number of hits you might get when searching a database. The lower the expect value, the more significant the match is. You could use this value to create a significance threshold for reporting results. For details, see this FAQ page.

Data Types: double

Additional commands, specified as a character vector or string scalar.

The commands must be in the native syntax (prefixed by one dash). Use this option to apply undocumented flags and flags without corresponding MATLAB® properties.

Example: "-lcase_masking"

Data Types: char | string
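
As a rough sketch of passing a native flag through the options object: the property name ExtraCommand is an assumption here, taken from the MakeDatabaseOptions display in the Examples section, because this page does not show the property name for TBLASTNOptions.

```matlab
% Sketch: forward a native BLAST+ flag that has no dedicated property.
% Assumption: the property is named ExtraCommand, as shown for
% MakeDatabaseOptions in the Examples section.
opts = bioinfo.blastplus.TBLASTNOptions;
opts.ExtraCommand = "-lcase_masking";

% Inspect the resulting command in the original syntax.
cmd = getCommand(opts);
disp(cmd)
```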

Cost to extend a gap, specified as a nonnegative integer. The default value is a bioinfo.blastplus.Default object, which means that the corresponding BLAST task or query program sets the default value.

Data Types: double

Cost to open a gap, specified as a nonnegative integer. The default value is a bioinfo.blastplus.Default object, which means that the corresponding BLAST task or query program sets the default value.

Data Types: double

Flag to perform a gapped alignment, specified as a numeric or logical 1 (true) or 0 (false). To perform an ungapped alignment, set GappedAlignment=false.

Data Types: double | logical

Flag to include all object properties with their corresponding default values when converting to the original option syntax, specified as a numeric or logical 1 (true) or 0 (false). You can convert properties to the original syntax prefixed by a dash (such as -dbtype nucl) by using the getCommand function.

When IncludeAll=false and you call getCommand(optionsObject), the software converts only the specified properties. If the value is true, getCommand converts all available properties, using default values for unspecified properties, to the original syntax.

Note

If you set IncludeAll to true, the software translates all available properties, with default values for unspecified properties. The only exception is that when the default value of a property is NaN, Inf, [], '', or "", then the software does not translate the corresponding property.

Example: true

Data Types: logical
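
The effect of IncludeAll on getCommand can be sketched as follows. The exact text returned by getCommand is not shown because it depends on the installed BLAST+ version and its defaults.

```matlab
% Sketch: compare getCommand output with IncludeAll off and on.
opts = bioinfo.blastplus.TBLASTNOptions(ExpectValue=0.01);

% IncludeAll is false by default, so only the modified property
% (ExpectValue) is converted.
cmdModified = getCommand(opts);

% With IncludeAll=true, unspecified properties are converted as well,
% using their default values (except NaN, Inf, [], '', or "" defaults).
opts.IncludeAll = true;
cmdAll = getCommand(opts);

disp(cmdModified)
disp(cmdAll)
```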

Line length for formatting alignments in the report containing the search results, specified as a positive integer.

This option is not applicable for ReportFormat > 4.

Data Types: double

Maximum number of high-scoring segment pairs, specified as a positive integer. The default value is a bioinfo.blastplus.Default object, which means that the corresponding BLAST task or query program sets the default value.

Data Types: double

Length of the largest intron allowed in a translated nucleotide sequence when linking multiple distinct alignments, specified as a nonnegative integer. For details, see BLAST Command Line Applications User Manual.

Data Types: double

Maximum number of aligned sequences to keep, specified as a positive integer.

Data Types: double

Number of database sequences to show alignments for, specified as a nonnegative integer.

Data Types: double

Number of database sequences to show a one-line description for, specified as a nonnegative integer.

Data Types: double

Number of parallel threads to use, specified as a positive integer. The software runs threads on separate processors or cores. Increasing the number of threads generally reduces the run time significantly, but also increases the memory footprint.

Data Types: double

Location on the query sequence where you want the BLAST search to focus, specified as a two-element vector of positive integers. The first element must be smaller than the second.

For example, if a query protein sequence is 200 amino acids long, and you are interested in the region from amino acid 50 to 100, set the value to [50 100].

Data Types: double

Format of the BLAST report, specified as one of the following.

Format | Corresponding Number | Description
"Pairwise" | 0 | Traditional BLAST pairwise format. This format presents each query-subject pair alignment in detail, including alignment scores, e-values, and sequence alignments.
"QueryAnchored" | 1 | Query-anchored format showing identities, that is, matching bases or amino acids. This format is more compact than the default and emphasizes the identical matches.
"QueryAnchoredNoIdentities" | 2 | Query-anchored format with no identities. In this format, the query sequence is fixed and the database hit sequences are aligned to it without showing the identities, that is, matching bases or amino acids. This format is less detailed than "QueryAnchored" but is also less cluttered.
"FlatQuery" | 3 | Flat query-anchored format showing identities. Although this format is similar to "QueryAnchored", the alignments might be condensed to save space and reduce redundancy.
"FlatQueryNoIdentities" | 4 | Flat query-anchored format with no identities. Although this format is similar to "QueryAnchoredNoIdentities", the alignments might be condensed to save space and reduce redundancy.
"BLASTXML" | 5 | XML BLAST output format. The XML report contains information about the query sequences, database hits, alignments, scores, and statistical significance.
"Tabular" | 6 | Tabular format. This format is a tab-delimited format that provides a concise summary. The default columns in the tabular output are as follows, in this order:

  1. qseqid – Query sequence ID
  2. sseqid – Subject sequence ID
  3. pident – Percentage of identical matches
  4. length – Alignment length
  5. mismatch – Number of mismatches
  6. gapopen – Number of gap openings
  7. qstart – Start of alignment in the query sequence
  8. qend – End of alignment in the query sequence
  9. sstart – Start of alignment in the subject (database hit) sequence
  10. send – End of alignment in the subject (database hit) sequence
  11. evalue – Expect value
  12. bitscore – Bit score

"TabularCommented" | 7 | Tabular format with comment lines. This format is the same as "Tabular" with the addition of comment lines that start with a hash sign (#). Comment lines include metadata, such as the BLAST version, reference, database name, query ID, and names of columns included in the report.
"SeqalignText" | 8 | Text ASN.1 format. NCBI uses the Abstract Syntax Notation One data representation format for the storage and retrieval of data, such as nucleotide and protein sequences. For details, see Protein Domains and Macromolecular Structures.
"SeqalignBinary" | 9 | Binary ASN.1 format. NCBI uses the Abstract Syntax Notation One data representation format for the storage and retrieval of data, such as nucleotide and protein sequences. For details, see Protein Domains and Macromolecular Structures.
"CommaSeparated" | 10 | Comma-separated values (CSV) format. This format is the same as the "Tabular" format except it uses commas to separate values.
"BLASTArchive" | 11 | BLAST archive format (ASN.1). This format is a compact and complete record of the search. The format is useful for saving the results of a BLAST search for later reanalysis or for use with other NCBI tools without having to rerun the search.
"SeqalignJSON" | 12 | Seqalign (JSON) format. This format provides easy readability for many tools. For details, see BLAST database metadata.
"MultiBLASTJSON" | 13 | Multiple-file BLAST JSON format. For this format, the BLAST search generates multiple JSON files with the search results. One file contains a list of all the generated JSON files. For each query sequence, the search returns one JSON file specific to that query sequence, even when the search contains no hits for the query.
"MultiBLASTXML2" | 14 | Multiple-file BLAST XML2 format. For this format, the BLAST search generates multiple XML files with the search results. One file contains a list of all the generated XML files. For each query sequence, the search returns one XML file specific to that query sequence, even when the search contains no hits for the query.
"SingleBLASTJSON" | 15 | Single-file BLAST JSON format. This format returns a single JSON file with all the search results.
"SingleBLASTXML2" | 16 | Single-file BLAST XML2 format. This format returns a single XML file with all the search results.
"SAM" | 17 | Sequence alignment/map (SAM) format. For details about this format, see Sequence Alignment/Map Format Specification.
"OrganismReport" | 18 | Organism report format. This report is a BLAST taxonomy report that sorts the hits according to the species of the target sequence, so that all the hits to the same organism appear together. For details, see Taxonomy BLAST Help.

Data Types: double | char | string
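
A report in the "Tabular" format has no header line, so the default column names listed above have to be supplied when you import it. The following is a minimal sketch that uses readtable from base MATLAB. The two rows written to the temporary file are hypothetical values made up for illustration, not real search results; in practice, point reportFile at a report produced with ReportFormat="Tabular".

```matlab
% Sketch: import a 12-column tabular BLAST report and filter by E-value.
% The sample rows below are hypothetical, for illustration only.
reportFile = [tempname '.txt'];
fid = fopen(reportFile,"w");
fprintf(fid,"query1\tsubjA\t98.5\t200\t3\t0\t1\t200\t301\t900\t1e-50\t250\n");
fprintf(fid,"query1\tsubjB\t35.0\t60\t39\t2\t10\t69\t40\t219\t0.5\t28.1\n");
fclose(fid);

% Default column order of the "Tabular" report format.
colNames = ["qseqid","sseqid","pident","length","mismatch","gapopen", ...
            "qstart","qend","sstart","send","evalue","bitscore"];

% The report is tab-delimited and has no header row.
hits = readtable(reportFile,FileType="text",Delimiter="\t", ...
    ReadVariableNames=false,TextType="string");
hits.Properties.VariableNames = colNames;

% Keep only hits below a chosen significance threshold.
significant = hits(hits.evalue < 1e-5,:);
disp(significant)

delete(reportFile)
```

The threshold 1e-5 is an arbitrary choice for the sketch; pick a value that suits your search.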

Scoring matrix name, specified as one of the following: "BLOSUM90", "BLOSUM80", "BLOSUM62", "BLOSUM50", "BLOSUM45", "PAM250", "PAM70", "PAM30", or "IDENTITY".

Tip

To generate these matrices in MATLAB, use the blosum function.

Data Types: char | string

Effective length of the search space, specified as a nonnegative integer. The default value is a bioinfo.blastplus.Default object, which means that the corresponding BLAST task or query program sets the default value.

The search space is the theoretical size of all possible alignments between the query sequence and the database sequences. The search space is a parameter used in the calculation of the statistical significance (E-values) of the BLAST hits.

Data Types: double
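
To make the role of the search space concrete, the following sketch uses the standard Karlin-Altschul relation E = N * 2^(-S'), where N is the effective search space and S' is the bit score of an alignment. The numbers are hypothetical and the snippet does not call BLAST; it only shows that the E-value scales linearly with the search space.

```matlab
% Sketch: E-value as a function of search space and bit score.
% Both input values are hypothetical.
searchSpace = 1e9;   % effective search space (query length x database length)
bitScore = 40;       % bit score of a single alignment

% Karlin-Altschul relation expressed in terms of the bit score.
eValue = searchSpace * 2^(-bitScore);

% Doubling the search space doubles the E-value for the same alignment.
eValueDoubled = (2*searchSpace) * 2^(-bitScore);

fprintf("E-value: %.3g\n",eValue);
fprintf("E-value with doubled search space: %.3g\n",eValueDoubled);
```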

Task name, specified as one of the following:

  • "tblastn" – Traditional TBLASTN to search a protein query against a translated nucleotide database

  • "tblastn-fast" – Faster version that uses a larger word size per [3]

For details, see here.

Data Types: char | string

This property is read-only.

Supported version of the original BLAST+ software, specified as a string scalar.

Data Types: string

Multiple hits window size, specified as a nonnegative integer. The default value is a bioinfo.blastplus.Default object, which means that the corresponding BLAST task or query program sets the default value.

A larger window size increases the sensitivity of the search to detect more divergent sequences, but might also increase the noise in the search results.

A smaller window size decreases the search sensitivity, which might cause missed alignments in sequences with larger gaps or more divergent regions. However, a smaller window size might decrease the noise and make the significant alignments more apparent.

Data Types: double

Word size for an initial match, specified as a positive integer. The default value is a bioinfo.blastplus.Default object, which means that the corresponding BLAST task or query program sets the default value.

A larger word size decreases the search sensitivity because BLAST is less likely to find long exact matches in regions that are not highly conserved. However, a larger word size might speed up the search and reduce noise in the search results.

A smaller word size increases the search sensitivity because BLAST is more likely to detect alignments, including those with more distant or weak similarities. The search might become slower due to the increased number of initial matches. Also, the search noise might increase.

Data Types: double

Minimum score required to add a word to the BLAST lookup table, specified as a nonnegative scalar. The default value is a bioinfo.blastplus.Default object, which means that the corresponding BLAST task or query program sets the default value.

Data Types: double

Object Functions

getCommand | Translate object properties to original options syntax
getOptionsTable | Return table with all properties and equivalent options in original syntax
reset | Reset BLAST database options to default values
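
A brief sketch of the remaining object functions follows. It assumes that getOptionsTable returns a MATLAB table, as its description suggests, and that reset returns the reset object, as in the reset(dbopts) call in the Examples section.

```matlab
% Sketch: list the property-to-flag mapping, then restore defaults.
opts = bioinfo.blastplus.TBLASTNOptions(ExpectValue=0.01);

% Table of all properties and their equivalents in the original syntax.
optionsTable = getOptionsTable(opts);
disp(optionsTable)

% reset returns an object with default property values.
opts = reset(opts);
```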

Examples

Download some paired-end sequencing data in the FASTA format using the accession run number SRR26273031.

databaseFasta = srafasterqdump("SRR26273031",FastaOutput=true)

Create a local nucleotide database using the downloaded FASTA file. Specify "SRR26273031_nucl_db" as the base name of the output database. When creating the database, the function also generates multiple index files with the same base name. The blastplus function uses these index files automatically when you search the database later in this example.

blastplusdatabase("nucleotide","SRR26273031.fasta","SRR26273031_nucl_db");

You can also specify additional database creation options using a MakeDatabaseOptions object. For instance, specify the title of the database.

dbopts = bioinfo.blastplus.MakeDatabaseOptions;
dbopts.Title = "SRR26273031_Nucleotide_DB"
dbopts = 
  MakeDatabaseOptions with properties:

   Default properties:
        ExtraCommand: ""
          IncludeAll: 0
           InputType: "fasta"
    ParseSequenceIDs: 0
             Version: "2.14.0"

   Modified properties:
               Title: "SRR26273031_Nucleotide_DB"

You can then use the options object to make the database.

blastplusdatabase("nucleotide","SRR26273031.fasta","SRR26273031_nucl_db",dbopts);

Alternatively, you can specify options, such as the title of the database, by using name-value arguments. For example:

blastplusdatabase("nucleotide","SRR26273031.fasta","SRR26273031_nucl_db",Title="SRR26273031_Nucleotide_DB");

To reset the property values to their default values, use the reset function.

dopts2 = reset(dbopts)
dopts2 = 
  MakeDatabaseOptions with properties:

   Default properties:
        ExtraCommand: ""
          IncludeAll: 0
           InputType: "fasta"
    ParseSequenceIDs: 0
               Title: [1×0 string]
             Version: "2.14.0"

   Modified properties:
    No properties.

Search the database using the FASTA file queryFile.fasta containing two nucleotide query sequences. This file is provided with the toolbox. Use the blastn query program, which lets you search nucleotide queries against a nucleotide database. Specify "search1" as the name of the output report file. By default, the report file format is the traditional BLAST pairwise format. This format presents each query-subject pair alignment in detail.

blastplus("blastn","queryFile.fasta","SRR26273031_nucl_db","search1");

Open the file to review the search results. The first query sequence returns no hits, while the second query sequence returns multiple hits.

open search1;

You can also modify search options by creating a corresponding options object for the blastn query program. Use blastplusoptions or bioinfo.blastplus.*Options to create the options object. For instance, change the report format to an XML format.

bnopts = blastplusoptions("blastn"); % Or use bioinfo.blastplus.BLASTNOptions
bnopts.ReportFormat = "BLASTXML";
blastplus("blastn","queryFile.fasta","SRR26273031_nucl_db","search2_xml",bnopts);
open search2_xml;

Alternatively, you can set the value of a property of the options object, such as ReportFormat, using name-value argument syntax. For example:

blastplus("blastn","queryFile.fasta","SRR26273031_nucl_db","search2_xml",ReportFormat="BLASTXML");

You can use other query programs to search the database. For instance, use tblastx to search translated nucleotide queries against a translated nucleotide database. Both query sequences return hits for this search. Use the compact tabular format for the report. For details about the generated columns and other report formats, see ReportFormat.

blastplus("tblastx","queryFile.fasta","SRR26273031_nucl_db","search3_tab",ReportFormat="Tabular");
open search3_tab;
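
The same database can in principle be searched with tblastn by using a TBLASTNOptions object. The following is a sketch under stated assumptions: proteinQuery.fasta is a hypothetical protein FASTA file that you supply (it is not provided with the toolbox), and the program name "tblastn" is assumed to be accepted by blastplus in the same way as "blastn" and "tblastx".

```matlab
% Sketch: search a protein query against the nucleotide database.
% proteinQuery.fasta is a hypothetical file name; supply your own file.
queryFile = "proteinQuery.fasta";

tnopts = bioinfo.blastplus.TBLASTNOptions;
tnopts.ExpectValue = 1e-5;        % report only hits below this expect value
tnopts.ReportFormat = "Tabular";  % compact tab-delimited report

% Run the search only if the hypothetical query file exists.
if isfile(queryFile)
    blastplus("tblastn",queryFile,"SRR26273031_nucl_db","search4_tab",tnopts);
    open search4_tab;
    delete search4_tab
end
```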

Delete the reports and downloaded FASTA file.

delete search1 search2_xml search3_tab SRR26273031.fasta

References

[1] Camacho, Christiam, George Coulouris, Vahram Avagyan, Ning Ma, Jason Papadopoulos, Kevin Bealer, and Thomas L Madden. “BLAST+: Architecture and Applications.” BMC Bioinformatics 10, no. 1 (December 2009): 421.

[2] “BLAST: Basic Local Alignment Search Tool.” https://blast.ncbi.nlm.nih.gov/Blast.cgi.

[3] Shiryev, Sergey A., Jason S. Papadopoulos, Alejandro A. Schäffer, and Richa Agarwala. “Improved BLAST Searches Using Longer Words for Protein Seeding.” Bioinformatics 23, no. 21 (November 1, 2007): 2949–51.

Version History

Introduced in R2024a