Perform search on local BLAST database to create BLAST report
blastlocal('InputQuery', InputQueryValue)
Data = blastlocal('InputQuery', InputQueryValue)
... blastlocal(..., 'Program', ProgramValue, ...)
... blastlocal(..., 'Database', DatabaseValue, ...)
... blastlocal(..., 'BlastPath', BlastPathValue, ...)
... blastlocal(..., 'Expect', ExpectValue, ...)
... blastlocal(..., 'Format', FormatValue, ...)
... blastlocal(..., 'ToFile', ToFileValue, ...)
... blastlocal(..., 'Filter', FilterValue, ...)
... blastlocal(..., 'GapOpen', GapOpenValue, ...)
... blastlocal(..., 'GapExtend', GapExtendValue, ...)
... blastlocal(..., 'BLASTArgs', BLASTArgsValue, ...)
Argument | Description |
---|---|
InputQueryValue | Character vector or string specifying the file name, or path and file name, of a FASTA file containing query nucleotide or amino acid sequence(s). (This corresponds to the blastall option -i.) |
ProgramValue | Character vector or string specifying a BLAST program. Choices are 'blastp' (default), 'blastn', 'blastx', 'tblastn', and 'tblastx'. (This corresponds to the blastall option -p.) |
DatabaseValue | Character vector or string specifying the file name, or path and file name, of a local BLAST database (formatted using the NCBI formatdb function) to search. Default is a local version of the nr database in the MATLAB® current folder. (This corresponds to the blastall option -d.) |
BlastPathValue | Character vector or string specifying the full path to the blastall executable file, including the name and extension of the executable file. Default is the system path. |
ExpectValue | Value specifying the statistical significance threshold for matches against database sequences. Can be any real number. Default is 10. (This corresponds to the blastall option -e.) |
FormatValue | Integer specifying the alignment format of the BLAST search results. Choices are 0 (default, pairwise), 1 (query-anchored, showing identities), 2 (query-anchored, no identities), 3 (flat query-anchored, showing identities), 4 (flat query-anchored, no identities), 5 (query-anchored, no identities and blunt ends), 6 (flat query-anchored, no identities and blunt ends), 7 (not used), 8 (tabular), and 9 (tabular with comment lines). (This corresponds to the blastall option -m.) |
ToFileValue | Character vector or string specifying the file name, or path and file name, in which to save the contents of the BLAST report. (This corresponds to the blastall option -o.) |
FilterValue | Controls the application of a filter (DUST filter for the blastn program or SEG filter for other programs) to the query sequence(s). Choices are true (default) or false. (This corresponds to the blastall option -F.) |
GapOpenValue | Integer that specifies the penalty for opening a gap in the alignment of sequences. Default is -1. (This corresponds to the blastall option -G.) |
GapExtendValue | Integer that specifies the penalty for extending a gap in the alignment of sequences. Default is -1. (This corresponds to the blastall option -E.) |
BLASTArgsValue | Character vector or string of NCBI blastall options, containing one or more instances of -x and the option associated with it, used to specify input arguments. |
Data | MATLAB structure or array of structures (if multiple query sequences) containing fields corresponding to BLAST keywords and data from a local BLAST report. |
The Basic Local Alignment Search Tool (BLAST) offers a fast and powerful comparative analysis of protein and nucleotide sequences against known sequences in online or local databases.
To use the blastlocal function, you must have a local copy of the NCBI blastall executable file (version 2.2.17) available from your system. Run the downloaded executable and configure it for your system. For convenience, consider placing the NCBI blastall executable file on your system path.
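As a rough way to confirm that blastall is reachable before calling blastlocal, the sketch below asks the operating system shell to locate the executable. It is only an illustration: the variable names are hypothetical, and it assumes the shell provides `where` (Windows) or `which` (other platforms).

```matlab
% Pick the shell lookup command for the current platform.
% Assumption: 'where' (Windows) or 'which' (elsewhere) is available.
if ispc
    lookupCmd = 'where blastall';
else
    lookupCmd = 'which blastall';
end

% system returns a zero status when the lookup succeeds.
[status, location] = system(lookupCmd);
blastallOnPath = (status == 0);

if blastallOnPath
    fprintf('blastall found at: %s\n', strtrim(location));
else
    fprintf('blastall not found on the system path.\n');
end
```

If the executable is not found, the 'BlastPath' property described on this page can point blastlocal at its full path instead.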
blastlocal('InputQuery', InputQueryValue) submits query sequence(s) specified by InputQueryValue, a FASTA file containing nucleotide or amino acid sequence(s), for a BLAST search of a local BLAST database, by calling a local version of the NCBI blastall executable file. The BLAST search results are displayed in the MATLAB Command Window. (This corresponds to the blastall option -i.)
Data = blastlocal('InputQuery', InputQueryValue) returns the BLAST search results in Data, a MATLAB structure or array of structures (if multiple query sequences) containing fields corresponding to BLAST keywords and data from a local BLAST report.
Data contains a subset of the following fields, based on the specified alignment format.
Field | Description |
---|---|
Algorithm | NCBI algorithm used to do a BLAST search. |
Query | Identifier of the query sequence submitted to a BLAST search. |
Length | Length of the query sequence. |
Database | All databases searched. |
Hits.Name | Name of a database sequence (subject sequence) that matched the query sequence. |
Hits.Score | Alignment score between the query sequence and the subject sequence. |
Hits.Expect | Expectation value for the alignment between the query sequence and the subject sequence. |
Hits.Length | Length of a subject sequence. |
Hits.HSPs.Score | Pairwise alignment score for a high-scoring sequence pair between the query sequence and a subject sequence. |
Hits.HSPs.Expect | Expectation value for a high-scoring sequence pair between the query sequence and a subject sequence. |
Hits.HSPs.Identities | Identities (match, possible, and percent) for a high-scoring sequence pair between the query sequence and a subject sequence. |
Hits.HSPs.Positives | Identical or similar residues (match, possible, and percent) for a high-scoring sequence pair between the query sequence and a subject amino acid sequence. Note: This field applies only to translated nucleotide or amino acid query sequences and/or databases. |
Hits.HSPs.Gaps | Nonaligned residues (match, possible, and percent) for a high-scoring sequence pair between the query sequence and a subject sequence. |
Hits.HSPs.Mismatches | Residues that are not similar to each other (match, possible, and percent) for a high-scoring sequence pair between the query sequence and a subject sequence. |
Hits.HSPs.Frame | Reading frame of the translated nucleotide sequence for a high-scoring sequence pair between the query sequence and a subject sequence. Note: This field applies only when performing translated searches. |
Hits.HSPs.Strand | Sense (Plus = 5' to 3' and Minus = 3' to 5') of the DNA strands for a high-scoring sequence pair between the query sequence and a subject sequence. Note: This field applies only when using a nucleotide query sequence and database. |
Hits.HSPs.Alignment | Three-row matrix showing the alignment for a high-scoring sequence pair between the query sequence and a subject sequence. |
Hits.HSPs.QueryIndices | Indices of the query sequence residue positions for a high-scoring sequence pair between the query sequence and a subject sequence. |
Hits.HSPs.SubjectIndices | Indices of the subject sequence residue positions for a high-scoring sequence pair between the query sequence and a subject sequence. |
Hits.HSPs.AlignmentLength | Length of the pairwise alignment for a high-scoring sequence pair between the query sequence and a subject sequence. |
Alignment | Entire alignment for the query sequence and the subject sequence(s). |
Statistics | Summary of statistical details about the performed search, such as lambda values, gap penalties, number of sequences searched, and number of hits. |
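To illustrate how the nested fields in the table might be traversed, the following sketch filters hits by their best HSP expectation value. It builds a small mock structure by hand so that it runs without blastall. The hit names, expectation values, and threshold are invented for illustration, and the sketch assumes Hits.HSPs.Expect holds numeric values.

```matlab
% Mock result mimicking the documented field layout. All values are made up;
% a real structure would come from blastlocal.
hsp = @(e) struct('Expect', e);
results.Query = 'M28570';
results.Hits = struct( ...
    'Name', {'subjectA', 'subjectB'}, ...
    'HSPs', {[hsp(1e-50), hsp(0.5)], hsp(3)});

threshold = 1e-5;   % hypothetical cutoff
keep = false(1, numel(results.Hits));
for k = 1:numel(results.Hits)
    % Smallest (best) expectation value among this hit's HSPs.
    bestExpect = min([results.Hits(k).HSPs.Expect]);
    keep(k) = bestExpect <= threshold;
end

% Names of the hits that pass the cutoff.
significantNames = {results.Hits(keep).Name};
disp(significantNames);
```

Because Data contains only a subset of the fields for some alignment formats, code like this may need an isfield check before reading Hits.HSPs.Expect.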
... blastlocal(..., 'PropertyName', PropertyValue, ...) calls blastlocal with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows.
... blastlocal(..., 'Program', ProgramValue, ...) specifies the BLAST program. Choices are 'blastp' (default), 'blastn', 'blastx', 'tblastn', and 'tblastx'. (This corresponds to the blastall option -p.)
For help in selecting an appropriate BLAST program, visit:
... blastlocal(..., 'Database', DatabaseValue, ...) specifies the local BLAST database (formatted using the NCBI formatdb function) to search. Default is a local version of the nr database in the MATLAB current folder. (This corresponds to the blastall option -d.)
... blastlocal(..., 'BlastPath', BlastPathValue, ...) specifies the full path to the blastall executable file, including the name and extension of the executable file. Default is the system path.
... blastlocal(..., 'Expect', ExpectValue, ...) specifies a statistical significance threshold for matches against database sequences. ExpectValue can be any real number. Default is 10. (This corresponds to the blastall option -e.)
You can learn more about the statistics of local sequence comparison
at:
... blastlocal(..., 'Format', FormatValue, ...) specifies the alignment format of the BLAST search results. Choices are:
0 (default): Pairwise
1: Query-anchored, showing identities
2: Query-anchored, no identities
3: Flat query-anchored, showing identities
4: Flat query-anchored, no identities
5: Query-anchored, no identities and blunt ends
6: Flat query-anchored, no identities and blunt ends
7: Not used
8: Tabular
9: Tabular with comment lines
(This corresponds to the blastall option -m.)
... blastlocal(..., 'ToFile', ToFileValue, ...) saves the contents of the BLAST report to the specified file. (This corresponds to the blastall option -o.)
... blastlocal(..., 'Filter', FilterValue, ...) specifies whether a filter (DUST filter for the blastn program or SEG filter for other programs) is applied to the query sequence(s). Choices are true (default) or false. (This corresponds to the blastall option -F.)
... blastlocal(..., 'GapOpen', GapOpenValue, ...) specifies the penalty for opening a gap in the alignment of sequences. Default is -1. (This corresponds to the blastall option -G.)
... blastlocal(..., 'GapExtend', GapExtendValue, ...) specifies the penalty for extending a gap in the alignment of sequences. Default is -1. (This corresponds to the blastall option -E.)
... blastlocal(..., 'BLASTArgs', BLASTArgsValue, ...) specifies options using the input arguments for the NCBI blastall function. BLASTArgsValue is a character vector or string containing one or more instances of -x and the option associated with it. For example, to specify the BLOSUM 45 matrix, you would use the following syntax:
blastlocal('InputQuery', 'ecoliquery.txt', 'BLASTArgs', '-M BLOSUM45')
Use the 'BLASTArgs' property to specify blastall options for which there are no corresponding property name/property value pairs.
For a complete list of valid input arguments for the NCBI blastall function, make sure that the blastall executable file is located on your system path or current folder, then type the following at your system's command prompt.
blastall -
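When several blastall options need to be combined, it can be convenient to assemble the 'BLASTArgs' value programmatically. The sketch below is one possible approach, not part of the toolbox: the cell array layout and variable names are hypothetical, and the flags (-p, -e, -m) are the ones described on this page.

```matlab
% Each row pairs a blastall flag with its value (hypothetical layout).
opts = {'-p', 'blastn'; ...
        '-e', '0.0001'; ...
        '-m', '8'};

% Turn each row into a '-x value' fragment.
parts = cell(1, size(opts, 1));
for k = 1:size(opts, 1)
    parts{k} = sprintf('%s %s', opts{k, 1}, opts{k, 2});
end

% Join the fragments with single spaces.
blastArgs = strjoin(parts, ' ');
disp(blastArgs);
```

The resulting character vector could then be supplied as the BLASTArgsValue in a call such as blastlocal('InputQuery', 'query_nt.fa', 'Database', 'NC_004431.fna', 'BLASTArgs', blastArgs), provided blastall is installed.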
You can also use the syntax and input arguments accepted by the NCBI blastall function, instead of the property name/property value pairs listed previously. To do so, supply a character vector or string containing multiple options using the -x option syntax. For example, you can specify the ecoliquery.txt FASTA file as your query sequences, the blastp program, and the ecoli local database, by using
blastlocal('-i ecoliquery.txt -p blastp -d ecoli')
For a complete list of valid input arguments for the NCBI blastall function, make sure that the blastall executable file is located on your system path or current folder, then type the following at your system's command prompt.
blastall -
The following examples assume you have a FASTA nucleotide file and a FASTA amino acid file for E. coli, such as the files NC_004431.fna and NC_004431.faa, saved to your MATLAB current folder.
Create local blastable databases from the NC_004431.fna and NC_004431.faa FASTA files by using the blastformat function.
blastformat('inputdb', 'NC_004431.fna', 'protein', 'false');
blastformat('inputdb', 'NC_004431.faa');
Use the getgenbank function to retrieve sequence information for the E. coli threonine operon from the GenBank® database.
S = getgenbank('M28570');
Create a query file by using the fastawrite function to create a FASTA file named query_nt.fa from this sequence information, using only the accession number as the header.
S.Header = S.Accession;
fastawrite('query_nt.fa', S);
Use MATLAB syntax to submit the query sequence in the query_nt.fa FASTA file for a BLAST search of the local amino acid database NC_004431.faa. Specify the BLAST program blastx. Return the BLAST search results in results, a MATLAB structure.
results = blastlocal('inputquery', 'query_nt.fa', ...
                     'database', 'NC_004431.faa', ...
                     'program', 'blastx');
If you have not already done so, create local blastable databases and a query file as described previously.
Use blastall syntax to submit the query sequence in the query_nt.fa FASTA file for a BLAST search of the local nucleotide database NC_004431.fna. Specify the BLAST program blastn and an expectation value of 0.0001. Return the BLAST search results in results, a MATLAB structure.
results = blastlocal('-i query_nt.fa -d NC_004431.fna -p blastn -e 0.0001');
If you have not already done so, create local blastable databases and a query file as described previously.
Submit the query sequence in the query_nt.fa FASTA file for a BLAST search of the local nucleotide database NC_004431.fna. Specify the BLAST program blastn and a tabular alignment format. Save the contents of the BLAST report to a file named myecoli_nt.txt.
blastlocal('inputquery', 'query_nt.fa', ...
           'database', 'NC_004431.fna', ...
           'tofile', 'myecoli_nt.txt', ...
           'blastargs', '-p blastn -m 8');
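A tabular report saved this way holds one tab-separated line per HSP. As a minimal sketch of how such a line might be split by hand, the code below parses a single made-up line. The sample values are invented, and the column order (query ID, subject ID, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, E-value, bit score) is the conventional blastall -m 8 layout, which is assumed here rather than stated on this page.

```matlab
% One invented line in the assumed 12-column tabular layout.
tab = sprintf('\t');
sampleLine = strjoin({'M28570', 'subjectA', '99.50', '200', '1', '0', ...
                      '1', '200', '301', '500', '1e-100', '380'}, tab);

% Split on tabs and convert the numeric columns of interest.
fields = strsplit(sampleLine, tab);
hit.Query           = fields{1};
hit.Subject         = fields{2};
hit.PercentIdentity = str2double(fields{3});
hit.AlignmentLength = str2double(fields{4});
hit.Expect          = str2double(fields{11});
hit.BitScore        = str2double(fields{12});
disp(hit);
```

For real reports, a dedicated toolbox reader such as blastreadlocal is likely the more robust route than manual parsing.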
blastformat | blastncbi | blastread | blastreadlocal | getblast