databaseDatastore
Datastore for data in database
Description
MATLAB® has various datastores that let you import large data sets into
MATLAB for analysis. A DatabaseDatastore
object is a type of
datastore that contains data from a database table or the results from executing an SQL
query in a relational database. For details about other datastores, see Getting Started with Datastore.
With a DatabaseDatastore
object, you can preview and read records or
chunks in a data set and reset the DatabaseDatastore
to its initial
state. Also, you can analyze a large data set in a database using tall arrays or
MapReduce.
Reading data from DatabaseDatastore
objects is the same as executing
the fetch
function on the data set. Using
DatabaseDatastore
objects provides advantages that enable you to:
Work with databases containing large amounts of data.
Analyze large amounts of data using tall arrays with common MATLAB functions, such as
mean
andhistogram
. Create a tall array using thetall
function. For details, see Tall Arrays for Out-of-Memory Data.Write MapReduce algorithms that define the chunking and reduction of large amounts of data by using the
mapreduce
function. For details, see Getting Started with MapReduce. For an example, see Analyze Large Data in Database Using MapReduce.Create parallelizable workflows by using a parallel pool constant when you create the connection for your
databaseDatastore
object (since R2024a). For more information, seeparallel.pool.Constant
(Parallel Computing Toolbox). You can use thesetSecret
andgetSecret
functions to add and retrieve your user credentials when you create a parallel pool constant. For more information on security consideration topics, see Keep Sensitive Information Out of Code.
Creation
Syntax
Description
creates a dbds
= databaseDatastore(dataSourceInfo
,source
)DatabaseDatastore
object by using information
about the data source contained in dataSourceInfo
,
which can be either a connection object or a data source name. The datastore
contains data from a database table or the results from an executed SQL
query contained in source
.
customizes the options for importing a large data set from a database using
the dbds
= databaseDatastore(dataSourceInfo
,source
,opts
)SQLImportOptions
object.
specifies additional options using one or more name-value arguments. For
example, dbds
= databaseDatastore(___,Name,Value
)'ReadSize',100
retrieves 100 rows of data from
the DatabaseDatastore
object.
Input Arguments
Name-Value Arguments
Limitations
You can create parallelizable workflows by using the
DatabaseDatastore
object and specifying a parallel pool constant with the following database vendors:MySQL®
PostgreSQL
Microsoft® SQL Server®
MariaDB®
Oracle®
If you set the
VariableNamingRule
name-value argument to the value"modify"
:The variable names
Properties
,RowNames
, andVariableNames
are reserved identifiers for thetable
data type.The length of each variable name must be less than the number returned by
namelengthmax
.
If you are using Microsoft SQL Server, the
DatabaseDatastore
object supports versions 2012 and later.
Properties
Object Functions
hasdata | Determine if data in DatabaseDatastore is
available to read |
preview | Return subset of data from DatabaseDatastore |
read | Read data in DatabaseDatastore |
readall | Read all data in DatabaseDatastore |
reset | Reset DatabaseDatastore to initial
state |
close | Close and invalidate database and driver resource utilizer |
isPartitionable | Determine whether datastore is partitionable |
isShuffleable | Determine whether datastore is shuffleable |
partition | Partition a datastore |
Examples
Create databaseDatastore with Data Source Name
This example assumes you have already stored your username and password with your data source in the Database Explorer app.
Use the dataSourceInfo
input argument to specify the
name of your data source. Then, specify the query you want to
execute.
dataSourceInfo = "PostgreSQL Native"; query = "SELECT * FROM airlinesmall";
Create the databaseDatastore
.
dbds = databaseDatastore(dataSourceInfo,query)
dbds = DatabaseDatastore with properties: Connection: [1×1 parallel.pool.Constant] Query: 'SELECT * FROM airlinesmall' VariableNames: {1×29 cell} SelectedVariableNames: {1×29 cell} VariableNamingRule: 'preserve' ReadSize: 10000 RowFilter:<unconstrained>
Create DatabaseDatastore
Object Using SQL Query Results
Create a database connection to a MySQL (R) database using an ODBC driver. Then, create a DatabaseDatastore
object using the results from an SQL query and preview a large data set.
Create a database connection to the ODBC data source MySQL ODBC
. Specify the user name and password.
datasource = "MySQL ODBC"; username = "username"; password = "password"; dataSourceInfo = database(datasource,username,password);
Create a DatabaseDatastore
object using a database connection and an SQL query. This SQL query retrieves all flight data from the airlinesmall
table. databaseDatastore
executes the SQL query.
sqlquery = 'select * from airlinesmall';
dbds = databaseDatastore(dataSourceInfo,sqlquery)
dbds = DatabaseDatastore with properties: Connection: [1×1 database.odbc.connection] Query: 'select * from airlinesmall' VariableNames: {1×29 cell} SelectedVariableNames: {1×29 cell} VariableNamingRule: 'modify' ReadSize: 10000
dbds
is a DatabaseDatastore
object with these properties:
Connection
-- Database connection objectQuery
-- Executed SQL queryVariableNames
-- List of column names from the executed SQL queryReadSize
-- Maximum number of records to read from the executed SQL query
Display the database connection property.
dbds.Connection
ans = connection with properties: DataSource: 'MySQL ODBC' UserName: 'root' Message: '' Type: 'ODBC Connection Object' Database Properties: AutoCommit: 'on' ReadOnly: 'off' LoginTimeout: 0 MaxDatabaseConnections: 0 Catalog and Schema Information: DefaultCatalog: 'toy_store' Catalogs: {'information_schema', 'mysql', 'performance_schema' ... and 3 more} Schemas: {} Database and Driver Information: DatabaseProductName: 'MySQL' DatabaseProductVersion: '8.0.3-rc-log' DriverName: 'myodbc8a.dll' DriverVersion: '08.00.0016'
The Message
property is blank when the database connection is successful.
Preview the first eight records in the large data set returned by executing the SQL query in the DatabaseDatastore
object.
preview(dbds)
ans = 8×29 table Year Month DayofMonth DayOfWeek DepTime CRSDepTime ArrTime CRSArrTime UniqueCarrier FlightNum TailNum ActualElapsedTime CRSElapsedTime AirTime ArrDelay DepDelay Origin Dest Distance TaxiIn TaxiOut Cancelled CancellationCode Diverted CarrierDelay WeatherDelay NASDelay SecurityDelay LateAircraftDelay ____ _____ __________ _________ _______ __________ _______ __________ _____________ _________ _______ _________________ ______________ _______ ________ ________ _______ _______ ________ ______ _______ _________ ________________ ________ ____________ ____________ ________ _____________ _________________ 1990 9 22 6 1801 1750 2005 1938 {'NW'} 209 {'NA'} 124 108 {'NA'} 27 11 {'PHL'} {'DTW'} 453 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'} 1990 9 11 2 908 910 1613 1554 {'NW'} 248 {'NA'} 245 224 {'NA'} 19 -2 {'PHX'} {'DTW'} 1671 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'} 1990 9 2 7 NaN 1805 NaN 1900 {'NW'} 284 {'NA'} NaN 55 {'NA'} NaN NaN {'JAN'} {'MEM'} 189 {'NA'} {'NA'} 1 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'} 1990 9 29 6 1434 1435 1615 1630 {'NW'} 305 {'NA'} 221 235 {'NA'} -15 -1 {'MSP'} {'LAX'} 1536 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'} 1990 9 3 1 925 755 1258 1144 {'NW'} 350 {'NA'} 153 169 {'NA'} 74 90 {'MSP'} {'BOS'} 1124 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'} 1990 9 22 6 900 900 1241 1222 {'AA'} 11 {'NA'} 401 382 {'NA'} 19 0 {'BOS'} {'LAX'} 2611 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'} 1990 9 20 4 1338 1335 1853 1907 {'AA'} 62 {'NA'} 255 272 {'NA'} -14 3 {'ORD'} {'SJU'} 2072 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'} 1990 9 3 1 710 711 837 847 {'AA'} 101 {'NA'} 147 156 {'NA'} -10 -1 {'DTW'} {'DFW'} 987 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'}
Close the DatabaseDatastore
object and the database connection.
close(dbds)
Create DatabaseDatastore
Object Using Database Table
Retrieve a large data set from a database table by creating a DatabaseDatastore
object. This example uses a MySQL® database.
Create a database connection to a MySQL database with the username and password.
datasource = "MySQL ODBC"; username = "username"; password = "password"; dataSourceInfo = database(datasource,username,password);
Load flight information in the MATLAB® workspace.
flights = readtable('airlinesmall_subset.xlsx');
Create the flights
database table using the flight information.
tablename = 'flights';
sqlwrite(dataSourceInfo,tablename,flights)
Create a DatabaseDatastore
object using a database connection and the flights
database table.
dbds = databaseDatastore(dataSourceInfo,tablename)
dbds = DatabaseDatastore with properties: Connection: [1×1 database.odbc.connection] Query: 'SELECT * from flights' VariableNames: {1×29 cell} SelectedVariableNames: {1×29 cell} VariableNamingRule: 'modify' ReadSize: 10000
dbds
is a DatabaseDatastore
object with these properties:
Connection
— Database connection objectQuery
— Executed SQL queryVariableNames
— List of column names from the executed SQL queryReadSize
— Maximum number of records to read from the executed SQL query
Display the database connection property.
dbds.Connection
ans = connection with properties: DataSource: 'MySQL ODBC' UserName: 'root' Message: '' Type: 'ODBC Connection Object' Database Properties: AutoCommit: 'on' ReadOnly: 'off' LoginTimeout: 0 MaxDatabaseConnections: 0 Catalog and Schema Information: DefaultCatalog: 'toy_store' Catalogs: {'information_schema', 'mysql', 'performance_schema' ... and 3 more} Schemas: {} Database and Driver Information: DatabaseProductName: 'MySQL' DatabaseProductVersion: '8.0.3-rc-log' DriverName: 'myodbc8a.dll' DriverVersion: '08.00.0016'
The Message
property is blank when the database connection is successful.
Preview the first eight records in the data set returned by executing the SQL query in the DatabaseDatastore
object.
preview(dbds)
ans=8×29 table
Year Month DayofMonth DayOfWeek DepTime CRSDepTime ArrTime CRSArrTime UniqueCarrier FlightNum TailNum ActualElapsedTime CRSElapsedTime AirTime ArrDelay DepDelay Origin Dest Distance TaxiIn TaxiOut Cancelled CancellationCode Diverted CarrierDelay WeatherDelay SDelay SecurityDelay LateAircraftDelay
____ _____ __________ _________ _______ __________ _______ __________ _____________ _________ __________ _________________ ______________ _______ ________ ________ _______ _______ ________ ______ _______ _________ ________________ ________ ____________ ____________ ______ _____________ _________________
1996 1 18 4 2117 2120 2305 2259 {'HP'} 415 {'N637AW'} 108 99 85 6 -3 {'COS'} {'PHX'} 551 5 18 0 NaN 0 NaN NaN NaN NaN NaN
1996 1 12 5 1252 1245 1511 1500 {'HP'} 610 {'N905AW'} 79 75 58 11 7 {'LAX'} {'PHX'} 370 3 18 0 NaN 0 NaN NaN NaN NaN NaN
1996 1 16 2 1441 1445 1708 1721 {'HP'} 211 {'N165AW'} 87 96 74 -13 -4 {'RNO'} {'PHX'} 601 4 9 0 NaN 0 NaN NaN NaN NaN NaN
1996 1 1 1 2258 2300 2336 2335 {'HP'} 1245 {'N183AW'} 38 35 20 1 -2 {'TUS'} {'PHX'} 110 6 12 0 NaN 0 NaN NaN NaN NaN NaN
1996 1 4 4 1814 1814 1901 1910 {'US'} 683 {'N963VJ'} 47 56 34 -9 0 {'DTW'} {'PIT'} 201 6 7 0 NaN 0 NaN NaN NaN NaN NaN
1996 1 31 3 1822 1820 1934 1925 {'US'} 757 {'N912VJ'} 72 65 52 9 2 {'PHL'} {'PIT'} 267 6 14 0 NaN 0 NaN NaN NaN NaN NaN
1996 1 18 4 729 730 841 843 {'US'} 1564 {'N941VJ'} 72 73 58 -2 -1 {'DCA'} {'PVD'} 357 3 11 0 NaN 0 NaN NaN NaN NaN NaN
1996 1 26 5 1704 1705 1829 1839 {'NW'} 1538 {'N960N' } 85 94 69 -10 -1 {'DTW'} {'RIC'} 456 3 13 0 NaN 0 NaN NaN NaN NaN NaN
Close the DatabaseDatastore
object and the database connection.
close(dbds)
Create a Parallelizable databaseDatastore
Object
Create a parallelizable databaseDatastore
object by using a parallel.pool.Constant
(Parallel Computing Toolbox) object.
You can use the setSecret
and getSecret
functions to store and retrieve your user
credentials.
Create a query to use on your data set.
query = "SELECT col1, col2, col3 from table where col1 > ____ & col1 < ____";
Store your user credentials.
setSecret("PostgreSQL.username"); setSecret("PostgreSQL.password");
Create a parallel pool constant and specify your user credentials by using
the getSecret
function.
dataSourceInfo = parallel.pool.Constant(@()postgresql(getSecret("PostgreSQL.username"),getSecret("PostgreSQL.password"), ... "Server","localhost","DatabaseName","toy_store"),@close);
Create a databaseDatastore
object and read in your data
in parallel.
dbds = databaseDatastore(dataSourceInfo,query); data = readall(dbds,UseParallel=true);
Create DatabaseDatastore
Object Using Custom Import Options
Customize import options when importing a large data set from
a database table. Control the import options by creating an
SQLImportOptions
object. Then, customize the import
options for database columns that contain logical data. Import and preview the
data by creating a DatabaseDatastore
object and using the
preview
function.
This example uses the airlinesmall_subset.xls
spreadsheet,
which contains the column Cancelled
. Also, the example uses a
Microsoft® SQL Server® Version 11.00.2100 database and the Microsoft SQL Server
Driver 11.00.5058.
Create a database connection to a Microsoft SQL Server database with Windows® authentication. Specify a blank username and password.
datasource = 'MS SQL Server Auth'; dataSourceInfo = database(datasource,'','');
Load flight information into the MATLAB® workspace.
flights = readtable('airlinesmall_subset.xlsx');
Create the flights
database table using the flight
information.
tablename = 'flights';
sqlwrite(dataSourceInfo,tablename,flights)
Create an SQLImportOptions
object using the
flights
database table with the
databaseImportOptions
function.
opts = databaseImportOptions(dataSourceInfo,tablename);
Retrieve the default import options for the
Cancelled
variable.
varnames = 'Cancelled';
varOpts = getoptions(opts,varnames)
varOpts = SQLVariableImportOptions with properties: Variable Properties : Name: 'Cancelled' Type: 'double' FillValue: NaN
Set the import options for the data type of the specified variable to
logical
. Also, set the import options to replace missing data
in the specified variable with the fill value true
.
opts = setoptions(opts,varnames,'Type','logical', ... 'FillValue',true);
Create the DatabaseDatastore
object to import a large data set
using the import options.
dbds = databaseDatastore(dataSourceInfo,tablename,opts);
Import the logical data in the selected variable and display a preview of the
data. The imported data shows that the variable has the logical
data type.
opts.SelectedVariableNames = varnames; data = preview(dbds); cancelled = data.Cancelled
cancelled = 8×1 logical array
0
0
0
0
0
0
0
0
Delete the flights
database table using the
execute
function.
sqlquery = ['DROP TABLE ' tablename];
execute(dataSourceInfo,sqlquery)
Close the database connection.
close(dataSourceInfo)
Create DatabaseDatastore
Object with Specific Record Count
Create a database connection using an ODBC driver. Then, create a DatabaseDatastore
object by setting the ReadSize
property, and preview a large data set.
Create a database connection to the ODBC data source MySQL ODBC
. Specify the user name and password.
datasource = "MySQL ODBC"; username = "username"; password = "password"; dataSourceInfo = database(datasource,username,password);
Create a DatabaseDatastore
object using a database connection and an SQL query. This SQL query retrieves all flight data from the airlinesmall
table. Specify reading a maximum of 1000 records from the executed SQL query. databaseDatastore
executes the SQL query.
sqlquery = 'select * from airlinesmall'; dbds = databaseDatastore(dataSourceInfo,sqlquery,'ReadSize',1000)
dbds = DatabaseDatastore with properties: Connection: [1×1 database.odbc.connection] Query: 'select * from airlinesmall' VariableNames: {1×29 cell} SelectedVariableNames: {1×29 cell} VariableNamingRule: 'modify' ReadSize: 1000
dbds
is a DatabaseDatastore
object with these properties:
Connection
-- Database connection objectQuery
-- Executed SQL queryVariableNames
-- List of column names from the executed SQL queryReadSize
-- Maximum number of records to read from the executed SQL query
Display the database connection property.
dbds.Connection
ans = connection with properties: DataSource: 'MySQL ODBC' UserName: 'root' Message: '' Type: 'ODBC Connection Object' Database Properties: AutoCommit: 'on' ReadOnly: 'off' LoginTimeout: 0 MaxDatabaseConnections: 0 Catalog and Schema Information: DefaultCatalog: 'toy_store' Catalogs: {'information_schema', 'mysql', 'performance_schema' ... and 3 more} Schemas: {} Database and Driver Information: DatabaseProductName: 'MySQL' DatabaseProductVersion: '8.0.3-rc-log' DriverName: 'myodbc8a.dll' DriverVersion: '08.00.0016'
The Message
property is blank when the database connection is successful.
Preview the first eight records in the large data set returned by executing the SQL query in the DatabaseDatastore
object.
preview(dbds)
ans = 8×29 table Year Month DayofMonth DayOfWeek DepTime CRSDepTime ArrTime CRSArrTime UniqueCarrier FlightNum TailNum ActualElapsedTime CRSElapsedTime AirTime ArrDelay DepDelay Origin Dest Distance TaxiIn TaxiOut Cancelled CancellationCode Diverted CarrierDelay WeatherDelay NASDelay SecurityDelay LateAircraftDelay ____ _____ __________ _________ _______ __________ _______ __________ _____________ _________ _______ _________________ ______________ _______ ________ ________ _______ _______ ________ ______ _______ _________ ________________ ________ ____________ ____________ ________ _____________ _________________ 1990 9 22 6 1801 1750 2005 1938 {'NW'} 209 {'NA'} 124 108 {'NA'} 27 11 {'PHL'} {'DTW'} 453 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'} 1990 9 11 2 908 910 1613 1554 {'NW'} 248 {'NA'} 245 224 {'NA'} 19 -2 {'PHX'} {'DTW'} 1671 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'} 1990 9 2 7 NaN 1805 NaN 1900 {'NW'} 284 {'NA'} NaN 55 {'NA'} NaN NaN {'JAN'} {'MEM'} 189 {'NA'} {'NA'} 1 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'} 1990 9 29 6 1434 1435 1615 1630 {'NW'} 305 {'NA'} 221 235 {'NA'} -15 -1 {'MSP'} {'LAX'} 1536 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'} 1990 9 3 1 925 755 1258 1144 {'NW'} 350 {'NA'} 153 169 {'NA'} 74 90 {'MSP'} {'BOS'} 1124 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'} 1990 9 22 6 900 900 1241 1222 {'AA'} 11 {'NA'} 401 382 {'NA'} 19 0 {'BOS'} {'LAX'} 2611 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'} 1990 9 20 4 1338 1335 1853 1907 {'AA'} 62 {'NA'} 255 272 {'NA'} -14 3 {'ORD'} {'SJU'} 2072 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'} 1990 9 3 1 710 711 837 847 {'AA'} 101 {'NA'} 147 156 {'NA'} -10 -1 {'DTW'} {'DFW'} 987 {'NA'} {'NA'} 0 {'NA'} 0 {'NA'} {'NA'} {'NA'} {'NA'} {'NA'}
Close the DatabaseDatastore
object and the database connection.
close(dbds)
Create DatabaseDatastore
Object Using Custom Import Options and Database Catalog and Schema
Customize import options when importing a large data set from
a database table. Control the import options by creating an
SQLImportOptions
object. Then, customize the import
options for database columns that contain logical data. Create a
DatabaseDatastore
object using the specified database
catalog and schema. Import the database data and preview it by using the
preview
function with the
DatabaseDatastore
object.
This example uses the airlinesmall_subset.xls
spreadsheet,
which contains the column Cancelled
. Also, the example uses a
Microsoft® SQL Server® Version 11.00.2100 database and the Microsoft SQL Server
Driver 11.00.5058.
Create a database connection to a Microsoft SQL Server database with Windows® authentication. Specify a blank username and password.
datasource = 'MS SQL Server Auth'; dataSourceInfo = database(datasource,'','');
Load flight information into the MATLAB® workspace.
flights = readtable('airlinesmall_subset.xlsx');
Create the flights
database table using the flight information
and the toy_store
database catalog and dbo
database schema.
tablename = 'flights'; sqlwrite(dataSourceInfo,tablename,flights, ... 'Catalog','toy_store','Schema','dbo')
Create an SQLImportOptions
object using the
flights
database table and the
databaseImportOptions
function. Specify the
toy_store
database catalog and dbo
database schema.
opts = databaseImportOptions(dataSourceInfo,tablename, ... 'Catalog','toy_store','Schema','dbo');
Retrieve the default import options for the
Cancelled
variable.
varnames = 'Cancelled';
varOpts = getoptions(opts,varnames)
varOpts = SQLVariableImportOptions with properties: Variable Properties : Name: 'Cancelled' Type: 'double' FillValue: NaN
Set the import options for the data type of the specified variable to
logical
. Also, set the import options to replace missing data
in the specified variable with the fill value true
.
opts = setoptions(opts,varnames,'Type','logical', ... 'FillValue',true);
Create the DatabaseDatastore
object to import a large data set
using import options, the toy_store
database catalog, and the
dbo
database schema.
dbds = databaseDatastore(dataSourceInfo,tablename,opts, ... 'Catalog','toy_store','Schema','dbo');
Import the logical data in the selected variable and display a preview of the
data. The imported data shows that the variable has the logical
data type.
opts.SelectedVariableNames = varnames; data = preview(dbds); cancelled = data.Cancelled
cancelled = 8×1 logical array
0
0
0
0
0
0
0
0
Delete the flights
database table from the
toy_store
database catalog and the dbo
database schema by using the execute
function.
sqlquery = ['DROP TABLE toy_store.dbo.' tablename];
execute(dataSourceInfo,sqlquery)
Close the database connection.
close(dataSourceInfo)
Version History
Introduced in R2014bSee Also
database
| sqlread
| fetch
| databaseImportOptions
| setoptions
| getoptions
| execute
| reset
| preview
| mysql