splitsqlquery
Split SQL query using paging
Syntax
Description
splits an SQL query into a basket of multiple SQL queries. By default, each SQL
query in the basket returns 100,000 rows in a batch. The resulting number of SQL
queries in the basket depends on the size of the original SQL query results.querybasket
= splitsqlquery(conn
,sqlquery
)
specifies a custom batch size for the number of rows returned by each SQL query in
the basket.querybasket
= splitsqlquery(conn
,sqlquery
,'SplitSize',splitsize
)
Examples
Access Large Data from SQL Query Using Database Toolbox™
Determine the minimum arrival delay using a large set of flight data stored in a database. Access the database in a serial MATLAB® environment.
Using the splitsqlquery
function, you can split the original SQL query into multiple SQL page queries. Then, you can access large data in chunks by using the fetch
function.
To run this example, you must configure a JDBC data source. For more information, see the
function.configureJDBCDataSource
Create a database connection to the JDBC data source MSSQLServerJDBCAuth
. This data source configures a JDBC driver to a Microsoft® SQL Server® database with Windows® authentication. Specify a blank user name and password.
datasource = "MSSQLServerJDBCAuth"; username = ""; password = ""; conn = database(datasource,username,password);
Define an SQL query to select all columns from the airlinesmall
table, which contains 123,523 rows and 29 columns.
sqlquery = 'SELECT * FROM airlinesmall';
Split the original SQL query into multiple page queries and display them.
querybasket = splitsqlquery(conn,sqlquery)
querybasket = 2×1 string array " SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 0 ROWS FETCH NEXT 100000 ROWS ONLY" " SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 100000 ROWS FETCH NEXT 23523 ROWS ONLY"
The query basket contains the page queries in a string array. The splitsqlquery
function splits the queries using the default number of rows (100,000).
Define the airlinesdata
variable.
airlinesdata = [];
Define the minimum arrival delay minArrDelay
variable.
minArrDelay = [];
Execute the SQL page queries in querybasket
by using a for
loop, and import the data in chunks. Execute SQL page queries in the query basket, and import large data using the fetch
function. Find and store the local minimum arrival delay for each chunk.
for i = 1: length(querybasket) local_airlinesdata = fetch(conn,querybasket(i)); local_minArrDelay = min(local_airlinesdata.ArrDelay); minArrDelay = [minArrDelay; local_minArrDelay]; end
Find the minimum arrival delay from all the stored delays.
minArrDelay = min(minArrDelay)
minArrDelay = -64
Close the database connection.
close(conn)
Access Large Data from SQL Query Using Database Toolbox and Parallel Computing Toolbox
Determine the minimum arrival delay using a large set of flight data stored in a database. Access the database using a parallel pool.
To initialize a parallel pool with a JDBC database connection, you must configure a JDBC data source. For more information, see the
function.configureJDBCDataSource
Using the splitsqlquery
function, you can split the original SQL query into multiple SQL page queries. Then, you can access large data in chunks by executing each SQL page query on a separate worker in the pool.
When you import large data, the performance depends on the SQL query, amount of data, machine specifications, and type of data analysis. To manage the performance, use the splitsize
input argument of the splitsqlquery
function.
If you have a MATLAB® Parallel Server™ license, then use the
function with the cluster profile of your choice instead of the parpool
(Parallel Computing Toolbox)
function.gcp
(Parallel Computing Toolbox)
Create a database connection to the JDBC data source MSSQLServerJDBCAuth
. This data source configures a JDBC driver to a Microsoft® SQL Server® database with Windows® authentication. Specify a blank user name and password.
datasource = "MSSQLServerJDBCAuth"; username = ""; password = ""; conn = database(datasource,username,password);
Define an SQL query to select all columns from the airlinesmall
table, which contains 123,523 rows and 29 columns.
sqlquery = 'SELECT * FROM airlinesmall';
Split the original SQL query into multiple page queries and display them. Specify a split size of 10,000 rows.
splitsize = 10000;
querybasket = splitsqlquery(conn,sqlquery,'SplitSize',splitsize)
querybasket = 13×1 string array " SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 0 ROWS FETCH NEXT 10000 ROWS ONLY" " SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 10000 ROWS FETCH NEXT 10000 ROWS ONLY" " SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 20000 ROWS FETCH NEXT 10000 ROWS ONLY" " SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 30000 ROWS FETCH NEXT 10000 ROWS ONLY" " SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 40000 ROWS FETCH NEXT 10000 ROWS ONLY" " SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 50000 ROWS FETCH NEXT 10000 ROWS ONLY" " SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 60000 ROWS FETCH NEXT 10000 ROWS ONLY" " SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 70000 ROWS FETCH NEXT 10000 ROWS ONLY" " SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 80000 ROWS FETCH NEXT 10000 ROWS ONLY" " SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 90000 ROWS FETCH NEXT 10000 ROWS ONLY" " SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 100000 ROWS FETCH NEXT 10000 ROWS ONLY" " SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 110000 ROWS FETCH NEXT 10000 ROWS ONLY" " SELECT * FROM (SELECT * FROM airlinesmall) temp ORDER BY 1 OFFSET 120000 ROWS FETCH NEXT 3523 ROWS ONLY"
The query basket contains the page queries in a string array. Each SQL query in the basket, except the last one, returns 10,000 rows.
Close the database connection.
close(conn)
Start the parallel pool.
pool = gcp;
Starting parallel pool (parpool) using the 'local' profile ... Connected to the parallel pool (number of workers: 6).
Initialize the parallel pool using the JDBC data source.
c = createConnectionForPool(pool,datasource,username,password);
Define the airlinesdata
variable.
airlinesdata = [];
Define the minimum arrival delay minArrDelay
variable.
minArrDelay = [];
Use the parfor
function to parallelize data access using the query basket.
For each worker:
Retrieve the database connection object.
Execute the SQL page query from the query basket and import data locally.
Find the local minimum arrival delay.
Store the local minimum arrival delay.
parfor i = 1: length(querybasket) conn = c.Value; local_airlinesdata = fetch(conn,querybasket(i)); local_minArrDelay = min(local_airlinesdata.ArrDelay); minArrDelay = [minArrDelay; local_minArrDelay]; end
Find the minimum arrival delay using the stored delays from each worker.
minArrDelay = min(minArrDelay)
minArrDelay = -64
Close the parallel pool.
delete(pool)
Input Arguments
conn
— Database connection
connection
object
Database connection, specified as a connection
object created with the
database
function, connection
object created with the mysql
function, connection
object created with the postgresql
function, or sqlite
object.
Create a parallelizable databaseDatastore
object by first creating
a parallel pool constant. You can use the getSecret
function to retrieve your user credentials when you create this constant.
Example: conn =
parallel.pool.Constant(@()postgresql(getSecret("PostgreSQL.username"),getSecret("Postgresql.password"),"Server","localhost","DatabaseName","toy_store"),@close);
sqlquery
— SQL statement
character vector | string scalar
SQL statement, specified as a character vector or string scalar.
For information about the SQL query language, see the SQL Tutorial.
Example: SELECT * FROM invoice
selects all columns and rows
from the invoice
table.
Data Types: char
| string
splitsize
— SQL query split size
100000
(default) | numeric scalar
SQL query split size, specified as a numeric scalar. Specify this number to split an SQL query into a custom number of rows for each batch.
If the total number of rows returned from the original SQL query is less
than 100,000 (the default), then the splitsqlquery
function returns the original SQL query. Use this input argument to specify
a smaller number of rows in a batch.
Data Types: double
Output Arguments
querybasket
— SQL query basket
string array
SQL query basket, returned as a string array. Each SQL query in the basket is returned as a string scalar in the string array.
You can execute each SQL query in the basket using the fetch
function. Or, you
can run a parallel pool and assign each SQL query to a worker for
execution.
Limitations
The
splitsqlquery
function supports these databases only:Microsoft® SQL Server® 2012 and later
Oracle®
MySQL®
PostgreSQL
SQLite
Amazon Redshift®
Amazon Aurora®
Google® Cloud SQL that runs an instance of MySQL or PostgreSQL
MariaDB®
If the
connection
object uses an unsupported database, thesplitsqlquery
function displays a warning and returns the original SQL query.The
splitsqlquery
function does not support the MATLAB® interface to SQLite.
Version History
Introduced in R2017b
See Also
fetch
| database
| close
| javaaddpath
| addAttachedFiles
(Parallel Computing Toolbox) | parpool
(Parallel Computing Toolbox) | parfevalOnAll
(Parallel Computing Toolbox) | parallel.pool.Constant
(Parallel Computing Toolbox)
External Websites
MATLAB Command
You clicked a link that corresponds to this MATLAB command:
Run the command by entering it in the MATLAB Command Window. Web browsers do not support MATLAB commands.
Select a Web Site
Choose a web site to get translated content where available and see local events and offers. Based on your location, we recommend that you select: .
You can also select a web site from the following list
How to Get Best Site Performance
Select the China site (in Chinese or English) for best site performance. Other MathWorks country sites are not optimized for visits from your location.
Americas
- América Latina (Español)
- Canada (English)
- United States (English)
Europe
- Belgium (English)
- Denmark (English)
- Deutschland (Deutsch)
- España (Español)
- Finland (English)
- France (Français)
- Ireland (English)
- Italia (Italiano)
- Luxembourg (English)
- Netherlands (English)
- Norway (English)
- Österreich (Deutsch)
- Portugal (English)
- Sweden (English)
- Switzerland
- United Kingdom (English)
Asia Pacific
- Australia (English)
- India (English)
- New Zealand (English)
- 中国
- 日本Japanese (日本語)
- 한국Korean (한국어)