This example shows how to import a large set of flight data from a MongoDB® collection into the MATLAB® workspace using the Database Toolbox™ interface for MongoDB. To avoid out-of-memory issues with the Java® heap when retrieving many documents, use a loop to import large data in batches.
To run this example, you must first install the Database Toolbox interface for MongoDB. For details, see Database Toolbox Interface for MongoDB Installation.
Create a MongoDB connection to the database mongotest. Here, the database server dbtb01 hosts this database using port number 27017.
server = "dbtb01";
port = 27017;
dbname = "mongotest";
conn = mongo(server,port,dbname)
conn = 
  mongo with properties:

           Database: 'mongotest'
           UserName: ''
             Server: {'dbtb01'}
               Port: 27017
    CollectionNames: {'airlinesmall', 'employee', 'largedata' ... and 3 more}
     TotalDocuments: 23485919
conn is the mongo object that contains the MongoDB connection. The object properties contain information about the connection and the database.
The database name is mongotest.
The user name is blank.
The database server is dbtb01.
The port number is 27017.
This database contains six document collections. The first three collection names are airlinesmall, employee, and largedata.
This database contains 23,485,919 documents.
Verify the MongoDB connection.
isopen(conn)
ans = logical 1
The database connection is open because the isopen function returns 1. A return value of 0 indicates that the connection is closed.
Find the total number of documents totaldocs in the airlinesmall collection for the years 1997 through 2010.
Use a MongoDB query to filter the flight data for the specified years.
collection = "airlinesmall";
mongoquery = '{"Year":{$gte:1997,$lte:2010}}';
totaldocs = count(conn,collection,'Query',mongoquery);
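If the year range needs to change, the query text can be built from variables instead of being typed by hand. The following is a minimal sketch. The names startyear and endyear are hypothetical and do not appear in the original example. The resulting character vector is identical to the mongoquery value above.

```matlab
% Hypothetical variables holding the first and last year of interest
startyear = 1997;
endyear = 2010;

% Build the MongoDB query text; $gte and $lte are inclusive bounds
mongoquery = sprintf('{"Year":{$gte:%d,$lte:%d}}',startyear,endyear);
disp(mongoquery)
```

Because the same query text is used for both counting and retrieving documents, building it once keeps the two calls consistent.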
Set the batch size to 15,000 documents. Initialize the MATLAB workspace variable that stores the retrieved data.
batchsize = 15000;
flightdata = [];
You can change the batch size depending on the performance and memory capacity of your system.
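To see how a given batch size divides the work, you can compute the number of batches and the size of the final, partial batch. This sketch assumes that totaldocs equals 75603, the number of structures that the whos output reports at the end of this example. In practice, use the value returned by count.

```matlab
% Assumed document count, taken from the size of flightdata reported by whos
totaldocs = 75603;
batchsize = 15000;

% Number of round trips needed to read every matching document
numbatches = ceil(totaldocs/batchsize);

% Documents remaining for the final batch
lastbatch = totaldocs - (numbatches - 1)*batchsize;

fprintf('%d batches, last batch holds %d documents\n',numbatches,lastbatch);
```

With these values, the loop makes six retrievals, and the last one returns 603 documents.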
Use a while loop to retrieve flight data from the collection. The variable flightdata accumulates each batch of retrieved data.
% Track number of documents read
index = 0;
while index < totaldocs
    % Retrieve documents in a batch
    localdata = find(conn,collection,'Query',mongoquery, ...
        'Skip',index,'Limit',batchsize);
    % Store retrieved documents locally
    flightdata = [flightdata; localdata];
    % Move to the next batch
    index = index + batchsize;
end
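Growing flightdata inside the loop copies the accumulated array on every iteration. One possible alternative, shown here only as a sketch, stores each batch in a preallocated cell array and concatenates once at the end. To keep the sketch runnable without a database, fetchbatch is a hypothetical stand-in for the find call with the 'Skip' and 'Limit' arguments, and the small totaldocs and batchsize values are assumptions for illustration.

```matlab
% Assumed small values for illustration only
totaldocs = 25;
batchsize = 10;

% Hypothetical stand-in for:
%   find(conn,collection,'Query',mongoquery,'Skip',skip,'Limit',limit)
% It returns a struct array with one element per simulated document.
fetchbatch = @(skip,limit) ...
    struct('id',num2cell((skip+1:min(skip+limit,totaldocs))'));

% Preallocate one cell per batch
numbatches = ceil(totaldocs/batchsize);
batches = cell(numbatches,1);

for k = 1:numbatches
    skip = (k - 1)*batchsize;               % documents already read
    batches{k} = fetchbatch(skip,batchsize);
end

% Concatenate all batches in a single step
flightdata = vertcat(batches{:});
```

As with the loop above, this approach assumes that every batch returns a structure array with the same fields.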
Display information about the flightdata variable. The retrieved data is a structure array that contains 75,603 structures. Each structure contains 30 fields of flight data.
whos flightdata
  Name                Size                Bytes  Class     Attributes

  flightdata      75603x1             285102752  struct              
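The whos output also gives a rough basis for choosing a batch size. The following sketch divides the reported bytes by the number of structures to estimate the MATLAB workspace memory per document and per batch. This is only an approximation: the memory that the Java heap uses during retrieval can differ from the size of the resulting structure array.

```matlab
% Values copied from the whos output above
totalbytes = 285102752;
numdocs = 75603;
batchsize = 15000;

% Approximate workspace memory per document, in bytes
bytesperdoc = totalbytes/numdocs;

% Approximate workspace memory for one full batch, in megabytes (10^6 bytes)
batchmb = bytesperdoc*batchsize/1e6;

fprintf('About %.0f bytes per document, %.1f MB per batch\n', ...
    bytesperdoc,batchmb);
```

With these figures, each document occupies roughly 3.8 KB in the workspace, so a batch of 15,000 documents occupies roughly 57 MB.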
Close the MongoDB connection.
close(conn)
See also: close | count | find | isopen | mongo | while