Quick Start Parallel Computing in MATLAB
You can use parallel computing to carry out many calculations simultaneously. Split large problems into smaller ones, which you can process at the same time.
With parallel computing, you can:
- Save time by distributing tasks and executing them simultaneously
- Solve big data problems by partitioning data
- Take advantage of your desktop computer resources and scale up to clusters and cloud computing
This table lists some essential parallel computing terms and their definitions.
Term | Definition |
---|---|
Thread | Smallest set of instructions that a CPU can schedule and execute independently. A GPU, multiprocessor, or multicore computer can perform multithreading, which is the simultaneous execution of multiple threads. |
Process | Execution of an instance of a computer program by one or many threads. Each process has its own blocks of memory. |
Node | Standalone computer containing one or more CPUs or GPUs. Nodes can be networked to form a cluster or supercomputer. |
Cluster | Collection of interconnected computers that work together as a unified system to provide high-performance computing power for processing complex and data-intensive tasks. |
Scalability | Increase in parallel speedup with the addition of more resources. |
Prerequisites
To run the examples on this page, you must have a Parallel Computing Toolbox™ license. To determine whether you have Parallel Computing Toolbox installed, and whether your machine can create a default parallel pool, enter this code in the MATLAB® Command Window.
    if canUseParallelPool
        disp("Parallel Computing Toolbox is installed")
    else
        disp("Parallel Computing Toolbox is not installed")
    end
Alternatively, to see which MathWorks® products you have installed, in the Command Window, enter ver.
Accelerate MATLAB Code
Before you parallelize your code, you can use techniques such as vectorization and preallocation to improve the sequential performance of your MATLAB code. Sequential acceleration and parallelization can often work together to give cumulative performance improvements.
Vectorization
MATLAB is optimized for operations involving matrices and vectors. The process of revising loop-based, scalar-oriented code to use MATLAB matrix and vector operations is called vectorization. Using vectorized code instead of loop-based operations often improves your code performance.
These code snippets compare the amount of time the software needs to calculate the square root of 1,000,000 values with loop-based code against vectorized code.
Approach | Code | Elapsed Time |
---|---|---|
Without vectorization | tic; for k = 1:1000000, x(k) = sqrt(k); end; toc | 0.112298 seconds |
With vectorization | tic; k = 1:1000000; x = sqrt(k); toc | 0.006783 seconds |
Preallocation
In some cases, while- and for-loops that incrementally increase the size of an array each time through the loop can adversely affect performance and memory use. You can preallocate the maximum amount of space required for an array instead of continuously resizing arrays when you run loop-based code.
These code snippets compare the amount of time the software needs to fill a 1-by-1,000,000 array x when you start from a scalar and gradually increase the size of x in a for-loop against when you preallocate a 1-by-1,000,000 block of memory for x.
Approach | Code | Elapsed Time |
---|---|---|
Without preallocation | tic; x = 0; for k = 2:1000000, x(k) = x(k-1) + 5; end; toc | 0.103415 seconds |
With preallocation | tic; x = zeros(1,1000000); for k = 2:1000000, x(k) = x(k-1) + 5; end; toc | 0.018758 seconds |
This table shows the appropriate preallocation function for the type of array you want to initialize.
Array Type to Initialize | Preallocation Function |
---|---|
Numeric | zeros |
String | strings |
Cell | cell |
Table | table |
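As a minimal sketch of the functions in this table, the following code preallocates a string array, a cell array, and a table, and then fills them in a loop without resizing. The variable names, the size n, and the table variable names Value and Label are hypothetical and chosen only for illustration.

```matlab
% Preallocate containers of different types before filling them in a loop.
n = 5;                                    % hypothetical number of elements

names   = strings(1,n);                   % 1-by-n array of empty strings
chunks  = cell(1,n);                      % 1-by-n cell array of empty cells
results = table('Size',[n 2], ...
    'VariableTypes',{'double','string'}, ...
    'VariableNames',{'Value','Label'});   % n-by-2 table with default entries

for k = 1:n
    names(k)  = "item" + k;               % overwrite in place, no resizing
    chunks{k} = rand(k);                  % each cell can hold a different size
    results.Value(k) = k^2;
    results.Label(k) = names(k);
end
```

Because every container already has its final size, each loop iteration only overwrites existing elements.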
Run MATLAB on Multicore and Multiprocessor Nodes
MATLAB supports two ways to parallelize your code on multicore and multiprocessor nodes.
Implicit Parallelization with Built-in Multithreading
Some MATLAB functions implicitly use multithreading to parallelize their execution. These functions automatically execute on multiple computational threads in a single MATLAB session, which means they run faster on multicore-enabled machines. Some examples are linear algebra and numerical functions such as fft, mldivide, eig, svd, and sort. Therefore, if you use these functions on a machine with many cores, you can observe an increase in performance.
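One way to observe implicit multithreading is to time a multithreaded function with the default number of computational threads and again with a single thread. This is only a sketch: the matrix size is an arbitrary choice, and the measured difference depends on your hardware and MATLAB release.

```matlab
% Compare a multithreaded function with one thread against the default.
A = rand(1000);                           % hypothetical test matrix

defaultThreads = maxNumCompThreads;       % current number of computational threads
tDefault = timeit(@() eig(A));            % time eig with the default thread count

maxNumCompThreads(1);                     % restrict MATLAB to one computational thread
tSingle = timeit(@() eig(A));

maxNumCompThreads(defaultThreads);        % restore the original setting

fprintf("eig with %d thread(s): %.3f s, with 1 thread: %.3f s\n", ...
    defaultThreads, tDefault, tSingle);
```

Restoring the original thread count at the end keeps the rest of the session unaffected.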
Explicit Parallelization with MATLAB Workers
MATLAB and Parallel Computing Toolbox software use MATLAB workers to explicitly parallelize your code. MATLAB workers are MATLAB computational engines that run in the background without a graphical desktop. The MATLAB session you interact with, also called the MATLAB client, instructs the workers with parallel language functions. You use Parallel Computing Toolbox functions to automatically divide tasks and assign them to these workers to execute the computations in parallel.
Set Up Environment for Explicit Parallelization
If you have Parallel Computing Toolbox installed on your machine, you can start an interactive parallel pool of workers to take advantage of the cores in your multicore computer.
A parallel pool (parpool) is a group of MATLAB workers on which you can interactively run code.
You can create a parallel pool of workers using parpool or functions with automatic parallel support. By default, parallel language functions such as parfor, parfeval, and spmd automatically create a parallel pool when you need one. When the workers start, your MATLAB session connects to them. For example, this code automatically starts a parallel pool and runs the statement in the parfor-loop in parallel on six workers.
    parfor i = 1:100
        c(i) = max(eig(rand(1000)));
    end

Starting parallel pool (parpool) using the 'Processes' profile ...
Connected to parallel pool with 6 workers.
You can also use the parallel status indicator in the lower left corner of the MATLAB desktop to start a parallel pool manually. Click the indicator icon, and then select Start Parallel Pool.
To stop a parallel pool while it is starting, press Ctrl+C or Ctrl+Break. On Apple macOS operating systems, you also can use Command+. (the Command key and the period key).
Starting a parallel pool often takes a long time, which can impact performance for code that takes only a few seconds to execute. For longer running code, the overhead becomes less significant.
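To see how much of the elapsed time is pool startup rather than computation, you can time the two steps separately. The following sketch assumes the default profile and uses a smaller matrix size than the earlier example only to keep the run short.

```matlab
% Measure how long the pool takes to start, separately from the computation.
pool = gcp("nocreate");                   % return the current pool without starting one
if isempty(pool)
    tic
    pool = parpool;                       % start a pool with the default profile
    startupTime = toc;
    fprintf("Pool startup took %.1f s\n", startupTime);
end

tic
parfor i = 1:100
    c(i) = max(eig(rand(200)));
end
computeTime = toc;
fprintf("parfor computation took %.1f s on %d workers\n", ...
    computeTime, pool.NumWorkers);
```

If the pool is already running, the startup branch is skipped and only the computation is timed.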
Your default parallel environment determines the parallel pool cluster. The default parallel environment of your local machine is called Processes. This environment starts a parallel pool of process workers. You can see the selection of available cluster profiles in the Parallel menu on the MATLAB Home tab.
Note
For the default Processes profile, the default number of process workers is one per physical CPU core using a single computational thread. This restriction ensures that each worker has exclusive access to a floating-point unit, and generally optimizes performance of computational code. If your code is not computationally intensive, for example, code that is input/output (I/O) intensive, then consider using up to two workers per physical core. Running too many workers on too few resources can impact the performance and stability of your machine.
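The following sketch shows one way to inspect the default number of workers for the Processes profile and to start a pool with an explicit size. The requested size of two workers is a hypothetical choice. The profile name Processes follows this page; if your release uses a different name for the local profile, substitute that name.

```matlab
% Inspect the local profile and start a pool with an explicit number of workers.
c = parcluster("Processes");              % cluster object for the local Processes profile
fprintf("Default maximum number of workers: %d\n", c.NumWorkers);

delete(gcp("nocreate"));                  % close any existing pool first
nRequested = min(2, c.NumWorkers);        % hypothetical choice; adjust to your machine
pool = parpool(c, nRequested);            % start a pool with the requested size
nStarted = pool.NumWorkers;
fprintf("Pool is running with %d workers\n", nStarted);

delete(pool);                             % shut the pool down again
```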
This table summarizes the different ways you can create interactive parallel pools.
Parallel Environment | Worker Type | Location | Number of Available Cores or Threads |
---|---|---|---|
Processes | Process | Local machine | Up to 512 cores |
Threads | Thread | Local machine | Up to 512 threads |
backgroundPool | Thread | Local machine | Without a Parallel Computing Toolbox license: 1 thread. With a Parallel Computing Toolbox license: up to the number of threads that the maxNumCompThreads function returns |
Cluster | Process | Onsite or cloud cluster | Up to the maximum number of workers the cluster can start |
Parallel Computing Toolbox also supports running a parallel pool of workers that are backed by computing threads instead of process workers. This parallel environment is called Threads. Thread workers have reduced memory usage, faster scheduling, and lower data transfer costs. However, thread workers support only a subset of the MATLAB functions that are available to process workers.
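As a brief sketch, you can start a thread-based pool by passing the environment name to parpool. The loop body here assumes that svd and randn are among the functions supported on thread workers in your release; check the function reference pages if in doubt.

```matlab
% Run a parfor-loop on a pool of thread workers.
delete(gcp("nocreate"));                  % only one parallel pool can be open at a time
pool = parpool("Threads");                % pool backed by thread workers

y = zeros(1,50);
parfor n = 1:50
    y(n) = max(svd(randn(100)));          % assumed to be supported on thread workers
end

delete(pool);                             % shut the pool down again
```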
MATLAB also supports an additional local parallel environment called backgroundPool. The backgroundPool environment is backed by thread workers and supports running code in the background while you run other code in your session at the same time. You can use one thread worker in the backgroundPool environment when you do not have a Parallel Computing Toolbox license. If you have a Parallel Computing Toolbox license, the maximum number of thread workers in your backgroundPool is the value that the maxNumCompThreads function returns.
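A minimal sketch of background execution uses parfeval with backgroundPool as the first argument. The choice of magic as the function to run is arbitrary and only for illustration.

```matlab
% Run a function in the background while the client stays available.
f = parfeval(backgroundPool, @magic, 1, 4);   % request 1 output from magic(4)

disp("The client can run other code here while the worker computes.");

M = fetchOutputs(f);                      % block until the result is ready
disp(M);
```

The call to parfeval returns immediately with a future object, and fetchOutputs waits only when you ask for the result.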
If you have access to onsite or cloud clusters, you can discover other clusters running on your network or on Cloud Center by clicking Parallel > Discover Clusters and following the prompts. Parallel pools on clusters are backed by process workers and support the full parallel language.
When you have an interactive parallel pool of workers, you can use parallel language functions to split large problems into smaller tasks that workers can execute in parallel. To accelerate your MATLAB code, use interactive parallel features such as parfor.
Run Explicit Parallelization with parfor-loop
This example shows how to convert a for-loop into a parfor-loop and calculate the scalability of the parfor-loop with the number of workers.
You can convert for-loops to run in parallel by using a parfor-loop. Often, you can simply replace for with parfor. However, you often need to adjust your code further to run it in parallel.
Mechanics of parfor-loops
When you run a parfor-loop, MATLAB executes the statements in the loop body in parallel. Each execution of the parfor-loop body is an iteration. The MATLAB client issues the parfor command and coordinates with the workers to execute the loop iterations in parallel on the workers in a parallel pool. A parfor-loop can provide significantly better performance than its analogous for-loop because several workers compute iterations simultaneously.
When you run a parfor-loop, the MATLAB client divides the loop iterations into subranges and assigns them to the workers. If the number of workers is equal to the number of loop iterations, each worker performs one iteration of the loop. If the number of iterations is greater than the number of workers, some workers perform more than one loop iteration. In this case, a worker receives multiple iterations at once to reduce communication time. The client also performs a static analysis of the parfor-loop code to determine which data to transfer to each worker and which data to transfer back to the client. The client sends the necessary data to the workers, which execute most of the computation. The workers then send the results back to the client, which assembles those results. MATLAB workers evaluate iterations in no particular order and independently of each other. Because each iteration is independent, the iterations need not be synchronized, and often are not.
A parfor-loop must satisfy these basic requirements.
- Loop iterations are independent. When you convert your for-loop into a parfor-loop, you must ensure that the loop iterations are independent. If your parfor code has dependence between the loop iterations, the Code Analyzer in the MATLAB Editor detects the dependence. Executing the parfor-loop generates an error.
- Loop execution is not in order. Because parfor-loop iterations have no guaranteed order, you must ensure that your code that uses a parfor-loop does not rely on the output of the parfor-loop being in order.
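The independence requirement can be illustrated with a small sketch. The first loop has a dependence between iterations and so is written as a for-loop. The second loop computes the same values from the loop index alone, which makes it suitable for parfor. The third loop accumulates a sum in a reduction variable, a pattern that parfor accepts because the result does not depend on the order of the additions. The variable names and the size n are hypothetical.

```matlab
n = 1000;

% Not suitable for parfor: iteration k reads the result of iteration k-1.
x = zeros(1,n);
for k = 2:n
    x(k) = x(k-1) + 5;
end

% Independent reformulation: each element depends only on its own index.
xPar = zeros(1,n);
parfor k = 1:n
    xPar(k) = 5*(k-1);
end

% Reduction variable: workers accumulate partial sums in any order.
total = 0;
parfor k = 1:n
    total = total + xPar(k);
end
```

Both versions produce the same array, so you can check a conversion by comparing the parfor result against the original for-loop result with isequal.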
Convert for-loops to parfor-loops
Convert a for-loop into a parfor-loop in code that calculates the maximum value of the singular-value decomposition of 5000 200-by-200 random matrices by replacing for with parfor. Execute the parfor-loop on six workers. Compare their execution times.
When you use parfor and you have Parallel Computing Toolbox software installed, MATLAB automatically starts a parallel pool of workers. The parallel pool can take a long time to start. This example shows a second run with the pool already started. You can observe that the parfor code executed on six workers runs much faster than the for-loop code.
    tic
    y = zeros(5000,1);
    for n = 1:5000
        y(n) = max(svd(randn(200)));
    end
    toc

Elapsed time is 21.837346 seconds.
    tic
    y = zeros(5000,1);
    parfor n = 1:5000
        y(n) = max(svd(randn(200)));
    end
    toc

Elapsed time is 3.908282 seconds.
If the speedup is less than you expect, you can calculate the scalability of your parfor-loop code.
Calculate Scalability
You can calculate the scalability of converting this for-loop into a parfor-loop. Use the scalability to determine whether your parfor-loop code scales well with the number of workers, and whether a limit exists.
Use a for-loop to iterate through different numbers of workers to run the parfor-loop. To specify the maximum number of workers, use the second input argument of parfor. You can modify the values in the numWorkers array to match your available resources.
    numIterations = 5000;
    numWorkers = [1 2 3 4 5 6];
    t = zeros(size(numWorkers));
    for w = 1:numel(numWorkers)
        tic;
        y = zeros(numIterations,1);
        parfor (n = 1:numIterations,numWorkers(w))
            y(n) = max(svd(randn(200)));
        end
        t(w) = toc;
    end
Calculate the speedup by computing the ratio between the computation time with a single worker and the computation time with each maximum number of workers. To calculate the efficiency of parallelizing the tasks, divide the calculated speedup by the ideal speedup, which is equal to the number of workers, and express the result as a percentage.
    speedup = t(1)./t;
    efficiency = (speedup./numWorkers).*100;
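As a worked illustration of these two formulas, the following sketch uses made-up timings rather than measured results. The names wDemo and tDemo and the timing values are hypothetical. With 4 workers and times of 20 s and 6 s, the speedup is 20/6, or about 3.33, and the efficiency is about 83%.

```matlab
% Worked example with hypothetical timings (not measured results).
wDemo = [1 2 4];                          % hypothetical numbers of workers
tDemo = [20 11 6];                        % hypothetical elapsed times in seconds

speedupDemo    = tDemo(1)./tDemo;         % speedup relative to one worker
efficiencyDemo = (speedupDemo./wDemo).*100;   % calculated speedup / ideal speedup, in percent

disp(table(wDemo', tDemo', speedupDemo', efficiencyDemo', ...
    'VariableNames',{'Workers','Time','Speedup','Efficiency'}));
```

An efficiency of 100% would mean that the speedup equals the number of workers.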
To visualize how the computations scale up with the number of workers, plot the speedup and efficiency against the number of workers with the comparePlot function defined at the end of the example.
The speedup increases as the number of workers increases. Adding more workers shows a reduction in computation time, but the scaling is not perfect because the efficiency decreases as the number of workers increases. This is due to the overhead associated with parallelization. Parallel overhead includes the time the software needs for communication, coordination, and data transfer from the client to the workers and back.
parfor-loops that do not have many iterations or computationally demanding tasks generally do not scale well with an increasing number of workers because the time the software needs for data transfer is significant compared with the time the software needs for computation.
comparePlot(numWorkers,speedup,efficiency);
After you finish your computation, you can delete the current parallel pool. Get the current parallel pool with the gcp function.
delete(gcp)
Parallel pool using the 'Processes' profile is shutting down.
Helper Functions
This function plots the speedup and efficiency of the parfor-loop against the number of workers.
    function comparePlot(numWorkers,speedup,efficiency)
    % Plot speedup (left axis) and efficiency (right axis) against the number of workers.
    yyaxis left
    plot(numWorkers,speedup,'-*')
    grid on
    title('Speedup and Efficiency with Number of Workers')
    xlabel('Number of Workers')
    xticks(numWorkers)
    ylabel('Speedup')
    yyaxis right
    plot(numWorkers,efficiency,'--o')
    ylabel('Efficiency (%)')
    legend('Speedup','Efficiency')
    end
Discover Other Parallel Language Functions
You can perform these tasks by using Parallel Computing Toolbox with other parallel language functions.
- Perform asynchronous processing with parfeval.
- Speed up your calculation on the supported GPUs of your computer by using gpuArray.
- Scale up your computation using big data processing tools, such as distributed and tall, with parallel pools.
- Offload your calculation to computer clusters or cloud computing facilities using batch.
- Run Simulink® models in parallel with parsim (Simulink) and batchsim (Simulink).
- Offload your calculation to a cluster onsite or in the cloud using MATLAB Parallel Server™ software. For more information, see Clusters and Clouds.
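As one small sketch of the GPU item in this list, the following code moves data to the GPU when a supported device is available and otherwise falls back to the CPU. The matrix size is an arbitrary choice, and whether the GPU version is faster depends on your hardware.

```matlab
% Compute singular values on the GPU if one is available, otherwise on the CPU.
x = rand(1000);                           % data in host memory
if canUseGPU
    xGpu = gpuArray(x);                   % copy the data to the GPU
    s = gather(svd(xGpu));                % compute on the GPU, copy the result back
else
    s = svd(x);                           % fall back to the CPU
end
fprintf("Largest singular value: %.4f\n", max(s));
```

Guarding the GPU code with canUseGPU lets the same script run on machines without a supported GPU.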
Several MathWorks products now offer built-in support for parallel computing products without requiring extra coding. For the current list of these products and their parallel functionality, see Parallel Computing Support in MATLAB and Simulink Products.
For more information about the parallel language functions and their applications, see Choose a Parallel Computing Solution and Parallel Language Decision Tables.
See Also
for | parfor | parfeval | gpuArray | distributed | tall | datastore | mapreduce | batch | parsim (Simulink) | batchsim (Simulink)
Related Topics
- Vectorization
- Preallocation
- Choose a Parallel Computing Solution
- Parallel Language Decision Tables
- Run Code on Parallel Pools
- Run MATLAB Functions with Automatic Parallel Support
- Decide When to Use parfor
- Evaluate Functions in the Background Using parfeval
- Identify and Select a GPU Device
- Distributing Arrays to Parallel Workers
- Run Single Programs on Multiple Data Sets
- Run Batch Parallel Jobs