Training and Simulation

Train and simulate reinforcement learning agents

During training, the agent continuously updates its parameters to learn the optimal policy for a given environment. During simulation, the agent receives observations and a reward from the environment, and returns an action to the environment without updating its parameters.

Reinforcement Learning Toolbox™ provides functions for training agents and validating the training results through simulation. For an introduction to training and simulating agents, see Train Reinforcement Learning Agents.
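The train-then-validate workflow described above can be sketched as follows. This is a minimal illustration, not a recommended configuration: the predefined cart-pole environment, the default DQN agent, and every option value are assumptions chosen only to keep the example short.

```matlab
% Minimal sketch: train a default DQN agent, then validate it by simulation.
% The environment choice and all option values are illustrative assumptions.
env = rlPredefinedEnv("CartPole-Discrete");
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

% Default agent built from the observation and action specifications.
agent = rlDQNAgent(obsInfo,actInfo);

% Training: the agent updates its parameters as it interacts with env.
trainOpts = rlTrainingOptions( ...
    MaxEpisodes=200, ...
    MaxStepsPerEpisode=500, ...
    StopTrainingCriteria="AverageReward", ...
    StopTrainingValue=480);
trainResult = train(agent,env,trainOpts);

% Simulation: the agent only selects actions; its parameters stay fixed.
simOpts = rlSimulationOptions(MaxSteps=500);
experience = sim(env,agent,simOpts);
totalReward = sum(experience.Reward.Data);
```

Here `totalReward` is the cumulative reward of the single simulated episode, which gives a quick check of whether the trained agent behaves as the training results suggest.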

Apps

Reinforcement Learning Designer

Design, train, and simulate reinforcement learning agents (Since R2021a)

Functions

Train Agents

`train`	Train reinforcement learning agents within a specified environment
`rlTrainingOptions`	Options for training reinforcement learning agents
`rlMultiAgentTrainingOptions`	Options for training multiple reinforcement learning agents (Since R2022a)
`trainWithEvolutionStrategy`	Train DDPG, TD3 or SAC agent using an evolutionary strategy within a specified environment (Since R2023b)
`rlEvolutionStrategyTrainingOptions`	Options for training off-policy reinforcement learning agents using an evolutionary strategy (Since R2023b)
`show`	Visualize a training result object in a new Reinforcement Learning Training Monitor window (Since R2024a)

Train Agents Offline

`trainFromData`	Train off-policy reinforcement learning agent using existing data (Since R2023a)
`rlTrainingFromDataOptions`	Options to train reinforcement learning agents using existing data (Since R2023a)
`show`	Visualize a training result object in a new Reinforcement Learning Training Monitor window (Since R2024a)

Evaluate Agents During Training

`rlEvaluator`	Options for evaluating reinforcement learning agents during training (Since R2023b)
`rlCustomEvaluator`	Custom object for evaluating reinforcement learning agents during training (Since R2023b)

Log Data

`rlDataLogger`	Create either a file logger object or a monitor logger object to log training data (Since R2022b)
`rlDataViewer`	Open Reinforcement Learning Data Viewer tool (Since R2023a)
`FileLogger`	Log reinforcement learning training data to MAT files (Since R2022b)
`MonitorLogger`	Log reinforcement learning training data to monitor window (Since R2022b)
`trainingProgressMonitor`	Monitor and plot training progress for deep learning custom training loops (Since R2022b)
`setup`	Set up reinforcement learning environment or initialize data logger object (Since R2022a)
`store`	Store data in the internal memory of a (file or monitor) logger object (Since R2022b)
`write`	Transfer stored data from the internal logger memory to the logging target (Since R2022b)
`cleanup`	Clean up reinforcement learning environment or data logger object (Since R2022a)

Simulate Agents

`sim`	Simulate trained reinforcement learning agents within specified environment
`rlSimulationOptions`	Options for simulating a reinforcement learning agent within an environment

Experience Buffer

`rlReplayMemory`	Replay memory experience buffer (Since R2022a)
`rlPrioritizedReplayMemory`	Replay memory experience buffer with prioritized sampling (Since R2022b)
`rlHindsightReplayMemory`	Hindsight replay memory experience buffer (Since R2023a)
`rlHindsightPrioritizedReplayMemory`	Hindsight replay memory experience buffer with prioritized sampling (Since R2023a)
`append`	Append experiences to replay memory buffer (Since R2022a)
`sample`	Sample experiences from replay memory buffer (Since R2022a)
`resize`	Resize replay memory experience buffer (Since R2022b)
`allExperiences`	Return all experiences in replay memory buffer (Since R2022b)
`validateExperience`	Validate experiences for replay memory (Since R2023a)
`generateHindsightExperiences`	Generate hindsight experiences from hindsight experience replay buffer (Since R2023a)

Custom Training

`rlOptimizer`	Create an optimizer object for actors and critics (Since R2022a)
`runEpisode`	Simulate reinforcement learning environment against policy or agent (Since R2022a)
`syncParameters`	Modify the learnable parameters of one approximator towards the learnable parameters of another approximator (Since R2022a)
`update`	Update the state of an optimizer object and a set of learnable parameters using the gradient value (Since R2022a)
`evaluate`	Evaluate function approximator object given observation (or observation-action) input data (Since R2022a)
`setup`	Set up reinforcement learning environment or initialize data logger object (Since R2022a)
`cleanup`	Clean up reinforcement learning environment or data logger object (Since R2022a)
`Future`	Object that supports deferred outputs for reinforcement learning environment simulations running on workers (Since R2022a)
`fetchNext`	Retrieve next available unread outputs from reinforcement learning environment simulations running on workers (Since R2022a)
`fetchOutputs`	Retrieve results from all reinforcement learning environment simulations running on workers (Since R2022a)
`cancel`	Cancel unfinished reinforcement learning environment simulations on workers (Since R2022a)
`wait`	Wait for reinforcement learning environment simulations running on workers to finish (Since R2022a)
`dlfeval`	Evaluate deep learning model for custom training loops
`dlaccelerate`	Accelerate deep learning function for custom training loops (Since R2021a)
`AcceleratedFunction`	Accelerated deep learning function (Since R2021a)

Get and Set Parameters

`syncParameters`	Modify the learnable parameters of one approximator towards the learnable parameters of another approximator (Since R2022a)
`getLearnableParameters`	Obtain learnable parameter values from agent, function approximator, or policy object
`setLearnableParameters`	Set learnable parameter values of agent, function approximator, or policy object
`policyParameters`	Obtain structure of policy parameters to update policy during simulation or deployment (Since R2025a)
`updatePolicyParameters`	Update policy according to structure of policy parameters given as input argument (Since R2025a)

Blocks

RL Agent	Reinforcement learning agent
Policy	Reinforcement learning policy (Since R2022b)

Topics

Training and Simulation Basics

Train Reinforcement Learning Agents
Find the optimal policy by training your agent within a specified environment.
Train Reinforcement Learning Agent in Basic Grid World
Train Q-learning and SARSA agents to solve a grid world in MATLAB®.
Train Reinforcement Learning Agent in MDP Environment
Train a reinforcement learning agent in a generic Markov decision process environment.

Use the Reinforcement Learning Designer App

Specify Training Options in Reinforcement Learning Designer
Interactively specify options for training reinforcement learning agents using the Reinforcement Learning Designer app.
Specify Simulation Options in Reinforcement Learning Designer
Interactively specify options for simulating reinforcement learning agents using the Reinforcement Learning Designer app.
Design and Train Agent Using Reinforcement Learning Designer
Design and train a DQN agent for a cart-pole system using the Reinforcement Learning Designer app.
Tune Hyperparameters Using Reinforcement Learning Designer
Search the hyperparameter space using Reinforcement Learning Designer.

Train Agents for Simulink Environment

Control Water Level in a Tank Using a DDPG Agent
Train a controller using reinforcement learning with a plant modeled in Simulink® as the training environment.

Use Multiple Processes and GPUs

Train Agents Using Parallel Computing and GPUs
Accelerate agent training by running simulations in parallel on multiple cores, GPUs, clusters or cloud resources.
Train AC Agent to Balance Discrete Cart-Pole System Using Parallel Computing
Train an AC agent to control a discrete action space cart-pole system using asynchronous parallel computing.
Train DQN Agent for Lane Keeping Assist Using Parallel Computing
Train a DQN agent for an automated driving application using parallel computing.

Training and Simulation Advanced

Train PPO Agent with Curriculum Learning for a Lane Keeping Application
Train a PPO agent for a lane keeping assist task by gradually increasing task complexity.
Train DQN Agent Using Hindsight Experience Replay
Train a DQN agent in a navigation environment with sparse rewards.
Train Reinforcement Learning Agent Offline to Control Quanser QUBE Pendulum
Train a TD3 agent offline to control a Quanser QUBE pendulum.
Train Biped Robot to Walk Using Evolution Strategy-Reinforcement Learning Agents
Train a TD3 agent using an evolutionary strategy.
Create DQN Agent Using Deep Network Designer and Train Using Image Observations
Create a reinforcement learning agent using the Deep Network Designer app from the Deep Learning Toolbox™.

Log Training Data and Tune Hyperparameters

Log Training Data to Disk
Log a variety of data to disk while training an agent.
Train Agent or Tune Environment Parameters Using Parameter Sweeping
Tune a DDPG agent using hyperparameter sweeping.
Tune Hyperparameters Using Bayesian Optimization
Tune reinforcement learning hyperparameters using Bayesian optimization.
Configure Exploration for Reinforcement Learning Agents
Use visualization to configure exploration in reinforcement learning agents.

Multi-Agent Training

Train Multiple Agents to Perform Collaborative Task
Train two continuous action space PPO agents to collaboratively move an object.
Train Multiple Agents for Area Coverage
Train three discrete action space PPO agents to explore a grid-world environment in a collaborative-competitive manner.
Train Multiple Agents for Path Following Control
Train a DQN and a DDPG agent to collaboratively perform adaptive cruise control and lane keeping assist to follow a path.

Develop Custom Agents and Training Algorithms

Train Reinforcement Learning Policy Using Custom Training Loop
Train a reinforcement learning policy using your own custom training loop.
Create and Train Custom PG Agent
Create a custom PG agent and train it using the built-in train function.
Create and Train Custom LQR Agent
Create a custom agent that solves an LQR problem and train it using the built-in train function.
Custom PPO Training Loop With Random Network Distillation
Use a custom training loop to train a custom PPO policy with random network distillation on a pendulum environment with sparse rewards.
Custom Training Loop with Simulink Action Noise
Use a custom training loop to train a continuous action space reinforcement learning policy in Simulink when action noise is generated within the model.

Train Model Based Policy Optimization Agents

Train MBPO Agent to Balance Continuous Cart-Pole System
A model-based reinforcement learning agent learns a model of its environment that it can use to generate additional experiences for training.
Model-Based Reinforcement Learning Using Custom Training Loop
Create a model-based reinforcement learning agent using a custom training loop.