rlReplayMemory

Replay memory experience buffer

Since R2022a

    Description

    An off-policy reinforcement learning agent stores experiences in a circular experience buffer.

    During training the agent stores each of its experiences (S,A,R,S',D) in the buffer. Here:

    • S is the current observation of the environment.

    • A is the action taken by the agent.

    • R is the reward for taking action A.

    • S' is the next observation after taking action A.

    • D is the is-done signal after taking action A.

    The agent then samples mini-batches of experiences from the buffer and uses these mini-batches to update its actor and critic function approximators.

    By default, built-in off-policy agents (DQN, DDPG, TD3, SAC, MBPO) use an rlReplayMemory object as their experience buffer. Agents uniformly sample data from this buffer.

    You can replace the default experience buffer with an alternative buffer object, such as an rlPrioritizedReplayMemory object.

    When you create a custom off-policy reinforcement learning agent, you can create an experience buffer using an rlReplayMemory object.
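    The following is a minimal sketch of that workflow, with illustrative specification dimensions, buffer length, and batch size (these values are not part of this reference page): create a buffer, append experiences as they are collected, and sample a mini-batch for an update step.

    % Minimal sketch of a store-and-sample cycle for a custom agent
    % (illustrative dimensions and batch size; not a complete agent).
    obsInfo = rlNumericSpec([4 1]);
    actInfo = rlNumericSpec([1 1]);
    myBuffer = rlReplayMemory(obsInfo,actInfo,1e5);

    % Append placeholder experiences as they are collected.
    for k = 1:64
        e.Observation     = {rand(4,1)};
        e.Action          = {rand(1,1)};
        e.Reward          = rand(1);
        e.NextObservation = {rand(4,1)};
        e.IsDone          = 0;
        append(myBuffer,e);
    end

    % Sample a mini-batch to use in an actor/critic update step.
    miniBatch = sample(myBuffer,32);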

    Creation

    Description

    buffer = rlReplayMemory(obsInfo,actInfo) creates a replay memory experience buffer that is compatible with the observation and action specifications in obsInfo and actInfo, respectively.

    buffer = rlReplayMemory(obsInfo,actInfo,maxLength) sets the maximum length of the buffer by setting the MaxLength property.
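    For example, assuming obsInfo and actInfo already exist (see Input Arguments), you can use either syntax as follows; the explicit length value here is arbitrary.

    buffer1 = rlReplayMemory(obsInfo,actInfo);        % default maximum length
    buffer2 = rlReplayMemory(obsInfo,actInfo,50000);  % maximum length of 50,000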

    Input Arguments

    obsInfo - Observation specifications

    Observation specifications, specified as a reinforcement learning specification object or an array of specification objects defining properties such as dimensions, data types, and names of the observation signals.

    You can extract the observation specifications from an existing environment or agent using getObservationInfo. You can also construct the specifications manually using rlFiniteSetSpec or rlNumericSpec.

    actInfo - Action specifications

    Action specifications, specified as a reinforcement learning specification object defining properties such as dimensions, data types, and names of the action signals.

    You can extract the action specifications from an existing environment or agent using getActionInfo. You can also construct the specification manually using rlFiniteSetSpec or rlNumericSpec.
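    For example, the following sketch shows both approaches, assuming the CartPole-Discrete predefined environment is available; the manually constructed specifications are illustrative only.

    % Extract specifications from an existing environment.
    env = rlPredefinedEnv("CartPole-Discrete");
    obsInfo = getObservationInfo(env);
    actInfo = getActionInfo(env);

    % Or construct illustrative specifications manually.
    obsInfo = rlNumericSpec([4 1]);
    actInfo = rlFiniteSetSpec([-10 10]);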

    Properties

    MaxLength - Maximum buffer length

    This property is read-only.

    Maximum buffer length, specified as a nonnegative integer.

    To change the maximum buffer length, use the resize function.

    Length - Number of experiences in buffer

    This property is read-only.

    Number of experiences in buffer, specified as a nonnegative integer.
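    For example, assuming a buffer object already exists, you can query these properties and change the maximum length as follows; the new length is arbitrary.

    buffer.MaxLength        % maximum number of experiences the buffer can hold
    buffer.Length           % number of experiences currently stored
    resize(buffer,50000)    % change the maximum buffer length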

    Object Functions

    append - Append experiences to replay memory buffer
    sample - Sample experiences from replay memory buffer
    resize - Resize replay memory experience buffer
    reset - Reset environment, agent, experience buffer, or policy object
    allExperiences - Return all experiences in replay memory buffer
    validateExperience - Validate experiences for replay memory
    getActionInfo - Obtain action data specifications from reinforcement learning environment, agent, or experience buffer
    getObservationInfo - Obtain observation data specifications from reinforcement learning environment, agent, or experience buffer
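    As a brief sketch, assuming a buffer that already contains experiences, two of these functions can be used as follows.

    allExp = allExperiences(buffer);   % return every stored experience
    reset(buffer)                      % reset the experience buffer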

    Examples

    Create Experience Buffer

    Define observation specifications for the environment. For this example, assume that the environment has a single observation channel with three continuous signals in specified ranges.

    obsInfo = rlNumericSpec([3 1],...
        LowerLimit=0,...
        UpperLimit=[1;5;10]);

    Define action specifications for the environment. For this example, assume that the environment has a single action channel with two continuous signals in specified ranges.

    actInfo = rlNumericSpec([2 1],...
        LowerLimit=0,...
        UpperLimit=[5;10]);

    Create an experience buffer with a maximum length of 20,000.

    buffer = rlReplayMemory(obsInfo,actInfo,20000);

    Append a single experience to the buffer using a structure. Each experience contains the following elements: current observation, action, reward, next observation, and is-done signal.

    For this example, create an experience with random observation, action, and reward values. Indicate that this experience is not a terminal condition by setting the IsDone value to 0.

    exp.Observation = {obsInfo.UpperLimit.*rand(3,1)};
    exp.Action = {actInfo.UpperLimit.*rand(2,1)};
    exp.Reward = 10*rand(1);
    exp.NextObservation = {obsInfo.UpperLimit.*rand(3,1)};
    exp.IsDone = 0;

    Before appending experience to the buffer, you can validate whether the experience is compatible with the buffer. The validateExperience function generates an error if the experience is incompatible with the buffer.

    validateExperience(buffer,exp)

    Append the experience to the buffer.

    append(buffer,exp);

    You can also append a batch of experiences to the experience buffer using a structure array. For this example, append a sequence of 100 random experiences, with the final experience representing a terminal condition.

    for i = 1:100
        expBatch(i).Observation = {obsInfo.UpperLimit.*rand(3,1)};
        expBatch(i).Action = {actInfo.UpperLimit.*rand(2,1)};
        expBatch(i).Reward = 10*rand(1);
        expBatch(i).NextObservation = {obsInfo.UpperLimit.*rand(3,1)};
        expBatch(i).IsDone = 0;
    end
    expBatch(100).IsDone = 1;
    
    validateExperience(buffer,expBatch)
    
    append(buffer,expBatch);

    After appending experiences to the buffer, you can sample mini-batches of experiences for training your RL agent. For example, randomly sample a batch of 50 experiences from the buffer.

    miniBatch = sample(buffer,50);

    You can sample a horizon of data from the buffer. For example, sample a horizon of 10 consecutive experiences with a discount factor of 0.95.

    horizonSample = sample(buffer,1,...
        NStepHorizon=10,...
        DiscountFactor=0.95);

    The returned sample includes the following information.

    • Observation and Action are the observation and action from the first experience in the horizon.

    • NextObservation and IsDone are the next observation and termination signal from the final experience in the horizon.

    • Reward is the cumulative reward across the horizon using the specified discount factor.
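    In other words, the horizon reward is the discounted sum of the per-step rewards in the sampled horizon. A minimal arithmetic sketch (with made-up rewards) of the equivalent computation:

    gamma = 0.95;
    r = [1 2 3];                          % example per-step rewards in the horizon
    R = sum(gamma.^(0:numel(r)-1).*r)     % r(1) + gamma*r(2) + gamma^2*r(3)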

    You can also sample a sequence of consecutive experiences. In this case, the structure fields contain arrays with values for all sampled experiences.

    sequenceSample = sample(buffer,1,...
        SequenceLength=20);

    Create Experience Buffer with Multiple Observation Channels

    Define observation specifications for the environment. For this example, assume that the environment has two observation channels: one channel with two continuous observations and one channel with a three-valued discrete observation.

    obsContinuous = rlNumericSpec([2 1],...
        LowerLimit=0,...
        UpperLimit=[1;5]);
    obsDiscrete = rlFiniteSetSpec([1 2 3]);
    obsInfo = [obsContinuous obsDiscrete];

    Define action specifications for the environment. For this example, assume that the environment has a single action channel with two continuous signals in specified ranges.

    actInfo = rlNumericSpec([2 1],...
        LowerLimit=0,...
        UpperLimit=[5;10]);

    Create an experience buffer with a maximum length of 5,000.

    buffer = rlReplayMemory(obsInfo,actInfo,5000);

    Append a sequence of 50 random experiences to the buffer.

    for i = 1:50
        exp(i).Observation = ...
            {obsInfo(1).UpperLimit.*rand(2,1) randi(3)};
        exp(i).Action = {actInfo.UpperLimit.*rand(2,1)};
        exp(i).NextObservation = ...
            {obsInfo(1).UpperLimit.*rand(2,1) randi(3)};
        exp(i).Reward = 10*rand(1);
        exp(i).IsDone = 0;
    end
    
    append(buffer,exp);

    After appending experiences to the buffer, you can sample mini-batches of experiences for training your RL agent. For example, randomly sample a batch of 10 experiences from the buffer.

    miniBatch = sample(buffer,10);

    Resize Experience Buffer in Reinforcement Learning Agent

    Create an environment for training the agent. For this example, load a predefined environment.

    env = rlPredefinedEnv("SimplePendulumWithImage-Discrete");

    Extract the observation and action specifications from the environment.

    obsInfo = getObservationInfo(env);
    actInfo = getActionInfo(env);

    Create a DQN agent from the environment specifications.

    agent = rlDQNAgent(obsInfo,actInfo);

    By default, the agent uses an experience buffer with a maximum size of 10,000.

    agent.ExperienceBuffer
    ans = 
      rlReplayMemory with properties:
    
        MaxLength: 10000
           Length: 0
    
    

    Increase the maximum size of the experience buffer to 20,000.

    resize(agent.ExperienceBuffer,20000)

    View the updated experience buffer.

    agent.ExperienceBuffer
    ans = 
      rlReplayMemory with properties:
    
        MaxLength: 20000
           Length: 0
    
    

    Replace Experience Buffer in Reinforcement Learning Agent

    Create an environment for training the agent. For this example, load a predefined environment.

    env = rlPredefinedEnv("SimplePendulumWithImage-Discrete");

    Extract the observation and action specifications from the environment.

    obsInfo = getObservationInfo(env);
    actInfo = getActionInfo(env);

    Create a DQN agent from the environment specifications.

    agent = rlDQNAgent(obsInfo,actInfo);

    Display the default experience buffer.

    agent.ExperienceBuffer
    ans = 
      rlReplayMemory with properties:
    
        MaxLength: 10000
           Length: 0
    
    

    Create a new experience buffer.

    new_buffer = rlReplayMemory(obsInfo,actInfo,20000);

    Replace the experience buffer in the agent.

    agent.ExperienceBuffer = new_buffer;

    Display the new experience buffer.

    agent.ExperienceBuffer
    ans = 
      rlReplayMemory with properties:
    
        MaxLength: 20000
           Length: 0
    
    

    Display the dimensions of the observation channels.

    obsInfo.Dimension
    ans = 1×2
    
        50    50
    
    
    ans = 1×2
    
         1     1
    
    

    Check the agent using a random input observation.

    getAction(agent,{rand(obsInfo(1).Dimension),rand(obsInfo(2).Dimension)})
    ans = 1x1 cell array
        {[1]}
    
    

    Version History

    Introduced in R2022a