Load Predefined Grid World Environments

Reinforcement Learning Toolbox™ software provides several predefined grid world environments for which the actions, observations, rewards, and dynamics are already defined. You can use these environments to:

Learn reinforcement learning concepts.
Gain familiarity with Reinforcement Learning Toolbox software features.
Test your own discrete-action-space reinforcement learning agents.

You can load the following predefined MATLAB^® grid world environments using the rlPredefinedEnv function.

Environment	Agent Task
Basic grid world	Move from a starting location to a target location on a two-dimensional grid by selecting moves from the discrete action space `{N,S,E,W}`.
Waterfall grid world	Move from a starting location to a target location on a larger two-dimensional grid with unknown deterministic or stochastic dynamics.

In Reinforcement Learning Toolbox, these grid world environments are implemented as rlMDPEnv objects. For more information on the properties of grid world environments, see Create Custom Grid World Environments.

You can also load predefined MATLAB control system environments. For more information, see Load Predefined Control System Environments.

Basic Grid World

The basic grid world environment is a two-dimensional 5-by-5 grid with a starting location, a terminal location, and obstacles. The environment also contains a special jump from state [2,4] to state [4,4]. The goal of the agent is to move from the starting location to the terminal location while avoiding obstacles and maximizing the total reward.

To create a basic grid world environment, use the rlPredefinedEnv function. When using the keyword "BasicGridWorld" this function returns an rlMDPEnv object representing the grid world.

env = rlPredefinedEnv("BasicGridWorld")

env = 

  rlMDPEnv with properties:

       Model: [1×1 rl.env.GridWorld]
    ResetFcn: []

Environment Visualization

You can visualize the grid world environment using the plot function.

The agent location is a red circle.
The terminal location is a blue square.
The obstacles are black squares.

plot(env)

Basic five-by-five grid world with agent (indicated by a red circle) positioned on the top left corner, terminal location (indicated by a blue square) in the bottom right corner, and four obstacle squares, in black, in the middle.

Actions

For this environment, the action channel carries a scalar integer from 1 to 4, which indicates an (attempted) move in one of four possible directions (north, south, east, or west, respectively). Therefore the action specification is an rlFiniteSetSpec object. To extract the action specification, use getActionInfo.

actInfo = getActionInfo(env)

actInfo = 

  rlFiniteSetSpec with properties:

       Elements: [4×1 double]
           Name: "MDP Actions"
    Description: [0×0 string]
      Dimension: [1 1]
       DataType: "double"

Observations

The environment observation channel carries a scalar integer from 1 to 25, which indicates the current agent location (that is, its state) in columnwise fashion. For example, the observation 5 corresponds to the agent position [5,1] on the grid, the observation 6 corresponds to the position [1,2], and so on. Therefore the observation specification is an rlFiniteSetSpec object. To extract the observation specification, use getObservationInfo.

obsInfo = getObservationInfo(env)

obsInfo = 

  rlFiniteSetSpec with properties:

       Elements: [25×1 double]
           Name: "MDP Observations"
    Description: [0×0 string]
      Dimension: [1 1]
       DataType: "double"

If the agent attempts an illegal move, such as an action that would get it out of the grid or into an obstacle, the resulting position remains unchanged, otherwise, the agent position is updated according to the action. For example, if the agent is in the position 5, the action 3 (attempted move eastward) will move the agent, at the next time step, to the position 10, while the action 4 will result in the agent still keeping position 5 at the next time step.

Note that there is no direct feedthrough between the action and the observation, that is, the observation does not depend on the current value of the action. For more information, see Reinforcement Learning Environments.

Rewards

The action A(t), results in the transition from the current state S(t) to the following one S(t+1), which in turn results in the following rewards or penalties from the environment to the agent, represented by the scalar R(t+1) :

+10 reward for reaching the terminal state at [5,5]
+5 reward for jumping from state [2,4] to state [4,4]
-1 penalty for every other action

As for the observation, there is no direct feedthrough between the action and the reward, that is, the reward R(t+1) does not depend on the next action A(t+1). For more information, see Reinforcement Learning Environments.

Reset Function

The default reset function for this environment sets the initial of the agent on the grid randomly, while avoiding the obstacle and the target cells.

x0 = reset(env)

x0 =

    11

You can write your own reset function to specify a different initial state. For example, to specify that the initial state of the agent is always 2, create a reset function that returns the state number for the initial agent state.

env.ResetFcn = @() 2;

The reset function is called (by a training or simulation function) at the beginning of each training or simulation episode.

Step Function

The environment observation and action specifications allow you to create an agent that works with your environment. You can then use both the environment and agent as arguments for the built-in functions train and sim, which train or simulate the agent within the environment, respectively.

You can also call the step function to return the next observation, reward and an is-done scalar indicating the whether a final state has been reached. For example, reset the basic grid world environment and call the step function.

x0 = reset(env)

x0 =

     1

[xn,rn,id]=step(env,3)

xn =

     6


rn =

    -1


id =

  logical

   0

The environment step and reset functions allow you to create a custom training or simulation loop.

Deterministic Waterfall Grid Worlds

The deterministic waterfall grid world environment is a two-dimensional 8-by-7 grid with a starting location and terminal location. The environment includes a waterfall that pushes the agent toward the bottom of the grid. The goal of the agent is to move from the starting location to the terminal location while maximizing the total reward.

To create a deterministic waterfall grid world, use the rlPredefinedEnv function. This function creates an rlMDPEnv object representing the grid world.

env = rlPredefinedEnv('WaterFallGridWorld-Deterministic');

As with the basic grid world, you can visualize the environment, where the agent is a red circle and the terminal location is a blue square.

plot(env)

Basic 8-by-7 grid world with agent positioned on the left and terminal location in the middle.

Actions

The agent can move in one of four possible directions (north, south, east, or west).

Rewards

The agent receives the following rewards or penalties:

+10 reward for reaching the terminal state at [4,5]
-1 penalty for every other action

Waterfall Dynamics

In this environment, a waterfall pushes the agent toward the bottom of the grid.

Basic 8-by-7 grid world with blue arrows indicating a waterfall that pushes the agent position downward.

The intensity of the waterfall varies between the columns, as shown at the top of the preceding figure. When the agent moves into a column with a nonzero intensity, the waterfall pushes it downward by the indicated number of squares. For example, if the agent goes east from state [5,2], it reaches state [7,3].

Stochastic Waterfall Grid Worlds

The stochastic waterfall grid world environment is a two-dimensional 8-by-7 grid with a starting location and terminal locations. The environment includes a waterfall that pushes the agent towards the bottom of the grid with a stochastic intensity. The goal of the agent is to move from the starting location to the target terminal location while avoiding the penalty terminal states along the bottom of the grid and maximizing the total reward.

To create a stochastic waterfall grid world, use the rlPredefinedEnv function. This function creates an rlMDPEnv object representing the grid world.

env = rlPredefinedEnv('WaterFallGridWorld-Stochastic');

As with the basic grid world, you can visualize the environment, where the agent is a red circle and the terminal location is a blue square.

plot(env)

Basic 8-by-7 grid world with terminal locations indicated by blue squares in the bottom row.

Actions

The agent can move in one of four possible directions (north, south, east, or west).

Rewards

The agent receives the following rewards or penalties:

+10 reward for reaching the terminal state at [4,5]
-10 penalty for reaching any terminal state in the bottom row of the grid
-1 penalty for every other action

Waterfall Dynamics

In this environment, a waterfall pushes the agent towards the bottom of the grid with a stochastic intensity. The baseline intensity matches the intensity of the deterministic waterfall environment. However, in the stochastic waterfall case, the agent has an equal chance of experiencing the indicated intensity, one level above that intensity, or one level below that intensity. For example, if the agent goes east from state [5,2], it has an equal chance of reaching state [6,3], [7,3], or [8,3].

Related Examples

Train Reinforcement Learning Agents