generatePolicyBlock

Generate Simulink block that evaluates policy of an agent or policy object

    Description

    This function generates a Simulink® Policy evaluation block from an agent or policy object. It also creates a data file that stores the policy information. The generated policy block loads this data file to initialize itself before simulation. You can use the block to simulate the policy and to generate code for deployment.

    For more information on policies and value functions, see Create Policies and Value Functions.

    generatePolicyBlock(agent) creates a block that evaluates the policy of the specified agent using the default block name, policy name, and data file name.

    generatePolicyBlock(policy) creates a block that evaluates the learned policy of the specified policy object using the default block name, policy name, and data file name.

    generatePolicyBlock(___,MATFileName=dataFileName) also specifies the name of the generated data file.

    Examples

    First, create and train a reinforcement learning agent. For this example, load the PG agent trained in Train PG Agent to Balance Cart-Pole System.

    load("MATLABCartpolePG.mat","agent")

    Then, create a policy evaluation block from this agent using default names.

    generatePolicyBlock(agent);

    This command creates an untitled Simulink® model that contains the policy block, and the blockAgentData.mat file, which contains the information needed to create and initialize the policy block (such as the trained deep neural network used by the actor within the agent). The block loads this data file to initialize itself before simulation.

    You can now drag and drop the block into a Simulink® model and connect it so that it takes the observation from the environment as input and returns the calculated action to the environment. This allows you to simulate the policy in a closed loop. You can then generate code for deployment. For more information, see Deploy Trained Reinforcement Learning Policies.

    Close the model.

    bdclose("untitled")

    Create observation and action specification objects. For this example, define the observation and action spaces as continuous four- and two-dimensional spaces, respectively.

    obsInfo = rlNumericSpec([4 1]);
    actInfo = rlNumericSpec([2 1]);

    Alternatively, use getObservationInfo and getActionInfo to extract the specification objects from an environment.
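
    As a minimal sketch of this alternative, the following code extracts the specification objects from a predefined environment. The choice of the "CartPole-Continuous" environment and the variable names envObsInfo and envActInfo are assumptions made for illustration only. The rest of this example continues to use the obsInfo and actInfo objects defined above.

```matlab
% Create a predefined environment (illustrative choice, not part of
% the original example).
env = rlPredefinedEnv("CartPole-Continuous");

% Extract the observation and action specifications from the environment.
envObsInfo = getObservationInfo(env);
envActInfo = getActionInfo(env);
```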

    Create a continuous deterministic actor. This actor must accept an observation as input and return an action as output.

    To approximate the policy function within the actor, use a recurrent deep neural network model. Define the network as an array of layer objects, and get the dimension of the observation and action spaces from the environment specification objects. To create a recurrent network, use a sequenceInputLayer as the input layer (with size equal to the number of dimensions of the observation channel) and include at least one lstmLayer.

    layers = [
        sequenceInputLayer(obsInfo.Dimension(1))
        fullyConnectedLayer(10)
        reluLayer
        lstmLayer(8,OutputMode="sequence")
        fullyConnectedLayer(20)
        fullyConnectedLayer(actInfo.Dimension(1))
        tanhLayer
        ];

    Convert the network to a dlnetwork object and display a summary, which includes the number of learnable parameters.

    model = dlnetwork(layers);
    summary(model)
       Initialized: true
    
       Number of learnables: 880
    
       Inputs:
          1   'sequenceinput'   Sequence input with 4 dimensions (CTB)
    

    Create the actor using model, and the observation and action specifications.

    actor = rlContinuousDeterministicActor(model,obsInfo,actInfo)
    actor = 
      rlContinuousDeterministicActor with properties:
    
        ObservationInfo: [1×1 rl.util.rlNumericSpec]
             ActionInfo: [1×1 rl.util.rlNumericSpec]
              UseDevice: "cpu"
    
    

    Check the actor with a random observation input.

    act = getAction(actor,{rand(obsInfo.Dimension)});
    act{1}
    ans = 2×1 single column vector
    
       -0.0742
        0.0158
    
    

    Create a policy object from actor.

    policy = rlDeterministicActorPolicy(actor)
    policy = 
      rlDeterministicActorPolicy with properties:
    
                  Actor: [1×1 rl.function.rlContinuousDeterministicActor]
        ObservationInfo: [1×1 rl.util.rlNumericSpec]
             ActionInfo: [1×1 rl.util.rlNumericSpec]
             SampleTime: -1
    
    

    You can access the policy options using dot notation. Check the policy with a random observation input.

    act = getAction(policy,{rand(obsInfo.Dimension)});
    act{1}
    ans = 2×1
    
       -0.0060
       -0.0161
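
    As a brief sketch of dot-notation access, the following commands read two of the properties shown in the policy display above. The code assumes that the policy object created earlier in this example is still in the workspace.

```matlab
% Read the sample time of the policy object.
policy.SampleTime

% Read the dimension of the observation channel from the
% specification object stored in the policy.
policy.ObservationInfo.Dimension
```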
    
    

    You can train the policy with a custom training loop.

    Then, create a policy evaluation block from this policy object using the default name for the generated MAT-file.

    generatePolicyBlock(policy);

    This command creates an untitled Simulink® model that contains the policy block, and the blockAgentData.mat file, which contains the information needed to create and initialize the policy block (such as the deep neural network used by the actor within the policy). The block loads this data file to initialize itself before simulation.

    You can now drag and drop the block into a Simulink® model and connect it so that it takes the observation from the environment as input and returns the calculated action to the environment. This allows you to simulate the policy in a closed loop. You can then generate code for deployment. For more information, see Deploy Trained Reinforcement Learning Policies.

    Close the model.

    bdclose("untitled")

    Input Arguments

    Trained reinforcement learning agent, specified as one of the following agent objects. To train your agent, use the train function.

    For agents with a stochastic actor (PG, PPO, SAC, TRPO, AC), the action returned by the generated policy block depends on the value of the UseExplorationPolicy property of the agent. By default, UseExplorationPolicy is false and the generated action is deterministic. If UseExplorationPolicy is true, the generated action is stochastic.
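
    The following sketch illustrates the stochastic case. It assumes that agent is a trained agent with a stochastic actor (for example, the PG agent loaded in the first example) and that it is already in the workspace. The data file name is a hypothetical choice.

```matlab
% Assumes "agent" is a trained agent with a stochastic actor.
% Request stochastic actions instead of the default deterministic ones.
agent.UseExplorationPolicy = true;

% Generate the policy block. The data file name is hypothetical.
generatePolicyBlock(agent,MATFileName="stochasticPolicyData.mat");
```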

    Reinforcement learning policy, specified as one of the following objects:

    Note

    rlAdditiveNoisePolicy and rlEpsilonGreedyPolicy policy objects are not supported.

    Name of generated data file, specified as a string or character vector. If a file with the specified name already exists in the current MATLAB® folder, then an appropriate digit is added to the name so that no existing file is overwritten.

    The generated data file contains four structures that store the data needed to fully characterize the policy. Before simulation, the block (which is generated with the data file name as a mask parameter) loads this data file to initialize itself.
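
    As a minimal sketch, the following code generates a policy block with a custom data file name and then lists the variables stored in that file. It assumes that a supported policy object named policy is already in the workspace. The file name myPolicyData.mat is hypothetical.

```matlab
% Assumes "policy" is a supported policy object in the workspace.
dataFileName = "myPolicyData.mat";   % hypothetical file name
generatePolicyBlock(policy,MATFileName=dataFileName);

% List the variables stored in the generated data file.
whos("-file",dataFileName)
```

    If a file with this name already exists in the current folder, the function adds a digit to the name, so the file to inspect might have a different name than the one you specified.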

    Version History

    Introduced in R2019a