rlAdditiveNoisePolicy

Policy object to generate continuous noisy actions for custom training loops

Since R2022a

    Description

    This object implements an additive noise policy, which, given an input observation, returns a continuous deterministic action with noise added to it. You can create an rlAdditiveNoisePolicy object from an rlContinuousDeterministicActor or extract it from an rlDDPGAgent or rlTD3Agent. You can then train the policy object using a custom training loop. If UseNoisyAction is set to false (0), the policy adds no noise and therefore does not explore. This object is not compatible with generatePolicyBlock and generatePolicyFunction. For more information on policies and value functions, see Create Policies and Value Functions.
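    As a minimal sketch of the "extract it from an agent" route mentioned above, the following code creates a default DDPG agent and asks it for its exploration policy. It assumes a release that provides getExplorationPolicy and default agent creation from specification objects; the specification sizes are arbitrary illustration values.

```matlab
% Hypothetical observation and action spaces, for illustration only.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([2 1]);

% Create a DDPG agent with default actor and critic networks.
agent = rlDDPGAgent(obsInfo,actInfo);

% Extract the exploration policy. For a DDPG or TD3 agent this is
% expected to be an additive noise policy object.
policy = getExplorationPolicy(agent);
disp(class(policy))
```

    The extracted policy wraps the actor of the agent, so it can be used with getAction in the same way as a policy created directly from an actor.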

    Creation

    Description

    policy = rlAdditiveNoisePolicy(actor) creates the additive noise policy object policy from the continuous deterministic actor actor. It also sets the Actor property of policy to the input argument actor.

    policy = rlAdditiveNoisePolicy(actor,NoiseType=noiseType) specifies the type of noise distribution for the policy. noiseType can be either "gaussian" (Gaussian noise) or "ou" (Ornstein-Uhlenbeck noise). This syntax also sets the NoiseType property of policy to the input argument noiseType.

    Properties

    Actor: Continuous deterministic actor, specified as an rlContinuousDeterministicActor object.

    NoiseType: Noise type, specified as either "gaussian" (default, Gaussian noise) or "ou" (Ornstein-Uhlenbeck noise). For more information on noise models, see Noise Models.

    Example: "ou"

    NoiseOptions: Noise model options, specified as a GaussianActionNoise object or an OrnsteinUhlenbeckActionNoise object. Changing the noise state or any noise option of an rlAdditiveNoisePolicy object deployed through code generation is not supported.

    For more information on noise models, see Noise Models.
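    The difference between the two noise models can be sketched in plain MATLAB, without the toolbox. This is an illustration under stated assumptions, not the toolbox implementation: the update rule used for the Ornstein-Uhlenbeck noise and the clipping of the Gaussian samples follow the usual description of these models, and all numeric values and variable names are hypothetical.

```matlab
rng(0);                     % reproducible random numbers
Ts = 0.1;                   % sample time (assumed value)
numSteps = 200;
actionSize = [2 1];

% Gaussian noise: an independent sample at every step, clipped to limits.
gMean = 0; gStd = 0.3;      % hypothetical mean and standard deviation
gLower = -1; gUpper = 1;    % hypothetical noise limits
gaussNoise = zeros(actionSize(1),numSteps);
for k = 1:numSteps
    w = gMean + gStd.*randn(actionSize);
    gaussNoise(:,k) = min(max(w,gLower),gUpper);
end

% Ornstein-Uhlenbeck noise: each sample depends on the previous one and
% is pulled back toward the mean by an attraction constant.
ouMean = 0; ouStd = 0.3; ouAttraction = 0.15;   % hypothetical values
x = zeros(actionSize);      % initial noise value
ouNoise = zeros(actionSize(1),numSteps);
for k = 1:numSteps
    x = x + ouAttraction.*(ouMean - x).*Ts ...
          + ouStd.*randn(actionSize).*sqrt(Ts);
    ouNoise(:,k) = x;
end
```

    Plotting gaussNoise and ouNoise against the step index shows the practical difference: the Gaussian samples are uncorrelated from step to step, while the Ornstein-Uhlenbeck samples drift smoothly.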

    EnableNoiseDecay: Option to enable noise decay, specified as a logical value: either true (default, enabling noise decay) or false (disabling noise decay).

    Example: false
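    To give a feel for what noise decay does, the following plain-MATLAB sketch shrinks a standard deviation by a fixed fraction at every step until it reaches a floor. The multiplicative rule and the names sigma, decayRate, and sigmaMin are assumptions made for illustration; see the noise option objects for the actual decay settings.

```matlab
sigma = 0.3;        % initial standard deviation (hypothetical)
decayRate = 0.01;   % fraction removed at each step (hypothetical)
sigmaMin = 0.05;    % lower bound on the standard deviation (hypothetical)
numSteps = 300;

history = zeros(1,numSteps);
for k = 1:numSteps
    history(k) = sigma;
    % Decay multiplicatively, but never go below the floor.
    sigma = max(sigma.*(1 - decayRate),sigmaMin);
end
```

    With decay disabled, the standard deviation would stay at its initial value for the whole run, so the amount of exploration would not shrink over time.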

    Normalization: Normalization method, returned as an array in which each element (one for each input channel defined in the ObservationInfo and ActionInfo properties, in that order) is one of the following values:

    • "none": Do not normalize the input.

    • "rescale-zero-one": Normalize the input by rescaling it to the interval between 0 and 1. The normalized input Y is (U – LowerLimit)./(UpperLimit – LowerLimit), where U is the nonnormalized input. Note that nonnormalized input values lower than LowerLimit result in normalized values lower than 0. Similarly, nonnormalized input values higher than UpperLimit result in normalized values higher than 1. Here, UpperLimit and LowerLimit are the corresponding properties defined in the specification object of the input channel.

    • "rescale-symmetric": Normalize the input by rescaling it to the interval between –1 and 1. The normalized input Y is 2*(U – LowerLimit)./(UpperLimit – LowerLimit) – 1, where U is the nonnormalized input. Note that nonnormalized input values lower than LowerLimit result in normalized values lower than –1. Similarly, nonnormalized input values higher than UpperLimit result in normalized values higher than 1. Here, UpperLimit and LowerLimit are the corresponding properties defined in the specification object of the input channel.
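    The two rescaling methods can be checked with a short worked example in plain MATLAB. The sketch assumes the formulas Y = (U – LowerLimit)./(UpperLimit – LowerLimit) and Y = 2*(U – LowerLimit)./(UpperLimit – LowerLimit) – 1; the limits and input values are arbitrary.

```matlab
LowerLimit = -2;            % hypothetical channel limits
UpperLimit = 6;
U = [-4 -2 2 6 8];          % includes values outside the limits

% "rescale-zero-one": maps [LowerLimit, UpperLimit] onto [0, 1].
Y01 = (U - LowerLimit)./(UpperLimit - LowerLimit);

% "rescale-symmetric": maps [LowerLimit, UpperLimit] onto [-1, 1].
Ysym = 2*(U - LowerLimit)./(UpperLimit - LowerLimit) - 1;

disp(Y01)    % -0.25  0  0.5  1  1.25
disp(Ysym)   % -1.5  -1  0    1  1.5
```

    The first and last inputs lie outside the limits, so their normalized values fall outside the target interval, as the notes in the list above describe.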

    Note

    When you specify the Normalization property of rlAgentInitializationOptions, normalization is applied only to the approximator input channels corresponding to rlNumericSpec specification objects in which both the UpperLimit and LowerLimit properties are defined. After you create the agent, you can use setNormalizer to assign normalizers that use any normalization method. For more information on normalizer objects, see rlNormalizer.

    Example: "rescale-symmetric"

    UseNoisyAction: Option to enable noisy actions, specified as a logical value: either true (default, adding noise to actions, which helps exploration) or false (no noise is added to the actions). When noise is not added to the actions, the policy is deterministic and therefore does not explore.

    Example: false
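    The effect of disabling noisy actions can be sketched as follows. The small network and the specification sizes are hypothetical; the point is only that, with UseNoisyAction set to false, repeated calls with the same observation are expected to return the same action.

```matlab
% Hypothetical observation and action spaces, for illustration only.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([2 1]);

% Minimal actor network: 4 observation features in, 2 action values out.
net = dlnetwork([featureInputLayer(4); fullyConnectedLayer(2)]);
actor = rlContinuousDeterministicActor(net,obsInfo,actInfo);

policy = rlAdditiveNoisePolicy(actor);
policy.UseNoisyAction = false;      % turn exploration noise off

% Query the policy twice with the same observation.
obs = {rand(obsInfo.Dimension)};
a1 = getAction(policy,obs);
a2 = getAction(policy,obs);
disp(isequal(a1{1},a2{1}))
```

    Setting UseNoisyAction back to true restores exploration, so the two actions would then generally differ.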

    ObservationInfo: Observation specifications, returned as an rlFiniteSetSpec or rlNumericSpec object or an array containing a mix of such objects. Each element in the array defines the properties of an environment observation channel, such as its dimensions, data type, and name.

    ActionInfo: Action specifications, returned as an rlNumericSpec object. This object defines the properties of the environment action channel, such as its dimensions, data type, and name.

    Note

    For this approximator object, only one action channel is allowed.

    SampleTime: Sample time of the policy, specified as a positive scalar or as -1.

    Within a MATLAB® environment, the policy is executed every time you call it within your custom training loop, so SampleTime does not affect the timing of the policy execution.

    Within a Simulink® environment, the Policy block that uses the policy object executes every SampleTime seconds of simulation time. If SampleTime is -1, the block inherits the sample time from its input signals. Set SampleTime to -1 when the block is a child of an event-driven subsystem.

    Note

    Set SampleTime to a positive scalar when the block is not a child of an event-driven subsystem. Doing so ensures that the block executes at appropriate intervals when input signal sample times change due to model variations.

    If SampleTime is a positive scalar, this value is also the time interval between consecutive elements in the output experience returned by sim, regardless of the type of environment.

    If SampleTime is -1, for Simulink environments, the time interval between consecutive elements in the returned output experience reflects the timing of the events that trigger the Policy block execution, while for MATLAB environments, this time interval is considered equal to 1.

    Example: SampleTime=-1

    Object Functions

    getAction                 Obtain action from agent, actor, or policy object given environment observations
    getLearnableParameters    Obtain learnable parameter values from agent, function approximator, or policy object
    reset                     Reset environment, agent, experience buffer, or policy object
    setLearnableParameters    Set learnable parameter values of agent, function approximator, or policy object
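    As a brief sketch of how these object functions fit together, the following code reads the learnable parameters of a policy, writes back a modified copy, resets the policy, and then queries an action. The network, the specification sizes, and the scaling by 0.5 are arbitrary choices made for illustration.

```matlab
% Hypothetical observation and action spaces, for illustration only.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([2 1]);
net = dlnetwork([featureInputLayer(4); fullyConnectedLayer(2)]);
actor = rlContinuousDeterministicActor(net,obsInfo,actInfo);
policy = rlAdditiveNoisePolicy(actor);

% Read the learnable parameters (a cell array, one entry per parameter).
params = getLearnableParameters(policy);

% Write back a modified copy. Policy objects are value objects, so the
% updated policy must be assigned to a variable.
scaled = cellfun(@(p) 0.5.*p,params,UniformOutput=false);
policy = setLearnableParameters(policy,scaled);

% Reset the policy, then compute an action for a random observation.
policy = reset(policy);
action = getAction(policy,{rand(obsInfo.Dimension)});
```

    In a custom training loop, the step that scales the parameters would be replaced by a gradient-based update computed from collected experiences.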

    Examples

    Create observation and action specification objects. For this example, define the observation and action spaces as continuous four- and two-dimensional spaces, respectively.

    obsInfo = rlNumericSpec([4 1]);
    actInfo = rlNumericSpec([2 1]);

    Alternatively, use getObservationInfo and getActionInfo to extract the specification objects from an environment.

    Create a continuous deterministic actor. This actor must accept an observation as input and return an action as output.

    To approximate the policy function within the actor, use a deep neural network model. Define the network as an array of layer objects, and get the dimension of the observation and action spaces from the environment specification objects.

    layers = [ 
        featureInputLayer(obsInfo.Dimension(1))
        fullyConnectedLayer(16)
        reluLayer
        fullyConnectedLayer(actInfo.Dimension(1)) 
        ];

    Convert the network to a dlnetwork object and display a summary, which includes the number of learnable parameters.

    model = dlnetwork(layers);
    summary(model)
       Initialized: true
    
       Number of learnables: 114
    
       Inputs:
          1   'input'   4 features
    

    Create the actor using model, and the observation and action specifications.

    actor = rlContinuousDeterministicActor(model,obsInfo,actInfo)
    actor = 
      rlContinuousDeterministicActor with properties:
    
        ObservationInfo: [1×1 rl.util.rlNumericSpec]
             ActionInfo: [1×1 rl.util.rlNumericSpec]
          Normalization: "none"
              UseDevice: "cpu"
             Learnables: {4×1 cell}
                  State: {0×1 cell}
    
    

    Check the actor with a random observation input.

    act = getAction(actor,{rand(obsInfo.Dimension)});
    act{1}
    ans = 2×1 single column vector
    
        0.4013
        0.0578
    
    

    Create a policy object from actor.

    policy = rlAdditiveNoisePolicy(actor)
    policy = 
      rlAdditiveNoisePolicy with properties:
    
                   Actor: [1×1 rl.function.rlContinuousDeterministicActor]
               NoiseType: "gaussian"
            NoiseOptions: [1×1 rl.option.GaussianActionNoise]
        EnableNoiseDecay: 1
           Normalization: "none"
          UseNoisyAction: 1
         ObservationInfo: [1×1 rl.util.rlNumericSpec]
              ActionInfo: [1×1 rl.util.rlNumericSpec]
              SampleTime: -1
    
    

    You can access the policy options using dot notation. For example, change the upper and lower limits of the distribution.

    policy.NoiseOptions.LowerLimit = -3;
    policy.NoiseOptions.UpperLimit = 3;

    Check the policy with a random observation input.

    act = getAction(policy,{rand(obsInfo.Dimension)});
    act{1}
    ans = 2×1
    
        0.1878
       -0.1645
    
    

    You can now train the policy with a custom training loop and then deploy it to your application.

    Create observation and action specification objects. For this example, define the observation and action spaces as continuous three- and one-dimensional spaces, respectively.

    obsInfo = rlNumericSpec([3 1]);
    actInfo = rlNumericSpec([1 1]);

    Alternatively, use getObservationInfo and getActionInfo to extract the specification objects from an environment.

    Create a continuous deterministic actor. This actor must accept an observation as input and return an action as output.

    To approximate the policy function within the actor, use a deep neural network model. Define the network as an array of layer objects, and get the dimension of the observation and action spaces from the environment specification objects.

    layers = [ 
        featureInputLayer(obsInfo.Dimension(1))
        fullyConnectedLayer(9)
        reluLayer
        fullyConnectedLayer(actInfo.Dimension(1)) 
        ];

    Convert the network to a dlnetwork object and display a summary, which includes the number of learnable parameters.

    model = dlnetwork(layers);
    summary(model)
       Initialized: true
    
       Number of learnables: 46
    
       Inputs:
          1   'input'   3 features
    

    Create the actor using model, and the observation and action specifications.

    actor = rlContinuousDeterministicActor(model,obsInfo,actInfo)
    actor = 
      rlContinuousDeterministicActor with properties:
    
        ObservationInfo: [1×1 rl.util.rlNumericSpec]
             ActionInfo: [1×1 rl.util.rlNumericSpec]
          Normalization: "none"
              UseDevice: "cpu"
             Learnables: {4×1 cell}
                  State: {0×1 cell}
    
    

    Check the actor with a random observation input.

    act = getAction(actor,{rand(obsInfo.Dimension)});
    act{1}
    ans = single
    
    -0.2535
    

    Create a policy object from actor, specifying an Ornstein-Uhlenbeck noise model.

    policy = rlAdditiveNoisePolicy(actor,NoiseType="ou")
    policy = 
      rlAdditiveNoisePolicy with properties:
    
                   Actor: [1×1 rl.function.rlContinuousDeterministicActor]
               NoiseType: "ou"
            NoiseOptions: [1×1 rl.option.OrnsteinUhlenbeckActionNoise]
        EnableNoiseDecay: 1
           Normalization: "none"
          UseNoisyAction: 1
         ObservationInfo: [1×1 rl.util.rlNumericSpec]
              ActionInfo: [1×1 rl.util.rlNumericSpec]
              SampleTime: -1
    
    

    You can access the policy options using dot notation. For example, change the standard deviation of the distribution.

    policy.NoiseOptions.StandardDeviation = 0.6;

    Check the policy with a random observation input.

    act = getAction(policy,{rand(obsInfo.Dimension)});
    act{1}
    ans = 
    -0.1625
    

    You can now train the policy with a custom training loop and then deploy it to your application.
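    The data-collection part of such a custom training loop might look like the following sketch. It assumes the predefined "DoubleIntegrator-Continuous" environment and a small hypothetical actor network; the parameter update itself is omitted because it depends on the learning algorithm you choose. Only the single-output form of getAction is used here, so any internal noise state is not carried from one step to the next.

```matlab
% Predefined environment with a continuous action space (assumed choice).
env = rlPredefinedEnv("DoubleIntegrator-Continuous");
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

% Hypothetical actor network sized from the environment specifications.
layers = [
    featureInputLayer(obsInfo.Dimension(1))
    fullyConnectedLayer(16)
    reluLayer
    fullyConnectedLayer(actInfo.Dimension(1))
    ];
actor = rlContinuousDeterministicActor(dlnetwork(layers),obsInfo,actInfo);
policy = rlAdditiveNoisePolicy(actor);

% Roll out one episode, using noisy actions to explore.
maxSteps = 50;
obs = reset(env);
rewards = zeros(1,maxSteps);
numSteps = 0;
for k = 1:maxSteps
    action = getAction(policy,{obs});           % noisy action, in a cell
    [obs,reward,isDone] = step(env,action{1});  % apply action to environment
    rewards(k) = reward;
    numSteps = k;
    if isDone
        break
    end
end
rewards = rewards(1:numSteps);
fprintf("Collected %d steps.\n",numSteps);
```

    A full training loop would store each observation, action, and reward, compute a loss from them, and apply the resulting gradients with setLearnableParameters before the next episode.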

    Version History

    Introduced in R2022a