rlOptimizerOptions

Optimization options for actors and critics

Since R2022a

    Description

    Use an rlOptimizerOptions object to specify an optimization options set for actors and critics.

    Creation

    Description

    optOpts = rlOptimizerOptions creates a default optimizer option set to use as the CriticOptimizerOptions or ActorOptimizerOptions property of an agent options object, or as the last argument of rlOptimizer to create an optimizer object. You can modify the object properties using dot notation.

    optOpts = rlOptimizerOptions(Name=Value) creates an options set with the specified properties using one or more name-value arguments.

    Properties

    LearnRate

    Learning rate used in training the actor or critic function approximator, specified as a positive scalar. If the learning rate is too low, then training takes a long time. If the learning rate is too high, then training might reach a suboptimal result or diverge.

    Example: LearnRate=0.025

    GradientThreshold

    Gradient threshold value used in training the actor or critic function approximator, specified as Inf or a positive scalar. If the gradient exceeds this value, the gradient is clipped as specified by the GradientThresholdMethod option. Clipping the gradient limits how much the network parameters can change in a training iteration.

    Example: GradientThreshold=1

    GradientThresholdMethod

    Gradient threshold method used in training the actor or critic function approximator, specified as one of the following values. This method determines how gradient values that exceed the gradient threshold are clipped.

    • "l2norm" — If the L2 norm of the vector Glyr containing the gradient components related to the weights or biases of a layer is larger than GradientThreshold, then this option scales Glyr by a factor of GradientThreshold/L, where L is the L2 norm of Glyr. When you use this option, the L2 norm of Glyr in the returned gradient cannot exceed GradientThreshold. For example, a fully connected layer has two parameter arrays, Weights and Bias. The threshold is applied to the L2 norm of the gradient components related to Weights and Bias separately.

    • "global-l2norm" — If the L2 norm of the gradient G (with respect to all learnable network parameters), is larger than GradientThreshold, then this option scales G by a factor of L, where L is the L2 norm of G. When you use this option, the L2 norm of the returned gradient cannot exceed GradientThreshold.

    • "absolute-value" — If the absolute value of an individual (scalar) partial derivative in the gradient G (with respect to all learnable network parameters), is larger than GradientThreshold, then this option scales the partial derivative so that the corresponding component in the returned gradient has magnitude equal to GradientThreshold and the same sign of the original partial derivative. When you use this option, the absolute value of any component of the returned gradient cannot exceed GradientThreshold.

    For more information, see Gradient Clipping in the Algorithms section of trainingOptions in Deep Learning Toolbox™.

    Example: GradientThresholdMethod="absolute-value"
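
    The following sketch is illustrative only (it is not the toolbox implementation) and shows how the "l2norm" and "absolute-value" rules clip a sample gradient vector g against a threshold tau; the names g and tau are arbitrary.

    % Illustrative sketch, not the toolbox implementation.
    g = [3 -4 0.5];            % sample gradient components for one parameter array
    tau = 1;                   % plays the role of GradientThreshold

    % "l2norm": rescale the per-array gradient if its L2 norm exceeds tau
    gL2 = g;
    if norm(g) > tau
        gL2 = g*(tau/norm(g)); % resulting L2 norm equals tau
    end

    % "absolute-value": clip each component to magnitude tau, keeping its sign
    gAbs = sign(g).*min(abs(g),tau);

    % "global-l2norm" works like "l2norm", but applied to the gradient of all
    % learnable parameters stacked into a single vector.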

    L2RegularizationFactor

    Factor for L2 regularization (weight decay) used in training the actor or critic function approximator, specified as a nonnegative scalar. For more information, see L2 Regularization in the Algorithms section of trainingOptions in Deep Learning Toolbox.

    To avoid overfitting when using a representation with many parameters, consider increasing the L2RegularizationFactor option.

    Example: L2RegularizationFactor=0.0005
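
    As a conceptual sketch only (assuming a penalty of the form lambda/2*||w||^2, so that the gradient of each weight array gains an extra lambda*w term), the effect of the regularization factor can be illustrated as follows; the variable names are arbitrary.

    % Conceptual sketch only; the exact form of the penalty is handled
    % internally by the toolbox. Here lambda plays the role of
    % L2RegularizationFactor.
    lambda = 5e-4;
    w = randn(8,1);                    % sample weight vector
    gradLoss = randn(8,1);             % gradient of the unregularized loss
    gradTotal = gradLoss + lambda*w;   % weight-decay term pulls weights toward zero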

    Algorithm

    Algorithm used for training the actor or critic function approximator, specified as one of the following values.

    • "adam" — Use the Adam (adaptive movement estimation) algorithm. You can specify the decay rates of the gradient and squared gradient moving averages using the GradientDecayFactor and SquaredGradientDecayFactor fields of the OptimizerParameters option.

    • "sgdm" — Use the stochastic gradient descent with momentum (SGDM) algorithm. You can specify the momentum value using the Momentum field of the OptimizerParameters option.

    • "rmsprop" — Use the RMSProp algorithm. You can specify the decay rate of the squared gradient moving average using the SquaredGradientDecayFactor fields of the OptimizerParameters option.

    For more information about these algorithms, see the Algorithms section of trainingOptions in Deep Learning Toolbox.

    Example: Algorithm="sgdm"
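
    For instance, the following sketch creates one option set per algorithm and adjusts the corresponding OptimizerParameters field (the values are illustrative, and it assumes that Algorithm, like the other properties, can be set with a name-value argument at creation).

    % Illustrative values only.
    adamOpts = rlOptimizerOptions(Algorithm="adam");
    adamOpts.OptimizerParameters.GradientDecayFactor = 0.95;

    sgdmOpts = rlOptimizerOptions(Algorithm="sgdm");
    sgdmOpts.OptimizerParameters.Momentum = 0.8;

    rmspropOpts = rlOptimizerOptions(Algorithm="rmsprop");
    rmspropOpts.OptimizerParameters.SquaredGradientDecayFactor = 0.99;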

    OptimizerParameters

    Parameters for the training algorithm used for training the actor or critic function approximator, specified as an OptimizerParameters object with the following parameters.

    Momentum

    Contribution of the previous step, specified as a scalar from 0 to 1. A value of 0 means no contribution from the previous step. A value of 1 means maximal contribution.

    This parameter applies only when Algorithm is "sgdm". In that case, the default value is 0.9. This default value works well for most problems.

    Epsilon

    Denominator offset, specified as a positive scalar. The optimizer adds this offset to the denominator in the network parameter updates to avoid division by zero.

    This parameter applies only when Algorithm is "adam" or "rmsprop". In that case, the default value is 1e-8. This default value works well for most problems.

    GradientDecayFactor

    Decay rate of the gradient moving average, specified as a positive scalar from 0 to 1.

    This parameter applies only when Algorithm is "adam". In that case, the default value is 0.9. This default value works well for most problems.

    SquaredGradientDecayFactor

    Decay rate of the squared gradient moving average, specified as a positive scalar from 0 to 1.

    This parameter applies only when Algorithm is "adam" or "rmsprop". In that case, the default value is 0.999. This default value works well for most problems.

    When a particular property of OptimizerParameters is not applicable to the optimizer type specified in Algorithm, that property is set to "Not applicable".

    To change property values, create an rlOptimizerOptions object and use dot notation to access and change the properties of OptimizerParameters.

    optOpts = rlOptimizerOptions;
    optOpts.OptimizerParameters.GradientDecayFactor = 0.95;
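
    To check how the parameters relate to the selected algorithm, you can switch the algorithm and display the OptimizerParameters object (a quick, hedged check; the exact display text may differ).

    optOpts = rlOptimizerOptions(Algorithm="sgdm");
    optOpts.OptimizerParameters.Momentum = 0.8;
    disp(optOpts.OptimizerParameters)   % Adam-specific fields do not apply to SGDM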

    Object Functions

    rlQAgentOptions - Options for Q-learning agent
    rlSARSAAgentOptions - Options for SARSA agent
    rlDQNAgentOptions - Options for DQN agent
    rlPGAgentOptions - Options for PG agent
    rlDDPGAgentOptions - Options for DDPG agent
    rlTD3AgentOptions - Options for TD3 agent
    rlACAgentOptions - Options for AC agent
    rlPPOAgentOptions - Options for PPO agent
    rlTRPOAgentOptions - Options for TRPO agent
    rlSACAgentOptions - Options for SAC agent
    rlOptimizer - Creates an optimizer object for actors and critics
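
    As noted in the description, you can also pass the option set to rlOptimizer to create an optimizer object. A minimal sketch, assuming rlOptimizer accepts the option object directly (see the rlOptimizer reference page for the exact calling syntax):

    % Sketch: build an optimizer object from an option set.
    criticOptimizer = rlOptimizer(rlOptimizerOptions(LearnRate=1e-3));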

    Examples

    Use rlOptimizerOptions to create a default optimizer option object to use for the training of a critic function approximator.

    myCriticOpts = rlOptimizerOptions
    myCriticOpts = 
      rlOptimizerOptions with properties:
    
                      LearnRate: 0.0100
              GradientThreshold: Inf
        GradientThresholdMethod: "l2norm"
         L2RegularizationFactor: 1.0000e-04
                      Algorithm: "adam"
            OptimizerParameters: [1x1 rl.option.OptimizerParameters]
    
    

    Using dot notation, change the training algorithm to stochastic gradient descent with momentum and set the value of the momentum parameter to 0.6.

    myCriticOpts.Algorithm = "sgdm";
    myCriticOpts.OptimizerParameters.Momentum = 0.6;

    Create an AC agent option object, and set its CriticOptimizerOptions property to myCriticOpts.

    myAgentOpt = rlACAgentOptions;
    myAgentOpt.CriticOptimizerOptions = myCriticOpts;

    You can now use myAgentOpt as the last input argument to rlACAgent when creating your AC agent.
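
    For example, a minimal sketch, assuming the predefined "CartPole-Discrete" environment is available:

    % Sketch: create a default AC agent that uses myAgentOpt.
    env = rlPredefinedEnv("CartPole-Discrete");
    obsInfo = getObservationInfo(env);
    actInfo = getActionInfo(env);
    agent = rlACAgent(obsInfo,actInfo,myAgentOpt);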

    Use rlOptimizerOptions to create an optimizer option object to use for the training of an actor function approximator. Specify a learning rate of 0.2 and set GradientThresholdMethod to "absolute-value".

    myActorOpts = rlOptimizerOptions(LearnRate=0.2, ...
        GradientThresholdMethod="absolute-value")
    myActorOpts = 
      rlOptimizerOptions with properties:
    
                      LearnRate: 0.2000
              GradientThreshold: Inf
        GradientThresholdMethod: "absolute-value"
         L2RegularizationFactor: 1.0000e-04
                      Algorithm: "adam"
            OptimizerParameters: [1x1 rl.option.OptimizerParameters]
    
    

    Using dot notation, change GradientThreshold to 10.

    myActorOpts.GradientThreshold = 10;

    Create an AC agent option object and set its ActorOptimizerOptions property to myActorOpts.

    myAgentOpt = rlACAgentOptions( ...
        ActorOptimizerOptions=myActorOpts);

    You can now use myAgentOpt as the last input argument to rlACAgent when creating your AC agent.

    Version History

    Introduced in R2022a