
rlContinuousGaussianActor

Stochastic Gaussian actor with a continuous action space for reinforcement learning agents

Since R2022a

Description

This object implements a function approximator to be used as a stochastic actor within a reinforcement learning agent with a continuous action space. A continuous Gaussian actor takes an environment observation as input and returns as output a random action sampled from a parametrized Gaussian probability distribution, thereby implementing a parametrized stochastic policy. After you create an rlContinuousGaussianActor object, use it to create a suitable agent, such as an rlACAgent or rlPGAgent agent. For more information on creating actors and critics, see Create Policies and Value Functions.
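Conceptually, sampling an action amounts to drawing from a Gaussian distribution whose mean and standard deviation are the two network outputs. The following base-MATLAB sketch illustrates this with hypothetical values for mu and sigma standing in for the network outputs. It assumes the action components are independent (one standard deviation per component, that is, a diagonal covariance) and is not the actor's internal implementation.

```matlab
% Hypothetical network outputs for a three-dimensional action
mu    = [-1.5; 0.2; 3.0];   % mean of each action component
sigma = [0.5; 0.1; 2.0];    % standard deviation of each component

% Sample one action from N(mu, diag(sigma.^2)),
% assuming independent action components
act = mu + sigma .* randn(size(mu));

% Log-density of the sampled action under the same distribution,
% a quantity that policy-gradient methods typically rely on
logp = sum(-0.5*((act - mu)./sigma).^2 - log(sigma) - 0.5*log(2*pi));
```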

Creation

Description

actor = rlContinuousGaussianActor(net,observationInfo,actionInfo,ActionMeanOutputNames=netMeanActName,ActionStandardDeviationOutputNames=netStdActName) creates a Gaussian stochastic actor with a continuous action space using the deep neural network net as the approximation model. Here, net must have two differently named output layers, each with as many elements as the number of dimensions of the action space, as specified in actionInfo. The two output layers must return the mean and standard deviation of each component of the action, respectively. The actor uses the outputs of these layers, identified by the names in the strings netMeanActName and netStdActName, to represent the Gaussian probability distribution from which the action is sampled. This syntax sets the ObservationInfo and ActionInfo properties of actor to the input arguments observationInfo and actionInfo, respectively.

Note

The actor does not enforce the constraints set by the action specification. When you use this actor in any agent other than a SAC agent, you must enforce the action space constraints within the environment.
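A minimal way to enforce the limits inside a custom environment, for example at the start of its step function, is to saturate each action component. In the following base-MATLAB sketch, the limits and the sampled action are hypothetical values; in a real environment you would typically read the limits from the LowerLimit and UpperLimit properties of the action specification.

```matlab
% Hypothetical action limits, as they might appear in an action specification
lowerLimit = -10;
upperLimit = 10;

% A sampled action whose first and last components exceed the limits
act = [-12.03; 1.76; 10.87];

% Saturate each component to [lowerLimit, upperLimit] before applying it
actApplied = min(max(act, lowerLimit), upperLimit);
```

Here actApplied is [-10; 1.76; 10]. Saturation is only one option; an environment could also penalize out-of-range actions through its reward.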


actor = rlContinuousGaussianActor(net,observationInfo,actionInfo,ActionMeanOutputNames=netMeanActName,ActionStandardDeviationOutputNames=netStdActName,ObservationInputNames=netObsNames) specifies the names of the network input layers to be associated with the environment observation channels. The function assigns, in sequential order, each environment observation channel specified in observationInfo to the layer specified by the corresponding name in the string array netObsNames. Therefore, the network input layers, ordered as the names in netObsNames, must have the same data type and dimensions as the observation specifications, as ordered in observationInfo.
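As a sketch of this syntax with two observation channels, the following code builds a network with one input layer per channel. It passes the input layer names through ObservationInputNames in the same order as the elements of obsInfo. The specifications, layer names, and layer sizes are illustrative assumptions.

```matlab
% Two hypothetical observation channels: a 2-by-1 vector and a scalar
obsInfo = [rlNumericSpec([2 1]) rlNumericSpec([1 1])];
% One action channel with a symmetric range of [-2,2]
actInfo = rlNumericSpec([1 1],LowerLimit=-2,UpperLimit=2);

% One input layer per observation channel
inPath1 = featureInputLayer(2,Name="obsIn1");
inPath2 = featureInputLayer(1,Name="obsIn2");

% Common path: concatenate both channels, then one hidden layer
commonPath = [
    concatenationLayer(1,2,Name="concat")
    fullyConnectedLayer(16,Name="fcCommon")
    reluLayer(Name="relu")
    ];

% Mean path: tanh bounds the output to (-1,1), scalingLayer maps it to (-2,2)
meanPath = [
    fullyConnectedLayer(1,Name="fcMean")
    tanhLayer(Name="tanhMean")
    scalingLayer(Name="mean",Scale=actInfo.UpperLimit)
    ];

% Standard deviation path: softplus keeps the output nonnegative
stdPath = [
    fullyConnectedLayer(1,Name="fcStd")
    softplusLayer(Name="std")
    ];

% Assemble and connect the paths
lg = layerGraph(inPath1);
lg = addLayers(lg,inPath2);
lg = addLayers(lg,commonPath);
lg = addLayers(lg,meanPath);
lg = addLayers(lg,stdPath);
lg = connectLayers(lg,"obsIn1","concat/in1");
lg = connectLayers(lg,"obsIn2","concat/in2");
lg = connectLayers(lg,"relu","fcMean/in");
lg = connectLayers(lg,"relu","fcStd/in");

% The order of the names matches the order of the channels in obsInfo
actor = rlContinuousGaussianActor(dlnetwork(lg),obsInfo,actInfo, ...
    ActionMeanOutputNames="mean", ...
    ActionStandardDeviationOutputNames="std", ...
    ObservationInputNames=["obsIn1","obsIn2"]);

% Pass one cell per observation channel, in the same order
act = getAction(actor,{rand(2,1),rand(1,1)});
```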

actor = rlContinuousGaussianActor(___,UseDevice=useDevice) specifies the device used to perform computational operations on the actor object, and sets the UseDevice property of actor to the useDevice input argument. You can use this syntax with any of the previous input-argument combinations.
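For example, the following sketch creates a small actor and selects the GPU only when canUseGPU reports that one is available. The specifications and layer names are illustrative assumptions. Because the action specification has no limits, the mean path is a plain fully connected layer. The "gpu" option additionally requires Parallel Computing Toolbox, as described for the UseDevice property.

```matlab
% Minimal single-channel actor, for illustration only
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([1 1]);   % no action limits, so the mean is unbounded

lg = layerGraph([
    featureInputLayer(4,Name="obs")
    fullyConnectedLayer(1,Name="mean")]);
lg = addLayers(lg,[
    fullyConnectedLayer(1,Name="fcStd")
    softplusLayer(Name="std")]);
lg = connectLayers(lg,"obs","fcStd");

% Use the GPU only when MATLAB can access a supported one
if canUseGPU
    dev = "gpu";
else
    dev = "cpu";
end

actor = rlContinuousGaussianActor(dlnetwork(lg),obsInfo,actInfo, ...
    ActionMeanOutputNames="mean", ...
    ActionStandardDeviationOutputNames="std", ...
    UseDevice=dev);
```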

Input Arguments


net is the deep neural network used as the underlying approximation model within the actor. It must have as many input layers as the number of environment observation channels, with each input layer receiving input from one observation channel. The network must also have two differently named output layers, each with as many elements as the number of dimensions of the action space, as specified in actionInfo. These two output layers return the mean and standard deviation of each component of the action, respectively. The actor uses these layers, identified by the names in the strings netMeanActName and netStdActName, to represent the Gaussian probability distribution from which the action is sampled.

Note

Since standard deviations must be nonnegative, the output layer that returns the standard deviations must be a softplus or ReLU layer, to enforce nonnegativity. Also, unless the actor is used in a SAC agent, the mean values must fall within the range of the action. In this case, to scale the mean values to the action range, use a scaling layer as the output layer for the mean values, preceded by a hyperbolic tangent layer. SAC agents automatically read the action range from the UpperLimit and LowerLimit properties of the action specification and then internally scale the distribution and bound the action. Therefore, if you use the actor in a SAC agent, do not add any layer that scales or bounds the mean values output.
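When the action range is not symmetric around zero, a scale factor alone cannot map the tanh output interval (-1,1) onto the interval between LowerLimit and UpperLimit, so a bias is also needed. The following base-MATLAB sketch uses hypothetical limits. It computes the scale (UpperLimit-LowerLimit)/2 and the bias (UpperLimit+LowerLimit)/2 and applies them as Scale.*X + Bias, which is the operation a scalingLayer performs. The corresponding layer call appears only as a comment.

```matlab
% Hypothetical, non-symmetric limits for a two-element action
lowerLimit = [0; -5];
upperLimit = [1;  5];

% Affine map from the tanh interval (-1,1) to (lowerLimit,upperLimit)
scale = (upperLimit - lowerLimit)/2;   % [0.5; 5]
bias  = (upperLimit + lowerLimit)/2;   % [0.5; 0]

% Hypothetical pre-activations, bounded by tanh and then scaled
u = tanh([-3; 0.2]);
meanVals = scale.*u + bias;            % same operation as a scalingLayer

% Corresponding layer (requires Reinforcement Learning Toolbox):
% scalingLayer(Name="actMean",Scale=scale,Bias=bias)
```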

You can specify the network as an array of Layer objects, or as a layerGraph, DAGNetwork, SeriesNetwork, or dlnetwork object.

Note

Among the different network representation options, dlnetwork is preferred, since it has built-in validation checks and supports automatic differentiation. If you pass another network object as an input argument, it is internally converted to a dlnetwork object. However, best practice is to convert other representations to dlnetwork explicitly before using them to create a critic or an actor for a reinforcement learning agent. You can do so using dlnet=dlnetwork(net), where net is any neural network object from the Deep Learning Toolbox™. The resulting dlnet is the dlnetwork object that you use for your critic or actor. This practice gives you more insight and control in cases where the conversion is not straightforward and might require additional specifications.

rlContinuousGaussianActor objects support recurrent deep neural networks.

The learnable parameters of the actor are the weights of the deep neural network. For a list of deep neural network layers, see List of Deep Learning Layers. For more information on creating deep neural networks for reinforcement learning, see Create Policies and Value Functions.

netMeanActName is the name of the network output layer that returns the mean values of the action channel, specified as a string or character vector. The actor uses this name to select the network output layer that returns the mean value of each element of the action channel. Therefore, this network output layer must be named as indicated in netMeanActName. Unless the actor is used in a SAC agent, this layer should also be a scaling layer that scales the returned mean values to the desired action range.

Example: "myNetOut_Force_Mean_Values"

netStdActName is the name of the network output layer that returns the standard deviations of the action channel, specified as a string or character vector. The actor uses this name to select the network output layer that returns the standard deviation of each element of the action channel. Therefore, this network output layer must be named as indicated in netStdActName. Furthermore, it must be a softplus or ReLU layer, to enforce nonnegativity of the returned standard deviations.

Example: "myNetOut_Force_Standard_Deviations"

netObsNames contains the names of the network input layers corresponding to the environment observation channels, specified as a string array or a cell array of character vectors. When you specify netObsNames using the ObservationInputNames name-value argument, the function assigns, in sequential order, each environment observation channel specified in observationInfo to the network input layer specified by the corresponding name in netObsNames. Therefore, the network input layers, ordered as the names in netObsNames, must have the same data type and dimensions as the observation channels, as ordered in observationInfo.

Example: {"NetInput1_airspeed","NetInput2_altitude"}

Properties


ObservationInfo contains the observation specifications, specified as an rlFiniteSetSpec or rlNumericSpec object or an array containing a mix of such objects. Each element in the array defines the properties of an environment observation channel, such as its dimensions, data type, and name.

rlContinuousGaussianActor sets the ObservationInfo property of actor to the input observationInfo.

You can extract ObservationInfo from an existing environment or agent using getObservationInfo. You can also construct the specifications manually.

ActionInfo contains the action specifications, specified as an rlNumericSpec object. This object defines the properties of the environment action channel, such as its dimensions, data type, and name.

Note

Only one action channel is allowed.

rlContinuousGaussianActor sets the ActionInfo property of actor to the input actionInfo.

You can extract ActionInfo from an existing environment or agent using getActionInfo. You can also construct the specifications manually.
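For example, the following sketch extracts both specification objects from an existing environment. It assumes the continuous cart-pole environment that rlPredefinedEnv provides.

```matlab
% Create a predefined environment with a continuous action space
env = rlPredefinedEnv("CartPole-Continuous");

% Extract the observation and action specifications from the environment
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

% Inspect the dimensions of the channels
disp(obsInfo.Dimension)
disp(actInfo.Dimension)
```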

UseDevice specifies the computation device used to perform operations such as gradient computation, parameter update, and prediction during training and simulation, specified as either "cpu" or "gpu".

The "gpu" option requires both Parallel Computing Toolbox™ software and a CUDA®-enabled NVIDIA® GPU. For more information on supported GPUs, see GPU Computing Requirements (Parallel Computing Toolbox).

You can use gpuDevice (Parallel Computing Toolbox) to query or select a local GPU device to be used with MATLAB®.

Note

Training or simulating an agent on a GPU involves device-specific numerical round-off errors. These errors can produce different results compared to performing the same operations using a CPU.

To speed up training by using parallel processing over multiple cores, you do not need to use this argument. Instead, when training your agent, use an rlTrainingOptions object in which the UseParallel option is set to true. For more information about training using multicore processors and GPUs, see Train Agents Using Parallel Computing and GPUs.

Example: "gpu"
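As a minimal sketch of the parallel-training alternative, the following training options enable parallel training. The MaxEpisodes value is arbitrary. Training in parallel also requires Parallel Computing Toolbox, and the agent and env variables in the final comment are hypothetical.

```matlab
% Training options with parallel training enabled
trainOpts = rlTrainingOptions( ...
    MaxEpisodes=1000, ...
    UseParallel=true);

% Hypothetical use, given an existing agent and environment:
% trainingStats = train(agent,env,trainOpts);
```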

Object Functions

rlACAgent: Actor-critic (AC) reinforcement learning agent
rlPGAgent: Policy gradient (PG) reinforcement learning agent
rlPPOAgent: Proximal policy optimization (PPO) reinforcement learning agent
rlSACAgent: Soft actor-critic (SAC) reinforcement learning agent
getAction: Obtain action from agent, actor, or policy object given environment observations
evaluate: Evaluate function approximator object given observation (or observation-action) input data
gradient: Evaluate gradient of function approximator object given observation and action input data
accelerate: Option to accelerate computation of gradient for approximator object based on neural network
getLearnableParameters: Obtain learnable parameter values from agent, function approximator, or policy object
setLearnableParameters: Set learnable parameter values of agent, function approximator, or policy object
setModel: Set approximation model in function approximator object
getModel: Get approximation model from function approximator object
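As an illustration of the parameter- and model-access functions, the following sketch builds a small actor and then halves all of its learnable parameters. The halving is an arbitrary modification chosen only to show the read-modify-write round trip. Finally, it retrieves and reassigns the underlying network. The specifications and layer names are assumptions.

```matlab
% Minimal single-channel actor, for illustration only
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([1 1]);   % no action limits, so the mean is unbounded
lg = layerGraph([
    featureInputLayer(4,Name="obs")
    fullyConnectedLayer(1,Name="mean")]);
lg = addLayers(lg,[
    fullyConnectedLayer(1,Name="fcStd")
    softplusLayer(Name="std")]);
lg = connectLayers(lg,"obs","fcStd");
actor = rlContinuousGaussianActor(dlnetwork(lg),obsInfo,actInfo, ...
    ActionMeanOutputNames="mean", ...
    ActionStandardDeviationOutputNames="std");

% Read the learnable parameters (one cell element per weight or bias array)
params = getLearnableParameters(actor);

% Modify them (here, an arbitrary halving) and write them back
params = cellfun(@(p) 0.5*p, params, UniformOutput=false);
actor = setLearnableParameters(actor,params);

% Retrieve the underlying network, then set it back
model = getModel(actor);
actor = setModel(actor,model);
```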

Examples


Create an observation specification object (or alternatively use getObservationInfo to extract the specification object from an environment). For this example, define the observation space as a continuous five-dimensional space, so that there is a single observation channel that carries a column vector containing five doubles.

obsInfo = rlNumericSpec([5 1]);

Create an action specification object (or alternatively use getActionInfo to extract the specification object from an environment). For this example, define the action space as a continuous three-dimensional space, so that the action channel carries a column vector containing three doubles, each between -10 and 10.

actInfo = rlNumericSpec([3 1], ...
    LowerLimit=-10, ...
    UpperLimit=10);

A continuous Gaussian actor implements a parametrized stochastic policy for a continuous action space. This actor takes an observation as input and returns as output a random action sampled from a Gaussian probability distribution.

To approximate the mean values and standard deviations of the Gaussian distribution, you must use a neural network with two output layers, each having as many elements as the dimension of the action space. One output layer must return a vector containing the mean values for each action dimension. The other must return a vector containing the standard deviation for each action dimension.

Note that standard deviations must be nonnegative and, unless the actor is used in a SAC agent, mean values must fall within the range of the action. Therefore, the output layer that returns the standard deviations must be a softplus or ReLU layer, to enforce nonnegativity, while the output layer that returns the mean values must be a scaling layer, preceded by a tanhLayer, to scale the mean values to the output range. However, do not add a tanhLayer as the last nonlinear layer in the mean output path if you are going to use the actor within a SAC agent. For more information, see Soft Actor-Critic (SAC) Agents.

For this example the environment has only one observation channel and therefore the network has only one input layer. Note that prod(obsInfo.Dimension) and prod(actInfo.Dimension) return the number of dimensions of the observation and action spaces, respectively, regardless of whether they are arranged as row vectors, column vectors, or matrices.
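For instance, prod maps each of the following shapes to its total number of elements. The dimension vectors are hypothetical examples of what a Dimension property might contain.

```matlab
% Dimension vectors as stored in hypothetical specification objects
dims = {[5 1], [1 5], [2 3]};   % column vector, row vector, matrix

% Total number of elements for each shape
numElements = cellfun(@prod, dims);
```

Here numElements is [5 5 6], so the column and row vectors map to the same input size.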

Define each network path as an array of layer objects, and assign names to the input and output layers of each path. These names allow you to connect the paths and then later explicitly associate the network input and output layers with the appropriate environment channel.

% Input path layers
inPath = [ 
    featureInputLayer( ...
        prod(obsInfo.Dimension), ...
        Name="netOin")
    fullyConnectedLayer( ...
        prod(actInfo.Dimension), ...
        Name="infc") 
    ];

% Path layers for mean value 
% tanhLayer bounds the output to (-1,1);
% scalingLayer then scales it to (-10,10)
meanPath = [ 
    fullyConnectedLayer(prod(actInfo.Dimension),Name="fcMean");
    tanhLayer(Name="tanhMean");
    scalingLayer(Name="scale", ...
    Scale=actInfo.UpperLimit) 
    ];

% Path layers for standard deviations
% Using softplus layer to make them nonnegative
sdevPath = [ 
    tanhLayer(Name="tanhStdv");
    fullyConnectedLayer(prod(actInfo.Dimension));
    softplusLayer(Name="splus") 
    ];

% Add layers to network object
net = layerGraph(inPath);
net = addLayers(net,meanPath);
net = addLayers(net,sdevPath);

% Connect the output of the input path to both output paths
net = connectLayers(net,"infc","fcMean/in");
net = connectLayers(net,"infc","tanhStdv/in");

% Plot the network
plot(net)

(Figure: layer graph of the actor network.)

Convert the network to a dlnetwork object and display the number of learnable parameters (weights).

net = dlnetwork(net);
summary(net)
   Initialized: true

   Number of learnables: 42

   Inputs:
      1   'netOin'   5 features

Create the actor with rlContinuousGaussianActor, using the network, the observation and action specification objects, and the names of the network input and output layers.

actor = rlContinuousGaussianActor(net, obsInfo, actInfo, ...
    ActionMeanOutputNames="scale",...
    ActionStandardDeviationOutputNames="splus",...
    ObservationInputNames="netOin");

To check your actor, use getAction to return an action from a random observation vector, using the current network weights. Each of the three elements of the action vector is a random sample from the Gaussian distribution whose mean and standard deviation the neural network computes from the current observation. Because the actor does not enforce the action limits, a sampled element can fall outside the range [-10,10].

act = getAction(actor,{rand(obsInfo.Dimension)}); 
act{1}
ans = 3x1 single column vector

  -12.0285
    1.7628
   10.8733

To return the parameters of the Gaussian distribution of the action (the mean values and the standard deviations) for a given observation, use evaluate.

dist = evaluate(actor,{rand(obsInfo.Dimension)});

Display the vector of mean values.

dist{1}
ans = 3x1 single column vector

   -5.6127
    3.9449
    9.6213

Display the vector of standard deviations.

dist{2}
ans = 3x1 single column vector

    0.8516
    0.8366
    0.7004

You can now use the actor (along with a critic) to create an agent for the environment described by the given specification objects. Examples of agents that can work with continuous action and observation spaces, and use a continuous Gaussian actor, are rlACAgent, rlPGAgent, rlSACAgent, rlPPOAgent, and rlTRPOAgent.
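As a sketch of this last step, the following code pairs a small Gaussian actor with a value-function critic created with rlValueFunction, and then builds an actor-critic agent. So that it runs on its own, the code redefines the specifications without action limits. Consequently, the mean path has no tanh or scaling layer. Network sizes and layer names are assumptions.

```matlab
% Specifications without action limits (illustrative)
obsInfo = rlNumericSpec([5 1]);
actInfo = rlNumericSpec([3 1]);

% Small Gaussian actor: unbounded mean path, softplus standard deviation path
lg = layerGraph([
    featureInputLayer(5,Name="obs")
    fullyConnectedLayer(3,Name="mean")]);
lg = addLayers(lg,[
    fullyConnectedLayer(3,Name="fcStd")
    softplusLayer(Name="std")]);
lg = connectLayers(lg,"obs","fcStd");
actor = rlContinuousGaussianActor(dlnetwork(lg),obsInfo,actInfo, ...
    ActionMeanOutputNames="mean", ...
    ActionStandardDeviationOutputNames="std");

% Value-function critic: observation in, scalar state value out
criticNet = dlnetwork([
    featureInputLayer(5)
    fullyConnectedLayer(16)
    reluLayer
    fullyConnectedLayer(1)]);
critic = rlValueFunction(criticNet,obsInfo);

% Actor-critic agent built from the two approximators
agent = rlACAgent(actor,critic);

% Sample an action from the agent for a random observation
act = getAction(agent,{rand(5,1)});
```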

For more information on creating approximator objects such as actors and critics, see Create Policies and Value Functions.

Version History

Introduced in R2022a