rlDiscreteCategoricalActor
Stochastic categorical actor with a discrete action space for reinforcement learning agents
Description
This object implements a function approximator to be used as a stochastic actor
within a reinforcement learning agent with a discrete action space. A discrete categorical
actor takes an environment observation as input and returns as output a random action sampled
from a categorical (also known as Multinoulli) probability distribution over the possible
discrete actions, thereby implementing a stochastic policy. After you create an
rlDiscreteCategoricalActor object, use it to create a suitable agent, such as an
rlACAgent or rlPGAgent agent. For more information on creating actors and critics, see
Create Policies and Value Functions.
Creation
Syntax
Description
actor = rlDiscreteCategoricalActor(net,observationInfo,actionInfo)
creates a stochastic actor with a discrete action space, using the deep neural network
net as function approximator. For this actor, actionInfo must specify a discrete action
space. The network input layers are automatically associated with the environment
observation channels according to the dimension specifications in observationInfo.
The network must have a single output layer with as many elements as the number of
possible discrete actions, as specified in actionInfo. This function sets the
ObservationInfo and ActionInfo properties of actor to the inputs observationInfo and
actionInfo, respectively.
Note
actor does not enforce constraints set by the action specification; therefore, when
using this actor, you must enforce action space constraints within the environment.
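As an illustration of this first syntax, the following minimal sketch assumes Reinforcement Learning Toolbox and Deep Learning Toolbox are available; the observation dimensions, the action set, and the layer sizes are hypothetical choices, not values prescribed by the documentation.

% Hypothetical specifications: a 4-element observation and three possible actions.
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([-1 0 1]);

% Network with one input channel matching the observation dimensions and a
% single output layer with one element per possible discrete action.
net = [
    featureInputLayer(prod(obsInfo.Dimension))
    fullyConnectedLayer(16)
    reluLayer
    fullyConnectedLayer(numel(actInfo.Elements))
    ];
net = dlnetwork(net);

% Create the actor from the network and the specifications.
actor = rlDiscreteCategoricalActor(net,obsInfo,actInfo);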
actor = rlDiscreteCategoricalActor(net,observationInfo,actionInfo,ObservationInputNames=netObsNames)
specifies the names of the network input layers to be associated with the environment
observation channels. The function assigns, in sequential order, each environment
observation channel specified in observationInfo to the layer specified by the
corresponding name in the string array netObsNames. Therefore, the network input layers,
ordered as the names in netObsNames, must have the same data type and dimensions as the
observation specifications, as ordered in observationInfo.
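For this syntax, a sketch with two hypothetical observation channels follows; the channel dimensions, layer names, and layer sizes are assumptions made for illustration only.

% Two observation channels: a 4-element vector and a scalar.
obsInfo = [rlNumericSpec([4 1]) rlNumericSpec([1 1])];
actInfo = rlFiniteSetSpec([1 2 3]);

% One input path per observation channel; both paths feed an addition layer.
vecPath = [
    featureInputLayer(4,Name="vecIn")
    fullyConnectedLayer(8,Name="fcVec")
    ];
scalarPath = [
    featureInputLayer(1,Name="scalarIn")
    fullyConnectedLayer(8,Name="fcScalar")
    ];
common = [
    additionLayer(2,Name="add")
    reluLayer
    fullyConnectedLayer(numel(actInfo.Elements))
    ];

lgraph = layerGraph(vecPath);
lgraph = addLayers(lgraph,scalarPath);
lgraph = addLayers(lgraph,common);
lgraph = connectLayers(lgraph,"fcVec","add/in1");
lgraph = connectLayers(lgraph,"fcScalar","add/in2");
net = dlnetwork(lgraph);

% Input layer names are listed in the same order as the channels in obsInfo.
actor = rlDiscreteCategoricalActor(net,obsInfo,actInfo, ...
    ObservationInputNames=["vecIn","scalarIn"]);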
actor = rlDiscreteCategoricalActor({basisFcn,W0},observationInfo,actionInfo)
creates a discrete space stochastic actor using a custom basis function as underlying
approximator. The first input argument is a two-element cell array whose first element is
the handle basisFcn to a custom basis function and whose second element is the initial
weight matrix W0. This function sets the ObservationInfo and ActionInfo properties of
actor to the inputs observationInfo and actionInfo, respectively.
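A minimal sketch of the basis-function syntax follows; the basis function, feature count, and specification sizes are hypothetical, and W0 is sized here under the assumption that it needs one row per basis-function feature and one column per possible action.

% Hypothetical specifications: 3-element observation, four possible actions.
obsInfo = rlNumericSpec([3 1]);
actInfo = rlFiniteSetSpec([1 2 3 4]);

% Custom basis function returning a 7-element feature column vector.
basisFcn = @(obs) [obs; obs.^2; 1];

% Initial weights: one row per feature, one column per possible action.
W0 = rand(7,numel(actInfo.Elements));

actor = rlDiscreteCategoricalActor({basisFcn,W0},obsInfo,actInfo);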
actor = rlDiscreteCategoricalActor(___,UseDevice=useDevice)
specifies the device used to perform computational operations on the actor object, and
sets the UseDevice property of actor to the useDevice input argument. You can use this
syntax with any of the previous input-argument combinations.
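For example, reusing the network and specifications from the first sketch above (and assuming a supported GPU with Parallel Computing Toolbox is available), the actor computations can be placed on the GPU:

% "cpu" is the default device; "gpu" requires a supported GPU.
actor = rlDiscreteCategoricalActor(net,obsInfo,actInfo,UseDevice="gpu");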
Input Arguments
Properties
Object Functions
rlACAgent | Actor-critic reinforcement learning agent
rlPGAgent | Policy gradient reinforcement learning agent
rlPPOAgent | Proximal policy optimization reinforcement learning agent
getAction | Obtain action from agent or actor given environment observations
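As a brief usage sketch of getAction with this actor (reusing the hypothetical obsInfo and actor from the first example above):

% Sample a random action for an observation consistent with the specification.
obs = rand(obsInfo.Dimension);
act = getAction(actor,{obs});
act{1}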