Policies and Value Functions

Define policy and value function approximators, such as actors and critics

During training, most agents rely on an actor, a critic, or both. The actor learns the policy that selects which action to take. The critic learns the value (or Q-value) function that estimates the long-term reward expected under the current policy.

Reinforcement Learning Toolbox™ provides function approximator objects for actors and critics, and policy objects for custom training loops and deployment. Approximator objects can internally use different approximation models, such as deep neural networks, linear basis functions, or look-up tables.
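
Most approximator constructors start from specification objects that describe the environment interface. A minimal sketch, assuming a four-dimensional continuous observation and a two-valued discrete action (the dimensions and values here are illustrative):

    % Observation: 4-dimensional continuous vector.
    obsInfo = rlNumericSpec([4 1]);

    % Action: one of two discrete values.
    actInfo = rlFiniteSetSpec([-1 1]);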

For an introduction to policies, value functions, actors, and critics, see Create Policies and Value Functions.

Blocks

Policy - Reinforcement learning policy (Since R2022b)

Functions

Approximator Objects

rlTable - Value table or Q table
rlValueFunction - Value function approximator object for reinforcement learning agents (Since R2022a)
rlQValueFunction - Q-value function approximator object for reinforcement learning agents (Since R2022a)
rlVectorQValueFunction - Vector Q-value function approximator for reinforcement learning agents (Since R2022a)
rlContinuousDeterministicActor - Deterministic actor with a continuous action space for reinforcement learning agents (Since R2022a)
rlDiscreteCategoricalActor - Stochastic categorical actor with a discrete action space for reinforcement learning agents (Since R2022a)
rlContinuousGaussianActor - Stochastic Gaussian actor with a continuous action space for reinforcement learning agents (Since R2022a)
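
As a concrete sketch, the following builds a vector Q-value critic and a categorical actor from deep networks, using the specifications defined above (the layer sizes are arbitrary assumptions, not a recommendation):

    % Critic: one Q-value output per discrete action.
    criticNet = dlnetwork([
        featureInputLayer(4)
        fullyConnectedLayer(16)
        reluLayer
        fullyConnectedLayer(2)]);
    critic = rlVectorQValueFunction(criticNet,obsInfo,actInfo);

    % Actor: one action probability per discrete action.
    actorNet = dlnetwork([
        featureInputLayer(4)
        fullyConnectedLayer(16)
        reluLayer
        fullyConnectedLayer(2)
        softmaxLayer]);
    actor = rlDiscreteCategoricalActor(actorNet,obsInfo,actInfo);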

Actor and Critic Operations

getActor - Extract actor from reinforcement learning agent
setActor - Set actor of reinforcement learning agent
getCritic - Extract critic from reinforcement learning agent
setCritic - Set critic of reinforcement learning agent
getModel - Get approximation model from function approximator object (Since R2020b)
setModel - Set approximation model in function approximator object (Since R2020b)
getLearnableParameters - Obtain learnable parameter values from agent, function approximator, or policy object
setLearnableParameters - Set learnable parameter values of agent, function approximator, or policy object
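
These accessors let you inspect or modify an agent after construction. A short sketch, assuming agent is an existing agent object (for example, one created with rlDQNAgent):

    % Pull the critic out of the agent and read its weights.
    critic = getCritic(agent);
    params = getLearnableParameters(critic);

    % Push a (possibly modified) critic back into the agent.
    critic = setLearnableParameters(critic,params);
    agent = setCritic(agent,critic);

    % getModel exposes the underlying approximation model itself.
    net = getModel(critic);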

Normalization and Optimization

rlNormalizer - Configure normalization for input of function approximator object (Since R2024a)
getNormalizer - Get normalizer from function approximator object (Since R2024a)
setNormalizer - Set normalizer in function approximator object (Since R2024a)
normalize - Normalize input data using method defined in normalizer object (Since R2024a)
rlOptimizerOptions - Optimization options for actors and critics (Since R2022a)
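
Optimizer options are typically attached to the agent options when the agent is created. A hedged sketch (the learning rate and gradient threshold are placeholder values):

    % Configure the critic optimizer, then pass it to the agent options.
    criticOpts = rlOptimizerOptions(LearnRate=1e-3,GradientThreshold=1);
    agentOpts = rlDQNAgentOptions(CriticOptimizerOptions=criticOpts);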

Policy Objects

getGreedyPolicy - Extract greedy (deterministic) policy object from agent (Since R2022a)
getExplorationPolicy - Extract exploratory (stochastic) policy object from agent (Since R2023a)
rlMaxQPolicy - Policy object to generate discrete max-Q actions for custom training loops and application deployment (Since R2022a)
rlEpsilonGreedyPolicy - Policy object to generate discrete epsilon-greedy actions for custom training loops (Since R2022a)
rlDeterministicActorPolicy - Policy object to generate continuous deterministic actions for custom training loops and application deployment (Since R2022a)
rlAdditiveNoisePolicy - Policy object to generate continuous noisy actions for custom training loops (Since R2022a)
rlStochasticActorPolicy - Policy object to generate stochastic actions for custom training loops and application deployment (Since R2022a)
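
Policy objects are lightweight alternatives to full agents for custom training loops and deployment. A sketch, reusing the agent and critic from the earlier sketches:

    % Extract deployment and exploration policies from an agent.
    greedyPolicy = getGreedyPolicy(agent);
    explorationPolicy = getExplorationPolicy(agent);

    % Or build a policy directly around a critic.
    egPolicy = rlEpsilonGreedyPolicy(critic);
    action = getAction(egPolicy,{rand(4,1)});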

Environment Model Approximators

rlContinuousDeterministicTransitionFunction - Deterministic transition function approximator object for neural network-based environment (Since R2022a)
rlContinuousGaussianTransitionFunction - Stochastic Gaussian transition function approximator object for neural network-based environment (Since R2022a)
rlContinuousDeterministicRewardFunction - Deterministic reward function approximator object for neural network-based environment (Since R2022a)
rlContinuousGaussianRewardFunction - Stochastic Gaussian reward function approximator object for neural network-based environment (Since R2022a)
rlIsDoneFunction - Is-done function approximator object for neural network-based environment (Since R2022a)
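
These objects wrap networks that model the environment itself, for use with model-based agents such as MBPO. A hedged sketch, assuming transNet is a dlnetwork whose input and output layers are named "obs", "act", and "nextObs" (the network and its layer names are illustrative assumptions):

    % Deterministic dynamics model: (observation, action) -> next observation.
    transitionFcn = rlContinuousDeterministicTransitionFunction( ...
        transNet,obsInfo,actInfo, ...
        ObservationInputNames="obs", ...
        ActionInputNames="act", ...
        NextObservationOutputNames="nextObs");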

Evaluation Functions

getAction - Obtain action from agent, actor, or policy object given environment observations (Since R2020a)
getValue - Obtain estimated value from a critic given environment observations and actions (Since R2020a)
getMaxQValue - Obtain maximum estimated value over all possible actions from a Q-value function critic with discrete action space, given environment observations (Since R2020a)
evaluate - Evaluate function approximator object given observation (or observation-action) input data (Since R2022a)
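
For example, given the actor and critic built earlier, you can query them directly (the random observation below is only a placeholder):

    obs = {rand(4,1)};                        % one observation sample
    act = getAction(actor,obs);               % action sampled from the actor
    q = getValue(critic,obs);                 % Q-value of each discrete action
    [maxQ,maxIdx] = getMaxQValue(critic,obs); % greedy value and action index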

Network Layers

quadraticLayer - Quadratic layer for actor or critic network
scalingLayer - Scaling layer for actor or critic network
softplusLayer - Softplus layer for actor or critic network (Since R2020a)
featureInputLayer - Feature input layer (Since R2020b)
reluLayer - Rectified Linear Unit (ReLU) layer
tanhLayer - Hyperbolic tangent (tanh) layer
fullyConnectedLayer - Fully connected layer
lstmLayer - Long short-term memory (LSTM) layer for recurrent neural network (RNN)
softmaxLayer - Softmax layer
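
For instance, a continuous-action actor network often ends with a tanhLayer followed by a scalingLayer, so that the bounded tanh output is rescaled to the action range (the [-2, 2] range below is an illustrative assumption):

    % Map the network output into the action range [-2, 2].
    actorLayers = [
        featureInputLayer(4)
        fullyConnectedLayer(16)
        reluLayer
        fullyConnectedLayer(1)
        tanhLayer                 % squash output to [-1, 1]
        scalingLayer(Scale=2)];   % rescale to [-2, 2]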

Topics