getGreedyPolicy
Description
Examples
Extract Policy Object from Agent
For this example, load the PG agent trained in Train PG Agent to Balance Cart-Pole System.
load("MATLABCartpolePG.mat","agent")
Extract the agent greedy policy using getGreedyPolicy.
policyDtr = getGreedyPolicy(agent)
policyDtr = 
  rlStochasticActorPolicy with properties:

                     Actor: [1x1 rl.function.rlDiscreteCategoricalActor]
    UseMaxLikelihoodAction: 1
           ObservationInfo: [1x1 rl.util.rlNumericSpec]
                ActionInfo: [1x1 rl.util.rlFiniteSetSpec]
                SampleTime: 1
Note that, in the extracted policy object, the UseMaxLikelihoodAction property is set to true. This means that the policy object always generates the maximum likelihood action in response to a given observation, and is therefore greedy (and deterministic).
Alternatively, you can extract a stochastic policy using getExplorationPolicy.
policyXpl = getExplorationPolicy(agent)
policyXpl = 
  rlStochasticActorPolicy with properties:

                     Actor: [1x1 rl.function.rlDiscreteCategoricalActor]
    UseMaxLikelihoodAction: 0
           ObservationInfo: [1x1 rl.util.rlNumericSpec]
                ActionInfo: [1x1 rl.util.rlFiniteSetSpec]
                SampleTime: 1
This time, the extracted policy object has the UseMaxLikelihoodAction property set to false. This means that, given an observation, the policy object samples a random action from the probability distribution produced by its actor. The policy is therefore stochastic and useful for exploration.
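The difference between the two policies can be checked by querying each one repeatedly with the same observation. The following is a minimal sketch, not part of the original example: it builds a default, untrained PG agent rather than loading the saved one, and the observation size ([4 1]) and action set ([-10 10]) are assumptions chosen to resemble a cart-pole setup.

```matlab
% Sketch only: a default (untrained) PG agent stands in for the saved agent.
obsInfo = rlNumericSpec([4 1]);        % assumed 4-element observation vector
actInfo = rlFiniteSetSpec([-10 10]);   % assumed two discrete actions
agent = rlPGAgent(obsInfo,actInfo);

policyDtr = getGreedyPolicy(agent);       % UseMaxLikelihoodAction is true
policyXpl = getExplorationPolicy(agent);  % UseMaxLikelihoodAction is false

obs = {rand(obsInfo.Dimension)};       % one fixed random observation

% Query both policies several times with the same observation.
numTrials = 20;
actDtr = zeros(1,numTrials);
actXpl = zeros(1,numTrials);
for k = 1:numTrials
    a = getAction(policyDtr,obs);      % action is returned in a cell array
    actDtr(k) = a{1};
    a = getAction(policyXpl,obs);
    actXpl(k) = a{1};
end

numel(unique(actDtr))   % number of distinct actions from the greedy policy
unique(actXpl)          % distinct actions sampled by the exploration policy
```

The greedy policy should return the same action on every call, while the exploration policy can return either action. With an untrained actor, how often each action is sampled depends on the random network initialization.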
Input Arguments
agent
— Reinforcement learning agent
reinforcement learning agent object
Reinforcement learning agent that contains a critic, specified as one of the following objects:
rlPGAgent (when using a critic to estimate a baseline value function)
Note
If agent is an rlMBPOAgent object, to extract the greedy policy, use getGreedyPolicy(agent.BaseAgent).
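When the same code has to handle both MBPO agents and other agents, the rule in this note can be wrapped in a small function. The sketch below is illustrative only: extractGreedyPolicy is a hypothetical helper name, not a toolbox function, and matching on the end of the class name is an assumption made so that the check does not depend on any package prefix.

```matlab
function policy = extractGreedyPolicy(agent)
% extractGreedyPolicy  Hypothetical helper (not part of the toolbox).
% Returns the greedy policy of an agent, unwrapping MBPO agents first.

% Assumption: the class name of an MBPO agent ends in "rlMBPOAgent".
if endsWith(class(agent),"rlMBPOAgent")
    % MBPO agents wrap a base agent; extract the policy from that agent.
    policy = getGreedyPolicy(agent.BaseAgent);
else
    policy = getGreedyPolicy(agent);
end
end
```

Save the function in its own file, or place it at the end of a script, and call it as policy = extractGreedyPolicy(agent).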
Output Arguments
policy
— Reinforcement learning policy object
rlMaxQPolicy object | rlDeterministicActorPolicy object | rlStochasticActorPolicy object
Policy object, returned as one of the following:
rlMaxQPolicy object: Returned when agent is an rlQAgent, rlSARSAAgent, or rlDQNAgent object.
rlDeterministicActorPolicy object: Returned when agent is an rlDDPGAgent or rlTD3Agent object.
rlStochasticActorPolicy object, with the UseMaxLikelihoodAction property set to true: Returned when agent is an rlACAgent, rlPGAgent, rlPPOAgent, rlTRPOAgent, or rlSACAgent object. Since the returned policy object has the UseMaxLikelihoodAction property set to true, it always generates the deterministic maximum likelihood action in response to a given observation.
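The mapping from agent type to policy type can be inspected by calling getGreedyPolicy on a few default agents and printing the class of each result. This is a sketch under stated assumptions: the observation and action specifications are arbitrary placeholders, and the agents use default, untrained networks.

```matlab
% Placeholder specifications (assumptions, not taken from the example above).
obsInfo = rlNumericSpec([4 1]);
actDisc = rlFiniteSetSpec([-10 10]);                           % discrete actions
actCont = rlNumericSpec([1 1],'LowerLimit',-10,'UpperLimit',10); % continuous action

% One default agent from each of the three groups listed above.
agents = { ...
    rlDQNAgent(obsInfo,actDisc), ...
    rlDDPGAgent(obsInfo,actCont), ...
    rlPPOAgent(obsInfo,actDisc)};

for k = 1:numel(agents)
    policy = getGreedyPolicy(agents{k});
    % Print the agent class next to the class of its greedy policy.
    fprintf("%s -> %s\n",class(agents{k}),class(policy));
end
```

The printed class names may include a package prefix, so compare them with contains or endsWith rather than with an exact string match.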
Version History
Introduced in R2022a
See Also
Functions
Objects
rlMaxQPolicy | rlEpsilonGreedyPolicy | rlAdditiveNoisePolicy | rlDeterministicActorPolicy | rlStochasticActorPolicy
Blocks