rlPGAgent

Policy gradient reinforcement learning agent

Description

The policy gradient (PG) algorithm is a model-free, online, on-policy reinforcement learning method. A PG agent is a policy-based reinforcement learning agent that directly computes an optimal policy maximizing the long-term reward.
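For intuition, PG agents follow a REINFORCE-style policy gradient. The estimate below is a general sketch of that idea, not a statement of the toolbox's exact update:

\nabla_\theta J(\theta) \approx \sum_{t=0}^{T} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\,\bigl(G_t - b(s_t)\bigr)

where G_t is the return from time t and b(s_t) is the baseline value supplied by the critic (zero when no baseline is used).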

For more information on PG agents, see Policy Gradient Agents.

For more information on the different types of reinforcement learning agents, see Reinforcement Learning Agents.

Creation

Description

agent = rlPGAgent(actor) creates a PG agent with the specified actor network. In this case, the UseBaseline option of the agent is false by default.

agent = rlPGAgent(actor,critic) creates a PG agent with the specified actor and critic networks. By default, the UseBaseline option is true in this case.


agent = rlPGAgent(___,agentOptions) creates a PG agent and sets the AgentOptions property.
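For illustration, the three calling forms look like this; here actor, critic, and agentOpts are assumed to already exist, as described under Input Arguments:

% actor only; UseBaseline defaults to false
agent = rlPGAgent(actor);

% actor and baseline critic; UseBaseline defaults to true
agent = rlPGAgent(actor,critic);

% actor, critic, and an rlPGAgentOptions object
agent = rlPGAgent(actor,critic,agentOpts);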

Input Arguments


Actor network representation, specified as an rlStochasticActorRepresentation object. For more information on creating actor representations, see Create Policy and Value Function Representations.

Critic network representation, specified as an rlValueRepresentation object. For more information on creating critic representations, see Create Policy and Value Function Representations.

Properties


Agent options, specified as an rlPGAgentOptions object.
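The following sketch shows one way to construct and inspect such an options object; the property values are arbitrary placeholders, not recommendations:

% UseBaseline, DiscountFactor, and EntropyLossWeight are rlPGAgentOptions properties
agentOpts = rlPGAgentOptions(...
    'UseBaseline',true, ...
    'DiscountFactor',0.99, ...
    'EntropyLossWeight',0.01);

agentOpts.UseBaseline   % query an individual option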

Object Functions

train - Train a reinforcement learning agent within a specified environment
sim - Simulate a trained reinforcement learning agent within a specified environment
getActor - Get actor representation from reinforcement learning agent
setActor - Set actor representation of reinforcement learning agent
getCritic - Get critic representation from reinforcement learning agent
setCritic - Set critic representation of reinforcement learning agent
generatePolicyFunction - Create function that evaluates trained policy of reinforcement learning agent
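For instance, the actor and critic representations of an existing agent can be retrieved and reassigned along these lines (a minimal sketch assuming agent has already been created):

actorRep = getActor(agent);       % retrieve the current actor representation
criticRep = getCritic(agent);     % retrieve the baseline critic representation
agent = setActor(agent,actorRep); % reassign, for example after modifying parameters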

Examples


Create a predefined environment interface, and obtain its observation and action specifications.

% load predefined environment
env = rlPredefinedEnv("DoubleIntegrator-Discrete");

% get observation and action specification info
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

Create a critic representation to use as a baseline.

% create a network to be used as underlying critic approximator
baselineNetwork = [
    imageInputLayer([obsInfo.Dimension(1) 1 1], 'Normalization', 'none', 'Name', 'state')
    fullyConnectedLayer(8, 'Name', 'BaselineFC')
    reluLayer('Name', 'CriticRelu1')
    fullyConnectedLayer(1, 'Name', 'BaselineFC2', 'BiasLearnRateFactor', 0)];

% set some options for the critic
baselineOpts = rlRepresentationOptions('LearnRate',5e-3,'GradientThreshold',1);

% create the critic based on the network approximator
baseline = rlValueRepresentation(baselineNetwork,obsInfo,'Observation',{'state'},baselineOpts);

Create an actor representation.

% create a network to be used as underlying actor approximator
actorNetwork = [
    imageInputLayer([obsInfo.Dimension(1) 1 1], 'Normalization', 'none', 'Name', 'state')
    fullyConnectedLayer(numel(actInfo.Elements), 'Name', 'action', 'BiasLearnRateFactor', 0)];

% set some options for the actor
actorOpts = rlRepresentationOptions('LearnRate',5e-3,'GradientThreshold',1);

% create the actor based on the network approximator
actor = rlStochasticActorRepresentation(actorNetwork,obsInfo,actInfo,...
    'Observation',{'state'},actorOpts);

Specify agent options, and create a PG agent using the actor and the baseline critic.

agentOpts = rlPGAgentOptions(...
    'UseBaseline',true, ...
    'DiscountFactor', 0.99);
agent = rlPGAgent(actor,baseline,agentOpts)
agent = 
  rlPGAgent with properties:

    AgentOptions: [1x1 rl.option.rlPGAgentOptions]

To check your agent, use getAction to return the action from a random observation.

getAction(agent,{rand(2,1)})
ans = -2

You can now test and train the agent within the environment.
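As a rough sketch of those next steps (the training and simulation option values below are arbitrary placeholders):

% configure training; episode limits and stopping criteria are placeholders
trainOpts = rlTrainingOptions(...
    'MaxEpisodes',1000, ...
    'MaxStepsPerEpisode',200, ...
    'StopTrainingCriteria','AverageReward', ...
    'StopTrainingValue',-66);

% train the agent in the environment
trainingStats = train(agent,env,trainOpts);

% simulate the trained agent
simOpts = rlSimulationOptions('MaxSteps',500);
experience = sim(env,agent,simOpts);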

Introduced in R2019a