Creating an actorLossFunction for ContinuousDeterministicActor

Question

Hi, in the example, the actor loss function for an rlDiscreteCategoricalActor is the following:

function loss = actorLossFunction(policy, lossData)
    policy = policy{1};
    % Create the action indication matrix.
    batchSize = lossData.batchSize;
    Z = repmat(lossData.actInfo.Elements',1,batchSize);
    actionIndicationMatrix = lossData.actionBatch(:,:) == Z;
    
    % Resize the discounted return to the size of policy.
    G = actionIndicationMatrix .* lossData.discountedReturn;
    G = reshape(G,size(policy));
    
    % Round any policy values less than eps to eps.
    policy(policy < eps) = eps;
    
    % Compute the loss.
    loss = -sum(G .* log(policy),'all');
end

Here are my action and observation specifications:

actInfo =
  rlNumericSpec with properties:
     LowerLimit: [2×1 double]
     UpperLimit: [2×1 double]
           Name: "CartPole Action"
    Description: [0×0 string]
      Dimension: [2 1]
       DataType: "double"

obsInfo =
  rlNumericSpec with properties:
     LowerLimit: -Inf
     UpperLimit: Inf
           Name: "CartPole States"
    Description: "pendulum_force, cart position, cart velocity"
      Dimension: [4 1501]
       DataType: "double"

Here is how I set up my actor:

actor = rlContinuousDeterministicActor(actorNet,obsInfo,actInfo);
actor = accelerate(actor,true);
actorOpts = rlOptimizerOptions('LearnRate',1e-3);
actorOptimizer = rlOptimizer(actorOpts);

To create my loss function, can I do the following?

function loss = actorLossFunction(policy, lossData)
    policy = policy{1};
    % Create the action indication matrix.
    batchSize = lossData.batchSize;
    Z = repmat(lossData.actInfo.Dimension(1)',1,batchSize);
    actionIndicationMatrix = lossData.actionBatch(:,:) == Z;
    
    % Resize the discounted return to the size of policy.
    G = actionIndicationMatrix .* lossData.discountedReturn;
    G = reshape(G,size(policy));
    
    % Round any policy values less than eps to eps.
    policy(policy < eps) = eps;
    
    % Compute the loss.
    loss = -sum(G .* log(policy),'all');
    
end

Answer 1

Takeshi Takahashi on 2 June 2022

Please take a look at this example for rlContinuousDeterministicActor if you want to use it in a custom training loop.

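rlDiscreteCategoricalActor is for stochastic discrete actions, while rlContinuousDeterministicActor is for deterministic continuous actions. The loss in your question relies on action probabilities, which a deterministic actor does not output, so you need a different formulation.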
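
One possible formulation, sketched here and not taken from the linked example, is a DDPG-style surrogate loss. The actor's output is the action itself, so instead of weighting log-probabilities by the return, you weight the action by the critic's gradient dQ/da. This sketch assumes you compute dQ/da separately from a critic and pass it in as lossData.dQdA, treated as a constant. The field name dQdA is hypothetical, and lossData.batchSize is assumed to be set as in the question.

```matlab
function loss = actorLossFunction(policy, lossData)
    % For a deterministic actor, the first output is the action batch itself.
    action = policy{1};

    % ASSUMPTION: lossData.dQdA (hypothetical field) holds dQ/da from a critic,
    % evaluated at these actions, with one entry per action element.
    dQdA = reshape(lossData.dQdA, size(action));

    % Surrogate loss: dQdA is a constant here, so the gradient of this loss
    % with respect to the actor parameters is -(1/N) * sum(dQ/da * da/dtheta).
    loss = -sum(dQdA .* action, 'all') / lossData.batchSize;
end
```

Minimizing this loss moves the actions in the direction that increases the critic's Q-value. No eps clipping or log is needed, because the output is not a probability.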
