Creating an actorLossFunction for ContinuousDeterministicActor
6 vues (au cours des 30 derniers jours)
Afficher commentaires plus anciens
rtn
le 24 Mai 2022
Réponse apportée : Takeshi Takahashi
le 2 Juin 2022
Hi in the example the actor loss function is the following for a rlDiscreteCategoricalActor
function loss = actorLossFunction(policy, lossData)
policy = policy{1};
% Create the action indication matrix.
batchSize = lossData.batchSize;
Z = repmat(lossData.actInfo.Elements',1,batchSize);
actionIndicationMatrix = lossData.actionBatch(:,:) == Z;
% Resize the discounted return to the size of policy.
G = actionIndicationMatrix .* lossData.discountedReturn;
G = reshape(G,size(policy));
% Round any policy values less than eps to eps.
policy(policy < eps) = eps;
% Compute the loss.
loss = -sum(G .* log(policy),'all');
end
Here is my
actInfo =
rlNumericSpec with properties:
LowerLimit: [2×1 double]
UpperLimit: [2×1 double]
Name: "CartPole Action"
Description: [0×0 string]
Dimension: [2 1]
DataType: "double"
obsInfo =
rlNumericSpec with properties:
LowerLimit: -Inf
UpperLimit: Inf
Name: "CartPole States"
Description: "pendulum_force, cart position, cart velocity"
Dimension: [4 1501]
DataType: "double"
Here is how I set my actor
actor = rlContinuousDeterministicActor(actorNet,obsInfo,actInfo);
actor = accelerate(actor,true);
actorOpts = rlOptimizerOptions('LearnRate',1e-3);
actorOptimizer = rlOptimizer(actorOpts);
To create my loss function can I do the following?
function loss = actorLossFunction(policy, lossData)
policy = policy{1};
% Create the action indication matrix.
batchSize = lossData.batchSize;
Z = repmat(lossData.actInfo.Dimension(1)',1,batchSize);
actionIndicationMatrix = lossData.actionBatch(:,:) == Z;
% Resize the discounted return to the size of policy.
G = actionIndicationMatrix .* lossData.discountedReturn;
G = reshape(G,size(policy));
% Round any policy values less than eps to eps.
policy(policy < eps) = eps;
% Compute the loss.
loss = -sum(G .* log(policy),'all');
end
0 commentaires
Réponse acceptée
Takeshi Takahashi
le 2 Juin 2022
Please take a look at this example for rlContinuousDeterministicActor if you want to use it in a custom training loop.
rlDiscreteCategoricalActor is for stochastic discrete actions while rlContinuousDeterministicActor is for deterministic continuous actions. You need different formulations.
0 commentaires
Plus de réponses (0)
Voir également
Catégories
En savoir plus sur Policies and Value Functions dans Help Center et File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!