
Beta distribution in PPO

Sourabh on 2 Feb 2024
Commented: Kautuk Raj on 15 Feb 2024
I want to constrain the actions of my PPO agent to a bounded range, and I am wondering whether I can use a Beta distribution for the policy to achieve this.
Here is the script for the actor network I am currently using:
----------
commonPath = [
featureInputLayer(prod(obsInfo.Dimension),Name="comPathIn")
fullyConnectedLayer(120)
tanhLayer
fullyConnectedLayer(1,Name="comPathOut")
];
% Define mean value path
meanPath = [
fullyConnectedLayer(64,Name="meanPathIn")
tanhLayer
fullyConnectedLayer(64,Name="fc_2")
tanhLayer
fullyConnectedLayer(prod(actInfo.Dimension))
leakyReluLayer(0.1,Name="meanPathOut")
];
% Define standard deviation path
sdevPath = [
fullyConnectedLayer(64,"Name","stdPathIn")
tanhLayer
fullyConnectedLayer(64)
tanhLayer
fullyConnectedLayer(prod(actInfo.Dimension))
softmaxLayer(Name="stdPathOut")
];
% Add layers to layerGraph object
actorNet = layerGraph(commonPath);
actorNet = addLayers(actorNet,meanPath);
actorNet = addLayers(actorNet,sdevPath);
% Connect paths
actorNet = connectLayers(actorNet,"comPathOut","meanPathIn/in");
actorNet = connectLayers(actorNet,"comPathOut","stdPathIn/in");
actorNetwork = dlnetwork(actorNet);
1 comment
Kautuk Raj on 15 Feb 2024
To use a Beta distribution for the action outputs in PPO, I think you would need to modify the network architecture so that it outputs the two shape parameters of the Beta distribution (alpha and beta) in place of the mean and standard deviation. Both parameters must be positive, so the final layer of each path would typically be an activation that guarantees positivity, such as softplus. Because a Beta sample lies in (0, 1), it then has to be rescaled linearly to the action limits.
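As a rough illustration of the idea, here is a minimal sketch rather than a tested solution. It replaces the mean and standard deviation paths with two softplus-terminated paths, samples from the resulting Beta distribution, and rescales the sample to the action range. The values numObs, numAct, actLow and actHigh are placeholders I made up; in your script they would come from obsInfo and actInfo. The sketch assumes that softplusLayer (Reinforcement Learning Toolbox) and betarnd (Statistics and Machine Learning Toolbox) are available. As far as I know, the built-in PPO agent expects a Gaussian actor for continuous actions, so I am assuming a Beta policy like this would have to be used inside a custom training loop.

```matlab
% Placeholder sizes and limits (assumptions, not taken from the question)
numObs  = 4;     % stands in for prod(obsInfo.Dimension)
numAct  = 1;     % stands in for prod(actInfo.Dimension)
actLow  = -2;    % assumed lower action limit
actHigh =  2;    % assumed upper action limit

% Shared trunk
commonPath = [
    featureInputLayer(numObs,Name="comPathIn")
    fullyConnectedLayer(120)
    tanhLayer(Name="comPathOut")
    ];
% alpha path: softplus keeps the output strictly positive
alphaPath = [
    fullyConnectedLayer(64,Name="alphaPathIn")
    tanhLayer
    fullyConnectedLayer(numAct)
    softplusLayer(Name="alphaPathOut")
    ];
% beta path: same structure, separate weights
betaPath = [
    fullyConnectedLayer(64,Name="betaPathIn")
    tanhLayer
    fullyConnectedLayer(numAct)
    softplusLayer(Name="betaPathOut")
    ];

% Assemble the two-headed network
actorNet = layerGraph(commonPath);
actorNet = addLayers(actorNet,alphaPath);
actorNet = addLayers(actorNet,betaPath);
actorNet = connectLayers(actorNet,"comPathOut","alphaPathIn/in");
actorNet = connectLayers(actorNet,"comPathOut","betaPathIn/in");
actorNetwork = dlnetwork(actorNet);

% Evaluate both heads for one random observation
obs = dlarray(rand(numObs,1,"single"),"CB");
[alphaP,betaP] = predict(actorNetwork,obs, ...
    Outputs=["alphaPathOut","betaPathOut"]);
a = double(extractdata(alphaP));
b = double(extractdata(betaP));

% Sample x in (0,1) and rescale it to [actLow, actHigh]
x = betarnd(a,b);
action = actLow + (actHigh - actLow).*x;

% Log-density of x under Beta(a,b). The rescaling only subtracts the
% constant log(actHigh - actLow), which cancels in the PPO ratio.
logProb = (a-1).*log(x) + (b-1).*log(1-x) - betaln(a,b);
```

With this construction the sampled action can never leave the limits, which is the main appeal over clipping a Gaussian sample. One design choice to be aware of: softplus alone allows alpha or beta below 1, which makes the density U-shaped, so some implementations add 1 to both outputs to keep it unimodal.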


Answers (0)
