
Is my PPO+LSTM agent reasonable?

11 views (last 30 days)
xiang on 17 May 2024
Answered: Prasanna on 27 Jun 2024 at 8:10
Hello everyone, I created a PPO+LSTM agent; here is my code:
% Critic network: maps observation sequences to a scalar value estimate
cnet = [
    sequenceInputLayer(numObs,"Name","name1")
    fullyConnectedLayer(256,"Name","fc1")
    tanhLayer("Name","tanh1")
    fullyConnectedLayer(128,"Name","fc2")
    tanhLayer("Name","tanh2")
    fullyConnectedLayer(64,"Name","fc3")
    tanhLayer("Name","tanh3")
    lstmLayer(16,"Name","LSTM2")
    reluLayer("Name","relu1")
    fullyConnectedLayer(1,"Name","CriticOutput")];
criticdlnet = dlnetwork(cnet);
critic = rlValueFunction(criticdlnet,obsInfo);

% Actor trunk: shared layers up to the LSTM
anet = [
    sequenceInputLayer(numObs,"Name","name1")
    fullyConnectedLayer(256,"Name","fc1")
    tanhLayer("Name","tanh1")
    fullyConnectedLayer(128,"Name","fc2")
    tanhLayer("Name","tanh2")
    lstmLayer(16,"Name","LSTM1")
    reluLayer("Name","relu2")];

% Mean path: bounded action mean via tanh
meanPath = [
    fullyConnectedLayer(32,"Name","meanPathIn")
    tanhLayer("Name","tanhMean")
    fullyConnectedLayer(numAct,"Name","mean")
    tanhLayer("Name","meanPathOut")];

% Std path: positive standard deviation via softplus
stdPath = [
    fullyConnectedLayer(32,"Name","stdPathIn")
    reluLayer("Name","reluStd")
    fullyConnectedLayer(numAct,"Name","std")
    softplusLayer("Name","stdPathOut")];

actordlnet = layerGraph(anet);
actordlnet = addLayers(actordlnet,meanPath);
actordlnet = addLayers(actordlnet,stdPath);
actordlnet = connectLayers(actordlnet,"relu2","meanPathIn/in");
actordlnet = connectLayers(actordlnet,"relu2","stdPathIn/in");
actordlnet = dlnetwork(actordlnet);

actor = rlContinuousGaussianActor(actordlnet,obsInfo,actInfo, ...
    "ActionMeanOutputNames","meanPathOut", ...
    "ActionStandardDeviationOutputNames","stdPathOut", ...
    "ObservationInputNames","name1");

agentOptions = rlPPOAgentOptions("SampleTime",Ts, ...
    "DiscountFactor",0.992, ...
    "ExperienceHorizon",1024, ...
    "MiniBatchSize",64, ...
    "ClipFactor",0.2, ...
    "EntropyLossWeight",0.01, ...
    "NumEpoch",1, ...
    "AdvantageEstimateMethod","gae", ...
    "GAEFactor",0.95, ...
    "NormalizedAdvantageMethod","current");
agent = rlPPOAgent(actor,critic,agentOptions);
Are my deep network structure and hyperparameters reasonable? I hope I can get your help.

Answers (1)

Prasanna on 27 Jun 2024 at 8:10
Hi Xiang,
It is my understanding that the model presented in the code is designed for a Proximal Policy Optimization (PPO) algorithm, incorporating both a critic and an actor network to handle environments with temporal dependencies. The critic network features a sequence input layer, multiple fully connected layers with tanh activations, an LSTM layer for capturing temporal patterns, and a final fully connected layer for value function output. The actor network follows a similar initial structure, with an LSTM layer to process sequences, but then diverges into two paths: one determining the mean and the other the standard deviation of the action distribution, using fully connected layers followed by ‘tanh’ and ‘softplus’ activations, respectively. This bifurcation allows the actor to output a continuous Gaussian action distribution, informed by both the central tendency and variability of actions, based on the current state observations.
Regarding the hyperparameters, the discount factor (0.992) is quite high, which is appropriate for tasks where long-term rewards matter. The mini-batch size could be larger to reduce gradient variance, though this requires more memory and computational power.
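For instance, a variant of your options with a larger mini-batch and a few more passes per experience horizon may be worth trying (a sketch only; 256 and 3 are illustrative values, not recommendations):
% Hedged sketch: same options as above with a larger mini-batch
agentOptions = rlPPOAgentOptions("SampleTime",Ts, ...
    "DiscountFactor",0.992, ...
    "ExperienceHorizon",1024, ...
    "MiniBatchSize",256, ...           % larger batch to reduce gradient variance
    "ClipFactor",0.2, ...
    "EntropyLossWeight",0.01, ...
    "NumEpoch",3, ...                  % several epochs per horizon is common for PPO
    "AdvantageEstimateMethod","gae", ...
    "GAEFactor",0.95, ...
    "NormalizedAdvantageMethod","current");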
Depending on your specific application, you might find better performance with different architectures. For example, additional LSTM layers or different sizes for the fully connected layers might capture the policy or value function more effectively. Normalizing the inputs to the network can also significantly improve training stability and performance. Finally, regularly evaluate your agent in the environment on a separate set of episodes from training to monitor for overfitting or underfitting, as sketched below.
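For the evaluation point, one possible pattern (a minimal sketch, assuming env is your training environment; maxSteps is a hypothetical episode-length variable) is:
% Hedged sketch: evaluate the agent on a few episodes outside of training
agent.UseExplorationPolicy = false;    % act on the distribution mean, not a sample
simOpts = rlSimulationOptions("MaxSteps",maxSteps,"NumSimulations",5);
experiences = sim(env,agent,simOpts);  % 1-by-5 struct array of episodes
returns = arrayfun(@(e) sum(e.Reward.Data), experiences);
fprintf("Mean evaluation return: %.2f\n", mean(returns));
agent.UseExplorationPolicy = true;     % restore the stochastic policy for training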
Ultimately, the "reasonableness" of your setup can only be fully assessed through experimentation and iterative modification based on the performance of your agent in the environment it is being trained in.
Hope this helps.
