
Is my PPO+LSTM agent reasonable?

11 views (last 30 days)
xiang on 17 May 2024
Answered: Prasanna on 27 Jun 2024 at 8:10
Hello everyone, I created a PPO+LSTM agent; here is my code:
% Critic network: maps observation sequences to a scalar value estimate
cnet = [
    sequenceInputLayer(numObs,"Name","name1")
    fullyConnectedLayer(256,"Name","fc1")
    tanhLayer("Name","tanh1")
    fullyConnectedLayer(128,"Name","fc2")
    tanhLayer("Name","tanh2")
    fullyConnectedLayer(64,"Name","fc3")
    tanhLayer("Name","tanh3")
    lstmLayer(16,"Name","LSTM2")
    reluLayer("Name","relu1")
    fullyConnectedLayer(1,"Name","CriticOutput")];
criticdlnet = dlnetwork(cnet);
critic = rlValueFunction(criticdlnet,obsInfo);

% Actor trunk: shared layers up to the LSTM
anet = [
    sequenceInputLayer(numObs,"Name","name1")
    fullyConnectedLayer(256,"Name","fc1")
    tanhLayer("Name","tanh1")
    fullyConnectedLayer(128,"Name","fc2")
    tanhLayer("Name","tanh2")
    lstmLayer(16,"Name","LSTM1")
    reluLayer("Name","relu2")];

% Mean path: bounded action mean via tanh
meanPath = [
    fullyConnectedLayer(32,"Name","meanPathIn")
    tanhLayer("Name","tanhMean")
    fullyConnectedLayer(numAct,"Name","mean")
    tanhLayer("Name","meanPathOut")];

% Std path: positive standard deviation via softplus
stdPath = [
    fullyConnectedLayer(32,"Name","stdPathIn")
    reluLayer("Name","reluStd")
    fullyConnectedLayer(numAct,"Name","std")
    softplusLayer("Name","stdPathOut")];

actordlnet = layerGraph(anet);
actordlnet = addLayers(actordlnet,meanPath);
actordlnet = addLayers(actordlnet,stdPath);
actordlnet = connectLayers(actordlnet,"relu2","meanPathIn/in");
actordlnet = connectLayers(actordlnet,"relu2","stdPathIn/in");
actordlnet = dlnetwork(actordlnet);

actor = rlContinuousGaussianActor(actordlnet,obsInfo,actInfo, ...
    "ActionMeanOutputNames","meanPathOut", ...
    "ActionStandardDeviationOutputNames","stdPathOut", ...
    "ObservationInputNames","name1");

agentOptions = rlPPOAgentOptions("SampleTime",Ts, ...
    "DiscountFactor",0.992, ...
    "ExperienceHorizon",1024, ...
    "MiniBatchSize",64, ...
    "ClipFactor",0.2, ...
    "EntropyLossWeight",0.01, ...
    "NumEpoch",1, ...
    "AdvantageEstimateMethod","gae", ...
    "GAEFactor",0.95, ...
    "NormalizedAdvantageMethod","current");
agent = rlPPOAgent(actor,critic,agentOptions);
Are my deep network structure and hyperparameters reasonable? I hope I can get your help.

Answers (1)

Prasanna on 27 Jun 2024 at 8:10
Hi Xiang,
It is my understanding that the model presented in the code is designed for a Proximal Policy Optimization (PPO) algorithm, incorporating both a critic and an actor network to handle environments with temporal dependencies. The critic network features a sequence input layer, multiple fully connected layers with tanh activations, an LSTM layer for capturing temporal patterns, and a final fully connected layer for value function output. The actor network follows a similar initial structure, with an LSTM layer to process sequences, but then diverges into two paths: one determining the mean and the other the standard deviation of the action distribution, using fully connected layers followed by ‘tanh’ and ‘softplus’ activations, respectively. This bifurcation allows the actor to output a continuous Gaussian action distribution, informed by both the central tendency and variability of actions, based on the current state observations.
Regarding the hyperparameters, the discount factor (0.992) is quite high, which is appropriate for tasks where long-term rewards matter. The mini-batch size could be larger to reduce gradient variance, though this requires more memory and computational power.
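For instance, a variant of your options with a larger mini-batch and a few more passes per experience horizon may be worth trying (a sketch only; 256 and 3 are illustrative values, not recommendations):
% Hedged sketch: same options as above with a larger mini-batch
agentOptions = rlPPOAgentOptions("SampleTime",Ts, ...
    "DiscountFactor",0.992, ...
    "ExperienceHorizon",1024, ...
    "MiniBatchSize",256, ...           % larger batch to reduce gradient variance
    "ClipFactor",0.2, ...
    "EntropyLossWeight",0.01, ...
    "NumEpoch",3, ...                  % several epochs per horizon is common for PPO
    "AdvantageEstimateMethod","gae", ...
    "GAEFactor",0.95, ...
    "NormalizedAdvantageMethod","current");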
Depending on your specific application, you might find better performance with different architectures. For example, additional LSTM layers or different sizes for the fully connected layers might capture the policy or value function more effectively. Normalizing the inputs to the network can also significantly improve training stability and performance. Finally, regularly evaluate your agent in the environment on a separate set of episodes from training to monitor for overfitting or underfitting, as sketched below.
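For the evaluation point, one possible pattern (a minimal sketch, assuming env is your training environment; maxSteps is a hypothetical episode-length variable) is:
% Hedged sketch: evaluate the agent on a few episodes outside of training
agent.UseExplorationPolicy = false;    % act on the distribution mean, not a sample
simOpts = rlSimulationOptions("MaxSteps",maxSteps,"NumSimulations",5);
experiences = sim(env,agent,simOpts);  % 1-by-5 struct array of episodes
returns = arrayfun(@(e) sum(e.Reward.Data), experiences);
fprintf("Mean evaluation return: %.2f\n", mean(returns));
agent.UseExplorationPolicy = true;     % restore the stochastic policy for training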
Ultimately, the "reasonableness" of your setup can only be fully assessed through experimentation and iterative modification based on the performance of your agent in the environment it is being trained in.
Hope this helps.
