- Try reducing the scale and spread of your episode rewards. Some of your episodes end with 0 reward and others with about 10k, and a range that wide can cause problems with the gradients. Consider multiplying the rewards by a scale factor so that your high-reward episodes reach roughly 10, but experiment with the value.
- Decrease the learning rate, which usually helps when you start a new RL project, at least until you find a value that works. Try something like 1e-4, 1e-5, or 1e-6; I wouldn't go lower.
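As a rough illustration of the reward-scaling suggestion, the sketch below rescales a made-up episode so that a 10,000-point episode totals about 10. The names rewardScale, rawStepRewards, and scaledStopValue are hypothetical and do not come from the original post, and the per-step rewards are invented for the example. One consequence worth noting: if the rewards are scaled, the "StopTrainingValue" threshold of 10000 would presumably need to be scaled by the same factor.

```matlab
% Sketch only: rescale rewards so the best episodes total about 10
% instead of about 10,000. All variable names here are hypothetical.
targetEpisodeReward = 10;     % desired size of a high-reward episode
observedMaxReward   = 1e4;    % largest episode reward mentioned in the answer
rewardScale = targetEpisodeReward / observedMaxReward;   % 1e-3

% Invented per-step rewards for one good episode (they sum to 10,000).
rawStepRewards = [0 0 2500 2500 5000];

% In a real environment the multiplication would happen wherever the
% step reward is computed (for example, in the reward signal of the model).
scaledStepRewards   = rewardScale * rawStepRewards;
scaledEpisodeReward = sum(scaledStepRewards);

% The stop criterion compares against the (now scaled) average reward,
% so the original threshold of 10000 is scaled by the same factor.
originalStopValue = 10000;
scaledStopValue   = rewardScale * originalStopValue;

fprintf("Scaled episode reward: %g\n", scaledEpisodeReward);
fprintf("Scaled stop value:     %g\n", scaledStopValue);
```

The scale factor of 1e-3 is only a starting point; the answer suggests experimenting with it.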
I am working on path planning and obstacle avoidance using deep reinforcement learning but training is not converging.
The following code creates the RL agent:
criticOpts = rlRepresentationOptions("LearnRate",1e-3,"L2RegularizationFactor",1e-4,"GradientThreshold",1);
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,"Observation",{'State'},"Action",{'Action'},criticOpts);
actorOptions = rlRepresentationOptions("LearnRate",1e-4,"L2RegularizationFactor",1e-4,"GradientThreshold",1);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,"Observation",{'State'},"Action",{'Action'},actorOptions);
agentOpts = rlDDPGAgentOptions(...
"SampleTime",sampleTime,...
"TargetSmoothFactor",1e-3,...
"DiscountFactor",0.995, ...
"MiniBatchSize",128, ...
"ExperienceBufferLength",1e6);
agentOpts.NoiseOptions.Variance = 0.1;
agentOpts.NoiseOptions.VarianceDecayRate = 1e-5;
obstacleAvoidanceAgent = rlDDPGAgent(actor,critic,agentOpts);
Training options are:
maxEpisodes = 5000;
maxSteps = ceil(Tfinal/sampleTime);
trainOpts = rlTrainingOptions(...
"MaxEpisodes",maxEpisodes, ...
"MaxStepsPerEpisode",maxSteps, ...
"ScoreAveragingWindowLength",50, ... "StopTrainingCriteria","AverageReward", ...
"StopTrainingValue",10000, ...
"Verbose", true, ...
"Plots","training-progress");
trainingStats = train(obstacleAvoidanceAgent,env,trainOpts);
Training does not converge, as shown in the attached figure:
Answers (1)
Matteo D'Ambrosio
on 28 May 2023
Edited: Matteo D'Ambrosio
on 28 May 2023
I'm not too familiar with DDPG, as I use other agents, but looking at your episode reward figure, a few things come to mind:
Hope this helps.