I am working on path planning and obstacle avoidance using deep reinforcement learning, but training is not converging.

The following is the code for creating the RL agent:
% Critic (Q-value) representation
criticOpts = rlRepresentationOptions("LearnRate",1e-3,"L2RegularizationFactor",1e-4,"GradientThreshold",1);
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,"Observation",{'State'},"Action",{'Action'},criticOpts);
% Actor (deterministic policy) representation
actorOptions = rlRepresentationOptions("LearnRate",1e-4,"L2RegularizationFactor",1e-4,"GradientThreshold",1);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,"Observation",{'State'},"Action",{'Action'},actorOptions);
% DDPG agent options
agentOpts = rlDDPGAgentOptions(...
    "SampleTime",sampleTime,...
    "TargetSmoothFactor",1e-3,...
    "DiscountFactor",0.995, ...
    "MiniBatchSize",128, ...
    "ExperienceBufferLength",1e6);
% Exploration noise settings
agentOpts.NoiseOptions.Variance = 0.1;
agentOpts.NoiseOptions.VarianceDecayRate = 1e-5;
obstacleAvoidanceAgent = rlDDPGAgent(actor,critic,agentOpts);
The training options are:
maxEpisodes = 5000;
maxSteps = ceil(Tfinal/sampleTime);
trainOpts = rlTrainingOptions(...
    "MaxEpisodes",maxEpisodes, ...
    "MaxStepsPerEpisode",maxSteps, ...
    "ScoreAveragingWindowLength",50, ...
    "StopTrainingCriteria","AverageReward", ...
    "StopTrainingValue",10000, ...
    "Verbose", true, ...
    "Plots","training-progress");
trainingStats = train(obstacleAvoidanceAgent,env,trainOpts);
During training, the reward does not converge, as shown in the attached figure.

Answers (1)

Matteo D'Ambrosio on 28 May 2023
Edited: Matteo D'Ambrosio on 28 May 2023
I'm not too familiar with DDPG as I use other agents, but looking at your episode-reward figure, a few things come to mind:
  1. Try reducing the sparsity of your episode reward. Some episodes score 0 and others around 10,000, and that spread can cause problems with the gradients. Add a multiplier to the rewards you are giving so that your high-reward episodes land around ~10, and tune from there; see the first sketch after this list.
  2. Decrease the learning rates, which almost always helps when you start a new RL project, at least until you find values that work. Try something like 1e-4, 1e-5, or 1e-6; I wouldn't go lower. The second sketch after this list shows where to set them.
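A minimal sketch of point 1, assuming the reward is assembled somewhere inside your environment's step/reward function (rawReward and rewardScale are illustrative names, not from your code):
rewardScale = 1e-3;                % illustrative: maps a raw reward of 10000 down to 10
reward = rewardScale*rawReward;    % same relative ordering of episodes, much smaller magnitude
The point is only to shrink the magnitude the critic has to fit; which episodes count as good or bad is unchanged.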
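For point 2, the learning rates can be lowered directly in the representation options you already build, for example (1e-5 shown as one value from the 1e-4 to 1e-6 sweep):
criticOpts = rlRepresentationOptions("LearnRate",1e-5,"L2RegularizationFactor",1e-4,"GradientThreshold",1);
actorOptions = rlRepresentationOptions("LearnRate",1e-5,"L2RegularizationFactor",1e-4,"GradientThreshold",1);
Keeping the actor's rate at or below the critic's is a common DDPG heuristic.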
Hope this helps.
