DDPG agent does not learn
Hi, I am using a DDPG algorithm to tune the gains of a PD-like controller (transpose Jacobian). The gains need to be between 0.00001 and 0.01, and I chose the noise variance from this range: variance*sqrt(sample time) = 10% of the range.
But my agent does not learn. It only shows occasional peaks and then falls back to the minimum again. I don't know why this is happening.
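As a quick check of the rule quoted above, the arithmetic can be worked out in plain MATLAB. This is only a sketch: the variable names are made up for illustration, the numbers are the ones stated in the question, and the rule is applied to the noise standard deviation because that is the option set in the listing below.

```matlab
% Worked check of "sigma * sqrt(Ts) = 10% of the action range".
% All variable names here are illustrative, not toolbox identifiers.
Ts       = 0.001;   % agent sample time in seconds (from the question)
gainMin  = 1e-5;    % lower end of the gain range (from the question)
gainMax  = 1e-2;    % upper end of the gain range (from the question)
fraction = 0.10;    % target fraction of the action range

actionRange = gainMax - gainMin;             % width of the action range
sigma = fraction * actionRange / sqrt(Ts);   % solve the rule for sigma

fprintf('StandardDeviation ~ %.4f\n', sigma);
```

The result is roughly 0.032, which is consistent with the value of 0.03 used in the agent options.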

The network architecture is:
statepath = [featureInputLayer(numObs , Name = 'stateinp')
fullyConnectedLayer(96,Name = 'stateFC1')
reluLayer
fullyConnectedLayer(74,Name = 'stateFC2')
reluLayer
fullyConnectedLayer(36,Name = 'stateFC3')]
actionpath = [featureInputLayer(numAct, Name = 'actinp')
fullyConnectedLayer(72,Name = 'actFC1')
reluLayer
fullyConnectedLayer(36,Name = 'actFC2')]
commonpath = [additionLayer(2,Name = 'add')
fullyConnectedLayer(96,Name = 'FC1')
reluLayer
fullyConnectedLayer(72,Name = 'FC2')
reluLayer
fullyConnectedLayer(24,Name = 'FC3')
reluLayer
fullyConnectedLayer(1,Name = 'output')]
critic_network = layerGraph()
critic_network = addLayers(critic_network,actionpath)
critic_network = addLayers(critic_network,statepath)
critic_network = addLayers(critic_network,commonpath)
critic_network = connectLayers(critic_network,'actFC2','add/in1')
critic_network = connectLayers(critic_network,'stateFC3','add/in2')
plot(critic_network)
critic = dlnetwork(critic_network)
criticOptions = rlOptimizerOptions('LearnRate',1e-03,'GradientThreshold',1);
critic = rlQValueFunction(critic,obsInfo,actInfo,...
'ObservationInputNames','stateinp','ActionInputNames','actinp');
%% actor
actorNetwork = [featureInputLayer(numObs,Name = 'observation')
fullyConnectedLayer(72,Name = 'actorFC1')
reluLayer
fullyConnectedLayer(48,Name='actorFc2')
reluLayer
fullyConnectedLayer(36,Name='actorFc3')
reluLayer
fullyConnectedLayer(numAct,Name='output')
tanhLayer
scalingLayer(Name = 'actorscaling',scale = max(actInfo.UpperLimit))]
actorNetwork = dlnetwork(actorNetwork);
actorOptions = rlOptimizerOptions('LearnRate',5e-04,'GradientThreshold',1);
actor = rlContinuousDeterministicActor(actorNetwork,obsInfo,actInfo);
%% agent
agentOptions = rlDDPGAgentOptions(...
'SampleTime',0.001,...
'ActorOptimizerOptions',actorOptions,...
'CriticOptimizerOptions',criticOptions,...
'ExperienceBufferLength',1e6,...
'MiniBatchSize',128);
agentOptions.NoiseOptions.StandardDeviation = 0.03;
agentOptions.NoiseOptions.StandardDeviationDecayRate = 1e-5;
agent_MTJ_rl_mobilemanipualtor9 = rlDDPGAgent(actor,critic,agentOption
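To get a feel for how quickly the exploration noise set above fades, here is a small plain-MATLAB sketch. It assumes the standard deviation is multiplied by (1 - StandardDeviationDecayRate) once per agent step and ignores any lower bound on the standard deviation; both are assumptions about the toolbox behavior, not something confirmed in this thread. The variable names are illustrative.

```matlab
% Sketch: geometric decay of the noise standard deviation.
% Assumption: sigma <- sigma * (1 - decayRate) at every agent step.
sigma0    = 0.03;    % initial StandardDeviation (from the listing)
decayRate = 1e-5;    % StandardDeviationDecayRate (from the listing)
Ts        = 0.001;   % agent sample time in seconds (from the listing)

% Number of steps until sigma has halved, and the same in simulated time
halfLifeSteps = log(0.5) / log(1 - decayRate);
halfLifeTime  = halfLifeSteps * Ts;

% Standard deviation remaining after 1e5 agent steps
sigmaAfter = sigma0 * (1 - decayRate)^1e5;

fprintf('Half-life: %.0f steps (%.1f s of simulated time)\n', ...
    halfLifeSteps, halfLifeTime);
fprintf('Sigma after 1e5 steps: %.4f\n', sigmaAfter);
```

Under these assumptions the noise halves after about 69,000 steps, which at a 1 ms sample time is only about 69 s of simulated time, so it may be worth checking whether exploration dies out before the critic has seen enough varied data.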
Answers (2)
Mrutyunjaya Hiremath
on 23 Jul 2023
Check this:
% Define the observation and action space
% obsInfo, actInfo, and env are assumed to already exist in the workspace
numObs = 4; % Replace with the actual number of observation features
numAct = 2; % Replace with the actual number of action dimensions
% Create the actor network
actorNetwork = [
featureInputLayer(numObs, 'Name', 'observation')
fullyConnectedLayer(72, 'Name', 'actorFC1')
reluLayer
fullyConnectedLayer(48, 'Name', 'actorFC2')
reluLayer
fullyConnectedLayer(36, 'Name', 'actorFC3')
reluLayer
fullyConnectedLayer(numAct, 'Name', 'output')
tanhLayer
scalingLayer('Name', 'actorscaling', 'Scale', max(actInfo.UpperLimit))
];
actorNetwork = dlnetwork(actorNetwork);
% Create the critic network
statePath = [
featureInputLayer(numObs, 'Name', 'stateinp')
fullyConnectedLayer(96, 'Name', 'stateFC1')
reluLayer
fullyConnectedLayer(74, 'Name', 'stateFC2')
reluLayer
fullyConnectedLayer(36, 'Name', 'stateFC3')
];
actionPath = [
featureInputLayer(numAct, 'Name', 'actinp')
fullyConnectedLayer(72, 'Name', 'actFC1')
reluLayer
fullyConnectedLayer(36, 'Name', 'actFC2')
];
commonPath = [
additionLayer(2, 'Name', 'add')
fullyConnectedLayer(96, 'Name', 'FC1')
reluLayer
fullyConnectedLayer(72, 'Name', 'FC2')
reluLayer
fullyConnectedLayer(24, 'Name', 'FC3')
reluLayer
fullyConnectedLayer(1, 'Name', 'output')
];
criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork, statePath);
criticNetwork = addLayers(criticNetwork, actionPath);
criticNetwork = addLayers(criticNetwork, commonPath);
criticNetwork = connectLayers(criticNetwork, 'actFC2', 'add/in1');
criticNetwork = connectLayers(criticNetwork, 'stateFC3', 'add/in2');
criticNet = dlnetwork(criticNetwork);
% Create the actor and critic optimizer options
actorOptions = rlOptimizerOptions('LearnRate', 5e-4);
criticOptions = rlOptimizerOptions('LearnRate', 1e-3);
% Create the actor and critic approximators
actor = rlContinuousDeterministicActor(actorNetwork, obsInfo, actInfo);
critic = rlQValueFunction(criticNet, obsInfo, actInfo, ...
'ObservationInputNames', 'stateinp', 'ActionInputNames', 'actinp');
% Create the DDPG agent
agentOptions = rlDDPGAgentOptions(...
'SampleTime', 0.001,...
'ActorOptimizerOptions', actorOptions,...
'CriticOptimizerOptions', criticOptions,...
'ExperienceBufferLength', 1e6,...
'MiniBatchSize', 128);
agentOptions.NoiseOptions.StandardDeviation = 0.03;
agentOptions.NoiseOptions.StandardDeviationDecayRate = 1e-5;
agent = rlDDPGAgent(actor, critic, agentOptions);
% Train the agent
trainOpts = rlTrainingOptions(...
'MaxEpisodes', 1000,...
'MaxStepsPerEpisode', 1000,...
'ScoreAveragingWindowLength', 5,...
'Plots', 'training-progress');
trainingStats = train(agent, env, trainOpts);
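One thing that may be worth checking in both listings: the actor ends with a tanhLayer followed by a scalingLayer whose scale is max(actInfo.UpperLimit) and which has no bias. If the upper limit is 0.01, that produces actions in [-0.01, 0.01], so half of the outputs would be negative gains outside the stated range of 0.00001 to 0.01. A common alternative is to scale by half the range and shift by the midpoint. The sketch below shows only the arithmetic in plain MATLAB; the limits are taken from the question and the variable names are illustrative.

```matlab
% Sketch: map a tanh output in [-1, 1] onto [lowerLimit, upperLimit].
% The limits are assumed from the gain range stated in the question.
lowerLimit = 1e-5;
upperLimit = 1e-2;

scale = (upperLimit - lowerLimit) / 2;   % half-width of the gain range
bias  = (upperLimit + lowerLimit) / 2;   % midpoint of the gain range

t     = [-1 0 1];            % smallest, middle, and largest tanh outputs
gains = scale .* t + bias;   % same affine map a scaling layer would apply

fprintf('%.6f\n', gains);    % lower limit, midpoint, upper limit

% In the actor network this would correspond to something like:
% scalingLayer('Name', 'actorscaling', 'Scale', scale, 'Bias', bias)
```

With this mapping the extreme tanh outputs land exactly on the lower and upper gain limits, so the actor cannot propose a negative gain.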
awcii
on 24 Jul 2023
.
1 comment
Harold
on 31 Mar 2025
@awcii Hello, what would you like to discuss? Please be clear about the problems you are running into. If it is something I understand, I am ready to help.