Training DDPG agent with custom training loop
Currently, I am designing a control system using deep reinforcement learning (DDPG) with the Reinforcement Learning Toolbox in MATLAB/Simulink. Specifically, I need to implement a custom training loop that does not rely on the train function. Could you please show me how to implement a custom training loop for training a DDPG agent? I would like to understand how to implement a standard DDPG-based control system using a custom training loop in MATLAB.
Below is the MATLAB code I currently use with the train function for a DDPG agent. Could you convert it into a version that uses a custom training loop (without using train)?
obsInfo = rlNumericSpec([6 1]);
obsInfo.Name = "observations";
actInfo = rlNumericSpec([1 1]);
actInfo.Name = "control input";
mdl = 'SIM_RL'; % Simulink model containing the plant and the RL Agent block
env = rlSimulinkEnv( ...
"SIM_RL", ...
"SIM_RL/Agent/RL Agent", ...
obsInfo, actInfo);
% Domain randomization: reset function
env.ResetFcn = @(in)localResetFcn(in);
function in = localResetFcn(in)
    % Allowed range of the plant mass; Nominal_value must be available
    % inside this function (local functions do not see the script workspace)
    M_min = Nominal_value*(1 - 0.5); % -50% of nominal mass
    M_max = Nominal_value*(1 + 0.5); % +50% of nominal mass
    % Randomize the mass within [M_min, M_max]
    randomValue_M = M_min + (M_max - M_min)*rand;
    in = setBlockParameter(in, ...
        "SIM_RL/Plant/Mass", ...
        Value=num2str(randomValue_M));
end
% The construction of the critic network structure is omitted here.
% ....
criticNet = initialize(criticNet);
critic = rlQValueFunction(criticNet,obsInfo,actInfo);
% The construction of the actor network structure is omitted here.
% ....
actorNet = initialize(actorNet);
actor = rlContinuousDeterministicActor(actorNet,obsInfo,actInfo);
% Set up the agent
criticOpts = rlOptimizerOptions(LearnRate=1e-04,GradientThreshold=1);
actorOpts = rlOptimizerOptions(LearnRate=1e-04,GradientThreshold=1);
agentOpts = rlDDPGAgentOptions(...
SampleTime=0.01,...
CriticOptimizerOptions=criticOpts,...
ActorOptimizerOptions=actorOpts,...
ExperienceBufferLength=1e5,...
DiscountFactor=0.99,...
MiniBatchSize=128,...
TargetSmoothFactor=1e-3);
agent = rlDDPGAgent(actor,critic,agentOpts);
maxepisodes = 5000;
maxsteps = ceil(Simulation_End_Time/0.01);
trainOpts = rlTrainingOptions(...
MaxEpisodes=maxepisodes,...
MaxStepsPerEpisode=maxsteps,...
ScoreAveragingWindowLength=5,...
Verbose=true,...
Plots="training-progress",...
StopTrainingCriteria="EpisodeCount",...
SaveAgentCriteria="EpisodeReward",...
SaveAgentValue=-1.0);
doTraining = true;
if doTraining
evaluator = rlEvaluator(...
NumEpisodes=1,...
EvaluationFrequency=5);
% Train the agent.
trainingStats = train(agent,env,trainOpts,Evaluator=evaluator);
else
% Load the pretrained agent
load("agent.mat","agent")
end
Answers (1)
Hitesh
3 June 2025
Hi 平成,
The following shows how to convert your DDPG training setup from the train function to a custom training loop in MATLAB. A custom loop gives you greater control over training, evaluation, logging, and integration with domain randomization.
The main components of a custom training loop are:
- Environment Reset: Start each episode by resetting the environment.
- Action Selection: Use the actor network to select an action based on the current observation.
- Environment Step: Apply the action to the environment (e.g., via sim or runEpisode for Simulink models; see the Simulink note after the example loop below) and collect the next observation, reward, and done flag.
- Experience Storage: Store the transition (state, action, reward, next state, done) in a replay buffer.
- Learning: Sample mini-batches from the buffer and perform gradient updates on the actor and critic networks.
- Target Updates: Soft update the target networks (actor and critic) toward the main networks (a soft-update sketch is given after the example loop below).
- Logging & Evaluation: Track performance (e.g., cumulative reward) and optionally evaluate the agent periodically (an evaluation sketch is given after the example loop below).
Kindly refer to the following custom training loop as an example.
% Create the agent (actor, critic, and agentOpts as defined above)
agent = rlDDPGAgent(actor, critic, agentOpts);

% Experience buffer owned by the agent
buffer = agent.ExperienceBuffer;

% Training-loop settings (reuse the values defined for rlTrainingOptions)
maxEpisodes = maxepisodes;
maxStepsPerEpisode = maxsteps;

% Logging
episodeRewards = zeros(maxEpisodes,1);

% NOTE: the step/reset pattern below applies to environments that support
% incremental stepping (MATLAB environments). For a Simulink model such as
% SIM_RL, episodes are typically run with the setup/runEpisode/cleanup
% workflow instead (see the sketch after this loop); the overall structure
% of the loop is the same.

% Custom training loop
for episode = 1:maxEpisodes

    % Reset the environment and the agent's internal state (e.g., noise model)
    obs = reset(env);
    reset(agent);

    % Track the cumulative reward of this episode
    totalReward = 0;

    for stepCount = 1:maxStepsPerEpisode
        % Get an action from the agent's exploration policy
        action = getAction(agent, {obs});

        % Step the environment and observe the result
        [nextObs, reward, isDone] = step(env, action{1});

        % Store the transition in the replay buffer
        experience.Observation     = {obs};
        experience.Action          = action;
        experience.Reward          = reward;
        experience.NextObservation = {nextObs};
        experience.IsDone          = isDone;
        append(buffer, experience);

        % Learn once enough samples are available
        % (the exact learning call depends on your toolbox release;
        % see the custom training loop documentation)
        if buffer.NumExperiences >= agentOpts.MiniBatchSize
            learn(agent, buffer);
        end

        % Advance the state and accumulate the reward
        obs = nextObs;
        totalReward = totalReward + reward;

        if isDone
            break;
        end
    end

    % Log the episode reward
    episodeRewards(episode) = totalReward;
    fprintf("Episode %d: Total Reward = %.2f\n", episode, totalReward);

    % Periodically save the agent
    if mod(episode, 50) == 0
        save(sprintf('agent_episode_%d.mat', episode), 'agent');
    end
end
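Since SIM_RL is a Simulink model, stepping the plant one sample at a time from MATLAB may not be supported; the toolbox also offers an episode-based workflow (setup, runEpisode, cleanup) for custom loops. The fragment below is only a rough sketch of that workflow, under the assumption that these functions and the MaxSteps option are available in your release; check the current documentation for the exact form of the runEpisode output.
% Rough sketch of the episode-based custom-loop workflow for a Simulink
% environment (assumes setup, runEpisode, and cleanup are available in your release)
setup(env);                 % prepare the environment before running episodes
for episode = 1:maxEpisodes
    % Run one full episode of the Simulink model with the current agent
    out = runEpisode(env, agent, MaxSteps=maxStepsPerEpisode);
    % Inspect "out" to log the episode reward and other data as needed
end
cleanup(env);               % release the environment after training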
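For reference, the soft target update listed under Target Updates can also be written out explicitly. The sketch below is minimal and assumes you maintain your own target copies targetActor and targetCritic (hypothetical variables, not created by the code above) with a smoothing factor tau matching TargetSmoothFactor = 1e-3; it is not needed when the built-in agent manages its own target networks.
% Minimal sketch of a Polyak (soft) target update:
%   theta_target <- tau*theta_main + (1 - tau)*theta_target
% Assumes targetCritic/targetActor are separate copies of critic/actor.
tau = 1e-3;

% Soft-update the target critic
criticParams       = getLearnableParameters(critic);
targetCriticParams = getLearnableParameters(targetCritic);
for k = 1:numel(criticParams)
    targetCriticParams{k} = tau*criticParams{k} + (1 - tau)*targetCriticParams{k};
end
targetCritic = setLearnableParameters(targetCritic, targetCriticParams);

% Soft-update the target actor in the same way
actorParams       = getLearnableParameters(actor);
targetActorParams = getLearnableParameters(targetActor);
for k = 1:numel(actorParams)
    targetActorParams{k} = tau*actorParams{k} + (1 - tau)*targetActorParams{k};
end
targetActor = setLearnableParameters(targetActor, targetActorParams);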
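To mimic the periodic evaluation that rlEvaluator provided with train (the Logging & Evaluation component), one option is to run a plain simulation every few episodes with sim, which by default uses the agent's policy without exploration noise. This is a minimal sketch to place at the end of each episode; evaluationFrequency is a value you choose, and the summation assumes the reward returned by sim for a Simulink environment is a timeseries.
% Minimal sketch of a periodic greedy evaluation at the end of each episode
evaluationFrequency = 5;   % evaluate every 5 episodes, as in the rlEvaluator settings
if mod(episode, evaluationFrequency) == 0
    simOpts = rlSimulationOptions(MaxSteps=maxStepsPerEpisode);
    evalExperience = sim(env, agent, simOpts);
    % Assumption: the reward comes back as a timeseries, so sum its Data
    evalReturn = sum(evalExperience.Reward.Data);
    fprintf("  Evaluation after episode %d: return = %.2f\n", episode, evalReturn);
end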
For more information regarding the DDPG training algorithm, kindly refer to the MATLAB documentation.