Independently working multiple reinforcement learning agents
Hello everybody, I am using two TD3 RL agents to track two different references. However, I received the following reward plot. As you can see, when one of the agents works properly, the other performs very badly, and vice versa.

Here is the code:
- oInfo1 = rlNumericSpec([3,1]);
- oInfo2 = rlNumericSpec([3,1]);
- oInfo1.Name = 'observations';
- numObservations = oInfo1.Dimension(1);
- act1 = rlNumericSpec([3,1]);
- act2 = rlNumericSpec([3,1]);
- numActions = act1.Dimension(1);
- obsInfo = {oInfo1,oInfo2};
- actInfo = {act1,act2};
- agentblk =["PV/Control_rll/Agent A", "PV/Control_rll/Agent B"];
- env = rlSimulinkEnv(mdl,agentblk,obsInfo,actInfo);
- Ts = 1e-2;
- statePath = [
- featureInputLayer(numObservations,'Normalization','none','Name','State')
- fullyConnectedLayer(64,'Name','CriticStateFC1')
- reluLayer('Name','CriticRelu1')
- fullyConnectedLayer(32,'Name','CriticStateFC2')];
- actionPath = [
- featureInputLayer(numActions,'Normalization','none','Name','Action')
- fullyConnectedLayer(32,'Name','CriticActionFC1')];
- commonPath = [
- additionLayer(2,'Name','add')
- reluLayer('Name','CriticCommonRelu')
- fullyConnectedLayer(32, 'Name','fc3')
- reluLayer('Name','relu3')
- fullyConnectedLayer(16, 'Name','fc4')
- fullyConnectedLayer(1,'Name','CriticOutput')];
- criticNetwork = layerGraph();
- criticNetwork = addLayers(criticNetwork,statePath);
- criticNetwork = addLayers(criticNetwork,actionPath);
- criticNetwork = addLayers(criticNetwork,commonPath);
- criticNetwork = connectLayers(criticNetwork,'CriticStateFC2','add/in1');
- criticNetwork = connectLayers(criticNetwork,'CriticActionFC1','add/in2');
- criticOpts = rlRepresentationOptions('LearnRate',1e-02,'GradientThreshold',1);
- criticA = rlQValueRepresentation(criticNetwork,oInfo1,act1,'Observation',{'State'},'Action',{'Action'},criticOpts);
- criticB = rlQValueRepresentation(criticNetwork,oInfo2,act2,'Observation',{'State'},'Action',{'Action'},criticOpts);
- actorNetwork = [
- featureInputLayer(numObservations,'Normalization','none','Name','State')
- fullyConnectedLayer(64, 'Name','actorFC1')
- tanhLayer('Name','actorTanh1')
- fullyConnectedLayer(32, 'Name','actorFC2')
- tanhLayer('Name','actorTanh2')
- fullyConnectedLayer(numActions,'Name','Action')
- ];
- actorOptions = rlRepresentationOptions('LearnRate',1e-02,'GradientThreshold',1);
- actorA = rlDeterministicActorRepresentation(actorNetwork,oInfo1,act1,'Observation',{'State'},'Action',{'Action'},actorOptions);
- actorB = rlDeterministicActorRepresentation(actorNetwork,oInfo2,act2,'Observation',{'State'},'Action',{'Action'},actorOptions);
- agentOpts = rlTD3AgentOptions(...
- 'SampleTime',Ts,...
- 'TargetSmoothFactor',1e-3,...
- 'DiscountFactor',.997, ...
- 'MiniBatchSize',64, ...
- 'ExperienceBufferLength',1e6);
- agentA = rlTD3Agent(actorA,criticA,agentOpts);
- agentB = rlTD3Agent(actorB,criticB,agentOpts);
- maxsteps = ceil(6/Ts);
- trainOpts = rlTrainingOptions(...
- 'MaxEpisodes',5000,...
- 'MaxStepsPerEpisode',maxsteps,...
- 'ScoreAveragingWindowLength',20, ...
- 'Verbose',true, ...
I know that since R2020b the agents' neural networks are updated independently. However, I can see that since R2022a a learning strategy can be selected for each agent group (specified as either "decentralized" or "centralized"). With decentralized training, agents collect their own set of experiences during the episodes and learn independently from the other agents.
My question is: do I need to use R2022a, or is the problem in my environment definition?
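For reference, the R2022a option mentioned above could be set roughly as follows. This is only a minimal sketch: it assumes R2022a or later and reuses `env`, `agentA`, `agentB`, and `maxsteps` from the code above. The explicit grouping `{1,2}` is my assumption about how the two agents would be separated. It illustrates the setting, and is not a confirmed fix for the alternating rewards.

```matlab
% Sketch only: assumes R2022a or later, with env, agentA, agentB and
% maxsteps already defined as in the code above.
trainOpts = rlMultiAgentTrainingOptions( ...
    'AgentGroups',{1,2}, ...                                  % assumption: each agent in its own group
    'LearningStrategy',["decentralized","decentralized"], ... % one strategy per group
    'MaxEpisodes',5000, ...
    'MaxStepsPerEpisode',maxsteps, ...
    'ScoreAveragingWindowLength',20, ...
    'Verbose',true);

% Agents are passed in the same order as the agent blocks given to rlSimulinkEnv.
trainingStats = train([agentA,agentB],env,trainOpts);
```

With this grouping, each agent keeps its own experiences and learns only from them, which is the decentralized behavior described above.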
1 comment
Esan freedom
on 24 Mar 2023
Accepted answer
More answers (0)