Independently working multiple reinforcement learning agents

Question

0 votes

Hello everybody, I am using two TD3 RL agents to track two different references. However, I received the following reward plot. As you can see, when one of the agents works properly, the other performs very badly, and vice versa.

Here is the code:

% Observation and action specifications, one per agent
oInfo1 = rlNumericSpec([3,1]);
oInfo2 = rlNumericSpec([3,1]);
oInfo1.Name = 'observations';
oInfo2.Name = 'observations';
numObservations = oInfo1.Dimension(1);
act1 = rlNumericSpec([3,1]);
act2 = rlNumericSpec([3,1]);
numActions = act1.Dimension(1);
obsInfo = {oInfo1,oInfo2};
actInfo = {act1,act2};
% Agent blocks in the Simulink model (mdl is the model name, defined elsewhere)
agentblk = ["PV/Control_rll/Agent A", "PV/Control_rll/Agent B"];
env = rlSimulinkEnv(mdl,agentblk,obsInfo,actInfo);
Ts = 1e-2;
% Critic network: state and action paths merged by an addition layer
statePath = [
    featureInputLayer(numObservations,'Normalization','none','Name','State')
    fullyConnectedLayer(64,'Name','CriticStateFC1')
    reluLayer('Name','CriticRelu1')
    fullyConnectedLayer(32,'Name','CriticStateFC2')];
actionPath = [
    featureInputLayer(numActions,'Normalization','none','Name','Action')
    fullyConnectedLayer(32,'Name','CriticActionFC1')];
commonPath = [
    additionLayer(2,'Name','add')
    reluLayer('Name','CriticCommonRelu')
    fullyConnectedLayer(32,'Name','fc3')
    reluLayer('Name','relu3')
    fullyConnectedLayer(16,'Name','fc4')
    fullyConnectedLayer(1,'Name','CriticOutput')];
criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork,statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'CriticStateFC2','add/in1');
criticNetwork = connectLayers(criticNetwork,'CriticActionFC1','add/in2');
criticOpts = rlRepresentationOptions('LearnRate',1e-02,'GradientThreshold',1);
criticA = rlQValueRepresentation(criticNetwork,oInfo1,act1,'Observation',{'State'},'Action',{'Action'},criticOpts);
criticB = rlQValueRepresentation(criticNetwork,oInfo2,act2,'Observation',{'State'},'Action',{'Action'},criticOpts);
% Actor network: maps observations to actions
actorNetwork = [
    featureInputLayer(numObservations,'Normalization','none','Name','State')
    fullyConnectedLayer(64,'Name','actorFC1')
    tanhLayer('Name','actorTanh1')
    fullyConnectedLayer(32,'Name','actorFC2')
    tanhLayer('Name','actorTanh2')
    fullyConnectedLayer(numActions,'Name','Action')];
actorOptions = rlRepresentationOptions('LearnRate',1e-02,'GradientThreshold',1);
actorA = rlDeterministicActorRepresentation(actorNetwork,oInfo1,act1,'Observation',{'State'},'Action',{'Action'},actorOptions);
actorB = rlDeterministicActorRepresentation(actorNetwork,oInfo2,act2,'Observation',{'State'},'Action',{'Action'},actorOptions);
% Both agents share the same TD3 options
agentOpts = rlTD3AgentOptions(...
    'SampleTime',Ts,...
    'TargetSmoothFactor',1e-3,...
    'DiscountFactor',.997, ...
    'MiniBatchSize',64, ...
    'ExperienceBufferLength',1e6);
agentA = rlTD3Agent(actorA,criticA,agentOpts);
agentB = rlTD3Agent(actorB,criticB,agentOpts);
maxsteps = ceil(6/Ts);
trainOpts = rlTrainingOptions(...
'MaxEpisodes',5000,...
'MaxStepsPerEpisode',maxsteps,...
'ScoreAveragingWindowLength',20, ...
'Verbose',true, ...

I know that since R2020b the agents' neural networks are updated independently. However, I see that since R2022a a learning strategy (either "decentralized" or "centralized") can be selected for each agent group. With decentralized training, agents collect their own set of experiences during the episodes and learn independently of the other agents.

My question is: do I need to use R2022a, or is the problem in my environment definition?

1 comment

Esan freedom on 24 Mar 2023

@Emmanouil Tzorakoleftherakis

Answer 1

Emmanouil Tzorakoleftherakis on 24 Mar 2023

0 votes

Centralized learning makes learning and exploration more efficient because the agents share things like experiences. If agents perform similar/collaborative tasks this could speed up training. If the tasks are inherently different, you should probably go with decentralized learning.
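For reference, a minimal sketch of how decentralized training might be requested in R2022a or later. It assumes that agentA, agentB, env, and maxsteps already exist as in the question's code; the option values simply mirror the ones shown in the question and are not a recommendation.

```matlab
% Sketch only: assumes agentA, agentB, env and maxsteps are already defined
% as in the question. Requires Reinforcement Learning Toolbox R2022a or later.
trainOpts = rlMultiAgentTrainingOptions( ...
    'AgentGroups',{1,2}, ...                                  % one agent per group
    'LearningStrategy',["decentralized","decentralized"], ... % each group learns from its own experiences
    'MaxEpisodes',5000, ...
    'MaxStepsPerEpisode',maxsteps, ...
    'ScoreAveragingWindowLength',20, ...
    'Verbose',true);

% Train both agents together in the same environment
trainResults = train([agentA,agentB],env,trainOpts);
```

Putting both agents in a single group, for example 'AgentGroups',{[1,2]} with 'LearningStrategy',"centralized", would be the corresponding centralized variant.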

That said, training multiple agents simultaneously is challenging because, from each agent's point of view, the environment violates the Markov assumption: the other agent's policy keeps changing during training. To help with that, you should share as much information between the agents as possible. At the very minimum, the actions of one agent should be observations of the other, and vice versa.
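As a rough illustration of that last point, the observation specifications could be widened so that each agent also sees the other agent's actions. This is only a sketch under assumptions: the names numOwnObs and numOtherAct and the 3 + 3 layout are hypothetical, and the Simulink model itself would also have to route each agent's action signal (for example from the previous time step) into the other agent's observation vector, which is not shown here.

```matlab
% Hypothetical layout: each agent observes its own 3 signals plus the
% 3 actions of the other agent.
numOwnObs   = 3;   % observations each agent already receives (from the question)
numOtherAct = 3;   % size of the other agent's action vector (from the question)

oInfo1 = rlNumericSpec([numOwnObs + numOtherAct, 1]);
oInfo1.Name = 'observationsA';
oInfo2 = rlNumericSpec([numOwnObs + numOtherAct, 1]);
oInfo2.Name = 'observationsB';

% The input layers of the actor and critic networks must match the new size
numObservations = oInfo1.Dimension(1);
```

The rest of the question's code (obsInfo, the networks, and the agents) would then be built from these wider specifications.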

3 comments

Lin on 24 Jul 2024

Hello Emmanouil Tzorakoleftherakis:

Do you have references for examples of multiple agents?

Emmanouil Tzorakoleftherakis on 24 Jul 2024

Replied on the other thread

https://www.mathworks.com/matlabcentral/answers/2139171-references-to-multi-agent-reinforcement-learning-schemes-in-the-reinforcement-learning-toolbox?s_tid=prof_contriblnk
