
When an agent takes more than one action, the actions sometimes have opposing effects rather than coordinated effects

11 views (in the last 30 days)
I am running a deep reinforcement learning experiment with multiple agents. Each agent can emit three action signals, but these action signals always seem to produce opposing effects instead of synergistic ones. Why is that?
I sincerely look forward to an answer.

Accepted Answer

Subhajyoti
Subhajyoti on 22 August 2024 at 10:50
Hi Huixin,
These kinds of challenges are common in reinforcement learning applications. You can address the issues with your RL agents in some of the following ways:
  • Reward Function: It must reflect the desired cooperative behaviour. Try incorporating shared rewards or team-based objectives to align individual agent goals with overall system performance.
  • Training Stability: A stable ‘Average Reward Curve’ indicates consistent learning, while high variance may indicate instability or conflicting actions among agents. Ideally, the variance decreases as training progresses.
  • Hyperparameter Tuning: Often, tuning hyperparameters such as the learning rate and discount factor can significantly improve the performance of the RL agents.
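The shared-reward idea from the first bullet can be sketched in a few lines. This is a minimal, language-agnostic illustration in Python, not Reinforcement Learning Toolbox code; the helper name `team_reward` and the blending weight `alpha` are assumptions chosen for the example:

```python
def team_reward(individual_rewards, alpha=0.5):
    """Blend each agent's own reward with the team average.

    alpha = 0 keeps rewards fully individual (agents may work at
    cross purposes); alpha = 1 makes all agents share one objective.
    """
    team_avg = sum(individual_rewards) / len(individual_rewards)
    return [(1 - alpha) * r + alpha * team_avg for r in individual_rewards]

# Two agents with opposing raw rewards are pulled toward a common objective:
print(team_reward([1.0, -1.0], alpha=0.5))  # [0.5, -0.5]
```

Increasing `alpha` reduces the incentive for one agent to profit at another's expense, which is one way to discourage the opposing-action behaviour you describe.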
Among other things, you can also tweak the network architecture, increasing its depth with additional layers to improve your agent's learning capability.
Also, during training, you can vary the ‘Epsilon-Greedy Parameter’. Begin with a high epsilon value (close to 1) to encourage exploration and allow agents to discover diverse strategies and state-action pairs. Gradually decay epsilon towards 0 in later phases to shift the focus to exploiting the learned policy, ensuring agents make the most of their training experience.
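The decay described above can be written as a simple schedule. Again, this is a hedged Python sketch rather than toolbox code; the function name and the decay constants (`eps_start`, `eps_end`, `decay`) are illustrative assumptions:

```python
def epsilon_schedule(episode, eps_start=1.0, eps_end=0.01, decay=0.995):
    """Exponentially decay the exploration rate per episode,
    clamped at a small floor so some exploration always remains."""
    return max(eps_end, eps_start * decay ** episode)

print(epsilon_schedule(0))     # 1.0  -> fully exploratory at the start
print(epsilon_schedule(1000))  # 0.01 -> clamped at the exploitation floor
```

An exponential decay is a common default; a linear decay over a fixed number of episodes works similarly, and the right floor depends on how stochastic your environment is.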
You may go through the following MathWorks documentation links to learn more about training RL agents and training options.
I hope this helps.

More Answers (0)

Products


Version

R2022b

