Modifying the control actions to safe ones before storing in the experience buffer during SAC agent training.

Question

Ahmed R. Sayed on 18 Jan 2022

Direct link to this question:

https://fr.mathworks.com/matlabcentral/answers/1631210-modifying-the-control-actions-to-safe-ones-before-storing-in-the-experience-buffer-during-sac-agent

Answered: Ahmed R. Sayed on 21 Sep 2022

Hello everyone,

I am implementing a safe off-policy DRL algorithm based on SAC. An iterative convex optimization algorithm moves each action into a safe region, but this algorithm runs inside the environment. As a result, the built-in rlSACAgent still stores the original unsafe actions in the experience buffer, and the agent cannot learn from the modified actions. The iterative algorithm therefore keeps receiving actions that do not reflect its corrections and takes longer to converge. My question is:

How can I store the modified actions in the experience buffer instead of the unsafe ones?

Illustrative figure:

Many thanks for your help.

Answer 1

Ahmed R. Sayed on 21 Sep 2022

Direct link to this answer:

https://fr.mathworks.com/matlabcentral/answers/1631210-modifying-the-control-actions-to-safe-ones-before-storing-in-the-experience-buffer-during-sac-agent#answer_1057795

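I found the solution: use a Simulink environment and the RL Agent block with its last action input port. Feeding the modified (safe) action into that port should make the agent record the action that was actually applied, rather than its own raw output, in the experience buffer.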
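
For readers who want to see the underlying idea outside Simulink, here is a minimal Python sketch, not the Reinforcement Learning Toolbox implementation. Everything in it is an assumption made for illustration: `project_to_safe` is a hypothetical stand-in for the iterative convex optimization (here just a box clip), and `ReplayBuffer`, `collect_step`, `policy`, and `env_step` are made-up names. The only point it demonstrates is that the buffer receives the applied action rather than the raw policy output.

```python
import random
from collections import deque


def project_to_safe(action, low=-1.0, high=1.0):
    # Hypothetical placeholder for the iterative convex safety projection:
    # the "safe region" here is simply the box [low, high] per component.
    return [min(max(a, low), high) for a in action]


class ReplayBuffer:
    """Minimal FIFO experience buffer (illustrative only)."""

    def __init__(self, capacity=10000):
        self.storage = deque(maxlen=capacity)

    def add(self, obs, action, reward, next_obs, done):
        self.storage.append((obs, action, reward, next_obs, done))

    def sample(self, batch_size):
        # Uniform random minibatch, as an off-policy learner would draw.
        return random.sample(list(self.storage), batch_size)


def collect_step(policy, env_step, obs, buffer):
    raw_action = policy(obs)                    # possibly unsafe policy output
    safe_action = project_to_safe(raw_action)   # action actually applied
    next_obs, reward, done = env_step(obs, safe_action)
    # Store the applied (safe) action, not raw_action, so the off-policy
    # update sees the action that actually produced this transition.
    buffer.add(obs, safe_action, reward, next_obs, done)
    return next_obs, done
```

In this sketch the projection runs before the environment step and its result is what gets stored, which is the same role the last action port plays for the RL Agent block: it tells the agent which action was really applied.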
