RL DDPG, reward should be negative however episode Q0 reward is becoming positive

Question

Muhammad Nadeem on 18 Oct 2023

https://fr.mathworks.com/matlabcentral/answers/2035389-rl-ddpg-reward-should-be-negative-however-episode-q0-reward-is-becoming-positive

Answered: Muhammad Nadeem on 31 Oct 2023

Hello Everyone,

I am building an LQR-type controller. My reward is the negative of the LQR quadratic cost, x'Qx + u'Ru. When I train the DDPG agent, the Episode Q0 value becomes positive. As I understand it, Episode Q0 is the estimate of the discounted long-term reward at the start of each episode, given the initial observation of the environment. So how is this possible? Why does Episode Q0 go positive when the reward function is designed to be negative?
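To make the expectation in the question concrete, here is a minimal sketch in plain Python (not the poster's MATLAB code). It assumes Q and R are positive semidefinite; the double-integrator dynamics, the gain K, the time step, and the discount factor are all hypothetical values chosen only for illustration. Under those assumptions every step reward is non-positive, so the true discounted return from the initial state cannot be positive.

```python
def quad_form(v, M):
    """Return v' M v for a vector v (list) and a square matrix M (list of lists)."""
    n = len(v)
    return sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))


def lqr_reward(x, u, Q, R):
    """Reward = -(x'Qx + u'Ru); non-positive when Q and R are positive semidefinite."""
    return -(quad_form(x, Q) + quad_form(u, R))


def discounted_return(rewards, gamma):
    """Sum of gamma^t * r_t over one episode."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


# Hypothetical double-integrator example (assumed values, not the poster's system).
Q = [[1.0, 0.0], [0.0, 0.1]]
R = [[0.01]]
K = [1.0, 1.5]            # arbitrary feedback gain, u = -K x
dt, gamma = 0.1, 0.99

x = [1.0, 0.0]            # initial observation
rewards = []
for _ in range(200):
    u = [-(K[0] * x[0] + K[1] * x[1])]
    rewards.append(lqr_reward(x, u, Q, R))
    # Euler step: position += dt * velocity, velocity += dt * u
    x = [x[0] + dt * x[1], x[1] + dt * u[0]]

G0 = discounted_return(rewards, gamma)
print(all(r <= 0 for r in rewards), G0 <= 0)
```

The sign argument does not depend on the particular gain or dynamics: any policy collects only non-positive rewards here, so a positive Q0 would be a property of the estimate rather than of the reward function.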

Answer 1

UDAYA PEDDIRAJU on 26 Oct 2023

https://fr.mathworks.com/matlabcentral/answers/2035389-rl-ddpg-reward-should-be-negative-however-episode-q0-reward-is-becoming-positive#answer_1340911

Hello Muhammad,

I understand that the episode “Q0” value is becoming positive even though the reward was designed to be negative. To address this issue, I suggest considering the following:

Scale between “Q0” and episode reward: There may be a significant difference in scale between the “Q0” estimate and the actual episode reward. This disparity can lead to unexpected results and affect the training process. To investigate, try unchecking the "Show Episode Q0" option and see whether the episode reward values change.
DDPG implementation: Another possibility is an issue with the implementation of the DDPG algorithm itself. The algorithm should be able to handle both positive and negative rewards. Make sure you are using the return, that is, the sum of the rewards collected from a given state-action pair until the end of the trajectory.
Simplify the critic network: It may help to simplify the critic network so that it outputs values on a similar scale to the episode reward. This can bring the “Q0” estimates in line with the actual rewards and give the agent more accurate feedback during learning.
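One way to act on the points above is to compare each logged Q0 value with the discounted return actually realized in the same episode. The sketch below is plain Python with hypothetical helper names (`discounted_return`, `check_q0`) and made-up placeholder numbers; it does not use any Reinforcement Learning Toolbox API, and exporting Q0 and per-step rewards from a training run is assumed to be done separately.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma^t * r_t over one episode."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def check_q0(q0, rewards, gamma, rel_tol=0.5):
    """Compare a critic's Q0 with the realized discounted return of one episode.

    rel_tol is an arbitrary threshold (assumption) for flagging a scale mismatch.
    """
    g0 = discounted_return(rewards, gamma)
    return {
        "return": g0,
        "q0": q0,
        # Q0 is positive although every observed reward is non-positive.
        "sign_violation": q0 > 0 and all(r <= 0 for r in rewards),
        # Q0 is far from the realized return relative to the return's size.
        "scale_mismatch": abs(q0 - g0) > rel_tol * max(abs(g0), 1e-12),
    }


# Placeholder numbers purely for illustration, not taken from any training run.
report = check_q0(q0=35.0, rewards=[-1.0, -0.5, -0.25], gamma=0.9)
print(report)
```

The realized return includes exploration noise and a finite episode length, so an exact match with Q0 should not be expected; the check is only meant to flag a wrong sign or a gap of orders of magnitude.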

For further reference, see this related MATLAB Answers thread:

https://www.mathworks.com/matlabcentral/answers/532933-why-is-the-ddpg-episode-rewards-never-change-during-the-whole-training-process

I hope this helps!

Answer 2

Muhammad Nadeem on 31 Oct 2023

https://fr.mathworks.com/matlabcentral/answers/2035389-rl-ddpg-reward-should-be-negative-however-episode-q0-reward-is-becoming-positive#answer_1343871

Matlab codes.zip

Hello UDAYA,

Thank you for the details. I have tried all of your suggestions, but the problem persists. The Episode Q0 value just doesn't make sense: it becomes extremely large and positive. What does Episode Q0 signify, and can it be ignored? As far as I know, it is a metric that tells how good the critic is, given the initial observation of the environment.

I am attaching my code in case you want to have a look at it; please find it in the attachments.
