Episode Q0 increases exponentially

16 Feb 2021


Can anyone explain why Episode Q0 in my RL training keeps increasing exponentially after the episode reward has converged to a suboptimal policy?
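A rough sanity check helps separate a diverging critic from a genuine improvement. This assumes Episode Q0 is the critic's estimate of the discounted long-term reward from the initial observation, which is how the Reinforcement Learning Toolbox documentation describes it. If every per-step reward satisfies |r_t| <= r_max and the discount factor is gamma, no episode of length T can return more than r_max * (1 - gamma^T) / (1 - gamma), or r_max / (1 - gamma) for an infinite horizon. A Q0 that keeps climbing past that bound cannot reflect real returns and points to the critic's estimates diverging, for example through overestimation in bootstrapped targets. The sketch below is in Python for illustration only. The function names are hypothetical and are not toolbox APIs.

```python
def discounted_return(rewards, gamma):
    """Discounted return G_0 = sum_t gamma^t * r_t of one recorded episode."""
    g = 0.0
    # Accumulate backwards using G_t = r_t + gamma * G_{t+1}
    for r in reversed(rewards):
        g = r + gamma * g
    return g


def return_upper_bound(r_max, gamma, horizon=None):
    """Largest possible |G_0| when every |r_t| <= r_max.

    horizon=None means an infinite horizon (requires gamma < 1).
    """
    if horizon is None:
        return r_max / (1.0 - gamma)
    if gamma == 1.0:
        return r_max * horizon
    return r_max * (1.0 - gamma ** horizon) / (1.0 - gamma)


def q0_is_implausible(q0, r_max, gamma, horizon=None):
    """Flag a critic estimate that no sequence of bounded rewards could produce."""
    return abs(q0) > return_upper_bound(r_max, gamma, horizon)


if __name__ == "__main__":
    gamma = 0.99
    rewards = [1.0] * 200  # hypothetical episode: reward 1 at every step
    print(discounted_return(rewards, gamma))           # actual return of this episode
    print(return_upper_bound(1.0, gamma, 200))         # bound for r_max = 1, T = 200
    print(q0_is_implausible(5000.0, 1.0, gamma, 200))  # a Q0 of 5000 is far above the bound
```

Comparing the logged Q0 against both the discounted return actually achieved and this bound shows whether the growth comes from the critic rather than from the policy.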


Hello,

Please take a look at this answer for some suggestions. Normalizing observations, rewards, and actions can also help avoid situations like these.
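Here is a minimal sketch of what normalizing observations, rewards, and actions can look like. It assumes you preprocess these values yourself before they reach the agent, for instance inside a custom environment's step function. It is written in Python for illustration, and the class and function names are hypothetical. Observations are standardized with a running mean and variance (Welford's algorithm), rewards are divided by a fixed scale you choose, and actions are mapped to and from [-1, 1]. Keeping these quantities near unit magnitude keeps the critic's targets small, which makes runaway value estimates less likely.

```python
import math


class RunningNormalizer:
    """Standardizes observation vectors using a running mean/variance (Welford)."""

    def __init__(self, size, eps=1e-8):
        self.n = 0
        self.mean = [0.0] * size
        self.m2 = [0.0] * size  # running sum of squared deviations from the mean
        self.eps = eps

    def update(self, x):
        self.n += 1
        for i, xi in enumerate(x):
            delta = xi - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (xi - self.mean[i])

    def std(self):
        if self.n < 2:
            return [1.0] * len(self.mean)
        # Population standard deviation; eps avoids division by zero
        return [math.sqrt(m / self.n) + self.eps for m in self.m2]

    def normalize(self, x):
        return [(xi - mu) / s for xi, mu, s in zip(x, self.mean, self.std())]


def scale_reward(r, scale):
    """Divide the reward by a fixed, user-chosen scale so targets stay O(1)."""
    return r / scale


def action_to_unit(a, low, high):
    """Map an action from [low, high] to [-1, 1]."""
    return 2.0 * (a - low) / (high - low) - 1.0


def unit_to_action(u, low, high):
    """Inverse map from [-1, 1] back to [low, high]."""
    return low + (u + 1.0) * 0.5 * (high - low)


if __name__ == "__main__":
    norm = RunningNormalizer(2)
    for obs in [[100.0, 0.1], [120.0, 0.3], [80.0, 0.2]]:
        norm.update(obs)
    print(norm.normalize([100.0, 0.2]))  # close to [0, 0]: this is the running mean
    print(scale_reward(50.0, 10.0))
    print(unit_to_action(action_to_unit(1.5, -2.0, 2.0), -2.0, 2.0))
```

The running statistics should be frozen, or at least updated slowly, once training stabilizes. Otherwise the same raw observation maps to a drifting input and the critic chases a moving target.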

Hope this helps

DAMODARAN B.K on 17 Feb 2021

Edited: DAMODARAN B.K on 17 Feb 2021

Is Episode Q0 the critic network's output or the target value?
