Episode Q0 increases exponentially
Can anyone explain why episode Q0 in RL increases exponentially after convergence of reward to a suboptimal policy?
Answers (1)
Emmanouil Tzorakoleftherakis
on 16 Feb 2021
Hello,
Please take a look at this answer for some suggestions. Normalizing observations, rewards, and actions can also help avoid situations like these.
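As a rough illustration of the normalization idea, here is a minimal Python sketch. It is not Reinforcement Learning Toolbox code: `RunningNormalizer` and `scale_action` are hypothetical helper names, and the choice of a running mean and standard deviation (Welford's method) is an assumption, since the answer does not say which normalization scheme to use. The intent is only to show how to keep the signals the critic sees at a magnitude of roughly one, so that its value estimates are less likely to blow up.

```python
import math


class RunningNormalizer:
    """Running mean/variance (Welford's method) for one scalar signal.

    Hypothetical helper for illustration; apply one instance per
    observation channel or to the reward signal.
    """

    def __init__(self, eps=1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.eps = eps  # avoids division by zero for constant signals

    def update(self, x):
        """Fold one new sample into the running statistics."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    def normalize(self, x):
        """Return x shifted to zero mean and scaled to unit variance."""
        if self.count < 2:
            # Not enough samples for a variance estimate: only center.
            return x - self.mean
        std = math.sqrt(self.m2 / self.count)
        return (x - self.mean) / (std + self.eps)


def scale_action(u, low, high):
    """Linearly map an action from [low, high] to [-1, 1]."""
    return 2.0 * (u - low) / (high - low) - 1.0


# Example: rewards in the hundreds are brought to order-one values.
reward_norm = RunningNormalizer()
for r in [100.0, 200.0, 300.0]:
    reward_norm.update(r)
scaled_reward = reward_norm.normalize(300.0)
```

In practice the statistics would be updated from the samples collected during training, and the same scaling would have to be applied consistently wherever the agent is trained or deployed.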
Hope this helps