How can I simulate Direct ADP?
I want to simulate the article "On-Line Learning Control by Association and Reinforcement", but I am having trouble obtaining the optimal weights for the critic neural network. The critic network error in the article is [ e_c = J(t) - (J(t-1) - r(t)) ], and the critic weights are initialized randomly. My question is this: at the first time step there is no J(t-1) available. We also know that J(t) and r(t) are positive functions, so if we take J(t-1) = 0, the error reduces to e_c = J(t) + r(t), and minimizing it makes J(t) converge to -r(t). That is a negative number, which contradicts the positivity of J.
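To make the problem concrete, here is a minimal sketch of the first-step behaviour described above. It is written in Python rather than MATLAB and is not code from the article. The linear critic, the feature vector, the learning rate, the iteration count, and the function names (`critic_output`, `critic_error`, `train_first_step`) are all assumptions made for illustration. Only the error formula is taken from the question.

```python
import random

def critic_output(weights, features):
    """Linear stand-in for the critic network: J = w . x (an assumption)."""
    return sum(w * x for w, x in zip(weights, features))

def critic_error(j_now, j_prev, reward):
    """Critic error as written in the question: e_c = J(t) - (J(t-1) - r(t))."""
    return j_now - (j_prev - reward)

def train_first_step(weights, features, reward, j_prev=0.0, lr=0.1, iters=200):
    """Repeat gradient descent on E_c = 0.5 * e_c**2 with J(t-1) held fixed."""
    w = list(weights)
    for _ in range(iters):
        e_c = critic_error(critic_output(w, features), j_prev, reward)
        # dE_c/dw_i = e_c * dJ/dw_i = e_c * x_i for the linear critic
        w = [wi - lr * e_c * xi for wi, xi in zip(w, features)]
    return w

random.seed(0)
features = [1.0, 0.5]                                 # hypothetical state features
w0 = [random.uniform(-1.0, 1.0) for _ in features]    # random initial critic weights
reward = 1.0                                          # a positive r(t), as assumed in the question

w = train_first_step(w0, features, reward)            # J(t-1) taken as 0
j_final = critic_output(w, features)
print("J(t) after training with J(t-1) = 0:", j_final)
```

With J(t-1) fixed at 0 and a positive r(t), the trained J(t) in this sketch ends up at approximately -r(t), which is the negative value I am worried about. Passing a different `j_prev` (for example the critic's own initial output) changes the target to J(t-1) - r(t), but I do not know which initialization the article intends.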

Answers (0)