I want to know how does the parameter "NumStepsToLookAhead" in rlDDPGAgentOptions from reinforcement learning toolboxof matlab 2019b works? Whether the look ahead is done on target networks? (like modification in critic objective, from {r+gamma*Qt - Q} to {r+ sum(gamma**i*Qt) -Q} Or the look ahead is done on reward sampling itself? ( like changing reward "r" from each sample to "r+gamma*r_t+gamma**2*r_t+1+... Any help is highly appreciated.

number of look ahead steps in DDPG Agent Options

3 vues (au cours des 30 derniers jours)

ALOK RANJAN SWAIN le 21 Fév 2020

0
Lien

Utiliser le lien direct vers cette question

https://fr.mathworks.com/matlabcentral/answers/506744-number-of-look-ahead-steps-in-ddpg-agent-options

Commenté : Dingshan Sun le 1 Sep 2022

I want to know how does the parameter "NumStepsToLookAhead" in rlDDPGAgentOptions from reinforcement learning toolboxof matlab 2019b works?

Whether the look ahead is done on target networks? (like modification in critic objective, from {r+gamma*Qt - Q} to {r+ sum(gamma**i*Qt) -Q}
Or the look ahead is done on reward sampling itself? ( like changing reward "r" from each sample to "r+gamma*r_t+gamma**2*r_t+1+...

Any help is highly appreciated.

0 commentaires
Afficher -2 commentaires plus anciensMasquer -2 commentaires plus anciens

Connectez-vous pour commenter.

Connectez-vous pour répondre à cette question.

Réponses (1)

Anh Tran le 1 Mar 2020

1
Lien

Utiliser le lien direct vers cette réponse

https://fr.mathworks.com/matlabcentral/answers/506744-number-of-look-ahead-steps-in-ddpg-agent-options#answer_417996

I am not sure what does reward sampling mean. "NumStepsToLookAhead" in rlDDPGAgentOptions changes the critic's target values in step 5 of DDPG training algorithm.

Assume g is the discount factor, the critic target will be as followed

4 commentaires
Afficher 2 commentaires plus anciensMasquer 2 commentaires plus anciens

ALOK RANJAN SWAIN le 4 Mar 2020

Thanks for your help.??

Dingshan Sun le 1 Sep 2022

Could you give a hint how R_t,R_t_1,,R_t+2,...,R_t+n-1 can be obtained in an online off-policy algorithm? Especially for DRL methods that use an experience replay?

Connectez-vous pour commenter.

Connectez-vous pour répondre à cette question.

Catégories

Control Systems Reinforcement Learning Toolbox Environments

En savoir plus sur Environments dans Help Center et File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!

Translated by