Deep Q Learning - define an adaptive critic learning rate?

Hello,
at the moment I use Deep Q Learning for process planning, and I would like to use an adaptive critic learning rate to speed up training.
Is there any direct way (or workaround) in DQL to use a learning rate that decreases over the course of training, e.g. depending on the number of epochs/steps?
Thanks in advance and best wishes
Niklas

Accepted Answer

Hi Niklas,
I believe this is currently not supported. This is an interesting use case though - I will inform the development team. Is there any particular model you have in mind that would work well? For example linear/exponential decay, etc.

7 comments

Hello Emmanouil,
thank you for your quick response.
I have no particular model in mind yet, because I only aimed at speeding up my training runs. But I think both linear and exponential decay would work for this and would be great extensions to the DQL implementation.
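The two decay models discussed above can be sketched as plain schedule functions. This is a minimal illustration of the math only, not Reinforcement Learning Toolbox API; the function names `linear_decay` and `exp_decay` and all parameter values are hypothetical.

```python
def linear_decay(lr0, lr_min, step, total_steps):
    """Linearly interpolate from lr0 down to lr_min over total_steps,
    then hold at lr_min for the rest of training."""
    frac = min(step / total_steps, 1.0)
    return lr0 + (lr_min - lr0) * frac

def exp_decay(lr0, decay_rate, step):
    """Multiply the initial rate by decay_rate once per step
    (0 < decay_rate < 1 gives a geometric decrease)."""
    return lr0 * decay_rate ** step

# Example schedules over a hypothetical 100-step training run:
# linear_decay(1e-2, 1e-4, 0, 100)   -> 1e-2 (start of training)
# linear_decay(1e-2, 1e-4, 100, 100) -> 1e-4 (end of training)
# exp_decay(0.1, 0.5, 2)             -> 0.025 (halved twice)
```

In a framework without built-in schedules, such a function would typically be evaluated once per episode or training step and the result written back into the optimizer's learning-rate setting.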
Kind regards
Niklas
I had a similar idea: letting the maximum number of steps per episode increase over time.
For that, there is a way to do it using the IsDone flag and some logic that determines when an episode ends. You should be careful though, since episode rewards will not be directly comparable if the episode lengths keep changing.
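The idea of a step budget that grows over episodes, enforced through the done signal, can be sketched like this. This is a generic illustration, not toolbox code; the function name `is_done` and the parameters `base_max`, `growth`, and `cap` are all made up for the example.

```python
def is_done(step_in_episode, episode_index,
            terminal=False, base_max=50, growth=10, cap=500):
    """End the episode when the environment terminates naturally
    (terminal=True) or when this episode's growing step budget is
    exhausted. The budget starts at base_max and grows by `growth`
    steps per episode, up to `cap`."""
    max_steps = min(base_max + growth * episode_index, cap)
    return terminal or step_in_episode >= max_steps
```

In a MATLAB environment step function the same check would set the IsDone output; the point is only that the cutoff threshold is a function of the episode index rather than a constant.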
Hello Emmanouil,
could you explain a little bit further why different episode lengths could be problematic?
I am asking because I am already using the IsDone flag to control my episode length flexibly.
If, for example, the IsDone flag is activated in the initial episodes of training, that would be fine (imagine stopping an episode if an agent violates some constraint). Eventually though, you would want the agent to learn how to respect these constraints and let the episode terminate naturally. If you, for whatever reason, manually and consistently stop/reduce the episode duration, you are reducing the potential maximum episode reward that can be collected. So, for a more "mature" agent, you may actually be inhibiting its learning potential, meaning that you may see the collected episode reward go down as you reduce the episode duration (which is kind of expected, since there are fewer simulation steps and thus fewer collected rewards). Hope that helps.
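The comparability issue described above can be made concrete with a tiny example: two episodes with the same per-step performance produce very different raw returns if one is cut short, whereas the per-step average stays comparable. The helper name `mean_step_reward` is illustrative only.

```python
def mean_step_reward(rewards):
    """Average reward per step. Unlike the raw episode return
    (sum of rewards), this stays comparable across episodes of
    different lengths."""
    return sum(rewards) / len(rewards)

# Two episodes with identical per-step reward of 1.0:
short_episode = [1.0] * 10    # cut short after 10 steps
long_episode = [1.0] * 100    # allowed to run 100 steps

# Raw returns differ by 10x (10 vs 100) even though the policy
# performed identically per step; the per-step averages match.
```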
Thanks for your advice, I think I got your point!
Magnify on 29 Jul 2020
Edited: Magnify on 29 Jul 2020
One more question: why is the frequency of the agent's action outport 0.05 s rather than the 0.025 s specified by the agent sample time in my script createDDPGAgent.m? Moreover, there seems to be no way to modify it. [Screenshot of the sample time display attached.] I would appreciate any tips.



Products

Version

R2020a
