Is it possible to change RL action values under certain conditions?

black_cat
black_cat on 18 May 2021
Edited: black_cat on 20 May 2021
I want my agent to output a target value, but in certain situations (when the reward drops dramatically) I want the agent to look for a better solution by letting it change the target value. I tried using an Initial Condition block so that the target value is used at the start. However, after some training episodes my agent (PPO) always outputs an average value.
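One possible reading of the "average value" symptom, sketched in Python rather than in the Simulink model: if the penalty grows with the squared distance to the target, and the agent cannot tell which target is currently wanted, then the best constant output is the mean of the targets. The two target values and the squared penalty below are assumptions made for illustration; the post does not state either.

```python
# Hypothetical targets, assumed equally likely (not taken from the post).
targets = [1.0, 3.0]

def expected_squared_penalty(action, targets):
    """Mean squared distance between a constant action and the targets."""
    return sum((action - t) ** 2 for t in targets) / len(targets)

# Candidate constant actions from 1.0 to 3.0 in steps of 0.1.
candidates = [i / 10 for i in range(10, 31)]

# The constant action with the smallest expected penalty.
best = min(candidates, key=lambda a: expected_squared_penalty(a, targets))
print(best, expected_squared_penalty(best, targets))
```

Under these assumptions the minimum sits at the midpoint of the two targets, which would look like the agent settling on an average.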
  5 comments
black_cat
black_cat on 20 May 2021
Edited: black_cat on 20 May 2021
I've tried to create a minimal version that illustrates my problem. Here, I'm outputting numbers from 1 to 3. I hope it's more understandable that way.
black_cat
black_cat on 20 May 2021
Edited: black_cat on 20 May 2021
Okay, even though the attached example is supposed to be easy to understand, I think I can put my problem in simpler terms now:
  • I'm training my agent to output one of 3 discrete values (1, 2, 3)
  • I penalize it for not outputting my target value
  • My target value is 1 for 50% of the time and 3 for the other 50%
Once training is done (no matter which agent type, they all behave the same in this case), the agent outputs either 1 or 3, 100% of the time. It never changes its output value; it just uses one. This is my problem.
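The behavior described above can be reproduced on paper with a small Python sketch. It assumes a reward of 0 for hitting the target and -1 otherwise, and it compares an agent that cannot see the current target with one that can. Whether the target is part of the observation in the attached model is not stated in the post, so both the reward values and the "blind" case are assumptions.

```python
ACTIONS = [1, 2, 3]
# Target distribution from the description: 1 half of the time, 3 the other half.
TARGET_PROBS = {1: 0.5, 3: 0.5}

def reward(action, target):
    """Assumed reward: 0 for a match, -1 for a miss."""
    return 0.0 if action == target else -1.0

def expected_reward_constant(action):
    """Expected reward when the same action is used regardless of the target."""
    return sum(p * reward(action, t) for t, p in TARGET_PROBS.items())

# Agent that cannot observe the target: it can only pick one constant action.
blind = {a: expected_reward_constant(a) for a in ACTIONS}

# Agent that observes the target: it can pick the best action per target.
aware = sum(p * max(reward(a, t) for a in ACTIONS)
            for t, p in TARGET_PROBS.items())

print(blind)
print(aware)
```

If the target is hidden, always answering 1 or always answering 3 are tied as the best the agent can do, and 2 is strictly worse, which matches an agent that locks onto a single value. Once the target is available as an observation, switching between 1 and 3 becomes the better policy.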


Answers (0)
