Is it possible to change RL action values under certain conditions?


black_cat on 18 May 2021


Edited: black_cat on 20 May 2021

I want my agent to output a target value, but in certain situations (when the reward drops dramatically) I want the agent to look for a better solution by letting it change the target value. I tried using an Initial Condition block so that the target value is used at the start. However, my agent (PPO) always outputs an average value after some training episodes.

5 comments

black_cat on 20 May 2021

Edited: black_cat on 20 May 2021

I've tried to create a minimal version that illustrates my problem. Here, I'm outputting numbers from 1 to 3. I hope it's more understandable that way.

black_cat on 20 May 2021

Edited: black_cat on 20 May 2021

Okay, even though the attached example is supposed to be easy to understand, I think I'm able to put my problem in simple terms now:

I'm training my agent to output 3 discrete values (1, 2, 3).
I penalize it for not outputting my target value.
My target value is 1 for 50% of the time and 3 for the other 50% of the time.

When training is done (no matter which agent I use, they all behave the same in this case), the agent outputs either 1 or 3, 100% of the time. It does not vary its output at all; it just uses one value. This is my problem.
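
The setup above can be sketched in a few lines of plain Python (not the attached Simulink model). This is only an illustration under two assumptions that the post does not state: the penalty is -1 for a wrong output and 0 for a correct one, and the agent does not observe which target is currently active. The names `reward`, `expected_reward_constant`, and `expected_reward_informed` are hypothetical helpers, not part of any toolbox.

```python
from fractions import Fraction

ACTIONS = (1, 2, 3)
# Target distribution from the post: 1 half of the time, 3 the other half.
TARGET_PROBS = {1: Fraction(1, 2), 3: Fraction(1, 2)}


def reward(action, target):
    """Assumed penalty (not from the post): 0 on a match, -1 otherwise."""
    return 0 if action == target else -1


def expected_reward_constant(action):
    """Expected reward of always emitting `action`, without seeing the target."""
    return sum(p * reward(action, t) for t, p in TARGET_PROBS.items())


def expected_reward_informed():
    """Expected reward of a policy that observes the target and copies it."""
    return sum(p * reward(t, t) for t, p in TARGET_PROBS.items())


constant_scores = {a: expected_reward_constant(a) for a in ACTIONS}
informed_score = expected_reward_informed()

if __name__ == "__main__":
    for a, s in constant_scores.items():
        print(f"always output {a}: expected reward {s}")
    print(f"observe target and copy it: expected reward {informed_score}")
```

Under these assumptions, always outputting 1 and always outputting 3 tie for the best constant choice, which would be consistent with the behavior described above. Only a policy that can see the current target scores better, so it may be worth checking whether the target is part of the agent's observation.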
