James Sorokhaibam

Last seen: more than a year ago | Active since 2024

Followers: 0 Following: 0

Statistics

Feeds

Question

High fluctuation in Q0 value for TD3 agent while training.
I am training a TD3 RL agent for a pick-and-place robot. The reward function is reward = exp(-E/d), where E is the total energy co...

about 2 years ago | 1 answer | 0
