- Action Bounds: Ensure that the action bounds are correctly defined. If the boundaries are too restrictive, the agent might struggle to learn effective actions.
- Normalization: Normalizing the inputs and outputs can significantly impact training stability. Consider normalizing both state and action values to a common range (e.g., [0, 1]).
- Custom Environment: Verify that your custom environment is correctly implemented. Double-check the reward function, state representation, and action space.
- Exploration Noise: TD3 relies on exploration noise to encourage exploration. Ensure that the noise level is appropriate during training.
TD3算法训练时动作总是输出边界值
28 vues (au cours des 30 derniers jours)
Afficher commentaires plus anciens
我在使用TD3算法训练完成后,无论训练过程中奖励曲线是否收敛,动作总是输出边界值或者输出完全不正确。我的state的值在0-20000,动作边界在0-15000.是哪里出了问题,是自定义环境创建的不正确还是哪里?需要对输入输出进行归一化吗
0 commentaires
Réponses (1)
UDAYA PEDDIRAJU
le 14 Mar 2024
Hi 泽宇,
Regarding your issue with the TD3 algorithm where actions always output at boundary values regardless of whether the reward curve converges.
It’s essential to investigate a few potential factors:
you can refer to the documentation TD3: https://www.mathworks.com/help/reinforcement-learning/ug/td3-agents.html.
Voir également
Catégories
En savoir plus sur Big Data Processing dans Help Center et File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!