How is the "RL external action" supposed to work?

8 views (last 30 days)
Leonardo Molino on 25 Jul 2024
Commented: Leonardo Molino on 29 Jul 2024
Hi all,
As some of you may already know, I have been working for a while with a 3DOF model of a business jet. The model is successfully controlled by a TECS algorithm that issues actions to reach speed and altitude setpoints. The original idea was to train a DDPG agent to emulate these actions, rewarding it appropriately based on the specifications of the TECS algorithm. After many weeks of failures, I would like to abandon this path and run one last test using the external action port of the RL Agent block. The idea would be to run the same system with the TECS in parallel with the agent, with the agent receiving the TECS commands directly through that port. So I was wondering how learning with external actions works. Does the neural network update its weights and biases by observing the actions of the external controller? Also, can the external action be injected continuously, or is it better to proceed with an "on-off" approach? For example, I could start with external actions and then, after a certain number of seconds, turn them off and let the agent act alone. Are there any documents I can consult on this? Thanks

Accepted Answer

Shubham on 25 Jul 2024
Hi Leonardo,
Using an external action port in a Reinforcement Learning (RL) block can be a powerful method to facilitate the training of RL agents by leveraging an existing control system, such as your TECS (Total Energy Control System) algorithm. This approach can help guide the RL agent by providing it with actions that are known to be effective, potentially speeding up the learning process.
How Learning with External Actions Works:
When using external actions in the context of RL, the neural network can indeed update its weights and biases by observing the actions of the external agent. The idea is to use the external actions as a form of supervised learning signal, where the RL agent learns to mimic the external controller initially and then gradually takes over the control as it becomes more proficient.
Steps to Implement Learning with External Actions
  1. Parallel Execution: Run the TECS algorithm in parallel with the RL agent. The TECS algorithm provides the actions that are used as a reference for the RL agent.
  2. External Actions Input: Use the external action port of the RL Agent block to feed the actions from the TECS algorithm into the RL agent. This allows the RL agent to observe both the state of the system and the actions taken by the TECS algorithm (a setup sketch follows this list).
  3. Warm-up Phase: Start with the RL agent observing and learning from the TECS actions. During this phase, the agent tries to mimic the TECS actions as closely as possible.
  4. Gradual Transition: Gradually reduce the dependency on the TECS actions and allow the RL agent to take more control. This can be done by slowly decreasing the weight of the external actions in the loss function or by using an "on-off" approach where the external actions are turned off after a certain period.
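As a rough illustration of steps 1 and 2, here is a minimal MATLAB sketch of the environment and agent setup. The model name 'bizjet_3dof', the block path 'RL Agent', the signal dimensions, and the training thresholds are all placeholders to replace with your own; the external action and "use external action" ports are assumed to be enabled in the RL Agent block dialog and wired to the TECS commands and your gating signal inside the Simulink model.

% Hypothetical model/block names and signal sizes -- adapt to your model.
mdl      = 'bizjet_3dof';
agentBlk = [mdl '/RL Agent'];

obsInfo = rlNumericSpec([6 1]);                          % e.g. speed/altitude errors and their rates
actInfo = rlNumericSpec([2 1], ...
    'LowerLimit', [-1; -1], 'UpperLimit', [1; 1]);       % e.g. normalized throttle and elevator commands

env   = rlSimulinkEnv(mdl, agentBlk, obsInfo, actInfo);  % Simulink model as RL environment
agent = rlDDPGAgent(obsInfo, actInfo);                   % default actor/critic networks, tune as needed

trainOpts = rlTrainingOptions( ...
    'MaxEpisodes',          2000, ...
    'MaxStepsPerEpisode',   1000, ...
    'StopTrainingCriteria', 'AverageReward', ...
    'StopTrainingValue',    900);                        % placeholder stopping threshold
trainResults = train(agent, env, trainOpts);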
On-Off Approach vs Continuous Injection
  • Continuous Injection: Continuously feeding the TECS actions to the RL agent can provide a consistent learning signal. However, it might make it difficult for the agent to learn to act independently.
  • On-Off Approach: Starting with external actions and then turning them off after a certain period can be effective. This allows the RL agent to learn from the TECS initially and then gradually take over control, helping it transition from supervised learning to pure reinforcement learning (a gating sketch follows this list).
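One way to implement the on-off switch is to drive the "use external action" input of the RL Agent block from a small MATLAB Function block such as the sketch below. The threshold t_end_mimic is a tuning constant you choose; instead of a hard time cutoff you could also switch based on a performance criterion.

function useExt = externalActionGate(t, t_end_mimic)
% Gating signal for the "use external action" port: during the warm-up
% phase the TECS commands are applied (and observed by the agent),
% afterwards the agent's own actions drive the aircraft.
useExt = t < t_end_mimic;
end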
  3 comments
Shubham on 25 Jul 2024
Yes, the "on-off" approach is effective. If the agent deviates significantly after t_end_mimic, it's better to let the TECS take control and correct the course. This stabilizes training and prevents reinforcing bad behavior.
Impact on Learning:
  • Penalize the agent for deviations and TECS interventions. Reward for maintaining control independently.
  • Example Reward: reward = baseReward - deviationPenalty - interventionPenalty.
By managing deviations and structuring the reward this way, you can help the RL agent learn effectively while ensuring stability (a reward sketch follows below).
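A minimal sketch of that reward, for example as a MATLAB Function block feeding the reward port of the RL Agent block; the error signals, the intervention flag, and all weights are placeholders to tune for your model:

function reward = shapedReward(speedErr, altErr, tecsIntervened)
% reward = baseReward - deviationPenalty - interventionPenalty
baseReward          = 1;                          % for keeping the aircraft under control this step
deviationPenalty    = 0.01*abs(speedErr) + 0.001*abs(altErr);
interventionPenalty = 5*double(tecsIntervened);   % TECS had to take over this step
reward = baseReward - deviationPenalty - interventionPenalty;
end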
Leonardo Molino on 29 Jul 2024
Hi again @Shubham, I was wondering: does it make sense to stop the episode when the external TECS has to intervene more than the allowed_interventions value? Or should I simply continue training until t == t_fin? In the latter case, should the isdone signal be modified? Thanks


