Training 4 TD3 RL agents in Simulink to control buck converters. They need new observations at each episode, initialized from the buck converter outputs. How can they learn continuously from 1 s to 5 s?

I am trying to train 4 TD3 RL agents in a Simulink environment. Each agent is supposed to control the output voltage of a buck converter by sending its action signal to the input of the buck converter (as a reference voltage). To improve the learning process and enhance exploration, I want to set up the environment so that at the beginning of each training episode the agents observe a new set of observations. The issue is that all 9 elements of the observation vectors depend on the output voltages of the buck converters (and therefore on the actions). So I need to initialize the model at the beginning of each training episode by initializing the inputs of the buck converters, and then, once the agents start sampling from the environment, replace the initializing parameters with the agents' actions.

To implement this, I have placed the RL Agent blocks inside Triggered Subsystems and connected their outputs to Switch blocks that alternate between the initializing parameter and the output of the Triggered Subsystems (the action signals). From the beginning of the episode until t = 1 s, the model is driven by the initializing parameter; at t = 1 s the Switch changes state and the Triggered Subsystems are activated.

My question is: how can I modify my code so that the agents learn from t = 1 s until the end of the simulation at t = 5 s (4 seconds of training per episode)?
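One way to give the agents a new operating point each episode is to randomize the initializing parameter inside the environment's ResetFcn. Below is a minimal sketch, assuming a model named buck_rl_model, four RL Agent blocks named Agent1 to Agent4, and a workspace variable Vref_init that the Switch blocks feed to the buck converters during the first second (all of these names are placeholders, not my exact model):

mdl = "buck_rl_model";                        % placeholder model name
agentBlks = mdl + "/Agent" + (1:4);           % assumed paths of the 4 RL Agent blocks

% obsInfo and actInfo are cell arrays with one spec per agent (defined elsewhere)
env = rlSimulinkEnv(mdl, agentBlks, obsInfo, actInfo);

% Give each episode a fresh operating point: randomize the variable that the
% Switch blocks route to the buck converters before t = 1 s.
env.ResetFcn = @(in) setVariable(in, "Vref_init", 10 + 5*rand());   % e.g. 10-15 V (assumption)

With this in place, the Switch blocks still hand control over to the agents at t = 1 s, but every episode starts from a different initial reference.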
@Emmanouil Tzorakoleftherakis, I would greatly appreciate your kind help.
4 comments
Emmanouil Tzorakoleftherakis
Assuming your triggered subsystem is set up properly, the only thing I can think of that's left is to make sure the episode duration/steps in the training options account for the time the RL training is active/inactive.
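A rough sketch of what that could look like, assuming an agent sample time of 1 ms and the 1 s to 5 s active window from the question (the numeric values here are examples, not recommendations):

Ts     = 1e-3;                              % agent sample time (assumption)
Tstart = 1;                                 % triggered subsystems become active here
Tstop  = 5;                                 % end of the simulation

% Only the steps during which the agents actually act (the 4 s window) count
maxSteps = ceil((Tstop - Tstart)/Ts);

trainOpts = rlTrainingOptions( ...
    MaxEpisodes          = 2000, ...        % example value
    MaxStepsPerEpisode   = maxSteps, ...
    StopTrainingCriteria = "AverageReward", ...
    StopTrainingValue    = 480);            % example value

trainResults = train(agents, env, trainOpts);   % agents is the array of 4 TD3 agents

The first second of each simulation still runs, but since the agents only step while the triggered subsystems are active, MaxStepsPerEpisode should reflect the 4 s training window rather than the full 5 s simulation.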
mohsen on 17 June 2024
I will now ensure that the episode duration/steps in the training options are appropriately adjusted to account for the active and inactive periods of the RL training. Appreciate your guidance!


Answers (0)

Version

R2024a
