Reinforcement Learning for Developing Field-Oriented Control
Use reinforcement learning and the DDPG algorithm for field-oriented control of a Permanent Magnet Synchronous Motor. This demonstration replaces two PI controllers with a reinforcement learning agent in the inner loop of the standard field-oriented control architecture and shows how to set up and train an agent using the reinforcement learning workflow.
In this video, we show how to use reinforcement learning for field oriented control of a permanent magnet synchronous motor.
To showcase this, we start with an example that uses the typical field oriented control architecture, where the outer loop controller is responsible for speed control; whereas the inner loop PI controllers are responsible for controlling the d-axis and q-axis currents.
We then create and validate a reinforcement learning agent that replaces the inner loop controllers of this architecture.
The use of RL agent is especially beneficial when the system is nonlinear, in which case we can train a single RL agent instead of tuning PI controllers at multiple operating conditions.
In this example, we use a linear motor model to showcase the workflow of field oriented control using reinforcement learning, and this workflow remains the same for a complex nonlinear motor as well.
Let’s look at the Simulink model that implements the field oriented control architecture.
This model contains two control loops: an outer speed loop and an inner current loop.
The outer loop is implemented in the ‘Speed Control’ subsystem, and it contains a PI controller that is responsible for generating reference currents for the inner loop.
The inner loop is implemented in the ‘Current Control’ subsystem and contains two PI controllers to determine reference voltages in the dq frame.
The reference voltage is then used to generate appropriate PWM signals that control the semiconductor switches of the inverter, which then drives the permanent magnet synchronous motor to achieve desired torque and flux.
Let’s go ahead and run the Simulink model.
We can see that the tracking performance of the controllers are good and are able to track the desired speed.
Let’s save this result for later comparison with the reinforcement learning controller.
Now we update the existing model by replacing the two PI controllers in the current loop with a Reinforcement Learning Agent block.
In this example we use DDPG as the reinforcement learning algorithm, which trains an actor and a critic simultaneously to learn an optimal policy that maximizes long-term reward.
Once the Simulink model is updated with the reinforcement learning block, we then follow the reinforcement learning workflow to setup, train, and simulate the controller.
Reinforcement learning workflow is as follows:
First step is to create an environment. In this example, we already have a Simulink model that contains the permanent magnet synchronous motor modeled using Motor Control Blockset and Simscape Electrical within the ‘Plant and Inverter’ subsystem.
We then use this Simulink model to create a reinforcement learning environment interface with appropriate observations and actions.
Here the observations to the reinforcement learning block are error in the stator currents ‘id error’ and ‘iq error’ and the stator currents ‘id’ and ‘iq’.
Actions are the stator voltages ‘vd’ and ‘vq’.
Next we create the reward signal to let the reinforcement learning agent know how good or bad the actions it selects during training are, based on its interaction with the environment.
Here we shape a reward based on the quadratic reward penalty that penalizes distance from goal and control effort.
Then we move on to creating network architecture.
Here we construct the actor and the critic networks as required by the DDPG algorithm programmatically using MATLAB functions for layers and representations.
The neural networks can also be constructed using the Deep Network Designer app and then imported into MATLAB.
The critic network in this example takes in observations and actions as the input and gives estimated Q values as the output.
The actor network, on the other hand, takes in observations as the input and gives actions as the output.
With actor and critic representations created, we can create a DDPG agent.
The sample time of the DDPG agent is configured depending on the execution requirement of the control loop.
In general, agents with smaller sample time take longer time to train as it involves a greater number of simulation steps each episode.
We are now ready to train the agent.
First, we specify the training options.
Here we specify that we want to run training for at most 2000 episodes and stop training if the average reward exceeds the provided value.
We then use the ‘train’ command to start the training process.
In general, it is best practice to randomize reference signals to the controller during the training process to obtain a more robust policy. This can be done by writing a local reset function for the environment.
During the training process, progress can be monitored in the Episode Manager.
Once the training is complete, we can simulate and verify the control policy from the trained agent.
By simulating the model with the trained agent, we see that the speed tracking performance of field oriented control is good with reinforcement learning agent controlling the stator currents.
On viewing this performance with the previously saved output, we see that performance of field oriented control with reinforcement learning agent is comparable to its PI controller counterpart.
This concludes the video.
Sélectionner un site web
Choisissez un site web pour accéder au contenu traduit dans votre langue (lorsqu'il est disponible) et voir les événements et les offres locales. D’après votre position, nous vous recommandons de sélectionner la région suivante : .
Vous pouvez également sélectionner un site web dans la liste suivante :
Comment optimiser les performances du site
Pour optimiser les performances du site, sélectionnez la région Chine (en chinois ou en anglais). Les sites de MathWorks pour les autres pays ne sont pas optimisés pour les visites provenant de votre région.