Reinforcement learning algorithm deployed online on a time-varying system

2 views (last 30 days)
hamad alduaij on 8 May 2021
Commented: hamad alduaij on 14 May 2021
Hello,
I am trying to program a reinforcement learning algorithm that has to learn and control a system at the same time. I have one year of hourly load data that impacts the system, and I want to design an agent that learns and controls over that specific year, given that data.
My problem is that the training phase seems to be isolated from the simulation phase, so it's not clear to me how I can make the system advance through time during the training phase, record the state variables, and then continue into the deployment phase. My system is characterized by 8760 matrices, one for each hour of the year. I want to make sure that:
1) When I train my agent, each training step should advance the system to a new hour; basically, I want to make sure my agent doesn't get to try two different actions in the same hour. To achieve that, do I need to make the episode length 1?
2) I need a history profile of all the actions my agent took during training, along with the state values. In the training stats I can only see the reward per episode; I can't see the actions my agent took, nor the corresponding state values.
3) My confusion is the reset function: I don't want the environment to reset the states; it has to continue from the last values. But why am I forced to use a reset function?
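To make the three requirements concrete, here is a rough Python sketch of the behavior I'm after (this is not toolbox code; the `HourlyGridEnv` class, the toy voltage dynamics, and the reward are all made up for illustration):

```python
# Illustrative sketch only: an environment that advances one hour per step,
# never rewinds time, logs every (hour, state, action) tuple, and whose
# reset() resumes from the last stored state instead of reinitializing.

class HourlyGridEnv:
    def __init__(self, load_profile):
        self.load_profile = load_profile  # e.g. 8760 hourly load values
        self.hour = 0                     # global clock, never rewound
        self.state = 1.0                  # e.g. a voltage magnitude (p.u.)
        self.history = []                 # (hour, state, action) log

    def step(self, action):
        # Each call consumes exactly one hour of data, so the agent can
        # never try two different actions in the same hour.
        self.history.append((self.hour, self.state, action))
        load = self.load_profile[self.hour]
        self.state = self.state + 0.1 * (action - load)  # toy dynamics
        self.hour += 1
        reward = -abs(self.state - 1.0)   # track nominal voltage
        done = self.hour >= len(self.load_profile)
        return self.state, reward, done

    def reset(self):
        # Deliberately does NOT reinitialize: episodes of length 1 chain
        # together into one continuous trajectory through the year.
        return self.state

env = HourlyGridEnv([0.9, 1.1, 1.0])
for _ in range(3):
    env.reset()            # resumes; does not rewind time
    env.step(1.0)          # one training step == one hour
print(env.hour)            # -> 3, time advanced monotonically
print(len(env.history))    # -> 3 logged (hour, state, action) tuples
```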
I think as long as I have these three things, I should be able to deploy my system to actively learn for x hours and then control for the remaining 8760 - x hours. From what I can see in the toolbox, there doesn't seem to be a clear way to do that. Can someone clarify this for me?
What I think I need to do is make my states global variables, so that I can train for one hour only, then advance the system to the next time step, and also store each action in an indexed global variable?
2 comments
Edited: Emmanouil Tzorakoleftherakis on 13 May 2021
Looks like you want to do model-based RL, which is not supported out of the box right now. I would recommend learning the dynamics and the policy in separate training sessions. For learning the dynamics you can use supervised learning, since you already have the data.
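For example, once you have logged (state, action, next-state) transitions, fitting the dynamics is plain supervised regression. A rough Python/NumPy sketch (illustrative only, not toolbox code; the linear system and its coefficients are invented):

```python
# Hedged sketch of the two-stage idea: fit a dynamics model from logged
# (state, action) -> next_state data with ordinary least squares; the
# learned model can then be used for planning or policy training.
import numpy as np

rng = np.random.default_rng(0)
# Fake logged transitions from a linear toy system x' = 0.8*x + 0.5*u
states = rng.uniform(-1, 1, size=200)
actions = rng.uniform(-1, 1, size=200)
next_states = 0.8 * states + 0.5 * actions

# Least-squares fit of [a, b] in x' ~ a*x + b*u (supervised learning)
X = np.column_stack([states, actions])
coef, *_ = np.linalg.lstsq(X, next_states, rcond=None)
print(np.round(coef, 3))   # recovers the true coefficients [0.8, 0.5]
```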
There is also no out-of-the-box way to deploy "learning" yet (we are working on that).
hamad alduaij on 14 May 2021
I want to clarify something: I am trying to compare the performance of an RL algorithm to a model-based estimation algorithm. I need my RL algorithm to have access to only a specific number of data points (hours) before it is used for control, because I want to evaluate its performance as an online learner. Also, the actions taken by the agent modify the environment. Basically, I am doing control of a power system, and I want to evaluate the performance of an RL algorithm after it has had N data points to work with, say 200 hours of data. Hence the framework of the reset function seems counterintuitive to me: I don't get to modify the environment arbitrarily; I can only change the settings of the regulators and observe the voltage. After training finishes, my environment has to continue from the last state and deploy the optimal policy for the remaining data points (hours). Just to clarify: let's say I allow the reinforcement learning algorithm to learn and explore for a maximum of two weeks; then it has to control for one year.
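Schematically, the schedule I have in mind looks like this rough Python sketch (the tabular value store and the update rule are placeholders, not my actual algorithm; only the learn-then-control split matters):

```python
# One continuous pass over the year: the agent explores and updates for the
# first LEARN_HOURS, then freezes its policy and only controls afterwards.
import random

LEARN_HOURS = 336          # two weeks of exploration/learning
TOTAL_HOURS = 8760         # one year at hourly resolution

q = {}                     # toy tabular value store: action -> estimate

def act(hour):
    if hour < LEARN_HOURS:                 # learning phase: explore
        a = random.choice([0, 1])
        q[a] = q.get(a, 0.0) + 1.0         # stand-in for a real update
        return a, "learn"
    best = max(q, key=q.get)               # control phase: greedy, frozen
    return best, "control"

random.seed(0)
phases = [act(h)[1] for h in range(TOTAL_HOURS)]
print(phases.count("learn"))     # -> 336
print(phases.count("control"))   # -> 8424
```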

Sign in to comment.

Answers (0)

Products


Version

R2021a
