Reinforcement learning LQR example question

hamad alduaij
hamad alduaij on 6 May 2021
Commented: hamad alduaij on 7 May 2021
In the reinforcement learning guide, there is an example of training an RL agent to solve a discrete-time LQR problem.
I want to modify the example to the case where not all of the states are controllable or observable.
This is the code for the provided environment function:
function env = myDiscreteEnv(A,B,Q,R)
% This function creates a discrete-time linear system environment.
%
% (A,B) are the system matrices, where dx = Ax + Bu.
% (Q,R) defines the quadratic cost, where r = x'Qx + u'Ru.
% Copyright 2018-2019 The MathWorks, Inc.
% observation info
OINFO = rlNumericSpec([size(A,1),1]);
% action info
AINFO = rlNumericSpec([size(B,2),1]);
% environment
env = rlFunctionEnv(OINFO,AINFO,...
    @(action,loggedSignals) myStepFunction(action,loggedSignals,A,B,Q,R),...
    @() myResetFunction(Q));
end
function [Observation, Reward, IsDone, LoggedSignals] = myStepFunction(Action,LoggedSignals,A,B,Q,R)
% This is the step function for the environment, which returns the next
% observation for a given action.
% observations
x = LoggedSignals;
% dynamics
dx = A*x+B*Action;
Observation = dx;
LoggedSignals = dx;
% isDone
IsDone = false;
% Reward
Reward = -x'*Q*x - Action'*R*Action;
end
function [InitialObservation, LoggedSignals] = myResetFunction(Q)
% This is the reset function for the environment, which sets random initial
% conditions for the observation.
n = size(Q,1);
x0 = rand(n,1);
InitialObservation = x0;
LoggedSignals = InitialObservation;
end
If my system cannot observe all the states, do I change Observation or LoggedSignals by selecting the elements of dx I want to observe?
I assume I need to change the Observation, because the line x = LoggedSignals implies that the actual system dynamics are propagated through LoggedSignals. Obviously the environment needs to simulate the entire dynamics, but the controller should only have access to some of the state variables.
Also, does the environment automatically infer the number of actions (control inputs) from the dimensions of R and B?
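That reasoning can be sketched as follows, assuming an illustrative output matrix C (not part of the original example) that selects the measured states. The full state stays in LoggedSignals, so the dynamics are still simulated completely; only C*x is exposed to the agent. myPartialStepFunction is a hypothetical name:

```matlab
% Sketch: a partially observed variant of the step function.
% C is an assumed output matrix, e.g. C = [1 0 0; 0 1 0] to expose
% the first two of three states.
function [Observation, Reward, IsDone, LoggedSignals] = ...
        myPartialStepFunction(Action, LoggedSignals, A, B, C, Q, R)
    x = LoggedSignals;          % full state from the previous step
    xNext = A*x + B*Action;     % simulate the full dynamics
    LoggedSignals = xNext;      % keep the full state internally
    Observation = C*xNext;      % agent sees only the measured outputs
    IsDone = false;
    Reward = -x'*Q*x - Action'*R*Action;
end
```

The observation spec would then have to match the reduced dimension, e.g. OINFO = rlNumericSpec([size(C,1),1]). Note that in the original code the action spec is built from rlNumericSpec([size(B,2),1]), i.e. the number of actions comes from the columns of B, not from R.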
2)
I have a second question: how do I obtain the state values during the training steps? I am trying to simulate an online learner, so I need the performance during the training stages too. The "trainingStats" output only gives me the reward, but I also need the values of the observations (state variables), since I need to know the performance during both training and deployment, not just during deployment as shown in the example.
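One possible workaround, since the training loop itself does not return the visited observations, is to append each state to a shared log from inside the step function. This is a hedged sketch, not an official API; stateLog and myLoggingStepFunction are illustrative names:

```matlab
% Sketch: log every visited state during training via a global array.
% Run "global stateLog; stateLog = [];" before calling train(...),
% then inspect stateLog afterwards.
function [Observation, Reward, IsDone, LoggedSignals] = ...
        myLoggingStepFunction(Action, LoggedSignals, A, B, Q, R)
    global stateLog             % workaround: shared training-time log
    x = LoggedSignals;
    xNext = A*x + B*Action;
    stateLog(:, end+1) = xNext; % record the state at every step
    Observation = xNext;
    LoggedSignals = xNext;
    IsDone = false;
    Reward = -x'*Q*x - Action'*R*Action;
end
```

Each column of stateLog is then one visited state, in order, across all training episodes.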
  1 comment
hamad alduaij
hamad alduaij on 7 May 2021
What I was able to figure out: maybe I can implement online learning by passing the state variables as global variables and then tracking how the learning process affects them, correct? Then continue from learning to deployment with the same variables?


Answers (0)

Version

R2021a
