Reinforcement learning LQR example question

hamad alduaij
hamad alduaij on 6 May 2021
Commented: hamad alduaij on 7 May 2021
In the reinforcement learning guide, there is an example of training an RL agent to solve a discrete-time LQR problem.
I want to modify the example to the case where not all of the states are controllable or observable.
This is the code for the provided environment function:
function env = myDiscreteEnv(A,B,Q,R)
% This function creates a discrete-time linear system environment.
%
% (A,B) are the system matrices, where dx = Ax + Bu.
% (Q,R) defines the quadratic cost, where r = x'Qx + u'Ru.
% Copyright 2018-2019 The MathWorks, Inc.
% observation info
OINFO = rlNumericSpec([size(A,1),1]);
% action info
AINFO = rlNumericSpec([size(B,2),1]);
% environment
env = rlFunctionEnv(OINFO,AINFO,...
    @(action,loggedSignals) myStepFunction(action,loggedSignals,A,B,Q,R),...
    @() myResetFunction(Q));
end
function [Observation, Reward, IsDone, LoggedSignals] = myStepFunction(Action,LoggedSignals,A,B,Q,R)
% This is the step function for the environment, which returns the next
% observation for a given action.
% observations
x = LoggedSignals;
% dynamics
dx = A*x+B*Action;
Observation = dx;
LoggedSignals = dx;
% isDone
IsDone = false;
% Reward
Reward = -x'*Q*x - Action'*R*Action;
end
function [InitialObservation, LoggedSignals] = myResetFunction(Q)
% This is the reset function for the environment, which sets random initial
% conditions for the observation.
n = size(Q,1);
x0 = rand(n,1);
InitialObservation = x0;
LoggedSignals = InitialObservation;
end
If my system cannot observe all the states, do I change Observation or LoggedSignals by selecting the elements of dx I want to observe?
I assume I need to change the Observation, because the line x = LoggedSignals implies that the actual system dynamics are propagated through LoggedSignals. Obviously the environment needs to simulate the entire dynamics, but the controller should only have access to some of the state variables.
Also, does the environment automatically infer the number of actions (control inputs) from the dimensions of R and B?
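That reasoning can be sketched as follows, assuming an illustrative output matrix C (not part of the original example) that selects the measured states. The full state stays in LoggedSignals, so the dynamics are still simulated completely; only C*x is exposed to the agent. myPartialStepFunction is a hypothetical name:

```matlab
% Sketch: a partially observed variant of the step function.
% C is an assumed output matrix, e.g. C = [1 0 0; 0 1 0] to expose
% the first two of three states.
function [Observation, Reward, IsDone, LoggedSignals] = ...
        myPartialStepFunction(Action, LoggedSignals, A, B, C, Q, R)
    x = LoggedSignals;          % full state from the previous step
    xNext = A*x + B*Action;     % simulate the full dynamics
    LoggedSignals = xNext;      % keep the full state internally
    Observation = C*xNext;      % agent sees only the measured outputs
    IsDone = false;
    Reward = -x'*Q*x - Action'*R*Action;
end
```

The observation spec would then have to match the reduced dimension, e.g. OINFO = rlNumericSpec([size(C,1),1]). Note that in the original code the action spec is built from rlNumericSpec([size(B,2),1]), i.e. the number of actions comes from the columns of B, not from R.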
2)
I have a second question: how do I obtain the state values during the training steps? I am trying to simulate an online learner, so I need the performance during the training stages too. The "trainingStats" output only gives me the reward, but I also need the values of the observations (state variables), since I need to know the performance during both training and deployment, not just during deployment as shown in the example.
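One possible workaround, since the training loop itself does not return the visited observations, is to append each state to a shared log from inside the step function. This is a hedged sketch, not an official API; stateLog and myLoggingStepFunction are illustrative names:

```matlab
% Sketch: log every visited state during training via a global array.
% Run "global stateLog; stateLog = [];" before calling train(...),
% then inspect stateLog afterwards.
function [Observation, Reward, IsDone, LoggedSignals] = ...
        myLoggingStepFunction(Action, LoggedSignals, A, B, Q, R)
    global stateLog             % workaround: shared training-time log
    x = LoggedSignals;
    xNext = A*x + B*Action;
    stateLog(:, end+1) = xNext; % record the state at every step
    Observation = xNext;
    LoggedSignals = xNext;
    IsDone = false;
    Reward = -x'*Q*x - Action'*R*Action;
end
```

Each column of stateLog is then one visited state, in order, across all training episodes.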
  1 comment
hamad alduaij
hamad alduaij on 7 May 2021
What I was able to figure out: maybe I can implement online learning by passing the state variables as global variables and then tracking how the learning process affects them, correct? Then continue from learning to deployment with the same variables?


Answers (0)

Version

R2021a
