Hi,
I am using the Matlab Reinforcement Learning toolbox to train an rlQAgent.
The issue I am facing is that the corresponding Q-table, i.e., the output of the command getLearnableParameters(getCritic(qAgent)), is reset each time the train command is used.
Is it possible to avoid this reset, so as to continue training a previously trained agent?
Thank you
Corrado

Accepted Answer

Emmanouil Tzorakoleftherakis
Edited: Emmanouil Tzorakoleftherakis on 20 May 2020

0 votes

If you stop training, you should be able to continue from where you left off. I called 'train' on the basic grid world example a couple of times in a row, and the output of 'getLearnableParameters(getCritic(qAgent))' was different each time, so training did continue from the previous values. You can always save the trained agent and reload it as well, to make sure you don't accidentally lose it.
Update:
There is a regularization term added to the loss, which causes the other entries to change slightly. To avoid this, you can type:
qRepresentation.Options.L2RegularizationFactor=0;
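For reference, a minimal sketch of how the regularization setting and the save/reload suggestion fit together with the setup used in the scripts below (the file name trainedQAgent.mat is just an illustrative choice):

```matlab
% Sketch: disable L2 regularization on the critic representation before
% constructing the agent, so Q-table entries that a training step does
% not visit are left untouched instead of being slightly decayed.
env = rlPredefinedEnv("BasicGridWorld");
qTable = rlTable(getObservationInfo(env),getActionInfo(env));
qTable.Table = randn(size(qTable.Table));           % custom initial Q-table
qRepresentation = rlQValueRepresentation(qTable, ...
    getObservationInfo(env),getActionInfo(env));
qRepresentation.Options.L2RegularizationFactor = 0; % no weight decay on the Q-table
qAgent = rlQAgent(qRepresentation,rlQAgentOptions);

% To keep the learned values between MATLAB sessions, save and reload the agent:
save("trainedQAgent.mat","qAgent");
% ... later, before calling train again ...
load("trainedQAgent.mat","qAgent");
```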

5 comments

Corrado Possieri
Corrado Possieri on 20 May 2020
Edited: Corrado Possieri on 20 May 2020
I am actually trying to set the initial Q-table for the agent.
If I run the code
env = rlPredefinedEnv("BasicGridWorld");
qTable = rlTable(getObservationInfo(env),getActionInfo(env));
qRepresentation = rlQValueRepresentation(qTable,getObservationInfo(env),getActionInfo(env));
agentOpts = rlQAgentOptions;
agentOpts.DiscountFactor = 1;
qAgent = rlQAgent(qRepresentation,agentOpts);
trainOpts = rlTrainingOptions;
trainOpts.Plots = 'none';
trainOpts.MaxEpisodes = 1;
trainOpts.MaxStepsPerEpisode = 1;
trainOpts.Verbose = 1;
QTable0 = getLearnableParameters(getCritic(qAgent));
train(qAgent,env,trainOpts);
QTable1 = getLearnableParameters(getCritic(qAgent));
train(qAgent,env,trainOpts);
QTable2 = getLearnableParameters(getCritic(qAgent));
disp(find(QTable0{1} ~= QTable1{1}))
disp(find(QTable1{1} ~= QTable2{1}))
I get what I expect, that is, only one entry (and then two entries) of the QTable are changed.
However, if I try to force the initial value of the QTable
env = rlPredefinedEnv("BasicGridWorld");
qTable = rlTable(getObservationInfo(env),getActionInfo(env));
qTable.Table = randn(size(qTable.Table));
qRepresentation = rlQValueRepresentation(qTable,getObservationInfo(env),getActionInfo(env));
agentOpts = rlQAgentOptions;
agentOpts.DiscountFactor = 1;
qAgent = rlQAgent(qRepresentation,agentOpts);
trainOpts = rlTrainingOptions;
trainOpts.Plots = 'none';
trainOpts.MaxEpisodes = 1;
trainOpts.MaxStepsPerEpisode = 1;
trainOpts.Verbose = 1;
QTable0 = getLearnableParameters(getCritic(qAgent));
train(qAgent,env,trainOpts);
QTable1 = getLearnableParameters(getCritic(qAgent));
train(qAgent,env,trainOpts);
QTable2 = getLearnableParameters(getCritic(qAgent));
disp(find(QTable0{1} ~= QTable1{1}))
disp(find(QTable1{1} ~= QTable2{1}))
all its entries are perturbed, as if the QTable were somehow reinitialized.
Emmanouil Tzorakoleftherakis on 20 May 2020
Maybe I am missing something, but it looks like the two scripts posted are exactly the same.
The difference is that in the second script the QTable is initialized randomly by the following additional line:
qTable.Table = randn(size(qTable.Table));
If you run the two scripts, you will see that in the first, just one entry of the QTable is modified by the training algorithm, whereas in the second the whole QTable is changed by just a single step of the training algorithm.
Emmanouil Tzorakoleftherakis on 20 May 2020
Updated my answer above with a solution - hope that helps.
Corrado Possieri
Corrado Possieri on 20 May 2020
Thank you Emmanouil, this solved the issue.


More Answers (0)

