How does Q-learning update the qTable when using the Reinforcement Learning Toolbox?

Both 'MaxEpisodes' and 'MaxStepsPerEpisode' are set to 1.
I ran the following code. After the first episode, Q(4,1) is set to -1.
However, when I ran the "train section" again, both Q(4,1) and Q(4,2) were updated, as shown in the following figure.
In the second episode, action 2 is executed in state 4. Therefore, in my opinion, only Q(4,2) should be updated, to -1.
Why is Q(4,2) set to 0.7441?
Why is Q(4,1) updated too, and set to -1.67?
clear
GW = createGridWorld(4,4);
GW.CurrentState = '[2,1]';
GW.TerminalStates = '[4,4]';
nS = numel(GW.States);
nA = numel(GW.Actions);
GW.R = -1*ones(nS,nS,nA);
GW.R(:,state2idx(GW,GW.TerminalStates),:) = 10;
env = rlMDPEnv(GW);
qTable = rlTable(getObservationInfo(env),getActionInfo(env));
critic = rlQValueRepresentation(qTable,getObservationInfo(env),getActionInfo(env));
critic.Options.LearnRate =1;
agentOpt = rlQAgentOptions;
agentOpt.EpsilonGreedyExploration.Epsilon = 0.05;
agentOpt.DiscountFactor = 1;
agent = rlQAgent(critic, agentOpt);
plot(env)
env.Model.Viewer.ShowTrace = true;
env.Model.Viewer.clearTrace;
%% train section
rng(0)
opt = rlTrainingOptions(...
'MaxEpisodes',1,...
'MaxStepsPerEpisode',1,...
'StopTrainingCriteria',"AverageReward",...
'Plots', "none",...
'StopTrainingValue',480);
trainStats = train(agent,env,opt);
%%
aa = getLearnableParameters(getCritic(agent));

Answers (1)

Emmanouil Tzorakoleftherakis
Can you try the following?
critic.Options.L2RegularizationFactor=0;
This parameter is nonzero by default and is likely the reason for the discrepancy you are observing.
2 comments
Tracy Shang on 4 May 2021
Edited: Tracy Shang on 4 May 2021
Thanks for your answer!
I tried the code you suggested. The result showed no difference.
But you inspired me!
I then tried another parameter, as follows. The qTable was updated as shown in the following figure.
critic.Options.OptimizerParameters.GradientDecayFactor =0;
I then set both parameters by adding the following code, and the qTable was updated as shown in the following figure. At least the question about Q(4,1) is solved.
According to the parameters I set, the equation for calculating the Q-value simplifies as follows.
[equation image not preserved]
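For reference, here is how the simplification would go if the agent applied the standard tabular Q-learning rule directly. This is a sketch under that assumption, not necessarily what the toolbox does internally; the symbols $\alpha$ (LearnRate) and $\gamma$ (DiscountFactor) are the usual textbook ones.

```latex
% Standard tabular Q-learning update (assumed form)
Q(S,A) \leftarrow Q(S,A) + \alpha \left[ R + \gamma \max_{a} Q(S',a) - Q(S,A) \right]

% With \alpha = 1 and \gamma = 1, as set in the code above
Q(S,A) \leftarrow R + \max_{a} Q(S',a)

% If every next-state entry is still 0 and R = -1
Q(S,A) \leftarrow -1
```

Under these assumptions a single step from state 4 with action 2 would give Q(4,2) = -1, which is the value expected in the question.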
Why is Q(4,2) set to -1.4139?
critic.Options.OptimizerParameters.GradientDecayFactor =0;
critic.Options.L2RegularizationFactor=0;
Looking forward to your further answer. Thank you very much!
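One possible explanation for the numbers in this thread, offered as a sketch rather than a confirmed description of the toolbox internals: if the critic is trained with an Adam-style optimizer whose state persists between calls to train, then an entry keeps moving on its accumulated first moment even when its gradient is zero, and the size of each step is almost independent of the size of the gradient. The script below uses plain MATLAB only (no toolbox calls). The values beta1 = 0.9, beta2 = 0.999 and epsilon = 1e-8 are assumed defaults, and the unit gradient g is hypothetical.

```matlab
% Sketch of bias-corrected Adam step sizes for one scalar parameter.
% Assumptions: learning rate 1, beta2 = 0.999, epsilon = 1e-8, and
% beta1 = 0.9 (assumed default) or 0 (GradientDecayFactor = 0).
lr = 1; beta2 = 0.999; epsAdam = 1e-8;
g = 1;                      % hypothetical unit gradient magnitude
gradSeqs = [g 0;            % row 1: gradient only at step 1, like Q(4,1)
            0 g];           % row 2: gradient only at step 2, like Q(4,2)
beta1s = [0.9 0];
steps = zeros(2,2,2);       % (sequence, step, beta1 choice)
for b = 1:2
    beta1 = beta1s(b);
    for s = 1:2
        m = 0; v = 0;       % first and second moment estimates
        for t = 1:2
            gt = gradSeqs(s,t);
            m = beta1*m + (1-beta1)*gt;
            v = beta2*v + (1-beta2)*gt^2;
            mHat = m/(1-beta1^t);       % bias correction
            vHat = v/(1-beta2^t);
            steps(s,t,b) = lr*mHat/(sqrt(vHat)+epsAdam);
        end
    end
end
fprintf('beta1=0.9: total move of entry 1 = %.4f\n', sum(steps(1,:,1)));
fprintf('beta1=0.9: step 2 of entry 2     = %.4f\n', steps(2,2,1));
fprintf('beta1=0:   step 2 of entry 2     = %.4f\n', steps(2,2,2));
fprintf('beta1=0:   step 2 of entry 1     = %.4f\n', steps(1,2,2));
```

The step magnitudes come out at about 1.6701, 0.7441, 1.4139 and 0, which line up with the values -1.67, 0.7441 and -1.4139 reported above, and with Q(4,1) staying at -1 once GradientDecayFactor is 0. The sketch reproduces magnitudes only; the sign of each update depends on how the loss gradient is defined, which is not modelled here.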
