Randomized position of obstacles in Grid World
Afficher commentaires plus anciens
Hello everyone!
I'm working on training a Q-learning agent using a standard 5x5 gridworld environment. I would like to implement in my environment obstacles such that they change at every episode in the training without ever coinciding with the target state of course. Anyone got any intel?
Here is my code:
GW = createGridWorld(5,5);
GW.CurrentState = '[1,1]';
GW.TerminalStates = '[3,3]';
GW.ObstacleStates = ["[3,2]";"[2,2]";"[2,3]";"[2,4]"; "[3,4]"];
updateStateTranstionForObstacles(GW);
nS = numel(GW.States);
nA = numel(GW.Actions);
GW.R = -1*ones(nS,nS,nA);
% GW.R(state2idx(GW,"[2,4]"),state2idx(GW,"[4,4]"),:) = 5;
GW.R(:,state2idx(GW,GW.TerminalStates),:) = 10;
env = rlMDPEnv(GW)
env.ResetFcn = @() 1;
rng(0)
qTable = rlTable(getObservationInfo(env),getActionInfo(env));
qRepresentation = rlQValueRepresentation(qTable,getObservationInfo(env),getActionInfo(env));
qRepresentation.Options.LearnRate = 1;
agentOpts = rlQAgentOptions;
agentOpts.EpsilonGreedyExploration.Epsilon = .04;
qAgent = rlQAgent(qRepresentation,agentOpts);
%training
trainOpts = rlTrainingOptions;
trainOpts.MaxStepsPerEpisode = 50;
trainOpts.MaxEpisodes= 200;
trainOpts.StopTrainingCriteria = "AverageReward";
trainOpts.StopTrainingValue = 11;
trainOpts.ScoreAveragingWindowLength = 30;
doTraining = true;
if doTraining
% Train the agent.
trainingStats = train(qAgent,env,trainOpts);
else
% Load the pretrained agent for the example.
load('basicGWQAgent.mat','qAgent')
end
plot(env)
env.Model.Viewer.ShowTrace = true;
env.Model.Viewer.clearTrace;
sim(qAgent,env)
1 commentaire
Francesco Rizzo
le 23 Mai 2022
Did you manage to do it? I have the same problem
Réponses (1)
StevenKlein
le 8 Août 2022
Modifié(e) : StevenKlein
le 8 Août 2022
0 votes
Catégories
En savoir plus sur Parallel and Cloud dans Centre d'aide et File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!