Why does a reinforcement learning agent return different actions from sim() and getAction()?

Question

Shuyue Li on 7 Sep 2023

Direct link to this question:

https://fr.mathworks.com/matlabcentral/answers/2018301-why-reinforcement-learning-has-different-results-of-action-between-sim-and-getaction

Answered: Emmanouil Tzorakoleftherakis on 25 Sep 2023

Hi MATLAB Reinforcement Learning team,

I have a well-trained PPO actor-critic agent. I set UseExplorationPolicy to 0 and obtained actions from sim() and from getAction(), with no randomness in the environment. Both calls use the same observations and the same agent.

However, the actions returned by sim() and getAction() differ, although each set of actions is reproducible on its own.

I would therefore like to know how sim() generates actions. Do the actions come from the actor network? If so, why do the results differ when the network is the same?

Code:

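% Action computed directly from the agent for the test observation
actoraction = getAction(saved_agent,{testobstate});

% Custom reset and step functions wrapped as a function environment
ResetHandleT = @() myResetFunctionCNsim(testData,testobstate);
StepHandleT = @(Action,StockSaved) myStepFunctionCNsim(Action,StockSaved,testData,testobstate);
envT = rlFunctionEnv(observationInfo,actionInfo,StepHandleT,ResetHandleT);

% Actions produced by simulating the same agent in that environment
experience = sim(envT,saved_agent,simOpts);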

I look forward to your reply.

Sincerely,

Shuyue


Answer 1

Emmanouil Tzorakoleftherakis on 25 Sep 2023

Direct link to this answer:

https://fr.mathworks.com/matlabcentral/answers/2018301-why-reinforcement-learning-has-different-results-of-action-between-sim-and-getaction#answer_1317957

Hi,

Which release are you using? We tried R2023a and R2023b with UseExplorationPolicy = 0, and getAction and sim gave the same results. A reproduction model would be great.
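For anyone trying to build such a reproduction, one possible starting point is to replay the observations logged by sim() through getAction() and compare the two actions step by step. The sketch below is not from the original thread. It assumes Reinforcement Learning Toolbox R2022a or later, uses the predefined "CartPole-Discrete" environment and an untrained default PPO agent purely as stand-ins for the custom environment and saved_agent in the question, and assumes the logged timeseries data has the time step as its last (third) dimension.

```matlab
% Minimal comparison sketch (assumptions: Reinforcement Learning Toolbox,
% R2022a or later; stand-in environment and untrained agent).
env = rlPredefinedEnv("CartPole-Discrete");
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

rng(0);                                 % fix the seed for network init and env reset
agent = rlPPOAgent(obsInfo, actInfo);   % default agent, for illustration only
agent.UseExplorationPolicy = false;     % use the greedy (maximum-likelihood) action

simOpts = rlSimulationOptions(MaxSteps=5);
experience = sim(env, agent, simOpts);

% Field names follow the specification names, so look them up
% instead of hard-coding them.
obsName = fieldnames(experience.Observation);
actName = fieldnames(experience.Action);
obsLog = experience.Observation.(obsName{1}).Data;  % assumed: time is the 3rd dimension
actLog = experience.Action.(actName{1}).Data;

% Replay each logged observation through getAction and print both actions.
for k = 1:size(actLog, 3)
    obs_k = obsLog(:, :, k);
    a = getAction(agent, {obs_k});      % getAction returns a cell array
    fprintf("step %d: sim = %g, getAction = %g\n", k, actLog(:, :, k), a{1});
end
```

If the two columns differ, it may be worth checking that the observation passed to getAction() is exactly the one the environment's reset and step functions return, including its dimensions and data type, since sim() only ever sees the observations produced by those functions.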
