
PPO agent - Experience Horizon in MATLAB

21 views (in the last 30 days)
Harry Dunn on 19 Apr 2023
Edited: Piyush Dubey on 29 May 2023
The PPO examples state that the agent interacts with the environment for the experience-horizon number of steps, at which point it splits the experiences into mini-batches, each of which is put through the neural network 'epochs' times. For example:
If the experience horizon were 512, the mini-batch size 128, and the number of epochs 3, the agent would interact with the environment for 512 steps (512 experiences), split this into 4 mini-batches of size 128 each, and put each batch through the neural network 3 times. Is this correct? I ask because elsewhere the terms buffer size, mini-batch size, and time horizon are used.
I also saw in the example that used an experience horizon of 200 and a mini-batch size of 64 that the agent collects experiences until it reaches 200 or the episode terminates, at which point it learns. So is it correct that if only 100 experiences were collected, these would be split into 2 batches, one of 64 and one of 36, rather than waiting for the next episode for the experience horizon to reach 200? i.e., does the experience horizon reset each episode, with the experiences put through the neural network at the end of each episode regardless of whether the experience horizon has been reached?
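For reference, the first scenario described in the question maps onto the Reinforcement Learning Toolbox rlPPOAgentOptions properties like this (a minimal sketch; the values are the question's illustrative numbers):

% Sketch: the 512/128/3 scenario expressed as PPO agent options
opt = rlPPOAgentOptions( ...
    'ExperienceHorizon', 512, ... % steps collected before each learning phase
    'MiniBatchSize',     128, ... % 512/128 = 4 mini-batches per epoch
    'NumEpoch',          3);      % each mini-batch is reused 3 times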

Answers (1)

Piyush Dubey on 29 May 2023
Edited: Piyush Dubey on 29 May 2023
Hi Harry,
The experience horizon is the maximum number of time steps for which an agent collects experience before learning from it. It is a hyperparameter that the user sets before training the agent, and it depends on the specific task/environment in which the agent will operate. If the agent reaches the experience horizon before the episode terminates, it performs a learning update on the collected experiences and then continues interacting with the environment.
Time horizon refers to the maximum number of time steps that an agent can take in a single episode of interaction with the environment. This is a common concept when dealing with sequential decision-making problems in which the agent must choose actions over a certain period of time to achieve some goal. For example, in a game of chess, the time horizon is the maximum number of moves that a player can make before the game ends.
In many cases, the time horizon and experience horizon can be set to the same value, especially for simple problems or when collecting experience is relatively fast. However, for more complex problems, it may not be feasible to set the time horizon and experience horizon to the same value, due to computational or memory limitations.
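As a sketch of the distinction, the time horizon is typically set through the training options while the experience horizon is a PPO agent option (MaxStepsPerEpisode and ExperienceHorizon are the relevant Reinforcement Learning Toolbox properties; the values here are illustrative):

trainOpts = rlTrainingOptions('MaxStepsPerEpisode', 500); % time horizon: cap on episode length
agentOpts = rlPPOAgentOptions('ExperienceHorizon', 200);  % experience horizon: steps collected before learning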
If the experience horizon is 200 and the mini-batch size is 64, there will be three full mini-batches per epoch, with a remainder of 8 experiences that do not fill a complete mini-batch, since 200 divided by 64 is 3 with a remainder of 8.
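The arithmetic can be checked directly in MATLAB:

experienceHorizon = 200;
miniBatchSize     = 64;
numFullBatches = floor(experienceHorizon/miniBatchSize) % = 3 full mini-batches
leftover       = mod(experienceHorizon, miniBatchSize)  % = 8 leftover experiences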
Epoch-based training in PPO with mini-batches typically works by first collecting a batch of experience samples through interaction with the environment using the current policy. The experience samples are then randomly shuffled and split into mini-batches for updating the policy.
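A conceptual sketch of that loop follows (illustrative MATLAB pseudocode of the general PPO scheme, not the toolbox internals; collectExperiences and updatePolicy are hypothetical helpers):

% Conceptual epoch/mini-batch loop for PPO (illustrative only)
numEpoch      = 3;
miniBatchSize = 64;
experiences = collectExperiences(env, policy, experienceHorizon); % hypothetical helper
N = numel(experiences);
for epoch = 1:numEpoch
    idx = randperm(N);                                % reshuffle each epoch
    for k = 1:miniBatchSize:N
        batch = experiences(idx(k:min(k+miniBatchSize-1, N)));
        policy = updatePolicy(policy, batch);         % hypothetical clipped-surrogate update
    end
end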
You can refer to the following documentation for more information:
Hope this helps!
