freezing layers of actor and critic of RL agent
After training, I froze every layer of the actor and critic networks of my RL agent (using setLearnRateFactor(neuralnet,'layers','parameters',0)) and then retrained the agent in the same environment. I am getting rewards like those shown in the attached image file.
My question is: is it normal to get rewards like this? (I mean, shouldn't there be no variation, or very little variation, in the rewards?)
My reward function is 10 - e^2, where e is the error.
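For reference, a minimal sketch of one way to zero the learn rate factors of every learnable parameter in the actor and critic is shown below. It assumes the agent variable is named agent and that its actor and critic were built from dlnetwork objects; the helper name freezeNet is made up for this example, and the exact calls may differ depending on your agent type and release.

actor  = getActor(agent);             % actor function approximator
critic = getCritic(agent);            % critic function approximator
actorNet  = getModel(actor);          % underlying dlnetwork
criticNet = getModel(critic);
actorNet  = freezeNet(actorNet);      % zero every learn rate factor
criticNet = freezeNet(criticNet);
actor  = setModel(actor, actorNet);   % put the frozen networks back
critic = setModel(critic, criticNet);
agent  = setActor(agent, actor);
agent  = setCritic(agent, critic);

function net = freezeNet(net)
    % Loop over the Learnables table and set each factor to 0
    lrn = net.Learnables;
    for i = 1:height(lrn)
        net = setLearnRateFactor(net, lrn.Layer(i), lrn.Parameter(i), 0);
    end
end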
Answers (1)
Karanjot on 30 Jan 2024
Edited: Karanjot on 30 Jan 2024
Observing fluctuations in rewards is common when retraining a reinforcement learning (RL) agent, even when the parameters of both the actor and critic networks have been frozen. The agent continues to explore the environment during training, and the reward function you specified strongly shapes the rewards it receives.
The variation in rewards can be influenced by several factors, such as the exploration-exploitation trade-off, the complexity of the environment, and the learning rate of the agent. It is possible that the agent is still trying to optimize its policy and may encounter different states or actions that result in varying rewards.
The environment's inherent stochasticity can lead to different state transitions and rewards for similar actions. Additionally, any remaining unfrozen parameters or the noise process used for action selection could contribute to the observed variation.
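One quick way to test whether exploration noise explains the spread is to simulate the agent with exploration turned on and off and compare the episode rewards. The sketch below is only illustrative: it assumes variables named agent and env, and that your agent type exposes the UseExplorationPolicy property (available for several agent types in recent releases); adjust it to your setup.

simOpts = rlSimulationOptions('MaxSteps',500,'NumSimulations',10);

agent.UseExplorationPolicy = true;      % noisy / stochastic actions
expNoisy = sim(env, agent, simOpts);
rNoisy   = arrayfun(@(e) sum(e.Reward.Data), expNoisy);

agent.UseExplorationPolicy = false;     % greedy / deterministic actions
expGreedy = sim(env, agent, simOpts);
rGreedy   = arrayfun(@(e) sum(e.Reward.Data), expGreedy);

fprintf('Reward std with exploration: %.3f, without: %.3f\n', ...
        std(rNoisy), std(rGreedy));

If the spread collapses when exploration is off, the variation comes mainly from the exploration policy rather than from the frozen weights.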
You may consider the following steps:
- Plot the rewards over time during retraining to see whether they are converging or simply fluctuating around a stable mean (a sketch is shown after this list).
- Experiment with different learning rates for the agent. A higher learning rate may lead to faster convergence but can also cause more variation early in training.
- Try modifying the reward function to see whether that reduces the variation in rewards.
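A minimal sketch of the reward plot from the first suggestion, assuming you kept the statistics returned by train (for example trainingStats = train(agent,env,trainOpts)):

figure
plot(trainingStats.EpisodeIndex, trainingStats.EpisodeReward, '.-')            % per-episode reward
hold on
plot(trainingStats.EpisodeIndex, trainingStats.AverageReward, 'LineWidth', 2)  % running average
hold off
xlabel('Episode')
ylabel('Reward')
legend('Episode reward', 'Average reward', 'Location', 'best')

If you want to experiment with the learning rate, in recent releases it is typically set through the optimizer options, for example agent.AgentOptions.CriticOptimizerOptions.LearnRate, though the exact property path depends on the agent type.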
Keep in mind that RL training is inherently iterative, and reaching a good policy often takes many episodes. Some degree of reward variation is to be expected, but large or persistent variation may indicate a need for further investigation or adjustments to your training setup.