Definition of transitions and rewards in an MDP environment for a reinforcement learning problem

Question

Bianca Grieco on 8 Jan 2024


https://fr.mathworks.com/matlabcentral/answers/2067791-definition-of-transitions-and-rewards-in-a-mdp-environment-for-a-reinforcement-learning-problem

Answered: arushi on 18 Jan 2024

Good morning,

I have an environment with 50 states and 5 actions. From state 1, I transition to states 2 to 25 with action 1; from state 1, I transition to states 26 to 49 with action 2; from states 2 to 25, I transition to state 50 with action 3; from states 26 to 49, I transition to state 50 with action 4. I define the matrices MDP.T as follows:

MDP.T(1,2:25,1) = 1/24;

MDP.T(1,26:49,2) = 1/24;

MDP.T(2:25,50,3) = 1;

MDP.T(26:49,50,4) = 1;

MDP.T(50,50,5) = 0;

The code works only if I also define these other transitions, which are not feasible in my example:

MDP.T(1,1,3) = 1;

MDP.T(1,1,4) = 1;

MDP.T(2:25,2:25,1) = 1/24;

MDP.T(2:25,2:25,2) = 1/24;

MDP.T(2:25,2:25,4) = 1/24;

MDP.T(2:25,2:25,5) = 1/24;

MDP.T(26:49,26:49,1) = 1/24;

MDP.T(26:49,26:49,2) = 1/24;

MDP.T(26:49,26:49,3) = 1/24;

MDP.T(26:49,26:49,5) = 1/24;

Could you help me understand how to define only the feasible transitions?

Thank you in advance

3 comments (1 older comment not shown)

Bianca Grieco on 12 Jan 2024

Dear Sir, I am implementing a reinforcement learning method in MATLAB for my thesis. Further to my previous message, could you tell me the maximum dimensions of each MDP.T matrix and the maximum number of MDP.T matrices (and therefore actions) that the MATLAB RL toolbox can handle? Thank you in advance for your response.

Bianca Grieco on 14 Jan 2024

Dear Mr. Tzorakoleftherakis, have you had time to look into my question about the maximum dimensions of the transition matrix? Thank you in advance.


Answer 1

arushi on 18 Jan 2024


https://fr.mathworks.com/matlabcentral/answers/2067791-definition-of-transitions-and-rewards-in-a-mdp-environment-for-a-reinforcement-learning-problem#answer_1392011

Hi Bianca,

As I understand it, you want to define a transition matrix for an MDP with 50 states and 5 actions, where only certain transitions are possible given the current state and the action taken, and the code fails unless you also define the other transitions. Many MDP implementations, including some toolboxes and libraries, require transitions to be defined for every state-action pair, even when an action is not applicable in a given state. In terms of your matrix, the probabilities T(s,:,a) should sum to 1 for each state s and action a, and a row left entirely at zero is not a valid probability distribution. The requirement avoids undefined behavior in the MDP solver and satisfies the checks of the specific MDP framework being used.
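To see which state-action pairs are affected, a minimal sketch along these lines may help. It uses a plain array `T` in place of `MDP.T` so that it runs without any toolbox; the variable names `nS`, `nA`, `rowSums` and `undefinedPairs` are illustrative and not part of the original question.

```matlab
% Sketch: list the (state, action) pairs whose outgoing probabilities
% do not sum to 1 when only the feasible transitions are defined.
nS = 50;                         % number of states (from the question)
nA = 5;                          % number of actions (from the question)
T  = zeros(nS, nS, nA);          % T(s, s', a) = P(s' | s, a), stand-in for MDP.T

% Feasible transitions, copied from the question
T(1,     2:25,  1) = 1/24;
T(1,     26:49, 2) = 1/24;
T(2:25,  50,    3) = 1;
T(26:49, 50,    4) = 1;

% Sum over next states: entry (s, a) is the total probability leaving s under a
rowSums = squeeze(sum(T, 2));    % nS-by-nA

% A small tolerance absorbs floating-point error in sums such as 24 * (1/24)
undefinedPairs = abs(rowSums - 1) > 1e-12;

[sIdx, aIdx] = find(undefinedPairs);
fprintf('%d of %d state-action pairs have no valid distribution.\n', ...
        numel(sIdx), nS * nA);
```

The pairs returned in `sIdx` and `aIdx` are the ones that still need some transition to be assigned before the matrix describes a complete MDP.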

To handle this in your code, assign a self-transition (a transition back to the same state) with probability 1 to every action that is not applicable in a given state. Taking an infeasible action then leaves the agent in the same state with certainty.
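The self-transition approach described above could be sketched as follows. This is only an illustration under stated assumptions: it builds a plain array `T` rather than the toolbox object, the loop that fills in self-loops is my own suggestion rather than a toolbox feature, and the commented lines at the end assume the usual `createMDP` / `rlMDPEnv` workflow and that state 50 is meant to be terminal.

```matlab
% Sketch: define only the feasible transitions, then give every remaining
% (state, action) pair a self-loop with probability 1.
nS = 50;
nA = 5;
T  = zeros(nS, nS, nA);          % T(s, s', a) = P(s' | s, a)

% Feasible transitions, copied from the question
T(1,     2:25,  1) = 1/24;
T(1,     26:49, 2) = 1/24;
T(2:25,  50,    3) = 1;
T(26:49, 50,    4) = 1;

% Any row that is still all zeros belongs to an infeasible action:
% keep the agent in the same state with certainty.
for a = 1:nA
    for s = 1:nS
        if sum(T(s, :, a)) == 0
            T(s, s, a) = 1;
        end
    end
end

% Sanity check: every row is now a probability distribution
rowSums = squeeze(sum(T, 2));
assert(all(abs(rowSums(:) - 1) < 1e-12));

% Assumed usage with Reinforcement Learning Toolbox (not run here):
% MDP = createMDP(nS, nA);
% MDP.T = T;
% MDP.TerminalStates = "s50";   % assumption: state 50 ends the episode
% env = rlMDPEnv(MDP);
```

Compared with the uniform 1/24 blocks in the question, a self-loop keeps the infeasible actions from moving the agent between states it could not otherwise reach.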

Hope this helps.
