How do I incorporate rewards into a Deep Q-Network?

Zonghao zou on 19 Sep 2020
I have read through most of the current documentation on the Deep Q-Network in MATLAB, but it is still not clear to me how to construct a Deep Q-Network in my case.
I previously wrote my own code implementing simple Q-learning, for which I constructed a Q-matrix with corresponding states and actions. I am now trying to explore how to do the same with a Deep Q-Network.
The overall goal is to work out the best policy for an object to move from location A to location B (assuming it is in 2-D).
I have a specific function that encodes all the necessary physical relationships and returns the corresponding reward given the current state and action (let's say it is called the function F).
I see in the documentation (https://www.mathworks.com/help/reinforcement-learning/ref/rldqnagent.html#d122e15363) that to create an agent I must create an observation set and an action set.
In my case, since I can return the specific reward per action given the current state, what should I put down as my observation? (How should I incorporate my function F into the agent?)
Also, in the documentation, I don't see anywhere that it takes rewards or calculates rewards for certain actions.
Could someone help me, please?
Thanks

Accepted Answer

Emmanouil Tzorakoleftherakis
Hello,
If you have a look at this page, it shows where the reward is incorporated in a custom MATLAB environment. As you can see, the reward is included in the 'step' method, which plays the same role as your F function, so you do not have to do anything different than what you are doing already - you just need to create an environment object.
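For example, here is a minimal sketch of such an environment built with rlFunctionEnv. Everything specific in it is an assumption, not from this thread: the 4-action set, the start at [0; 0], the goal at [10; 10], the stopping tolerance, and the premise that F returns both the next state and the reward.
obsInfo = rlNumericSpec([2 1]);        % observation: the object's [x; y] position
actInfo = rlFiniteSetSpec(1:4);        % action: one of 4 discrete choices (assumed)

env = rlFunctionEnv(obsInfo, actInfo, @myStep, @myReset);

function [nextObs, reward, isDone, logged] = myStep(action, logged)
    % The reward (and, here, the transition too) comes from your physics function F
    [nextState, reward] = F(logged.State, action);   % F is a placeholder for your own function
    logged.State = nextState;
    nextObs = nextState;
    isDone = norm(nextState - [10; 10]) < 0.1;       % episode ends near location B (assumed goal)
end

function [initialObs, logged] = myReset()
    logged.State = [0; 0];                           % location A (assumed to be the origin)
    initialObs = logged.State;
end
Once the environment is created this way, the agent learns from the rewards the step function returns at each step, so no extra wiring is needed on the agent side.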

More Answers (3)

Madhav Thakker on 23 Sep 2020
Edited: Madhav Thakker on 23 Sep 2020
Hi Zonghao,
I understand you want to construct a Deep Q-Network. The observationInfo tells you the behaviour for your observations. In your case, you want to move an object on a grid. The observations can be the position of the object on the grid. So, your observationInfo will be rlFiniteSetSpec.
obsInfo = rlNumericSpec([2 1])
This creates an observation of dimension [2 1]. If required, you can also specify upper and lower limits for your observations.
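For instance, a bounded position observation might look like this (the 10-by-10 workspace bounds are purely an illustrative assumption):
obsInfo = rlNumericSpec([2 1], 'LowerLimit', [0; 0], 'UpperLimit', [10; 10]);   % bounded [x; y] position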
Hope this helps.
  1 comment
Zonghao zou on 24 Sep 2020
Hi Madhav,
The thing is, I don't want to specify the type of movement I will get by choosing one action. For example, in the grid situation, when I choose right, I move right: one specific action determines a specific move. However, in my case, I have no idea where the object will move when taking one out of all possible actions.
Therefore, all results come from my physics governing equation. I have attached a graph that explains what I want to achieve.
Any help will be appreciated! Thank you
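In environment terms, this simply means the transition inside the step method is computed by the governing equation rather than looked up from a fixed grid move. A hypothetical sketch, where physicsModel and the time step dt stand in for the actual dynamics:
dt = 0.01;                                 % assumed integration time step
velocity = physicsModel(state, action);    % placeholder for the governing equation
nextState = state + dt * velocity;         % forward-Euler update of the object's position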



Sabiya Hussain on 29 Aug 2022
Hello there! I'm working on a project based on Q-learning, and I really need some help with a MATLAB program for a Markov decision process. It is an example of the recycling robot. I need your help.

