Q-Value function critic representation for reinforcement learning agents

This object implements a Q-value function approximator to be used as a critic within a reinforcement learning agent. A Q-value function maps an observation-action pair to a scalar value representing the expected cumulative long-term reward that the agent accumulates when it starts from the given observation and executes the given action. Q-value function critics therefore need both observations and actions as inputs. After you create an `rlQValueRepresentation` critic, use it to create an agent that relies on a Q-value function critic, such as an `rlQAgent`, `rlDQNAgent`, `rlSARSAAgent`, `rlDDPGAgent`, or `rlTD3Agent`. For more information on creating representations, see Create Policy and Value Function Representations.

`critic = rlQValueRepresentation(net,observationInfo,actionInfo,'Observation',obsName,'Action',actName)` creates the Q-value function critic `critic`. `net` is the deep neural network used as an approximator, and must have both observations and actions as inputs, and a single scalar output. This syntax sets the ObservationInfo and ActionInfo properties of `critic` respectively to the inputs `observationInfo` and `actionInfo`, containing the observation and action specifications. `obsName` must contain the names of the input layers of `net` that are associated with the observation specifications. The action name `actName` must be the name of the input layer of `net` that is associated with the action specifications.
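As a sketch of this syntax (the architecture, layer names `myobs` and `myact`, and the 4-dimensional observation and 2-dimensional action spaces below are illustrative assumptions, not prescribed by this page), a single-output critic with separate observation and action input paths might look like:

```matlab
% Illustrative specifications: 4-D observation, 2-D action
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([2 1]);

% Observation and action paths, merged into a common path
% that ends in a single scalar output
obsPath = [featureInputLayer(4,'Normalization','none','Name','myobs')
           fullyConnectedLayer(16,'Name','obsFC')];
actPath = [featureInputLayer(2,'Normalization','none','Name','myact')
           fullyConnectedLayer(16,'Name','actFC')];
comPath = [additionLayer(2,'Name','add')
           reluLayer('Name','relu')
           fullyConnectedLayer(1,'Name','QValue')];

net = layerGraph(obsPath);
net = addLayers(net,actPath);
net = addLayers(net,comPath);
net = connectLayers(net,'obsFC','add/in1');
net = connectLayers(net,'actFC','add/in2');

critic = rlQValueRepresentation(net,obsInfo,actInfo, ...
    'Observation',{'myobs'},'Action',{'myact'});
```

You can then query the critic with `getValue(critic,{rand(4,1)},{rand(2,1)})`, which returns the estimated Q-value for that observation-action pair.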

`critic = rlQValueRepresentation(tab,observationInfo,actionInfo)` creates the Q-value function based critic `critic` with *discrete action and observation spaces* from the Q-value table `tab`. `tab` is an `rlTable` object containing a table with as many rows as the possible observations and as many columns as the possible actions. This syntax sets the ObservationInfo and ActionInfo properties of `critic` respectively to the inputs `observationInfo` and `actionInfo`, which must be `rlFiniteSetSpec` objects containing the specifications for the discrete observation and action spaces, respectively.
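A minimal sketch of the table-based syntax, assuming a small discrete problem with four observations and two actions (illustrative values, not from this page):

```matlab
% Discrete spaces: 4 possible observations, 2 possible actions
obsInfo = rlFiniteSetSpec([1 2 3 4]);
actInfo = rlFiniteSetSpec([1 2]);

% rlTable creates a 4-by-2 Q-value table sized by the specifications
tab = rlTable(obsInfo,actInfo);
tab.Table = rand(4,2);   % optionally initialize the Q-values

critic = rlQValueRepresentation(tab,obsInfo,actInfo);
```

Each entry `tab.Table(i,j)` holds the Q-value of taking the j-th action from the i-th observation.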

`critic = rlQValueRepresentation({basisFcn,W0},observationInfo,actionInfo)` creates a Q-value function based critic `critic` using a custom basis function as the underlying approximator. The first input argument is a two-element cell array in which the first element contains the handle `basisFcn` to a custom basis function, and the second element contains the initial weight vector `W0`. Here the basis function must have both observations and actions as inputs, and `W0` must be a column vector. This syntax sets the ObservationInfo and ActionInfo properties of `critic` respectively to the inputs `observationInfo` and `actionInfo`.
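As a hedged sketch of the single-output basis-function syntax (the quadratic basis and the dimensions below are illustrative assumptions), the critic computes the Q-value as the inner product of the weight vector with the basis output:

```matlab
% Illustrative specifications: 4-D observation, 2-D action
obsInfo = rlNumericSpec([4 1]);
actInfo = rlNumericSpec([2 1]);

% Quadratic-in-features basis taking both observation and action;
% output has 4 + 2 + 4 + 2 = 12 elements
basisFcn = @(obs,act) [obs; act; obs.^2; act.^2];
W0 = rand(12,1);   % one weight per basis element (column vector)

critic = rlQValueRepresentation({basisFcn,W0},obsInfo,actInfo);
```

The estimated value for a pair `(obs,act)` is then `W'*basisFcn(obs,act)`, where `W` is the current weight vector.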

`critic = rlQValueRepresentation(net,observationInfo,actionInfo,'Observation',obsName)` creates the *multi-output* Q-value function critic `critic` *for a discrete action space*. `net` is the deep neural network used as an approximator, and must have only the observations as input and a single output layer with as many elements as the number of possible discrete actions. This syntax sets the ObservationInfo and ActionInfo properties of `critic` respectively to the inputs `observationInfo` and `actionInfo`, containing the observation and action specifications. Here, `actionInfo` must be an `rlFiniteSetSpec` object containing the specifications for the discrete action space. The observation names `obsName` must be the names of the input layers of `net`.
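A minimal sketch of the multi-output network syntax, assuming a 4-dimensional observation and three possible actions (illustrative dimensions and layer names):

```matlab
% Illustrative specifications: 4-D observation, 3 discrete actions
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([-1 0 1]);

% Network taking only the observation; the output layer has one
% element per possible action
net = [featureInputLayer(4,'Normalization','none','Name','myobs')
       fullyConnectedLayer(16,'Name','fc1')
       reluLayer('Name','relu1')
       fullyConnectedLayer(3,'Name','QValues')];

critic = rlQValueRepresentation(net,obsInfo,actInfo, ...
    'Observation',{'myobs'});
```

Calling `getValue(critic,{rand(4,1)})` on such a critic returns one Q-value estimate per possible action, and `getMaxQValue` returns the largest of them.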

`critic = rlQValueRepresentation({basisFcn,W0},observationInfo,actionInfo)` creates the *multi-output* Q-value function critic `critic` *for a discrete action space* using a custom basis function as the underlying approximator. The first input argument is a two-element cell array in which the first element contains the handle `basisFcn` to a custom basis function, and the second element contains the initial weight matrix `W0`. Here the basis function must have only the observations as inputs, and `W0` must have as many columns as the number of possible actions. This syntax sets the ObservationInfo and ActionInfo properties of `critic` respectively to the inputs `observationInfo` and `actionInfo`.
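As a hedged sketch of the multi-output basis-function syntax (basis and dimensions below are illustrative assumptions), each column of the weight matrix produces the Q-value of one action:

```matlab
% Illustrative specifications: 4-D observation, 3 discrete actions
obsInfo = rlNumericSpec([4 1]);
actInfo = rlFiniteSetSpec([-1 0 1]);

% Observation-only basis with 4 + 4 = 8 elements
basisFcn = @(obs) [obs; obs.^2];
W0 = rand(8,3);   % one column of weights per possible action

critic = rlQValueRepresentation({basisFcn,W0},obsInfo,actInfo);
```

The vector of estimated Q-values for an observation `obs` is then `W'*basisFcn(obs)`, with the i-th element corresponding to the i-th possible action.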

`critic = rlQValueRepresentation(___,options)` creates the Q-value function based critic `critic` using the additional option set `options`, which is an `rlRepresentationOptions` object. This syntax sets the Options property of `critic` to the `options` input argument. You can use this syntax with any of the previous input-argument combinations.
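For example, appending an option set to the table-based syntax might look like the following sketch (the learning rate and gradient threshold values are illustrative, not recommendations):

```matlab
% Any of the previous syntaxes accepts a trailing option set;
% here, a table-based critic with custom optimizer options
obsInfo = rlFiniteSetSpec([1 2 3 4]);
actInfo = rlFiniteSetSpec([1 2]);
tab = rlTable(obsInfo,actInfo);

opts = rlRepresentationOptions('LearnRate',1e-3,'GradientThreshold',1);
critic = rlQValueRepresentation(tab,obsInfo,actInfo,opts);
```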

| Object or Function | Description |
| --- | --- |
| `rlDDPGAgent` | Deep deterministic policy gradient reinforcement learning agent |
| `rlTD3Agent` | Twin-delayed deep deterministic policy gradient reinforcement learning agent |
| `rlDQNAgent` | Deep Q-network reinforcement learning agent |
| `rlQAgent` | Q-learning reinforcement learning agent |
| `rlSARSAAgent` | SARSA reinforcement learning agent |
| `rlSACAgent` | Soft actor-critic reinforcement learning agent |
| `getValue` | Obtain estimated value function representation |
| `getMaxQValue` | Obtain maximum state-value function estimate for Q-value function representation with discrete action space |