Deploy Trained Reinforcement Learning Policies
Once you train a reinforcement learning agent, you can generate code to deploy the optimal policy. You can generate:
- CUDA® code for policies relying on deep neural networks, using GPU Coder™ 
- C/C++ code for policies relying on tables, deep neural networks, or linear basis functions, using MATLAB® Coder™ 
Code generation is supported for policies that use feedforward neural networks in any of the input paths, provided that all the layers used are supported. Code generation is not supported for policies that contain a recurrent neural network (RNN).
To generate a policy evaluation function that selects an action based on a given observation, use generatePolicyFunction. You can generate code to deploy this policy function using GPU Coder or MATLAB Coder. generatePolicyFunction also creates a data file that stores policy information, such as the trained neural network for the actor. The evaluation function loads this data file to initialize itself the first time it is called.
To generate a Simulink® policy evaluation block that selects an action based on a given observation, use generatePolicyBlock. You can generate code to deploy this policy block using Simulink Coder. generatePolicyBlock also creates a data file that stores policy information, such as the trained neural network for the actor. The generated policy block loads this data file to initialize itself prior to simulation. You can use the block to simulate the policy and to generate code for deployment.
Note
The deployed policy might have an internal state. In this case, the internal state must be consistent when comparing the simulated and deployed policy.
Generate Code Using GPU Coder
If your trained optimal policy uses a deep neural network, you can generate CUDA code for the policy using GPU Coder. For more information on supported GPUs, see GPU Computing Requirements (Parallel Computing Toolbox). Generating CUDA code for deep neural networks requires several prerequisite products, and others are recommended. For more information, see Installing Prerequisite Products (GPU Coder) and Setting Up the Prerequisite Products (GPU Coder).
Not all deep neural network layers support GPU code generation. For a list of supported layers, see Supported Networks, Layers, and Classes (GPU Coder). For more information and examples on GPU code generation, see Deep Learning with GPU Coder (GPU Coder).
Generate CUDA Code for Deep Neural Network Policy
As an example, generate GPU code for the policy gradient agent trained in Train PG Agent to Balance Discrete Cart-Pole System.
Load the trained agent.
load('MATLABCartpolePG.mat','agent')
Create a policy evaluation function for this agent.
generatePolicyFunction(agent)
This command creates the evaluatePolicy.m file, which contains the policy function, and the agentData.mat file, which contains the trained deep neural network actor. In general, for a given observation, the policy function evaluates a probability for each potential action using the actor network. Then, the policy function randomly selects an action based on these probabilities. (For SAC agents, the policy function samples an unbounded action using the mean and the standard deviation obtained from the network, and then applies bounds, scale, and bias to get the correct action.)
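The random selection step described above can be sketched in plain MATLAB. This is only an illustration of sampling from a discrete distribution, not the code that generatePolicyFunction emits. The probability vector and action values below are hypothetical stand-ins for the actor network output and the agent's action set.

```matlab
% Hypothetical action probabilities, standing in for the actor network output.
probs = [0.25; 0.75];
% Hypothetical discrete action set (for example, push left or push right).
actions = [-10; 10];

% Inverse-CDF sampling: draw u uniformly from (0,1) and pick the first
% action whose cumulative probability reaches u.
u = rand;
idx = find(u <= cumsum(probs), 1, 'first');
action = actions(idx);
```

Over many draws, each action is selected with a frequency close to its probability.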
You can generate code for this network using GPU Coder. For example, you can generate a CUDA compatible MEX function.
Configure the codegen function to create a CUDA compatible C++ MEX function.
cfg = coder.gpuConfig('mex');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
Set an example input value for the policy evaluation function. To find the observation dimension, use the getObservationInfo function. In this case, the observations are in a four-element vector.
argstr = '{ones(4,1)}';
Generate code using the codegen function.
codegen('-config','cfg','evaluatePolicy','-args',argstr,'-report');
This command generates the MEX function evaluatePolicy_mex.
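As a quick check, you might confirm the observation dimension and then call the generated MEX function. The following is a minimal sketch, assuming that agent is still in the workspace, that evaluatePolicy_mex was built successfully, and that a supported GPU is available. The variable names obsInfo, obsDim, and observation are illustrative.

```matlab
% Query the observation specification of the trained agent.
obsInfo = getObservationInfo(agent);
obsDim = obsInfo.Dimension;     % assumed to be [4 1] for this cart-pole agent

% Build an observation of the right size and evaluate the compiled policy.
observation = ones(obsDim);
action = evaluatePolicy_mex(observation);
```

Because this policy selects actions randomly, repeated calls with the same observation can return different actions.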
Generate Code Using MATLAB Coder
You can generate C/C++ code for table, deep neural network, or linear basis function policies using MATLAB Coder.
Using MATLAB Coder, you can generate:
- C/C++ code for policies that use Q tables, value tables, or linear basis functions. For more information on general C/C++ code generation, see Generating Code (MATLAB Coder). 
- C++ code for policies that use deep neural networks. Note that code generation is not supported for continuous-action PG, AC, PPO, and SAC agents that use a recurrent neural network (RNN). For a list of supported layers, see Networks and Layers Supported for Code Generation (MATLAB Coder). For more information, see Prerequisites for Deep Learning with MATLAB Coder (MATLAB Coder) and Deep Learning with MATLAB Coder (MATLAB Coder). 
Generate C Code for Deep Neural Network Policy Without Using Any Third-Party Library
As an example, generate C code without dependencies on third-party libraries for the policy gradient agent trained in Train PG Agent to Balance Discrete Cart-Pole System.
Load the trained agent.
load('MATLABCartpolePG.mat','agent')
Create a policy evaluation function for this agent.
generatePolicyFunction(agent)
This command creates the evaluatePolicy.m file, which contains the policy function, and the agentData.mat file, which contains the trained deep neural network actor. For a given observation, the policy function evaluates a probability for each potential action using the actor network. Then, the policy function randomly selects an action based on these probabilities. (For SAC agents, the policy function samples an unbounded action using the mean and the standard deviation obtained from the network, and then applies bounds, scale, and bias to get the correct action.)
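For the continuous-action SAC case mentioned above, the sample, bound, scale, and bias sequence can be sketched as follows. This is a hedged illustration, not the generated policy code. The mean, standard deviation, and action limits are hypothetical values, and using tanh as the bounding function is an assumption.

```matlab
% Hypothetical actor outputs for a one-dimensional continuous action.
actionMean = 0.2;
actionStd  = 0.5;
% Hypothetical action limits.
lowerLimit = -2;
upperLimit = 6;

% Sample an unbounded action from the Gaussian defined by the actor outputs.
unbounded = actionMean + actionStd*randn;
% Bound the sample to the interval [-1,1] (tanh is an assumed choice).
bounded = tanh(unbounded);
% Scale and bias map [-1,1] onto [lowerLimit,upperLimit].
scale = (upperLimit - lowerLimit)/2;
bias  = (upperLimit + lowerLimit)/2;
action = scale*bounded + bias;
```

With these values, scale is 4 and bias is 2, so the resulting action always lies between -2 and 6.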
Configure the codegen function to generate code suitable for building a MEX file.
cfg = coder.config('mex');
On the configuration object, set the target language to C, and set DeepLearningConfig to 'none'. This option generates code without using any third-party library.
cfg.TargetLang = 'C';
cfg.DeepLearningConfig = coder.DeepLearningConfig('none');
Set an example input value for the policy evaluation function. To find the observation dimension, use the getObservationInfo function. In this case, the observations are in a four-element vector.
argstr = '{ones(4,1)}';
Generate code using the codegen function.
codegen('-config','cfg','evaluatePolicy','-args',argstr,'-report');
This command generates the C code for the policy gradient agent containing the deep neural network actor.
Generate C++ Code for Deep Neural Network Policy Using Third-Party Libraries
As an example, generate C++ code for the policy gradient agent trained in Train PG Agent to Balance Discrete Cart-Pole System using the Intel Math Kernel Library for Deep Neural Networks (MKL-DNN).
Load the trained agent.
load('MATLABCartpolePG.mat','agent')
Create a policy evaluation function for this agent.
generatePolicyFunction(agent)
This command creates the evaluatePolicy.m file, which contains the policy function, and the agentData.mat file, which contains the trained deep neural network actor. For a given observation, the policy function evaluates a probability for each potential action using the actor network. Then, the policy function randomly selects an action based on these probabilities.
Configure the codegen function to generate code suitable for building a MEX file.
cfg = coder.config('mex');
On the configuration object, set the target language to C++, and set DeepLearningConfig to the target library 'mkldnn'. This option generates code using the Intel Math Kernel Library for Deep Neural Networks (Intel MKL-DNN).
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('mkldnn');
Set an example input value for the policy evaluation function. To find the observation dimension, use the getObservationInfo function. In this case, the observations are in a four-element vector.
argstr = '{ones(4,1)}';
Generate code using the codegen function.
codegen('-config','cfg','evaluatePolicy','-args',argstr,'-report');
This command generates the C++ code for the policy gradient agent containing the deep neural network actor.
Generate C Code for Q Table Policy
As an example, generate C code for the Q-learning agent trained in Train Reinforcement Learning Agent in Basic Grid World.
Load the trained agent.
load('basicGWQAgent.mat','qAgent')
Create a policy evaluation function for this agent.
generatePolicyFunction(qAgent)
This command creates the evaluatePolicy.m file, which contains the policy function, and the agentData.mat file, which contains the trained Q table value function. For a given observation, the policy function looks up the value function for each potential action using the Q table. Then, the policy function selects the action for which the value function is greatest.
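The table lookup and greedy selection described above can be sketched in plain MATLAB. This is an illustration only, not the contents of the generated evaluatePolicy.m file. The Q table, its row and column layout, and the action values are hypothetical.

```matlab
% Hypothetical Q table: rows index states, columns index actions.
Q = [0.1 0.5 0.2;
     0.7 0.3 0.0];
% Hypothetical discrete action values, one per column of Q.
actions = [1 2 3];

observation = 2;                      % index of the current state
[~, idx] = max(Q(observation, :));    % column with the greatest Q value
action = actions(idx);                % action is 1 for this table and state
```

Unlike the stochastic policies in the previous examples, this policy is deterministic: the same observation always yields the same action.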
Set an example input value for the policy evaluation function. To find the observation dimension, use the getObservationInfo function. In this case, there is a single one-dimensional observation (belonging to a discrete set of possible values).
argstr = '{[1]}';
Configure the codegen function to generate embeddable C code suitable for targeting a static library, and set the output folder to buildFolder.
cfg = coder.config('lib');
outFolder = 'buildFolder';
Generate C code using the codegen function.
codegen('-c','-d',outFolder,'-config','cfg',...
    'evaluatePolicy','-args',argstr,'-report');
Deploy Trained Reinforcement Learning Policy as Microservice Docker Image
To deploy a trained reinforcement learning policy as a microservice Docker® image, follow these three steps.
- Package a MATLAB function that evaluates a reinforcement learning policy into a deployable archive. 
- Create a Docker image that contains the archive and a minimal MATLAB Runtime package. 
- Run the image in Docker and make calls to the service using any of the MATLAB Production Server™ client APIs. 
For an example on how to do this, see Deploy Trained Reinforcement Learning Policy as Microservice Docker Image (MATLAB Compiler SDK).