Define Custom Deep Learning Layer with Multiple Inputs

If Deep Learning Toolbox™ does not provide the layer you require for your classification or regression problem, then you can define your own custom layer using this example as a guide. For a list of built-in layers, see List of Deep Learning Layers.

To define a custom deep learning layer, you can use the template provided in this example, which takes you through the following steps:

1. Name the layer — Give the layer a name so that you can use it in MATLAB®.

2. Declare the layer properties — Specify the properties of the layer, including learnable parameters and state parameters.

3. Create a constructor function (optional) — Specify how to construct the layer and initialize its properties. If you do not specify a constructor function, then at creation, the software initializes the `Name`, `Description`, and `Type` properties with `[]` and sets the number of layer inputs and outputs to 1.

4. Create forward functions — Specify how data passes forward through the layer (forward propagation) at prediction time and at training time.

5. Create reset state function (optional) — Specify how to reset state parameters.

6. Create a backward function (optional) — Specify the derivatives of the loss with respect to the input data and the learnable parameters (backward propagation). If you do not specify a backward function, then the forward functions must support `dlarray` objects.

This example shows how to create a weighted addition layer, which is a layer with multiple inputs and a learnable parameter, and use it in a convolutional neural network. A weighted addition layer scales and adds inputs from multiple neural network layers element-wise.

Intermediate Layer Template

Copy the intermediate layer template into a new file in MATLAB. This template gives the structure of an intermediate layer class definition. It outlines:

• The optional `properties` blocks for the layer properties, learnable parameters, and state parameters.

• The layer constructor function.

• The `predict` function and the optional `forward` function.

• The optional `resetState` function for layers with state properties.

• The optional `backward` function.

```
classdef myLayer < nnet.layer.Layer % ...
        % & nnet.layer.Formattable ... % (Optional)
        % & nnet.layer.Acceleratable % (Optional)

    properties
        % (Optional) Layer properties.

        % Declare layer properties here.
    end

    properties (Learnable)
        % (Optional) Layer learnable parameters.

        % Declare learnable parameters here.
    end

    properties (State)
        % (Optional) Layer state parameters.

        % Declare state parameters here.
    end

    properties (Learnable, State)
        % (Optional) Nested dlnetwork objects with both learnable
        % parameters and state parameters.

        % Declare nested networks with learnable and state parameters here.
    end

    methods
        function layer = myLayer()
            % (Optional) Create a myLayer.
            % This function must have the same name as the class.

            % Define layer constructor function here.
        end

        function [Z,state] = predict(layer,X)
            % Forward input data through the layer at prediction time and
            % output the result and updated state.
            %
            % Inputs:
            %         layer - Layer to forward propagate through
            %         X     - Input data
            % Outputs:
            %         Z     - Output of layer forward function
            %         state - (Optional) Updated layer state
            %
            %  - For layers with multiple inputs, replace X with X1,...,XN,
            %    where N is the number of inputs.
            %  - For layers with multiple outputs, replace Z with
            %    Z1,...,ZM, where M is the number of outputs.
            %  - For layers with multiple state parameters, replace state
            %    with state1,...,stateK, where K is the number of state
            %    parameters.

            % Define layer predict function here.
        end

        function [Z,state,memory] = forward(layer,X)
            % (Optional) Forward input data through the layer at training
            % time and output the result, the updated state, and a memory
            % value.
            %
            % Inputs:
            %         layer  - Layer to forward propagate through
            %         X      - Layer input data
            % Outputs:
            %         Z      - Output of layer forward function
            %         state  - (Optional) Updated layer state
            %         memory - (Optional) Memory value for custom backward
            %                  function
            %
            %  - For layers with multiple inputs, replace X with X1,...,XN,
            %    where N is the number of inputs.
            %  - For layers with multiple outputs, replace Z with
            %    Z1,...,ZM, where M is the number of outputs.
            %  - For layers with multiple state parameters, replace state
            %    with state1,...,stateK, where K is the number of state
            %    parameters.

            % Define layer forward function here.
        end

        function layer = resetState(layer)
            % (Optional) Reset layer state.

            % Define reset state function here.
        end

        function [dLdX,dLdW,dLdSin] = backward(layer,X,Z,dLdZ,dLdSout,memory)
            % (Optional) Backward propagate the derivative of the loss
            % function through the layer.
            %
            % Inputs:
            %         layer   - Layer to backward propagate through
            %         X       - Layer input data
            %         Z       - Layer output data
            %         dLdZ    - Derivative of loss with respect to layer
            %                   output
            %         dLdSout - (Optional) Derivative of loss with respect
            %                   to state output
            %         memory  - Memory value from forward function
            % Outputs:
            %         dLdX   - Derivative of loss with respect to layer input
            %         dLdW   - (Optional) Derivative of loss with respect to
            %                  learnable parameter
            %         dLdSin - (Optional) Derivative of loss with respect to
            %                  state input
            %
            %  - For layers with state parameters, the backward syntax must
            %    include both dLdSout and dLdSin, or neither.
            %  - For layers with multiple inputs, replace X and dLdX with
            %    X1,...,XN and dLdX1,...,dLdXN, respectively, where N is
            %    the number of inputs.
            %  - For layers with multiple outputs, replace Z and dLdZ with
            %    Z1,...,ZM and dLdZ1,...,dLdZM, respectively, where M is the
            %    number of outputs.
            %  - For layers with multiple learnable parameters, replace
            %    dLdW with dLdW1,...,dLdWP, where P is the number of
            %    learnable parameters.
            %  - For layers with multiple state parameters, replace dLdSin
            %    and dLdSout with dLdSin1,...,dLdSinK and
            %    dLdSout1,...,dLdSoutK, respectively, where K is the number
            %    of state parameters.

            % Define layer backward function here.
        end
    end
end
```

Name Layer and Specify Superclasses

First, give the layer a name. In the first line of the class file, replace the existing name `myLayer` with `weightedAdditionLayer`.

```
classdef weightedAdditionLayer < nnet.layer.Layer % ...
        % & nnet.layer.Formattable ... % (Optional)
        % & nnet.layer.Acceleratable % (Optional)
    ...
end
```

If you do not specify a backward function, then the layer functions, by default, receive unformatted `dlarray` objects as input. To specify that the layer receives formatted `dlarray` objects as input and also outputs formatted `dlarray` objects, also inherit from the `nnet.layer.Formattable` class when defining the custom layer.

The layer functions support acceleration, so also inherit from `nnet.layer.Acceleratable`. For more information about accelerating custom layer functions, see Custom Layer Function Acceleration. The layer does not require formattable inputs, so remove the optional `nnet.layer.Formattable` superclass.

```
classdef weightedAdditionLayer < nnet.layer.Layer ...
        & nnet.layer.Acceleratable
    ...
end
```

Next, rename the `myLayer` constructor function (the first function in the `methods` section) so that it has the same name as the layer.

```
    methods
        function layer = weightedAdditionLayer()
            ...
        end

        ...
    end
```

Save the Layer

Save the layer class file in a new file named `weightedAdditionLayer.m`. The file name must match the layer name. To use the layer, you must save the file in the current folder or in a folder on the MATLAB path.

Declare Properties and Learnable Parameters

Declare the layer properties in the `properties` section and declare learnable parameters by listing them in the `properties (Learnable)` section.

By default, custom intermediate layers have these properties. Do not declare these properties in the `properties` section.

| Property | Description |
| --- | --- |
| `Name` | Layer name, specified as a character vector or a string scalar. For `Layer` array input, the `trainNetwork`, `assembleNetwork`, `layerGraph`, and `dlnetwork` functions automatically assign names to layers with name `''`. |
| `Description` | One-line description of the layer, specified as a string scalar or a character vector. This description appears when the layer is displayed in a `Layer` array. If you do not specify a layer description, then the software displays the layer class name. |
| `Type` | Type of the layer, specified as a character vector or a string scalar. The value of `Type` appears when the layer is displayed in a `Layer` array. If you do not specify a layer type, then the software displays the layer class name. |
| `NumInputs` | Number of inputs of the layer, specified as a positive integer. If you do not specify this value, then the software automatically sets `NumInputs` to the number of names in `InputNames`. The default value is 1. |
| `InputNames` | Input names of the layer, specified as a cell array of character vectors. If you do not specify this value and `NumInputs` is greater than 1, then the software automatically sets `InputNames` to `{'in1',...,'inN'}`, where `N` is equal to `NumInputs`. The default value is `{'in'}`. |
| `NumOutputs` | Number of outputs of the layer, specified as a positive integer. If you do not specify this value, then the software automatically sets `NumOutputs` to the number of names in `OutputNames`. The default value is 1. |
| `OutputNames` | Output names of the layer, specified as a cell array of character vectors. If you do not specify this value and `NumOutputs` is greater than 1, then the software automatically sets `OutputNames` to `{'out1',...,'outM'}`, where `M` is equal to `NumOutputs`. The default value is `{'out'}`. |

If the layer has no other properties, then you can omit the `properties` section.

Tip

If you are creating a layer with multiple inputs, then you must set either the `NumInputs` or `InputNames` properties in the layer constructor. If you are creating a layer with multiple outputs, then you must set either the `NumOutputs` or `OutputNames` properties in the layer constructor.
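As a minimal sketch of this tip, the hypothetical class below sets `InputNames` in the constructor instead of `NumInputs`; the software then derives `NumInputs` from the number of names. The class name `namedAdditionLayer` and the input names `'main'` and `'skip'` are illustrative assumptions and are not part of the toolbox or of this example. To try it, save the code as `namedAdditionLayer.m`.

```matlab
classdef namedAdditionLayer < nnet.layer.Layer
    % Hypothetical two-input layer used only to illustrate InputNames.

    methods
        function layer = namedAdditionLayer(name)
            % Set layer name.
            layer.Name = name;

            % Naming the inputs also determines the number of inputs.
            layer.InputNames = {'main','skip'};
        end

        function Z = predict(~,X1,X2)
            % Add the two inputs element-wise.
            Z = X1 + X2;
        end
    end
end
```

With input names set this way, a connection to the second input would be addressed as `"name/skip"` rather than `"name/in2"`.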

A weighted addition layer does not require any additional properties, so you can remove the `properties` section.

A weighted addition layer has only one learnable parameter, the weights. Declare this learnable parameter in the `properties (Learnable)` section and call the parameter `Weights`.

```
    properties (Learnable)
        % Layer learnable parameters

        % Scaling coefficients
        Weights
    end
```

Create Constructor Function

Create the function that constructs the layer and initializes the layer properties. Specify any variables required to create the layer as inputs to the constructor function.

The weighted addition layer constructor function requires two inputs: the number of inputs to the layer and the layer name. This number of inputs to the layer specifies the size of the learnable parameter `Weights`. Specify two input arguments named `numInputs` and `name` in the `weightedAdditionLayer` function. Add a comment to the top of the function that explains the syntax of the function.

```
        function layer = weightedAdditionLayer(numInputs,name)
            % layer = weightedAdditionLayer(numInputs,name) creates a
            % weighted addition layer and specifies the number of inputs
            % and the layer name.

            ...
        end
```

Initialize Layer Properties

Initialize the layer properties, including learnable parameters, in the constructor function. Replace the comment `% Define layer constructor function here.` with code that initializes the layer properties.

Set the `NumInputs` property to the input argument `numInputs`.

```
            % Set number of inputs.
            layer.NumInputs = numInputs;
```

Set the `Name` property to the input argument `name`.

```
            % Set layer name.
            layer.Name = name;
```

Give the layer a one-line description by setting the `Description` property of the layer. Set the description to describe the type of layer and its size.

```
            % Set layer description.
            layer.Description = "Weighted addition of " + numInputs + ...
                " inputs";
```

A weighted addition layer multiplies each layer input by the corresponding coefficient in `Weights` and adds the resulting values together. Initialize the learnable parameter `Weights` to be a random vector of size 1-by-`numInputs`. `Weights` is a property of the layer object, so you must assign the vector to `layer.Weights`.

```
            % Initialize layer weights
            layer.Weights = rand(1,numInputs);
```

View the completed constructor function.

```
        function layer = weightedAdditionLayer(numInputs,name)
            % layer = weightedAdditionLayer(numInputs,name) creates a
            % weighted addition layer and specifies the number of inputs
            % and the layer name.

            % Set number of inputs.
            layer.NumInputs = numInputs;

            % Set layer name.
            layer.Name = name;

            % Set layer description.
            layer.Description = "Weighted addition of " + numInputs + ...
                " inputs";

            % Initialize layer weights.
            layer.Weights = rand(1,numInputs);
        end
```

With this constructor function, the command `weightedAdditionLayer(3,'add')` creates a weighted addition layer with three inputs and the name `'add'`.
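The following short sketch inspects the properties that the constructor sets. It assumes that `weightedAdditionLayer.m`, containing the constructor above, is saved in the current folder or on the MATLAB path.

```matlab
% Assumes weightedAdditionLayer.m is in the current folder or on the path.
layer = weightedAdditionLayer(3,'add');

disp(layer.NumInputs)       % number of inputs set in the constructor
disp(layer.InputNames)      % default input names generated from NumInputs
disp(size(layer.Weights))   % one scaling coefficient per input
disp(layer.Description)     % description built from numInputs
```

Because only `NumInputs` is set, the input names take their default values, which matters later when you connect layers to a specific input.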

Create Forward Functions

Create the layer forward functions to use at prediction time and training time.

Create a function named `predict` that propagates the data forward through the layer at prediction time and outputs the result.

The `predict` function syntax depends on the type of layer.

• `Z = predict(layer,X)` forwards the input data `X` through the layer and outputs the result `Z`, where `layer` has a single input and a single output.

• `[Z,state] = predict(layer,X)` also outputs the updated state parameter `state`, where `layer` has a single state parameter.

You can adjust the syntaxes for layers with multiple inputs, multiple outputs, or multiple state parameters:

• For layers with multiple inputs, replace `X` with `X1,...,XN`, where `N` is the number of inputs. The `NumInputs` property must match `N`.

• For layers with multiple outputs, replace `Z` with `Z1,...,ZM`, where `M` is the number of outputs. The `NumOutputs` property must match `M`.

• For layers with multiple state parameters, replace `state` with `state1,...,stateK`, where `K` is the number of state parameters.

Tip

If the number of inputs to the layer can vary, then use `varargin` instead of `X1,…,XN`. In this case, `varargin` is a cell array of the inputs, where `varargin{i}` corresponds to `Xi`.

If the number of outputs can vary, then use `varargout` instead of `Z1,…,ZM`. In this case, `varargout` is a cell array of the outputs, where `varargout{j}` corresponds to `Zj`.

Tip

If the custom layer has a `dlnetwork` object for a learnable parameter, then in the `predict` function of the custom layer, use the `predict` function for the `dlnetwork`. When you do so, the `dlnetwork` object `predict` function uses the appropriate layer operations for prediction.

Because a weighted addition layer has only one output and a variable number of inputs, the syntax for `predict` for a weighted addition layer is ```Z = predict(layer,varargin)```, where `varargin{i}` corresponds to `Xi` for positive integers `i` less than or equal to `NumInputs`.

By default, the layer uses `predict` as the forward function at training time. To use a different forward function at training time, or retain a value required for the backward function, you must also create a function named `forward`.

The dimensions of the inputs depend on the type of data and the output of the connected layers:

| Layer Input | Input Size | Observation Dimension |
| --- | --- | --- |
| Feature vectors | c-by-N, where c corresponds to the number of channels and N is the number of observations | 2 |
| 2-D images | h-by-w-by-c-by-N, where h, w, and c correspond to the height, width, and number of channels of the images, respectively, and N is the number of observations | 4 |
| 3-D images | h-by-w-by-d-by-c-by-N, where h, w, d, and c correspond to the height, width, depth, and number of channels of the 3-D images, respectively, and N is the number of observations | 5 |
| Vector sequences | c-by-N-by-S, where c is the number of features of the sequences, N is the number of observations, and S is the sequence length | 2 |
| 2-D image sequences | h-by-w-by-c-by-N-by-S, where h, w, and c correspond to the height, width, and number of channels of the images, respectively, N is the number of observations, and S is the sequence length | 4 |
| 3-D image sequences | h-by-w-by-d-by-c-by-N-by-S, where h, w, d, and c correspond to the height, width, depth, and number of channels of the 3-D images, respectively, N is the number of observations, and S is the sequence length | 5 |
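To make the observation dimension concrete, the sketch below builds two arrays with the layouts listed in the table and reads the number of observations from the corresponding dimension. The array sizes are arbitrary values chosen for illustration.

```matlab
% 2-D images: h-by-w-by-c-by-N, so observations lie along dimension 4.
XImages = rand(28,28,1,128,'single');
numImageObservations = size(XImages,4);

% Feature vectors: c-by-N, so observations lie along dimension 2.
XFeatures = rand(10,16,'single');
numFeatureObservations = size(XFeatures,2);

disp(numImageObservations)
disp(numFeatureObservations)
```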

For layers that output sequences, the layers can output sequences of any length or output data with no time dimension. Note that when you train a network that outputs sequences using the `trainNetwork` function, the lengths of the input and output sequences must match.

The `forward` function propagates the data forward through the layer at training time and also outputs a memory value.

The `forward` function syntax depends on the type of layer:

• `Z = forward(layer,X)` forwards the input data `X` through the layer and outputs the result `Z`, where `layer` has a single input and a single output.

• `[Z,state] = forward(layer,X)` also outputs the updated state parameter `state`, where `layer` has a single state parameter.

• `[__,memory] = forward(layer,X)` also returns a memory value for a custom `backward` function using any of the previous syntaxes. If the layer has both a custom `forward` function and a custom `backward` function, then the forward function must return a memory value.

You can adjust the syntaxes for layers with multiple inputs, multiple outputs, or multiple state parameters:

• For layers with multiple inputs, replace `X` with `X1,...,XN`, where `N` is the number of inputs. The `NumInputs` property must match `N`.

• For layers with multiple outputs, replace `Z` with `Z1,...,ZM`, where `M` is the number of outputs. The `NumOutputs` property must match `M`.

• For layers with multiple state parameters, replace `state` with `state1,...,stateK`, where `K` is the number of state parameters.

Tip

If the number of inputs to the layer can vary, then use `varargin` instead of `X1,…,XN`. In this case, `varargin` is a cell array of the inputs, where `varargin{i}` corresponds to `Xi`.

If the number of outputs can vary, then use `varargout` instead of `Z1,…,ZM`. In this case, `varargout` is a cell array of the outputs, where `varargout{j}` corresponds to `Zj`.

Tip

If the custom layer has a `dlnetwork` object for a learnable parameter, then in the `forward` function of the custom layer, use the `forward` function of the `dlnetwork` object. When you do so, the `dlnetwork` object `forward` function uses the appropriate layer operations for training.

The forward function of a weighted addition layer is

$$f\left(X^{(1)},\dots,X^{(n)}\right)=\sum_{i=1}^{n}W_{i}X^{(i)}$$

where $X^{(1)},\dots,X^{(n)}$ correspond to the layer inputs and $W_{1},\dots,W_{n}$ are the layer weights.
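As a small worked example of this formula outside of any layer class, the following sketch applies it to three constant arrays. The input values and weights are arbitrary choices for illustration.

```matlab
% Three inputs of the same size, held in a cell array.
X = {ones(2,2), 2*ones(2,2), 3*ones(2,2)};

% One weight per input.
W = [0.5 1 2];

% Accumulate the weighted sum, starting from zeros of matching type.
Z = zeros(size(X{1}),'like',X{1});
for i = 1:numel(X)
    Z = Z + W(i)*X{i};
end

% Each element equals 0.5*1 + 1*2 + 2*3 = 8.5.
disp(Z)
```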

Implement the forward function in `predict`. In `predict`, the output `Z` corresponds to $f\left({X}^{\left(1\right)},\dots ,{X}^{\left(n\right)}\right)$. The weighted addition layer does not require memory or a different forward function for training, so you can remove the `forward` function from the class file. Add a comment to the top of the function that explains the syntaxes of the function.

Tip

If you preallocate arrays using functions such as `zeros`, then you must ensure that the data types of these arrays are consistent with the layer function inputs. To create an array of zeros of the same data type as another array, use the `"like"` option of `zeros`. For example, to initialize an array of zeros of size `sz` with the same data type as the array `X`, use `Z = zeros(sz,"like",X)`.
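The sketch below illustrates the difference. It assumes Deep Learning Toolbox is installed so that `dlarray` is available; the variable names are chosen for illustration only.

```matlab
X = rand(3,4,'single');
sz = size(X);

% Without "like", zeros returns a double array.
Zdefault = zeros(sz);

% With "like", zeros matches the data type of X.
Zlike = zeros(sz,"like",X);

disp(class(Zdefault))
disp(class(Zlike))

% The same option carries over to dlarray inputs.
dlX = dlarray(X);
dlZ = zeros(sz,"like",dlX);
disp(class(dlZ))
```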

```
        function Z = predict(layer, varargin)
            % Z = predict(layer, X1, ..., Xn) forwards the input data X1,
            % ..., Xn through the layer and outputs the result Z.

            X = varargin;
            W = layer.Weights;

            % Initialize output
            X1 = X{1};
            sz = size(X1);
            Z = zeros(sz,'like',X1);

            % Weighted addition
            for i = 1:layer.NumInputs
                Z = Z + W(i)*X{i};
            end
        end
```

Because the `predict` function uses only functions that support `dlarray` objects, defining the `backward` function is optional. For a list of functions that support `dlarray` objects, see List of Functions with dlarray Support.
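For reference, if you did choose to write a custom `backward` function for this layer, the required derivatives follow from the definition of $f$. The expressions below are a sketch derived from that definition, not code or formulas taken from this example, and they assume the loss $L$ is a scalar. When you omit `backward`, automatic differentiation computes the equivalent quantities.

```latex
\frac{\partial L}{\partial X^{(i)}} = W_{i}\,\frac{\partial L}{\partial Z},
\qquad
\frac{\partial L}{\partial W_{i}} = \sum_{j}\left(\frac{\partial L}{\partial Z}\right)_{j} X^{(i)}_{j},
\qquad i = 1,\dots,n
```

Here $j$ runs over all elements of the arrays, so the derivative with respect to each input has the same size as that input, and the derivative with respect to `Weights` is a 1-by-$n$ vector.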

Completed Layer

View the completed layer class file.

```
classdef weightedAdditionLayer < nnet.layer.Layer ...
        & nnet.layer.Acceleratable
    % Example custom weighted addition layer.

    properties (Learnable)
        % Layer learnable parameters

        % Scaling coefficients
        Weights
    end

    methods
        function layer = weightedAdditionLayer(numInputs,name)
            % layer = weightedAdditionLayer(numInputs,name) creates a
            % weighted addition layer and specifies the number of inputs
            % and the layer name.

            % Set number of inputs.
            layer.NumInputs = numInputs;

            % Set layer name.
            layer.Name = name;

            % Set layer description.
            layer.Description = "Weighted addition of " + numInputs + ...
                " inputs";

            % Initialize layer weights.
            layer.Weights = rand(1,numInputs);
        end

        function Z = predict(layer, varargin)
            % Z = predict(layer, X1, ..., Xn) forwards the input data X1,
            % ..., Xn through the layer and outputs the result Z.

            X = varargin;
            W = layer.Weights;

            % Initialize output
            X1 = X{1};
            sz = size(X1);
            Z = zeros(sz,'like',X1);

            % Weighted addition
            for i = 1:layer.NumInputs
                Z = Z + W(i)*X{i};
            end
        end
    end
end
```
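Before using the layer in a network, one quick sanity check is to call `predict` directly with known inputs. The sketch below assumes the completed class is saved as `weightedAdditionLayer.m` on the path; it overwrites the random weights with fixed values, chosen arbitrarily, so that the result is predictable.

```matlab
% Assumes weightedAdditionLayer.m is in the current folder or on the path.
layer = weightedAdditionLayer(2,'add');

% Replace the random weights with known values for the check.
layer.Weights = [2 3];

% Two constant inputs with the layout h-by-w-by-c-by-N.
X1 = dlarray(ones(4,4,3,2));
X2 = dlarray(2*ones(4,4,3,2));

Z = predict(layer,X1,X2);

% Every element equals 2*1 + 3*2 = 8.
disp(all(extractdata(Z) == 8,'all'))
```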

GPU Compatibility

If the layer forward functions fully support `dlarray` objects, then the layer is GPU compatible. Otherwise, to be GPU compatible, the layer functions must support inputs and return outputs of type `gpuArray` (Parallel Computing Toolbox).

Many MATLAB built-in functions support `gpuArray` (Parallel Computing Toolbox) and `dlarray` input arguments. For a list of functions that support `dlarray` objects, see List of Functions with dlarray Support. For a list of functions that execute on a GPU, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox). To use a GPU for deep learning, you must also have a supported GPU device. For information on supported devices, see GPU Support by Release (Parallel Computing Toolbox). For more information on working with GPUs in MATLAB, see GPU Computing in MATLAB (Parallel Computing Toolbox).

In this example, the MATLAB functions used in `predict` all support `dlarray` objects, so the layer is GPU compatible.
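One way to exercise this is sketched below: the inputs are moved to the GPU only when `canUseGPU` reports a usable device, and the same `predict` call runs in either case. The sketch assumes `weightedAdditionLayer.m` is on the path; the input sizes are arbitrary.

```matlab
% Assumes weightedAdditionLayer.m is in the current folder or on the path.
layer = weightedAdditionLayer(2,'add');

X1 = dlarray(rand(24,24,20,4,'single'));
X2 = dlarray(rand(24,24,20,4,'single'));

% Move the data to the GPU only if a supported device is available.
if canUseGPU
    X1 = gpuArray(X1);
    X2 = gpuArray(X2);
end

% The same call works for CPU and GPU data.
Z = predict(layer,X1,X2);
disp(size(Z))
```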

Check Validity of Layer with Multiple Inputs

Check the layer validity of the custom layer `weightedAdditionLayer`.

Define a custom weighted addition layer. To create this layer, save the file `weightedAdditionLayer.m` in the current folder.

Create an instance of the layer and check its validity using `checkLayer`. Specify the valid input sizes to be the typical sizes of a single observation for each input to the layer. The layer expects 4-D array inputs, where the first three dimensions correspond to the height, width, and number of channels of the previous layer output, and the fourth dimension corresponds to the observations.

Specify the typical size of the input of an observation and set `'ObservationDimension'` to 4.

```
layer = weightedAdditionLayer(2,'add');
validInputSize = {[24 24 20],[24 24 20]};
checkLayer(layer,validInputSize,'ObservationDimension',4)
```
```
Skipping GPU tests. No compatible GPU device found.

Skipping code generation compatibility tests. To check validity of the layer for code generation, specify the 'CheckCodegenCompatibility' and 'ObservationDimension' options.

Running nnet.checklayer.TestLayerWithoutBackward
.......... ........
Done nnet.checklayer.TestLayerWithoutBackward
__________

Test Summary:
	 18 Passed, 0 Failed, 0 Incomplete, 10 Skipped.
	 Time elapsed: 0.32426 seconds.
```

Here, the function does not detect any issues with the layer.
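The same check extends to a layer with more inputs by supplying one valid input size per input. The three-input configuration below is an illustrative variation, not part of the original example, and it assumes `weightedAdditionLayer.m` is on the path.

```matlab
% Assumes weightedAdditionLayer.m is in the current folder or on the path.
layer3 = weightedAdditionLayer(3,'add3');

% One size per layer input; all inputs must be the same size to be added.
validInputSize3 = {[24 24 20],[24 24 20],[24 24 20]};

checkLayer(layer3,validInputSize3,'ObservationDimension',4)
```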

Use Custom Weighted Addition Layer in Network

You can use a custom layer in the same way as any other layer in Deep Learning Toolbox. This section shows how to create and train a network for digit classification using the weighted addition layer you created earlier.

Load the example training data.

`[XTrain,TTrain] = digitTrain4DArrayData;`

Define a custom weighted addition layer. To create this layer, save the file `weightedAdditionLayer.m` in the current folder.

Create a layer graph including the custom layer `weightedAdditionLayer`.

```
layers = [
    imageInputLayer([28 28 1])
    convolution2dLayer(5,20)
    reluLayer('Name',"relu1")
    convolution2dLayer(3,20,'Padding',1)
    reluLayer
    convolution2dLayer(3,20,'Padding',1)
    reluLayer
    weightedAdditionLayer(2,"add")
    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer];

lgraph = layerGraph(layers);
lgraph = connectLayers(lgraph,"relu1","add/in2");
```
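To confirm that the skip connection reached the second input of the custom layer, you can inspect the connections of the layer graph and plot it. This sketch assumes that `lgraph` from the preceding code is in the workspace.

```matlab
% Assumes lgraph from the preceding code is in the workspace.

% List every connection; the skip connection ends at 'add/in2'.
disp(lgraph.Connections)

% Check for the skip connection programmatically.
hasSkip = any(strcmp(lgraph.Connections.Destination,'add/in2'));
disp(hasSkip)

% Visualize the layer graph.
figure
plot(lgraph)
```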

Set the training options and train the network.

```
options = trainingOptions("adam",'MaxEpochs',10);
net = trainNetwork(XTrain,TTrain,lgraph,options);
```
```
Training on single CPU.
Initializing input data normalization.
|========================================================================================|
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Mini-batch  |  Base Learning  |
|         |             |   (hh:mm:ss)   |   Accuracy   |     Loss     |      Rate       |
|========================================================================================|
|       1 |           1 |       00:00:01 |       12.50% |       2.2951 |          0.0010 |
|       2 |          50 |       00:00:14 |       72.66% |       0.7879 |          0.0010 |
|       3 |         100 |       00:00:26 |       89.84% |       0.2982 |          0.0010 |
|       4 |         150 |       00:00:38 |       94.53% |       0.1559 |          0.0010 |
|       6 |         200 |       00:00:52 |       99.22% |       0.0391 |          0.0010 |
|       7 |         250 |       00:01:06 |       99.22% |       0.0363 |          0.0010 |
|       8 |         300 |       00:01:20 |      100.00% |       0.0195 |          0.0010 |
|       9 |         350 |       00:01:33 |       99.22% |       0.0127 |          0.0010 |
|      10 |         390 |       00:01:43 |      100.00% |       0.0039 |          0.0010 |
|========================================================================================|
Training finished: Max epochs completed.
```

View the weights learned by the weighted addition layer.

`net.Layers(8).Weights`
```
ans = 1x2 single row vector

    1.0226    1.0009
```

Evaluate the network performance by predicting on new data and calculating the accuracy.

```
[XTest,TTest] = digitTest4DArrayData;
YPred = classify(net,XTest);
accuracy = mean(TTest==YPred)
```
```
accuracy = 0.9898
```