
compressNetworkUsingProjection

Compress neural network using projection

Since R2022b

    Description

    The compressNetworkUsingProjection function reduces the number of learnable parameters in network layers by performing principal component analysis (PCA) of the neuron activations, using a data set representative of the training data, and then projecting the learnable parameters into the subspace that maintains the highest variance in neuron activations.

    Forward passes of a projected deep neural network are typically faster when you deploy the network to embedded hardware using library-free C or C++ code generation.

    If you also prune or quantize your network, then compress using projection after pruning and before quantization. Network compression using projection supports projecting LSTM layers only.


    netProjected = compressNetworkUsingProjection(net,mbq) compresses the dlnetwork object net by replacing layers with projected layers. The function compresses layers by performing principal component analysis (PCA) of the neuron activations using the data in the minibatchqueue object mbq and projects learnable parameters into the subspace that maintains the highest variance in neuron activations. This function requires the Deep Learning Toolbox™ Model Quantization Library support package. This support package is a free add-on that you can download using the Add-On Explorer. Alternatively, see Deep Learning Toolbox Model Quantization Library.

    netProjected = compressNetworkUsingProjection(net,X1,...,XN) compresses the network using the data in the dlarray objects X1,...,XN, where N is the number of network inputs.
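    This syntax can be sketched as follows. The variable seqData and its layout are illustrative assumptions, not part of this page; substitute data representative of your own training set.

    ```matlab
    % Sketch of the dlarray syntax. Assumes net has a single sequence input and
    % seqData is a numChannels-by-numTimeSteps-by-numObservations array that is
    % representative of the training data (illustrative name, not from this page).
    X = dlarray(seqData,"CTB");  % channel, time, batch
    netProjected = compressNetworkUsingProjection(net,X);
    ```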

    netProjected = compressNetworkUsingProjection(net,npca) compresses the network using the neuronPCA object npca. The PCA step can be computationally intensive. If you expect to compress the same network multiple times (for example, when exploring different levels of compression), then you can perform the PCA step up front using a neuronPCA object.
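    For example, this sketch performs the PCA step once and then reuses it to compare two compression levels. The option values and output variable names are chosen for illustration only.

    ```matlab
    % Sketch: run the computationally intensive PCA step once, then reuse it.
    npca = neuronPCA(net,mbq);

    % Explore two compression levels without repeating the PCA step
    % (illustrative option values).
    netHighFidelity = compressNetworkUsingProjection(net,npca,ExplainedVarianceGoal=0.99);
    netSmall = compressNetworkUsingProjection(net,npca,LearnablesReductionGoal=0.9);
    ```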

    [netProjected,info] = compressNetworkUsingProjection(___) also returns the structure info, which contains information about the reduction of learnable parameters and the explained variance achieved during compression.

    [netProjected,info] = compressNetworkUsingProjection(___,Name=Value) specifies additional options using one or more name-value arguments.

    Examples

    Compress Network Using Projection

    Load the pretrained network in dlnetJapaneseVowels and the training data.

    load dlnetJapaneseVowels
    XTrain = japaneseVowelsTrainData;

    Create a mini-batch queue containing the training data. To create a mini-batch queue from in-memory data, convert the sequences to an array datastore.

    adsXTrain = arrayDatastore(XTrain,OutputType="same");

    Create the minibatchqueue object.

    • Specify a mini-batch size of 16.

    • Preprocess the mini-batches using the preprocessMiniBatchPredictors function, listed in the Mini-Batch Predictors Preprocessing Function section of the example.

    • Specify that the output data has format "CTB" (channel, time, batch).

    miniBatchSize = 16;
    
    mbq = minibatchqueue(adsXTrain, ...
        MiniBatchSize=miniBatchSize, ...
        MiniBatchFcn=@preprocessMiniBatchPredictors, ...
        MiniBatchFormat="CTB");

    Compress the network.

    netProjected = compressNetworkUsingProjection(net,mbq);
    Compressed network has 82.4% fewer learnable parameters.
    Projected layers explain on average 96.6% of layer activation variance.
    

    Mini-Batch Predictors Preprocessing Function

    The preprocessMiniBatchPredictors function preprocesses a mini-batch of predictors by extracting the sequence data from the input cell array and truncating them along the second dimension so that they have the same length.

    Note: Do not pad sequence data when doing the PCA step for projection, as padding can negatively impact the analysis. Instead, truncate mini-batches of data to the same length or use mini-batches of size 1.

    function X = preprocessMiniBatchPredictors(dataX)
    
    X = padsequences(dataX,2,Length="shortest");
    
    end

    Determine Maximum Possible Compression

    To determine the maximum possible compression, set the LearnablesReductionGoal option to 1 or the ExplainedVarianceGoal option to 0.

    Load the pretrained network in dlnetJapaneseVowels and the training data.

    load dlnetJapaneseVowels
    XTrain = japaneseVowelsTrainData;

    Create a mini-batch queue containing the training data. To create a mini-batch queue from in-memory data, convert the sequences to an array datastore.

    adsXTrain = arrayDatastore(XTrain,OutputType="same");

    Create the minibatchqueue object.

    • Specify a mini-batch size of 16.

    • Preprocess the mini-batches using the preprocessMiniBatchPredictors function, listed in the Mini-Batch Predictors Preprocessing Function section of the example.

    • Specify that the output data has format "CTB" (channel, time, batch).

    miniBatchSize = 16;
    
    mbq = minibatchqueue(adsXTrain, ...
        MiniBatchSize=miniBatchSize, ...
        MiniBatchFcn=@preprocessMiniBatchPredictors, ...
        MiniBatchFormat="CTB");

    Compress the network. To determine the maximum possible compression, set the LearnablesReductionGoal option to 1.

    [netProjected,info] = compressNetworkUsingProjection(net,mbq,LearnablesReductionGoal=1);
    Compressed network has 95.2% fewer learnable parameters.
    Projected layers explain on average 33.8% of layer activation variance.
    

    View the proportion of the total number of network learnables removed by inspecting the LearnablesReduction field of the information structure.

    info.LearnablesReduction
    ans = 0.9518
    

    Mini-Batch Predictors Preprocessing Function

    The preprocessMiniBatchPredictors function preprocesses a mini-batch of predictors by extracting the sequence data from the input cell array and truncating them along the second dimension so that they have the same length.

    Note: Do not pad sequence data when doing the PCA step for projection, as padding can negatively impact the analysis. Instead, truncate mini-batches of data to the same length or use mini-batches of size 1.

    function X = preprocessMiniBatchPredictors(dataX)
    
    X = padsequences(dataX,2,Length="shortest");
    
    end

    Input Arguments


    net — Neural network

    Neural network, specified as an initialized dlnetwork object.

    mbq — Mini-batch queue

    Mini-batch queue that outputs data for each input of the network, specified as a minibatchqueue object.

    The PCA step typically works best when using the full training set. However, any data set that is representative of the training data distribution suffices. The input data must contain two or more observations, and sequences must contain two or more time steps.

    Note

    Do not pad sequence data, as padding can negatively impact the analysis. Instead, truncate mini-batches of data to the same length or use mini-batches of size 1.

    X1,...,XN — Input data

    Input data, specified as a formatted dlarray.

    For more information about dlarray formats, see the fmt input argument of dlarray.

    The PCA step typically works best when using the full training set. However, any data set that is representative of the training data distribution suffices. The input data must contain two or more observations, and sequences must contain two or more time steps.

    Note

    Do not pad sequence data, as padding can negatively impact the analysis. Instead, truncate mini-batches of data to the same length or use mini-batches of size 1.

    npca — Neuron principal component analysis

    Neuron principal component analysis, specified as a neuronPCA object.

    Name-Value Arguments

    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Example: netProjected = compressNetworkUsingProjection(net,mbq,VerbosityLevel="off") compresses the network using projection and disables the command line display.

    LayerNames — Names of layers to compress

    Names of layers to compress, specified as a string array, cell array of character vectors, or a character vector containing a single layer name.

    By default, the software compresses all layers in the network that support projection. Network compression using projection supports projecting LSTM layers only.

    Data Types: string | cell

    ExplainedVarianceGoal — Target proportion of explained variance

    Target proportion of neuron activation variance explained by the remaining principal components of each projected layer, specified as a scalar in the range [0, 1], where 0 corresponds to maximum compression and 1 projects layers with minimal compression.

    If you specify the ExplainedVarianceGoal option, then you must not specify the LearnablesReductionGoal option.

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
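    As a minimal sketch of this option (the target value is chosen for illustration only):

    ```matlab
    % Sketch: keep enough principal components to explain 99% of the neuron
    % activation variance in each projected layer (illustrative target value).
    netProjected = compressNetworkUsingProjection(net,mbq,ExplainedVarianceGoal=0.99);
    ```

    Higher targets retain more principal components, so the projected network stays closer to the original at the cost of less compression.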

    LearnablesReductionGoal — Target proportion of learnables to remove

    Target proportion of the total number of network learnables to remove, specified as a nonnegative scalar less than or equal to 1.

    If you specify the LearnablesReductionGoal option, then you must not specify the ExplainedVarianceGoal option. If you do not specify the LearnablesReductionGoal option, then the function compresses the network using the ExplainedVarianceGoal option.

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

    VerbosityLevel — Verbosity level

    Verbosity level, specified as one of these values:

    • "summary" — Display a summary of the compression algorithm.

    • "steps" — Display information about the steps of the compression algorithm.

    • "iterations" — Display information about the iterations of the compression algorithm.

    • "off" — Do not display information.

    Output Arguments


    netProjected — Projected network

    Projected network, returned as a dlnetwork object.

    After you compress the network using projection, you can fine-tune the network to help regain predictive accuracy lost by the compression process. For an example, see Compress Neural Network Using Projection.

    info — Projection information

    Projection information, returned as a structure with these fields:

    • LearnablesReduction — Proportion of the total number of network learnables removed

    • ExplainedVariance — Proportion of neuron activation variance explained by the principal components

    Extended Capabilities

    Version History

    Introduced in R2022b