
compressNetworkUsingProjection

Compress neural network using projection

Since R2022b

    Description

    The compressNetworkUsingProjection function reduces the number of learnable parameters in network layers by performing principal component analysis (PCA) of the neuron activations, using a data set representative of the training data, and then projecting the learnable parameters into the subspace that maintains the highest variance in neuron activations.

    Forward passes of a projected deep neural network are typically faster when you deploy the network to embedded hardware using library-free C or C++ code generation.

    If you also prune or quantize your network, then compress using projection after pruning and before quantization. Network compression using projection supports projecting LSTM layers only.


    netProjected = compressNetworkUsingProjection(net,mbq) compresses the dlnetwork object net by replacing layers with projected layers. The function compresses layers by performing principal component analysis (PCA) of the neuron activations using the data in the minibatchqueue object mbq and projects learnable parameters into the subspace that maintains the highest variance in neuron activations. This function requires the Deep Learning Toolbox™ Model Quantization Library support package. This support package is a free add-on that you can download using the Add-On Explorer. Alternatively, see Deep Learning Toolbox Model Quantization Library.

    netProjected = compressNetworkUsingProjection(net,X1,...,XN) compresses the network using the data in the dlarray objects X1,...,XN, where N is the number of network inputs.
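    This syntax can be sketched as follows. The variable seqData and its layout are illustrative assumptions, not part of this page; substitute data representative of your own training set.

    ```matlab
    % Sketch of the dlarray syntax. Assumes net has a single sequence input and
    % seqData is a numChannels-by-numTimeSteps-by-numObservations array that is
    % representative of the training data (illustrative name, not from this page).
    X = dlarray(seqData,"CTB");  % channel, time, batch
    netProjected = compressNetworkUsingProjection(net,X);
    ```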

    netProjected = compressNetworkUsingProjection(net,npca) compresses the network using the neuronPCA object npca. The PCA step can be computationally intensive. If you expect to compress the same network multiple times (for example, when exploring different levels of compression), then you can perform the PCA step up front using a neuronPCA object.
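    For example, this sketch performs the PCA step once and then reuses it to compare two compression levels. The option values and output variable names are chosen for illustration only.

    ```matlab
    % Sketch: run the computationally intensive PCA step once, then reuse it.
    npca = neuronPCA(net,mbq);

    % Explore two compression levels without repeating the PCA step
    % (illustrative option values).
    netHighFidelity = compressNetworkUsingProjection(net,npca,ExplainedVarianceGoal=0.99);
    netSmall = compressNetworkUsingProjection(net,npca,LearnablesReductionGoal=0.9);
    ```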

    [netProjected,info] = compressNetworkUsingProjection(___) also returns the structure info, which contains information about the reduction of learnable parameters and the explained variance achieved during compression.

    [netProjected,info] = compressNetworkUsingProjection(___,Name=Value) specifies additional options using one or more name-value arguments.

    Examples

    Compress Network Using Projection

    Load the pretrained network in dlnetJapaneseVowels and the training data.

    load dlnetJapaneseVowels
    XTrain = japaneseVowelsTrainData;

    Create a mini-batch queue containing the training data. To create a mini-batch queue from in-memory data, convert the sequences to an array datastore.

    adsXTrain = arrayDatastore(XTrain,OutputType="same");

    Create the minibatchqueue object.

    • Specify a mini-batch size of 16.

    • Preprocess the mini-batches using the preprocessMiniBatchPredictors function, listed in the Mini-Batch Predictors Preprocessing Function section of the example.

    • Specify that the output data has format "CTB" (channel, time, batch).

    miniBatchSize = 16;
    
    mbq = minibatchqueue(adsXTrain, ...
        MiniBatchSize=miniBatchSize, ...
        MiniBatchFcn=@preprocessMiniBatchPredictors, ...
        MiniBatchFormat="CTB");

    Compress the network.

    netProjected = compressNetworkUsingProjection(net,mbq);
    Compressed network has 82.4% fewer learnable parameters.
    Projected layers explain on average 96.6% of layer activation variance.
    

    Mini-Batch Predictors Preprocessing Function

    The preprocessMiniBatchPredictors function preprocesses a mini-batch of predictors by extracting the sequence data from the input cell array and truncating them along the second dimension so that they have the same length.

    Note: Do not pad sequence data when doing the PCA step for projection, as padding can negatively impact the analysis. Instead, truncate mini-batches of data to the same length or use mini-batches of size 1.

    function X = preprocessMiniBatchPredictors(dataX)
    
    X = padsequences(dataX,2,Length="shortest");
    
    end

    Determine Maximum Possible Compression

    To determine the maximum possible compression, set the LearnablesReductionGoal option to 1 or the ExplainedVarianceGoal option to 0.

    Load the pretrained network in dlnetJapaneseVowels and the training data.

    load dlnetJapaneseVowels
    XTrain = japaneseVowelsTrainData;

    Create a mini-batch queue containing the training data. To create a mini-batch queue from in-memory data, convert the sequences to an array datastore.

    adsXTrain = arrayDatastore(XTrain,OutputType="same");

    Create the minibatchqueue object.

    • Specify a mini-batch size of 16.

    • Preprocess the mini-batches using the preprocessMiniBatchPredictors function, listed in the Mini-Batch Predictors Preprocessing Function section of the example.

    • Specify that the output data has format "CTB" (channel, time, batch).

    miniBatchSize = 16;
    
    mbq = minibatchqueue(adsXTrain, ...
        MiniBatchSize=miniBatchSize, ...
        MiniBatchFcn=@preprocessMiniBatchPredictors, ...
        MiniBatchFormat="CTB");

    Compress the network. To determine the maximum possible compression, set the LearnablesReductionGoal option to 1.

    [netProjected,info] = compressNetworkUsingProjection(net,mbq,LearnablesReductionGoal=1);
    Compressed network has 95.2% fewer learnable parameters.
    Projected layers explain on average 33.8% of layer activation variance.
    

    View the proportion of the total number of network learnables removed by inspecting the LearnablesReduction field of the information structure.

    info.LearnablesReduction
    ans = 0.9518
    

    Mini-Batch Predictors Preprocessing Function

    The preprocessMiniBatchPredictors function preprocesses a mini-batch of predictors by extracting the sequence data from the input cell array and truncating them along the second dimension so that they have the same length.

    Note: Do not pad sequence data when doing the PCA step for projection, as padding can negatively impact the analysis. Instead, truncate mini-batches of data to the same length or use mini-batches of size 1.

    function X = preprocessMiniBatchPredictors(dataX)
    
    X = padsequences(dataX,2,Length="shortest");
    
    end

    Input Arguments


    net — Neural network

    Neural network, specified as an initialized dlnetwork object.

    mbq — Mini-batch queue

    Mini-batch queue that outputs data for each input of the network, specified as a minibatchqueue object.

    The PCA step typically works best when using the full training set. However, any data set that is representative of the training data distribution suffices. The input data must contain two or more observations, and sequences must contain two or more time steps.

    Note

    Do not pad sequence data, as padding can negatively impact the analysis. Instead, truncate mini-batches of data to the same length or use mini-batches of size 1.

    X1,...,XN — Input data

    Input data, specified as a formatted dlarray.

    For more information about dlarray formats, see the fmt input argument of dlarray.

    The PCA step typically works best when using the full training set. However, any data set that is representative of the training data distribution suffices. The input data must contain two or more observations, and sequences must contain two or more time steps.

    Note

    Do not pad sequence data, as padding can negatively impact the analysis. Instead, truncate mini-batches of data to the same length or use mini-batches of size 1.

    npca — Neuron principal component analysis

    Neuron principal component analysis, specified as a neuronPCA object.

    Name-Value Arguments

    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Example: netProjected = compressNetworkUsingProjection(net,mbq,VerbosityLevel="off") compresses the network using projection and disables the command line display.

    LayerNames — Names of layers to compress

    Names of layers to compress, specified as a string array, cell array of character vectors, or a character vector containing a single layer name.

    By default, the software compresses all layers in the network that support projection. Network compression using projection supports projecting LSTM layers only.

    Data Types: string | cell

    ExplainedVarianceGoal — Target proportion of explained variance

    Target proportion of neuron activation variance explained by the remaining principal components of each projected layer, specified as a scalar in the range [0, 1], where 0 corresponds to maximum compression and 1 projects layers with minimal compression.

    If you specify the ExplainedVarianceGoal option, then you must not specify the LearnablesReductionGoal option.

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
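    As a minimal sketch of this option (the target value is chosen for illustration only):

    ```matlab
    % Sketch: keep enough principal components to explain 99% of the neuron
    % activation variance in each projected layer (illustrative target value).
    netProjected = compressNetworkUsingProjection(net,mbq,ExplainedVarianceGoal=0.99);
    ```

    Higher targets retain more principal components, so the projected network stays closer to the original at the cost of less compression.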

    LearnablesReductionGoal — Target proportion of learnables to remove

    Target proportion of the total number of network learnables to remove, specified as a nonnegative scalar less than or equal to 1.

    If you specify the LearnablesReductionGoal option, then you must not specify the ExplainedVarianceGoal option. If you do not specify the LearnablesReductionGoal option, then the function compresses the network using the ExplainedVarianceGoal option.

    Data Types: single | double | int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64

    VerbosityLevel — Verbosity level

    Verbosity level, specified as one of these values:

    • "summary" — Display a summary of the compression algorithm.

    • "steps" — Display information about the steps of the compression algorithm.

    • "iterations" — Display information about the iterations of the compression algorithm.

    • "off" — Do not display information.

    Output Arguments


    netProjected — Projected network

    Projected network, returned as a dlnetwork object.

    After you compress the network using projection, you can fine-tune the network to help regain predictive accuracy lost by the compression process. For an example, see Compress Neural Network Using Projection.

    info — Projection information

    Projection information, returned as a structure with these fields:

    • LearnablesReduction — Proportion of the total number of network learnables removed

    • ExplainedVariance — Proportion of neuron activation variance explained by the principal components

    Extended Capabilities

    Version History

    Introduced in R2022b