Quantization, Projection, and Pruning

Compress a deep neural network by performing quantization, projection, or pruning

Use Deep Learning Toolbox™ together with the Deep Learning Toolbox Model Quantization Library support package to reduce the memory footprint and computational requirements of a deep neural network by:

Pruning filters from convolution layers by using a first-order Taylor approximation. You can then generate C/C++ or CUDA® code from the pruned network.
Projecting layers by performing principal component analysis (PCA) on the layer activations, using a data set representative of the training data, and then applying linear projections to the learnable parameters of the layers. Forward passes of a projected deep neural network are typically faster when you deploy the network to embedded hardware using library-free C/C++ code generation.
Quantizing the weights, biases, and activations of layers to reduced-precision scaled integer data types. You can then generate C/C++, CUDA, or HDL code from the quantized network.
For C/C++ and CUDA code generation, the software generates code for a convolutional deep neural network by quantizing the weights, biases, and activations of the convolution layers to 8-bit scaled integer data types. The quantization is performed by providing the calibration result file produced by the `calibrate` function to the `codegen` (MATLAB Coder) command.
Code generation does not support quantized deep neural networks produced by the `quantize` function.
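
The scaled 8-bit integer representation described above can be sketched in plain Python. This is a minimal illustration, not the toolbox's implementation. It assumes symmetric, power-of-two scaling to signed 8-bit integers, and the helper names (`collect_range`, `power_of_two_exponent`, `to_scaled_int`, `from_scaled_int`) are hypothetical. The `collect_range` step stands in for calibration: it records the dynamic range observed on representative data, and the scale is then chosen from that range.

```python
import math

def collect_range(calibration_batches):
    """Return the (min, max) value observed across all calibration batches."""
    lo, hi = math.inf, -math.inf
    for batch in calibration_batches:
        lo = min(lo, min(batch))
        hi = max(hi, max(batch))
    return lo, hi

def power_of_two_exponent(lo, hi, bits=8):
    """Smallest exponent e such that max(|lo|, |hi|) / 2**e fits in a signed `bits`-bit integer."""
    max_abs = max(abs(lo), abs(hi))
    qmax = 2 ** (bits - 1) - 1  # 127 for 8 bits
    if max_abs == 0:
        return 0
    return math.ceil(math.log2(max_abs / qmax))

def to_scaled_int(values, exponent, bits=8):
    """Round value / 2**exponent to the nearest integer (ties to even in Python), then saturate."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    scale = 2.0 ** exponent
    return [max(qmin, min(qmax, round(v / scale))) for v in values]

def from_scaled_int(stored, exponent):
    """Recover approximate real values: real = stored * 2**exponent."""
    scale = 2.0 ** exponent
    return [q * scale for q in stored]

# Hypothetical activation values observed on representative (calibration) data.
lo, hi = collect_range([[0.1, 1.9], [-1.5, 0.7]])
e = power_of_two_exponent(lo, hi)           # -> -6, i.e. a scale of 2**-6
stored = to_scaled_int([1.9, -1.5, 3.0], e)  # -> [122, -96, 127]
print(stored, from_scaled_int(stored, e))
```

With the calibrated range [-1.5, 1.9], the sketch picks a scale of 2^-6. The value 1.9 is stored as 122 and recovered as 1.90625, and 3.0, which lies outside the calibrated range, saturates at 127. Storing each value in 1 byte instead of the 4 bytes of single precision is the source of the roughly 4x reduction in parameter storage.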

Functions

Pruning

`taylorPrunableNetwork`	Network that can be pruned by using first-order Taylor approximation (since R2022a)
`forward`	Compute deep learning network output for training (since R2019b)
`predict`	Compute deep learning network output for inference (since R2019b)
`updatePrunables`	Remove filters from prunable layers based on importance scores (since R2022a)
`updateScore`	Compute and accumulate Taylor-based importance scores for pruning (since R2022a)
`dlnetwork`	Deep learning neural network (since R2019b)
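
The scores that `updateScore` accumulates, and the filter removal that `updatePrunables` performs, can be illustrated with one common form of the first-order Taylor criterion. In that form, a filter's importance is the absolute value of the mean, over spatial positions, of each activation times the gradient of the loss with respect to that activation. The sketch below is a hypothetical, framework-free illustration, and the exact normalization and accumulation used by the toolbox are assumptions. Activations and gradients are passed in as plain lists instead of being computed by a network, and the helper names are invented for this example.

```python
def taylor_scores(activations, gradients):
    """First-order Taylor importance of each filter for one observation.

    activations[k][p] is the output of filter k at spatial position p, and
    gradients[k][p] is dLoss/d(activation) at the same position.
    """
    scores = []
    for acts, grads in zip(activations, gradients):
        mean_prod = sum(a * g for a, g in zip(acts, grads)) / len(acts)
        scores.append(abs(mean_prod))
    return scores

def accumulate(running, new_scores):
    """Add the scores of a new mini-batch to the running totals."""
    if running is None:
        return list(new_scores)
    return [r + s for r, s in zip(running, new_scores)]

def lowest_scoring(scores, num_to_prune):
    """Indices of the filters with the smallest accumulated scores, in ascending index order."""
    order = sorted(range(len(scores)), key=lambda k: scores[k])
    return sorted(order[:num_to_prune])

# Hypothetical values: 3 filters, 2 spatial positions each.
acts = [[1.0, 2.0], [0.5, 0.5], [3.0, 0.0]]
grads = [[0.1, 0.1], [0.01, -0.01], [0.2, 0.5]]
total = None
for _ in range(2):  # two mini-batches with identical values, for brevity
    total = accumulate(total, taylor_scores(acts, grads))
print(lowest_scoring(total, 1))  # -> [1]
```

To first order, removing a filter whose score is near zero barely changes the loss. That is why the lowest-scoring filters are the first candidates for removal.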

Projection

`compressNetworkUsingProjection`	Compress neural network using projection (since R2022b)
`neuronPCA`	Principal component analysis of neuron activations (since R2022b)
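
The idea behind `neuronPCA` and `compressNetworkUsingProjection` can be sketched for a single fully connected layer y = W x. First, find k principal directions Q of the layer's input activations on representative data. Then replace W with the two smaller factors W Q and Q^T, so that y ≈ (W Q)(Q^T x). The pure-Python sketch below illustrates the technique under stated assumptions and is not the toolbox's algorithm. It uses the uncentered second-moment matrix of the activations, so no bias correction is needed, and it finds the directions by power iteration with deflation. All helper names are hypothetical.

```python
import math

def matvec(M, v):
    """Matrix (list of rows) times vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def second_moment(samples):
    """Average outer product x x^T over activation samples (not mean-centered)."""
    n, d = len(samples), len(samples[0])
    return [[sum(s[i] * s[j] for s in samples) / n for j in range(d)] for i in range(d)]

def principal_directions(C, k, iters=500):
    """Top-k unit eigenvectors of symmetric C by power iteration with deflation."""
    d = len(C)
    C = [row[:] for row in C]  # work on a copy
    dirs = []
    for _ in range(k):
        v = [1.0 / math.sqrt(d)] * d  # unit-length starting vector
        for _ in range(iters):
            w = matvec(C, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(C, v)))  # eigenvalue estimate v^T C v
        # Subtract the found component so the next pass converges to the next direction.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
        dirs.append(v)
    return dirs

def project_layer(W, dirs):
    """Factor y = W x as y ~ A (B x), with B = Q^T (k x d) and A = W Q (out x k)."""
    B = [list(v) for v in dirs]
    A = [matvec(B, row) for row in W]
    return A, B

# Hypothetical activations that all lie along the direction (1, 2).
samples = [[1.0, 2.0], [2.0, 4.0], [-1.0, -2.0], [3.0, 6.0]]
dirs = principal_directions(second_moment(samples), k=1)
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
A, B = project_layer(W, dirs)
x = [2.0, 4.0]
print(matvec(W, x), matvec(A, matvec(B, x)))
```

The example activations all lie along (1, 2), so one direction captures them exactly and the projected layer reproduces W x. In general, the factored layer stores out·k + k·d values instead of out·d, which is fewer whenever k < out·d / (out + d).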

Quantization

`dlquantizer`	Quantize a deep neural network to 8-bit scaled integer data types (since R2020a)
`dlquantizationOptions`	Options for quantizing a trained deep neural network (since R2020a)
`calibrate`	Simulate and collect ranges of a deep neural network (since R2020a)
`quantize`	Quantize deep neural network (since R2022a)
`validate`	Quantize and validate a deep neural network (since R2020a)
`quantizationDetails`	Display quantization details for a neural network (since R2022a)
`estimateNetworkMetrics`	Estimate network metrics for specific layers of a neural network (since R2022a)
`equalizeLayers`	Equalize layer parameters of deep neural network (since R2022b)
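
When one scale is shared by all channels of a layer, channels whose weight ranges differ widely lose precision. A widely used remedy, cross-layer equalization, rescales each channel between two consecutive layers separated by a ReLU. For channel i, it divides the first layer's output-channel weights and bias by s_i and multiplies the second layer's input-channel weights by s_i, with s_i = sqrt(r1_i / r2_i). Here r1_i and r2_i are the channel's largest absolute weights in each layer. Because ReLU(z / s) = ReLU(z) / s for s > 0, the output is unchanged, and both ranges become sqrt(r1_i · r2_i). Whether `equalizeLayers` uses exactly this rule is an assumption. The sketch below, with hypothetical helper names, only demonstrates the technique.

```python
import math

def matvec(M, v):
    """Matrix (list of rows) times vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def two_layer_output(W1, b1, W2, x):
    """y = W2 relu(W1 x + b1); a second-layer bias is omitted because equalization leaves it unchanged."""
    h = [max(0.0, z + b) for z, b in zip(matvec(W1, x), b1)]
    return matvec(W2, h)

def equalize(W1, b1, W2):
    """Rescale each channel between the two layers so that its weight ranges match."""
    W1e, b1e = [], []
    W2e = [row[:] for row in W2]
    for i in range(len(W1)):  # channel i: output i of layer 1, input i of layer 2
        r1 = max(abs(w) for w in W1[i])
        r2 = max(abs(row[i]) for row in W2)
        s = math.sqrt(r1 / r2) if r1 > 0 and r2 > 0 else 1.0
        W1e.append([w / s for w in W1[i]])
        b1e.append(b1[i] / s)
        for row in W2e:
            row[i] *= s
    return W1e, b1e, W2e

# Hypothetical weights with very uneven per-channel ranges (8 vs 0.5 in layer 1).
W1 = [[8.0, -4.0], [0.5, 0.25]]
b1 = [1.0, 0.1]
W2 = [[0.5, 2.0], [0.25, -4.0]]
W1e, b1e, W2e = equalize(W1, b1, W2)
x = [1.0, -2.0]
print(two_layer_output(W1, b1, W2, x), two_layer_output(W1e, b1e, W2e, x))
```

In this example, channel 0 goes from ranges 8 and 0.5 to 2 and 2, and channel 1 from 0.5 and 4 to about 1.414 each. The network output stays the same up to floating-point rounding.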

Apps

Deep Network Quantizer

Quantize deep neural network to 8-bit scaled integer data types (since R2020a)

Topics

Pruning

Parameter Pruning and Quantization of Image Classification Network
Use parameter pruning and quantization to reduce network size.
Prune Image Classification Network Using Taylor Scores
This example shows how to reduce the size of a deep neural network using Taylor pruning.
Prune Filters in a Detection Network Using Taylor Scores
This example shows how to reduce network size and increase inference speed by pruning convolutional filters in a you only look once (YOLO) v3 object detection network.

Projection

Compress Neural Network Using Projection
This example shows how to compress a neural network using projection and principal component analysis.

Deep Learning Quantization

Quantization of Deep Neural Networks
Understand effects of quantization and how to visualize dynamic ranges of network convolution layers.
Quantization Workflow Prerequisites
Products required for the quantization of deep learning networks.

Quantization for GPU Target

Generate INT8 Code for Deep Learning Networks (GPU Coder)
Quantize and generate code for a pretrained convolutional neural network.
Quantize Residual Network Trained for Image Classification and Generate CUDA Code
This example shows how to quantize the learnable parameters in the convolution layers of a deep learning neural network that has residual connections and has been trained for image classification with CIFAR-10 data.
Quantize Layers in Object Detectors and Generate CUDA Code
This example shows how to generate CUDA® code for an SSD vehicle detector and a YOLO v2 vehicle detector that performs inference computations in 8-bit integers for the convolutional layers.

Quantization for FPGA Target

Quantize Network for FPGA Deployment (Deep Learning HDL Toolbox)
Reduce the memory footprint of a deep neural network by quantizing the weights, biases, and activations of convolution layers to 8-bit scaled integer data types.
Classify Images on FPGA Using Quantized Neural Network (Deep Learning HDL Toolbox)
This example shows how to use Deep Learning HDL Toolbox™ to deploy a quantized deep convolutional neural network (CNN) to an FPGA.
Classify Images on FPGA by Using Quantized GoogLeNet Network (Deep Learning HDL Toolbox)
This example shows how to use the Deep Learning HDL Toolbox™ to deploy a quantized GoogLeNet network to classify an image.

Quantization for CPU Target

Generate int8 Code for Deep Learning Networks (MATLAB Coder)
Quantize and generate code for a pretrained convolutional neural network.
Generate INT8 Code for Deep Learning Network on Raspberry Pi (MATLAB Coder)
Generate code for deep learning network that performs inference computations in 8-bit integers.

Featured Examples

Prune Image Classification Network Using Taylor Scores

Reduce the size of a deep neural network using Taylor pruning. By using the taylorPrunableNetwork function to remove convolution layer filters, you can reduce the overall network size and increase the inference speed.

Prune Filters in a Detection Network Using Taylor Scores

Reduce network size and increase inference speed by pruning convolutional filters in a you only look once (YOLO) v3 object detection network.

Compress Neural Network Using Projection

Compress a neural network using projection and principal component analysis.

Quantize Residual Network Trained for Image Classification and Generate CUDA Code

Quantize the learnable parameters in the convolution layers of a deep learning neural network that has residual connections and has been trained for image classification with CIFAR-10 data.