Deep Learning Code Generation on Intel Targets for Different Batch Sizes
This example shows how to use the codegen command to generate code for an image classification application that uses deep learning on Intel® processors. The generated code uses the Intel Math Kernel Library for Deep Neural Networks (MKL-DNN). This example consists of two parts:
The first part shows how to generate a MEX function that accepts a batch of images as input.
The second part shows how to generate an executable that accepts a batch of images as input.
Prerequisites
Intel processor with support for Intel Advanced Vector Extensions 2 (Intel AVX2) instructions
Intel Math Kernel Library for Deep Neural Networks (MKL-DNN)
Environment variables for the compilers and libraries. For information on the supported versions of compilers, see Supported Compilers. For setting up the environment variables, see Prerequisites for Deep Learning with MATLAB Coder (MATLAB Coder).
This example is supported on Linux®, Windows®, and Mac® platforms. It is not supported for MATLAB Online.
Download Input Video File
Download a sample video file.
if ~exist('./object_class.avi', 'file')
    url = 'https://www.mathworks.com/supportfiles/gpucoder/media/object_class.avi.zip';
    websave('object_class.avi.zip',url);
    unzip('object_class.avi.zip');
end
Define the resnet_predict Function
This example uses the DAG network ResNet-50 to show image classification on Intel desktops. A pretrained ResNet-50 model for MATLAB is available as part of the support package Deep Learning Toolbox Model for ResNet-50 Network.
The resnet_predict function loads the ResNet-50 network into a persistent network object and then performs prediction on the input. Subsequent calls to the function reuse the persistent network object.
type resnet_predict
% Copyright 2020 The MathWorks, Inc.

function out = resnet_predict(in)
%#codegen

% A persistent object mynet is used to load the network object. At
% the first call to this function, the persistent object is constructed and
% set up. When the function is called subsequent times, the same object is
% reused to call predict on inputs, avoiding reconstructing and reloading
% the network object.

persistent mynet;

if isempty(mynet)
    % Call the function resnet50 that returns a DAG network
    % for ResNet-50 model.
    mynet = coder.loadDeepLearningNetwork('resnet50','resnet');
end

% pass in input
out = mynet.predict(in);
Generate MEX for resnet_predict
To generate a MEX function for the resnet_predict function, use codegen with a deep learning configuration object for the MKL-DNN library. Attach the deep learning configuration object to the MEX code generation configuration object that you pass to codegen. Run the codegen command and specify the input as a 4-D matrix of size [224,224,3,batchSize]. The first three dimensions correspond to the input layer size of the ResNet-50 network, and the fourth dimension is the number of images in the batch.
batchSize = 5;
cfg = coder.config('mex');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('mkldnn');
codegen -config cfg resnet_predict -args {ones(224,224,3,batchSize,'single')} -report
Code generation successful: To view the report, open('codegen\mex\resnet_predict\html\report.mldatx')
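Because the -args specification above fixes the input size and class, it can help to check a batch against that specification before calling the generated MEX function. The following is a minimal sketch of such a check. The variable names expectedSize, in, sizeMatches, and classMatches are illustrative and are not part of the original example.

```matlab
% Illustrative check (not from the original example): confirm that a batch
% matches the size and class used in the -args specification.
batchSize = 5;
expectedSize = [224 224 3 batchSize];   % height, width, channels, images

% Stand-in input with the same size and class that was passed to codegen.
in = ones(expectedSize,'single');

sizeMatches  = isequal(size(in), expectedSize);
classMatches = isa(in,'single');
fprintf('Size matches: %d, class matches: %d\n', sizeMatches, classMatches);
```

If you change batchSize, the same check can be rerun with the new value after regenerating the MEX function.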
Perform Prediction on a Batch of Images
This step assumes that the object_class.avi video file is already downloaded. Create a VideoReader object and use its read function to read the first five frames, because batchSize is set to 5. Resize the batch of input images to 224-by-224, the size that the ResNet-50 network expects.
videoReader = VideoReader('object_class.avi');
imBatch = read(videoReader,[1 5]);
imBatch = imresize(imBatch, [224,224]);
Call the generated resnet_predict_mex function, which outputs classification results for the inputs that you provide.
predict_scores = resnet_predict_mex(single(imBatch));
Get top 5 probability scores and their labels for each image in the batch.
[val,indx] = sort(transpose(predict_scores), 'descend');
scores = val(1:5,:)*100;
net = resnet50;
classnames = net.Layers(end).ClassNames;
for i = 1:batchSize
    labels = classnames(indx(1:5,i));
    disp(['Top 5 predictions on image, ', num2str(i)]);
    for j=1:5
        disp([labels{j},' ',num2str(scores(j,i), '%2.2f'),'%'])
    end
end
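The transpose-then-sort step above can be easier to follow on a small matrix. The sketch below uses made-up scores for two images and four classes, so toyScores, topK, topIdx, and topVal are illustrative names rather than part of the example. It assumes, as the code above does, that each row of the score matrix corresponds to one image.

```matlab
% Toy scores: 2 images (rows) by 4 classes (columns). Values are made up.
toyScores = [0.10 0.60 0.25 0.05;
             0.70 0.04 0.06 0.20];

% Transpose so that each column holds the scores of one image, then sort
% each column in descending order. indx holds the original class indices.
[val, indx] = sort(transpose(toyScores), 'descend');

topK = 2;
topIdx = indx(1:topK,:);      % class indices of the top-2 scores per image
topVal = val(1:topK,:)*100;   % corresponding scores as percentages
disp(topIdx);
```

For the first image the two highest-scoring classes are 2 and 3, and for the second image they are 1 and 4, so topIdx is [2 1; 3 4].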
For predictions on the first image, map the top five prediction scores to words in the synset dictionary.
fid = fopen('synsetWords.txt');
synsetOut = textscan(fid,'%s', 'delimiter', '\n');
synsetOut = synsetOut{1};
fclose(fid);
[val,indx] = sort(transpose(predict_scores), 'descend');
scores = val(1:5,1)*100;
top5labels = synsetOut(indx(1:5,1));
Display the top five classification labels on the image.
outputImage = zeros(224,400,3, 'uint8');
for k = 1:3
    outputImage(:,177:end,k) = imBatch(:,:,k,1);
end
scol = 1;
srow = 1;
outputImage = insertText(outputImage, [scol, srow], 'Classification with ResNet-50', 'TextColor', 'w','FontSize',20, 'BoxColor', 'black');
srow = srow + 30;
for k = 1:5
    outputImage = insertText(outputImage, [scol, srow], [top5labels{k},' ',num2str(scores(k), '%2.2f'),'%'], 'TextColor', 'w','FontSize',15, 'BoxColor', 'black');
    srow = srow + 25;
end
imshow(outputImage);
Clear the persistent network object from memory.
clear mex;
Define the resnet_predict_exe Entry-Point Function
To generate an executable from MATLAB code, define a new entry-point function resnet_predict_exe. This function is similar to the previous entry-point function resnet_predict but, in addition, includes code for preprocessing and postprocessing. The API that resnet_predict_exe uses is platform independent. This function accepts a video and the batch size as input arguments. These arguments are compile-time constants.
type resnet_predict_exe
% Copyright 2020 The MathWorks, Inc.

function resnet_predict_exe(inputVideo,batchSize)
%#codegen

    % A persistent object mynet is used to load the network object.
    % At the first call to this function, the persistent object is constructed and
    % set up. When the function is called subsequent times, the same object is reused
    % to call predict on inputs, avoiding reconstructing and reloading the
    % network object.
    persistent mynet;

    if isempty(mynet)
        % Call the function resnet50 that returns a DAG network
        % for ResNet-50 model.
        mynet = coder.loadDeepLearningNetwork('resnet50','resnet');
    end

    % Create video reader and video player objects %
    videoReader = VideoReader(inputVideo);
    depVideoPlayer = vision.DeployableVideoPlayer;

    % Read the classification label names %
    synsetOut = readImageClassLabels('synsetWords.txt');

    i=1;
    % Read frames until end of video file %
    while ~(i+batchSize > (videoReader.NumFrames+1))
        % Read and resize batch of frames as specified by input argument%
        reSizedImagesBatch = readImageInputBatch(videoReader,batchSize,i);

        % run predict on resized input images %
        predict_scores = mynet.predict(reSizedImagesBatch);

        % overlay the prediction scores on images and display %
        overlayResultsOnImages(predict_scores,synsetOut,reSizedImagesBatch,batchSize,depVideoPlayer)

        i = i+ batchSize;
    end
    release(depVideoPlayer);
end

function synsetOut = readImageClassLabels(classLabelsFile)
% Read the classification label names from the file %
%
% Inputs :
% classLabelsFile - supplied by user
%
% Outputs :
% synsetOut - cell array filled with 1000 image class labels

    synsetOut = cell(1000,1);
    fid = fopen(classLabelsFile);
    for i = 1:1000
        synsetOut{i} = fgetl(fid);
    end
    fclose(fid);
end

function reSizedImagesBatch = readImageInputBatch(videoReader,batchSize,i)
% Read and resize batch of frames as specified by input argument%
%
% Inputs :
% videoReader - Object used for reading the images from video file
% batchSize - Number of images in batch to process. Supplied by user
% i - index to track frames read from video file
%
% Outputs :
% reSizedImagesBatch - Batch of images resized to 224x224x3xbatchsize

    img = read(videoReader,[i (i+batchSize-1)]);
    reSizedImagesBatch = coder.nullcopy(ones(224,224,3,batchSize,'like',img));
    resizeTo = coder.const([224,224]);
    reSizedImagesBatch(:,:,:,:) = imresize(img,resizeTo);
end

function overlayResultsOnImages(predict_scores,synsetOut,reSizedImagesBatch,batchSize,depVideoPlayer)
% Overlay the prediction results on the images and display them %
%
% Inputs :
% predict_scores - classification results for given network
% synsetOut - cell array filled with 1000 image class labels
% reSizedImagesBatch - Batch of images resized to 224x224x3xbatchsize
% batchSize - Number of images in batch to process. Supplied by user
% depVideoPlayer - Object for displaying results
%
% Outputs :
% Predicted results overlaid on input images

    % sort the predicted scores %
    [val,indx] = sort(transpose(predict_scores), 'descend');

    for j = 1:batchSize
        scores = val(1:5,j)*100;
        outputImage = zeros(224,400,3, 'uint8');
        for k = 1:3
            outputImage(:,177:end,k) = reSizedImagesBatch(:,:,k,j);
        end

        % Overlay the results on image %
        scol = 1;
        srow = 1;
        outputImage = insertText(outputImage, [scol, srow], 'Classification with ResNet-50', 'TextColor', [255 255 255],'FontSize',20, 'BoxColor', [0 0 0]);
        srow = srow + 30;
        for k = 1:5
            scoreStr = sprintf('%2.2f',scores(k));
            outputImage = insertText(outputImage, [scol, srow], [synsetOut{indx(k,j)},' ',scoreStr,'%'], 'TextColor', [255 255 255],'FontSize',15, 'BoxColor', [0 0 0]);
            srow = srow + 25;
        end

        depVideoPlayer(outputImage);
    end
end
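The while loop in resnet_predict_exe advances through the video one full batch at a time and stops when fewer than batchSize frames remain. The sketch below reproduces only that index arithmetic. The frame count of 12 is a hypothetical value, and batchRanges and framesSkipped are illustrative names that do not appear in the example.

```matlab
% Hypothetical frame count; the real value comes from videoReader.NumFrames.
numFrames = 12;
batchSize = 5;

i = 1;
batchRanges = zeros(0,2);   % each row: [firstFrame lastFrame] of one batch

% Same loop condition as in resnet_predict_exe.
while ~(i + batchSize > (numFrames + 1))
    batchRanges(end+1,:) = [i, i + batchSize - 1]; %#ok<SAGROW>
    i = i + batchSize;
end

% Frames at the end of the video that do not fill a complete batch.
framesSkipped = numFrames - size(batchRanges,1)*batchSize;
disp(batchRanges);
```

With these values the loop processes frames 1 to 5 and 6 to 10, and the last two frames are not processed because they do not fill a batch.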
Structure of the resnet_predict_exe Function
The function resnet_predict_exe contains four subsections that perform these actions:
Read the classification labels from the supplied input text file
Read the input batch of images and resize them as needed by the network
Run inference on the input image batch
Overlay the results on the images
For more information about each of these steps, see the subsequent sections.
The readImageClassLabels Function
This function accepts the synsetWords.txt file as an input argument. It reads the classification labels and populates a cell array.
function synsetOut = readImageClassLabels(classLabelsFile)
% Read the classification label names from the file %
%
% Inputs :
% classLabelsFile - supplied by user
%
% Outputs :
% synsetOut - cell array filled with 1000 image class labels

    synsetOut = cell(1000,1);
    fid = fopen(classLabelsFile);
    for i = 1:1000
        synsetOut{i} = fgetl(fid);
    end
    fclose(fid);
end
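To see the line-by-line reading pattern in isolation, the following sketch writes a small temporary label file and reads it back with fgetl into a preallocated cell array. The three labels and the names labelFile, toyLabels, and labelsOut are made up for illustration; the real synsetWords.txt file is expected to hold 1000 entries.

```matlab
% Write a tiny stand-in label file (made-up labels, not the real synset file).
labelFile = [tempname '.txt'];
toyLabels = {'n00000001 first label'; 'n00000002 second label'; 'n00000003 third label'};
fid = fopen(labelFile,'w');
fprintf(fid,'%s\n',toyLabels{:});
fclose(fid);

% Read the labels back one line at a time into a preallocated cell array,
% following the same pattern as readImageClassLabels.
numLabels = numel(toyLabels);
labelsOut = cell(numLabels,1);
fid = fopen(labelFile,'r');
for i = 1:numLabels
    labelsOut{i} = fgetl(fid);   % fgetl drops the trailing newline
end
fclose(fid);
delete(labelFile);
disp(labelsOut);
```

Preallocating the cell array with a known size keeps the output size fixed, which is likely why the example hard-codes 1000 entries for code generation.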
The readImageInputBatch Function
This function reads and resizes the images from the video input file that is passed to the function as an input argument. It reads the specified input images and resizes them to 224x224x3, which is the size that the ResNet-50 network expects.
function reSizedImagesBatch = readImageInputBatch(videoReader,batchSize,i)
% Read and resize batch of frames as specified by input argument%
%
% Inputs :
% videoReader - Object used for reading the images from video file
% batchSize - Number of images in batch to process. Supplied by user
% i - index to track frames read from video file
%
% Outputs :
% reSizedImagesBatch - Batch of images resized to 224x224x3xbatchsize

    img = read(videoReader,[i (i+batchSize-1)]);
    reSizedImagesBatch = coder.nullcopy(ones(224,224,3,batchSize,'like',img));
    resizeTo = coder.const([224,224]);
    reSizedImagesBatch(:,:,:,:) = imresize(img,resizeTo);
end
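The resize step relies on imresize changing only the first two dimensions of an array with more than two dimensions, so the channel and batch dimensions pass through unchanged. The sketch below checks this on a synthetic batch. The 240-by-320 frame size is a hypothetical value, and the code needs Image Processing Toolbox, as the example itself does.

```matlab
% Synthetic batch standing in for frames read from a video file.
% The 240-by-320 frame size is a hypothetical value.
batchSize = 5;
img = zeros(240,320,3,batchSize,'uint8');

% imresize resizes rows and columns only; channels and batch are preserved.
resized = imresize(img,[224 224]);
disp(size(resized));
```

The output keeps the uint8 class of the input, which matches the use of the 'like' option when the function preallocates reSizedImagesBatch.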
The mynet.predict Function
This function accepts the resized batch of images as input and returns the prediction results.
% run predict on resized input images %
predict_scores = mynet.predict(reSizedImagesBatch);
The overlayResultsOnImages Function
This function accepts the prediction results and sorts them in descending order. It overlays these results on the input images and displays them.
function overlayResultsOnImages(predict_scores,synsetOut,reSizedImagesBatch,batchSize,depVideoPlayer)
% Overlay the prediction results on the images and display them %
%
% Inputs :
% predict_scores - classification results for given network
% synsetOut - cell array filled with 1000 image class labels
% reSizedImagesBatch - Batch of images resized to 224x224x3xbatchsize
% batchSize - Number of images in batch to process. Supplied by user
% depVideoPlayer - Object for displaying results
%
% Outputs :
% Predicted results overlaid on input images

    % sort the predicted scores %
    [val,indx] = sort(transpose(predict_scores), 'descend');

    for j = 1:batchSize
        scores = val(1:5,j)*100;
        outputImage = zeros(224,400,3, 'uint8');
        for k = 1:3
            outputImage(:,177:end,k) = reSizedImagesBatch(:,:,k,j);
        end

        % Overlay the results on image %
        scol = 1;
        srow = 1;
        outputImage = insertText(outputImage, [scol, srow], 'Classification with ResNet-50', 'TextColor', [255 255 255],'FontSize',20, 'BoxColor', [0 0 0]);
        srow = srow + 30;
        for k = 1:5
            scoreStr = sprintf('%2.2f',scores(k));
            outputImage = insertText(outputImage, [scol, srow], [synsetOut{indx(k,j)},' ',scoreStr,'%'], 'TextColor', [255 255 255],'FontSize',15, 'BoxColor', [0 0 0]);
            srow = srow + 25;
        end

        depVideoPlayer(outputImage);
    end
end
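The output canvas is 224 rows by 400 columns: the resized frame occupies columns 177 through 400, which leaves columns 1 through 176 as a black strip for the text. The sketch below checks that layout without calling insertText. The names canvas, frame, imageCols, and textCols are illustrative, and the all-white frame is a stand-in for a real video frame.

```matlab
% Black canvas with the same size as outputImage in the example.
canvas = zeros(224,400,3,'uint8');

% Stand-in for one resized video frame (all white, for easy checking).
frame = 255*ones(224,224,3,'uint8');

% The frame goes into the rightmost 224 columns of the canvas.
imageCols = 177:size(canvas,2);
canvas(:,imageCols,:) = frame;

% The remaining columns on the left stay black and hold the overlaid text.
textCols = 1:(imageCols(1)-1);
fprintf('Image columns: %d, text columns: %d\n', numel(imageCols), numel(textCols));
```

The example copies the frame one color channel at a time in a loop; the single indexed assignment here produces the same layout.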
Build and Run Executable
Create a code configuration object for generating an executable. Attach a deep learning configuration object to it. Set the batchSize and inputVideoFile variables.
If you do not intend to create a custom C++ main function and want to use the generated example C++ main instead, set the GenerateExampleMain parameter to 'GenerateCodeAndCompile'. Also, disable cfg.EnableOpenMP so that the executable has no OpenMP library dependencies when you run it from the desktop terminal.
cfg = coder.config('exe');
cfg.TargetLang = 'C++';
cfg.DeepLearningConfig = coder.DeepLearningConfig('mkldnn');
batchSize = 5;
inputVideoFile = 'object_class.avi';
cfg.GenerateExampleMain = 'GenerateCodeAndCompile';
cfg.EnableOpenMP = 0;
Run the codegen command to build the executable. Run the generated executable resnet_predict_exe either at the MATLAB command line or at the desktop terminal.
codegen -config cfg resnet_predict_exe -args {coder.Constant(inputVideoFile), coder.Constant(batchSize)} -report
system('./resnet_predict_exe')
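The './resnet_predict_exe' command form applies to Linux and Mac shells. The sketch below shows one way to choose the command by platform and to check that the executable exists before running it. It assumes that the executable is in the current folder and that its Windows file name has an .exe extension. The names exeName, exeFile, and runCmd are illustrative.

```matlab
% Assumption: the executable is in the current folder and, on Windows,
% has an .exe extension.
exeName = 'resnet_predict_exe';
if ispc
    exeFile = [exeName '.exe'];
    runCmd  = exeFile;
else
    exeFile = exeName;
    runCmd  = ['./' exeName];
end

% Run the executable only if it exists, and report its exit status.
if exist(fullfile(pwd,exeFile),'file') == 2
    status = system(runCmd);
    fprintf('Executable exited with status %d\n', status);
else
    fprintf('Executable %s not found in %s\n', exeFile, pwd);
end
```

A nonzero status from system indicates that the command did not complete successfully.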