Main Content

estimatePerformance

Class: dlhdl.ProcessorConfig
Package: dlhdl

Retrieve layer-level latencies and performance by using estimatePerformance method

Description

estimatePerformance(processorConfigObject, network) returns the layer-level latencies and network performance for the object specified by the network argument.

performance = estimatePerformance(processorConfigObject,network) returns a table containing the network object layer-level latencies and performance.

example

performance = estimatePerformance(processorConfigObject,network,Name,Value) returns a table containing the network object layer-level latencies and performance, with one or more arguments specified by optional name-value pair arguments.

Input Arguments

expand all

Processor configuration, specified as a dlhdl.ProcessorConfig object.

Name of network object for performance estimate.

Example: estimatePerformance(snet)

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Number of frames to consider for the calculation of performance estimation, specified as a positive number integer.

Example: 'FrameCount',10

Output Arguments

expand all

Network object performance for the ProcessorConfig object, returned as a table.

Examples

expand all

  1. Create a file in your current working folder called getLogoNetwork.m. In the file, enter:

    function net = getLogoNetwork
     if ~isfile('LogoNet.mat')
            url = 'https://www.mathworks.com/supportfiles/gpucoder/cnn_models/logo_detection/LogoNet.mat';
            websave('LogoNet.mat',url);
        end
        data = load('LogoNet.mat');
        net  = data.convnet;
    end
  2. Create a dlhdl.ProcessorConfig object.

    snet = getLogoNetwork;
    hPC = dlhdl.ProcessorConfig;
  3. To retrieve the layer-level latencies and performance for the LogoNet network, call the estimatePerformance method.

    hPC.estimatePerformance(snet)
    ### Notice: The layer 'imageinput' of type 'ImageInputLayer' is split into an image input layer 'imageinput' and an addition layer 'imageinput_norm' for normalization on hardware.
    ### Notice: The layer 'softmax' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software.
    ### Notice: The layer 'classoutput' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software.
    
    
                  Deep Learning Processor Estimator Performance Results
    
                       LastFrameLatency(cycles)   LastFrameLatency(seconds)       FramesNum      Total Latency     Frames/s
                             -------------             -------------              ---------        ---------       ---------
    Network                   38810151                  0.19405                       1           38810151              5.2
        ____imageinput_norm     216472                  0.00108 
        ____conv_1             6829224                  0.03415 
        ____maxpool_1          3705912                  0.01853 
        ____conv_2            10454501                  0.05227 
        ____maxpool_2          1173810                  0.00587 
        ____conv_3             9364533                  0.04682 
        ____maxpool_3          1229970                  0.00615 
        ____conv_4             1384564                  0.00692 
        ____maxpool_4            24450                  0.00012 
        ____fc_1               2644886                  0.01322 
        ____fc_2               1692534                  0.00846 
        ____fc_3                 89295                  0.00045 
     * The clock frequency of the DL processor is: 200MHz

Estimate the performance of the ResNet-18 network for multiple frames by using the dlhdl.ProcessorConfig object.

Load the ResNet-18 network and save it to net

net = resnet18;

Create a dlhdl.ProcessorConfig object and save to hPC

hPC = dlhdl.ProcessorConfig;

Retrieve layer level latencies and performance in frames per second (FPS) for multiple frames by using the estimatePerformance method with FrameNumber as an optional input argument.

hPC.estimatePerformance(net,'FrameCount',10);
### Optimizing network: Fused 'nnet.cnn.layer.BatchNormalizationLayer' into 'nnet.cnn.layer.Convolution2DLayer'
### Notice: The layer 'data' of type 'ImageInputLayer' is split into an image input layer 'data', an addition layer 'data_norm_add', and a multiplication layer 'data_norm' for hardware normalization.
### Notice: The layer 'prob' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software.
### Notice: The layer 'ClassificationLayer_predictions' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software.


              Deep Learning Processor Estimator Performance Results

                   LastFrameLatency(cycles)   LastFrameLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                   21328236                  0.10664                      10          211208850              9.5
    ____data_norm_add       210750                  0.00105 
    ____data_norm           210750                  0.00105 
    ____conv1              2164124                  0.01082 
    ____pool1               515064                  0.00258 
    ____res2a_branch2a      966221                  0.00483 
    ____res2a_branch2b      966221                  0.00483 
    ____res2a               210750                  0.00105 
    ____res2b_branch2a      966221                  0.00483 
    ____res2b_branch2b      966221                  0.00483 
    ____res2b               210750                  0.00105 
    ____res3a_branch1       540861                  0.00270 
    ____res3a_branch2a      540749                  0.00270 
    ____res3a_branch2b      919117                  0.00460 
    ____res3a               105404                  0.00053 
    ____res3b_branch2a      919117                  0.00460 
    ____res3b_branch2b      919117                  0.00460 
    ____res3b               105404                  0.00053 
    ____res4a_branch1       503405                  0.00252 
    ____res4a_branch2a      509261                  0.00255 
    ____res4a_branch2b      905421                  0.00453 
    ____res4a                52724                  0.00026 
    ____res4b_branch2a      905421                  0.00453 
    ____res4b_branch2b      905421                  0.00453 
    ____res4b                52724                  0.00026 
    ____res5a_branch1       744525                  0.00372 
    ____res5a_branch2a      751693                  0.00376 
    ____res5a_branch2b     1415373                  0.00708 
    ____res5a                26368                  0.00013 
    ____res5b_branch2a     1415373                  0.00708 
    ____res5b_branch2b     1415373                  0.00708 
    ____res5b                26368                  0.00013 
    ____pool5                54594                  0.00027 
    ____fc1000              207351                  0.00104 
 * The clock frequency of the DL processor is: 200MHz

Tips

To obtain the performance estimation for a dlquantizer object, set the dlhdl.ProcessorConfig object ProcessorDataType to int8.

Version History

Introduced in R2021a