Deploy Transfer Learning Network for Lane Detection

This example shows how to create, compile, and deploy a dlhdl.Workflow object that has a lane detection convolutional neural network as the network object by using the Deep Learning HDL Toolbox™ Support Package for Xilinx FPGA and SoC. The network detects and outputs lane marker boundaries. Use MATLAB® to retrieve the prediction results from the target device.

Prerequisites

  • Xilinx ZCU102 SoC development kit

  • Deep Learning HDL Toolbox™ Support Package for Xilinx FPGA and SoC

  • Deep Learning Toolbox™

  • Deep Learning HDL Toolbox™

Load the Pretrained SeriesNetwork

To load the pretrained series network lanenet, enter:

snet = getLaneDetectionNetwork;

Normalize the Input Layer

To replace the input layer with one whose 'Normalization' property is set to 'none', enter:

inputlayer = imageInputLayer(snet.Layers(1).InputSize, 'Normalization','none');
snet = SeriesNetwork([inputlayer; snet.Layers(2:end)]);

To view the layers of the pretrained series network, enter:

analyzeNetwork(snet)
% The saved network contains 23 layers, including input, convolution, ReLU, cross channel normalization,
% max pool, fully connected, and regression output layers.


Create Target Object

Create a target object that has a custom name for your target device and an interface to connect your target device to the host computer. Interface options are JTAG and Ethernet.

hTarget = dlhdl.Target('Xilinx','Interface','Ethernet');
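
The same constructor accepts the other interface option named above. The following is a minimal sketch, assuming the board is connected to the host over a JTAG cable; the variable name hTargetJTAG is hypothetical and is not used elsewhere in this example.

```matlab
% Sketch only: create the target with the JTAG interface instead of Ethernet.
% hTargetJTAG is an illustrative name; the rest of this example uses hTarget.
hTargetJTAG = dlhdl.Target('Xilinx', 'Interface', 'JTAG');
```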

Generate Bitstream to Run Network

The lane detection network consists of multiple Cross Channel Normalization layers. To support this layer on hardware, the 'LRNBlockGeneration' property of the conv module needs to be turned on in the bitstream used for FPGA inference. The shipping zcu102_single bitstream does not have this property turned on. A new bitstream can be generated using the following lines of code. The generated bitstream can be used along with a dlhdl.Workflow object for inference.

When creating a dlhdl.ProcessorConfig object for an existing shipping bitstream, make sure that the bitstream name matches the data type and the FPGA board that you are targeting. In this example, the target FPGA board is the Xilinx ZCU102 SoC board and the data type is single. Update the processor configuration with 'LRNBlockGeneration' turned on and 'SegmentationBlockGeneration' turned off. Turning the latter off lets the Deep Learning IP fit on the FPGA and avoids overutilization of resources.

% hPC = dlhdl.ProcessorConfig('Bitstream', 'zcu102_single');
% hPC.setModuleProperty('conv', 'LRNBlockGeneration', 'on');
% hPC.setModuleProperty('conv', 'SegmentationBlockGeneration', 'off');
% dlhdl.buildProcessor(hPC)

If targeting the Xilinx ZC706 board, replace 'zcu102_single' with 'zc706_single' in the first command above.

To learn how to use the generated bitstream file, see Generate Custom Bitstream.

Create Workflow Object

Create an object of the dlhdl.Workflow class. When you create the object, specify the network and the bitstream name. Make sure to use the generated bitstream, which enables processing of Cross Channel Normalization layers on the FPGA. Specify the saved pretrained lanenet neural network, snet, as the network.

hW = dlhdl.Workflow('network', snet, 'Bitstream', 'dlprocessor.bit','Target',hTarget);

Compile the Lanenet Series Network

To compile the lanenet series network, run the compile function of the dlhdl.Workflow object.

dn = hW.compile;
          offset_name          offset_address     allocated_space 
    _______________________    ______________    _________________

    "InputDataOffset"           "0x00000000"     "24.0 MB"        
    "OutputResultOffset"        "0x01800000"     "4.0 MB"         
    "SystemBufferOffset"        "0x01c00000"     "28.0 MB"        
    "InstructionDataOffset"     "0x03800000"     "4.0 MB"         
    "ConvWeightDataOffset"      "0x03c00000"     "16.0 MB"        
    "FCWeightDataOffset"        "0x04c00000"     "148.0 MB"       
    "EndOffset"                 "0x0e000000"     "Total: 224.0 MB"
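
The allocated_space column follows directly from the offsets: each region extends up to the next offset. As a quick sanity check, the sketch below recomputes the sizes from the addresses printed above. The variable names are illustrative, and the offsets are copied by hand from this particular compile output, so they will differ for another network or bitstream.

```matlab
% Offsets copied from the compile output above (the last entry is EndOffset).
names   = ["InputDataOffset" "OutputResultOffset" "SystemBufferOffset" ...
           "InstructionDataOffset" "ConvWeightDataOffset" "FCWeightDataOffset"];
offsets = ["0x00000000" "0x01800000" "0x01c00000" "0x03800000" ...
           "0x03c00000" "0x04c00000" "0x0e000000"];

% Strip the "0x" prefix, then convert each hexadecimal address to bytes.
addr = hex2dec(extractAfter(offsets, 2));
addr = addr(:);                     % force a column vector

% Each region runs up to the next offset; 1 MB is 2^20 bytes.
sizeMB  = diff(addr) / 2^20;
totalMB = addr(end) / 2^20;

disp(table(names(:), sizeMB, 'VariableNames', {'Region', 'AllocatedMB'}));
fprintf('Total: %.1f MB\n', totalMB);
```

The computed sizes match the table: 24, 4, 28, 4, 16, and 148 MB, for a total of 224 MB.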

Program Bitstream onto FPGA and Download Network Weights

To deploy the network on the Xilinx ZCU102 SoC hardware, run the deploy function of the dlhdl.Workflow object. This function uses the output of the compile function to program the FPGA board by using the programming file. It also downloads the network weights and biases. The deploy function starts programming the FPGA device and displays progress messages and the time it takes to deploy the network.

hW.deploy

### FPGA bitstream programming has been skipped as the same bitstream is already loaded on the target FPGA.
### Loading weights to FC Processor.
### 13% finished, current time is 28-Jun-2020 12:36:09.
### 25% finished, current time is 28-Jun-2020 12:36:10.
### 38% finished, current time is 28-Jun-2020 12:36:11.
### 50% finished, current time is 28-Jun-2020 12:36:12.
### 63% finished, current time is 28-Jun-2020 12:36:13.
### 75% finished, current time is 28-Jun-2020 12:36:14.
### 88% finished, current time is 28-Jun-2020 12:36:14.
### FC Weights loaded. Current time is 28-Jun-2020 12:36:15

Run Prediction for Example Video

Run the demoOnVideo function for the dlhdl.Workflow class object. This function loads the example video, executes the predict function of the dlhdl.Workflow object, and then plots the result.
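
For readers who want to call the hardware directly, the prediction step can be sketched for a single frame as follows. This is a hedged illustration, not the contents of demoOnVideo: the all-zeros frame is a placeholder, and the variable names frame, prediction, and speed are assumptions. It requires the deployed hW object and a connected board.

```matlab
% Sketch only: run one frame through the deployed network with profiling on.
inputSize = snet.Layers(1).InputSize;    % read the expected size from the network
frame = zeros(inputSize, 'single');      % placeholder; substitute a resized video frame

% predict returns the network output and the profiler results.
[prediction, speed] = hW.predict(frame, 'Profile', 'on');
```

The profiler output that follows comes from the demoOnVideo run, not from this sketch.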

### Finished writing input activations.
### Running single input activations.

              Deep Learning Processor Profiler Performance Results

                   LastLayerLatency(cycles)   LastLayerLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                   24904175                  0.11320                       1           24904217              8.8
    conv_module            8967009                  0.04076 
        conv1              1396633                  0.00635 
        norm1               623003                  0.00283 
        pool1               226855                  0.00103 
        conv2              3410044                  0.01550 
        norm2               378531                  0.00172 
        pool2               233635                  0.00106 
        conv3              1139419                  0.00518 
        conv4               892918                  0.00406 
        conv5               615897                  0.00280 
        pool5                50189                  0.00023 
    fc_module             15937166                  0.07244 
        fc6               15819257                  0.07191 
        fcLane1             117125                  0.00053 
        fcLane2                782                  0.00000 
 * The clock frequency of the DL processor is: 220MHz
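
The seconds and Frames/s columns can be derived from the cycle counts and the reported clock frequency. The sketch below redoes that arithmetic with values copied by hand from the table above; the variable names are illustrative.

```matlab
% Values copied from the profiler table above.
clockHz    = 220e6;       % DL processor clock frequency
networkCyc = 24904175;    % Network, LastLayerLatency(cycles)
totalCyc   = 24904217;    % Network, Total Latency (cycles)
convCyc    = 8967009;     % conv_module
fcCyc      = 15937166;    % fc_module

latencySec   = networkCyc / clockHz;   % cycles divided by cycles per second
framesPerSec = clockHz / totalCyc;     % one frame per total latency
convShare    = convCyc / networkCyc;   % fraction of cycles in conv_module
fcShare      = fcCyc / networkCyc;     % fraction of cycles in fc_module

fprintf('Latency: %.5f s, throughput: %.1f frames/s\n', latencySec, framesPerSec);
fprintf('conv_module: %.1f%%, fc_module: %.1f%%\n', 100*convShare, 100*fcShare);
```

The two module counts add up to the network total, and the fully connected module, dominated by fc6, accounts for roughly 64% of the cycles.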