QNN HTP Predict

Predict responses of a QNN model or QNN context binary for the HTP (NPU) backend

Since R2025b

Libraries:
Embedded Coder Support Package for Qualcomm Hexagon Processors / Hexagon / QNN

Description

The QNN HTP Predict block predicts responses of a deep learning network represented as a QNN model or QNN context binary for the HTP (NPU) backend of Qualcomm® AI Engine Direct, based on the given input data.

To add the block to your Simulink model, open the model (for example, myQNNModel), and enter this command at the MATLAB prompt:

add_block("mwqnnlib/QNN HTP Predict","myQNNModel/QNN HTP Predict")

The QNN HTP Predict block allows you to select a QNN model as a compiled shared object (.so) for running on an x86-based host. For the target, you can select either a compiled shared object (.so) or a QNN context binary file (.bin) that is optimized to run on the HTP (NPU) backend.

The code generated using this block can be deployed to one of these boards, which are available under the Hardware board parameter in Configuration Parameters (see the sketch after this list):

  • Qualcomm Android Board

  • Qualcomm Linux Board

  • Qualcomm Hexagon Android Board, with Processor Version cDSP

  • Qualcomm Hexagon Linux Board, with Processor Version cDSP
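For example, assuming a model named myQNNModel, you can select one of these boards programmatically before generating code. This is a minimal sketch; HardwareBoard is the standard Simulink configuration parameter behind the Hardware board setting, and the board name string must match an installed support package.

open_system("myQNNModel")                                          % open the model
set_param("myQNNModel","HardwareBoard","Qualcomm Android Board")   % select a supported board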

The block also provides the option to dequantize outputs to single-precision, if required.

Ports

Input

The input tensor used for inference with the selected QNN model, represented as an n-D array, in accordance with the Input layer size of the QNN model.

The QNN HTP Predict block supports multiple input and output tensors with a maximum of 4 dimensions each, but the batch size must always be 1. For example, if the input layer of the original deep learning network is 128-by-128-by-3, the input dimension must be either 128-by-128-by-3 or 1-by-128-by-128-by-3.

If the leading dimensions are 1 (singleton dimensions), these dimensions can often be removed without affecting compatibility. For example, if the input layer of an AI model expects an input size of 1-by-1-by-128-by-3, the input can be provided as 1-by-1-by-128-by-3 or simply 128-by-3. This is because dimensions of size 1 can be broadcast to match the expected shape.

The QNN HTP Predict block accepts either floating-point or fixed-point input. The input data type must match the data type of the QNN network's input layer. Additionally, the input can be floating point even for a quantized QNN network.

Data Types: single | half | int8 | int16 | int32 | uint8 | uint16 | uint32
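For example, an input that matches a 128-by-128-by-3 single-precision input layer can be prepared in the MATLAB workspace and passed to the block through a source block. This is an illustrative sketch only; the sizes and random values are assumptions and must match the input layer of your QNN model.

% Example input for a 128-by-128-by-3 single-precision input layer (assumed size)
inputTensor = single(rand(128,128,3));
% The equivalent 1-by-128-by-128-by-3 shape (batch size of 1) is also accepted
inputTensor4D = reshape(inputTensor,[1 128 128 3]);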

Output

The output tensor produced by inference with the selected QNN model, represented as an n-D array, in accordance with the output layer of the QNN model. The output data types match the data types of the QNN network's output layers.

Data Types: single | half | int8 | int16 | int32 | uint8 | uint16 | uint32

Parameters

Select the format of the deep learning network optimized to run on the HTP (NPU) backend on the target.

Click Browse and select the QNN model (compiled shared object (.so) or .dll) to perform inference on the host. For details on creating a QNN model to run on device processors such as the HTP, refer to the Qualcomm AI Engine Direct SDK documentation.

Note

This parameter does not appear if the host operating system is Linux® and you select BINARY for the Deep Learning network format parameter. In this case, the block also uses the context binary file (which you select using the Context binary file parameter) as the host (x86) QNN model file.

Click Browse and select the QNN model (compiled shared object (.so)) to perform inference on the target. For details on creating a QNN model to run on device processors such as the HTP, refer to the Qualcomm AI Engine Direct SDK documentation.

Dependencies

This parameter appears only if you select QNN-model for the Deep Learning network format parameter.

Click Browse and select the QNN context binary file (.bin) to perform inference on the target. For details on creating a context binary file to run on device processors such as the HTP, refer to the Qualcomm AI Engine Direct SDK documentation.

Select this check box to dequantize the block's output. Enabling this option results in the output data type always being single, irrespective of the output layer data type of the deep learning network.
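If you leave this check box cleared, you can dequantize a quantized output yourself. The following is a minimal sketch of the generic affine dequantization commonly used for quantized layers; quantizedOut, scale, and offset are assumed, model-specific values, and the exact convention used by your network may differ.

scale  = 0.0039;                                        % assumed per-tensor quantization scale
offset = 0;                                             % assumed zero point (offset)
outSingle = scale .* (single(quantizedOut) - offset);   % recover single-precision values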

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using Simulink® Coder™.
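For example, after you configure the hardware board and block parameters, you can build the model from the MATLAB prompt. A minimal sketch, assuming the model is named myQNNModel:

slbuild("myQNNModel")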

Version History

Introduced in R2025b