QNN HTP Predict

Predict responses of a QNN model or QNN context binary for the HTP (NPU) backend

Since R2025b

Libraries:
Embedded Coder Support Package for Qualcomm Hexagon Processors / Hexagon / QNN

Description

The QNN HTP Predict block predicts responses of a deep learning network represented as a QNN model or QNN context binary for the HTP (NPU) backend of Qualcomm® AI Engine Direct, based on the given input data.

To add the block to your Simulink model, open the model (for example, myQNNModel), and enter this command at the MATLAB prompt:

add_block("mwqnnlib/QNN HTP Predict","myQNNModel/QNN HTP Predict")

The QNN HTP Predict block allows you to select a QNN model as a compiled shared object (.so) for running on an x86-based host. For the target, you can select either a compiled shared object (.so) or a QNN context binary file (.bin) that is optimized to run on the HTP (NPU) backend.

The code generated using this block can be deployed to one of these boards, which are available under the Hardware board parameter in Configuration Parameters (see the sketch after this list):

  • Qualcomm Android Board

  • Qualcomm Linux Board

  • Qualcomm Hexagon Android Board, with Processor Version cDSP

  • Qualcomm Hexagon Linux Board, with Processor Version cDSP
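For example, assuming a model named myQNNModel, you can select one of these boards programmatically before generating code. This is a minimal sketch; HardwareBoard is the standard Simulink configuration parameter behind the Hardware board setting, and the board name string must match an installed support package.

open_system("myQNNModel")                                          % open the model
set_param("myQNNModel","HardwareBoard","Qualcomm Android Board")   % select a supported board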

The block also provides the option to dequantize outputs to single-precision, if required.

Ports

Input

The input tensor used for inference with the selected QNN model, represented as an n-D array, in accordance with the Input layer size of the QNN model.

The QNN HTP Predict block supports multiple input and output tensors with a maximum of 4 dimensions each, but the batch size must always be 1. For example, if the input layer of the original deep learning network is 128-by-128-by-3, the input dimension must be either 128-by-128-by-3 or 1-by-128-by-128-by-3.

If the leading dimensions are 1 (singleton dimensions), these dimensions can often be removed without affecting compatibility. For example, if the input layer of an AI model expects an input size of 1-by-1-by-128-by-3, the input can be provided as 1-by-1-by-128-by-3 or simply 128-by-3. This is because dimensions of size 1 can be broadcast to match the expected shape.

The QNN HTP Predict block accepts either floating-point or fixed-point input. The input data type must match the data type of the QNN network's input layer. Additionally, the input can be floating point even for a quantized QNN network.

Data Types: single | half | int8 | int16 | int32 | uint8 | uint16 | uint32
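For example, an input that matches a 128-by-128-by-3 single-precision input layer can be prepared in the MATLAB workspace and passed to the block through a source block. This is an illustrative sketch only; the sizes and random values are assumptions and must match the input layer of your QNN model.

% Example input for a 128-by-128-by-3 single-precision input layer (assumed size)
inputTensor = single(rand(128,128,3));
% The equivalent 1-by-128-by-128-by-3 shape (batch size of 1) is also accepted
inputTensor4D = reshape(inputTensor,[1 128 128 3]);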

Output

The output tensor produced by inference with the selected QNN model, represented as an n-D array, in accordance with the output layer of the QNN model. The output data types match the data types of the QNN network's output layers.

Data Types: single | half | int8 | int16 | int32 | uint8 | uint16 | uint32

Parameters

Select the format of the deep learning network optimized to run on the HTP (NPU) backend on the target.

Click Browse and select the QNN model (compiled shared object (.so) or .dll) to perform inference on the host. For details on creating a QNN model to run on device processors such as the HTP, refer to the Qualcomm AI Engine Direct SDK documentation.

Note

This parameter does not appear if the host operating system is Linux® and you select BINARY for the Deep Learning network format parameter. In this case, the block also uses the context binary file (which you select using the Context binary file parameter) as the host (x86) QNN model file.

Click Browse and select the QNN model (compiled shared object (.so)) to perform inference on the target. For details on creating a QNN model to run on device processors such as the HTP, refer to the Qualcomm AI Engine Direct SDK documentation.

Dependencies

This parameter appears only if you select QNN-model for the Deep Learning network format parameter.

Click Browse and select the QNN context binary file (.bin) to perform inference on the target. For details on creating a context binary file to run on device processors such as the HTP, refer to the Qualcomm AI Engine Direct SDK documentation.

Select this check box to dequantize the block's output. Enabling this option results in the output data type always being single, irrespective of the output layer data type of the deep learning network.
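If you leave this check box cleared, you can dequantize a quantized output yourself. The following is a minimal sketch of the generic affine dequantization commonly used for quantized layers; quantizedOut, scale, and offset are assumed, model-specific values, and the exact convention used by your network may differ.

scale  = 0.0039;                                        % assumed per-tensor quantization scale
offset = 0;                                             % assumed zero point (offset)
outSingle = scale .* (single(quantizedOut) - offset);   % recover single-precision values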

Extended Capabilities

C/C++ Code Generation
Generate C and C++ code using Simulink® Coder™.
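For example, after you configure the hardware board and block parameters, you can build the model from the MATLAB prompt. A minimal sketch, assuming the model is named myQNNModel:

slbuild("myQNNModel")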

Version History

Introduced in R2025b