Struggling with QNN-LLM NPU Acceleration on Snapdragon (MNN/QAIRT) #4012
liu-mengyang asked this question in Q&A
I'm trying to run LLMs with the MNN framework, using the QNN backend and QAIRT 2.38 on a Snapdragon SoC. I successfully built the repo and converted/quantized the model by following the official MNN QNN-LLM tutorial.
I am running into three main issues and need help confirming successful NPU usage:
1. Cannot Confirm NPU Execution
The MNN documentation says that NPU inference is registered under the CPU backend, which is confusing.
When I follow that CPU-backend configuration, I cannot verify that the NPU is ever used: a print statement I added to QNNBackend::executeGraph() never executes.
Question: How can I verify that the NPU is actually being used when the model is configured with the CPU backend?
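For reference, the instrumentation I added looks roughly like this (a minimal sketch; the file location and surrounding function signature are approximate, only the added log line is what I actually inserted):

```cpp
// Sketch of the debug print added inside the QNN backend source.
// The exact signature of QNNBackend::executeGraph() is my recollection and
// may differ from the current code; only the MNN_PRINT line was added.
void QNNBackend::executeGraph() {
    // MNN_PRINT is MNN's standard logging macro from <MNN/MNNDefine.h>.
    MNN_PRINT("QNNBackend::executeGraph() entered -- QNN/NPU graph executing\n");
    // ... original graph execution code unchanged ...
}
```

This line never shows up in the logs, which is why I suspect the QNN path is not actually being taken.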
2. Segmentation Fault with 'npu' Backend
If I explicitly set the backend to npu in the configuration file, I consistently get a segmentation fault. As I understand it, this setting is supposed to trigger the online graph-construction mode, but I can't get it working.
Question: What specific steps or prerequisites are needed to run with the explicit npu backend without crashing?
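For context, the relevant part of my config.json looks roughly like this (a sketch from memory; the key names and file names are approximations of what the tutorial uses, and "npu" is the value I set for the backend):

```json
{
  "llm_model": "llm.mnn",
  "llm_weight": "llm.mnn.weight",
  "backend_type": "npu",
  "thread_num": 4
}
```

With "backend_type": "cpu" the model runs (but apparently without the NPU, per issue 1); changing it to "npu" is what produces the segmentation fault.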
3. VLM (Vision-Language Model) Conversion
The provided llm-qnn generator only converts the language model part.
I can't find a way to convert or run the visual encoder of a VLM on the NPU.
Question: Has anyone successfully converted and run a full VLM (including the visual encoder) on the NPU using this toolchain?
Please share your setup details and success stories if you have managed to get QNN-LLM running reliably on a Snapdragon NPU. Thanks!