Struggling with QNN-LLM NPU Acceleration on Snapdragon (MNN/QAIRT) #4012
liu-mengyang asked this question in Q&A
I'm trying to run LLMs with the MNN framework, using the QNN backend and QAIRT 2.38 on a Snapdragon SoC. I successfully built the repo and converted/quantized the model by following the official MNN QNN-LLM tutorial.
I am running into three main issues and need help confirming successful NPU usage:
1. Cannot Confirm NPU Execution
The MNN documentation says that NPU inference is registered under the CPU backend, which is confusing.
When I follow that CPU-backend configuration, I cannot verify that the NPU is ever used: a print statement I added to QNNBackend::executeGraph() never executes.
Question: How can I verify that the NPU is actually being used when the model is configured with the CPU backend?
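For reference, the instrumentation I added looks roughly like this (a minimal sketch; the file location and surrounding function signature are approximate, only the added log line is what I actually inserted):

```cpp
// Sketch of the debug print added inside the QNN backend source.
// The exact signature of QNNBackend::executeGraph() is my recollection and
// may differ from the current code; only the MNN_PRINT line was added.
void QNNBackend::executeGraph() {
    // MNN_PRINT is MNN's standard logging macro from <MNN/MNNDefine.h>.
    MNN_PRINT("QNNBackend::executeGraph() entered -- QNN/NPU graph executing\n");
    // ... original graph execution code unchanged ...
}
```

This line never shows up in the logs, which is why I suspect the QNN path is not actually being taken.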
2. Segmentation Fault with 'npu' Backend
If I explicitly set the backend to npu in the configuration file, I consistently get a segmentation fault. As I understand it, this setting is supposed to trigger the online graph-construction mode, but I can't get it working.
Question: What specific steps or prerequisites are needed to run with the explicit npu backend without crashing?
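For context, the relevant part of my config.json looks roughly like this (a sketch from memory; the key names and file names are approximations of what the tutorial uses, and "npu" is the value I set for the backend):

```json
{
  "llm_model": "llm.mnn",
  "llm_weight": "llm.mnn.weight",
  "backend_type": "npu",
  "thread_num": 4
}
```

With "backend_type": "cpu" the model runs (but apparently without the NPU, per issue 1); changing it to "npu" is what produces the segmentation fault.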
3. VLM (Vision-Language Model) Conversion
The provided llm-qnn generator only converts the language model part.
I can't find a way to convert or run the visual encoder of a VLM on the NPU.
Question: Has anyone successfully converted and run a full VLM (including the visual encoder) on the NPU using this toolchain?
Please share your setup details and success stories if you have managed to get QNN-LLM running reliably on a Snapdragon NPU. Thanks!