NexaSDK is an easy-to-use developer toolkit for running any AI model locally on NPUs, GPUs, and CPUs. It is powered by our NexaML engine, built entirely from scratch for peak performance on every hardware stack. Unlike wrappers that depend on existing runtimes, NexaML is a unified inference engine built at the kernel level; that is what lets NexaSDK achieve Day-0 support for new model architectures (LLMs, multimodal, audio, vision). NexaML supports three model formats: GGUF, MLX, and Nexa AI's own .nexa format.
| Features | NexaSDK | Ollama | llama.cpp | LM Studio |
| --- | --- | --- | --- | --- |
| NPU support | ✅ NPU-first | ❌ | ❌ | ❌ |
| Support any model in GGUF, MLX, NEXA format | ✅ Low-level control | ❌ | ❌ | |
| Full multimodality support | ✅ Image, audio, text | | | |
| Cross-platform support | ✅ Desktop, mobile, automotive, IoT | | | |
| One line of code to run | ✅ | ✅ | ✅ | |
| OpenAI-compatible API + function calling | ✅ | ✅ | ✅ | ✅ |

Legend: ✅ Supported · ❌ Not supported
- 📣 Day-0 support for Qwen3-VL-4B and 8B in GGUF, MLX, and .nexa formats on NPU/GPU/CPU. We are the only framework that supports these models in GGUF format. Featured in Qwen's post about our partnership.
- 📣 Day-0 support for IBM Granite 4.0 on NPU/GPU/CPU. The NexaML engine was featured right next to vLLM, llama.cpp, and MLX in IBM's blog.
- 📣 Day-0 support for Google EmbeddingGemma on NPU. Featured in Google's social post.
- 📣 Vision support for Gemma3n: the first-ever Gemma-3n multimodal inference on GPU & CPU, in GGUF format.
- 📣 AMD NPU support for SDXL image generation
- 📣 Intel NPU support for DeepSeek-R1-Distill-Qwen-1.5B and Llama3.2-3B
- 📣 Apple Neural Engine support for real-time speech recognition with the Parakeet v3 model
```bash
# Linux x86_64
curl -fsSL https://github.com/NexaAI/nexa-sdk/releases/latest/download/nexa-cli_linux_x86_64.sh -o install.sh && chmod +x install.sh && ./install.sh && rm install.sh

# Linux arm64
curl -fsSL https://github.com/NexaAI/nexa-sdk/releases/latest/download/nexa-cli_linux_arm64.sh -o install.sh && chmod +x install.sh && ./install.sh && rm install.sh
```
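Once the installer finishes, a quick way to confirm the `nexa` CLI landed on your PATH is to print its help. This is only a sanity check; it uses the `nexa -h` command documented in the CLI table below:

```bash
# Should list all available nexa subcommands if the install succeeded
nexa -h
```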
You can run any compatible GGUF, MLX, or .nexa model from 🤗 Hugging Face with `nexa infer <full repo name>`.
> [!TIP]
> GGUF runs on macOS, Linux, and Windows on CPU/GPU. Note that certain GGUF models are supported only by NexaSDK (e.g. Qwen3-VL-4B and 8B).
📝 Run and chat with LLMs, e.g. Qwen3:
```bash
nexa infer ggml-org/Qwen3-1.7B-GGUF
```
🖼️ Run and chat with Multimodal models, e.g. Qwen3-VL-4B:
```bash
nexa infer NexaAI/Qwen3-VL-4B-Instruct-GGUF
```
> [!TIP]
> MLX is macOS-only (Apple Silicon). Many MLX models in the Hugging Face mlx-community organization have quality issues and may not run reliably. We recommend starting with models from our curated NexaAI Collection for best results, for example:
📝 Run and chat with LLMs, e.g. Qwen3:
```bash
nexa infer NexaAI/Qwen3-4B-4bit-MLX
```
🖼️ Run and chat with Multimodal models, e.g. Gemma3n:
```bash
nexa infer NexaAI/gemma-3n-E4B-it-4bit-MLX
```
> [!TIP]
> Download the arm64 build with Qualcomm NPU support and make sure your laptop has a Snapdragon® X Elite chip.
- Login & Get Access Token (required for Pro Models)
  - Create an account at sdk.nexa.ai
  - Go to Deployment → Create Token
  - Run this once in your terminal (replace `<your_token_here>` with your token):

    ```bash
    nexa config set license '<your_token_here>'
    ```

- Run and chat with our multimodal model, OmniNeural-4B, or other models on NPU

  ```bash
  nexa infer NexaAI/OmniNeural-4B
  nexa infer NexaAI/Granite-4-Micro-NPU
  nexa infer NexaAI/Qwen3-VL-4B-Instruct-NPU
  ```
| Essential Command | What it does |
| --- | --- |
| `nexa -h` | Show all CLI commands |
| `nexa pull <repo>` | Interactive download & cache of a model |
| `nexa infer <repo>` | Local inference |
| `nexa list` | Show all cached models with sizes |
| `nexa remove <repo>` / `nexa clean` | Delete one / all cached models |
| `nexa serve --host 127.0.0.1:8080` | Launch OpenAI-compatible REST server |
| `nexa run <repo>` | Chat with a model via an existing server |
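The cache-management commands above compose into a quick download/inspect/clean-up loop. A minimal sketch, reusing a model repo that appears earlier in this README:

```bash
nexa pull ggml-org/Qwen3-1.7B-GGUF    # interactive download & cache
nexa list                             # verify the model is cached and check its size
nexa remove ggml-org/Qwen3-1.7B-GGUF  # free the space when you are done
```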
👉 To interact with multimodal models, you can drag photos or audio clips directly into the CLI — you can even drop multiple images at once!
See CLI Reference for full commands.
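Since `nexa serve` launches an OpenAI-compatible REST server, a standard OpenAI-style request should work against it. A minimal sketch, assuming the server is running at `127.0.0.1:8080`, that the chat endpoint follows the usual OpenAI path `/v1/chat/completions`, and that the `model` field takes a cached repo name (the last two are assumptions; see the CLI Reference for the exact contract):

```bash
# Start the server in one terminal:
#   nexa serve --host 127.0.0.1:8080
# Then send an OpenAI-style chat completion request.
# Endpoint path and model-name format are assumptions based on the OpenAI convention.
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ggml-org/Qwen3-1.7B-GGUF",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'
```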
We would like to thank the following projects: