Fast and lightweight LLM inference engine for mobile and edge devices
| Arm CPU | x86 CPU | Qualcomm NPU (QNN) |
|---|---|---|
MLLM-advanced is an extension of the MLLM project, offering additional features and functionality.
- Dynamic-static integrated computation graph for easy implementation of your own algorithms.
- Multi-backend support for CPU and NPU, designed for mobile devices.
- A complete graph-level IR with customizable passes, plus a QNN lowering pipeline for compiling your graph into a QNN graph.
- MobiLMCache: an edge-side LMCache with paged cache management (see the sketch below).
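The paged cache design is similar in spirit to paged attention: KV memory is carved into fixed-size pages, so sequences can grow and be evicted at page granularity instead of reallocating contiguous buffers. Below is a minimal Python sketch of that bookkeeping; the class, method names, and page size are hypothetical, not MobiLMCache's actual API.

```python
# Minimal sketch of paged KV-cache bookkeeping. NOT MobiLMCache's API;
# all names and the page size are hypothetical, for illustration only.

PAGE_SIZE = 16  # tokens per physical page (assumed)

class PagedKVCache:
    """Maps each sequence's logical token positions onto fixed-size pages."""

    def __init__(self, num_pages: int) -> None:
        self.free_pages = list(range(num_pages))     # pool of physical page ids
        self.page_tables: dict[int, list[int]] = {}  # seq_id -> page ids
        self.lengths: dict[int, int] = {}            # seq_id -> tokens stored

    def append(self, seq_id: int, num_tokens: int) -> None:
        """Grow a sequence, allocating a new page only when the last one fills."""
        length = self.lengths.get(seq_id, 0) + num_tokens
        table = self.page_tables.setdefault(seq_id, [])
        while len(table) * PAGE_SIZE < length:
            if not self.free_pages:
                raise MemoryError("cache exhausted; evict a sequence first")
            table.append(self.free_pages.pop())
        self.lengths[seq_id] = length

    def locate(self, seq_id: int, pos: int) -> tuple[int, int]:
        """Translate a logical position into (physical page id, in-page offset)."""
        return self.page_tables[seq_id][pos // PAGE_SIZE], pos % PAGE_SIZE

    def free(self, seq_id: int) -> None:
        """Return a finished sequence's pages to the pool."""
        self.free_pages.extend(self.page_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

cache = PagedKVCache(num_pages=8)
cache.append(seq_id=0, num_tokens=20)  # spans two pages
print(cache.locate(0, 17))             # (physical page id, offset 1)
```

Freeing a finished sequence returns whole pages to the pool, so requests of very different lengths can share one fixed cache budget without per-token fragmentation.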
| Model | Quantization Methods | Backends |
|---|---|---|
| DeepSeek-Distill-Qwen-1.5B | W32A32, W4A32 | Arm-CPU |
| Qwen2VL-2B-Instruct | W32A32, W4A32 | Arm-CPU |
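In the quantization column, WxAy means x-bit weights and y-bit activations, so W4A32 quantizes only the weights and keeps activations in fp32. The sketch below shows group-wise symmetric 4-bit weight quantization; the group size, rounding, and function names are assumptions for illustration, not MLLM-advanced's exact scheme.

```python
# Illustrative group-wise symmetric 4-bit weight quantization (W4A32-style).
# NOT MLLM-advanced's exact scheme; group size and rounding are assumptions.
import numpy as np

def quantize_w4(weights: np.ndarray, group_size: int = 32):
    """Quantize a flat fp32 weight vector to int4 values with per-group scales."""
    w = weights.reshape(-1, group_size)
    scales = np.abs(w).max(axis=1, keepdims=True) / 7.0  # int4 range is [-8, 7]
    scales = np.maximum(scales, 1e-8)                    # guard all-zero groups
    q = np.clip(np.round(w / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize_w4(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Recover fp32 weights; activations stay fp32 throughout (the A32 part)."""
    return (q.astype(np.float32) * scales).reshape(-1)

w = np.random.randn(128).astype(np.float32)
q, s = quantize_w4(w)
print(np.abs(dequantize_w4(q, s) - w).max())  # small reconstruction error
```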
We demonstrate usage with DeepSeek-R1-Distill-Qwen-1.5B as the example model:

```shell
./demo_ds_qwen2 -m {model path} -j {tokenizer.json file}
```

The following commands have been tested on Linux systems.
```shell
git clone --recursive https://github.com/chenghuaWang/mllm-advanced.git
export ANDROID_NDK_PATH=/path/to/android-ndk

# build
python task.py tasks/android_build.yaml

# push to your device
python task.py tasks/adb_push.yaml
```

Install the Python package:

```shell
pip install .
```

Use the following command to convert models:
```shell
python tools/convertor.py --input {safetensors file} --output {output file} --format safetensors
```
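Before converting, you can preview what the converter will read using the `safetensors` Python package. This is a generic sanity check, not part of MLLM-advanced's tooling, and `model.safetensors` is a placeholder path.

```python
# List tensor names, shapes, and dtypes in a .safetensors checkpoint.
# Generic check (pip install safetensors); not part of MLLM-advanced.
from safetensors.numpy import load_file

tensors = load_file("model.safetensors")  # placeholder: your input checkpoint
for name, t in tensors.items():
    print(name, t.shape, t.dtype)
```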
Usage:

```text
[-h|--help] <FILE> [-s|--show]

Options:
  -h, --help    Show help message
  <FILE>        Input file path
  -s, --show    Show parameters metadata
```
Usage:

```text
[-h|--help] [-j|--json] [-m|--merge] [-t|--type] [-i|--input_str]

Options:
  -h, --help         Show help message.
  -j, --json         SentencePiece json file path.
  -m, --merge        Merge file path.
  -t, --type         Model type.
  -i, --input_str    Input string for testing.
```

Example:
```shell
./mllm-tokenize-checker -j ../mllm-models/DeepSeek-R1-Distill-Qwen-1.5B/tokenizer.json -t ds-qwen2 -i "你好"
```

Output:
```text
Tensor Meta Info
address: 0x7f85e8224000
name: qwen2-tokenizer-i0
shape: 1x2
device: kCPU
dtype: kInt64
[[151646, 108386]]
```
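To cross-check these ids, you can load the same tokenizer.json with Hugging Face's `tokenizers` package. This is a generic check, not part of MLLM-advanced; whether the leading id 151646 (the begin-of-sentence token) shows up depends on the tokenizer's post-processor.

```python
# Cross-check mllm-tokenize-checker output with the Hugging Face `tokenizers`
# package (pip install tokenizers). Generic check, not part of MLLM-advanced.
from tokenizers import Tokenizer

tok = Tokenizer.from_file(
    "../mllm-models/DeepSeek-R1-Distill-Qwen-1.5B/tokenizer.json"
)
ids = tok.encode("你好").ids
print(ids)  # expect [151646, 108386] when the post-processor prepends BOS
```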