Codestin Search App

About

SGLang-Omni is a high-performance serving framework for omni and multimodal models, built on top of SGLang. It is designed to orchestrate multi-stage pipelines with low latency and OpenAI-compatible APIs.

Modern omni models — such as speech-output LLMs and multimodal generation systems — decompose into heterogeneous stages with fundamentally different computational profiles: a compute-bound thinker, a memory-bound talker, a latency-sensitive codec. SGLang-Omni is built around a computation-centric design: each stage runs its own independent scheduler tuned to its bottleneck, communicates through a shared inbox/outbox abstraction, and transfers tensors via zero-copy shared memory. This prevents any single stage from degrading the others and allows new models to plug into the framework by declaring a pipeline topology rather than building an inference system from scratch.

Core features:

Multi-Stage Pipeline: Flexible framework for orchestrating preprocessing, AR engine, codec, and vocoder stages across processes and GPUs.
Native SGLang Integration: Leverages SGLang's RadixAttention, continuous batching, and CUDA Graph optimizations for the AR backbone.
OpenAI-Compatible Server: Drop-in /v1/audio/speech and /v1/chat/completions endpoints with real-time streaming support.
Broad Model Support: Supports a growing set of TTS and omni models including Higgs Audio, Fish Audio S2-Pro, Voxtral TTS, Qwen3 TTS, MOSS-TTS, Qwen3-Omni, Ming-Omni, and LLaDA2.0-Uni.

Supported Models

Model	Type	Notes
boson-sglang/higgs-audio-v3-tts-4b-base	TTS	Voice cloning, streaming, 100+ languages
fishaudio/s2-pro	TTS	Voice cloning, streaming
mistralai/Voxtral-4B-TTS-2603	TTS	Named voices, streaming, 9 languages
Qwen/Qwen3-TTS-12Hz-Base	TTS	Voice cloning, streaming, 10 languages, 0.6B / 1.7B
OpenMOSS-Team/MOSS-TTS-v1.5	TTS	Voice cloning, streaming, 31 languages
Qwen/Qwen3-Omni-30B-A3B-Instruct	Omni	Text, image, audio, video → text + audio
inclusionAI/Ming-flash-omni-2.0	Omni	Streaming TTS
inclusionAI/LLaDA2.0-Uni	Multimodal	Text + image understanding and generation

Name		Name	Last commit message	Last commit date
Latest commit History 337 Commits
.claude/skills		.claude/skills
.github		.github
benchmarks		benchmarks
docker		docker
docs		docs
examples		examples
playground		playground
scripts/ci/utils		scripts/ci/utils
sglang_omni		sglang_omni
sglang_omni_router		sglang_omni_router
tests		tests
.editorconfig		.editorconfig
.gitignore		.gitignore
.isort.cfg		.isort.cfg
.pre-commit-config.yaml		.pre-commit-config.yaml
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

About

Supported Models

Get Started

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

About

Supported Models

Get Started

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages