sgl-project/sglang-omni

About

SGLang-Omni is a high-performance serving framework for omni and multimodal models, built on top of SGLang. It orchestrates multi-stage pipelines at low latency and exposes them through OpenAI-compatible APIs.

Modern omni models — such as speech-output LLMs and multimodal generation systems — decompose into heterogeneous stages with fundamentally different computational profiles: a compute-bound thinker, a memory-bound talker, a latency-sensitive codec. SGLang-Omni is built around a computation-centric design: each stage runs its own independent scheduler tuned to its bottleneck, communicates through a shared inbox/outbox abstraction, and transfers tensors via zero-copy shared memory. This prevents any single stage from degrading the others and allows new models to plug into the framework by declaring a pipeline topology rather than building an inference system from scratch.
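To make the inbox/outbox idea concrete, here is a minimal, illustrative sketch in plain Python. It is not SGLang-Omni's API: `Stage`, `build_pipeline`, `run_requests`, and `TOPOLOGY` are hypothetical names, threads stand in for the separate processes and GPUs, and `queue.Queue` stands in for the zero-copy shared-memory transport. It only shows how a pipeline can be declared as an ordered topology in which each stage runs its own loop and hands results to the next stage's inbox.

```python
import queue
import threading
from dataclasses import dataclass, field
from typing import Callable, Optional

_STOP = object()  # sentinel that tells a stage to shut down


@dataclass
class Stage:
    """One pipeline stage with its own inbox and worker loop (hypothetical)."""
    name: str
    work: Callable[[object], object]
    inbox: "queue.Queue" = field(default_factory=queue.Queue)
    outbox: Optional["queue.Queue"] = None  # set to the next stage's inbox

    def run(self) -> None:
        # Each stage drains its own inbox independently of the others.
        while True:
            item = self.inbox.get()
            if item is _STOP:
                if self.outbox is not None:
                    self.outbox.put(_STOP)  # propagate shutdown downstream
                return
            result = self.work(item)
            if self.outbox is not None:
                self.outbox.put(result)


def build_pipeline(topology):
    """Wire stages in order: each stage's outbox is the next stage's inbox."""
    stages = [Stage(name, fn) for name, fn in topology]
    results = queue.Queue()
    for current, nxt in zip(stages, stages[1:]):
        current.outbox = nxt.inbox
    stages[-1].outbox = results
    threads = [threading.Thread(target=s.run, daemon=True) for s in stages]
    for t in threads:
        t.start()
    return stages[0].inbox, results, threads


def run_requests(topology, requests):
    """Feed requests into the first stage and collect the final outputs."""
    source, results, threads = build_pipeline(topology)
    for r in requests:
        source.put(r)
    source.put(_STOP)
    outputs = []
    while True:
        item = results.get()
        if item is _STOP:
            break
        outputs.append(item)
    for t in threads:
        t.join()
    return outputs


# Toy stand-ins for the thinker, talker, and codec stages.
TOPOLOGY = [
    ("thinker", lambda text: text.split()),
    ("talker", lambda tokens: [len(t) for t in tokens]),
    ("codec", lambda codes: bytes(codes)),
]
```

Calling `run_requests(TOPOLOGY, ["hello omni"])` pushes one request through all three toy stages. In a real deployment each stage would also apply its own batching and scheduling policy, which this sketch omits.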

Core features:

  • Multi-Stage Pipeline: Flexible framework for orchestrating preprocessing, AR engine, codec, and vocoder stages across processes and GPUs.
  • Native SGLang Integration: Leverages SGLang's RadixAttention, continuous batching, and CUDA Graph optimizations for the AR backbone.
  • OpenAI-Compatible Server: Drop-in /v1/audio/speech and /v1/chat/completions endpoints with real-time streaming support.
  • Broad Model Support: A growing set of TTS, omni, and multimodal models, including Higgs Audio, Fish Audio S2-Pro, Voxtral TTS, Qwen3 TTS, MOSS-TTS, Qwen3-Omni, Ming-Omni, and LLaDA2.0-Uni.
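As a rough illustration of the OpenAI-compatible server, the sketch below builds a request for the `/v1/audio/speech` endpoint using only the Python standard library. The field names (`model`, `input`, `voice`, `response_format`) follow the OpenAI speech API. Whether SGLang-Omni accepts exactly these fields for a given model is an assumption, as are the helper names and the `http://localhost:8000` address used in the usage note.

```python
import json
import urllib.request


def build_speech_request(base_url, model, text, voice=None, response_format="wav"):
    """Build (but do not send) a POST request for /v1/audio/speech.

    The payload mirrors the OpenAI speech API; server-side support for
    each field is an assumption, not something this README confirms.
    """
    payload = {"model": model, "input": text, "response_format": response_format}
    if voice is not None:
        payload["voice"] = voice
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(request, out_path):
    """Send the request and write the returned audio bytes to out_path."""
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as f:
        f.write(response.read())
```

With a server running locally (address assumed), usage might look like `synthesize(build_speech_request("http://localhost:8000", "fishaudio/s2-pro", "Hello from SGLang-Omni."), "hello.wav")`.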

Supported Models

| Model | Type | Notes |
|---|---|---|
| boson-sglang/higgs-audio-v3-tts-4b-base | TTS | Voice cloning, streaming, 100+ languages |
| fishaudio/s2-pro | TTS | Voice cloning, streaming |
| mistralai/Voxtral-4B-TTS-2603 | TTS | Named voices, streaming, 9 languages |
| Qwen/Qwen3-TTS-12Hz-Base | TTS | Voice cloning, streaming, 10 languages, 0.6B / 1.7B |
| OpenMOSS-Team/MOSS-TTS-v1.5 | TTS | Voice cloning, streaming, 31 languages |
| Qwen/Qwen3-Omni-30B-A3B-Instruct | Omni | Text, image, audio, video → text + audio |
| inclusionAI/Ming-flash-omni-2.0 | Omni | Streaming TTS |
| inclusionAI/LLaDA2.0-Uni | Multimodal | Text + image understanding and generation |

Get Started
