Stars
A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
Kimi K2 is the large language model series developed by the Moonshot AI team.
Open Model Engine (OME) — Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, TensorRT-LLM, and Triton
Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serving systems.
slime is an LLM post-training framework for RL scaling.
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.
Expander, an open-source GKR prover designed for large-scale parallel computing.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
The source of the LMSYS website and blogs
verl: Volcano Engine Reinforcement Learning for LLMs
HunyuanVideo: A Systematic Framework For Large Video Generation Models
MiniCPM4 & MiniCPM4.1: Ultra-Efficient LLMs on End Devices, achieving a 3x+ generation speedup on reasoning tasks
[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
Model Compression Toolbox for Large Language Models and Diffusion Models
Fast, Flexible and Portable Structured Generation
My learning notes for ML SYS.
An easy-to-use, scalable, and high-performance RLHF framework based on Ray (PPO, GRPO, REINFORCE++, TIS, vLLM, dynamic sampling, and async agentic RL)
[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves a 2-5x speedup over FlashAttention without degrading end-to-end metrics across language, image, and video models.
SGLang is a fast serving framework for large language models and vision language models.