View merrymercy's full-sized avatar
Sponsors

@Ying1123
@HaiShaw
@amiruci

Highlights

  • Pro

Organizations

@apache @dmlc @ucbrise @alpa-projects @lm-sys

A compact implementation of SGLang, designed to demystify the complexities of modern LLM serving systems.

Python 2,495 233 Updated Dec 26, 2025

Train speculative decoding models effortlessly and port them smoothly to SGLang serving.

Python 577 126 Updated Dec 26, 2025
Python 636 62 Updated Dec 25, 2025

JAX backend for SGL

Python 205 49 Updated Dec 27, 2025

GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

Python 3,541 371 Updated Dec 23, 2025

Kimi K2 is the large language model series developed by Moonshot AI team

9,775 710 Updated Nov 7, 2025

Open Model Engine (OME) — Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, TensorRT-LLM, and Triton

Go 351 52 Updated Dec 27, 2025

Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serving systems.

Python 250 43 Updated Dec 10, 2025

slime is an LLM post-training framework for RL Scaling.

Python 3,035 370 Updated Dec 27, 2025
Python 970 101 Updated Dec 23, 2025

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 4,492 483 Updated Dec 27, 2025

Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.

Python 3,289 261 Updated Dec 27, 2025

Expander, an open-source GKR prover designed for scaling large-scale parallel computing.

Rust 138 53 Updated Sep 18, 2025

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 64,557 7,828 Updated Dec 26, 2025

The source of LMSYS website and blogs

JavaScript 72 65 Updated Dec 23, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 17,835 2,914 Updated Dec 27, 2025

Fast low-bit matmul kernels in Triton

Python 413 30 Updated Dec 18, 2025

HunyuanVideo: A Systematic Framework For Large Video Generation Model

Python 11,510 1,158 Updated Nov 21, 2025

MiniCPM4 & MiniCPM4.1: Ultra-Efficient LLMs on End Devices, achieving 3+ generation speedup on reasoning tasks

Jupyter Notebook 8,481 527 Updated Oct 8, 2025

[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models

Python 3,514 210 Updated Dec 27, 2025

Model Compression Toolbox for Large Language Models and Diffusion Models

Python 722 78 Updated Aug 14, 2025

Fast, Flexible and Portable Structured Generation

C++ 1,445 114 Updated Dec 27, 2025

My learning notes for ML SYS.

Python 4,823 309 Updated Dec 24, 2025

Documentation on how to establish a 501(c)(3) nonprofit organization in California, USA

15 2 Updated Sep 12, 2021

An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & TIS & vLLM & Ray & Dynamic Sampling & Async Agentic RL)

Python 8,662 841 Updated Dec 18, 2025

[ICLR2025, ICML2025, NeurIPS2025 Spotlight] Quantized Attention achieves speedup of 2-5x compared to FlashAttention, without losing end-to-end metrics across language, image, and video models.

Cuda 2,955 296 Updated Dec 22, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 30 18 Updated Nov 24, 2025

Materials for learning SGLang

703 51 Updated Dec 15, 2025