vLLM · San Francisco Bay Area
https://zhuohan.li · @zhuohan123 · in/zhuohan-li
Stars
TPU inference for vLLM, with unified JAX and PyTorch support.
SkyRL: A Modular Full-stack RL Library for LLMs
Checkpoint-engine is a simple middleware to update model weights in LLM inference engines
This repo hosts code for vLLM CI & Performance Benchmark infrastructure.
🤖 A WeChat bot built on WeChaty with AI services such as DeepSeek / ChatGPT / Kimi / iFlytek; it can auto-reply to WeChat messages, manage WeChat groups and friends, detect zombie followers, and more...
Renderer for the harmony response format to be used with gpt-oss
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
Mirage Persistent Kernel: Compiling LLMs into a MegaKernel
From Images to High-Fidelity 3D Assets with Production-Ready PBR Material
A PyTorch native platform for training generative AI models
A domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
A program to read, merge, and write programs for the Breville Control °Freak®
Tensors and Dynamic neural networks in Python with strong GPU acceleration
The Startup CTO's Handbook, a book covering leadership, management and technical topics for leaders of software engineering teams
A high-performance distributed file system designed to address the challenges of AI training and inference workloads.
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
The best OSS video generation models, created by Genmo
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
A throughput-oriented high-performance serving framework for LLMs
Dynamic Memory Management for Serving LLMs without PagedAttention
A framework for few-shot evaluation of language models.
A fast communication-overlapping library for tensor/expert parallelism on GPUs.