Starred repositories
A programmer's guide to cooking at home (Simplified Chinese only).
Perplexity's open-source garden for inference technology
pizlonator / fil-c
Forked from llvm/llvm-project. Fil-C: completely compatible memory safety for C and C++
⚡ Clash for Lab is a proxy tool designed for lab environments: it requires no sudo privileges and installs via an elegant one-click script.
A programmer's guide to living longer
A Throughput-Optimized Pipeline Parallel Inference System for Large Language Models
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
Easy data preparation with the latest LLM-based operators and pipelines.
DeepEP: an efficient expert-parallel communication library
Practical GPU Sharing Without Memory Size Constraints
Optimized primitives for collective multi-GPU communication
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
ArcticInference: vLLM plugin for high-throughput, low-latency inference
Official implementation of "Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding"
A lightweight design for computation-communication overlap.
Tile primitives for speedy kernels
Unified Communication X (mailing list: https://elist.ornl.gov/mailman/listinfo/ucx-group)
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
A Datacenter Scale Distributed Inference Serving Framework
Development repository for the Triton language and compiler