https://shuaills.github.io/
Stars
Zero-Config Code Flow for Claude code & Codex
Build a Claude Code–like CLI coding agent from scratch.
🔥 LLM-powered GPU kernel synthesis: Train models to convert PyTorch ops into optimized Triton kernels via SFT+RL. Multi-turn compilation feedback, cross-platform NVIDIA/AMD, Kernelbook + KernelBench
An AI model aggregation, management, and relay distribution system that converts multiple large models into a unified calling format; supports OpenAI, Claude, Gemini, and other API formats, and can be used by individuals or enterprises for internal management and channel distribution. 🍥 The next-generation LLM gateway and AI asset management system, with multi-language support.
Step-Audio 2 is an end-to-end multi-modal large language model designed for industry-grade audio understanding and speech conversation.
slime is an LLM post-training framework for RL Scaling.
gpt-oss-120b and gpt-oss-20b are two open-weight language models from OpenAI.
An open-source AI agent that brings the power of Gemini directly into your terminal.
3x faster inference; an unofficial implementation of EAGLE speculative decoding.
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
A fast communication-overlapping library for tensor/expert parallelism on GPUs.
The official Python SDK for Model Context Protocol servers and clients
Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.
A unified library of state-of-the-art model optimization techniques, including quantization, pruning, distillation, and speculative decoding. It compresses deep learning models for downstream deployment…
Ongoing research training transformer models at scale
Fast, Flexible and Portable Structured Generation
A bidirectional pipeline parallelism algorithm for computation-communication overlap in DeepSeek V3/R1 training.
My learning notes/codes for ML SYS.
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
FlashInfer: Kernel Library for LLM Serving
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models across text, vision, audio, and multimodal tasks, for both inference and training.
Tensors and Dynamic neural networks in Python with strong GPU acceleration
Making large AI models cheaper, faster and more accessible
Open deep learning compiler stack for cpu, gpu and specialized accelerators
Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.