- University of California, Riverside
- Riverside, CA
- www.shixun404.com
Stars
verl: Volcano Engine Reinforcement Learning for LLMs
slime is an LLM post-training framework for RL Scaling.
Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.
Distributed Compiler based on Triton for Parallel Systems
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor… (see the Python API sketch after this list)
Applied AI experiments and examples for PyTorch
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory…
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
[RSS 2025] "ASAP: Aligning Simulation and Real-World Physics for Learning Agile Humanoid Whole-Body Skills"
Tile primitives for speedy kernels
SGLang is a fast serving framework for large language models and vision language models.
DeepEP: an efficient expert-parallel communication library
Fast and memory-efficient exact attention (see the calling sketch after this list)
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
High Performance Inter-Thread Messaging Library
An optimized ANS compressor for multi-byte integer data on NVIDIA GPUs.
DFloat11: Lossless LLM Compression for Efficient GPU Inference
Optimized FP16/BF16 x FP4 GPU kernels for AMD GPUs
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
FlashInfer: Kernel Library for LLM Serving
CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning
Development repository for the Triton language and compiler (see the kernel sketch after this list)
A PyTorch native platform for training generative AI models
Ongoing research training transformer models at scale
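
The TensorRT-LLM entry above mentions an easy-to-use Python API for defining LLMs. Below is a hedged sketch of that high-level API; the exact import path and arguments vary by release, and the model id and sampling parameters here are illustrative assumptions, not part of the starred description.

```python
# Hedged sketch of TensorRT-LLM's high-level Python API; details vary by
# release, and the model id below is an assumption for illustration.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # HF model id (assumed)
params = SamplingParams(max_tokens=64, temperature=0.8)

for output in llm.generate(["Briefly explain tensor parallelism."], params):
    print(output.outputs[0].text)
```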
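The FlashAttention entry advertises fast, memory-efficient exact attention. A minimal calling sketch, assuming the flash-attn package, a CUDA device, and half-precision tensors laid out as (batch, seqlen, nheads, headdim):

```python
# Minimal sketch of calling FlashAttention's fused kernel; shapes and dtypes
# here are illustrative assumptions.
import torch
from flash_attn import flash_attn_func

q = torch.randn(2, 1024, 16, 64, device="cuda", dtype=torch.float16)
k = torch.randn(2, 1024, 16, 64, device="cuda", dtype=torch.float16)
v = torch.randn(2, 1024, 16, 64, device="cuda", dtype=torch.float16)

# Exact (not approximate) attention, computed tile-by-tile on-chip so the
# full seqlen x seqlen score matrix is never materialized in HBM.
out = flash_attn_func(q, k, v, causal=True)  # (2, 1024, 16, 64)
```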
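The Triton entry points at a Python-embedded language for writing GPU kernels. A minimal sketch of that style, a block-parallel vector addition; the kernel name and block size are arbitrary choices for illustration:

```python
# Minimal Triton kernel sketch: block-parallel vector addition.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the tail of the vector
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)  # one program per 1024-element block
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
assert torch.allclose(add(x, y), x + y)
```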