shixun404

verl: Volcano Engine Reinforcement Learning for LLMs

Python 14,990 2,402 Updated Oct 31, 2025

slime is an LLM post-training framework for RL Scaling.

Python 2,323 236 Updated Nov 1, 2025

Lightning-Fast RL for LLM Reasoning and Agents. Made Simple & Flexible.

Python 2,925 214 Updated Nov 1, 2025

Distributed Compiler based on Triton for Parallel Systems

Python 1,209 100 Updated Oct 17, 2025

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

C++ 12,008 1,827 Updated Nov 1, 2025

Applied AI experiments and examples for PyTorch

Python 301 29 Updated Aug 22, 2025

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory…

Python 2,874 535 Updated Oct 31, 2025

DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

Cuda 5,847 729 Updated Oct 15, 2025

[RSS 2025] "ASAP: Aligning Simulation and Real-World Physics for Learning Agile Humanoid Whole-Body Skills"

Python 1,704 174 Updated Sep 9, 2025

Tile primitives for speedy kernels

Cuda 2,859 191 Updated Oct 24, 2025

SGLang is a fast serving framework for large language models and vision language models.

Python 19,530 3,229 Updated Nov 1, 2025

DeepEP: an efficient expert-parallel communication library

Cuda 8,671 971 Updated Oct 30, 2025

Fast and memory-efficient exact attention

Python 20,267 2,104 Updated Oct 31, 2025

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

C++ 1,468 674 Updated Nov 1, 2025

High Performance Inter-Thread Messaging Library

Java 18,065 3,958 Updated Apr 2, 2025

A novel data compression framework

C 2,665 106 Updated Oct 30, 2025

An optimized ANS compressor for multi-byte integer data on NVIDIA GPUs.

Cuda 4 Updated Aug 7, 2025

DFloat11: Lossless LLM Compression for Efficient GPU Inference

Python 556 33 Updated Aug 24, 2025

Optimized FP16/BF16 x FP4 GPU kernels for AMD GPUs

C++ 34 6 Updated Oct 9, 2025

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

C++ 3,812 293 Updated Oct 31, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 3,993 554 Updated Nov 1, 2025
Cuda 11 2 Updated Sep 24, 2025

CUDA-L1: Improving CUDA Optimization via Contrastive Reinforcement Learning

Python 201 17 Updated Oct 28, 2025

HPC Container Maker

Python 497 100 Updated Oct 22, 2025
Python 907 96 Updated Oct 31, 2025

Development repository for the Triton language and compiler

MLIR 17,422 2,351 Updated Nov 1, 2025

Container plugin for Slurm Workload Manager

C 389 37 Updated Oct 2, 2025

OpenPMIx Project Repository

C 251 124 Updated Oct 31, 2025

A PyTorch native platform for training generative AI models

Python 4,622 590 Updated Nov 1, 2025

Ongoing research training transformer models at scale

Python 14,029 3,220 Updated Nov 1, 2025