Stars
cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
Helpful kernel tutorials and examples for tile-based GPU programming
FlashInfer: Kernel Library for LLM Serving
GitHub mirror of the triton-lang/triton repo.
Virtual whiteboard for sketching hand-drawn-style diagrams
Efficient Triton Kernels for LLM Training
🚀 Efficient implementations of state-of-the-art linear attention models
NVIDIA Linux open GPU kernel module source
A Typora Markdown editor theme inspired by the Vue documentation style.
Display CuTe layouts in no time! No need to wait for minutes for CuTe to compile just to print a single SVG.
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
CUDA Matrix Multiplication Optimization
Domain-specific language designed to streamline the development of high-performance GPU/CPU/accelerator kernels
Distributed Compiler based on Triton for Parallel Systems
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels
GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as…
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
FlagTree is a unified compiler supporting multiple AI chip backends for custom Deep Learning operations, which is forked from triton-lang/triton.
verl: Volcano Engine Reinforcement Learning for LLMs
Train transformer language models with reinforcement learning.