Efficient Triton Kernels for LLM Training
FlagGems is an operator library for large language models implemented in the Triton Language.
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
Automatic ROPChain Generation
SymGDB - symbolic execution plugin for gdb
Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard!
TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels
A performance library for machine learning applications.
AMD RAD's Triton-based framework for seamless multi-GPU programming
Triton implementation of FlashAttention2 that adds Custom Masks.
ClearML - Model-Serving Orchestration and Repository Solution
Training-free, post-training, efficient sub-quadratic-complexity attention, implemented with OpenAI Triton.
nanoRLHF: a from-scratch journey into how LLMs and RLHF really work.
(WIP) A simple, lightweight, fast-to-integrate, pipelined deployment framework for algorithm services that ensures reliability, high concurrency, and scalability.
Learn how to develop kernels (see the minimal Triton sketch after this list)
Deploy DL/ML inference pipelines with minimal extra code.
Triton adapter for Ascend. Mirror of https://gitee.com/ascend/triton-ascend
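As a quick illustration of the kind of kernel the tutorial entry above begins with, here is a minimal sketch of an element-wise vector add in Triton. It follows the standard introductory pattern from the Triton documentation and is not taken from any repository listed here; the names add_kernel and add and the BLOCK_SIZE of 1024 are illustrative choices.

import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance processes one BLOCK_SIZE-wide chunk of the inputs.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    # Mask out-of-bounds lanes so a partial final block is handled safely.
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Host-side wrapper: allocate the output and launch a 1-D grid of programs.
    out = torch.empty_like(x)
    n_elements = out.numel()
    grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n_elements, BLOCK_SIZE=1024)
    return out

# Usage (requires a CUDA-capable GPU):
# x = torch.randn(4096, device="cuda")
# y = torch.randn(4096, device="cuda")
# torch.testing.assert_close(add(x, y), x + y)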