
NVIDIA cuTile learn

Python 130 Updated Dec 9, 2025

cuTile is a programming model for writing parallel kernels for NVIDIA GPUs

Python 1,649 85 Updated Dec 20, 2025

Helpful kernel tutorials and examples for tile-based GPU programming

Python 471 25 Updated Dec 22, 2025

FlashInfer: Kernel Library for LLM Serving

Cuda 4,332 613 Updated Dec 23, 2025

GitHub mirror of the triton-lang/triton repo.

MLIR 109 31 Updated Dec 22, 2025

Virtual whiteboard for sketching hand-drawn like diagrams

TypeScript 112,894 11,921 Updated Dec 21, 2025

Material for gpu-mode lectures

Jupyter Notebook 5,444 552 Updated Dec 8, 2025

Fast low-bit matmul kernels in Triton

Python 413 30 Updated Dec 18, 2025

Python 15 14 Updated Nov 10, 2025

Efficient Triton Kernels for LLM Training

Python 5,966 454 Updated Dec 21, 2025

🚀 Efficient implementations of state-of-the-art linear attention models

Python 4,108 336 Updated Dec 20, 2025

NVIDIA Linux open GPU kernel module source

C 16,511 1,546 Updated Dec 18, 2025

This is a Typora theme inspired by the Vue documentation style: a Typora Markdown editor theme that resembles the Vue docs.

CSS 958 180 Updated Jun 1, 2023

Display CuTe layouts in no time! No need to wait for minutes for CuTe to compile just to print a single SVG.

Python 4 Updated Aug 7, 2025

Use your computer properly, protect your eyes, and take enforced breaks. Eye care for PC users.

C# 89 17 Updated Nov 20, 2024

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 9,019 882 Updated Dec 4, 2025

CUDA Matrix Multiplication Optimization

Cuda 247 24 Updated Jul 19, 2024

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

C++ 4,280 353 Updated Dec 22, 2025

Tile primitives for speedy kernels

Cuda 3,009 217 Updated Dec 9, 2025

Distributed Compiler based on Triton for Parallel Systems

Python 1,285 114 Updated Dec 16, 2025

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

C++ 1,504 696 Updated Dec 22, 2025

TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels

Python 178 15 Updated Dec 23, 2025

GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as…

C++ 1,529 607 Updated Feb 15, 2025

Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.

Python 304 55 Updated Dec 23, 2025

Puzzles for learning Triton

Jupyter Notebook 2,192 179 Updated Nov 18, 2024

FlagTree is a unified compiler supporting multiple AI chip backends for custom Deep Learning operations, which is forked from triton-lang/triton.

C++ 148 30 Updated Dec 23, 2025

The Modular Platform (includes MAX & Mojo)

Mojo 25,374 2,745 Updated Dec 22, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 17,709 2,868 Updated Dec 22, 2025

Train transformer language models with reinforcement learning.

Python 16,738 2,372 Updated Dec 22, 2025