Qi Li qili93

12 followers · 4 following

Stars

dsl-learn / cutile-learn

NVIDIA cuTile learn

Python 130 Updated Dec 9, 2025

NVIDIA / cutile-python

cuTile is a programming model for writing parallel kernels for NVIDIA GPUs

Python 1,649 85 Updated Dec 20, 2025

NVIDIA / TileGym

Helpful kernel tutorials and examples for tile-based GPU programming

Python 471 25 Updated Dec 22, 2025

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

Cuda 4,332 613 Updated Dec 23, 2025

facebookexperimental / triton

GitHub mirror of the triton-lang/triton repo.

MLIR 109 31 Updated Dec 22, 2025

excalidraw / excalidraw

Virtual whiteboard for sketching hand-drawn like diagrams

TypeScript 112,894 11,921 Updated Dec 21, 2025

gpu-mode / lectures

Material for gpu-mode lectures

Jupyter Notebook 5,444 552 Updated Dec 8, 2025

dropbox / gemlite

Fast low-bit matmul kernels in Triton

Python 413 30 Updated Dec 18, 2025

Ascend / triton-ascend-ops

Python 15 14 Updated Nov 10, 2025

linkedin / Liger-Kernel

Efficient Triton Kernels for LLM Training

Python 5,966 454 Updated Dec 21, 2025

MoonshotAI / Kimi-Linear

1,238 56 Updated Nov 17, 2025

fla-org / flash-linear-attention

🚀 Efficient implementations of state-of-the-art linear attention models

Python 4,108 336 Updated Dec 20, 2025

NVIDIA / open-gpu-kernel-modules

NVIDIA Linux open GPU kernel module source

C 16,511 1,546 Updated Dec 18, 2025

blinkfox / typora-vue-theme

A Typora Markdown editor theme inspired by the Vue documentation style.

CSS 958 180 Updated Jun 1, 2023

melonedo / cute-layout-display

Display CuTe layouts in no time! No need to wait for minutes for CuTe to compile just to print a single SVG.

Python 4 Updated Aug 7, 2025

GiantappMan / eye-nurse

Use your computer properly, protect your eyes, and take enforced breaks. Eye care for PC users.

C# 89 17 Updated Nov 20, 2024

xlite-dev / LeetCUDA

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 9,019 882 Updated Dec 4, 2025

leimao / CUDA-GEMM-Optimization

CUDA Matrix Multiplication Optimization

Cuda 247 24 Updated Jul 19, 2024

tile-ai / tilelang

Domain-specific language designed to streamline the development of high-performance GPU/CPU/Accelerators kernels

C++ 4,280 353 Updated Dec 22, 2025

HazyResearch / ThunderKittens

Tile primitives for speedy kernels

Cuda 3,009 217 Updated Dec 9, 2025

ByteDance-Seed / Triton-distributed

Distributed Compiler based on Triton for Parallel Systems

Python 1,285 114 Updated Dec 16, 2025

pytorch / FBGEMM

FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/

C++ 1,504 696 Updated Dec 22, 2025

meta-pytorch / tritonparse

TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels

Python 178 15 Updated Dec 23, 2025

gpgpu-sim / gpgpu-sim_distribution

GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as…

C++ 1,529 607 Updated Feb 15, 2025

meta-pytorch / tritonbench

Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.

Python 304 55 Updated Dec 23, 2025

srush / Triton-Puzzles

Puzzles for learning Triton

Jupyter Notebook 2,192 179 Updated Nov 18, 2024

flagos-ai / flagtree

FlagTree is a unified compiler supporting multiple AI chip backends for custom Deep Learning operations, which is forked from triton-lang/triton.

C++ 148 30 Updated Dec 23, 2025

modular / modular

The Modular Platform (includes MAX & Mojo)

Mojo 25,374 2,745 Updated Dec 22, 2025

volcengine / verl

verl: Volcano Engine Reinforcement Learning for LLMs

Python 17,709 2,868 Updated Dec 22, 2025

huggingface / trl

Train transformer language models with reinforcement learning.

Python 16,738 2,372 Updated Dec 22, 2025