Stars
Tilus is a tile-level kernel programming language with explicit control over shared memory and registers.
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Official Implementation of EAGLE-1 (ICML'24), EAGLE-2 (EMNLP'24), and EAGLE-3 (NeurIPS'25).
Production-tested AI infrastructure tools for efficient AGI development and community-driven innovation
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
A book for Learning the Foundations of LLMs
An extremely fast Python linter and code formatter, written in Rust.
Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous …
KECC: KAIST Educational C Compiler. IMPORTANT: DON'T FORK!
An easy-to-understand TensorOp Matmul tutorial
How to optimize some algorithms in CUDA.
A curated list for Efficient Large Language Models
Awesome-LLM: a curated list of Large Language Models
A high-throughput and memory-efficient inference and serving engine for LLMs
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
GPU programming related news and material links
High-speed GEMV kernels, with up to 2.7x speedup over the PyTorch baseline.
[EuroSys'24] Minuet: Accelerating 3D Sparse Convolutions on GPUs
A programmer's guide to living longer
An open-source efficient deep learning framework/compiler, written in Python.
A list of awesome compiler projects and papers for tensor computation and deep learning.
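AISystem mainly refers to AI systems, covering full-stack low-level AI technologies such as AI chips, AI compilers, and AI inference and training frameworks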