- Middle of nowhere in the US
- 01:10 (UTC -06:00)
- LinkedIn: in/wenxuan-tan-065b48250
Stars
Expert Specialization MoE Solution based on CUTLASS
[EuroSys'25] Mist: Efficient Distributed Training of Large Language Models via Memory-Parallelism Co-Optimization
SLA: Beyond Sparsity in Diffusion Transformers via Fine-Tunable Sparse–Linear Attention
Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance.
LongLive: Real-time Interactive Long Video Generation
[ICML2025, NeurIPS2025 Spotlight] Sparse VideoGen 1 & 2: Accelerating Video Diffusion Transformers with Sparse Attention
Distributed MoE in a Single Kernel [NeurIPS '25]
[NeurIPS 2025] ClusterFusion: Expanding Operator Fusion Scope for LLM Inference via Cluster-Level Collective Primitive
🔥 LLM-powered GPU kernel synthesis: train models to convert PyTorch ops into optimized Triton kernels via SFT+RL, with multi-turn compilation feedback, cross-platform NVIDIA/AMD support, and Kernelbook + KernelBench
🥢 Cook like Laoxiangji (老乡鸡) 🐔. The main part was completed in 2024; this is not an official Laoxiangji repository. The text comes from the Laoxiangji Dish Traceability Report (《老乡鸡菜品溯源报告》), summarized, edited, and organized. CookLikeHOC.
NVSHMEM-Tutorial: Build a DeepEP-like GPU Buffer
Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
gpt-oss-120b and gpt-oss-20b are two open-weight language models by OpenAI
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Text-audio foundation model from Boson AI
Train speculative decoding models effortlessly and port them smoothly to SGLang serving.
A curated list of recent papers on efficient video attention for video diffusion models, including sparsification, quantization, and caching
Open-source coding LLM for software engineering tasks
Storing long contexts in tiny caches with self-study
ArcticInference: vLLM plugin for high-throughput, low-latency inference
ByteCheckpoint: A Unified Checkpointing Library for LFMs
Context-parallel attention that accelerates DiT model inference with dynamic caching (https://wavespeed.ai/)