kangguangli

🎯

Focusing

3 followers · 1 following

Achievements

x2 x3

Lists (2)

interview

2 repositories

🚀 My stack

2 repositories

Stars

gpgpu-sim / gpgpu-sim_distribution

GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as…

C++ 1,481 582 Updated Feb 15, 2025

qtqz / zhihu-backup-collect

Tampermonkey userscript: Zhihu backup and clipping. Saves the answers, articles, and thoughts you like as markdown / zip / png

TypeScript 115 19 Updated Oct 8, 2025

Servarr / Wiki

Python 537 80 Updated Nov 3, 2025

abdulle-sabaf / cathunu-bhallifa

悟数学 (Comprehending Mathematics)

TeX 194 42 Updated Jul 27, 2025

srush / Triton-Puzzles

Puzzles for learning Triton

Jupyter Notebook 2,098 171 Updated Nov 18, 2024

xlite-dev / LeetCUDA

📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉

Cuda 8,318 823 Updated Oct 17, 2025

zhuang2002 / Cobra

[SIGGRAPH 2025] Official code of the paper "Cobra: Efficient Line Art COlorization with BRoAder References".

Python 225 16 Updated Apr 17, 2025

lllyasviel / style2paints

sketch + style = paints 🎨 (TOG2018/SIGGRAPH2018ASIA)

JavaScript 18,213 2,094 Updated Aug 1, 2023

kvcache-ai / Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

C++ 4,215 420 Updated Nov 5, 2025

StarryVae / RDMA-tutorial

C 211 39 Updated Mar 24, 2023

LMCache / LMCache

Supercharge Your LLM with the Fastest KV Cache Layer

Python 5,901 690 Updated Nov 5, 2025

efeslab / Nanoflow

A throughput-oriented high-performance serving framework for LLMs

Jupyter Notebook 910 44 Updated Oct 29, 2025

NVIDIA / TensorRT-LLM

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…

C++ 12,042 1,839 Updated Nov 5, 2025