Stars
GPGPU-Sim provides a detailed simulation model of contemporary NVIDIA GPUs running CUDA and/or OpenCL workloads. It includes support for features such as TensorCores and CUDA Dynamic Parallelism as…
Tampermonkey userscript: Zhihu backup and clipping. Save the answers, articles, and pins you like as markdown / zip / png
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
[SIGGRAPH 2025] Official code of the paper "Cobra: Efficient Line Art COlorization with BRoAder References".
sketch + style = paints 🎨 (TOG2018/SIGGRAPH2018ASIA)
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Supercharge Your LLM with the Fastest KV Cache Layer
A throughput-oriented high-performance serving framework for LLMs
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Notes on the knowledge and interview questions relevant to large language model (LLM) algorithm (application) engineers
A massively parallel, high-level programming language
Distribute and run LLMs with a single file.
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Ongoing research training transformer models at scale
A latent text-to-image diffusion model
🏂🏻 A handbook for programmers on working abroad and interviewing in English
A language about virtual kontinuation
Open deep learning compiler stack for cpu, gpu and specialized accelerators
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
Jupyter kernel for the C++ programming language