- Pudong, Shanghai
Starred repositories
This is the homepage of a new book entitled "Mathematical Foundations of Reinforcement Learning."
CUDA Templates and Python DSLs for High-Performance Linear Algebra
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
How to optimize some algorithms in CUDA.
A hands-on, step-by-step guide to Huggingface Transformers; course videos are updated in sync on Bilibili and YouTube
Development repository for the Triton language and compiler
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
Easy-to-use and powerful LLM and SLM library with awesome model zoo.
Implementations of SIMD instruction sets for systems which don't natively support them.
《Effective Modern C++》 - translation complete
The C++ Core Guidelines are a set of tried-and-true guidelines, rules, and best practices about coding in C++
The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
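PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (PaddlePaddle core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)
SGI STL source code analysis and notes from 《STL源码剖析》 by 侯捷 (includes the e-book, annotated source code, and test code)
PaddlePaddle custom device implementation. (PaddlePaddle custom hardware integration implementation)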
Warp is the agentic development environment, built for coding with multiple AI agents.
A General-purpose Task-parallel Programming System using Modern C++
C++ image processing and machine learning library using SIMD: SSE, AVX, AVX-512, AMX for x86/x64, NEON for ARM.
This is my Chinese translation of the Eigen documentation
C++ examples for the Vulkan graphics API
ncnn is a high-performance neural network inference framework optimized for the mobile platform