Starred repositories
DLRover: An Automatic Distributed Deep Learning System
nanobind: tiny and efficient C++/Python bindings
Convert .ninja_log files to Chrome's about:tracing format.
A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
Useful shortcuts for bash/zsh
Benchmarking unity builds on real C++ projects.
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance…
A Python framework for accelerated simulation, data generation and spatial computing.
Flax is a neural network library for JAX that is designed for flexibility.
Task-based datasets, preprocessing, and evaluation for sequence models.
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
A tool to classify GPU kernels and collect statistics on them.
📚 Modern C++ Tutorial: C++11/14/17/20 On the Fly | https://changkun.de/modern-cpp/
An efficient GPU resource sharing system with fine-grained control for Linux platforms.
NumPy and SciPy on Multi-Node Multi-GPU systems
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
AddressSanitizer, ThreadSanitizer, MemorySanitizer
CUDA Python: Performance meets Productivity
Development repository for the Triton language and compiler