Starred repositories
An extremely fast Python package and project manager, written in Rust.
This project converts the MXNet implementations in the original book Dive into Deep Learning (《动手学深度学习》) to PyTorch implementations.
An easy to use PyTorch to TensorRT converter
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
mimalloc is a compact general purpose allocator with excellent performance.
An interactive NVIDIA-GPU process viewer and beyond, the one-stop solution for GPU process management.
A modern, C++-native, test framework for unit-tests, TDD and BDD - using C++14, C++17 and later (C++11 support is in v2.x branch, and C++03 on the Catch1.x branch)
[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl
A family of header-only, very fast and memory-friendly hashmap and btree containers.
The central registry of Bazel modules for the Bzlmod external dependency system.
[SIGMOD 2024] RaBitQ: Quantizing High-Dimensional Vectors with a Theoretical Error Bound for Approximate Nearest Neighbor Search
[SIGMOD 2025] Practical and Asymptotically Optimal Quantization of High-Dimensional Vectors in Euclidean Space for Approximate Nearest Neighbor Search
🦆 A curated list of awesome DuckDB resources
Examples from Programming in Parallel with CUDA
A fast multi-producer, multi-consumer lock-free concurrent queue for C++11
A distributed approximate nearest neighborhood search (ANN) library which provides a high quality vector index build, search and distributed online serving toolkits for large scale vector search sc…
Build rules for interfacing with "foreign" (non-Bazel) build systems (CMake, configure-make, GNU Make, boost, ninja, Meson)
RAFT contains fundamental widely-used algorithms and primitives for machine learning and information retrieval. The algorithms are CUDA-accelerated and form building blocks for more easily writing …
Efficient binary-decimal and decimal-binary conversion routines for IEEE doubles.
A composable and fully extensible C++ execution engine library for data management systems.
vsag is a vector indexing library used for similarity search.
cuVS - a library for vector search and clustering on the GPU
A curated list of awesome SIMD frameworks, libraries and software
Up to 200x Faster Dot Products & Similarity Metrics — for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, and bit vectors using SIMD for both AVX2, AVX-512, NEON, SVE, …
C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE)