Stars
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
Fast and memory-efficient exact attention
Modern C++ Programming Course (C++03/11/14/17/20/23/26)
Performance Tuning Tutorial given at Oak Ridge National Laboratory
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. Tensor…
A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)
gProfiler is a system-wide profiler that combines multiple sampling profilers to produce a unified visualization of what your CPU is spending time on.
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models across text, vision, audio, and multimodal tasks, for both inference and training.
A very fast and expressive template engine.
Use pytest's runner to discover and execute C++ tests
An efficient C++17 GPU numerical computing library with Python-like syntax
Flexible and powerful tensor operations for readable and reliable code (for PyTorch, JAX, TensorFlow, and others)
Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators
[ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl
C++ Insights - See your source code with the eyes of a compiler
Run GUI applications and desktops in docker and podman containers. Focus on security.
An MLIR-based toolchain for AMD AI Engine-enabled devices.
Instant neural graphics primitives: lightning fast NeRF and more
Development repository for the Triton language and compiler
Multiple cursors plugin for vim/neovim
The LLVM Project is a collection of modular and reusable compiler and toolchain technologies.
A client/server indexer for c/c++/objc[++] with integration for Emacs based on clang.
MSVC's implementation of the C++ Standard Library.
VexCL is a C++ vector expression template library for OpenCL/CUDA/OpenMP
HIP: C++ Heterogeneous-Compute Interface for Portability