    Repositories list

    • TensorRT-LLM

      Public
      TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
      Python
      2k12k521472 Updated Dec 27, 2025
    • TransformerEngine

      Public
      A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory utilization in both training and inference.
      Python
      5893k284101 Updated Dec 27, 2025
    • cudaqx

      Public
      Accelerated libraries for quantum-classical computing built on CUDA-Q.
      C++
      41722818 Updated Dec 27, 2025
    • aistore

      Public
      AIStore: scalable storage for AI applications
      Go
      2311.7k10 Updated Dec 27, 2025
    • cccl

      Public
      CUDA Core Compute Libraries
      C++
      3092.1k1.1k201 Updated Dec 27, 2025
    • stdexec

      Public
      `std::execution`, the proposed C++ framework for asynchronous and parallel programming.
      C++
      2222.2k11512 Updated Dec 26, 2025
    • linux

      Public
      OpenBMC Linux kernel source tree
      C
      60k700 Updated Dec 26, 2025
    • Fuser

      Public
      A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")
      C++
      74368210219 Updated Dec 26, 2025
    • Megatron-LM

      Public
      Ongoing research training transformer models at scale
      Python
      3.4k15k339253 Updated Dec 26, 2025
    • nsmd

      Public
      MCTP VDM-based Nvidia System Management API
      C++
      1410 Updated Dec 26, 2025
    • bmcweb

      Public
      A do everything Redfish, KVM, GUI, and DBus webserver for OpenBMC
      C++
      175500 Updated Dec 26, 2025
    • garak

      Public
      the LLM vulnerability scanner
      Python
      7376.7k26540 Updated Dec 26, 2025
    • NVTX

      Public
      The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resources in your applications.
      C++
      6649134 Updated Dec 26, 2025
    • OSMO

      Public
      The developer-first platform for scaling complex Physical AI workloads across heterogeneous compute—unifying training GPUs, simulation clusters, and edge devices in a simple YAML
      Python
      6612313 Updated Dec 26, 2025
    • JAX-Toolbox

      Public
      JAX-Toolbox
      Python
      683698040 Updated Dec 26, 2025
    • doca-platform

      Public
      DOCA Platform manages provisioning and service orchestration for Bluefield DPUs
      Go
      166400 Updated Dec 26, 2025
    • cuda-quantum

      Public
      C++ and Python support for the CUDA Quantum programming model for heterogeneous quantum-classical workflows
      C++
      31687640685 Updated Dec 26, 2025
    • barney

      Public
      A Scalable (and Optionally, Data-Parallel) ANARI Multi-GPU Path Tracer
      C++
      42120 Updated Dec 26, 2025
    • TensorRT-Incubator

      Public
      Experimental projects related to TensorRT
      MLIR
      221163712 Updated Dec 25, 2025
    • warp

      Public
      A Python framework for accelerated simulation, data generation and spatial computing.
      Python
      4036k1783 Updated Dec 25, 2025
    • Model-Optimizer

      Public
      A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM, TensorRT, vLLM, etc. to optimize inference speed.
      Python
      2261.7k5657 Updated Dec 25, 2025
    • nvshmem

      Public
      NVIDIA NVSHMEM is a parallel programming interface for NVIDIA GPUs based on OpenSHMEM. NVSHMEM can significantly reduce multi-process communication and coordination overheads by allowing programmers to perform one-sided communication from within CUDA kernels and on CUDA streams.
      C++
      484272013 Updated Dec 25, 2025
    • mig-parted

      Public
      MIG Partition Editor for NVIDIA GPUs
      Go
      542352219 Updated Dec 25, 2025
    • KAI-Scheduler

      Public
      KAI Scheduler is an open source Kubernetes Native scheduler for AI workloads at large scale
      Go
      1271k2464 Updated Dec 25, 2025
    • NV-Kernels

      Public
      Ubuntu kernels which are optimized for NVIDIA server systems
      C
      497207 Updated Dec 25, 2025
    • NVSentinel

      Public
      NVSentinel is a cross-platform fault remediation service designed to rapidly remediate runtime node-level issues in GPU-accelerated computing environments
      Go
      321383246 Updated Dec 25, 2025
    • edk2

      Public
      NVIDIA fork of tianocore/edk2
      C
      1625015 Updated Dec 25, 2025
    • gpu-operator

      Public
      NVIDIA GPU Operator creates, configures, and manages GPUs in Kubernetes
      Go
      4312.5k9467 Updated Dec 25, 2025
    • recsys-examples

      Public
      Examples for Recommenders - easy to train and deploy on accelerated infrastructure.
      Python
      39195388 Updated Dec 25, 2025
    • TileGym

      Public
      Helpful kernel tutorials and examples for tile-based GPU programming
      Python
      2950103 Updated Dec 25, 2025