    Repositories list

    • TensorRT-LLM

      Public
      TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
      Python
      2k13k513468 Updated Jan 11, 2026
    • Megatron-LM

      Public
      Ongoing research training transformer models at scale
      Python
      3.5k15k309249 Updated Jan 11, 2026
    • warp

      Public
      A Python framework for accelerated simulation, data generation and spatial computing.
      Python
      4116k18510 Updated Jan 11, 2026
    • OSMO

      Public
      The developer-first platform for scaling complex Physical AI workloads across heterogeneous compute—unifying training GPUs, simulation clusters, and edge devices in a simple YAML
      Python
      6722311 Updated Jan 10, 2026
    • NeMo-Agent-Toolkit

      Public
      The NVIDIA NeMo Agent toolkit is an open-source library for efficiently connecting and optimizing teams of AI agents.
      Python
      4801.7k6518 Updated Jan 10, 2026
    • Model-Optimizer

      Public
      A unified library of SOTA model optimization techniques like quantization, pruning, distillation, speculative decoding, etc. It compresses deep learning models for downstream deployment frameworks like TensorRT-LLM, TensorRT, vLLM, etc. to optimize inference speed.
      Python
      2321.8k5664 Updated Jan 10, 2026
    • TransformerEngine

      Public
      A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory utilization in both training and inference.
      Python
      6033.1k292105 Updated Jan 10, 2026
    • nv-ingest

      Public
      NeMo Retriever extraction is a scalable, performance-oriented document content and metadata extraction microservice. NeMo Retriever extraction uses specialized NVIDIA NIM microservices to find, contextualize, and extract text, tables, charts and images that you can use in downstream generative applications.
      Python
      2852.8k10133 Updated Jan 9, 2026
    • hpc-container-maker

      Public
      HPC Container Maker
      Python
      98504144 Updated Jan 9, 2026
    • JAX-Toolbox

      Public
      JAX-Toolbox
      Python
      683758041 Updated Jan 10, 2026
    • numba-cuda

      Public
      The CUDA target for Numba
      Python
      5424010333 Updated Jan 9, 2026
    • cuEquivariance

      Public
      cuEquivariance is a math library that provides a collection of low-level primitives and tensor ops to accelerate widely used models based on equivariant neural networks, such as DiffDock, MACE, Allegro, and NequIP. It also includes kernels for accelerated structure prediction.
      Python
      24340145 Updated Jan 9, 2026
    • maxtext-jaxpp

      Public
      Showcase JaxPP with MaxText
      Python
      447302 Updated Jan 9, 2026
    • NVFlare

      Public
      NVIDIA Federated Learning Application Runtime Environment
      Python
      2298641515 Updated Jan 9, 2026
    • nvidia-resiliency-ext

      Public
      NVIDIA Resiliency Extension is a Python package that framework developers and users can use to implement fault-tolerant features. It improves effective training time by minimizing downtime due to failures and interruptions.
      Python
      42247216 Updated Jan 11, 2026
    • earth2studio

      Public
      Open-source deep-learning framework for exploring, building and deploying AI weather/climate workflows.
      Python
      893261210 Updated Jan 9, 2026
    • trt-samples-for-hackathon-cn

      Public
      Simple samples for TensorRT programming
      Python
      3521.7k652 Updated Jan 9, 2026
    • makani

      Public
      Massively parallel training of machine-learning-based weather and climate models
      Python
      6334533 Updated Jan 9, 2026
    • jaxpp

      Public
      JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training
      Python
      16211 Updated Jan 9, 2026
    • nsight-python

      Public
      Nsight Python is a Python kernel profiling interface based on NVIDIA Nsight Tools
      Python
      78952 Updated Jan 9, 2026
    • cutile-python

      Public
      cuTile is a programming model for writing parallel kernels for NVIDIA GPUs
      Python
      951.8k236 Updated Jan 9, 2026
    • numbast

      Public
      Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings.
      Python
      18552810 Updated Jan 9, 2026
    • physicsnemo

      Public
      Open-source deep-learning framework for building, training, and fine-tuning deep learning models using state-of-the-art Physics-ML methods
      Python
      5402.3k3237 Updated Jan 9, 2026
    • cloudai

      Public
      CloudAI Benchmark Framework
      Python
      428118 Updated Jan 9, 2026
    • compute-eval

      Public
      Evaluating Large Language Models for CUDA Code Generation. ComputeEval is a framework designed to generate and evaluate CUDA code from Large Language Models.
      Python
      159114 Updated Jan 8, 2026
    • physicsnemo-curator

      Public
      PhysicsNeMo-Curator is a Python-based library designed to streamline and accelerate the process of data curation for engineering datasets.
      Python
      82951 Updated Jan 8, 2026
    • doca-sosreport

      Public
      A unified tool for collecting system logs and other debug information
      Python
      596503 Updated Jan 8, 2026
    • recsys-examples

      Public
      Examples for Recommenders - easy to train and deploy on accelerated infrastructure.
      Python
      41202419 Updated Jan 8, 2026
    • TileGym

      Public
      Helpful kernel tutorials and examples for tile-based GPU programming
      Python
      3255411 Updated Jan 8, 2026
    • dgxc-benchmarking

      Public
      DGXC Benchmarking provides recipes as ready-to-use templates for evaluating the performance of specific AI use cases across hardware and software combinations.
      Python
      175430 Updated Jan 8, 2026