cccl

Public

CUDA Core Compute Libraries

cpp hpc gpu modern-cpp parallel-computing cuda nvidia gpu-acceleration cuda-kernels gpu-computing

C++ • Other

309 forks • 2.1k stars • 1.1k issues • 202 pull requests • Updated Dec 19, 2025

cloudai

Public

CloudAI Benchmark Framework

Python • Apache License 2.0

40 forks • 77 stars • 1 issue • 6 pull requests • Updated Dec 19, 2025

Megatron-LM

Public

Ongoing research training transformer models at scale

transformers model-para large-language-models

Python • Other

3.4k forks • 15k stars • 344 issues • 250 pull requests • Updated Dec 19, 2025

Model-Optimizer

Public

A unified library of state-of-the-art (SOTA) model optimization techniques such as quantization, pruning, distillation, and speculative decoding. It compresses deep learning models for downstream deployment frameworks such as TensorRT-LLM, TensorRT, and vLLM to optimize inference speed.

Python • Apache License 2.0

218 forks • 1.7k stars • 53 issues • 54 pull requests • Updated Dec 19, 2025

edk2

Public

NVIDIA fork of tianocore/edk2

C • Other

16 forks • 26 stars • 0 issues • 15 pull requests • Updated Dec 19, 2025

nsight-python

Public

Nsight Python is a Python kernel profiling interface based on NVIDIA Nsight Tools

Python • Apache License 2.0

6 forks • 75 stars • 5 issues • 1 pull request • Updated Dec 19, 2025

phosphor-user-manager

Public

C++ • Apache License 2.0

11 forks • 1 star • 0 issues • 0 pull requests • Updated Dec 19, 2025

TensorRT-LLM

Public

TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.

cuda pytorch moe blackwell llm-serving

Python • Other

2k forks • 12k stars • 536 issues • 482 pull requests • Updated Dec 19, 2025

numba-cuda

Public

The CUDA target for Numba

Python • BSD 2-Clause "Simplified" License

51 forks • 233 stars • 99 issues • 25 pull requests • Updated Dec 19, 2025

cuEquivariance

Public

cuEquivariance is a math library providing a collection of low-level primitives and tensor ops to accelerate widely used models based on equivariant neural networks, such as DiffDock, MACE, Allegro, and NEQUIP. It also includes kernels for accelerated structure prediction.

Python

23 forks • 336 stars • 12 issues • 5 pull requests • Updated Dec 19, 2025

cuda-quantum

Public

C++ and Python support for the CUDA Quantum programming model for heterogeneous quantum-classical workflows

python cpp quantum quantum-computing hacktoberfest quantum-programming-language quantum-algorithms quantum-machine-learning unitaryhack

C++ • Other

313 forks • 875 stars • 405 issues • 80 pull requests • Updated Dec 19, 2025

JAX-Toolbox

Public

JAX-Toolbox

Python • Apache License 2.0

68 forks • 368 stars • 80 issues • 39 pull requests • Updated Dec 19, 2025

gpu-operator

Public

NVIDIA GPU Operator creates, configures, and manages GPUs in Kubernetes

kubernetes gpu cuda nvidia

Go • Apache License 2.0

431 forks • 2.5k stars • 94 issues • 82 pull requests • Updated Dec 19, 2025

doca-platform

Public

DOCA Platform manages provisioning and service orchestration for BlueField DPUs

Go • Apache License 2.0

16 forks • 64 stars • 0 issues • 0 pull requests • Updated Dec 19, 2025

NVSentinel

Public

NVSentinel is a cross-platform fault remediation service designed to rapidly remediate runtime node-level issues in GPU-accelerated computing environments

Go • Apache License 2.0

29 forks • 127 stars • 33 issues • 8 pull requests • Updated Dec 19, 2025

DALI

Public

A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.

python machine-learning deep-learning neural-network mxnet gpu image-processing pytorch gpu-tensorflow data-processing

C++ • Apache License 2.0

655 forks • 5.6k stars • 222 issues • 31 pull requests • Updated Dec 19, 2025

AMGX

Public

Distributed multigrid linear solver library on GPU

Cuda

166 forks • 628 stars • 111 issues • 2 pull requests • Updated Dec 19, 2025

NeMo-Agent-Toolkit

Public

The NVIDIA NeMo Agent toolkit is an open-source library for efficiently connecting and optimizing teams of AI agents.

Python • Apache License 2.0

462 forks • 1.6k stars • 61 issues • 35 pull requests • Updated Dec 19, 2025

Fuser

Public

A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser")

C++ • Other

74 forks • 366 stars • 209 issues • 216 pull requests • Updated Dec 19, 2025

TensorRT-Incubator

Public

Experimental projects related to TensorRT

MLIR

22 forks • 117 stars • 37 issues • 12 pull requests • Updated Dec 19, 2025

earth2studio

Public

Open-source deep-learning framework for exploring, building and deploying AI weather/climate workflows.

weather ai deep-learning climate-science

Python • Apache License 2.0

85 forks • 316 stars • 10 issues • 7 pull requests • Updated Dec 19, 2025

TileGym

Public

Helpful kernel tutorials and examples for tile-based GPU programming

Python • Other

22 forks • 455 stars • 0 issues • 0 pull requests • Updated Dec 19, 2025

stdexec

Public

`std::execution`, the proposed C++ framework for asynchronous and parallel programming.

C++ • Apache License 2.0

222 forks • 2.1k stars • 114 issues • 12 pull requests • Updated Dec 19, 2025

OSMO

Public

The developer-first platform for scaling complex Physical AI workloads across heterogeneous compute—unifying training GPUs, simulation clusters, and edge devices in a simple YAML

Python • Apache License 2.0

6 forks • 61 stars • 14 issues • 12 pull requests • Updated Dec 19, 2025

bionemo-framework

Public

BioNeMo Framework: For building and adapting AI models in drug discovery at scale

machine-learning gpu pytorch drug-discovery

Jupyter Notebook

108 forks • 606 stars • 60 issues • 109 pull requests • Updated Dec 19, 2025

multi-storage-client

Public

Unified high-performance Python client for object and file stores.

Python • Apache License 2.0

8 forks • 52 stars • 1 issue • 0 pull requests • Updated Dec 19, 2025

cutlass

Public

CUDA Templates and Python DSLs for High-Performance Linear Algebra

python deep-learning cpp gpu cuda nvidia deep-learning-library

C++ • Other

1.6k forks • 9k stars • 411 issues • 95 pull requests • Updated Dec 19, 2025

skyhook

Public

A Kubernetes Operator to manage Node OS customizations.

Go • Apache License 2.0

3 forks • 34 stars • 0 issues • 2 pull requests • Updated Dec 19, 2025

nv-ingest

Public

NeMo Retriever extraction is a scalable, performance-oriented document content and metadata extraction microservice. NeMo Retriever extraction uses specialized NVIDIA NIM microservices to find, contextualize, and extract text, tables, charts and images that you can use in downstream generative applications.

Python • Apache License 2.0

281 forks • 2.8k stars • 101 issues • 31 pull requests • Updated Dec 19, 2025

TransformerEngine

Public

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory utilization in both training and inference.

python machine-learning deep-learning gpu cuda pytorch jax fp8 fp4

Python • Apache License 2.0

583 forks • 3k stars • 280 issues • 101 pull requests • Updated Dec 19, 2025

NVIDIA Corporation

642 repositories
