Stars
A lightweight design for computation-communication overlap.
An easy-to-use PyTorch to TensorRT converter
Implementation of popular deep learning networks with TensorRT network definition API
Representation and Reference Lowering of ONNX Models in MLIR Compiler Infrastructure
Composable transformations of Python+NumPy programs: differentiate, vectorize, JIT to GPU/TPU, and more
A JIT assembler for x86/x64 architectures supporting FPU, MMX, SSE (1-4), AVX (1-2, 512), APX, and AVX10.2
Deploy your model with TensorRT quickly.
ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
MindSpore is a new open-source deep learning training/inference framework that can be used for mobile, edge, and cloud scenarios.
Python library for Room Impulse Response (RIR) simulation with GPU acceleration
A language for fast, portable data-parallel computation
A simple tool to profile the performance of multiple GEMM combinations in cuBLAS
[ARCHIVED] The C++ parallel algorithms library. See https://github.com/NVIDIA/cccl
🤖 💬 Deep learning for Text to Speech (Discussion forum: https://discourse.mozilla.org/c/tts)
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba. Full multimodal LLM Android App:[MNN-LLM-Android](./apps/Android/MnnLlmChat/READ…
Kernel Fusion and Runtime Compilation Based on NNVM
A simple memory manager for CUDA designed to help Deep Learning frameworks manage memory
In-Place Activated BatchNorm for Memory-Optimized Training of DNNs
Reference implementation of real-time autoregressive wavenet inference
A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)
Facebook AI Research's Automatic Speech Recognition Toolkit