Stars
A project that optimizes Whisper for low-latency inference using NVIDIA TensorRT
PaddlePaddle GAN library with many interesting applications, including First Order Motion transfer, Wav2Lip, image restoration, image editing, photo2cartoon, image style transfer, GPEN, and more.
Industry-leading face manipulation platform
RWKV-X is a Linear Complexity Hybrid Language Model based on the RWKV architecture, integrating Sparse Attention to improve the model's long sequence processing capabilities.
INT4/INT5/INT8 and FP16 inference on CPU for RWKV language model
SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime
🤖 The free, Open Source alternative to OpenAI, Claude and others. Self-hosted and local-first. Drop-in replacement for OpenAI, running on consumer-grade hardware. No GPU required. Runs gguf, transf…
🛸 Optimized Video Native Interface - the fastest GPU-accelerated video editing pipeline.
A high-performance, cross-platform video-processing Python framework packed with unique, trailblazing features 🔥
Python library for reading and writing image data
C++ inference for BiRefNet based on TensorRT.
TensorRT implementation for ultra-fast face restoration inside ComfyUI
Adds INT4 quantization support for faster inference.
fastllm is a high-performance large language model inference library with no backend dependencies. It supports both tensor-parallel inference for dense models and mixed-mode inference for MoE models; any GPU with 10 GB or more of VRAM can run the full DeepSeek model. On a dual-socket 9004/9005 server with a single GPU, the original full-precision DeepSeek model reaches 20 tps at single concurrency; the INT4-quantized model reaches 30 tps at single concurrency and 60+ tps under multiple concurrent requests.
Ultra-fast DWPose estimation inside ComfyUI using TensorRT ⚡
Go RPC framework with high performance and strong extensibility for building microservices.
Benchmark GPU inference performance of MobileNetV2: full-precision vs quantized (INT8) models using TensorRT
C++ TensorRT implementation of Depth-Anything V1, V2
This project provides a high-performance image and video upscaler using [RealESRGAN](https://github.com/xinntao/Real-ESRGAN), accelerated with NVIDIA TensorRT. It supports both 2x and 4x upscaling,…
The official repository for Fast-nnUNet, a fast model inference framework built on nnUNet.
InsightFace REST API for easy deployment of face recognition services with TensorRT in Docker.
This repository demonstrates how to export a pre-trained ResNet18 model to ONNX, and then convert it to a TensorRT engine for fast inference.