Stars
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Project Page for "LISA: Reasoning Segmentation via Large Language Model"
Coherent Video Inpainting Using Optical Flow-Guided Efficient Diffusion
Foundational model for human-like, expressive TTS
[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images with an image prompt.
[CVPR 2024] DeepCache: Accelerating Diffusion Models for Free
[ICCV 2023] ProPainter: Improving Propagation and Transformer for Video Inpainting
Official PyTorch implementation of "Diffuse3D: Wide-Angle 3D Photography via Bilateral Diffusion"
pix2tex: Using a ViT to convert images of equations into LaTeX code.
Segment Anything in Medical Images
[NeurIPS 2022] Towards Robust Blind Face Restoration with Codebook Lookup Transformer
Official implementation of "Segment Any Anomaly without Training via Hybrid Prompt Regularization (SAA+)".
A multi-voice TTS system trained with an emphasis on quality
This repository contains the code for "A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild", published at ACM Multimedia 2020. For an HD commercial model, try Sync Labs.
Tiled Diffusion and VAE optimization, licensed under CC BY-NC-SA 4.0
Official Code for DragGAN (SIGGRAPH 2023)
The repository provides code for running inference with the Segment Anything Model (SAM), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
Port of OpenAI's Whisper model in C/C++
[ECCV 2022] StyleHEAT: A framework for high-resolution editable talking face generation
NVIDIA's Deep Imagination Team's PyTorch Library
A Technical Analysis Bot that trades leveraged USDT futures markets on Binance.
Official repository of NeuMan: Neural Human Radiance Field from a Single Video (ECCV 2022)
Instant neural graphics primitives: lightning fast NeRF and more