Stars
A latent text-to-image diffusion model
🔊 Text-Prompted Generative Audio Model
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable…
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
Using Low-rank adaptation to quickly fine-tune diffusion models.
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images from image prompts.
[SIGGRAPH Asia 2022] VToonify: Controllable High-Resolution Portrait Video Style Transfer
Official Implementation for "Encoding in Style: a StyleGAN Encoder for Image-to-Image Translation" (CVPR 2021) presenting the pixel2style2pixel (pSp) framework
Kandinsky 2 — multilingual text2image latent diffusion model
PyTorch implementation of OpenPose, including hand and body pose estimation.
[CVPR 2022] Pastiche Master: Exemplar-Based High-Resolution Portrait Style Transfer
[ACM MM 2024] This is the official code for "AniTalker: Animate Vivid and Diverse Talking Faces through Identity-Decoupled Facial Motion Encoding"
Official PyTorch repo for JoJoGAN: One Shot Face Stylization
VOLO: Vision Outlooker for Visual Recognition
This is the PyTorch implementation of the paper "Real-time Facial Surface Geometry from Monocular Video on Mobile GPUs" (https://arxiv.org/pdf/1907.06724.pdf)
Generate B-roll for a video using AI
Official repository of Manga109Dialog (ICME 2024)
This GitHub repository contains image attributes for a dataset of free-use stock photos.