LLIA - Enabling Low-Latency Interactive Avatars: Real-Time Audio-Driven Portrait Video Generation with Diffusion Models

Haojie Yu* · Zhaonian Wang* · Yihan Pan* · Meng Cheng · Hao Yang · Chao Wang · Tao Xie · Xiaoming Xu · Xiaoming Wei · Xunliang Cai

*Equal Contribution · Corresponding Authors

TL;DR: LLIA is a real-time, audio-driven portrait video generation framework built on diffusion models, enabling low-latency interactive avatars.

Video Demos

001.mp4
002.mp4
003.mp4

🔆 Introduction

We propose LLIA, a novel audio-driven portrait video generation framework based on diffusion models. Our approach enables low-latency, fluid, and authentic two-way communication. On an NVIDIA RTX 4090D, our model achieves up to 78 FPS at a resolution of 384×384 and 45 FPS at 512×512, with an initial video generation latency of 140 ms and 215 ms, respectively.
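
For intuition, the short sketch below (purely illustrative, not part of the LLIA release) converts the throughput and latency figures above into per-frame time budgets, which is what "real-time" amounts to in practice.

```python
# Back-of-the-envelope sketch (not from this repository): turns the
# throughput/latency figures quoted above into per-frame budgets.

def per_frame_budget_ms(fps: float) -> float:
    """Time available per generated frame at a given throughput."""
    return 1000.0 / fps

# Numbers reported above for an NVIDIA RTX 4090D.
configs = {
    "384x384": {"fps": 78, "first_frame_latency_ms": 140},
    "512x512": {"fps": 45, "first_frame_latency_ms": 215},
}

for resolution, stats in configs.items():
    budget = per_frame_budget_ms(stats["fps"])
    print(
        f"{resolution}: ~{budget:.1f} ms per frame at {stats['fps']} FPS, "
        f"{stats['first_frame_latency_ms']} ms until the first frame is ready"
    )
# 384x384: ~12.8 ms per frame at 78 FPS, 140 ms until the first frame is ready
# 512x512: ~22.2 ms per frame at 45 FPS, 215 ms until the first frame is ready
```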

🔥 Latest News

📑 Todo List

  • Release the technical report
  • Inference
  • Checkpoints
