LLIA - Enabling Low-Latency Interactive Avatars: Real-Time Audio-Driven Portrait Video Generation with Diffusion Models
Haojie Yu* · Zhaonian Wang* · Yihan Pan* · Meng Cheng · Hao Yang · Chao Wang · Tao Xie · Xiaoming Xu✉ · Xiaoming Wei · Xunliang Cai
*Equal Contribution ✉Corresponding Authors
TL;DR: LLIA is a real-time, audio-driven portrait video generation framework built on diffusion models, enabling low-latency interactive avatars.
[Demo videos: 001.mp4 · 002.mp4 · 003.mp4]
We propose LLIA, a novel audio-driven portrait video generation framework based on diffusion models. Our approach achieves low-latency, fluid, and authentic two-way communication. On an NVIDIA RTX 4090D, our model reaches up to 78 FPS at a resolution of 384 × 384 and 45 FPS at 512 × 512, with initial video generation latencies of 140 ms and 215 ms, respectively.
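As a quick sanity check on these figures, the reported throughput can be converted into a per-frame time budget. This is a minimal illustrative sketch: the FPS and latency numbers come from the paragraph above, while the helper function itself is hypothetical and not part of the LLIA codebase.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per generated frame (in milliseconds) at a given FPS."""
    return 1000.0 / fps

# Reported peak throughput on an NVIDIA RTX 4090D (from the text above).
for res, fps, first_frame_latency_ms in [("384x384", 78, 140), ("512x512", 45, 215)]:
    print(f"{res}: {frame_budget_ms(fps):.1f} ms/frame budget, "
          f"{first_frame_latency_ms} ms initial latency")
```

So each 384 × 384 frame must be produced in roughly 12.8 ms, and each 512 × 512 frame in roughly 22.2 ms, to sustain the reported rates.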
- June 9, 2025: 👋 We release the technical report of LLIA
- June 9, 2025: 👋 We release the project page of LLIA
- Release the technical report
- Inference
- Checkpoints