Sonic: Shifting Focus to Global Audio Perception in Portrait Animation
Join our QQ Chat Group
2025/02/08: Many thanks to the open-source community contributors for making the ComfyUI version of Sonic a reality. Your efforts are truly appreciated! See the ComfyUI version of Sonic.
2025/02/06: Commercialization: note that our license is non-commercial. If commercial use is required, please use the Tencent Cloud Video Creation Large Model: Introduction / API documentation.
2025/01/17: Our online Hugging Face demo is released.
2025/01/17: Thank you to NewGenAI for promoting our Sonic and creating a Windows-based tutorial on YouTube.
2024/12/16: Our Online Demo is released.
Demo videos (input portrait and animated output pairs): anime1.mp4, female_diaosu.mp4, hair.mp4, leonnado.mp4.
For more visual demos, please visit our project page.
If you develop or use Sonic in your projects, please let us know.
- ComfyUI version of Sonic: ComfyUI_Sonic
2025/01/14: Our inference code and weights are released. Stay tuned; we will continue to polish the model.
- An NVIDIA GPU with CUDA support is required.
- The model is tested on a single 32 GB GPU (see the quick environment check sketched below).
- Tested operating system: Linux
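Before installing, you can confirm the GPU requirement with a minimal check. This is a convenience sketch only (not part of the Sonic repo) and assumes PyTorch is already installed:

```python
# check_gpu.py -- hypothetical helper, not shipped with Sonic
import torch

# Verify that a CUDA-capable GPU is visible and report its memory.
if not torch.cuda.is_available():
    raise SystemExit("No CUDA GPU detected; Sonic requires an NVIDIA GPU with CUDA support.")

props = torch.cuda.get_device_properties(0)
vram_gb = props.total_memory / 1024**3
print(f"GPU: {props.name}, VRAM: {vram_gb:.1f} GB")
if vram_gb < 32:
    print("Warning: the model was tested on a 32 GB GPU; smaller cards may run out of memory.")
```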
- Install PyTorch.
- Install the remaining dependencies: pip3 install -r requirements.txt
- All models are stored in checkpoints/ by default, and the file structure is as follows:
Sonic
├── checkpoints
│   ├── Sonic
│   │   ├── audio2bucket.pth
│   │   ├── audio2token.pth
│   │   └── unet.pth
│   ├── stable-video-diffusion-img2vid-xt
│   │   └── ...
│   ├── whisper-tiny
│   │   └── ...
│   ├── RIFE
│   │   └── flownet.pkl
│   └── yoloface_v5m.pt
└── ...

Download the weights with huggingface-cli as follows:
python3 -m pip install "huggingface_hub[cli]"
huggingface-cli download LeonJoe13/Sonic --local-dir checkpoints
huggingface-cli download stabilityai/stable-video-diffusion-img2vid-xt --local-dir checkpoints/stable-video-diffusion-img2vid-xt
huggingface-cli download openai/whisper-tiny --local-dir checkpoints/whisper-tiny

or manually download the pretrained model, svd-xt and whisper-tiny to checkpoints/.
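Alternatively, the same downloads can be scripted in Python. A minimal sketch, assuming the huggingface_hub package installed above; the repo IDs and target directories simply mirror the CLI commands:

```python
# download_checkpoints.py -- optional sketch mirroring the huggingface-cli commands above
from huggingface_hub import snapshot_download

# (repo_id, local_dir) pairs taken from this README
repos = [
    ("LeonJoe13/Sonic", "checkpoints"),
    ("stabilityai/stable-video-diffusion-img2vid-xt", "checkpoints/stable-video-diffusion-img2vid-xt"),
    ("openai/whisper-tiny", "checkpoints/whisper-tiny"),
]

for repo_id, local_dir in repos:
    snapshot_download(repo_id=repo_id, local_dir=local_dir)
    print(f"Downloaded {repo_id} -> {local_dir}")
```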
Run the demo:

python3 demo.py \
'/path/to/input_image' \
'/path/to/input_audio' \
'/path/to/output_video'

For example:

python3 demo.py input/girl_01.png input/audio/hello.m4a output/girl_01.mp4
python3 demo.py input/girl_02.png input/audio/hello.m4a output/girl_02.mp4
python3 demo.py input/girl_03.png input/audio/hello.m4a output/girl_03.mp4
python3 demo.py input/girl_04.png input/audio/hello.m4a output/girl_04.mp4
python3 demo.py input/girl_05.png input/audio/hello.m4a output/girl_05.mp4
python3 demo.py input/boy1.png input/audio/dean.m4a output/boy1.mp4
python3 demo.py input/boy2.png input/audio/doctor.m4a output/boy2.mp4
python3 demo.py input/boy3.png input/audio/doctor.m4a output/boy3.mp4
python3 demo.py input/boy4.png input/audio/doctor.m4a output/boy4.mp4
python3 demo.py input/boy5.png input/audio/doctor.m4a output/boy5.mp4
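To process several portrait/audio pairs in one go, the commands above can be wrapped in a small batch script. A minimal sketch (a hypothetical convenience wrapper, not part of the repo) that shells out to demo.py with the same three positional arguments:

```python
# batch_demo.py -- hypothetical wrapper around demo.py
import subprocess
from pathlib import Path

# (input image, input audio) pairs; adjust to your own data
pairs = [
    ("input/girl_01.png", "input/audio/hello.m4a"),
    ("input/boy1.png", "input/audio/dean.m4a"),
]

Path("output").mkdir(exist_ok=True)
for image, audio in pairs:
    out = f"output/{Path(image).stem}.mp4"
    # demo.py takes: input_image input_audio output_video
    subprocess.run(["python3", "demo.py", image, audio, out], check=True)
    print(f"Wrote {out}")
```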
If you find our work helpful for your research, please consider citing our work.

@article{ji2024sonic,
title={Sonic: Shifting Focus to Global Audio Perception in Portrait Animation},
author={Ji, Xiaozhong and Hu, Xiaobin and Xu, Zhihong and Zhu, Junwei and Lin, Chuming and He, Qingdong and Zhang, Jiangning and Luo, Donghao and Chen, Yi and Lin, Qin and others},
journal={arXiv preprint arXiv:2411.16331},
year={2024}
}
@article{ji2024realtalk,
title={Realtalk: Real-time and realistic audio-driven face generation with 3d facial prior-guided identity alignment network},
author={Ji, Xiaozhong and Lin, Chuming and Ding, Zhonggan and Tai, Ying and Zhu, Junwei and Hu, Xiaobin and Luo, Donghao and Ge, Yanhao and Wang, Chengjie},
journal={arXiv preprint arXiv:2406.18284},
year={2024}
}

Explore our related research:
- [Super-fast talk: real-time with less GPU computation] Realtalk: Real-time and realistic audio-driven face generation with 3d facial prior-guided identity alignment network