ComfyUI-Ovi

Custom nodes that bring Character.AI's Ovi video+audio generator to ComfyUI with streamlined setup, selectable precision, attention-backend control, and per-node device targeting for multi-GPU rigs.

Key Features

Self-bootstrapping loader – downloads MMAudio assets and the chosen Ovi fusion weights to custom_nodes/ComfyUI-Ovi/ckpts, then places the Ovi safetensors file in models/diffusion_models/.
Precision toggle – pick Ovi-11B BF16 or the 16-24 GB-friendly Ovi-11B FP8 from a dropdown; each mode is cached separately.
Optional CPU offload – move large modules to system RAM when VRAM is tight.
Attention selector – flip between auto, FlashAttention, SDPA, Sage, etc., at runtime.
Component reuse – keep using the Wan 2.2 VAE and UMT5 text encoder you already installed (no duplicated files!); the component loader will point to the correct location (see below).
Device selector – the loader auto-detects available GPUs; single-GPU setups hide the selector, while multi-GPU rigs get a dropdown to pick the target device.

Requirements

GPU: 16-24 GB VRAM (FP8 + offload) or >32 GB VRAM (BF16 without offload)
CUDA stack: PyTorch 2.4+, CUDA 12.x driver/runtime
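
The VRAM tiers above can be read as a simple lookup. The sketch below is illustrative only: `suggest_ovi_settings` is a hypothetical helper, not part of this node pack, and its handling of cards between 24 GB and 32 GB (FP8 with offload) is an assumption, since the tiers above do not cover that range.

```python
def suggest_ovi_settings(vram_gb: float) -> dict:
    """Map available VRAM (in GB) to a precision/offload pairing.

    Hypothetical helper for illustration; the node pack itself exposes
    these choices as dropdowns on the Ovi Engine Loader.
    """
    if vram_gb > 32:
        # Documented tier: BF16 without offload.
        return {"precision": "bf16", "cpu_offload": False}
    if vram_gb >= 16:
        # 16-24 GB is the documented FP8 + offload tier. Treating
        # 24-32 GB the same way is an assumption of this sketch.
        return {"precision": "fp8", "cpu_offload": True}
    raise ValueError(f"{vram_gb} GB is below the 16 GB minimum listed above")


if __name__ == "__main__":
    print(suggest_ovi_settings(24))
    print(suggest_ovi_settings(48))
```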

Installation

# from ComfyUI/custom_nodes
git clone https://github.com/snicolast/ComfyUI-Ovi.git
cd ComfyUI-Ovi
pip install -r requirements.txt

Restart ComfyUI after installing.

Weights Overview

Handled automatically by Ovi Engine Loader:

MMAudio/ext_weights/{best_netG.pt, v1-16.pth}
Ovi-11B-bf16.safetensors or Ovi-11B-fp8.safetensors (renamed and parked in models/diffusion_models/)

Provide manually if missing (usually already present in Wan setups). Choose only one text encoder; the VAE is needed in either case:

BF16 (>32 GB VRAM): umt5-xxl-enc-bf16.safetensors → models/text_encoders
FP8 (16-24 GB VRAM): umt5-xxl-enc-fp8_e4m3fn.safetensors → models/text_encoders
wan2.2_vae.safetensors → models/vae

Directory sketch:

ComfyUI/
├── models/
│   ├── diffusion_models/
│   │   ├── Ovi-11B-bf16.safetensors
│   │   └── Ovi-11B-fp8.safetensors
│   ├── text_encoders/umt5-xxl-enc-bf16.safetensors
│   └── vae/wan2.2_vae.safetensors
└── custom_nodes/ComfyUI-Ovi/ckpts/MMAudio/ext_weights/...
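
To confirm that a ComfyUI install matches the layout above before launching, a small standalone check along these lines may help. It is a sketch, not part of the node pack: `missing_weights` is a hypothetical helper, and the file names are copied from the lists above.

```python
from pathlib import Path


def missing_weights(comfy_root, precision="bf16"):
    """Return the expected weight files (relative paths) that are absent.

    `precision` is "bf16" or "fp8"; it selects the fusion checkpoint and
    the matching UMT5 text encoder named in the lists above.
    """
    root = Path(comfy_root)
    encoder_suffix = "bf16" if precision == "bf16" else "fp8_e4m3fn"
    expected = [
        f"models/diffusion_models/Ovi-11B-{precision}.safetensors",
        f"models/text_encoders/umt5-xxl-enc-{encoder_suffix}.safetensors",
        "models/vae/wan2.2_vae.safetensors",
        "custom_nodes/ComfyUI-Ovi/ckpts/MMAudio/ext_weights/best_netG.pt",
        "custom_nodes/ComfyUI-Ovi/ckpts/MMAudio/ext_weights/v1-16.pth",
    ]
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    # Assumes the script is run from the ComfyUI root directory.
    for rel in missing_weights(".", precision="fp8"):
        print("missing:", rel)
```

The MMAudio and fusion entries are the ones the Ovi Engine Loader fetches itself, so seeing them reported as missing before the first run is expected.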

Available Nodes

Node	Description
Ovi Engine Loader	Downloads missing weights, builds the fusion engine, exposes `OVI_ENGINE`, lets you pick precision/offload/device (multi-GPU ready).
Ovi Wan Component Loader	Optional override so you can point the engine to Wan VAE + UMT5 files stored elsewhere.
Ovi Attention Selector	Switch attention backend for the live engine (`auto`, FlashAttention, SDPA, xFormers, native, etc.).
Ovi Video Generator	Runs `OviFusionEngine.generate` and emits video/audio latents plus the engine handle for downstream nodes.
Ovi Latent Decoder	Takes those latents + engine, performs the VAE decode, and outputs `IMAGE` frames with accompanying `AUDIO` (16 kHz).

All nodes live under the Ovi category in the ComfyUI search dialog.

Quick Start Workflow

Load engine - drop Ovi Engine Loader, choose precision (BF16 vs FP8), enable CPU offload if you are on a 24 GB GPU, select device.
(Optional) Add Wan components - connect Ovi Wan Component Loader if your VAE/encoder live outside the default folders.
Tune attention - insert Ovi Attention Selector to lock a backend, otherwise leave on auto.
Generate - wire the engine into Ovi Video Generator, enter your prompt (supports <S>...<E> speech and <AUDCAP>...<ENDAUDCAP> audio tags), optionally feed a first-frame image.
Decode - add Ovi Latent Decoder, feed it the engine passthrough plus both latents to obtain IMAGE + AUDIO.
Export - connect those outputs to your preferred save nodes.
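
The speech and audio tags from the Generate step can be assembled programmatically when you script prompts. This is a minimal sketch: `build_ovi_prompt` is a hypothetical helper, and placing the scene description first, speech lines next, and the audio caption last is an assumption rather than a documented requirement.

```python
def build_ovi_prompt(scene: str, speech: list[str], audio_caption: str = "") -> str:
    """Compose a prompt string using the <S>...<E> and <AUDCAP>...<ENDAUDCAP> tags."""
    parts = [scene.strip()]
    # Each spoken line gets its own <S>...<E> pair.
    parts += [f"<S>{line.strip()}<E>" for line in speech]
    if audio_caption:
        # One caption describing the overall soundscape.
        parts.append(f"<AUDCAP>{audio_caption.strip()}<ENDAUDCAP>")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_ovi_prompt(
        "A man waves at the camera.",
        ["Hello there!"],
        "Soft wind, distant traffic",
    ))
```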

Tips & Troubleshooting

High VRAM after a run – use ComfyUI’s Unload Models; the engine rebuilds itself automatically on next use.
Missing weights – place the files manually in the paths above; the loader skips network calls when files already exist.
Switching precision – changing the dropdown spins up a new cached engine; you can hot-swap without restarting ComfyUI.
Backend errors – if FlashAttention/xFormers is unavailable, the selector falls back to native; check the console log for details.
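
The fallback described under "Backend errors" can be pictured roughly as follows. This is not the selector's actual code: `resolve_backend` and the label-to-package mapping are assumptions made for illustration, using only the standard library to test whether an optional package is importable.

```python
import importlib.util

# Selector label -> import name of the optional package it needs.
# The labels are assumed; the import names are those of the real packages.
OPTIONAL_BACKENDS = {
    "flash_attn": "flash_attn",
    "xformers": "xformers",
    "sage": "sageattention",
}


def resolve_backend(requested, is_installed=None):
    """Return `requested` if usable, otherwise fall back to "native"."""
    if is_installed is None:
        # Default check: can Python locate the module at all?
        def is_installed(module):
            return importlib.util.find_spec(module) is not None

    module = OPTIONAL_BACKENDS.get(requested)
    if module is None:
        # auto, sdpa, native and similar need no extra package.
        return requested
    if is_installed(module):
        return requested
    print(f"[Ovi] '{requested}' unavailable ({module} not installed); using native")
    return "native"


if __name__ == "__main__":
    print(resolve_backend("flash_attn"))
```

Passing `is_installed` explicitly keeps the function testable on machines where none of the optional packages are present.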

Credits

Ovi: Twin Backbone Cross-Modal Fusion for Audio-Video Generation (Character.AI)
Wan 2.2 VAE, MMAudio, UMT5 ecosystem maintainers
Kijai
