Towards Open-Source Next-Generation Video Generalist
UniVA (Universal Video Agent) is an open-source, next-generation video generalist system that enables you to plan, compose, and produce videos through natural language instructions. UniVA acts as your intelligent video director, iterating shots and stories with you through an agentic, proactive workflow.
- Multi-round co-creation: Talk like a director; UniVA iterates shots & stories with you
- Deep memory & context: Global + user memory keep preferences, lore, and styles consistent
- Implicit intent reading: Understands vague & evolving instructions; less prompt hacking
- Proactive agent: Auto plans, checks, and suggests better shots & stories, not just obeys
- End-to-end workspace: UniVA plans, calls tools, and delivers full videos
- Universal video fabric: Text / Image / Entity / Video → controllable video in one framework
- Any-conditioned pipeline: Super-HD, cinematic quality with stable identities & objects
- Complex narratives: Multi-scene, multi-role, multi-shot stories under structured control
- Ultra-long & fine-grained editing: From long-form cuts to per-shot/per-object refinement
- Grounded by understanding: Long-video comprehension & segmentation guide generation & edits
- MCP-native: Modular design, easy to extend with new models & tools
- Industrial quality: Production-ready video generation capabilities
UniVA consists of two main components: the backend agent system and the web frontend. A minimal sketch of how the backend pieces cooperate follows this list.

- Backend
  - Plan Agent: High-level planning and task decomposition
  - Act Agent: Execution of specific video generation tasks
  - MCP Tools: Modular tools for video processing, generation, and editing
  - FastAPI Server: RESTful API for client communication
- Frontend
  - Web Interface: User-friendly chat interface
  - Video Editor: Timeline-based video editing capabilities
  - Project Management: Save and manage video projects
  - Authentication: User management and access control
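To make the division of labor concrete, here is a minimal sketch of a plan-then-act loop. The names (`Step`, `plan_agent.decompose`, `act_agent.call_tool`) are hypothetical illustrations, not UniVA's actual API:

```python
# Minimal sketch of the plan-then-act division of labor (hypothetical names).
from dataclasses import dataclass

@dataclass
class Step:
    tool: str    # MCP tool to invoke, e.g. a video generation tool
    prompt: str  # instruction passed to that tool

def run(user_request: str, plan_agent, act_agent) -> list[str]:
    # Plan Agent: decompose the high-level request into ordered tool calls.
    steps: list[Step] = plan_agent.decompose(user_request)
    outputs: list[str] = []
    for step in steps:
        # Act Agent: execute each step through the matching MCP tool.
        outputs.append(act_agent.call_tool(step.tool, step.prompt))
    return outputs
```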
- Python: 3.10 or higher
- Node.js: 18.0 or higher (only if using the web frontend)
- Bun: 1.2.18 or higher (only if using the web frontend)
- CUDA: Optional, but recommended for GPU acceleration when running local models
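If you want to sanity-check the Python side of these prerequisites, a small script like the following works; the CUDA check assumes PyTorch is installed and should be adapted to your stack:

```python
# Sanity-check the Python prerequisites above.
import sys

assert sys.version_info >= (3, 10), f"Python 3.10+ required, found {sys.version}"
print("Python OK:", sys.version.split()[0])

try:
    import torch  # assumption: PyTorch is the CUDA-facing library in your setup
    print("CUDA available:", torch.cuda.is_available())
except ImportError:
    print("PyTorch not installed; skipping CUDA check")
```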
The backend is the core UniVA agent system. You can use it standalone without the frontend.
```bash
git clone https://github.com/univa-agent/univa
cd univa
pip install -r requirements_simple.txt
```

Or, using the project configuration:

```bash
pip install -e .
```

Copy the example configuration file to create your local environment configuration:

```bash
cp .env.example .env
```

Edit the `.env` file to set your API keys, model preferences, and local paths. This file serves as the central configuration for UniVA.
A. Core Agent Models (Planning & Acting)

Configure the LLMs used by the main agents. You can use OpenAI, DeepSeek, Qwen, or local models.

```env
# Plan Agent (high-level reasoning)
PLAN_MODEL_PROVIDER=openai
PLAN_MODEL_ID=gpt-5
PLAN_MODEL_API_KEY=your-api-key

# Act Agent (execution & tool use)
ACT_MODEL_PROVIDER=openai
ACT_MODEL_ID=gpt-5
ACT_MODEL_API_KEY=your-api-key
```

B. MCP Tools Configuration

Set API keys for the tools used by UniVA (e.g., image/video generation).

```env
# OpenAI API key for tools using LLMs (e.g., query_llm)
LLM_OPENAI_API_KEY=sk-...

# Wavespeed API key for generation tools (image, video, audio)
WAVESPEED_API_KEY=your-wavespeed-key
```

C. Local Model Paths (Optional)

If you are running local models for video editing or understanding, specify their absolute paths here. These override the default settings without requiring code changes.

```env
# Video editing (e.g., Wan2.1)
VIDEO_EDIT_MODEL_PATH=/abs/path/to/Wan2.1-VACE-1.3B

# Video understanding (e.g., Qwen2.5-VL)
VIDEO_UNDERSTAND_MODEL_PATH=/abs/path/to/Qwen2.5-VL-32B-Instruct
```

D. System Settings

```env
# Authentication & admin (optional)
AUTH_ENABLED=False
ADMIN_ACCESS_CODE=your-secret-code
```

Note: Variables defined in `.env` will override the defaults in `univa/config/config.py` and `univa/config/mcp_tools_config/config.yaml`.
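For intuition, that precedence is the standard python-dotenv pattern; the actual loading logic lives in `univa/config/config.py` and may differ in detail:

```python
# Illustrative sketch of .env precedence using python-dotenv.
import os
from dotenv import load_dotenv

load_dotenv()  # copies values from .env into os.environ (without clobbering existing vars)

# A value set in .env wins; the hard-coded fallback is used only if unset.
plan_model = os.getenv("PLAN_MODEL_ID", "default-model-id")  # placeholder default
print("Plan model:", plan_model)
```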
Edit `univa/config/mcp_configs.json` to configure your MCP (Model Context Protocol) servers:
```json
{
  "mcpServers": {
    "video-tools": {
      "command": "python",
      "args": ["-m", "univa.mcp_tools.video_server"],
      "env": {}
    }
  }
}
```
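Conceptually, each `mcpServers` entry tells the agent how to launch a tool server over stdio. The sketch below shows only how the JSON fields map to a subprocess; UniVA's actual MCP client also performs the protocol handshake:

```python
# Illustration only: mapping an mcpServers entry to a server subprocess.
import json
import os
import subprocess

with open("univa/config/mcp_configs.json") as f:
    servers = json.load(f)["mcpServers"]

cfg = servers["video-tools"]
proc = subprocess.Popen(
    [cfg["command"], *cfg["args"]],    # e.g. python -m univa.mcp_tools.video_server
    env={**os.environ, **cfg["env"]},  # configured variables override inherited ones
    stdin=subprocess.PIPE,             # MCP stdio transport: JSON-RPC over these pipes
    stdout=subprocess.PIPE,
)
```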
You have two options for using the UniVA backend.

To use UniVA locally without a web interface, launch the command-line interface:

```bash
python univa/univa_agent.py
```

This starts an interactive command-line session where you can chat with UniVA directly in your terminal.
If you want to use the web interface or access UniVA via API:
cd univa
python univa_server.pyThe backend API will be available at http://localhost:8000.
curl http://localhost:8000/healthYou should receive a response indicating the server is healthy.
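The same check from Python, using the `requests` library (further endpoints follow the same base URL):

```python
# Hit the backend's health endpoint from Python.
import requests

resp = requests.get("http://localhost:8000/health", timeout=5)
resp.raise_for_status()  # raises on a non-2xx response
print(resp.status_code, resp.text)
```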
The frontend provides a web-based interface for interacting with UniVA. If you only need the backend API, you can skip this section.
Install dependencies from the project root:

```bash
bun install
```

Copy the example environment file and configure it:

```bash
cd apps/web
cp .env.example .env.local
```

Start the development server:

```bash
# From the project root
bun run dev

# Or from apps/web
cd apps/web
bun run dev
```

The frontend will be available at http://localhost:3000.
We welcome contributions from the community! Whether you're fixing bugs, adding new features, improving documentation, or sharing your use cases, your contributions are valuable.
- Bug fixes and issue resolution
- New features and enhancements
- Documentation improvements
- UI/UX improvements
- Test coverage
- Internationalization
- New MCP tools and integrations
If you use UniVA in your research or project, please cite our paper:
```bibtex
@misc{liang2025univauniversalvideoagent,
      title={UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist},
      author={Zhengyang Liang and Daoan Zhang and Huichi Zhou and Rui Huang and Bobo Li and Yuechen Zhang and Shengqiong Wu and Xiaohan Wang and Jiebo Luo and Lizi Liao and Hao Fei},
      year={2025},
      eprint={2511.08521},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.08521},
}
```

We would like to express our gratitude to the following:
- OpenCut: Our frontend is built upon and adapted from the OpenCut project. We deeply appreciate their outstanding work and significant contributions to the open-source video editing community.
- Open-Source Community: We thank all contributors and the broader open-source community for their continuous support, feedback, and contributions to this project.