Towards Open-Source Next-Generation Video Generalist
UniVA (Universal Video Agent) is an open-source, next-generation video generalist system that enables you to plan, compose, and produce videos through natural language instructions. UniVA acts as your intelligent video director, iterating shots and stories with you through an agentic, proactive workflow.
- Multi-round co-creation: Talk like a director; UniVA iterates shots & stories with you
- Deep memory & context: Global + user memory keep preferences, lore, and styles consistent
- Implicit intent reading: Understands vague & evolving instructions; less prompt hacking
- Proactive agent: Auto plans, checks, and suggests better shots & stories, not just obeys
- End-to-end workspace: UniVA plans, calls tools, and delivers full videos
- Universal video fabric: Text / Image / Entity / Video → controllable video in one framework
- Any-conditioned pipeline: Super-HD, consistent cinematic quality with stable identities & objects
- Complex narratives: Multi-scene, multi-role, multi-shot stories under structured control
- Ultra-long & fine-grained editing: From long-form cuts to per-shot/per-object refinement
- Grounded by understanding: Long-video comprehension & segmentation guide generation & edits
- MCP-native: Modular design, easy to extend with new models & tools
- Industrial quality: Production-ready video generation capabilities
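Because UniVA is MCP-native, new capabilities plug in as named tools rather than being hard-wired into the agent. As an illustration only — the decorator, registry, and `trim_video` tool below are hypothetical, not UniVA's actual API — the modular-tool idea can be sketched as:

```python
# Hypothetical sketch of a modular tool registry in the spirit of
# UniVA's MCP-native design; all names and signatures are illustrative.
from typing import Callable, Dict

TOOL_REGISTRY: Dict[str, Callable] = {}

def tool(name: str):
    """Register a function as a callable tool under a stable name."""
    def decorator(fn: Callable) -> Callable:
        TOOL_REGISTRY[name] = fn
        return fn
    return decorator

@tool("trim_video")
def trim_video(path: str, start: float, end: float) -> dict:
    # A real implementation would call ffmpeg or a video library here.
    return {"path": path, "start": start, "end": end, "status": "ok"}

# The agent can then dispatch any registered tool by name:
result = TOOL_REGISTRY["trim_video"]("clip.mp4", 1.0, 5.0)
print(result["status"])
```

Real MCP servers expose tools over JSON-RPC rather than an in-process dict, but the pattern is the same: tools are discovered and invoked by name, so adding a model or capability never requires changing the agent core.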
- Python: 3.10 or higher
- Node.js: 18.0 or higher (only if using the web frontend)
- Bun: 1.2.18 or higher (only if using the web frontend)
- CUDA: Recommended for GPU acceleration (optional but recommended)
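A quick way to check the prerequisites above before installing (assumes standard installs; `node`, `bun`, and an NVIDIA GPU are only needed for the optional pieces):

```shell
# Python 3.10+ is required
python3 --version
# Node.js and Bun are only needed for the web frontend
command -v node >/dev/null 2>&1 && node --version || echo "node not found (frontend only)"
command -v bun  >/dev/null 2>&1 && bun --version  || echo "bun not found (frontend only)"
# CUDA is optional; list NVIDIA GPUs if any are present
command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L || echo "no NVIDIA GPU detected (optional)"
```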
The backend is the core UniVA agent system. You can use it standalone without the frontend.
Clone the repository and install the dependencies:

```bash
git clone https://github.com/univa-agent/univa
cd univa
pip install -r requirements.txt
```

Or using the project configuration:

```bash
pip install -e .
```

Edit the configuration file to set your API keys and preferences:
```toml
# Model configuration for Plan Agent
plan_model_id = "gpt-4"
plan_model_api_key = "your-openai-api-key"

# Model configuration for Act Agent
act_model_id = "gpt-4"
act_model_api_key = "your-openai-api-key"

# MCP servers configuration path
mcp_servers_config = "/path/to/your/univa/config/mcp_configs.json"

# Authentication (optional)
auth_enabled = true
admin_access_code = "your-admin-code"
```

Edit `univa/config/mcp_configs.json` to configure your MCP (Model Context Protocol) servers:
```json
{
  "mcpServers": {
    "video-tools": {
      "command": "python",
      "args": ["-m", "univa.mcp_tools.video_server"],
      "env": {}
    }
  }
}
```

You have two options for using the UniVA backend:
Option 1: To use UniVA locally without a web interface, run the command-line interface directly:
```bash
python univa/univa_agent.py
```

This starts an interactive command-line session where you can chat with UniVA directly in your terminal.
Option 2: To use the web interface or access UniVA via its API, start the server:
```bash
cd univa
python univa_server.py
```

The backend API will be available at http://localhost:8000. Verify that the server is up:

```bash
curl http://localhost:8000/health
```

You should receive a response indicating the server is healthy.
The frontend provides a web-based interface for interacting with UniVA. If you only need the backend API, you can skip this section.
Install the dependencies:

```bash
bun install
```

Copy the example environment file and configure it:

```bash
cd apps/web
cp .env.example .env.local
```

Start the development server:

```bash
# From the project root
bun run dev

# Or from apps/web
cd apps/web
bun run dev
```

The frontend will be available at http://localhost:3000.
UniVA consists of two main components: the backend agent system and the web frontend.

Backend:
- Plan Agent: High-level planning and task decomposition
- Act Agent: Execution of specific video generation tasks
- MCP Tools: Modular tools for video processing, generation, and editing
- FastAPI Server: RESTful API for client communication

Frontend:
- Web Interface: User-friendly chat interface
- Video Editor: Timeline-based video editing capabilities
- Project Management: Save and manage video projects
- Authentication: User management and access control
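The Plan/Act split can be pictured as a simple loop: the Plan Agent decomposes a request into steps, and the Act Agent executes each step by dispatching to a tool, feeding each result into the next step. A minimal sketch — the step format, tool names, and fixed plan below are invented for illustration; the real agents are LLM-driven:

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool table standing in for UniVA's MCP tools.
TOOLS: Dict[str, Callable[[str], str]] = {
    "generate_clip": lambda prompt: f"clip({prompt})",
    "edit_clip": lambda clip: f"edited({clip})",
}

def plan_agent(request: str) -> List[Tuple[str, str]]:
    """Decompose a request into (tool, argument) steps.
    The real Plan Agent does this with an LLM; this is a fixed stub."""
    return [("generate_clip", request), ("edit_clip", "")]

def act_agent(steps: List[Tuple[str, str]]) -> List[str]:
    """Execute each step, chaining the previous artifact into the next tool."""
    artifact = None
    results = []
    for tool, arg in steps:
        artifact = TOOLS[tool](arg if artifact is None else artifact)
        results.append(artifact)
    return results

outputs = act_agent(plan_agent("a sunset over the sea"))
print(outputs)
```

The design point this illustrates is the separation of concerns: planning never touches tool internals, and execution never reasons about the overall story, so either side can be swapped or extended independently.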
We welcome contributions from the community! Whether you're fixing bugs, adding new features, improving documentation, or sharing your use cases, your contributions are valuable.
- 🐛 Bug fixes and issue resolution
- ✨ New features and enhancements
- 📚 Documentation improvements
- 🎨 UI/UX improvements
- 🧪 Test coverage
- 🌍 Internationalization
- 🔧 New MCP tools and integrations
If you use UniVA in your research or project, please cite our paper:
```bibtex
@misc{liang2025univauniversalvideoagent,
      title={UniVA: Universal Video Agent towards Open-Source Next-Generation Video Generalist},
      author={Zhengyang Liang and Daoan Zhang and Huichi Zhou and Rui Huang and Bobo Li and Yuechen Zhang and Shengqiong Wu and Xiaohan Wang and Jiebo Luo and Lizi Liao and Hao Fei},
      year={2025},
      eprint={2511.08521},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.08521},
}
```

We would like to express our gratitude to the following:
- OpenCut: Our frontend is built upon and adapted from the OpenCut project. We deeply appreciate their outstanding work and significant contributions to the open-source video editing community.
- Open-Source Community: We thank all contributors and the broader open-source community for their continuous support, feedback, and contributions to this project.