✨✨[NeurIPS 2025] VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction
multimodal-large-language-models large-multimodal-models omni-modal-video-understanding omni-language-model omni-model
Updated Mar 28, 2025 - Python