✨✨Latest Advances on Multimodal Large Language Models
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA), built towards GPT-4V level capabilities and beyond.
Awesome-LLM-3D: a curated list of resources on Multi-modal Large Language Models in the 3D world.
[ECCV 2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds.
Official code for "Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning".
Official implementation of ICCV 2023 paper "3D-VisTA: Pre-trained Transformer for 3D Vision and Text Alignment"
[ECCV 2024] M3DBench introduces a comprehensive 3D instruction-following dataset with support for interleaved multi-modal prompts.
[CVPR 2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale
[ACL 2024] GroundingGPT: Language-Enhanced Multi-modal Grounding Model
JARVIS-1: Open-world Multi-task Agents with Memory-Augmented Multimodal Language Models
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.