An open source implementation of CLIP.
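At its core, CLIP trains an image encoder and a text encoder with a symmetric contrastive objective over paired data. Below is a minimal PyTorch sketch of that loss, following the pseudocode in the CLIP paper; the encoder outputs and the learned temperature `logit_scale` are assumed as inputs and are not taken from any particular repository's code.

```python
# A minimal sketch of CLIP's symmetric contrastive (InfoNCE) objective.
# Embeddings and logit_scale are placeholders supplied by the caller.
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          logit_scale: torch.Tensor) -> torch.Tensor:
    """image_emb, text_emb: [batch, dim] joint-space embeddings of matched pairs."""
    # L2-normalize so the dot product is cosine similarity.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # Pairwise similarity matrix, scaled by the learned temperature.
    logits_per_image = logit_scale * image_emb @ text_emb.t()  # [batch, batch]
    logits_per_text = logits_per_image.t()

    # The matching pair for each row sits on the diagonal.
    targets = torch.arange(image_emb.size(0), device=image_emb.device)

    # Symmetric cross-entropy over image->text and text->image directions.
    loss_i = F.cross_entropy(logits_per_image, targets)
    loss_t = F.cross_entropy(logits_per_text, targets)
    return (loss_i + loss_t) / 2
```

The CLIP paper initializes the temperature so that `logit_scale` starts around `exp(ln(1/0.07))`; in practice it is a learned parameter clamped during training.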
A Chinese version of CLIP that supports Chinese cross-modal retrieval and representation generation.
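Cross-modal retrieval with any such model reduces to cosine-similarity ranking in the shared embedding space. A minimal sketch, assuming embeddings precomputed by a CLIP variant; the tensor names are illustrative, not a specific project's API:

```python
# Rank a gallery of image embeddings against one text query by cosine similarity.
import torch
import torch.nn.functional as F

def retrieve_top_k(text_emb: torch.Tensor,    # [dim] query embedding
                   image_embs: torch.Tensor,  # [n, dim] gallery embeddings
                   k: int = 5):
    # Cosine similarity reduces to a dot product after L2 normalization.
    text_emb = F.normalize(text_emb, dim=-1)
    image_embs = F.normalize(image_embs, dim=-1)
    scores = image_embs @ text_emb             # [n] similarity per gallery image
    topk = scores.topk(k)
    return topk.indices, topk.values           # gallery indices and similarities
```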
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
The implementation of "Prismer: A Vision-Language Model with Multi-Task Experts".
A concise but complete implementation of CLIP, with various experimental improvements from recent papers.
A curated list of Visual Question Answering (VQA, including image/video question answering), Visual Question Generation, Visual Dialog, Visual Commonsense Reasoning, and related areas.
[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI
Build high-performance AI models with modular building blocks.
CVPR 2023-2024 Papers: Dive into advanced research presented at the leading computer vision conference. Keep up to date with the latest developments in computer vision and deep learning. Code included. ⭐ support visual intelligence development!
Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey
[CVPR 2024] Official PyTorch Code for "PromptKD: Unsupervised Prompt Distillation for Vision-Language Models"
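As a rough illustration of the distillation idea behind such work, here is a generic logit-matching sketch for a CLIP-like student, loosely in the spirit of PromptKD but not the paper's exact recipe: the student's image features are aligned with a frozen teacher's via a KL divergence over class-text similarities, requiring no image labels.

```python
# A generic, label-free distillation sketch for a CLIP-like student model.
# All tensors are illustrative placeholders, not PromptKD's actual interfaces.
import torch
import torch.nn.functional as F

def distill_loss(student_img: torch.Tensor,  # [batch, dim] student image features
                 teacher_img: torch.Tensor,  # [batch, dim] frozen teacher features
                 text_feats: torch.Tensor,   # [classes, dim] shared text features
                 tau: float = 1.0) -> torch.Tensor:
    student_img = F.normalize(student_img, dim=-1)
    teacher_img = F.normalize(teacher_img, dim=-1)
    text_feats = F.normalize(text_feats, dim=-1)

    # Class distributions induced by image-text similarity.
    s_logits = student_img @ text_feats.t() / tau
    t_logits = teacher_img @ text_feats.t() / tau

    # KL(teacher || student) on unlabeled images drives the student's
    # trainable (e.g. prompt) parameters toward the teacher's predictions.
    return F.kl_div(F.log_softmax(s_logits, dim=-1),
                    F.softmax(t_logits, dim=-1),
                    reduction='batchmean') * tau * tau
```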