✨✨Latest Advances on Multimodal Large Language Models
Updated Nov 6, 2025
【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
【TMM 2025🔥】 Mixture-of-Experts for Large Vision-Language Models
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
🔥 🔥 🔥 [NeurIPS 2024] Official Implementation of Hawk: Learning to Understand Open-World Video Anomalies
[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models"
[ECCV 2024] API: Attention Prompting on Image for Large Vision-Language Models
✨✨Latest advancements in VLA (Vision-Language-Action) models
[NeurIPS'24] CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models
Official code for our paper: "VL-Uncertainty: Detecting Hallucination in Large Vision-Language Model via Uncertainty Estimation".
Awesome Large Vision-Language Model: A Curated List of Large Vision-Language Model
This is the official repository of LLAVIDAL
[CVPR 2024 Highlight] The first benchmark for lithic use-wear analysis leveraging SOTA vision and vision-language models (DINOv2, GPT-4V), demonstrating AI performance surpassing that of expert archaeologists.
Source code of our paper "Transferring Textual Preferences to Vision-Language Understanding through Model Merging", ACL 2025
Easy-to-use large vision language model pipeline for quantitative analysis
Code release for THRONE, a CVPR 2024 paper on measuring object hallucinations in LVLM generated text.
Official implementation of TCSVT 2025 paper: DiViCo: Disentangled Visual Token Compression For Efficient Large Vision-Language Model
Experiment with neural networks for binary classification on multimodal data using this extensible PyTorch framework.
Transform speech and text with this lightweight Python toolkit for transcription, analysis, and audio conversion tasks.