lvlm

Here are 29 public repositories matching this topic...

NVlabs / Eagle

Eagle: Frontier Vision-Language Models with Data-Centric Strategies

demo eagle llama lmm nvdia huggingface gpt4 large-language-models llm mllm llava lvlm llama3

Updated Oct 25, 2025
Python

YingqingHe / Awesome-LLMs-meet-Multimodal-Generation

🔥🔥🔥 A curated list of papers on LLMs-based multimodal generation (image, video, 3D and audio).

text-to-speech multimodality text-to-image text-to-audio text-to-video text-to-music multimodal-models aigc large-language-models llm text-to-3d multimodal-generation mllm text-to-sound large-vision-language-models multimodal-large-language-models lvlm

Updated Apr 4, 2025
HTML

Hon-Wong / VoRA

[Fully open] [Encoder-free MLLM] Vision as LoRA

lora vlm llm mllm lvlm

Updated Jun 12, 2025
Python

zhaochen0110 / OpenThinkIMG

OpenThinkIMG is an end-to-end open-source framework that empowers LVLMs to think with images.

reinforcement-learning lvlm grpo vision-tool

Updated Jun 1, 2025
Jupyter Notebook

NishilBalar / Awesome-LVLM-Hallucination

An up-to-date curated list of state-of-the-art research, papers, and resources on hallucination in large vision-language models

mlm hallucination large-language-models llm mllm large-vision-language-models multimodal-large-language-models hallucination-evaluation hallucination-detection vision-language-models lvlm hallucination-mitigation hallucination-survey hallucination-research hallucination-benchmark multimodal-language-model

Updated Oct 3, 2025

MMStar-Benchmark / MMStar

[NeurIPS 2024] This repo contains evaluation code for the paper "Are We on the Right Way for Evaluating Large Vision-Language Models?"

evaluation multimodality multimodal-learning visual-question-answering multimodal large-language-models llm llms large-vision-language-model large-vision-language-models large-multimodal-models lvlms lvlm

Updated Sep 26, 2024
Python

wang2226 / Awesome-LLM-Decoding

📜 Paper list on decoding methods for LLMs and LVLMs

nlp natural-language-processing awesome awesome-list llm llm-inference lvlm

Updated Nov 7, 2025

thu-nics / FrameFusion

[ICCV'25] The official code for the paper "Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models"

video efficient-deep-learning llm lvlm

Updated Nov 24, 2025
Python

w1oves / hqclip

[ICCV 2025] HQ-CLIP: Leveraging Large Vision-Language Models to Create High-Quality Image-Text Datasets

clip image-text mllm lvlm

Updated Aug 6, 2025

OpenSparseLLMs / CLIP-MoE

CLIP-MoE: Mixture of Experts for CLIP

moe clip mixture-of-experts openai-clip lvlm

Updated Oct 10, 2024
Python

The-Martyr / Awesome-Multimodal-Reasoning

Latest Advances on (RL based) Multimodal Reasoning and Generation in Multimodal Large Language Models

reinforcement-learning rl image-generation video-understanding r1 image-understanding multimodal-learning cot video-generation o1 video-reasoning large-language-models llm chain-of-thought mllm lvlm multimodal-reasoning image-reasoning

Updated Oct 30, 2025

[AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Vision-Language Models (e.g., LLaVA-Next) under a fixed token budget.

ml vlm llava lvlm llava-next

Updated Apr 18, 2025
Python

tsinghua-fib-lab / SmartAgent

The official repository of "SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World".

personalization human-computer-interaction multi-modal embodied-ai chain-of-thought large-language-model llm-agent llm-reasoning lvlm openai-o1 human-centric-ai

Updated Aug 20, 2025

fan19-hub / LEMMA

LEMMA: An effective and explainable way to detect multimodal misinformation with LVLMs and external knowledge augmentation, drawing on the LVLM's own intuition and reasoning capability.

misinformation rag lvlm