Stars
Fully Open Framework for Democratized Multimodal Training
Cook up amazing multimodal AI applications effortlessly with MiniCPM-o
Official repository for VisionZip (CVPR 2025)
[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models
Survey: https://arxiv.org/pdf/2507.20198
Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)
Open-source offline translation library written in Python
Universal memory layer for AI agents; announcing OpenMemory MCP for local, secure memory management.
This repository contains the official implementation of the research paper, "FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization" ICCV 2023
TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning
A Simple Framework of Small-scale LMMs for Video Understanding
✨First Open-Source R1-like Video-LLM [2025/02/18]
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
The development and future prospects of large multimodal reasoning models.
This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025
📖 A curated list of resources dedicated to hallucination in multimodal large language models (MLLMs).
Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities
Qwen2.5-Omni is an end-to-end multimodal model from the Qwen team at Alibaba Cloud, capable of understanding text, audio, images, and video, and of generating speech in real time.
[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language models for tool learning.
An Autonomous LLM Agent for Complex Task Solving
A novel Multimodal Large Language Model (MLLM) architecture designed to structurally align visual and textual embeddings.
✨✨Latest Advances on Multimodal Large Language Models
[CVPR 2025 (Oral)] Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key
Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks.