
ER123
  • Sichuan University
  • Chengdu, Sichuan


Fully Open Framework for Democratized Multimodal Training

Python 594 40 Updated Oct 21, 2025

DELT: Data Efficacy for Language Model Training

Python 40 4 Updated Aug 31, 2025
Swift 34 7 Updated Aug 7, 2025

Cook up amazing multimodal AI applications effortlessly with MiniCPM-o

Python 212 19 Updated Oct 30, 2025

Official repository for VisionZip (CVPR 2025)

Python 366 15 Updated Jul 21, 2025

[ECCV 2024 Oral] Code for paper: An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

Python 506 20 Updated Jan 4, 2025

Survey: https://arxiv.org/pdf/2507.20198

186 13 Updated Oct 24, 2025

Long-RL: Scaling RL to Long Sequences (NeurIPS 2025)

Python 648 23 Updated Sep 24, 2025

Open-source offline translation library written in Python

Python 4,972 360 Updated Oct 21, 2025

Universal memory layer for AI Agents; Announcing OpenMemory MCP - local and secure memory management.

Python 42,396 4,558 Updated Nov 1, 2025
Python 691 12 Updated Sep 24, 2025

This repository contains the official implementation of the research paper, "FastViT: A Fast Hybrid Vision Transformer using Structural Reparameterization" ICCV 2023

Python 1,964 119 Updated Nov 30, 2023

TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning

Python 105 1 Updated May 22, 2025

A Simple Framework of Small-scale LMMs for Video Understanding

Python 96 6 Updated Jun 11, 2025

✨First Open-Source R1-like Video-LLM [2025/02/18]

Python 372 13 Updated Feb 23, 2025

Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]

Python 726 38 Updated Sep 19, 2025

The development and future prospects of large multimodal reasoning models.

530 20 Updated Aug 2, 2025

This repository contains the official implementation of "FastVLM: Efficient Vision Encoding for Vision Language Models" - CVPR 2025

Python 6,828 476 Updated May 5, 2025

📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).

885 38 Updated Sep 27, 2025

Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.

Go 155,138 13,514 Updated Nov 1, 2025

Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)

Python 61,487 7,438 Updated Oct 30, 2025

Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities

1,083 52 Updated Jul 15, 2025

Qwen2.5-Omni is an end-to-end multimodal model by Qwen team at Alibaba Cloud, capable of understanding text, audio, vision, video, and performing real-time speech generation.

Jupyter Notebook 3,760 294 Updated Jun 12, 2025

[ICLR'24 spotlight] An open platform for training, serving, and evaluating large language models for tool learning.

Python 5,315 450 Updated May 21, 2025

An Autonomous LLM Agent for Complex Task Solving

Python 8,457 890 Updated Aug 12, 2024

A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.

Python 1,381 81 Updated Sep 22, 2025

✨✨Latest Advances on Multimodal Large Language Models

16,581 1,069 Updated Oct 31, 2025

[CVPR 2025 (Oral)] Mitigating Hallucinations in Large Vision-Language Models via DPO: On-Policy Data Hold the Key

Python 84 4 Updated Sep 30, 2025

Official repository of 'Visual-RFT: Visual Reinforcement Fine-Tuning' & 'Visual-ARFT: Visual Agentic Reinforcement Fine-Tuning'

Jupyter Notebook 2,237 99 Updated Oct 29, 2025

Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks

Python 3,284 526 Updated Oct 31, 2025