Stars
[NeurIPS 2025] The official repository of "Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning"
Official PyTorch implementation for "Large Language Diffusion Models"
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
[CVPR2024] ViP-LLaVA: Making Large Multimodal Models Understand Arbitrary Visual Prompts
The official repository for the paper "ThinkMorph: Emergent Properties in Multimodal Interleaved Chain-of-Thought Reasoning"
Official repo of "Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens"
Video-CoM: Interactive Video Reasoning via Chain of Manipulations
Glance: Accelerating Diffusion Models with 1 Sample
This is a collection of recent papers on reasoning in video generation models.
Strong and Open Vision Language Assistant for Mobile Devices
Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning
Official Repository for "Glyph: Scaling Context Windows via Visual-Text Compression"
Code for the paper "Towards Better & Faster Autoregressive Image Generation: From the Perspective of Entropy" [NeurIPS 2025].
SIGMORPHON 2022 Shared Task on Morpheme Segmentation
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
[Lumina Embodied AI] Embodied AI Technical Guide (Embodied-AI-Guide)
[EMNLP-Findings'24] Tokenization Falling Short: On Subword Robustness in Large Language Models
MuLan: Adapting Multilingual Diffusion Models for 110+ Languages (adds multilingual support to any diffusion model without additional training)
OpenICL is an open-source framework to facilitate research, development, and prototyping of in-context learning.
Facebook Low Resource (FLoRes) MT Benchmark