Stars
The implementation of the paper "FedMGP: Personalized Federated Learning with Multi-Group Text-Visual Prompts" [NeurIPS 2025].
[NeurIPS'25][OralGPT & MMOral] The official repo of OralGPT & MMOral Bench.
Building General-Purpose Robots Based on Embodied Foundation Models
Reference PyTorch implementation and models for DINOv3
[TCSVT under review] This is the PyTorch code for our paper "SSP-SAM: SAM with Semantic-Spatial Prompt for Referring Expression Segmentation".
[NeurIPS 2022] Singular Value Fine-tuning: Few-shot Segmentation requires Few-parameters Fine-tuning
[TPAMI] CTNet: Context-based Tandem Network for Semantic Segmentation
Solve Visual Understanding with Reinforced VLMs
The official code for the CVPR 2025 paper "Open-World Objectness Modeling Unifies Novel Object Detection" will be released soon.
[TMM 2025] This is the official PyTorch code for our paper "Visual Position Prompt for MLLM based Visual Grounding".
Integrate the DeepSeek API into popular software
MME-CoT: Benchmarking Chain-of-Thought in LMMs for Reasoning Quality, Robustness, and Efficiency
Extend OpenRLHF to support LMM RL training, reproducing DeepSeek-R1 on multimodal tasks.
MM-Eureka V0, also called R1-Multimodal-Journey; the latest version is in MM-Eureka.
Fully open reproduction of DeepSeek-R1
Witness the aha moment of a VLM for less than $3.
Implementation of the paper "Open Eyes, Then Reason: Fine-grained Visual Mathematical Understanding in MLLMs"
Eagle: Frontier Vision-Language Models with Data-Centric Strategies
Personalized Representation from Personalized Generation (ICLR 2025)
Descriptive Caption Enhancement with Visual Specialists for Multimodal Perception
DINO-X: The World's Top-Performing Vision Model for Open-World Object Detection and Understanding
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use the model.
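As a minimal sketch of what prompted image inference with SAM 2 looks like, adapted from the usage pattern in that repository's README (the checkpoint path, config name, image file, and point prompt below are placeholder assumptions, not fixed values):

```python
import numpy as np
import torch
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Placeholder paths: point these at a downloaded SAM 2 checkpoint and its config.
checkpoint = "./checkpoints/sam2_hiera_large.pt"
model_cfg = "sam2_hiera_l.yaml"
predictor = SAM2ImagePredictor(build_sam2(model_cfg, checkpoint))

# Placeholder image and a single point prompt (x, y); label 1 marks a foreground click.
image = np.array(Image.open("example.jpg").convert("RGB"))

with torch.inference_mode(), torch.autocast("cuda", dtype=torch.bfloat16):
    predictor.set_image(image)
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[500, 375]]),
        point_labels=np.array([1]),
    )
```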