Seed1.5-VL is a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning. It achieves state-of-the-art performance on 38 of 60 public benchmarks.

Jupyter Notebook 1,472 58 Updated Jun 14, 2025

QwenLM / Qwen3

Qwen3 is the large language model series developed by the Qwen team, Alibaba Cloud.

Python 25,126 1,749 Updated Oct 13, 2025

facebookresearch / locate-3d

Open source repo for Locate 3D Model, 3D-JEPA and Locate 3D Dataset

Python 378 32 Updated Jun 3, 2025

InternRobotics / EmbodiedScan

[CVPR 2024 & NeurIPS 2024] EmbodiedScan: A Holistic Multi-Modal 3D Perception Suite Towards Embodied AI

Python 636 46 Updated Jun 13, 2025

VITA-MLLM / Long-VITA

✨✨Long-VITA: Scaling Large Multi-modal Models to 1 Million Tokens with Leading Short-Context Accuracy

Python 302 28 Updated May 14, 2025

Kwai-YuanQi / MM-RLHF

The Next Step Forward in Multimodal LLM Alignment

Python 184 8 Updated May 1, 2025

bytedance / Sa2VA

Official Repo For "Sa2VA: Marrying SAM2 with LLaVA for Dense Grounded Understanding of Images and Videos"

Python 1,359 94 Updated Oct 24, 2025

vision-x-nyu / thinking-in-space

Official repo and evaluation implementation of VSI-Bench

Python 607 37 Updated Aug 5, 2025

Genesis-Embodied-AI / Genesis

A generative world for general-purpose robotics & embodied AI learning.

Python 27,451 2,523 Updated Oct 25, 2025

causalfusion / causalfusion

Python 183 4 Updated Dec 17, 2024

dvlab-research / Lyra

[ICCV 2025] Official Implementation for "Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition"

Python 301 29 Updated Jan 9, 2025

allenai / open-instruct

AllenAI's post-training codebase

Python 3,265 453 Updated Oct 25, 2025

DepthAnything / Depth-Anything-V2

[NeurIPS 2024] Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation

Python 6,850 681 Updated Jan 22, 2025

baaivision / Emu3

Next-Token Prediction is All You Need

Python 2,219 84 Updated Mar 17, 2025

openai / whisper

Robust Speech Recognition via Large-Scale Weak Supervision

Python 89,920 11,243 Updated Sep 8, 2025

CaraJ7 / MMSearch

[ICLR 2025] The First Multimodal Search Engine Pipeline and Benchmark for LMMs

Python 479 31 Updated Jan 23, 2025

QwenLM / Qwen3-VL

Qwen3-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.

Jupyter Notebook 15,481 1,205 Updated Oct 22, 2025

Drexubery / ViewCrafter

[TPAMI 2025] ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis

Python 1,447 52 Updated Sep 18, 2025

VITA-MLLM / VITA

✨✨VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction

Python 2,428 176 Updated Mar 28, 2025

facebookresearch / sam2

The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…

Jupyter Notebook 17,393 2,161 Updated Dec 25, 2024

dvlab-research / ControlNeXt

Controllable video and image Generation, SVD, Animate Anyone, ControlNet, ControlNeXt, LoRA

Python 1,618 80 Updated Sep 25, 2024

cambrian-mllm / cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Python 1,955 131 Updated Oct 30, 2024

Yanwei yanwei-li

Stars

song2yu / SIBench-VSR

EvolvingLMMs-Lab / LLaVA-OneVision-1.5

Mini-o3 / Mini-o3

dvlab-research / MGM-Omni

volcengine / verl

lxtGH / DenseWorld-1M

tau-yihouxiang / EX-4D

ByteDance-Seed / Bagel

ByteDance-Seed / Seed1.5-VL