View yangcaoai's full-sized avatar


[MTI-LLM@NeurIPS 2025] Official implementation of "PyVision: Agentic Vision with Dynamic Tooling."

Python 144 6 Updated Jul 22, 2025

Official repo of "Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens"

Python 264 16 Updated Jan 6, 2026
Jupyter Notebook 7 1 Updated Jan 17, 2026

Official code for paper: N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models

Python 77 2 Updated Jan 14, 2026

EO: Open-source Unified Embodied Foundation Model Series

Jupyter Notebook 282 26 Updated Nov 12, 2025

NeurIPS 2025 Spotlight; ICLR2024 Spotlight; CVPR 2024; EMNLP 2024

Python 1,802 75 Updated Nov 27, 2025

Official Implementation of Paper Transfer between Modalities with MetaQueries

Python 291 11 Updated Oct 12, 2025

The code for paper 'Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors'

Jupyter Notebook 192 6 Updated Nov 28, 2025

Official implementation of DepthLM

Python 286 13 Updated Jan 6, 2026

Open-source unified multimodal model

Python 5,570 487 Updated Oct 27, 2025

Official implementation of "C3G: Learning Compact 3D Representations with 2K Gaussians"

Python 122 3 Updated Jan 13, 2026

NEO Series: Native Vision-Language Models from First Principles

Python 630 22 Updated Jan 9, 2026

MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources

Python 212 9 Updated Sep 26, 2025

Unified 3D Reconstruction and Semantic Understanding via Generalizable Gaussian Splatting from Unposed Multi-View Images

Python 115 4 Updated Sep 3, 2025

One4D: Unified 4D Generation and Reconstruction

69 2 Updated Dec 2, 2025

Cambrian-S: Towards Spatial Supersensing in Video

Python 480 17 Updated Dec 27, 2025

HunyuanVideo-1.5: A leading lightweight video generation model

Python 3,408 116 Updated Jan 2, 2026

[NeurIPS 2025 DB Track] 3EED: Ground Everything Everywhere in 3D

Python 198 11 Updated Dec 26, 2025

PyTorch implementation of JiT https://arxiv.org/abs/2511.13720

Python 1,996 124 Updated Dec 8, 2025

SAM 3D Objects

Python 5,638 595 Updated Jan 9, 2026

Wan: Open and Advanced Large-Scale Video Generative Models

Python 13,659 1,620 Updated Dec 17, 2025

Depth Anything 3

Python 4,014 356 Updated Dec 12, 2025

[Awesome-Spatial-VLMs] This repository is the official, community-maintained resource for the survey paper "Spatial Intelligence in Vision-Language Models: A Comprehensive Survey."

Python 53 2 Updated Jan 7, 2026
195 3 Updated Oct 22, 2025

Mono4DGS-HDR: High Dynamic Range 4D Gaussian Splatting from Alternating-exposure Monocular Videos

23 1 Updated Oct 22, 2025

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 67,707 12,640 Updated Jan 17, 2026

Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.

Jupyter Notebook 1,519 60 Updated Jun 14, 2025

[ICCV 2025] SuperDec: 3D Scene Decomposition with Superquadric Primitives.

Python 162 10 Updated Dec 31, 2025