- UC Merced
- Merced, CA
- UTC -07:00
- https://yuanhaobo.me
- @HarborYuan
Stars
[ICCV'25 Best Paper Finalist] ReCamMaster: Camera-Controlled Generative Rendering from A Single Video
Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"
Detect Anything via Next Point Prediction (Based on Qwen2.5-VL-3B)
🔥 [EMNLP 2025] Official open-source repo for Boosting Multi-modal Keyphrase Prediction with Dynamic Chain-of-Thought in Vision-Language Models
The official implementation of "Patch-as-Decodable-Token: Towards Unified Multi-Modal Vision Tasks in MLLMs"
CUDA accelerated rasterization of gaussian splatting
johnnynunez / decord2
Forked from dmlc/decord. An efficient video loader for deep learning with smart shuffling that's super easy to digest
Efficient Triton Kernels for LLM Training
[CVPR 2024] RoMa: Robust Dense Feature Matching; RoMa is the robust dense feature matcher capable of estimating pixel-dense warps and reliable certainties for almost any image pair.
A pytorch CUDA extension implementation of instant-ngp (sdf and nerf), with a GUI.
Wan: Open and Advanced Large-Scale Video Generative Models
Official repo of "Time Reversal Fusion" (ECCV2024)
Code and dataset link for "DenseWorld-1M: Towards Detailed Dense Grounded Caption in the Real World"
Standalone TFRecord reader/writer with PyTorch data loaders
Dataset, streaming, and file system extensions maintained by TensorFlow SIG-IO
An intuitive and low-overhead instrumentation tool for Python
An open-source AI agent that brings the power of Gemini directly into your terminal.
Autonomous coding agent right in your IDE, capable of creating/editing files, executing commands, using the browser, and more with your permission every step of the way.
Official Jax Implementation of MD4 Masked Diffusion Models
Code and datasets for "What’s “up” with vision-language models? Investigating their struggle with spatial reasoning".
HiMTok: Learning Hierarchical Mask Tokens for Image Segmentation with Large Multimodal Model
MAGI-1: Autoregressive Video Generation at Scale
Open source repo for Locate 3D Model, 3D-JEPA and Locate 3D Dataset