CUHK
Hong Kong
https://zj-binxia.github.io/
Stars
[NeurIPS'25] Official repository of Concerto: Joint 2D-3D Self-Supervised Learning Emerges Spatial Representations
A ComfyUI reproduction of the VLM in DreamOmni2, with int4 and int8 quantization support; combined with LoRAs, it can reproduce the original project
A ComfyUI node for dvlab-research/DreamOmni2
ViSurf: Visual Supervised-and-Reinforcement Fine-Tuning for Large Vision-and-Language Models
HunyuanVideo-Foley: Multimodal Diffusion with Representation Alignment for High-Fidelity Foley Audio Generation.
This project is the official implementation of 'DreamOmni2: Multimodal Instruction-based Editing and Generation'
Official inference code and LongText-Bench benchmark for our paper X-Omni (https://arxiv.org/pdf/2507.22058).
[NeurIPS 2025] Efficient Reasoning Vision Language Models
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving state-of-the-art performance on 38 out of 60 public benchmarks.
Image Polygonal Annotation with Python (polygon, rectangle, circle, line, point and image-level flag annotation).
[NeurIPS 2024] The official implementation of HairFastGAN. A framework for virtual hairstyle fitting.
PyTorch implementation of "Stable-Hair: Real-World Hair Transfer via Diffusion Model" (AAAI 2025)
[NeurIPS 2024] Official code for PuLID: Pure and Lightning ID Customization via Contrastive Alignment
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
[ICLR 2025 Oral] Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
verl: Volcano Engine Reinforcement Learning for LLMs
A fork to add multimodal model training to open-r1
A generative world for general-purpose robotics & embodied AI learning.
Official repo and evaluation implementation of VSI-Bench
[ICCV 2025] Official Implementation for "Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition"
Official repository for VisionZip (CVPR 2025)
The world's simplest facial recognition API for Python and the command line
The best OSS video generation models, created by Genmo
A Go implementation of Raft, for 18-845 at CMU (Spring 2015).
Code for "Diffusion Model Alignment Using Direct Preference Optimization"
A collection of resources on controllable generation with text-to-image diffusion models.
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)