- Shanghai Jiao Tong University
- https://scholar.google.com/citations?user=pP5WG9wAAAAJ
Stars
MOVA: Towards Scalable and Synchronized Video–Audio Generation
The ultimate training toolkit for finetuning diffusion models
Ring attention implementation with flash attention
Dexbotic: Open-Source Vision-Language-Action Toolbox
Qwen-Image text to image lora trainer
Constraint-based geometry sketcher for blender
Published in Nature Machine Intelligence! The first real robot (quadrotor) based on differentiable physics training.
Official implementation of "Sonic: Shifting Focus to Global Audio Perception in Portrait Animation"
A more flexible framework that can generate videos at any resolution and create videos from images.
Solution finder for the KAMI (2) game on iOS/Android
Python audio and music signal processing library
Multi-lingual large voice generation model, providing inference, training and deployment full-stack ability.
[CVPR 2024] PyTorch implementation of 'How Far Can We Compress Instant-NGP-Based NeRF?'
[ECCV 2024] PyTorch implementation of 'HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression'
Text- and image-to-video generation: CogVideoX (2024) and CogVideo (ICLR 2023)
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Official implementation of "CLEFT: Language-Image Contrastive Learning with Efficient Large Language Model and Prompt Fine-Tuning" at MICCAI 2024.
Synthesizing and manipulating 2048x1024 images with conditional GANs
PyTorch implementation of the paper "Large Scale Image Completion via Co-Modulated Generative Adversarial Networks"
[ICLR 2021, Spotlight] Large Scale Image Completion via Co-Modulated Generative Adversarial Networks
MULTIMODAL SEMANTIC-AWARE AUTOMATIC COLORIZATION WITH DIFFUSION PRIOR
High-Quality Human Motion Video Generation with Confidence-aware Pose Guidance
[NeurIPS'24] Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models
MuseV: Infinite-length and High Fidelity Virtual Human Video Generation with Visual Conditioned Parallel Denoising