
Panwang Pan | 潘攀望

I am currently a Senior Researcher at PICO, ByteDance Ltd. Previously, I was a Senior Algorithm Engineer at Alibaba Cloud.

I received my Master's degree from the School of Informatics, Xiamen University, in 2019.

My research focuses on the intersection of generative models and multi-modal representation learning. My work has been deployed in real-world systems, including embedded XR devices and large-scale platforms such as the Aliyun Cloud AI-Box.

I welcome opportunities for coffee chats and collaborations. Please feel free to reach out!

Email / Google Scholar / GitHub / Twitter / WeChat

📢 Latest News

[2025-06] InstructLayout was accepted to T-PAMI 2025 🎉.

[2025-06] InfoBridge was accepted to ICCV 2025 🎉.

[2025-06] We released PartCrafter, a 3D-native DiT model designed to generate 3D objects as modular parts 🎉.

[2025-02] One paper on VLM + RRHF (JarvisIR) was accepted to CVPR 2025 🎉.

[2025-01] 4K4DGEN was selected as an ICLR 2025 Spotlight (top 3.2% of 11,672 submissions) 🎉.

[2025-01] Three papers on 3D/4D generative models (InstantSplamp, DiffSplat, and 4K4DGEN) were accepted to ICLR 2025 🎉.

[2024-09] One paper on generalizable single-view human reconstruction (HumanSplat) was accepted to NeurIPS 2024 🎉.

[2024-09] One paper on VLM distillation (MRD) was accepted to ECCV 2024 🎉.

📑 Selected Publications (Google Scholar)

* Equal contribution, † Project leader, ‡ Corresponding author

Generative AI

NeurIPS 2025

PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers

Yuchen Lin, Chenguo Lin, Panwang Pan† (Project Lead), Honglei Yan, Yiqiang Feng, Yadong Mu, Katerina Fragkiadaki

[Paper] [Project] [Code]

PartCrafter is a structured 3D generative model that jointly generates multiple parts and objects from a single RGB image in one shot.

TPAMI 2025

InstructLayout: Instruction-Driven 2D and 3D Layout Synthesis with Semantic Graph Prior

Chenguo Lin, Yuchen Lin, Panwang Pan, Xuanyang Zhang, Yadong Mu

[Paper] [Project] [Code]

InstructLayout is a novel generative framework that integrates a semantic graph prior and a layout decoder to improve controllability and fidelity for 2D and 3D layout synthesis.

Preprint 2025

Deformable Gaussian Diffusion: Controllable 4D Scene Generation from a Single Image

Panwang Pan, Chenguo Lin*, Jingjing Zhao, Chenxin Li, Yuchen Lin, Kairun Wen, Yunlong Lin, Yixuan Yuan, Yadong Mu, Zhiwen Fan

[Paper] [Project] [Code]

Diff4Splat is a generalizable framework for controllable 4D scene generation from a single image using a video diffusion model.

ICLR 2025 🌟 spotlight 🌟

4K4DGEN: Panoramic 4D Generation at 4K Resolution

Panwang Pan*‡, Renjie Li*, Bangbang Yang, Dejia Xu, Shijie Zhou, Xuanyang Zhang, Zeming Li, Achuta Kadambi, Zhangyang Wang, Zhengzhong Tu, Zhiwen Fan

[OpenReview] [Paper] [Project] [Code]

4K4DGEN is the first to achieve high-quality panorama-to-4D generation at 4K resolution, using efficient splatting techniques to enable real-time exploration.

Preprint 2025

ImmerseGen: Agent-Guided Immersive World Generation with Alpha-Textured Proxies

Jinyan Yuan, Bangbang Yang, Keke Wang, Panwang Pan, Lin Ma, Xuehai Zhang, Xiao Liu, Zhaopeng Cui, Yuewen Ma

[Paper] [Project] [Code]

ImmerseGen is a novel agent-guided framework for compact and photorealistic world modeling.

NeurIPS 2024

HumanSplat: Generalizable Single-Image Human Gaussian Splatting with Structure Priors

Panwang Pan‡, Zhou Su, Chenguo Lin, Zhen Fan, Yongjie Zhang, Zeming Li, Tingting Shen, Yadong Mu, Yebin Liu‡

[OpenReview] [Paper] [Project] [Code]

HumanSplat predicts the 3D Gaussian Splatting properties of any human from a single input image in a generalizable manner.

ICLR 2025

DiffSplat: Repurposing Image Diffusion Models for Scalable Gaussian Splatting Generation

Chenguo Lin*, Panwang Pan*†, Bangbang Yang, Zeming Li, Yadong Mu

[OpenReview] [Paper] [Project] [Code]

DiffSplat is a novel 3D generative framework that natively generates 3D Gaussians by taming large-scale text-to-image diffusion models. It produces 3D Gaussians directly from text prompts or single-view images in 1 to 2 seconds and achieves state-of-the-art 3D reconstruction results.

ICCV 2023 & ICLR 2025

StegaNeRF: Embedding Invisible Information within Neural Radiance Fields / InstantSplamp: Fast and Generalizable Stenography Framework for Generative Gaussian Splatting

[StegaNeRF Paper] [StegaNeRF Project] [StegaNeRF Code]

[InstantSplamp Paper] [InstantSplamp Project] [InstantSplamp Code]

StegaNeRF and InstantSplamp achieve reliable recovery of hidden information with minimal impact on rendering quality. These works offer a promising outlook for ownership identification in 3D representations and call for more attention and effort on related problems.

Preprint 2025

MoVieS: Motion-Aware 4D Dynamic View Synthesis in One Second

Chenguo Lin*, Yuchen Lin*, Panwang Pan†, Yifan Yu, Honglei Yan, Katerina Fragkiadaki, Yadong Mu

[Paper] [Project] [Code]

MoVieS is a feed-forward framework that jointly reconstructs appearance, geometry and motion for 4D scene perception from monocular videos.

NeurIPS 2025

DynamicVerse: Physically-Aware Multimodal Modeling for Dynamic 4D Worlds

Kairun Wen, Yuzhi Huang, Runyu Chen, Hui Zheng, Yunlong Lin, Panwang Pan, Chenxin Li, Wenyan Cong, Jian Zhang, Junbin Lu, Chenguo Lin, Dilin Wang, Zhicheng Yan, Hongyu Xu, Justin Theiss, Yue Huang, Xinghao Ding, Rakesh Ranjan, Zhiwen Fan

[Paper] [Project] [Code]

DynamicVerse is a physical‑scale, multimodal 4D modeling framework for real-world video.

Multi-modal Learning

NeurIPS 2025

JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent

Yunlong Lin, Zixu Lin, Kunjie Lin, Jinbin Bai, Panwang Pan, Chenxin Li, Haoyu Chen, Zhongdao Wang, Xinghao Ding‡, Wenbo Li, Shuicheng Yan‡

[Paper] [Project] [Code]

JarvisArt outperforms GPT-4o with a 60% improvement in average pixel-level metrics on MMArt-Bench for content fidelity, while maintaining comparable instruction-following capabilities.

CVPR 2025

JarvisIR: Elevating Autonomous Driving Perception with Intelligent Image Restoration

Yunlong Lin*, Zixu Lin*, Haoyu Chen*, Panwang Pan*, Chenxin Li, Sixiang Chen, Kairun Wen, Yeying Jin, Wenbo Li, Xinghao Ding‡

[Paper] [Project] [Code]

JarvisIR is a VLM-powered intelligent system that dynamically schedules expert models for restoration.

ECCV 2024

Multi-modal Relation Distillation for Unified 3D Representation Learning

Huiqun Wang, Yiping Bao, Panwang Pan† (Project Lead), Zeming Li, Xiao Liu, Ruijie Yang, Di Huang

[Paper] [Project]

Multi-modal Relation Distillation is designed to effectively distill knowledge from large pre-trained Vision-Language Models (VLMs) into 3D backbones.

💼 Experience

ByteDance Ltd., Beijing, China, Senior Computer Vision Algorithm Engineer, advised by Cheng Chen and Zeming Li	08/2022 - Present
Alibaba Cloud, Hangzhou, China, Senior Computer Vision Algorithm Engineer	07/2019 - 07/2022
DevTech Compute, NVIDIA, Beijing, China, AI Developer Technology Engineer Intern, advised by Xipeng Li	07/2018 - 10/2018

🏆 Selected Awards

2024: “Star Team Award” Innovation Breakthrough Award, ByteDance

2023: “Star Team Award” Innovation Breakthrough Award, ByteDance

2022: ByteStyle Award, ByteDance

2019: Outstanding Graduate of Xiamen University

2018: National Scholarship for Postgraduates, Ministry of Education

2018: First Prize of GEDC, Second Prize of MCM & CPIPC

2017: ZhongXian Huang Scholarship, Xiamen University (about 10 awards per year)

2015: National Scholarship for Undergraduates (the highest honor scholarship in China)

💬 Miscellaneous

Conference Reviewer: NeurIPS, ICLR, CVPR, ICML, ICCV, ACM MM