I am a Ph.D. candidate at Peking University, advised by Prof. Zongqing Lu. My research focuses on Foundation Models, Embodied AI, and Reinforcement Learning. I am also a researcher at BeingBeyond, a startup company dedicated to building foundation models for embodied AI. For more information, please refer to my CV or CV (Chinese).

News

I am hiring self-motivated students/interns to work on VLA/Embodied Agent/Robotics (@BeingBeyond/Peking University). If you are interested, please feel free to drop me an email.

Selected Publications

(For a full list of publications, please see my Google Scholar.)

1. Embodied AI

  • (arXiv’25.12) DiG-Flow: Discrepancy-Guided Flow Matching for Robust VLA Models.
    • Wanpeng Zhang, Ye Wang, Hao Luo, Haoqi Yuan, Yicheng Feng, Sipeng Zheng, Qin Jin, Zongqing Lu.
    • TLDR: DiG-Flow is a plug-and-play module for flow-matching-based VLAs that rebalances control between the autoregressive foundation model and the flow expert.
    • Project / Paper / Bib / GitHub
  • (arXiv’25.12) Spatial-Aware VLA Pretraining through Visual-Physical Alignment from Human Videos.
    • Yicheng Feng, Wanpeng Zhang, Ye Wang, Hao Luo, Haoqi Yuan, Sipeng Zheng, Zongqing Lu.
    • TLDR: We introduce VIPA-VLA, which learns 2D-to-3D visual-physical grounding from human videos, equipping VLAs with stronger spatial understanding and generalization.
    • Project / Paper / Bib / GitHub
  • (arXiv’25.07) Being-H0: Vision-Language-Action Pretraining from Large-Scale Human Videos.
    • Hao Luo*, Yicheng Feng*, Wanpeng Zhang*, Sipeng Zheng*, Ye Wang, Haoqi Yuan, Jiazheng Liu, Chaoyi Xu, Qin Jin, Zongqing Lu. *Equal Contribution.
    • TLDR: We introduce Being-H0, the first dexterous Vision-Language-Action model pretrained from large-scale human videos via explicit hand motion modeling.
    • Project / Paper / Bib / GitHub / Hugging Face

2. MLLM

  • (NeurIPS’25) OpenMMEgo: Enhancing Egocentric Understanding for LMMs with Open Weights and Data.
    • Hao Luo, Zihao Yue, Wanpeng Zhang, Yicheng Feng, Sipeng Zheng, Deheng Ye, Zongqing Lu.
    • TLDR: OpenMMEgo enhances egocentric video understanding through a multi-level synthetic dataset, semantic-aware visual token compression to handle viewpoint shifts, and curriculum learning for stable training.
    • Paper / Bib / GitHub
  • (ICCV’25, Highlight) Unified Multimodal Understanding via Byte-Pair Visual Encoding.
    • Wanpeng Zhang, Yicheng Feng, Hao Luo, Yijiang Li, Zihao Yue, Sipeng Zheng, Zongqing Lu.
    • TLDR: Building on the visual BPE tokenizer proposed in our previous work, we design a complete training framework and present our Being-VL-0.5 model.
    • Project / Paper / Bib / GitHub / Link
  • (ICCV’25) VideoOrion: Tokenizing Object Dynamics in Videos.
    • Yicheng Feng, Yijiang Li, Wanpeng Zhang, Hao Luo, Zihao Yue, Sipeng Zheng, Zongqing Lu.
    • TLDR: VideoOrion encodes videos with a two-branch design, using object tokens from a detect-segment-track pipeline to capture object dynamics alongside scene context.
    • Paper / Bib / Link
  • (ICLR’25) From Pixels to Tokens: Byte-Pair Encoding on Quantized Visual Modalities.
    • Wanpeng Zhang, Zilong Xie, Yicheng Feng, Yijiang Li, Xingrun Xing, Sipeng Zheng, Zongqing Lu.
    • TLDR: We propose a BPE tokenizer for images, enabling Transformers to learn and align multimodal information more effectively and providing a new learning paradigm for unified MLLMs.
    • Paper / Bib / GitHub / Link

3. RL & Agent

  • (NAACL’25) LLM-Based Explicit Models of Opponents for Multi-Agent Games.
    • Xiaopeng Yu, Wanpeng Zhang, Zongqing Lu.
    • TLDR: We propose EMO, a method that models each opponent individually using LLMs with iterative self- and global-refinement for better multi-agent reasoning.
    • Paper / Bib / Link
  • (ICML’24) Tackling Non-Stationarity in Reinforcement Learning via Causal-Origin Representation.
    • Wanpeng Zhang, Yilin Li, Boyu Yang, Zongqing Lu.
    • TLDR: By adaptively learning a joint causal graph of the environment and providing causality-aware representations, our method enables RL algorithms to effectively tackle non-stationarity.
    • Paper / Bib / GitHub / Link
  • (NAACL’24) AdaRefiner: Refining Decisions of Language Models with Adaptive Feedback.
    • Wanpeng Zhang, Zongqing Lu.
    • TLDR: We propose AdaRefiner, which enables co-learning between LLMs and RL agents by letting them provide feedback to each other, improving both perception and decision-making.
    • Paper / Bib / GitHub / Link
  • (ICML’23) Entity Divider with Language Grounding in Multi-Agent Reinforcement Learning.
    • Ziluo Ding*, Wanpeng Zhang*, Junpeng Yue, Xiangjun Wang, Tiejun Huang, Zongqing Lu. *Equal Contribution.
    • TLDR: We propose the EnDi framework, which divides goals among agents and enhances collaboration in multi-agent systems by grounding language to entities.
    • Paper / Bib / GitHub / Link
  • (NeurIPS’22) Model-Based Opponent Modeling.
    • Xiaopeng Yu, Jiechuan Jiang, Wanpeng Zhang, Haobin Jiang, Zongqing Lu.
    • TLDR: MBOM uses environment models to recursively simulate and mix imagined opponent policies for adaptive opponent modeling.
    • Paper / Bib / GitHub / Link

Education

  • Peking University. (Beijing, China. Sep 2022 — Jun 2026 (Expected))
    • Ph.D. Candidate in Computer Science.
    • Research Interest: Foundation Models / Embodied AI / Reinforcement Learning
  • Tsinghua University. (Beijing, China. Sep 2019 — Jun 2022)
    • M.S. in Computer Science.
    • Research Interest: Reinforcement Learning
  • Nankai University. (Tianjin, China. Sep 2015 — Jun 2019)
    • B.S. in Applied Mathematics.
    • Research Interest: Applied Mathematics / Machine Learning

Work Experience

  • BeingBeyond. (Beijing, China. Mar 2025 — Present)
    • Startup Team.
    • Foundation Models / VLA / Embodied AI
  • Beijing Academy of Artificial Intelligence. (Beijing, China. May 2024 — Mar 2025)
    • Research Scientist Intern.
    • Foundation Models / VLM / Embodied AI
  • Tencent AI Lab. (Shenzhen, China. Jun 2020 — Jul 2021)
    • Research Scientist Intern.
    • Reinforcement Learning

Patents

  • Multimodal data processing method, device, storage medium, and electronic equipment. (CN119226992B)
  • Method, device and equipment for determining parameters and storage medium. (CN112527104A)
    • Wanpeng Zhang, Dijun Luo, Xi Xiao.
    • Link / PDF

Awards

  • National Scholarship. (2025)
  • Top 10 Students at the National Engineering Research Center of Visual Technology. (2025)
  • Merit Student of Peking University. (2025)
  • Presidential Scholarship of Peking University. (2024)
  • Award for Scientific Research of Peking University. (2024)
  • Rhino-bird Elite Training Program of Tencent AI Lab. (2021)
  • Mathematical Contest in Modeling (MCM/ICM), Meritorious Winner (First Prize). (2017)
  • China Undergraduate Mathematical Contest in Modeling (CUMCM), Second Prize. (2016)
  • National High School Mathematics Competition, Second Prize. (2014)

Service

  • Conference Reviewer
    • ICML / NeurIPS / ICLR / CVPR / ICCV / AAAI / ICRA / AISTATS
  • Journal Reviewer
    • TNNLS / TIST / RAL
  • Teaching Assistant
    • Deep Reinforcement Learning, Peking University. (Spring, 2025)