白辰甲 Bai Chenjia

Research Scientist
Institute of Artificial Intelligence (TeleAI), China Telecom

Biography

I am a Research Scientist at the Institute of Artificial Intelligence (TeleAI), China Telecom, and the Director of its Embodied AI research center, specializing in Embodied AI and Reinforcement Learning (RL). Our group is dedicated to developing embodied technologies spanning perception, planning, locomotion, and manipulation, and to promoting the industrial application of embodied AI. The group works under the leadership of Prof. Xuelong Li, the dean of TeleAI. Previously, I was a Researcher at Shanghai AI Laboratory, affiliated with the IPEC group. My research interests include diffusion/transformer policies, LLM-driven planning, world models, preference learning, RL/MPC-based locomotion, dexterous manipulation, representation learning, sim-to-real transfer, and multi-agent collaboration, as well as real-world applications for robot arms, dexterous hands, quadruped robots, and humanoid robots.

I hold a Ph.D. in Computer Science from Harbin Institute of Technology (HIT), where I was advised by Prof. Peng Liu. I have been fortunate to collaborate with many fantastic researchers. I was a joint PhD student at the University of Toronto and the Vector Institute, working with Prof. Animesh Garg. I was also an intern at Huawei Noah’s Ark Lab (advised by Prof. Jianye Hao), Tencent Robotics X (advised by Dr. Lei Han), and Alibaba. I received my Bachelor’s and Master’s degrees in Computer Science from HIT.

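Dr. Bai Chenjia (白辰甲) is a Research Scientist at the Institute of Artificial Intelligence (TeleAI), China Telecom, and Director of its Embodied AI research center. He also serves as an industry advisor at Tsinghua University, Shanghai Jiao Tong University, and Fudan University, where he jointly supervises doctoral students in specialized engineering programs. He was selected for the 10th Young Elite Scientists Sponsorship Program (青年“托举”人才) of the China Association for Science and Technology, the Shanghai "Sailing Program" (扬帆计划) for young science and technology talents, and the "Guangqi" (光启) Young Talent program of Xuhui District, Shanghai. He leads the development of the TeleBot series of humanoid and wheeled robots, building an embodied AI system that integrates software with hardware and coordinates the embodied brain with the embodied cerebellum, with the goals of general policy learning for embodied agents and wider robot deployment. For the embodied brain, he has built a cross-embodiment adaptation model, a task-planning platform for dexterous manipulation, and a data-synthesis platform; for the embodied cerebellum, he has built the first open-source framework for human-like whole-body robot motion control, a text-driven general-purpose cerebellum, and a humanoid robot with integrated perception and control, among others. He has published more than 80 papers, including at the top machine learning conferences NeurIPS, ICML, and ICLR, in the top AI journals AI Journal, TPAMI, and SCIC, and at the top robotics conferences ICRA and CoRL. He is the author of the monograph 《强化学习：前沿算法与应用》 (Reinforcement Learning: Frontier Algorithms and Applications), published by China Machine Press (机械工业出版社). He also wrote 《大模型驱动的具身智能：发展与挑战》 (Embodied AI Driven by Large Models: Development and Challenges), the first survey on this topic in China, which has been downloaded more than 20,000 times and was named the most popular paper of the year by 中国科学 (Science China). He has led projects funded by the National Natural Science Foundation of China, the National Key R&D Program of China, the Science and Technology Commission of Shanghai Municipality, and China Telecom internal programs. His awards include an Outstanding Paper nomination at the World Artificial Intelligence Conference, first place in the ICCV multi-terrain humanoid robot challenge, and the Harbin Institute of Technology Outstanding Doctoral Dissertation Award. His work has been covered by MIT Technology Review, CCTV, and other media. He serves as an Area Chair for NeurIPS, ICML, AAMAS, ICME, and PRCV, and as a reviewer for multiple top journals and conferences.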

Our team is recruiting full-time researchers, interns, and jointly supervised PhD students in embodied AI; see the link for details.

Interests

Embodied AI
Reinforcement Learning
Foundation Model for Decision Making

Education

PhD in Computer Science, 2017-2022
Harbin Institute of Technology
Joint PhD Program, 2020-2022
University of Toronto

Book

强化学习：前沿算法与应用 (Reinforcement Learning: Frontier Algorithms and Applications)

Embodied Brain

PRTS
A new-generation, reinforcement-learning-native robotic vision-language-action (VLA) model

GN0
GN0, the first embodied navigation framework covering the full "data, simulation, model, evaluation" pipeline

Embodied Cerebellum

KungfuBot
The first open-source framework from human video to humanoid whole-body control

TextOp
The first text-driven humanoid robot motion framework

Husky
The first breakthrough in humanoid robot skateboarding

Publications

“✉” denotes corresponding author

PRTS: A Primitive Reasoning and Tasking System via Contrastive Representations

Under Review. 2026

We introduce PRTS, a new generation of reinforcement-learning-native robotic Vision-Language-Action (VLA) foundation model.

Yang Zhang, Jiangyuan Zhao, Chenyou Fan, Fangzheng Yan, Tian Li, Xuaner Wu, Qizhen Weng, Xiu Li, Weinan Zhang, Chi Zhang, Chenjia Bai^✉, Xuelong Li^✉

GN0: Toward a Unified Paradigm for Generation, Evaluation, and Policy Learning in Visual-Language-Navigation

Under Review. 2026

We present a foundation VLN model by developing an automated pipeline for large-scale navigation data generation, resulting in the GN-Matrix dataset and a high-fidelity simulation platform that supports interactive roaming and collision-aware navigation.

Xinhai Li, Xiaotao Zhang, Yuehao Huang, Jiankun Dong, Tianhang Wang, Sunyao Zhou, Yunzi Wu, Chenguo Sun, Yunfei Ge, Qizhen Weng, Chi Zhang, Chenjia Bai^✉, Xuelong Li^✉

AssemLM: Spatial Reasoning Multimodal Large Language Models for Robotic Assembly

arXiv preprint

AssemLM introduces a spatial multimodal LLM and AssemBench to advance 3D reasoning and 6D pose prediction for robotic assembly.

Zhi Jing, Jinbin Qiao, Ouyang Lu, Jicong Ao, Shuang Qiu, Yu-Gang Jiang, Chenjia Bai^✉

DeCoNav: Dialog enhanced Long-Horizon Collaborative Vision-Language Navigation

arXiv preprint

DeCoNav introduces event-triggered dialogue and dynamic replanning to improve long-horizon collaborative VLN in synchronized multi-robot settings.

Sunyao Zhou, Yunzi Wu, Tianhang Wang, Xinhai Li, Guang Chen, Lizheng Liu, Chenjia Bai, Xuelong Li

Re^2MoGen: Open-Vocabulary Motion Generation via LLM Reasoning and Physics-Aware Refinement

In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings, 2026

Re^2MoGen combines LLM reasoning, keyframe-guided completion, and physics-aware RL refinement for open-vocabulary text-to-motion generation.

Jiakun Zheng, Ting Xiao, Shiqin Cao, Xinran Li, Zhe Wang, Chenjia Bai^✉

HALO: Closing Sim-to-Real Gap for Heavy-loaded Humanoid Agile Motion Skills via Differentiable Simulation

arXiv preprint

HALO introduces a MuJoCo XLA-based two-stage identification pipeline that closes the heavy-load sim-to-real gap for agile humanoid skills.

Xingyi Wang, Chenyun Zhang, Weiji Xie, Chao Yu, Wei Song, Chenjia Bai^✉, Shiqiang Zhu

Beyond Short-Horizon: VQ-Memory for Robust Long-Horizon Manipulation in Non-Markovian Simulation Benchmarks

arXiv preprint

We introduce RuleSafe and VQ-Memory, a VQ-VAE temporal representation that boosts long-horizon manipulation in non-Markovian benchmarks.

Honghui Wang, Zhi Jing, Jicong Ao, Shiji Song, Xuelong Li, Gao Huang, Chenjia Bai^✉

Pro-HOI: Perceptive Root-guided Humanoid-Object Interaction

arXiv preprint

We introduce Pro-HOI, a generalizable framework for robust humanoid loco-manipulation via perceptive root-guided control.

Yuhang Lin, Jiyuan Shi, Dewei Wang, Jipeng Kong, Yong Liu, Chenjia Bai^✉, Xuelong Li

TextOp: Real-time Interactive Text-Driven Humanoid Robot Motion Generation and Control

arXiv preprint

TextOp enables real-time text-driven humanoid motion generation and control using a two-level diffusion-and-tracking architecture.

Weiji Xie, Jiakun Zheng, Jinrui Han, Jiyuan Shi, Weinan Zhang^✉, Chenjia Bai^✉, Xuelong Li

Unifying Value Alignment and Assignment in Cross-Domain Offline Reinforcement Learning with Heterogeneous Datasets

In International Conference on Machine Learning (ICML), 2026

V2A unifies dynamics alignment, value alignment, and value assignment for heterogeneous cross-domain offline RL and improves robust source-data filtering.

Zhongjian Qiao, Jiafei Lyu, Chenjia Bai, Peisong Wang, Siyang Gao, Shuang Qiu

Learning Soccer Skills for Humanoid Robots: A Progressive Perception-Action Framework

arXiv preprint

We present Perception-Action integrated Decision-making (PAiD), a progressive framework for humanoid soccer skills.

Jipeng Kong, Xinzhe Liu, Yuhang Lin, Jinrui Han, Sören Schwertfeger, Chenjia Bai^✉, Xuelong Li

HUSKY: Humanoid Skateboarding System via Physics-Aware Whole-Body Control

In Robotics: Science and Systems (RSS), 2026

We address humanoid skateboarding, a highly challenging task that requires stable dynamic maneuvering on a humanoid platform, through physics-aware whole-body control.

Jinrui Han, Dewei Wang, Chenyun Zhang, Xinzhe Liu, Ping Luo, Chenjia Bai^✉, Xuelong Li

X-Loco: Towards Generalist Humanoid Locomotion Control via Synergetic Policy Distillation

In Robotics: Science and Systems (RSS), 2026

We introduce X-Loco, a framework for training a vision-based generalist humanoid locomotion controller through synergetic policy distillation.

Dewei Wang, Xinmiao Wang, Chenyun Zhang, Jiyuan Shi, Yingnan Zhao, Chenjia Bai^✉, Xuelong Li

Align-Then-stEer: Adapting the Vision-Language Action Models through Unified Latent Guidance

In International Conference on Learning Representations (ICLR), 2026

We propose Align-Then-stEer (ATE), a framework that adapts VLAs to novel robots and tasks through unified latent guidance. ATE handles significant domain shifts without compromising performance and is compatible with Pi0, RDT, and other models.

Yang Zhang, Chenwei Wang, Ouyang Lu, Yuan Zhao, Yunfei Ge, Zhenglong Sun, Xiu Li, Chi Zhang, Chenjia Bai^✉, Xuelong Li^✉

Steering Vision-Language-Action Models as Anti-Exploration: A Test-Time Scaling Approach

arXiv preprint arXiv:2512.02834

We propose TACO, a test-time-scaling framework for VLAs that improves inference stability and success rates by preventing distribution shifts at test time.

Siyuan Yang, Yang Zhang, Haoran He, Ling Pan, Xiu Li, Chenjia Bai^✉, Xuelong Li^✉

Cross-Domain Offline Policy Adaptation with Dynamics- and Value-Aligned Data Filtering

arXiv preprint

We propose DVDF, a method for cross-domain offline RL that filters source data by both dynamics and value alignment, achieving strong performance in challenging settings.

Zhongjian Qiao, Rui Yang, Jiafei Lyu, Chenjia Bai, Xiu Li, Zhuoran Yang, Siyang Gao, Shuang Qiu

KungfuBot2: Learning Versatile Motion Skills for Humanoid Whole-Body Control

In IEEE International Conference on Robotics & Automation (ICRA), 2026

We present VMS, a unified whole-body controller that enables humanoid robots to learn diverse and dynamic behaviors within a single policy through hybrid tracking and orthogonal mixture of experts.

Jinrui Han, Weiji Xie, Jiakun Zheng, Jiyuan Shi, Weinan Zhang, Ting Xiao, Chenjia Bai^✉

Towards Adaptive Humanoid Control via Multi-Behavior Distillation and Reinforced Fine-Tuning

In AAAI Conference on Artificial Intelligence (AAAI), 2026 (Oral)

We propose a framework for adaptive humanoid control via multi-behavior distillation and reinforced fine-tuning that achieves state-of-the-art performance. The paper was recommended for an Oral presentation at AAAI-26.

Yingnan Zhao, Xinmiao Wang, Dewei Wang, Xinzhe Liu, Dan Lu, Qilong Han, Peng Liu, Chenjia Bai^✉

Learn as Individuals, Evolve as a Team: Multi-agent LLMs Adaptation in Embodied Environments

Under review at ARR, 2026

We propose the Learn as Individuals, Evolve as a Team (LIET) framework, which enables multi-agent LLMs to adapt to embodied environments through individual learning and team evolution.

Xinran Li, Chenjia Bai^✉, Zijian Li, Jiakun Zheng, Ting Xiao, Jun Zhang^✉

Towards a Generalizable Bimanual Foundation Policy via Flow-based Video Prediction

In IEEE International Conference on Multimedia & Expo (ICME), 2026

We propose a novel bimanual foundation policy that leverages text-to-video models to predict robot trajectories and uses optical flow as an intermediate variable to improve generalization.

Chenyou Fan, Fangzheng Yan, Chenjia Bai^✉, Jiepeng Wang, Chi Zhang, Zhen Wang, Xuelong Li^✉

MoRE: Mixture of Residual Experts for Humanoid Lifelike Gaits Learning on Complex Terrains

Under review at Pattern Recognition, 2025

We propose a novel framework that enables humanoid robots to traverse complex terrains with controllable human-like gaits using a mixture of latent residual experts and multi-discriminators.

Dewei Wang, Xinmiao Wang, Xinzhe Liu, Jiyuan Shi, Yingnan Zhao, Chenjia Bai^✉, Xuelong Li^✉