Biography


I am a Research Scientist at the Institute of Artificial Intelligence (TeleAI), China Telecom, and the Director of its Embodied AI Research Center, specializing in Embodied AI and Reinforcement Learning (RL). Our group develops embodied technologies spanning perception, planning, locomotion, and manipulation, and promotes the industrial application of embodied AI. The group works under the leadership of Prof. Xuelong Li, the dean of TeleAI. Previously, I was a Researcher at Shanghai AI Laboratory, affiliated with the IPEC group. My research interests include diffusion/transformer policies, LLM-driven planning, world models, preference learning, RL/MPC-based locomotion, dexterous manipulation, representation learning, sim-to-real transfer, multi-agent collaboration, and real-world applications for robot arms, dexterous hands, quadruped robots, and humanoid robots.

I hold a Ph.D. in Computer Science from Harbin Institute of Technology (HIT), where I was advised by Prof. Peng Liu. I have been fortunate to collaborate with many fantastic researchers. I was a joint PhD student at the University of Toronto and the Vector Institute, working with Prof. Animesh Garg. I was also an intern at Huawei Noah’s Ark Lab (advised by Prof. Jianye Hao), Tencent Robotics X (advised by Dr. Lei Han), and Alibaba. I received my Bachelor’s and Master’s degrees in Computer Science from HIT.

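Chinese biography (translated): Chenjia Bai (白辰甲), Ph.D., is a Research Scientist at the Institute of Artificial Intelligence (TeleAI), China Telecom, and Director of its Embodied AI Research Center. He also serves as an industry mentor at Tsinghua University, Shanghai Jiao Tong University, and Fudan University, jointly supervising doctoral students in special engineering programs. He was selected for the 10th Young Elite Scientists Sponsorship Program of the China Association for Science and Technology, the Shanghai "Sailing Program" for young science and technology talents, and the "Guangqi" Young Talent program of Xuhui District, Shanghai. He leads the development of the TeleBot series of humanoid and wheeled robots, building embodied AI systems that integrate software with hardware and coordinate the embodied "brain" and "cerebellum", with the goals of general-purpose policy learning for embodied agents and wider robot deployment. For the embodied brain, the team has built a cross-embodiment adaptation model, a task-planning platform for dexterous manipulation, and a data-synthesis platform; for the embodied cerebellum, it has built the first open-source framework for human-like whole-body robot motion control, a text-driven general-purpose cerebellum, and humanoid robots with integrated perception and control. He has published more than 80 papers, including at top machine learning conferences (NeurIPS, ICML, ICLR), in top AI journals (AI Journal, TPAMI, SCIC), and at top robotics conferences (ICRA, CoRL). He authored the monograph "Reinforcement Learning: Frontier Algorithms and Applications" (《强化学习:前沿算法与应用》), published by China Machine Press. He also wrote the first survey of its kind in China, "Embodied AI Driven by Large Models: Development and Challenges" (《大模型驱动的具身智能:发展与挑战》), which has been downloaded more than 20,000 times and was named the most popular paper of the year by Science China (《中国科学》). He has undertaken projects funded by the National Natural Science Foundation of China, the National Key R&D Program of China, the Science and Technology Commission of Shanghai Municipality, and China Telecom internal programs. His awards include an Outstanding Paper nomination at the World Artificial Intelligence Conference, first place in the ICCV multi-terrain humanoid robot challenge, and the Outstanding Doctoral Dissertation Award of Harbin Institute of Technology. His work has been covered by MIT Technology Review, CCTV, and other media. He serves as an area chair for NeurIPS, ICML, AAMAS, ICME, and PRCV, and as a reviewer for several top journals and conferences.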

Our team is recruiting full-time researchers, interns, and jointly supervised PhD students in embodied AI; see the link for details.

Interests
  • Embodied AI
  • Reinforcement Learning
  • Foundation Model for Decision Making
Education
  • PhD in Computer Science, 2017-2022

    Harbin Institute of Technology

  • Joint PhD Program, 2020-2022

    University of Toronto

Publications

“✉” denotes corresponding author

PRTS: A Primitive Reasoning and Tasking System via Contrastive Representations
Under Review. 2026
We introduce PRTS, a new generation of reinforcement-learning-native robotic Vision-Language-Action (VLA) foundation model.
Re^2MoGen: Open-Vocabulary Motion Generation via LLM Reasoning and Physics-Aware Refinement
In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings, 2026
Re^2MoGen combines LLM reasoning, keyframe-guided completion, and physics-aware RL refinement for open-vocabulary text-to-motion generation.
Preference Aligned Diffusion Planner for Quadrupedal Locomotion Control.
In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2025
We develop a learning framework that combines an offline diffusion planner with online preference alignment using weak preference labels for legged locomotion control.
Online Preference Alignment for Language Models via Count-based Exploration.
In International Conference on Learning Representations (ICLR), 2025     Spotlight
We propose count-based online preference optimization for LLM alignment that leverages coin-flip counting to encourage exploration in online RLHF.
On the Value of Myopic Behavior in Policy Reuse.
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
We present a framework called Selective Myopic bEhavior Control (SMEC), which results from the insight that the short-term behaviors of prior policies are sharable across tasks.
ODRL: A Benchmark for Off-Dynamics Reinforcement Learning.
In Neural Information Processing Systems (NeurIPS), Datasets and Benchmarks Track, 2024
We introduce ODRL, the first benchmark tailored for evaluating off-dynamics RL methods where one needs to transfer policies across different domains with dynamics mismatch.
Robust Quadrupedal Locomotion via Risk-Averse Policy Learning.
In IEEE International Conference on Robotics and Automation (ICRA), 2024     Oral
We consider a novel risk-sensitive perspective to enhance the robustness of legged locomotion.
False Correlation Reduction for Offline Reinforcement Learning.
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
We propose falSe COrrelation REduction (SCORE) for offline RL, a practically effective and theoretically provable algorithm.
RORL: Robust Offline Reinforcement Learning via Conservative Smoothing.
In Neural Information Processing Systems (NeurIPS), 2022     Spotlight
We propose Robust Offline Reinforcement Learning (RORL) with a novel conservative smoothing technique.
Monotonic Quantile Network for Worst-Case Offline Reinforcement Learning.
IEEE Transactions on Neural Networks and Learning Systems, 2022
We propose monotonic quantile network (MQN) with conservative quantile regression (CQR) for risk-averse policy learning.
Exploration in Deep Reinforcement Learning: From Single-Agent to Multiagent Domain.
IEEE Transactions on Neural Networks and Learning Systems, 2022
We conduct a comprehensive survey on existing exploration methods for both single-agent RL and multiagent RL.
Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning.
International Conference on Learning Representations (ICLR), 2022     Spotlight
We propose Pessimistic Bootstrapping for offline RL (PBRL), a purely uncertainty-driven offline algorithm without explicit policy constraints.
Dynamic Bottleneck for Robust Self-Supervised Exploration.
In Neural Information Processing Systems (NeurIPS), 2021
We propose a Dynamic Bottleneck (DB) model, which attains a dynamics-relevant representation based on the information-bottleneck principle.
Principled Exploration via Optimistic Bootstrapping and Backward Induction.
In International Conference on Machine Learning (ICML), 2021     Spotlight
We propose a principled exploration method for DRL through Optimistic Bootstrapping and Backward Induction (OB2I).

Talks

Service

  • Senior Program Committee Member (SPC) / Area Chair (AC) of AAMAS (2024 - 2025) [proof]
  • Area Chair (AC) of Pattern Recognition and Computer Vision (PRCV) (2025 - ) [proof]
  • Industry Program Chair of IEEE International Conference on Multimedia and Expo (ICME) 2026 [proof]
  • Program Committee Member (PC) / Conference Reviewer of RSS (2024 - 2025)
  • Program Committee Member (PC) / Conference Reviewer of NeurIPS (2021 - 2025)
  • Program Committee Member (PC) / Conference Reviewer of ICLR (2021 - 2025)
  • Program Committee Member (PC) / Conference Reviewer of ICML (2022 - 2025)
  • Program Committee Member (PC) / Conference Reviewer of AAAI (2021 - 2025)
  • Program Committee Member (PC) / Conference Reviewer of ICRA (2024 - 2025)
  • Program Committee Member (PC) / Conference Reviewer of ECAI (2023 - 2025)
  • Journal Reviewer: IEEE TPAMI, IEEE Trans. Cybernetics, IEEE TNNLS, IEEE TETCI, IEEE Trans. Intelligent Vehicles, Pattern Recognition.

Experience

  • Research Scientist, Lead of Embodied AI Team, 2024 – Present

    TeleAI, China Telecom, China

  • Researcher, 2022 – 2024

    Shanghai AI Laboratory, China

  • Joint PhD Student, 2020 – 2022

    University of Toronto, Canada

  • PhD Student, 2017 – 2022

    Harbin Institute of Technology, China