
Chenxin Li | 李宸鑫

Hi! I’m Chenxin “Jason” Li, a final-year Ph.D. candidate at The Chinese University of Hong Kong (CUHK). I work on 🧠 multimodal LLMs, 🤖 reasoning/agents via RL, and 🌍 world models.

I am currently interning at ByteDance Seed, scaling VLMs via reasoning/agentic RL. I have built hands-on experience in (i) scaling multimodal models (data, architecture, training, benchmarking) and (ii) post-training via RL (reasoning, multi-turn agents, reward modeling and shaping). Previously, I interned at Tencent AI, Ant Ling, and Hedra AI, and did research visits at UT Austin and UMD.

I anticipate graduating in the summer of 2026 and am interested in industry positions (Profile). Please feel free to reach out via email ([email protected]) or WeChat (jasonchenxinli).

LinkedIn | Google Scholar | GitHub | X

profile photo
📑 Selected Work
* Equal contribution, † Project Leader, ‡ Corresponding author
Seed-1.8
Seed-1.8: Towards Generalized Real-World Agency
ByteDance Seed Team

[Project] [Model Card]

Contributed to the multimodal generative reward model (GRM) and agentic capabilities (tool use, visual code sandbox, GUI grounding/clicking) via RL.

UI-TARS-2
UI-TARS-2: Advancing GUI Agent with Multi-Turn Reinforcement Learning
ByteDance Seed Team

[Project] [Report]

Contributed to GUI grounding and visual referring capabilities.

Ling
Ling: Open-sourced LLM with MoE Architecture by InclusionAI
Ant Group InclusionAI Team

[Project]

Contributed to long-context memory RL and hallucination verifiers.

Hedra Character-3
Hedra Character-3: A New Generation of AI-Native Video Creation
Hedra AI Team

[Project] [Announcement]

Contributed to omnimodal (audio, image, pose) injection and MMDiT architecture implementation.

IR3D-Bench Framework
IR3D-Bench: Evaluating Vision-Language Model Scene Understanding as Agentic Inverse Rendering
Parker Liu*, Chenxin Li*, Zhengxin Li, Yipeng Wu, Wuyang Li, Zhiqin Yang, Zhenyuan Zhang, Yunlong Lin, Sirui Han, Brandon Y. Feng
NeurIPS 2025

[Project] [Paper] [Code]

Evaluating the scene-understanding capabilities of VLMs via inverse-rendering tasks.
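
Inverse-rendering evaluation of this kind generally works render-and-compare style: the VLM emits a structured scene program that a renderer can execute, and the reconstruction is scored against the source image. A minimal Python sketch of that loop, with all object and function names hypothetical rather than the benchmark's actual API:

    import json

    def evaluate_inverse_rendering(vlm, renderer, image, metric):
        """Hypothetical render-and-compare loop: the VLM emits a scene
        description as JSON (objects, poses, colors), a renderer executes
        it, and the result is scored against the original image."""
        prompt = "Describe this scene as JSON: objects, positions, colors."
        scene = json.loads(vlm.generate(image, prompt))  # structured scene program
        reconstruction = renderer.render(scene)          # re-render the estimate
        return metric(image, reconstruction)             # e.g., perceptual similarity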

InfoBridge Framework
InfoBridge: Balanced Multimodal Alignment by Maximizing Cross-modal Conditional Mutual Information
Chenxin Li, Yifan Liu, Xinyu Liu, Wuyang Li, Hengyu Liu, Cheng Wang, Weihao Yu, Yunlong Lin, Yixuan Yuan
ICCV 2025

[Project] [Paper] [Code]

Enhanced multimodal alignment by maximizing cross-modal conditional mutual information.
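
For reference, the cross-modal conditional mutual information being maximized has the standard form below, where v and t denote the two modalities' representations and c a conditioning variable; the paper's concrete estimator and choice of variables may differ:

    I(v; t \mid c) = \mathbb{E}_{p(v, t, c)}\!\left[ \log \frac{p(v, t \mid c)}{p(v \mid c)\, p(t \mid c)} \right]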

U-KAN Framework
U-KAN: U-KAN Makes Strong Backbone for Image Segmentation and Generation
Chenxin Li*, Xinyu Liu*, Wuyang Li*, Cheng Wang*, Hengyu Liu, Yixuan Yuan
AAAI 2025

[Project] [Paper] [Code] 🏆 Ranked #1 among the most influential papers of AAAI 2025

Integrating Kolmogorov–Arnold Network (KAN) layers into U-Net vision backbones for segmentation and generation.
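
A KAN layer replaces fixed activations with a learnable univariate function on each input-output edge. Below is a minimal PyTorch sketch of that idea using a Gaussian RBF basis; the basis choice, shapes, and module names are illustrative assumptions, not the U-KAN reference implementation:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ToyKANLayer(nn.Module):
        """Sketch of a KAN-style layer: each edge (i -> o) applies a learnable
        univariate function built from fixed Gaussian RBF bases (hypothetical,
        not the U-KAN reference code)."""
        def __init__(self, in_dim, out_dim, num_bases=8, grid=(-2.0, 2.0)):
            super().__init__()
            self.register_buffer("centers", torch.linspace(*grid, num_bases))
            self.coef = nn.Parameter(0.1 * torch.randn(out_dim, in_dim, num_bases))
            self.base = nn.Linear(in_dim, out_dim)  # residual path used by common KAN variants

        def forward(self, x):  # x: (N, in_dim)
            phi = torch.exp(-(x.unsqueeze(-1) - self.centers) ** 2)  # (N, in_dim, B)
            spline = torch.einsum("nib,oib->no", phi, self.coef)     # sum learned edge functions
            return spline + self.base(F.silu(x))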

JarvisIR: Elevating Autonomous Driving Perception with Intelligent Image Restoration
Yunlong Lin*, Zixu Lin*, Haoyu Chen*, Panwang Pan*, Chenxin Li, Sixiang Chen, Kairun Wen, Yeying Jin, Wenbo Li, Xinghao Ding‡
CVPR 2025

[Project] [Paper] [Code]

JarvisIR is a VLM-powered system that dynamically schedules expert models for image restoration.
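
The dispatch pattern is worth making concrete: a VLM inspects the degraded input, then selects and sequences expert restoration models. A toy sketch in which the tool registry and routing prompt are hypothetical, not the JarvisIR implementation:

    def restore(vlm, experts, image):
        """Hypothetical VLM-as-scheduler loop. `experts` maps tool names
        (e.g., "derain", "denoise", "low_light") to restoration callables."""
        plan = vlm.generate(image, "List the restoration tools to apply, in order.")
        for tool in plan.split(","):     # e.g., "derain, low_light, denoise"
            tool = tool.strip()
            if tool in experts:          # skip hallucinated tool names
                image = experts[tool](image)
        return image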

JarvisArt
JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent
Yunlong Lin, Zixu Lin, Kunjie Lin, Jinbin Bai, Panwang Pan, Chenxin Li, Haoyu Chen, Zhongdao Wang, Xinghao Ding†, Wenbo Li, Shuicheng Yan†
Preprint 2025

[Project] [Paper] [Code]

VLM-powered agentic photo retouching system that orchestrates expert models for professional-grade image editing.

EMNLP 2024 VLM fine-tuning
Visual Large Language Model Fine-Tuning via Simple Parameter-Efficient Modification
Mengjiao Li, Zhiyuan Ji, Chenxin Li†, Lianliang Nie, Zhiyang Li, Masashi Sugiyama
EMNLP 2024

[Project] [Paper] [Code]

A simple yet effective parameter-efficient fine-tuning strategy for VLMs.
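
The general recipe behind such parameter-efficient modification is to freeze the backbone and unlock only a small, structurally chosen subset of weights. A sketch assuming a PyTorch model; the keyword selector is illustrative, since the paper's exact parameter subset is not spelled out here:

    def mark_trainable(model, keywords=("bias", "layer_norm", "ln")):
        """Freeze all parameters except those whose names match a keyword
        (illustrative subset, not necessarily the paper's choice)."""
        trainable = 0
        for name, p in model.named_parameters():
            p.requires_grad = any(k in name.lower() for k in keywords)
            trainable += p.numel() if p.requires_grad else 0
        return trainable  # count of parameters actually being fine-tuned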

🧑‍💻 Selected Experience

Internships covering (i) scaling multimodal models and (ii) RL post-training (reasoning, agents, reward modeling) on GPU clusters at the 1k–10k scale (a generic reward-shaping sketch follows this list):

  • ByteDance Seed: VLM scaling via reasoning/agentic RL
  • Tencent AI: World model simulation via Blender agent
  • Ant Ling: Long-context memory RL, hallucination verifiers
  • Hedra AI: Omnimodal (audio, image, pose) injection for video generation
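
As a generic illustration of the reward modeling-and-shaping step in such RL post-training pipelines, here is a GRPO-style group-normalized advantage; inputs are per-response scalar tensors for one prompt's group of samples, and none of this is internal code from the teams above:

    import torch

    def shaped_group_advantages(task_reward, format_bonus, kl_to_ref, beta=0.02):
        """Shape per-response rewards for one prompt's group of G samples,
        then normalize within the group (GRPO-style; hypothetical sketch)."""
        shaped = task_reward + format_bonus - beta * kl_to_ref    # reward shaping
        return (shaped - shaped.mean()) / (shaped.std() + 1e-6)   # group-relative advantage
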
ScholaGO

ScholaGO (Co-founder): LLM-backed Education Startup

Co-founded ScholaGO Education Technology Company Limited (学旅通教育科技有限公司) to build LLM-powered education products that turn static content into immersive, interactive, multimodal learning experiences. Grateful for funding from HKSTP, HK Tech 300, and Alibaba Cloud.

💼 Professional Activities
  • Workshop Organizer: AIM-FM: Advancements In Foundation Models Towards Intelligent Agents (NeurIPS 2024)
  • Talks: “U-KAN” at the VALSE Summit (Jun 2025) and at DAMTP, University of Cambridge (Jul 2024)
  • Conference Reviewer: ICLR, NeurIPS, ICML, CVPR, ICCV, ECCV, EMNLP, AAAI, ACM MM, MICCAI, BIBM
  • Journal Reviewer: Nature Machine Intelligence, PAMI, TIP, DMLR, PR, TNNLS
🌟 Beyond Work
📚 Reading: I dedicate substantial time to reading, especially history, philosophy, and sociology, which shapes my first-principles perspective on what AGI should be.

📈 Investment: Investing is real-world RL: returns provide fast feedback for iteratively improving one's decision policy. Recently, I have been fascinated by (i) building benchmarks for LLMs that quantify real-world investment utility (in a similar spirit to OpenAI's GDPval benchmark), and (ii) extending quantitative financial metrics to more general event and trend forecasting.