My research focuses on building efficient and scalable machine learning systems. Specifically, I develop full-stack infrastructure that pushes the efficiency frontier across the foundation-model lifecycle, spanning datacenter scheduling, large-scale pre-training, post-training with reinforcement learning, and model serving. My work emphasizes algorithm–system co-design for emerging workloads (long-context, multimodal, reasoning, agentic) and extends to broader system scenarios (networking, robotics).
|
[ASPLOS '26]
|
Taming the Long-Tail: Efficient Reasoning RL Training with Adaptive Drafter
Qinghao Hu*, Shang Yang*, Junxian Guo, Xiaozhe Yao, Yujun Lin, Yuxian Gu, Han Cai, Chuang Gan, Ana Klimovic, Song Han
Paper / Code
|
[EuroSys '26]
|
Zeppelin: Balancing Variable-length Workloads in Data Parallel Large Model Training
Chang Chen, Tiancheng Chen, Jiangfei Duan, Qianchao Zhu, Zerui Wang, Qinghao Hu,
Peng Sun, Xiuhong Li, Chao Yang, Torsten Hoefler
Paper
|
[NeurIPS '25]
|
Jet-Nemotron: Efficient Language Model with Post Neural Architecture Search
Yuxian Gu, Qinghao Hu, Shang Yang, Haocheng Xi, Junyu Chen, Song Han, Han Cai
Paper / Code
|
[NeurIPS '25]
|
Scaling up Reasoning to Long Videos in VLMs
Yukang Chen, Wei Huang, Baifeng Shi, Qinghao Hu, Hanrong Ye, Ligeng Zhu, Zhijian Liu, Pavlo Molchanov, Jan Kautz, Xiaojuan Qi, Sifei Liu, Hongxu Yin, Yao Lu, Song Han
Paper / Code
|
[SOSP '25]
|
Sailor: Automating Distributed Training over Dynamic, Heterogeneous, and Geo-distributed Clusters
Foteini Strati, Zhendong Zhang, George Manos, Ixeia Sánchez Périz, Qinghao Hu,
Tiancheng Chen, Berk Buzcu, Song Han, Pamela Delgado, Ana Klimovic
Paper / Code
|
[MLSys '25]
|
LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
Shang Yang*, Junxian Guo*, Haotian Tang, Qinghao Hu, Guangxuan Xiao, Jiaming Tang,
Yujun Lin, Zhijian Liu, Yao Lu, Song Han
Paper / Code
|
[ICLR '25]
|
LongVILA: Scaling Long-Context Visual Language Models for Long Videos
Yukang Chen*, Fuzhao Xue*, Dacheng Li*, Qinghao Hu*, Ligeng Zhu, Xiuyu Li, Yunhao Fang, Haotian Tang, Shang Yang, Zhijian Liu, Ethan He, Hongxu Yin, Pavlo Molchanov, Jan Kautz, Linxi Fan, Yuke Zhu, Yao Lu, Song Han
Paper / Code
|
[EuroSys '25]
|
DeltaServe: Multi-Tenant Language Model Serving via Delta Compression
Xiaozhe Yao, Qinghao Hu, Ana Klimovic
Paper / Code
|
[NSDI '24]
|
Characterization of Large Language Model Development in the Datacenter
Qinghao Hu*, Zhisheng Ye*, Zerui Wang*, Guoteng Wang, Meng Zhang, Qiaoling Chen, Peng Sun, Dahua Lin, Xiaolin Wang, Yingwei Luo, Yonggang Wen, Tianwei Zhang
Paper / System / Data / USENIX ;login:
|
[SC '24]
|
TorchGT: A Holistic System for Large-scale Graph Transformer Training
Meng Zhang*, Jie Sun*, Qinghao Hu, Peng Sun, Zeke Wang,
Yonggang Wen, Tianwei Zhang
Paper / Code / Artifact Badges: Available, Functional, Reproduced
|
[ICDE '24]
|
Sylvie: 3D-adaptive and Universal System for Large-scale Graph Neural Network Training
Meng Zhang, Qinghao Hu, Cheng Wan, Haozhao Wang, Peng Sun, Yonggang Wen,
Tianwei Zhang
Paper / Code
|
[CSUR '24]
|
Deep Learning Workload Scheduling in GPU Datacenters: A Survey
Zhisheng Ye*, Wei Gao*, Qinghao Hu*, Peng Sun, Xiaolin Wang, Yingwei Luo,
Tianwei Zhang, Yonggang Wen
Paper / Awesome List / ACM Computing Surveys
|
[OSDI '23]
|
Hydro: Surrogate-Based Hyperparameter Tuning Service in Datacenters
Qinghao Hu, Zhisheng Ye, Meng Zhang, Qiaoling Chen, Peng Sun, Yonggang Wen, Tianwei Zhang
Paper / Code / Artifact Badges: Available, Functional, Reproduced
|
[ASPLOS '23]
|
Lucid: A Non-Intrusive, Scalable and Interpretable Scheduler for Deep Learning Training Jobs
Qinghao Hu*, Meng Zhang*, Peng Sun, Yonggang Wen, Tianwei Zhang
Paper / Code / Artifact Badges: Available, Functional, Reproduced
Distinguished Paper Award
|
[ATC '22]
|
Primo: Practical Learning-Augmented Systems with Interpretable Models
Qinghao Hu, Harsha Nori, Peng Sun, Yonggang Wen, Tianwei Zhang
Paper / Code / Artifact Badges: Available, Functional, Reproduced
|
[SC '21]
|
Characterization and Prediction of Deep Learning Workloads in Large-Scale GPU Datacenters
Qinghao Hu, Peng Sun, Shengen Yan, Yonggang Wen, Tianwei Zhang
Paper / Code / Data / Artifact Badges: Available, Functional, Reproduced
|
[arXiv '24]
|
LoongTrain: Efficient Training of Long-Sequence LLMs with Head-Context Parallelism
Diandian Gu, Peng Sun, Qinghao Hu, Ting Huang, Xun Chen, Yingtong Xiong, Guoteng Wang, Qiaoling Chen, Shangchun Zhao, Jiarui Fang, Yonggang Wen, Tianwei Zhang, Xin Jin, Xuanzhe Liu
Paper /
Submitted to a Conference
|
[arXiv '24]
|
InternEvo: Efficient Long-Sequence Large Language Model Training via Hybrid Parallelism and Redundant Sharding
Qiaoling Chen, Diandian Gu, Guoteng Wang, Xun Chen, Yingtong Xiong, Ting Huang, Qinghao Hu, Xin Jin, Yonggang Wen, Tianwei Zhang, Peng Sun
Paper /
Submitted to a Conference
|
[arXiv '23]
|
AMSP: Super-Scaling LLM Training via Advanced Model States Partitioning
Qiaoling Chen, Qinghao Hu, Zhisheng Ye, Guoteng Wang, Peng Sun, Yonggang Wen, Tianwei Zhang
Paper /
Submitted to a Conference
|
|
[CVPR '25]
|
Workshop: Efficient Large Vision Models Workshop (ELVM)
|
|
[MICRO '24]
|
Workshop: Hardware and Architectural Support for Security and Privacy (HASP)
|
|
[ICLR '26]
|
Reviewer
|
|
[CVPR '26]
|
Reviewer
|
|
[ICLR '25]
|
Reviewer
|
|
[EuroSys '23-'25]
|
Shadow Committee Member
|
|
[OSDI '22]
|
AE Committee Member
|
|
[ATC '22]
|
AE Committee Member
|
|
[EuroSys '22]
|
AE Committee Member
|
|
[SOSP '21]
|
AE Committee Member
|
|
[TPDS]
|
IEEE Transactions on Parallel and Distributed Systems
|
|
[TACO]
|
ACM Transactions on Architecture and Code Optimization
|
|
[TOCS]
|
ACM Transactions on Computer Systems
|
|
[CSUR]
|
ACM Computing Surveys
|
| Rising Star in ML and Systems | 2024 |
| Best Ph.D. Thesis Award | 2024 |
| National Scholarship for Outstanding Graduates | 2024 |
| Google PhD Fellowship | 2023 |
| Distinguished Paper Award of ASPLOS | 2023 |
| Best Paper Award of WAIC | 2023 |
| Best Undergraduate Thesis Award | 2018 |
| Outstanding Graduates of Zhejiang University | 2018 |