Egocentric Video-Language Pretraining
Kevin QH. Lin, Alex JP. Wang, M. Soldan, M. Wray, R. Yan, Eric ZC. Xu, D. Gao, R. Tu,
W. Zhao, W. Kong, C. Cai, H. Wang, D. Damen, B. Ghanem, W. Liu, Mike Z. Shou.
NeurIPS 2022
Spotlight (1.7%)
[project]
[paper]
[EgoVLPv2]
[code]
[poster]
[twitter]
[media]
EgoVis Distinguished Paper Award.
PREMIA Best Student Paper Award, Gold Award.
Double champions in Ego4D & Epic-Kitchens CVPR 2022 challenges.
|
ShowUI: One Vision-Language-Action
Model for GUI Visual Agent
Kevin QH. Lin, Linjie Li, Difei Gao, Zhengyuan Yang, Shiwei Wu, Zechen Bai, Stan WX.
Lei, Lijuan Wang, Mike Z. Shou.
CVPR 2025
[paper]
[code]
[huggingface]
[dataset]
[demo]
[twitter]
#1 Hugging Face daily paper.
Oral talk and Outstanding Paper Award, NeurIPS Open-World Agents Workshop 2024.
The model has been downloaded over 240,000 times. 1.6K GitHub stars.
|
VLog: Video-Language Models by Generative
Retrieval of Narration Vocabulary
Kevin QH. Lin, Mike Z. Shou.
CVPR 2025
[paper]
[code]
[twitter]
580+ GitHub stars.
|
UniVTG: Towards Unified Video-Language
Temporal Grounding
Kevin QH. Lin, Pengchuan Zhang, Joya Chen, Shraman Pramanick, Difei Gao, Alex JP.
Wang, Rui Yan, Mike Z. Shou.
ICCV 2023
[paper]
[code]
[demo]
[twitter]
370+ GitHub stars.
|
Learning Video Context as Interleaved Multimodal Sequences
Kevin QH. Lin, Pengchuan Zhang, Difei Gao, Xide Xia, Joya Chen, Ziteng Gao, Jinheng Xie, Xuhong Xiao, Mike Z. Shou.
ECCV 2024
[paper]
[code]
|
VideoGUI: A Benchmark for GUI Automation
from Instructional Videos
Kevin QH. Lin, Linjie Li, Difei Gao, Qinchen Wu, Mingyi Yan, Zhengyuan Yang, Lijuan
Wang, Mike Z. Shou.
NeurIPS 2025 Spotlight
[project]
[paper]
[code]
[twitter]
|
VideoMind: A Chain-of-LoRA Agent for Long
Video Reasoning
Ye Liu†, Kevin QH. Lin†, Chang Wen Chen, Mike Z. Shou.
ICLR 2026
Spotlight, NeurIPS LAW Workshop 2025.
[project]
[paper]
[code]
[dataset]
[demo]
[twitter]
|
Think or Not? Selective Reasoning via
Reinforcement Learning for Vision-Language Models
Jiaqi Wang†, Kevin QH. Lin†, James
Cheng, Mike Z. Shou.
NeurIPS 2025
[paper]
[code]
[huggingface]
[twitter]
|
Paper2Poster: Towards Multimodal Poster
Automation from Scientific Papers
Wei Pang†, Kevin QH. Lin†, Xiangru
Jian†, Xi He, Philip Torr.
NeurIPS 2025
[project]
[paper]
[code]
[datasets]
[demo]
[poster]
[twitter]
3K GitHub stars. 1.2K Twitter likes.
Oral talk at ICML 2025 MAS Workshop.
|
Paper2Video: Automatic Video Generation from Scientific Papers
Zeyu Zhu†, Kevin QH. Lin†, Mike Z. Shou.
Preprint 2025
[project]
[paper]
[code]
[dataset]
[twitter]
#2 Hugging Face daily paper.
1.9K GitHub stars. 1M+ Twitter views. Highlighted by YC Hacker News.
|
Code2Video: A Code-centric Paradigm for Educational Video Generation
Yanzhe Chen†, Kevin QH. Lin†, Mike Z. Shou.
Preprint 2025
[project]
[paper]
[code]
[dataset]
[twitter]
1.4K GitHub stars.
|
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
Jinheng Xie†, Weijia Mao†, Zechen Bai†, David JH. Zhang†, Weihao Wang, Kevin QH. Lin, Yuchao Gu, Zhijie Chen, Zhenheng Yang, Mike Z. Shou.
ICLR 2025
[project]
[paper]
[code]
[huggingface]
[demo]
[twitter]
1.8K GitHub stars.
#4 Most Influential ICLR Papers.
|