[arXiv], [project page], [online demo], [Hugging Face paper page]
Jiawei Qin¹, Xucong Zhang², Yusuke Sugano¹
¹The University of Tokyo, ²Delft University of Technology
This repository contains the official PyTorch implementation of both MAE pre-training and UniGaze.
- ✅ Release pre-trained MAE checkpoints (B, L, H) and gaze estimation training code.
- ✅ Release UniGaze models for inference.
- ✅ Code for predicting gaze from videos.
- ✅ (2025 June 08 updated) Release the MAE pre-training code.
- ✅ (2025 August 25 updated) Online demo is available.
To install the required dependencies, run:

```bash
pip install -r requirements.txt
```

For MAE pre-training, please refer to MAE Pre-Training.
For detailed training instructions, please refer to UniGaze Training.
We provide the following trained models:
| Filename | Backbone | Training Data | Checkpoint |
|---|---|---|---|
| `unigaze_b16_joint.pth.tar` | UniGaze-B | Joint Datasets | Download (Google Drive) |
| `unigaze_L16_joint.pth.tar` | UniGaze-L | Joint Datasets | Download (Google Drive) |
| `unigaze_h14_joint.pth.tar` | UniGaze-H | Joint Datasets | Download (Google Drive) |
| `unigaze_h14_cross_X.pth.tar` | UniGaze-H | ETH-XGaze | Download (Google Drive) |
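
After downloading a checkpoint, you can quickly inspect its contents before integrating it into your code. The snippet below is only a sketch: it assumes the file is a regular PyTorch checkpoint whose weights live under the `model_state` key, as in the loading example further below, and the path is just an example.

```python
import torch

# Inspect a downloaded UniGaze checkpoint (adjust the path to where you saved it).
ckpt = torch.load('logs/unigaze_h14_cross_X.pth.tar', map_location='cpu')
print(ckpt.keys())                # top-level keys; the weights are under 'model_state'
state = ckpt['model_state']
print(len(state), 'parameter tensors')
print(list(state.keys())[:5])     # first few parameter names
```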
- You can refer to load_gaze_model.ipynb for instructions on loading the model and integrating it into your own codebase.
- If you want to load only the MAE backbone, use the `custom_pretrained_path` argument.
- If you want to load UniGaze (MAE + gaze_fc), directly use `load_state_dict`:

```python
import torch
# MAE_Gaze is the model class provided in this repository (see load_gaze_model.ipynb).

## Loading MAE-backbone only - this will not load the gaze_fc
mae_h14 = MAE_Gaze(model_type='vit_h_14', custom_pretrained_path='checkpoints/mae_h14/mae_h14_checkpoint-299.pth')

## Loading UniGaze
unigaze_h14_crossX = MAE_Gaze(model_type='vit_h_14')  ## custom_pretrained_path does not matter because it will be overwritten by the UniGaze weights
weight = torch.load('logs/unigaze_h14_cross_X.pth.tar', map_location='cpu')['model_state']
unigaze_h14_crossX.load_state_dict(weight, strict=True)
```
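
For a quick sanity check after loading, below is a minimal single-image inference sketch (continuing from the block above). It assumes the model takes a 224×224 RGB face crop normalized with ImageNet statistics and outputs a 2D (pitch, yaw) gaze direction; the image path and preprocessing here are illustrative, so match them to your own pipeline or to load_gaze_model.ipynb.

```python
import torch
from PIL import Image
from torchvision import transforms

# Sketch only: assumes a 224x224 RGB face crop, ImageNet normalization,
# and a 2D (pitch, yaw) output; 'face.jpg' is a placeholder path.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

face = preprocess(Image.open('face.jpg').convert('RGB')).unsqueeze(0)  # [1, 3, 224, 224]

unigaze_h14_crossX.eval()
with torch.no_grad():
    pitch_yaw = unigaze_h14_crossX(face)  # expected shape: [1, 2]
print(pitch_yaw)
```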
To predict gaze direction from videos, use the following script:

```bash
projdir=<...>/UniGaze/unigaze
cd ${projdir}
python predict_gaze_video.py \
    --model_cfg_path configs/model/mae_b_16_gaze.yaml \
    -i ./input_video \
    --ckpt_resume logs/unigaze_b16_joint.pth.tar
```
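
If you have several videos to process with the same checkpoint, a plain shell loop is usually enough. The loop below is a sketch: the `./my_videos/*.mp4` pattern is an example, and it assumes `-i` also accepts an individual video file in addition to a directory like `./input_video` above.

```bash
# Sketch: run the predictor on every .mp4 in a folder (paths are examples).
for video in ./my_videos/*.mp4; do
    python predict_gaze_video.py \
        --model_cfg_path configs/model/mae_b_16_gaze.yaml \
        -i "${video}" \
        --ckpt_resume logs/unigaze_b16_joint.pth.tar
done
```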
If you find our work useful for your research, please consider citing:

```bibtex
@article{qin2025unigaze,
  title={UniGaze: Towards Universal Gaze Estimation via Large-scale Pre-Training},
  author={Qin, Jiawei and Zhang, Xucong and Sugano, Yusuke},
  journal={arXiv preprint arXiv:2502.02307},
  year={2025}
}
```
We also acknowledge the excellent work on MAE.
This model is licensed under the ModelGo Attribution-NonCommercial-ResponsibleAI License, Version 2.0 (MG-NC-RAI-2.0); you may use this model only in compliance with the License. You may obtain a copy of the License at
https://github.com/Xtra-Computing/ModelGo/blob/main/MGL/V2/MG-BY-NC-RAI/LICENSE
A comprehensive introduction to the ModelGo license can be found here: https://www.modelgo.li/
Our method also works for different "faces":
If you have any questions, feel free to contact Jiawei Qin at [email protected].