[arXiv], [project page], [online demo], [Hugging Face paper page]
Jiawei Qin¹, Xucong Zhang², Yusuke Sugano¹
¹The University of Tokyo, ²Delft University of Technology
This repository contains the official PyTorch implementation of both MAE pre-training and UniGaze.
- ✅ Release pre-trained MAE checkpoints (B, L, H) and gaze estimation training code.
- ✅ Release UniGaze models for inference.
- ✅ Code for predicting gaze from videos.
- ✅ (2025 June 08 updated) Release the MAE pre-training code.
- ✅ (2025 August 25 updated) Online demo is available.
To install the required dependencies, run:

```bash
pip install -r requirements.txt
```

For MAE pre-training, please refer to MAE Pre-Training.
For detailed training instructions, please refer to UniGaze Training.
We provide the following trained models:
| Filename | Backbone | Training Data | Checkpoint |
|---|---|---|---|
| `unigaze_b16_joint.pth.tar` | UniGaze-B | Joint Datasets | Download (Google Drive) |
| `unigaze_L16_joint.pth.tar` | UniGaze-L | Joint Datasets | Download (Google Drive) |
| `unigaze_h14_joint.pth.tar` | UniGaze-H | Joint Datasets | Download (Google Drive) |
| `unigaze_h14_cross_X.pth.tar` | UniGaze-H | ETH-XGaze | Download (Google Drive) |
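
After downloading a checkpoint, you can quickly inspect its contents before integrating it into your code. The snippet below is only a sketch: it assumes the file is a regular PyTorch checkpoint whose weights live under the `model_state` key, as in the loading example further below, and the path is just an example.

```python
import torch

# Inspect a downloaded UniGaze checkpoint (adjust the path to where you saved it).
ckpt = torch.load('logs/unigaze_h14_cross_X.pth.tar', map_location='cpu')
print(ckpt.keys())                # top-level keys; the weights are under 'model_state'
state = ckpt['model_state']
print(len(state), 'parameter tensors')
print(list(state.keys())[:5])     # first few parameter names
```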
- You can refer to load_gaze_model.ipynb for instructions on loading the model and integrating it into your own codebase.
- If you want to load only the MAE backbone, use the `custom_pretrained_path` argument.
- If you want to load UniGaze (MAE + gaze_fc), directly use `load_state_dict`:

```python
import torch
# MAE_Gaze is the model class provided in this repository (see load_gaze_model.ipynb).

## Loading MAE-backbone only - this will not load the gaze_fc
mae_h14 = MAE_Gaze(model_type='vit_h_14', custom_pretrained_path='checkpoints/mae_h14/mae_h14_checkpoint-299.pth')

## Loading UniGaze
unigaze_h14_crossX = MAE_Gaze(model_type='vit_h_14')  ## custom_pretrained_path does not matter because it will be overwritten by the UniGaze weights
weight = torch.load('logs/unigaze_h14_cross_X.pth.tar', map_location='cpu')['model_state']
unigaze_h14_crossX.load_state_dict(weight, strict=True)
```
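
For a quick sanity check after loading, below is a minimal single-image inference sketch (continuing from the block above). It assumes the model takes a 224×224 RGB face crop normalized with ImageNet statistics and outputs a 2D (pitch, yaw) gaze direction; the image path and preprocessing here are illustrative, so match them to your own pipeline or to load_gaze_model.ipynb.

```python
import torch
from PIL import Image
from torchvision import transforms

# Sketch only: assumes a 224x224 RGB face crop, ImageNet normalization,
# and a 2D (pitch, yaw) output; 'face.jpg' is a placeholder path.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

face = preprocess(Image.open('face.jpg').convert('RGB')).unsqueeze(0)  # [1, 3, 224, 224]

unigaze_h14_crossX.eval()
with torch.no_grad():
    pitch_yaw = unigaze_h14_crossX(face)  # expected shape: [1, 2]
print(pitch_yaw)
```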
To predict gaze direction from videos, use the following script:

```bash
projdir=<...>/UniGaze/unigaze
cd ${projdir}
python predict_gaze_video.py \
    --model_cfg_path configs/model/mae_b_16_gaze.yaml \
    -i ./input_video \
    --ckpt_resume logs/unigaze_b16_joint.pth.tar
```
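
If you have several videos to process with the same checkpoint, a plain shell loop is usually enough. The loop below is a sketch: the `./my_videos/*.mp4` pattern is an example, and it assumes `-i` also accepts an individual video file in addition to a directory like `./input_video` above.

```bash
# Sketch: run the predictor on every .mp4 in a folder (paths are examples).
for video in ./my_videos/*.mp4; do
    python predict_gaze_video.py \
        --model_cfg_path configs/model/mae_b_16_gaze.yaml \
        -i "${video}" \
        --ckpt_resume logs/unigaze_b16_joint.pth.tar
done
```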
If you find our work useful for your research, please consider citing:

```bibtex
@article{qin2025unigaze,
  title={UniGaze: Towards Universal Gaze Estimation via Large-scale Pre-Training},
  author={Qin, Jiawei and Zhang, Xucong and Sugano, Yusuke},
  journal={arXiv preprint arXiv:2502.02307},
  year={2025}
}
```
We also acknowledge the excellent work on MAE.
This model is licensed under the ModelGo Attribution-NonCommercial-ResponsibleAI License, Version 2.0 (MG-NC-RAI-2.0); you may use this model only in compliance with the License. You may obtain a copy of the License at
https://github.com/Xtra-Computing/ModelGo/blob/main/MGL/V2/MG-BY-NC-RAI/LICENSE
A comprehensive introduction to the ModelGo license can be found here: https://www.modelgo.li/
Our method also works for different "faces":
If you have any questions, feel free to contact Jiawei Qin at [email protected].