Code and models for “SST-Sal: A spherical spatio-temporal approach for saliency prediction in 360º videos” (PDF)
Edurne Bernal, Daniel Martín, Diego Gutierrez, and Belen Masia. Computers & Graphics, 2022.
Virtual reality (VR) has the potential to change the way people consume content, and has been predicted to become the next big computing paradigm. However, much remains unknown about the grammar and visual language of this new medium, and understanding and predicting how humans behave in virtual environments remains an open problem. In this work, we propose a novel saliency prediction model which exploits the joint potential of spherical convolutions and recurrent neural networks to extract and model the inherent spatio-temporal features from 360º videos. We employ Convolutional Long Short-Term Memory cells (ConvLSTMs) to account for temporal information at the time of feature extraction rather than to post-process spatial features as in previous works. To facilitate spatio-temporal learning, we provide the network with an estimation of the optical flow between 360º frames, since motion is known to be a highly salient feature in dynamic content. Our model is trained with a novel spherical Kullback–Leibler Divergence (KLDiv) loss function specifically tailored for saliency prediction in 360º content. Our approach outperforms previous state-of-the-art works, being able to mimic human visual attention when exploring dynamic 360º videos.
Visit our website for more information and supplementary material.
The code has been tested with:
matplotlib==3.3.4
numba==0.53.1
numpy==1.20.1
opencv_python==4.5.4.58
Pillow==9.1.1
scipy==1.6.2
torch==1.5.1+cu92
torchvision==0.6.1+cu92
tqdm
Our model can be found in the models folder. SST-Sal predicts saliency maps for 360º videos from their RGB frames and their optical flow estimations. A less accurate variation of our model that does not require optical flow information can also be found in the same folder. Please refer to section 4.4.4 of our paper for more information regarding the performance of our model without optical flow.
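As a quick sanity check, the pretrained checkpoints can be loaded directly with PyTorch. The snippet below is a minimal sketch and assumes the .pth files store the full pickled model (run it from the repository root so the model class definitions can be resolved); it is not the repository's official API.

import torch

# Minimal sketch (assumption: the checkpoint stores the full pickled model
# rather than a state_dict). Run from the repository root so PyTorch can
# resolve the model class definitions when unpickling.
model = torch.load('models/SST_Sal.pth', map_location='cpu')
model.eval()
print(model)  # inspect the spherical ConvLSTM encoder-decoder layers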
To perform inference with our model, modify the inference parameters and data loader sections in config.py, and use:
inference_model = 'models/SST_Sal.pth'
of_available = True
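For context, a hypothetical excerpt of the inference-related section of config.py could look like the lines below. Only inference_model and of_available appear above; videos_folder and results_dir are referenced later in this README, and the paths shown are placeholders rather than the repository's actual defaults.

# Hypothetical excerpt of config.py (paths are placeholders)
inference_model = 'models/SST_Sal.pth'  # checkpoint loaded by inference.py
of_available = True                     # load optical flow alongside the RGB frames
videos_folder = 'data/videos'           # folder with the input 360º videos
results_dir = 'results'                 # folder where predicted saliency maps are written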
SST-Sal requires that you first obtain the optical flow estimations for your own videos. You can use the frames_extraction(config.videos_folder) function in utils.py to extract the individual frames, and then employ RAFT to obtain the optical flow estimation associated with each frame. Please make sure that the name of each frame and that of its optical flow estimation are identical. Both are expected to follow the numerical naming format videonumber_framenumber.png (e.g., 0001_1023.png).
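Because frames are paired with their optical flow estimations by file name, it can help to verify the pairing before running inference. The following is a minimal sketch; the folder names frames/ and flow/, and the assumption that the flow estimations are stored as .png files, are placeholders.

from pathlib import Path

# Minimal sketch: check that every extracted frame has an optical flow file
# with an identical name. Folder names and the .png extension for the flow
# estimations are assumptions.
frame_names = {p.name for p in Path('frames').glob('*_*.png')}
flow_names = {p.name for p in Path('flow').glob('*_*.png')}

missing = sorted(frame_names - flow_names)
if missing:
    print(f'{len(missing)} frames have no matching optical flow, e.g. {missing[:3]}')
else:
    print('All frames have a matching optical flow estimation.')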
Once you have the extracted frames and the associated optical flow estimations, you can run inference.py to obtain the predicted saliency maps in the config.results_dir folder:
python inference.py
Note: To extract the optical flow with RAFT, we recommend using the highest-resolution images possible (e.g., 2048x1080) and the raft-sintel.pth model to obtain the best results.
If you do not have optical flow estimations for your videos, you can use the less accurate variant of SST-Sal. Modify the inference parameters and data loader sections in config.py, and use:
inference_model = 'models/SST_Sal_wo_OF.pth'
of_available = False
Then run inference.py to obtain the saliency predictions:
python inference.py
If you use this work, please consider citing our paper with the following BibTeX entry:
@article{BERNALBERDUN2022200,
title = {SST-Sal: A spherical spatio-temporal approach for saliency prediction in 360$^{\circ}$ videos},
author = {Edurne Bernal-Berdun and Daniel Martin and Diego Gutierrez and Belen Masia},
journal = {Computers \& Graphics},
volume = {106},
pages = {200-209},
year = {2022},
issn = {0097-8493},
doi = {10.1016/j.cag.2022.06.002},
}