Key motivation: a video may convey multiple sentiments, and each sentiment can appear in segments of varying length and location. Images come from "The Wolf of Wall Street".
This repository contains the official implementation of our ACM MM 2022 work. The TSL-300 dataset and the PyTorch training/validation code for our weakly-supervised framework TSL-Net are released. More details can be found in our paper. [PDF] [Video]
Video sentiment analysis aims to uncover the underlying attitudes of viewers and has a wide range of applications in the real world. Existing works simply classify a video into a single sentimental category, ignoring the fact that sentiment in untrimmed videos may appear in multiple segments with varying lengths and unknown locations. To address this, we propose a challenging task, i.e., Temporal Sentiment Localization (TSL), which aims to find which parts of a video convey sentiment. To systematically investigate fully- and weakly-supervised settings for TSL, we first build a benchmark dataset named TSL-300, which consists of 300 videos with a total length of 1,291 minutes. Each video is labeled in two ways: frame-by-frame annotation for the fully-supervised setting, and single-frame annotation, i.e., only a single frame with strong sentiment is labeled per segment, for the weakly-supervised setting. Due to the high cost of dense annotation, we propose TSL-Net in this work, which employs single-frame supervision to localize sentiment in videos. Specifically, we generate pseudo labels for unlabeled frames using a greedy search strategy, and fuse the affective features of both visual and audio modalities to predict the temporal sentiment distribution. A reverse mapping strategy is designed for feature fusion, and a contrastive loss is utilized to maintain the consistency between the original features and the reverse predictions. Extensive experiments show the superiority of our method against state-of-the-art approaches.
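For intuition, below is a minimal, self-contained sketch of an InfoNCE-style consistency loss between original snippet features and their reverse-mapped counterparts. This is not the exact TSL-Net implementation; the shapes, feature dimension, and temperature are illustrative assumptions.

```python
# Sketch only: a contrastive consistency term between original features and
# reverse-mapped features (assumed shapes and temperature; not the official loss).
import torch
import torch.nn.functional as F

def consistency_contrastive_loss(orig_feat, reverse_feat, temperature=0.1):
    """orig_feat, reverse_feat: (T, D) snippet-level features of one video."""
    orig = F.normalize(orig_feat, dim=-1)
    rev = F.normalize(reverse_feat, dim=-1)
    logits = orig @ rev.t() / temperature          # (T, T) pairwise similarities
    targets = torch.arange(orig.size(0), device=orig.device)
    # each snippet should be most similar to its own reverse-mapped counterpart
    return F.cross_entropy(logits, targets)

# toy usage: 64 snippets with 1024-D features
loss = consistency_contrastive_loss(torch.randn(64, 1024), torch.randn(64, 1024))
print(loss.item())
```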
You can set up the environment by running `pip3 install -r requirements.txt`.
- Python 3.6.13
- PyTorch 1.10.2
- CUDA 11.3
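For reference, one possible way to create a matching environment with conda is sketched below; the PyTorch wheel tag and index URL are assumptions based on the versions listed above, so adjust them to your setup.

```bash
# Sketch only: build an environment matching the versions above (assumes conda is installed)
conda create -n tsl python=3.6.13 -y
conda activate tsl
# CUDA 11.3 build of PyTorch 1.10.2 (wheel tag/index assumed from the official wheel archive)
pip3 install torch==1.10.2+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
pip3 install -r requirements.txt
```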
If you need the TSL-300 dataset for academic purposes, please download the application form, fill in the requested information, and send it to [email protected]. We will process your application as soon as possible. Please make sure that the email you use comes from your educational institution.
- Prepare the TSL-300 dataset.
  - We provide the constructed dataset and pre-extracted features.
- Extract features with two-stream I3D networks.
  - We recommend extracting features using this repo.
  - For convenience, we provide the features we used, which are also included in our dataset.
  - Link the features folder by using `sudo ln -s path-to-feature ./dataset/VideoSenti/`.
- Place the features inside the `dataset` folder.
  - Please ensure the data structure is as below.
├── dataset
│   └── VideoSenti
│       ├── gt.json
│       ├── split_train.txt
│       ├── split_test.txt
│       ├── fps_dict.json
│       ├── time.json
│       ├── videosenti_gt.json
│       ├── point_gaussian
│       │   └── point_labels.csv
│       └── features
│           ├── train
│           │   ├── rgb
│           │   │   ├── 1_Ekman6_disgust_3.npy
│           │   │   ├── 2_Ekman6_joy_1308.npy
│           │   │   └── ...
│           │   └── logmfcc
│           │       ├── 1_Ekman6_disgust_3.npy
│           │       ├── 2_Ekman6_joy_1308.npy
│           │       └── ...
│           └── test
│               ├── rgb
│               │   ├── 9_CMU_MOSEI_lzVA--tIse0.npy
│               │   ├── 17_CMU_MOSEI_CbRexsp1HKw.npy
│               │   └── ...
│               └── logmfcc
│                   ├── 9_CMU_MOSEI_lzVA--tIse0.npy
│                   ├── 17_CMU_MOSEI_CbRexsp1HKw.npy
│                   └── ...
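As a quick sanity check (illustrative, not part of the official code), you can load one video's pre-extracted RGB and log-MFCC features and inspect their shapes; the file paths follow the layout above, and the feature dimensions are whatever the provided `.npy` files contain.

```python
import numpy as np

# paths follow the directory layout above
rgb = np.load("dataset/VideoSenti/features/train/rgb/1_Ekman6_disgust_3.npy")
audio = np.load("dataset/VideoSenti/features/train/logmfcc/1_Ekman6_disgust_3.npy")

print("RGB (I3D) features:", rgb.shape)    # e.g. (num_snippets, feature_dim)
print("log-MFCC features:", audio.shape)
```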
| Method  | mAP@0.1 | mAP@0.2 | mAP@0.3 | mAP@AVG | Recall@AVG | F2@AVG | Url |
| :-----: | :-----: | :-----: | :-----: | :-----: | :--------: | :----: | :-: |
| TSL-Net | 27.27   | 20.53   | 12.06   | 19.85   | 75.24      | 33.69  | Baidu drive / Google drive |
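For reference, mAP@t here most likely follows the standard temporal-localization protocol: a predicted segment counts as correct when its temporal IoU with a ground-truth segment exceeds the threshold t (0.1/0.2/0.3), and F2 is the F-beta score with beta = 2, weighting recall more heavily than precision. A minimal helper (not the official evaluation script) for temporal IoU:

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two segments given as (start, end) pairs."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

print(temporal_iou((2.0, 8.0), (5.0, 10.0)))  # 0.375
```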
You can easily train and evaluate the model by running the script below.
Training details such as the number of epochs and batch size can be adjusted; please refer to options.py.
$ bash run_train.sh
The pre-trained model is available at the pretrained model link.
You can evaluate the model by running the command below.
$ bash run_eval.sh
Our implementation references the repositories below.
If you find this repo useful in your project or research, please consider citing our paper.
@inproceedings{zhang2022temporal,
title={Temporal Sentiment Localization: Listen and Look in Untrimmed Videos},
author={Zhang, Zhicheng and Yang, Jufeng},
booktitle={Proceedings of the 30th ACM International Conference on Multimedia},
year={2022}
}