# Variational modeling of abstract goal for next action anticipation
- This code accompanies the paper *Predicting the Next Action by Modeling the Abstract Goal*.
- Ranked 1st on EK55 by a large margin; see https://competitions.codalab.org/competitions/20071#results (username: royd, team name: IHPC-AISG-LAHA).
- Ranked 2nd on EK100 for action and noun, and 3rd for verb, on Unseen Kitchens.
- Best results on EGTEA; see the paper.
- Download the RGB, Flow and OBJ features from the RULSTM project, specifically with this script:
  https://github.com/fpv-iplab/rulstm/blob/master/RULSTM/scripts/download_data_ek55.sh
  The data is then in `<pwd>/data/ek55/{rgb,flow,obj}`.
- Download the EK55 annotations; suppose you save them in `<annot>`.
- Change the paths of the following variables in `main.py`:

```python
train_ann_file = '<annot>/data/ek55/training.csv'
val_ann_file = '<annot>/data/ek55/validation.csv'
test_ann_files = ['<annot>/data/ek55/test_seen.csv', '<annot>/data/ek55/test_unseen.csv']
paths = {'rgb': '<pwd>/data/ek55/rgb', 'flow': '<pwd>/data/ek55/flow', 'obj': '<pwd>/data/ek55/obj'}
```
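Before training, it can help to sanity-check the configured locations. Below is a quick, optional check (not part of the repo; the variable values are the ones above, with `<annot>` and `<pwd>` replaced by your actual paths):

```python
import os

# Paths as configured in main.py; substitute your actual <annot> and <pwd> locations.
train_ann_file = '<annot>/data/ek55/training.csv'
val_ann_file = '<annot>/data/ek55/validation.csv'
test_ann_files = ['<annot>/data/ek55/test_seen.csv', '<annot>/data/ek55/test_unseen.csv']
paths = {'rgb': '<pwd>/data/ek55/rgb', 'flow': '<pwd>/data/ek55/flow', 'obj': '<pwd>/data/ek55/obj'}

# Every annotation file and feature directory should exist before training starts.
for f in [train_ann_file, val_ann_file, *test_ann_files]:
    assert os.path.isfile(f), f'missing annotation file: {f}'
for name, d in paths.items():
    assert os.path.isdir(d), f'missing {name} feature directory: {d}'
print('All annotation files and feature directories found.')
```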
- We train two models, one for verb and another for noun, with each of the RGB/Flow/OBJ features, giving 6 models in total.
- For example, to train a verb model with RGB features:

```
python main.py --modality rgb --dataset ek55 --outputs verb --obs_sec 2 --ant_sec 1.0 --latent_dim 128 --num_act_cand 10 --num_goal_cand 3 --hidden_dim 256 --n_layers 1 --nepochs 15 --losses og na ng oa gc --scheduler none --batch_size 256 --sampling 10
```
- Explanation of training options:
  - `--modality`, choices=['rgb', 'flow', 'obj']: choose TSN features (rgb or flow) or obj, or a combination for fusion
  - `--dataset`, choices=['ek55', 'ek100', 'egtea']: choose between EK55, EK100 and EGTEA
  - `--outputs`, choices=['verb', 'noun', 'act']: choose verb or noun (for all datasets) or act (only for EGTEA)
  - `--obs_sec`, default=2: observed duration in seconds (1-6)
  - `--ant_sec`, choices=[2, 1.75, 1.5, 1.25, 1, 0.75, 0.5, 0.25]: anticipation time (1 for EK55 and EK100, 0.5 for EGTEA)
  - `--latent_dim`, default=128: latent dimension for the observed and next goal distributions
  - `--num_act_cand`, default=10: number of candidates to sample for the next verb/noun
  - `--num_goal_cand`, default=3: number of abstract goals to sample
  - `--hidden_dim`, default=256: hidden dimension of the RNN
  - `--n_layers`, default=1: number of RNN layers for the next visual feature
  - `--dropout`, default=0.8: dropout rate
  - `--sampling`, default=10: sampling frequency of the input features (10 means 30 fps / 10 = 3 fps)
  - `--nepochs`, default=10: number of training epochs
  - `--scheduler`, default='none', choices=['cosine', 'none']: cosine LR scheduler or none (for the AdamW optimizer)
  - `--batch_size`, default=256, choices=[32, 64, 128, 256]: batch size
  - `--losses`, choices=['og', 'na', 'ng', 'oa', 'gc']: losses to use
    - og: KLD for the observed goal
    - na: CE for the next action (required)
    - ng: KLD for the next goal (required)
    - oa: CE for the observed action
    - gc: symmetric KLD between the observed and next goal (see the sketch below)
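For intuition about the `gc` term: a symmetric KLD between two diagonal Gaussians is the sum of the two directed KL divergences. The sketch below is illustrative only; the (mu, logvar) parameterization and the function names are standard-VAE assumptions, not the repo's actual code:

```python
import torch

def gaussian_kld(mu_p, logvar_p, mu_q, logvar_q):
    # KL(N(mu_p, var_p) || N(mu_q, var_q)) for diagonal Gaussians, summed over the latent dims
    var_p, var_q = logvar_p.exp(), logvar_q.exp()
    return 0.5 * (logvar_q - logvar_p + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0).sum(dim=-1)

def goal_consistency_loss(mu_obs, logvar_obs, mu_next, logvar_next):
    # Symmetric KLD between the observed-goal and next-goal distributions
    return (gaussian_kld(mu_obs, logvar_obs, mu_next, logvar_next)
            + gaussian_kld(mu_next, logvar_next, mu_obs, logvar_obs)).mean()

# Toy example with latent_dim=128 (the default above) and a batch of 4
mu_o, lv_o = torch.randn(4, 128), torch.randn(4, 128)
mu_n, lv_n = torch.randn(4, 128), torch.randn(4, 128)
print(goal_consistency_loss(mu_o, lv_o, mu_n, lv_n))
```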
- Train all 6 models: 3 for verb using RGB, Flow and Obj features, and 3 for noun using RGB, Flow and Obj features. Use `hidden_dim=256` for verb models and `hidden_dim=1024` for noun models; a launcher sketch follows below.
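A small convenience launcher (hypothetical, not a script in the repo) that starts all six EK55 runs sequentially using only the flags documented above:

```python
import itertools
import subprocess

# Per the note above: 256 hidden units for verb models, 1024 for noun models
HIDDEN_DIM = {'verb': 256, 'noun': 1024}

for outputs, modality in itertools.product(['verb', 'noun'], ['rgb', 'flow', 'obj']):
    cmd = ['python', 'main.py',
           '--modality', modality, '--dataset', 'ek55', '--outputs', outputs,
           '--obs_sec', '2', '--ant_sec', '1.0',
           '--latent_dim', '128', '--num_act_cand', '10', '--num_goal_cand', '3',
           '--hidden_dim', str(HIDDEN_DIM[outputs]), '--n_layers', '1', '--nepochs', '15',
           '--losses', 'og', 'na', 'ng', 'oa', 'gc',
           '--scheduler', 'none', '--batch_size', '256', '--sampling', '10']
    print('Running:', ' '.join(cmd))
    subprocess.run(cmd, check=True)  # train one of the 6 models
```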
- To generate the challenge submission with late fusion, run `main.py` as follows:

```
python main.py --verb_fusion --verb_modes rgb flow obj --verb_weights 0.45 0.45 0.1 --noun_fusion --noun_modes rgb flow obj --noun_weights 0.45 0.1 0.45 --dataset ek55 --obs_sec 2 --ant_sec 1.0 --num_act_cand 10 --num_goal_cand 3 --n_layers 1 --losses og na ng oa gc --scheduler none --batch_size 256 --sampling 10 --challenge
```
- Explanation of challenge options:
  - `--challenge`: generate a JSON submission on the test set
  - `--verb_fusion`: late fusion for verb
  - `--verb_modes`, choices=['rgb', 'flow', 'obj']: modality models for verb
  - `--verb_weights`: late fusion weights for the model outputs, in the order of verb_modes (best: 0.45, 0.45, 0.1); see the fusion sketch below
  - `--noun_fusion`: late fusion for noun
  - `--noun_modes`, choices=['rgb', 'flow', 'obj']: modality models for noun
  - `--noun_weights`: late fusion weights for the model outputs, in the order of noun_modes (best: 0.45, 0.1, 0.45)
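For clarity on what the fusion weights do: late fusion of this kind is a per-class weighted sum of each modality model's scores, taken in the order of the `*_modes` flags. A minimal illustrative sketch (array shapes are hypothetical; EK55 has 125 verb classes):

```python
import numpy as np

def late_fuse(scores_by_mode, weights):
    # Weighted sum of per-modality class scores, in the order the modes were listed
    return sum(w * s for w, s in zip(weights, scores_by_mode))

# Illustrative: rgb/flow/obj verb scores for a batch of 2 samples over 125 EK55 verb classes
rgb, flow, obj = (np.random.rand(2, 125) for _ in range(3))
verb_scores = late_fuse([rgb, flow, obj], [0.45, 0.45, 0.1])  # best verb weights above
pred = verb_scores.argmax(axis=1)  # fused top-1 verb per sample
print(pred)
```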
If you find this code useful and use it in your project, please cite this work:

```bibtex
@article{roy2022predicting,
  title={Predicting the Next Action by Modeling the Abstract Goal},
  author={Roy, Debaditya and Fernando, Basura},
  journal={arXiv preprint arXiv:2209.05044},
  year={2022}
}

@inproceedings{roy2024predicting,
  title={Predicting the Next Action by Modeling the Abstract Goal},
  author={Roy, Debaditya and Fernando, Basura},
  booktitle={ICPR},
  year={2024}
}
```
This research/project is supported by the National Research Foundation, Singapore, under its NRF Fellowship (Award# NRF-NRFF14-2022-0001). This research is also supported by funding allocation to B.F. by the Agency for Science, Technology and Research (A*STAR) under its SERC Central Research Fund (CRF), as well as its Centre for Frontier AI Research (CFAR).
In case of issues, please write to debadityaroy5555 at gmail dot com