MSSG: Multi-scale Speaker Graph Network for Active Speaker Detection

This repository contains the official implementation of our IEEE Transactions on Multimedia paper, MSSG.
[Paper Link]

Dependencies

First, set up the environment:

conda create -n MSSG python=3.7.9 anaconda
conda activate MSSG
pip install -r requirements.txt

1. Data Preparation

Data preparation is handled by the script data_prep.sh.

There are 9 steps in total, numbered 0 to 8. Each step can be executed separately by running the script with the step number x:

bash data_prep.sh x

For example:

bash data_prep.sh 0

Step 0: Download the dataset. (Skip if you already have the AVA dataset.)
Step 1: Generate CSV files for later processing.
Steps 2–8: Generate images at multiple scales.
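To run several steps back to back, the step numbers can be looped over. The following is a minimal sketch, not part of the repository: `run_steps` is a hypothetical helper, and it assumes that data_prep.sh exits with a non-zero status when a step fails, which this README does not confirm. Step 0 is left out on the assumption that the AVA dataset has already been downloaded.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): run a contiguous range of
# steps of a step-indexed script and stop at the first step that fails.
# Usage: run_steps <script> <first_step> <last_step>
run_steps() {
    local script="$1" first="$2" last="$3" step
    for step in $(seq "$first" "$last"); do
        echo "Running: bash $script $step"
        # Assumption: the script signals failure through a non-zero exit status.
        if ! bash "$script" "$step"; then
            echo "Step $step of $script failed" >&2
            return 1
        fi
    done
}

# Only run when invoked from the repository root, where data_prep.sh lives.
# Steps 1-8 are run; step 0 (dataset download) is skipped here.
if [ -f data_prep.sh ]; then
    run_steps data_prep.sh 1 8
fi
```

Change the range to `0 8` to include the download step, or to a single number twice (for example `3 3`) to rerun one step.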

2. Graph Initialization

Graph initialization is handled by the script graph_init.sh.

There are 9 steps in total, numbered 0 to 8. Each step can be executed separately by running the script with the step number x:

bash graph_init.sh x

For example:

bash graph_init.sh 0

Step 0: Pretraining to generate multi-scale graph node embeddings.
- Note: A pretrained model is already provided in this repository, so this step is optional.
- Averaging multiple models can further improve performance.
- Pretrained models are stored in ./predata/pretrain_model.
Steps 1–7: Generate graph node embeddings using different pretrained models.
Step 8: Construct the graph.
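Because step 0 is optional when a pretrained model is available, a wrapper can decide where to start. The sketch below is an illustration only: `first_graph_init_step` is a hypothetical helper, and treating any file in ./predata/pretrain_model as a usable pretrained model is an assumption, as is graph_init.sh returning a non-zero status on failure.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): print 1 if the model
# directory exists and contains at least one entry, otherwise print 0.
# Assumption: any file in the directory counts as a usable pretrained model.
first_graph_init_step() {
    local model_dir="${1:-./predata/pretrain_model}"
    if [ -d "$model_dir" ] && [ -n "$(ls -A "$model_dir" 2>/dev/null)" ]; then
        echo 1   # pretrained model found: skip step 0 (pretraining)
    else
        echo 0   # nothing found: start with pretraining
    fi
}

# Only run when invoked from the repository root, where graph_init.sh lives.
if [ -f graph_init.sh ]; then
    start=$(first_graph_init_step)
    for step in $(seq "$start" 8); do
        echo "Running: bash graph_init.sh $step"
        # Stop at the first failing step so later steps do not use partial output.
        bash graph_init.sh "$step" || { echo "graph_init.sh step $step failed" >&2; break; }
    done
fi
```

To retrain from scratch despite an existing model, run `bash graph_init.sh 0` directly before the loop.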

3. Graph Training

Graph training is handled by the script graph_train.sh.

Simply run:

bash graph_train.sh

Citation

If you find our paper or code useful in your research, please cite:

@article{li2025mssg,
  title={MSSG: Multi-scale Speaker Graph Network for Active Speaker Detection},
  author={Li, Guanjun and Yi, Jiangyan and Wen, Zhengqi and Fu, Ruibo and Wang, Yuwang and Tao, Jianhua},
  journal={IEEE Transactions on Multimedia},
  year={2025},
  publisher={IEEE}
}

Acknowledgments

We would like to thank the authors of the following works for open-sourcing their code, which provided invaluable insights:
