ExPO: Explainable Phonetic Trait-Oriented Network for Speaker Verification

Welcome to the official repository of ExPO: an Explainable Phonetic Trait-Oriented Network for speaker verification. This model introduces a novel approach to enhance explainability in speaker verification by incorporating phonetic traits, bridging the gap between manual forensic voice comparison and neural speaker verification systems.

📄 Abstract

In speaker verification, achieving explainability akin to forensic voice comparison has remained a challenge. ExPO leverages phonetic traits to generate utterance-level speaker embeddings and enables fine-grained analysis and visualization of phonetic traits. This explainable framework enhances trust and transparency while maintaining robust speaker verification performance.

For detailed insights, refer to the paper (arXiv:2501.05729).

🌟 Features

- Explainable Verification: Fine-grained phonetic trait analysis provides a transparent decision-making process.
- State-of-the-Art Architecture: Built on the ECAPA-TDNN backbone with integrated phonetic trait layers.
- Custom Loss Functions:
  - Trait Verification Loss: Ensures consistency of phonetic traits within the same speaker.
  - Trait Center Loss: Aligns phonetic traits across utterances for better generalization.
  - Additive Angular Margin (AAM) Loss: Enhances discriminability of speaker embeddings.
- Compatibility: Trained and tested on benchmark datasets including VoxCeleb and LibriSpeech.
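
To make the AAM loss listed above more concrete, here is a minimal sketch of the standard additive angular margin softmax for a single embedding, written in plain Python. It is not the training code from this repository: the function name `aam_softmax_loss` and the default `margin` and `scale` values are assumptions chosen for illustration, and the sketch omits the safeguards that production implementations add for angles near π.

```python
import math


def aam_softmax_loss(embedding, class_weights, target, margin=0.2, scale=32.0):
    """Additive angular margin softmax loss for one embedding (illustrative sketch).

    embedding:     list of floats, the speaker embedding
    class_weights: list of lists, one weight vector per training speaker
    target:        index of the true speaker
    margin, scale: hypothetical defaults, not taken from this repository
    """

    def l2_normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec]

    emb = l2_normalize(embedding)
    logits = []
    for idx, weight in enumerate(class_weights):
        w = l2_normalize(weight)
        cos_theta = sum(a * b for a, b in zip(emb, w))
        cos_theta = max(-1.0, min(1.0, cos_theta))  # keep acos in its domain
        if idx == target:
            # Add the margin to the angle of the true class only,
            # which makes the true class harder to satisfy.
            cos_theta = math.cos(math.acos(cos_theta) + margin)
        logits.append(scale * cos_theta)

    # Numerically stable cross-entropy over the scaled cosine logits.
    max_logit = max(logits)
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[target]
```

With `margin=0.0` this reduces to a plain cosine softmax; a positive margin raises the loss for the same embedding, which pushes embeddings of the same speaker closer together in angle during training.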

📊 Performance

| Model | EER (%) | minDCF | Explainability (EVD) |
| --- | --- | --- | --- |
| ECAPA-TDNN (Baseline) | 1.276 | 0.157 | Limited |
| ExPO (Full) | 1.552 | 0.184 | High |
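
For readers unfamiliar with the EER column, the sketch below shows one simple way to estimate an equal error rate from trial scores by sweeping a threshold. It is an illustration only and not the scoring code used to produce the numbers above; the function name `compute_eer` and the toy scores are assumptions.

```python
def compute_eer(target_scores, nontarget_scores):
    """Estimate the equal error rate (as a fraction) with a threshold sweep.

    target_scores:    scores of same-speaker trials
    nontarget_scores: scores of different-speaker trials
    A trial is accepted when its score is >= the threshold.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for threshold in thresholds:
        # False rejection rate: same-speaker trials scored below the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        # False acceptance rate: different-speaker trials at or above it.
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy example: one same-speaker trial scores below one different-speaker trial.
print(compute_eer([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1]))  # 0.25
```

Multiply the returned fraction by 100 to compare it with the percentages in the table.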

🔧 Installation

Prepare data: Phoneme files were generated with the charsiu GitHub repository, using its Textless Alignment method.

The pipeline for preparing speech samples in this repository is the same as that used in WeSpeaker.

Datasets used for training: the VoxCeleb1 and VoxCeleb2 training sets, the MUSAN dataset, and the RIR dataset.

Dependencies:

git clone https://github.com/mmmmayi/ExPO.git
cd ExPO
pip install -r requirements.txt

Training:
```
cd examples/voxceleb/v2
./run.sh
```

📚 Citation

If you find this project useful in your research, please consider citing our paper:

 @misc{ma2025expoexplainablephonetictraitoriented,
      title={ExPO: Explainable Phonetic Trait-Oriented Network for Speaker Verification}, 
      author={Yi Ma and Shuai Wang and Tianchi Liu and Haizhou Li},
      year={2025},
      eprint={2501.05729},
      archivePrefix={arXiv},
      primaryClass={cs.SD},
      url={https://arxiv.org/abs/2501.05729}, 
 }

🙏 Acknowledgements

This project builds upon and is inspired by the work of several open-source repositories. We thank their authors and contributors for open-sourcing their code!
