mike-qz-wang/AUDETER

AUDETER: A Large-scale Dataset for Deepfake Audio Detection in Open Worlds

Dataset Description

We introduce AUDETER (AUdio DEepfake TEst Range), a large-scale, highly diverse deepfake audio dataset for the comprehensive evaluation and robust development of generalised deepfake audio detection models. It contains over 4,500 hours of synthetic audio generated by 11 recent TTS models and 10 vocoders, covering a broad range of TTS and vocoder patterns and totalling approximately 3 million audio clips.

Dataset Structure

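| Collection | Subset        | Partition | Patterns | Clips per Pattern | Total Hours |
|------------|---------------|-----------|----------|-------------------|-------------|
| TTS        | In-the-Wild   | Bona-fide | 15       | 19,784            | 311.5       |
| TTS        | Common Voice  | Val       | 15       | 16,372            | 275.0       |
| TTS        | Common Voice  | Test      | 15       | 16,372            | 265.3       |
| TTS        | People Speech | Val       | 15       | 18,622            | 493.5       |
| TTS        | People Speech | Test      | 15       | 34,898            | 909.4       |
| TTS        | MLS           | Dev       | 15       | 3,807             | 212.7       |
| TTS        | MLS           | Test      | 15       | 3,769             | 209.1       |
| Vocoder    | In-the-Wild   | Bona-fide | 10       | 19,784            | 207.6       |
| Vocoder    | Common Voice  | Val       | 10       | 16,372            | 266.7       |
| Vocoder    | Common Voice  | Test      | 10       | 16,372            | 264.8       |
| Vocoder    | People Speech | Val       | 10       | 18,622            | 331.7       |
| Vocoder    | People Speech | Test      | 10       | 34,898            | 598.1       |
| Vocoder    | MLS           | Dev       | 10       | 3,807             | 156.7       |
| Vocoder    | MLS           | Test      | 10       | 3,769             | 154.9       |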

Table 2: The structure of the AUDETER dataset.
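As a quick consistency check on the headline figures, the sketch below transcribes each row of Table 2 (in table order) and totals it. It assumes that the number of generated clips in a row equals the pattern count multiplied by the "# Audio / Patt." value; that reading of the column is our interpretation, not something the table states explicitly. The names `ROWS` and `summarise` are hypothetical and not part of any AUDETER tooling.

```python
# One tuple per row of Table 2, in table order:
# (collection, subset, patterns, clips_per_pattern, total_hours)
ROWS = [
    ("TTS", "In-the-Wild", 15, 19_784, 311.5),
    ("TTS", "Common Voice", 15, 16_372, 275.0),
    ("TTS", "Common Voice", 15, 16_372, 265.3),
    ("TTS", "People Speech", 15, 18_622, 493.5),
    ("TTS", "People Speech", 15, 34_898, 909.4),
    ("TTS", "MLS", 15, 3_807, 212.7),
    ("TTS", "MLS", 15, 3_769, 209.1),
    ("Vocoder", "In-the-Wild", 10, 19_784, 207.6),
    ("Vocoder", "Common Voice", 10, 16_372, 266.7),
    ("Vocoder", "Common Voice", 10, 16_372, 264.8),
    ("Vocoder", "People Speech", 10, 18_622, 331.7),
    ("Vocoder", "People Speech", 10, 34_898, 598.1),
    ("Vocoder", "MLS", 10, 3_807, 156.7),
    ("Vocoder", "MLS", 10, 3_769, 154.9),
]


def summarise(rows):
    """Return (total hours, total clips, per-collection stats).

    Assumption: clips in a row = patterns * clips_per_pattern.
    """
    total_hours = 0.0
    total_clips = 0
    by_collection = {}
    for collection, _subset, patterns, clips_per_pattern, hours in rows:
        clips = patterns * clips_per_pattern
        total_hours += hours
        total_clips += clips
        stats = by_collection.setdefault(collection, {"hours": 0.0, "clips": 0})
        stats["hours"] += hours
        stats["clips"] += clips
    # Round the float sum to the table's one-decimal precision.
    return round(total_hours, 1), total_clips, by_collection


if __name__ == "__main__":
    hours, clips, per_collection = summarise(ROWS)
    print(f"Total: {hours} h, {clips:,} generated clips")
    for name, stats in per_collection.items():
        print(f"  {name}: {stats['hours']:.1f} h, {stats['clips']:,} clips")
```

Under this assumption the script reports 4657.0 hours in total (2676.5 h TTS, 1980.5 h vocoder) and 2,840,600 generated clips (1,704,360 TTS, 1,136,240 vocoder), which is in line with the "over 4,500 hours" and "approximately 3 million clips" figures quoted above.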

Uploading in progress

Due to the size of the dataset, the upload is still in progress. Thank you for your understanding.

You can view the upload in progress on Hugging Face.

Citation

If you use AUDETER in your research, please consider citing our paper on arXiv and giving us a star🌟!

@article{wang2025audeter,
  title={AUDETER: A Large-scale Dataset for Deepfake Audio Detection in Open Worlds},
  author={Wang, Qizhou and Huang, Hanxun and Pang, Guansong and Erfani, Sarah and Leckie, Christopher},
  journal={arXiv preprint arXiv:2509.04345},
  year={2025},
  url={https://arxiv.org/abs/2509.04345}
}
