3D-Speaker

3D-Speaker is an open-source toolkit for single- and multi-modal speaker verification, speaker recognition, and speaker diarization. All pretrained models are accessible on ModelScope.

Quickstart

Install 3D-Speaker

git clone https://github.com/alibaba-damo-academy/3D-Speaker.git && cd 3D-Speaker
conda create -n 3D-Speaker python=3.8
conda activate 3D-Speaker
pip install -r requirements.txt
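
As an optional sanity check after installation, confirm that the core dependencies import cleanly. This assumes requirements.txt pulls in PyTorch and torchaudio, which is typical for speech toolkits; adjust to whatever the file actually pins.

# Hypothetical post-install check; torch/torchaudio are assumed dependencies.
python -c "import torch, torchaudio; print(torch.__version__, torchaudio.__version__)"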

Running experiments

# Speaker verification: CAM++ on VoxCeleb
cd egs/sv-cam++/voxceleb/
bash run.sh
# Self-supervised speaker verification: RDINO on VoxCeleb
# (paths are relative to the repository root)
cd egs/sv-rdino/voxceleb/
bash run.sh

Inference using pretrained models from ModelScope

All pretrained models are released on ModelScope.

# Install modelscope
pip install modelscope
# CAM++ trained on VoxCeleb
model_id=damo/speech_campplus_sv_en_voxceleb_16k
# CAM++ trained on 200k labeled speakers
model_id=damo/speech_campplus_sv_zh-cn_16k-common
# Run CAM++ inference ($wav_path is the input wav; these models expect 16 kHz audio)
python speakerlab/bin/infer_sv.py --model_id $model_id --wavs $wav_path

# RDINO trained on VoxCeleb
model_id=damo/speech_rdino_ecapa_tdnn_sv_en_voxceleb_16k
# Run RDINO inference
python speakerlab/bin/infer_sv_rdino.py --model_id $model_id --wavs $wav_path

Task                                   Dataset    Model   Performance
Speaker verification                   VoxCeleb   CAM++   EER = 0.73%
Self-supervised speaker verification   VoxCeleb   RDINO   EER = 3.24%
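
As an alternative to the bundled scripts, ModelScope's generic pipeline API can also run these models. The following is a minimal sketch, assuming the model is registered under the 'speaker-verification' task (check the model card on ModelScope); the infer_sv.py scripts above are the repo's supported entry points.

# Minimal sketch using ModelScope's generic pipeline API (assumed task
# name 'speaker-verification'; verify on the model card).
from modelscope.pipelines import pipeline

sv = pipeline(task='speaker-verification',
              model='damo/speech_campplus_sv_zh-cn_16k-common')

# Compare two utterances; the wav paths here are placeholders (16 kHz audio).
result = sv(['enroll.wav', 'test.wav'])
print(result)  # typically a similarity score plus a same/different decision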

News

  • [2023.4] RDINO training recipes on VoxCeleb released. RDINO is a self-supervised learning framework for speaker verification that aims to alleviate model collapse in non-contrastive methods. It contains teacher and student networks with identical architectures but different parameters. RDINO proposes two regularization terms, namely diversity regularization and redundancy elimination regularization, and achieves 3.05% EER and 0.220 MinDCF on VoxCeleb using single-stage self-supervised training. (A schematic sketch of the two regularizers follows this list.)
  • [2023.4] CAM++ pretrained model released, trained on a Mandarin dataset of 200k labeled speakers.
  • [2023.4] CAM++ training recipe on VoxCeleb released. CAM++ is a fast and efficient speaker embedding extractor based on a densely connected time-delay neural network (D-TDNN). It adopts a novel multi-granularity pooling method to perform context-aware masking. CAM++ achieves an EER of 0.73% on VoxCeleb and 6.78% on CN-Celeb, outperforming mainstream speaker embedding models such as ECAPA-TDNN and ResNet34 while having lower computational cost and faster inference speed. (A schematic sketch of the masking idea also follows this list.)
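
To make the ideas above concrete, below are two schematic PyTorch sketches. They are illustrative readings of the descriptions in this section, not the exact modules from the papers, and all function and class names are hypothetical. The first shows one plausible form of RDINO's diversity and redundancy elimination regularizers (a variance floor on embedding dimensions, and an off-diagonal correlation penalty); the second uses an SE-style channel mask as a stand-in for CAM++'s context-aware masking.

# Hypothetical sketch of RDINO-style regularizers (not the paper's exact loss).
import torch
import torch.nn as nn

def diversity_regularization(emb, eps=1e-4):
    # emb: (batch, dim). Penalize dimensions whose batch standard deviation
    # falls below 1 — one way to keep embeddings from collapsing to a point.
    std = torch.sqrt(emb.var(dim=0) + eps)
    return torch.relu(1.0 - std).mean()

def redundancy_elimination(emb):
    # Penalize off-diagonal entries of the embedding correlation matrix so
    # that different dimensions carry non-redundant information.
    emb = (emb - emb.mean(dim=0)) / (emb.std(dim=0) + 1e-6)
    corr = (emb.T @ emb) / emb.shape[0]              # (dim, dim)
    off_diag = corr - torch.diag(torch.diag(corr))
    return (off_diag ** 2).sum() / emb.shape[1]

# Hypothetical SE-style channel mask standing in for CAM++'s context-aware
# masking; the real module pools at multiple granularities, not just globally.
class ContextMask(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):                 # x: (batch, channels, time)
        context = x.mean(dim=2)           # coarse (global) pooling for context
        mask = self.fc(context).unsqueeze(2)
        return x * mask                   # mask modulates the feature map

emb = torch.randn(32, 512)                # dummy batch of embeddings
reg = diversity_regularization(emb) + redundancy_elimination(emb)
feat = torch.randn(8, 64, 200)            # dummy (batch, channels, frames)
masked = ContextMask(64)(feat)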

Coming soon

  • [2023.5] Releasing ERes2Net (Enhanced Res2Net) training framework.
  • [2023.5] Releasing ERes2Net model trained on over 100k labeled speakers.

License

3D-Speaker is released under the Apache License 2.0.

Acknowledgments

3D-Speaker contains third-party components and code modified from several open-source repositories.

Contact

If you have any comments or questions about 3D-Speaker, please contact us.

Citations

@inproceedings{rdino,
  title={Pushing the limits of self-supervised speaker verification using regularized distillation framework},
  author={Yafeng Chen and Siqi Zheng and Hui Wang and Luyao Cheng and Qian Chen},
  booktitle={ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2023},
  organization={IEEE}
}
@article{cam++,
  title={CAM++: A Fast and Efficient Network for Speaker Verification Using Context-Aware Masking},
  author={Hui Wang and Siqi Zheng and Yafeng Chen and Luyao Cheng and Qian Chen},
  journal={arXiv preprint arXiv:2303.00332},
  year={2023}
}
