3D-Speaker is an open-source toolkit for single- and multi-modal speaker verification, speaker recognition, and speaker diarization. All pretrained models are accessible on ModelScope.
git clone https://github.com/alibaba-damo-academy/3D-Speaker.git && cd 3D-Speaker
conda create -n 3D-Speaker python=3.8
conda activate 3D-Speaker
pip install -r requirements.txt
# Speaker verification: CAM++ on VoxCeleb
cd egs/sv-cam++/voxceleb/
bash run.sh
# Self-supervised speaker verification: RDINO on VoxCeleb
cd egs/sv-rdino/voxceleb/
bash run.sh
All pretrained models are released on ModelScope.
# Install modelscope
pip install modelscope
# CAM++ trained on VoxCeleb
model_id=damo/speech_campplus_sv_en_voxceleb_16k
# CAM++ trained on 200k labeled speakers
model_id=damo/speech_campplus_sv_zh-cn_16k-common
# Run CAM++ inference
python speakerlab/bin/infer_sv.py --model_id $model_id --wavs $wav_path
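For quick verification without the training recipes, the pretrained models can also be invoked through ModelScope's Python pipeline API. A minimal sketch, assuming the standard `modelscope` pipeline interface; the wav paths are placeholders:

```python
# Minimal sketch: speaker verification with the pretrained CAM++ model via
# the ModelScope pipeline API. The wav paths below are placeholders.
from modelscope.pipelines import pipeline

sv_pipeline = pipeline(
    task='speaker-verification',
    model='damo/speech_campplus_sv_zh-cn_16k-common',
)

# With two wavs, the pipeline scores whether they come from the same speaker.
result = sv_pipeline(['speaker1_a.wav', 'speaker1_b.wav'])
print(result)  # a dict containing a similarity score
```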
# RDINO trained on VoxCeleb
model_id=damo/speech_rdino_ecapa_tdnn_sv_en_voxceleb_16k
# Run RDINO inference
python speakerlab/bin/infer_sv_rdino.py --model_id $model_id --wavs $wav_path
| Task | Dataset | Model | Performance |
|---|---|---|---|
| speaker verification | VoxCeleb | CAM++ | EER = 0.73% |
| self-supervised speaker verification | VoxCeleb | RDINO | EER = 3.24% |
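The EER (equal error rate) reported above is the operating point at which the false-acceptance and false-rejection rates coincide. A minimal sketch of computing it from raw trial scores, using hypothetical scores and labels rather than the toolkit's own evaluation script:

```python
# Minimal sketch: equal error rate (EER) from verification trial scores.
# Hypothetical inputs: `scores` are cosine similarities, `labels` are 1 for
# same-speaker trials and 0 for different-speaker trials.
import numpy as np

def compute_eer(scores: np.ndarray, labels: np.ndarray) -> float:
    order = np.argsort(scores)[::-1]          # sort trials by score, descending
    labels = labels[order]
    tp = np.cumsum(labels)                    # true accepts at each threshold
    fp = np.cumsum(1 - labels)                # false accepts at each threshold
    fnr = 1 - tp / labels.sum()               # false-rejection rate
    far = fp / (1 - labels).sum()             # false-acceptance rate
    idx = np.argmin(np.abs(fnr - far))        # threshold where the two cross
    return float((fnr[idx] + far[idx]) / 2)

scores = np.array([0.92, 0.81, 0.40, 0.35, 0.10])
labels = np.array([1, 1, 0, 1, 0])
print(f"EER = {compute_eer(scores, labels):.2%}")
```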
- [2023.4] RDINO training recipes on VoxCeleb released. RDINO is a self-supervised learning framework for speaker verification that aims to alleviate model collapse in non-contrastive methods. It contains a teacher network and a student network with identical architectures but different parameters. RDINO introduces two regularization terms, namely diversity regularization and redundancy-elimination regularization (see the schematic sketch after this list). RDINO achieves 3.05% EER and 0.220 MinDCF on VoxCeleb using single-stage self-supervised training.
- [2023.4] CAM++ pretrained model released, trained on a Mandarin dataset of 200k labeled speakers.
- [2023.4] CAM++ training recipe on VoxCeleb released. CAM++ is a fast and efficient speaker embedding extractor based on a densely connected time-delay neural network (D-TDNN). It adopts a novel multi-granularity pooling method to perform context-aware masking (a simplified sketch follows this list). CAM++ achieves an EER of 0.73% on VoxCeleb and 6.78% on CN-Celeb, outperforming mainstream speaker embedding models such as ECAPA-TDNN and ResNet34, while having lower computational cost and faster inference speed.
- [2023.5] ERes2Net (Enhanced Res2Net) training framework released.
- [2023.5] ERes2Net pretrained model released, trained on over 100k labeled speakers.
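As referenced in the RDINO news item above, the two regularizers can be pictured as batch-level penalties on the student embeddings. A schematic PyTorch sketch under simplified, assumed formulations; the exact losses are defined in the RDINO paper:

```python
# Schematic sketch (simplified, assumed forms) of RDINO's two regularizers,
# applied to a batch of embeddings `emb` of shape (batch, dim).
import torch
import torch.nn.functional as F

def diversity_regularization(emb: torch.Tensor) -> torch.Tensor:
    """Discourage collapse across the batch: penalize high pairwise cosine
    similarity between different utterances' embeddings."""
    e = F.normalize(emb, dim=1)
    sim = e @ e.t()                                  # (batch, batch)
    off_diag = sim - torch.diag(torch.diag(sim))     # zero out self-similarity
    return off_diag.abs().mean()

def redundancy_elimination(emb: torch.Tensor) -> torch.Tensor:
    """Decorrelate embedding dimensions, Barlow Twins-style: push the
    off-diagonal entries of the feature correlation matrix toward zero."""
    e = (emb - emb.mean(0)) / (emb.std(0) + 1e-6)    # standardize per dimension
    corr = (e.t() @ e) / e.shape[0]                  # (dim, dim)
    off_diag = corr - torch.diag(torch.diag(corr))
    return (off_diag ** 2).sum()
```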
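Likewise, the context-aware masking in CAM++ can be viewed as pooling frame features at multiple granularities and predicting a soft gate from the pooled context. A simplified, assumed sketch; the actual module is part of the released D-TDNN recipe:

```python
# Simplified sketch (assumed) of context-aware masking: combine global and
# segment-level pooled context to predict a per-frame, per-channel soft mask.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContextAwareMask(nn.Module):
    def __init__(self, channels: int, segment: int = 50):
        super().__init__()
        self.segment = segment                  # frames per local segment
        self.bottleneck = nn.Sequential(
            nn.Conv1d(2 * channels, channels // 4, kernel_size=1),
            nn.ReLU(),
            nn.Conv1d(channels // 4, channels, kernel_size=1),
            nn.Sigmoid(),                       # mask values in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames)
        b, c, t = x.shape
        g = x.mean(dim=2, keepdim=True).expand(b, c, t)      # global context
        pad = (-t) % self.segment                            # pad to a multiple
        xs = F.pad(x, (0, pad)).reshape(b, c, -1, self.segment)
        s = xs.mean(dim=3, keepdim=True).expand_as(xs)       # segment context
        s = s.reshape(b, c, -1)[:, :, :t]
        mask = self.bottleneck(torch.cat([g, s], dim=1))     # (b, c, t)
        return x * mask                                      # gate the frames
```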
3D-Speaker is released under the Apache License 2.0.
3D-Speaker contains third-party components and code modified from other open-source repositories.
If you have any comments or questions about 3D-Speaker, please contact us by
- email: [email protected], [email protected]
@inproceedings{rdino,
title={Pushing the limits of self-supervised speaker verification using regularized distillation framework},
author={Yafeng Chen and Siqi Zheng and Hui Wang and Luyao Cheng and Qian Chen},
booktitle={ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
year={2023},
organization={IEEE}
}
@article{cam++,
title={CAM++: A Fast and Efficient Network for Speaker Verification Using Context-Aware Masking},
author={Hui Wang and Siqi Zheng and Yafeng Chen and Luyao Cheng and Qian Chen},
journal={arXiv preprint arXiv:2303.00332},
year={2023}
}