LID-7: VoxLingua107 recipe #6174

Merged
sw005320 merged 36 commits into espnet:master from Qingzheng-Wang:lid_release7
Sep 12, 2025
Conversation

@Qingzheng-Wang
Contributor

What did you change?

  • local/data.sh, local/prepare_voxlingua107.py: download and prepare VoxLingua107 dataset.
  • run.sh, mms_ecapa_baseline.yaml: run script and training configuration.

Why did you make this change?

This PR adds a recipe for spoken language identification (LID) on the VoxLingua107 dataset.


Is your PR small enough?

Yes.


Additional Context

@dosubot (bot) added the size:XL label (this PR changes 500-999 lines, ignoring generated files) Jun 26, 2025
@dosubot (bot) added the Recipe label Jun 26, 2025
@Fhrozen requested a review from Copilot June 27, 2025 05:48
Contributor

Copilot AI left a comment

Pull Request Overview

Adds a complete recipe for spoken language identification on the VoxLingua107 dataset, including dataset download/preparation, training/inference orchestration, and evaluation utilities.

  • Introduces data download and preparation scripts to generate Kaldi-style mappings.
  • Provides a run script (run.sh) and baseline training configuration (mms_ecapa_baseline.yaml).
  • Supplies scoring and data-copy utilities adapted for language identification.
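
To make the "Kaldi-style mappings" concrete, here is a minimal sketch of how `wav.scp` and `utt2lang` lines could be produced. It is not the code in `local/prepare_voxlingua107.py`. It assumes the extracted dataset has one subdirectory per language code containing `.wav` files; the function name `build_kaldi_mappings`, the utterance-ID scheme, and the three-entry `ISO2_TO_ISO3` table are all hypothetical.

```python
from pathlib import Path

# Hypothetical excerpt of a two-letter -> ISO 639-3 table (assumption:
# the real recipe covers every language in the dataset).
ISO2_TO_ISO3 = {"en": "eng", "de": "deu", "fr": "fra"}


def build_kaldi_mappings(root):
    """Return (wav_scp, utt2lang) line lists for <root>/<lang>/<utt>.wav files."""
    entries = {}
    for wav in Path(root).glob("*/*.wav"):
        lang = ISO2_TO_ISO3.get(wav.parent.name)
        if lang is None:
            # Skip directories whose code is not in the mapping table.
            continue
        # Hypothetical ID scheme: prefix the file stem with the ISO3 code.
        utt_id = f"{lang}_{wav.stem}"
        entries[utt_id] = (str(wav), lang)

    # Kaldi-style files are conventionally sorted by utterance ID.
    utt_ids = sorted(entries)
    wav_scp = [f"{u} {entries[u][0]}" for u in utt_ids]
    utt2lang = [f"{u} {entries[u][1]}" for u in utt_ids]
    return wav_scp, utt2lang
```

Each returned list can be written out one entry per line, for example with `Path("data/train/wav.scp").write_text("\n".join(wav_scp) + "\n")`; the output path here is illustrative only.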

Reviewed Changes

Copilot reviewed 18 out of 19 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| egs2/voxlingua107/lid1/utils | Symlink to shared utility scripts from TEMPLATE |
| egs2/voxlingua107/lid1/steps | Symlink to shared Kaldi-like steps from TEMPLATE |
| egs2/voxlingua107/lid1/scripts | Symlink to shared helper scripts from TEMPLATE |
| egs2/voxlingua107/lid1/run.sh | Main entry-point for training and inference |
| egs2/voxlingua107/lid1/pyscripts | Symlink to shared Python helper scripts from TEMPLATE |
| egs2/voxlingua107/lid1/path.sh | Environment path setup |
| egs2/voxlingua107/lid1/local/score.py | Computes overall, per-language, and error-frequency scores |
| egs2/voxlingua107/lid1/local/prepare_voxlingua107.py | Generates wav.scp and utt2lang with ISO3 language codes |
| egs2/voxlingua107/lid1/local/data.sh | Automates dataset download, extraction, and data preparation |
| egs2/voxlingua107/lid1/local/copy_data_dir.sh | Copies and prefixes data directories for language IDs |
| egs2/voxlingua107/lid1/lid.sh | Symlink to core recipe script from TEMPLATE |
| egs2/voxlingua107/lid1/db.sh | Database path configuration |
| egs2/voxlingua107/lid1/conf/slurm.conf | Slurm scheduler configuration |
| egs2/voxlingua107/lid1/conf/queue.conf | SGE scheduler configuration |
| egs2/voxlingua107/lid1/conf/pbs.conf | PBS scheduler configuration |
| egs2/voxlingua107/lid1/conf/mms_ecapa_baseline.yaml | ECAPA-TDNN baseline model configuration |
| egs2/voxlingua107/lid1/cmd.sh | Dispatch script for run.pl / queue.pl / slurm.pl, etc. |
| egs2/voxlingua107/lid1/README.md | Recipe overview and reported results |
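
The three kinds of scores attributed to `local/score.py` above (overall, per-language, and error-frequency) can be illustrated with a short sketch. This is not the script from the PR: the function name `score_lid` and its dict-based inputs are assumptions, and "error frequency" is interpreted here as counts of (reference, hypothesis) confusion pairs.

```python
from collections import Counter, defaultdict


def score_lid(refs, hyps):
    """Score language ID predictions.

    refs, hyps: dicts mapping utterance ID -> language code.
    Returns (overall accuracy, per-language accuracy, confusion counts).
    """
    total = correct = 0
    per_lang = defaultdict(lambda: [0, 0])  # lang -> [correct, total]
    confusions = Counter()  # (reference, hypothesis) -> count

    for utt_id, ref in refs.items():
        hyp = hyps.get(utt_id)  # a missing prediction counts as an error
        total += 1
        per_lang[ref][1] += 1
        if hyp == ref:
            correct += 1
            per_lang[ref][0] += 1
        else:
            confusions[(ref, hyp)] += 1

    overall = correct / total if total else 0.0
    per_lang_acc = {lang: c / n for lang, (c, n) in per_lang.items()}
    # most_common() lists the most frequent confusion pairs first.
    return overall, per_lang_acc, confusions.most_common()
```

Reporting accuracy per reference language alongside the overall figure helps expose languages that a single aggregate number would hide, which matters for a dataset with many classes of uneven size.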
Comments suppressed due to low confidence (1)

egs2/voxlingua107/lid1/README.md:11

  • [nitpick] There is a stray Markdown bold marker (**) on this line which likely causes a formatting glitch. Consider removing it.
**

@codecov

codecov bot commented Jun 28, 2025

Codecov Report

❌ Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.82%. Comparing base (39684b0) to head (022ab14).
⚠️ Report is 14 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| espnet2/train/lid_trainer.py | 0.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6174      +/-   ##
==========================================
+ Coverage   53.53%   55.82%   +2.29%     
==========================================
  Files         888      889       +1     
  Lines       84131    84275     +144     
==========================================
+ Hits        45039    47049    +2010     
+ Misses      39092    37226    -1866     
| Flag | Coverage Δ |
| --- | --- |
| test_integration_espnet2 | 46.15% <ø> (?) |
| test_integration_espnetez | 36.94% <ø> (ø) |
| test_python_espnet2 | 50.53% <0.00%> (ø) |
| test_python_espnetez | 12.82% <0.00%> (ø) |
| test_utils | 18.77% <ø> (ø) |

@dosubot (bot) added the size:XXL label (this PR changes 1000+ lines, ignoring generated files) and removed the size:XL label (this PR changes 500-999 lines, ignoring generated files) Aug 10, 2025
@Fhrozen added this to the v.202509 milestone Aug 11, 2025
@dosubot (bot) added the size:XL label (this PR changes 500-999 lines, ignoring generated files) and removed the size:XXL label (this PR changes 1000+ lines, ignoring generated files) Aug 20, 2025
@sw005320
Contributor

sw005320 commented Sep 9, 2025

@ftshijt, can you review this PR?

@sw005320 requested a review from ftshijt September 10, 2025 13:13
Collaborator

@ftshijt ftshijt left a comment

Thanks for sharing the update. Please remember to add the data entry in "egs2/README.md"

@Qingzheng-Wang
Contributor Author

> Thanks for sharing the update. Please remember to add the data entry in "egs2/README.md"

Thanks for your review. I have already added the voxlingua107 data entry to egs2/README.md.

@sw005320
Contributor

@Fhrozen modified the milestones: v.202509, v.202512 Sep 12, 2025
@sw005320 merged commit 9333dcc into espnet:master Sep 12, 2025
29 of 32 checks passed
@sw005320
Contributor

Thanks!

@Fhrozen modified the milestones: v.202512, v.202511 Nov 14, 2025

Labels

ESPnet2, README, Recipe, size:XL (this PR changes 500-999 lines, ignoring generated files)

5 participants