Qingzheng-Wang · 2025-06-26T05:43:55Z

What did you change?

local/data.sh, local/prepare_voxlingua107.py: download and prepare VoxLingua107 dataset.
run.sh, mms_ecapa_baseline.yaml: run script and training configuration.

Why did you make this change?

This PR adds a recipe for spoken language identification (LID) on the VoxLingua107 dataset.

Is your PR small enough?

Yes.

Additional Context

Depends on:

Copilot

Pull Request Overview

Adds a complete recipe for spoken language identification on the VoxLingua107 dataset, including dataset download/preparation, training/inference orchestration, and evaluation utilities.

Introduces data download and preparation scripts to generate Kaldi-style mappings.
Provides a run script (run.sh) and baseline training configuration (mms_ecapa_baseline.yaml).
Supplies scoring and data-copy utilities adapted for language identification.
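As a rough illustration of what "Kaldi-style mappings" means here, the sketch below writes a `wav.scp` (utterance ID to audio path) and a `utt2lang` (utterance ID to language label) file. It is not the code from `local/prepare_voxlingua107.py`: the `<root>/<lang>/<name>.wav` layout, the `ISO3` table, the `prepare` function, and the utterance-ID format are all assumptions made for the example.

```python
from pathlib import Path

# Hypothetical subset of a folder-name -> ISO 639-3 mapping (assumption).
ISO3 = {"en": "eng", "de": "deu", "fr": "fra"}


def prepare(root: Path, out_dir: Path) -> int:
    """Write wav.scp and utt2lang for every <root>/<lang>/*.wav file.

    Returns the number of utterances written.
    """
    entries = []
    for wav in root.glob("*/*.wav"):
        lang = ISO3.get(wav.parent.name)
        if lang is None:
            continue  # skip folders without a known ISO3 code
        # Hypothetical ID scheme: language label as a prefix of the file stem.
        utt_id = f"{lang}_{wav.stem}"
        entries.append((utt_id, lang, wav.resolve()))

    # Kaldi-style files are conventionally sorted by utterance ID.
    entries.sort()

    out_dir.mkdir(parents=True, exist_ok=True)
    with open(out_dir / "wav.scp", "w") as scp, open(out_dir / "utt2lang", "w") as u2l:
        for utt_id, lang, path in entries:
            scp.write(f"{utt_id} {path}\n")
            u2l.write(f"{utt_id} {lang}\n")
    return len(entries)
```

Prefixing each utterance ID with its language label keeps the two files sorted consistently, which is a common convention for Kaldi-style data directories.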

Reviewed Changes

Copilot reviewed 18 out of 19 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| egs2/voxlingua107/lid1/utils | Symlink to shared utility scripts from TEMPLATE |
| egs2/voxlingua107/lid1/steps | Symlink to shared Kaldi-like steps from TEMPLATE |
| egs2/voxlingua107/lid1/scripts | Symlink to shared helper scripts from TEMPLATE |
| egs2/voxlingua107/lid1/run.sh | Main entry-point for training and inference |
| egs2/voxlingua107/lid1/pyscripts | Symlink to shared Python helper scripts from TEMPLATE |
| egs2/voxlingua107/lid1/path.sh | Environment path setup |
| egs2/voxlingua107/lid1/local/score.py | Computes overall, per-language, and error-frequency scores |
| egs2/voxlingua107/lid1/local/prepare_voxlingua107.py | Generates `wav.scp` and `utt2lang` with ISO3 language codes |
| egs2/voxlingua107/lid1/local/data.sh | Automates dataset download, extraction, and data preparation |
| egs2/voxlingua107/lid1/local/copy_data_dir.sh | Copies and prefixes data directories for language IDs |
| egs2/voxlingua107/lid1/lid.sh | Symlink to core recipe script from TEMPLATE |
| egs2/voxlingua107/lid1/db.sh | Database path configuration |
| egs2/voxlingua107/lid1/conf/slurm.conf | Slurm scheduler configuration |
| egs2/voxlingua107/lid1/conf/queue.conf | SGE scheduler configuration |
| egs2/voxlingua107/lid1/conf/pbs.conf | PBS scheduler configuration |
| egs2/voxlingua107/lid1/conf/mms_ecapa_baseline.yaml | ECAPA-TDNN baseline model configuration |
| egs2/voxlingua107/lid1/cmd.sh | Dispatch script for run.pl / queue.pl / slurm.pl, etc. |
| egs2/voxlingua107/lid1/README.md | Recipe overview and reported results |
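The three kinds of scores attributed to `local/score.py` above (overall, per-language, and error-frequency) can be sketched as follows. This is a minimal illustration and not the script from the PR: the `score` function, its dictionary inputs, and the choice of plain accuracy as the metric are assumptions.

```python
from collections import Counter


def score(ref: dict, hyp: dict):
    """Return (overall accuracy, per-language accuracy, confusion counts).

    ref and hyp map utterance IDs to language labels (assumed input format).
    """
    if not ref:
        raise ValueError("reference set is empty")

    total = Counter()    # utterances per reference language
    correct = Counter()  # correctly identified utterances per language
    errors = Counter()   # (reference, predicted) pairs for mistakes

    for utt, true_lang in ref.items():
        pred = hyp.get(utt)  # a missing prediction counts as an error
        total[true_lang] += 1
        if pred == true_lang:
            correct[true_lang] += 1
        else:
            errors[(true_lang, pred)] += 1

    overall = sum(correct.values()) / sum(total.values())
    per_lang = {lang: correct[lang] / n for lang, n in total.items()}
    # most_common() lists the most frequent confusions first.
    return overall, per_lang, errors.most_common()
```

Keeping the confusion pairs separate from the accuracies makes it easy to see which languages are mistaken for each other most often, which a single overall number hides.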

Comments suppressed due to low confidence (1)

egs2/voxlingua107/lid1/README.md:11

[nitpick] There is a stray Markdown bold marker (**) on this line which likely causes a formatting glitch. Consider removing it.

**

egs2/voxlingua107/lid1/local/copy_data_dir.sh

egs2/voxlingua107/lid1/local/prepare_voxlingua107.py

codecov · 2025-06-28T23:18:47Z

Codecov Report

❌ Patch coverage is 0% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 55.82%. Comparing base (39684b0) to head (022ab14).
⚠️ Report is 14 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| espnet2/train/lid_trainer.py | 0.00% | 2 Missing ⚠️ |

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #6174      +/-   ##
==========================================
+ Coverage   53.53%   55.82%   +2.29%     
==========================================
  Files         888      889       +1     
  Lines       84131    84275     +144     
==========================================
+ Hits        45039    47049    +2010     
+ Misses      39092    37226    -1866
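The percentages in the diff above can be reproduced from the hit and miss counts. The sketch below assumes that coverage is hits divided by total lines (hits plus misses) and that the displayed figure is truncated, not rounded, to two decimals. The truncation is inferred from the numbers in this report and is not a documented Codecov rule; `coverage_bp` is a hypothetical helper.

```python
def coverage_bp(hits: int, misses: int) -> int:
    """Coverage in hundredths of a percent, truncated (assumed convention)."""
    return hits * 10_000 // (hits + misses)


# Counts taken from the coverage diff above.
base = coverage_bp(45039, 39092)  # master: 84131 lines in total
head = coverage_bp(47049, 37226)  # this PR: 84275 lines in total

print(f"{base / 100:.2f}% -> {head / 100:.2f}% ({(head - base) / 100:+.2f}%)")
# prints "53.53% -> 55.82% (+2.29%)"
```

Working in integer hundredths of a percent avoids floating-point rounding when truncating.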

| Flag | Coverage Δ |
|---|---|
| test_integration_espnet2 | `46.15% <ø> (?)` |
| test_integration_espnetez | `36.94% <ø> (ø)` |
| test_python_espnet2 | `50.53% <0.00%> (ø)` |
| test_python_espnetez | `12.82% <0.00%> (ø)` |
| test_utils | `18.77% <ø> (ø)` |

Flags with carried forward coverage won't be shown.

sw005320 · 2025-09-09T14:17:55Z

@ftshijt, can you review this PR?

ftshijt

Thanks for sharing the update. Please remember to add the data entry in "egs2/README.md"

egs2/voxlingua107/lid1/run.sh

Qingzheng-Wang · 2025-09-11T18:19:06Z

> Thanks for sharing the update. Please remember to add the data entry in "egs2/README.md"

Thanks for your review. I have already added the voxlingua107 data entry to egs2/README.md.

sw005320 · 2025-09-11T18:38:19Z

Please fix the failing CI job: https://github.com/espnet/espnet/actions/runs/17653529184/job/50170382150?pr=6174

sw005320 · 2025-09-12T15:22:01Z

Thanks!

Qingzheng-Wang added 5 commits June 26, 2025 01:32

Add voxlingua107 recipe.

a4cc40c

Add readme.

4aa7e6e

Add run script.

4585fa9

Add config.

331b48c

Add data process scripts.

fabff2c

dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Jun 26, 2025

mergify bot added ESPnet2 README labels Jun 26, 2025

dosubot bot added the Recipe label Jun 26, 2025

Fhrozen requested a review from Copilot June 27, 2025 05:48

Copilot AI reviewed Jun 27, 2025

egs2/voxlingua107/lid1/local/copy_data_dir.sh (outdated, resolved)

egs2/voxlingua107/lid1/local/copy_data_dir.sh (outdated, resolved)

egs2/voxlingua107/lid1/local/prepare_voxlingua107.py (outdated, resolved)

Qingzheng-Wang and others added 2 commits June 28, 2025 18:02

Remove redundant code.

1bd82ac

Remove spk_map.

fbe5c41

Qingzheng-Wang and others added 3 commits June 29, 2025 11:22

Remove unnecessary cat.

97ca01b

Merge branch 'espnet:master' into lid_release7

c9a9327

Add scripts for prepare out-of-domain test set.

2d8a348

dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels Aug 10, 2025

Fhrozen added this to the v.202509 milestone Aug 11, 2025

Qingzheng-Wang and others added 9 commits August 14, 2025 22:17

Fix lengthy lines.

229b6e2

[pre-commit.ci] auto fixes from pre-commit.com hooks

74138cf

for more information, see https://pre-commit.ci

Re CI.

0b358de

[pre-commit.ci] auto fixes from pre-commit.com hooks

e65572f

for more information, see https://pre-commit.ci

Re CI.

6de5d94

Merge branch 'master' into lid_release7

68b60c7

Fix.

596b409

Fix cp.

48b7769

Fix symlinks.

3130bb9

Qingzheng-Wang mentioned this pull request Aug 17, 2025

LID-8: CI and unit tests #6210

Merged

Qingzheng-Wang added 4 commits August 18, 2025 12:49

Update result table.

47a42f8

Update.

c54fb17

Update supported languages.

40bd676

Improve README.

1099f68

Qingzheng-Wang mentioned this pull request Aug 20, 2025

LID-9: Geolocation-aware LID recipe and codes #6212

Open

Qingzheng-Wang added 2 commits August 19, 2025 21:21

Fix: keep update with early prs.

a34f530

Fix local files.

f73f67b

dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:XXL This PR changes 1000+ lines, ignoring generated files. labels Aug 20, 2025

Qingzheng-Wang and others added 2 commits August 20, 2025 16:28

Merge branch 'master' into lid_release7

8768605

Merge branch 'master' into lid_release7

63d4e7d

Merge branch 'master' into lid_release7

221b0ed

sw005320 requested a review from ftshijt September 10, 2025 13:13

ftshijt reviewed Sep 11, 2025

egs2/voxlingua107/lid1/run.sh (outdated, resolved)

egs2/voxlingua107/lid1/run.sh (outdated, resolved)

Qingzheng-Wang and others added 3 commits September 11, 2025 09:32

Remove nj and ngpu config.

e372460

Co-authored-by: Jiatong <[email protected]>

Rename save_every to checkpoint_interval.

25d542d

Add voxlingua107 item to egs2 readme.

bcf0a22

Qingzheng-Wang and others added 3 commits September 11, 2025 14:27

Fix lengthy lines.

74c9cb2

Merge branch 'master' into lid_release7

0138d81

[pre-commit.ci] auto fixes from pre-commit.com hooks

39684b0

for more information, see https://pre-commit.ci

Fhrozen modified the milestones: v.202509, v.202512 Sep 12, 2025

Merge branch 'master' into lid_release7

022ab14

sw005320 merged commit 9333dcc into espnet:master Sep 12, 2025
29 of 32 checks passed

Fhrozen modified the milestones: v.202512, v.202511 Nov 14, 2025

LID-7: VoxLingua107 recipe #6174
sw005320 merged 36 commits into espnet:master from Qingzheng-Wang:lid_release7

5 participants
