LID-1: Training and task setup #6155
This pull request introduces significant enhancements in the following areas:
- Batch Sampling Enhancements
- Scheduler Integration
- Preprocessor for LID Tasks
Pull Request Overview
Adds foundational support for language identification (LID) training and evaluation in ESPnet by defining a new preprocessor and extending task configuration for sampling strategies.
- Introduce LIDPreprocessor in espnet2/train/preprocessor.py
- Integrate new samplers and schedulers in espnet2/tasks/abs_task.py
- Expose CLI arguments for CategoryPowerSampler and CategoryDatasetPowerSampler
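The conversation does not spell out how CategoryPowerSampler weights categories. As a rough illustration only, the sketch below assumes the usual "power" scheme, in which a category with n utterances is drawn with probability proportional to n ** alpha, so that low-resource languages are upsampled when alpha < 1. The function names, the `alpha` parameter, and the `utt2lang` mapping are hypothetical and are not taken from the ESPnet code.

```python
import random
from collections import Counter


def power_sampling_probs(labels, alpha=0.5):
    """Return {category: probability} with p_c proportional to n_c ** alpha.

    Hypothetical helper for illustration; not the ESPnet implementation.
    alpha=1 keeps the natural distribution, alpha=0 makes it uniform.
    """
    counts = Counter(labels)
    weights = {c: n ** alpha for c, n in counts.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


def sample_batch(utt2lang, batch_size, alpha=0.5, seed=0):
    """Draw one batch of utterance ids: pick a category, then an utterance in it."""
    rng = random.Random(seed)
    # Group utterance ids by their category (here: language label).
    by_cat = {}
    for utt, lang in utt2lang.items():
        by_cat.setdefault(lang, []).append(utt)
    probs = power_sampling_probs(list(utt2lang.values()), alpha)
    cats = sorted(probs)
    chosen = rng.choices(cats, weights=[probs[c] for c in cats], k=batch_size)
    return [rng.choice(by_cat[c]) for c in chosen]


if __name__ == "__main__":
    # Toy data: 9 English utterances and 1 Swahili utterance (assumed labels).
    utt2lang = {f"en_{i}": "en" for i in range(9)}
    utt2lang["sw_0"] = "sw"
    print(power_sampling_probs(list(utt2lang.values()), alpha=0.5))
    print(sample_batch(utt2lang, batch_size=4))
```

With 9 and 1 utterances and alpha = 0.5, the weights are 3 and 1, so the rarer language is drawn a quarter of the time instead of a tenth. Whether the samplers in this PR use exactly this rule should be checked against the code under review.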
Reviewed Changes
Copilot reviewed 2 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| espnet2/train/preprocessor.py | Add LIDPreprocessor class for LID-specific preprocessing |
| espnet2/tasks/abs_task.py | Import new samplers/schedulers and hook up CLI options |
@Qingzheng-Wang, can you try to fix the CI errors for all your PRs?
Yes, I'm fixing them.
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #6155      +/-   ##
==========================================
- Coverage   55.45%   46.52%    -8.94%
==========================================
  Files         882      542     -340
  Lines       82812    49601   -33211
==========================================
- Hits        45927    23079   -22848
+ Misses      36885    26522   -10363
LGTM.
What did you change?
This PR introduces the core training and evaluation pipeline for the LID task in ESPnet, including:
- espnet2/bin/lid_train.py: training entry point
- espnet2/tasks/lid.py: task definition
  - espnet2/tasks/lid.py are split into three parts in three PRs:
- espnet2/train/preprocessor.py: updated to support LID-specific preprocessing
Why did you make this change?
This is the foundational step to enable language identification (LID) training within ESPnet. It defines the task and enables downstream LID model training.
Is your PR small enough?
Yes
Additional Context