chinjouli · 2025-06-23T17:25:25Z

This pull request introduces a new recipe for the IPAPack++ dataset in the ESPnet2 framework, along with several enhancements and configurations to support its usage. The changes include the addition of dataset-specific configurations, utilities, and scripts, as well as updates to the general framework to handle large corpora and improve flexibility in training.

IPAPack++ Recipe Additions:

New Recipe Directory: Added a new directory egs2/ipapack_plus/s2t1/ containing scripts and configurations for the IPAPack++ dataset, including README.md, cmd.sh, conf/, local/utils.py, path.sh, and pyscripts. These files provide the structure and guidelines for training and evaluation on IPAPack++.

Framework Enhancements:

Support for Large Corpus in BPE Training: Introduced a new option --bpe_largecorpus in egs2/TEMPLATE/s2t1/s2t.sh to enable training on extremely large corpora. This includes associated logic for handling large datasets during BPE training.
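A minimal sketch of how a boolean option such as --bpe_largecorpus could be forwarded to BPE training. It assumes the BPE model is trained with SentencePiece, whose trainer accepts the flag --train_extremely_large_corpus. The helper name build_spm_args, its parameters, and the example paths are hypothetical and are not taken from s2t.sh.

```python
from typing import List


def build_spm_args(
    input_path: str,
    model_prefix: str,
    vocab_size: int,
    large_corpus: bool = False,
) -> List[str]:
    """Build SentencePiece trainer flags (hypothetical helper, not from s2t.sh)."""
    args = [
        f"--input={input_path}",
        f"--model_prefix={model_prefix}",
        f"--vocab_size={vocab_size}",
        "--model_type=bpe",
    ]
    if large_corpus:
        # Real SentencePiece trainer flag; mapping --bpe_largecorpus onto it
        # is an assumption about how the recipe wires the option.
        args.append("--train_extremely_large_corpus=true")
    return args


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    flags = build_spm_args("data/train.txt", "exp/bpe", 5000, large_corpus=True)
    print(" ".join(flags))
```

Keeping the flag optional leaves the default behaviour unchanged for recipes whose corpora fit the standard trainer settings.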

Dataset Integration:

IPAPack++ Dataset Registration: Added the IPAPack++ dataset to the list of supported datasets in egs2/README.md and updated egs2/TEMPLATE/asr1/db.sh to include a placeholder for its download directory.

Configuration Updates:

Audio and Feature Extraction: Added configurations for feature extraction (fbank.conf and pitch.conf) and decoding (decode_s2t_pr.yaml) tailored for the IPAPack++ dataset.
Training Configurations: Introduced a new training configuration file train_s2t_ebf_conv2d_size768_e9_d9_piecewise_lr5e-4_warmup60k_flashattn.yaml optimized for multilingual phone recognition tasks.
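The configuration file name suggests a piecewise warmup schedule with a peak learning rate of 5e-4 reached after 60k steps. The sketch below illustrates that idea only: the intermediate breakpoint (30k steps, 5e-5), the inverse-square-root decay after warmup, and the function name are assumptions, not values read from the YAML file.

```python
from bisect import bisect_right
from typing import Sequence


def piecewise_linear_warmup_lr(
    step: int,
    warmup_steps: Sequence[int],
    warmup_lrs: Sequence[float],
) -> float:
    """Learning rate at `step` for a piecewise-linear warmup (illustrative)."""
    if len(warmup_steps) != len(warmup_lrs) or len(warmup_steps) < 2:
        raise ValueError("need at least two matching breakpoints")
    if step <= warmup_steps[0]:
        return warmup_lrs[0]
    last_step, peak_lr = warmup_steps[-1], warmup_lrs[-1]
    if step >= last_step:
        # Assumed decay after warmup: proportional to 1 / sqrt(step).
        return peak_lr * (last_step / step) ** 0.5
    # Index of the first breakpoint strictly greater than `step`.
    i = bisect_right(warmup_steps, step)
    s0, s1 = warmup_steps[i - 1], warmup_steps[i]
    l0, l1 = warmup_lrs[i - 1], warmup_lrs[i]
    # Linear interpolation between the two surrounding breakpoints.
    return l0 + (l1 - l0) * (step - s0) / (s1 - s0)


if __name__ == "__main__":
    # Hypothetical breakpoints; only 5e-4 and 60k come from the file name.
    steps = (0, 30_000, 60_000)
    lrs = (0.0, 5e-5, 5e-4)
    for s in (0, 15_000, 45_000, 60_000, 240_000):
        print(s, piecewise_linear_warmup_lr(s, steps, lrs))
```

A slow first segment followed by a steeper one is a common way to stabilise early training of large encoder-decoder models before reaching the peak rate.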

Utilities and Symbols:

Language and Phoneme Support: Added a utility file local/utils.py with definitions for shared symbols, task tokens, supported languages, and phoneme vocabulary to streamline multilingual and phoneme-based processing.
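As an illustration of how such a utility module might be organised, the sketch below groups special symbols, task tokens, language tokens, and phonemes into one token list. Every token spelling, language code, and function name here is a hypothetical placeholder and is not copied from local/utils.py.

```python
from typing import Dict, List, Sequence

# All values below are hypothetical placeholders for illustration.
SPECIAL_SYMBOLS = ["<blank>", "<unk>", "<sos>", "<eos>"]
TASK_TOKENS = ["<asr>", "<pr>"]
LANGUAGES = ["eng", "cmn", "spa"]
PHONEMES = ["a", "i", "u", "p", "t", "k"]


def build_token_list(
    specials: Sequence[str] = SPECIAL_SYMBOLS,
    tasks: Sequence[str] = TASK_TOKENS,
    languages: Sequence[str] = LANGUAGES,
    phonemes: Sequence[str] = PHONEMES,
) -> List[str]:
    """Concatenate all symbol groups into one ordered vocabulary."""
    lang_tokens = [f"<{lang}>" for lang in languages]
    tokens = [*specials, *tasks, *lang_tokens, *phonemes]
    if len(set(tokens)) != len(tokens):
        raise ValueError("duplicate token in vocabulary")
    return tokens


def token_to_id(tokens: Sequence[str]) -> Dict[str, int]:
    """Map each token to its index in the vocabulary."""
    return {tok: idx for idx, tok in enumerate(tokens)}


if __name__ == "__main__":
    vocab = build_token_list()
    ids = token_to_id(vocab)
    print(len(vocab), ids["<pr>"])
```

Defining the groups in one place keeps data preparation, training, and decoding consistent about which symbols are reserved and which are phonemes.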

What did you change?

New s2t recipe for IPAPack++

Why did you make this change?

Provide a basic setup for developing a multitask phone recognition model

Is your PR small enough?

Yes

Additional Context

Related to #6169

Copilot

Pull Request Overview

This pull request adds a new S2T recipe for the IPAPack++ dataset and integrates it into the existing ESPnet2 framework while also enhancing related training and configuration options.

New recipe directory with dataset-specific utilities, scripts, and configurations
Integration of IPAPack++ into dataset registration and ASR database
Enhancement of BPE training support by introducing the --bpe_largecorpus option

Reviewed Changes

Copilot reviewed 20 out of 20 changed files in this pull request and generated no comments.

File	Description
egs2/ipapack_plus/s2t1/utils	Symlink pointing to the TEMPLATE utilities
egs2/ipapack_plus/s2t1/scripts	Symlink pointing to the TEMPLATE scripts
egs2/ipapack_plus/s2t1/s2t.sh	Symlink for S2T script from TEMPLATE
egs2/ipapack_plus/s2t1/run.sh	New run script setting up training and inference parameters
egs2/ipapack_plus/s2t1/pyscripts	Symlink pointing to the TEMPLATE Python scripts
egs2/ipapack_plus/s2t1/path.sh	Symlink pointing to the TEMPLATE path script
egs2/ipapack_plus/s2t1/local/utils.py	Utility file defining symbols, language tokens, and phoneme vocabulary
egs2/ipapack_plus/s2t1/db.sh	Symlink pointing to the TEMPLATE database script
egs2/ipapack_plus/s2t1/conf/*	New configuration files for tuning, slurm, queue, pitch, pbs, fbank, and decode setups
egs2/ipapack_plus/s2t1/cmd.sh	Command management script for job scheduling and execution
egs2/ipapack_plus/s2t1/README.md	Recipe documentation including data prep and training guidelines
egs2/TEMPLATE/s2t1/s2t.sh	Updates to add the --bpe_largecorpus option for large corpus support
egs2/TEMPLATE/asr1/db.sh	Database script update to include IPAPack++ dataset entry
egs2/README.md	General dataset registration extended to support IPAPack++

sw005320 · 2025-06-29T07:04:57Z

Please fix https://github.com/espnet/espnet/actions/runs/15950583079/job/44989986372?pr=6168#step:8:682

chinjouli and others added 7 commits June 23, 2025 12:32

ipapack_plus init commit

a5bf0a0

add ipapack_plus

422079a

Delete egs2/ipapack_plus/s2t1/local/all_symbols

0dbef12

Delete egs2/ipapack_plus/s2t1/local/bad_symbols

d9d8791

remove data checking files

4b17b2c

remove dataprep for PR1

946a733

balancing PR file count

ab85bdf

dosubot bot added the size:XL label (this PR changes 500-999 lines, ignoring generated files) Jun 23, 2025

mergify bot added the ESPnet2 and README labels Jun 23, 2025

dosubot bot added the Recipe label Jun 23, 2025

[pre-commit.ci] auto fixes from pre-commit.com hooks

cd3f62c

for more information, see https://pre-commit.ci

chinjouli mentioned this pull request Jun 23, 2025

S2T Recipe for IPAPack++: Data Preparation #6169

Merged

sw005320 requested a review from Copilot June 23, 2025 17:34

sw005320 added this to the v.202506 milestone Jun 23, 2025

Copilot AI reviewed Jun 23, 2025

chinjouli and others added 2 commits June 23, 2025 20:01

Update README.md

3c08bf4

fix slurm.conf; add xavier_uniform init

c675f5a

sw005320 reviewed Jun 27, 2025

egs2/ipapack_plus/s2t1/run.sh (outdated, resolved)

sw005320 reviewed Jun 27, 2025

egs2/ipapack_plus/s2t1/README.md (resolved)

chinjouli and others added 2 commits June 28, 2025 22:23

comment out nodelist; readme for ipa_mapping

e0d5c44

[pre-commit.ci] auto fixes from pre-commit.com hooks

b75f5d2

for more information, see https://pre-commit.ci

fix nodelist

c5bf38a

sw005320 merged commit 333b6f7 into espnet:master Jun 30, 2025
38 checks passed

chinjouli deleted the ipapack_recipe branch October 27, 2025 05:13

S2T Recipe for IPAPack++: main recipe #6168

sw005320 merged 13 commits into espnet:master from chinjouli:ipapack_recipe

3 participants
