Releases: espnet/espnet
ESPnet version 202511
Summary
Release date: 2025‑11‑17
Milestone: 202511 – A version that brings a large number of new features, stability improvements, and a refreshed CI & Docker workflow.
ESPnet 202511 introduces robust parallel processing primitives, a fully‑refactored inference & evaluation pipeline, extensive SpeechLM support, and a modernized Docker/CI stack. The release also resolves lingering bugs in codec EMA logic, MPS device handling, and category‑balanced batching while tightening dependency management and documentation quality.
Highlighted Pull Requests
| # | Title | Category | Key Impact |
|---|---|---|---|
| 6300 | Bump js‑yaml from 4.1.0 to 4.1.1 in /doc/vuepress | Dep‑Update | Secures the documentation build against a prototype‑pollution CVE in yaml merge |
| 6284 | codec fix: DDP logic and dead code revival logic | Bugfix | Restores EMA state for dead‑code recovery and synchronizes codec updates across all DDP workers |
| 6286 | [SpeechLM] Deepspeed trainer | New Feature | Adds full DeepSpeed support (train.py + deepspeed_trainer.py) for large‑scale SpeechLM training |
| 6279 | [SpeechLM] model, preprocessor and collect_stats | New Feature | Core SpeechLM components – job templates, preprocessing, multimodal IO, and stats collection |
| 6278 | [SpeechLM] Deepspeed trainer | New Feature | See above – DeepSpeed integration for SpeechLM workflows |
| 6276 | Docker Updates | Refactor | Upgrades to Ubuntu 24.04, CUDA 12.6, and PyTorch 2.8.0, and transitions to Miniforge; modernizes Dockerfile syntax |
| 6275 | CI Installation fix | Bugfix | Adds --no-build-isolation for editable installs, improving reproducibility across CI environments |
| 6273 | [ESPnet‑Codec] Bug fix on codec activation function | Bugfix | Enables BF16 inference by registering torch.ones for auto‑cast |
| 6272 | Add Pytorch version 2.9 | Dep‑Update | Extends supported PyTorch releases (2.5.1, 2.7.1, 2.8.0, 2.9.0) in CI and docs |
| 6263 | [ESPnet‑3] Merge master into espnet3 branch | Merge | Syncs espnet3 with master, fixing CI and dependency mismatches |
| 6257 | SpeechLM Data Infra: dataset management | New Feature | Implements data registry, dataset loaders, and configuration templates for SpeechLM |
| 6259 | pre‑commit.ci autoupdate | Tooling | Updates black and isort to latest stable versions |
| 6255 | Fix default batch sampler fallback for category iterator | Bugfix | Restores legacy folded → catbel mapping, improving backward compatibility |
| 6253 | Restrict Docker Github Actions to Original Repo | Security | Prevents accidental image publishing from forks or non‑master branches |
| 6249 | [espnet3‑7] Add Callbacks | New Feature | Adds AverageCheckpointsCallback and standard callback factory for Lightning trainers |
| 6248 | Get forced alignments from CTC model | Feature | Enables forced alignment extraction for any CTC‑based S2T model |
| 6246 | MPS Support for loading float64 models | Bugfix | Handles float‑64 to float‑32 conversion for MPS device, avoiding dtype errors |
| 6240 | Package Build Patch | Build | Moves g2p_en & ctc‑segmentation installation to Makefile, fixing pip package build |
| 6239 | [espnet‑3] Merge master into espnet3 and fixed CI | Merge | Syncs espnet3 with master, removing underthesea dependency |
| 6238 | Upgrade pyopenjtalk to 0.4.1 | Dep‑Update | Updates pyopenjtalk installer to the latest version |
| 6227 | Terry/parallelize spk emb extraction | Feature | Parallel speaker‑embedding extraction for TTS recipes |
| 6210 | LID‑8: CI and unit tests | Test | Adds comprehensive unit tests for LID functionality |
| 6179 | [espnet3] ESPnet1 Support Sunset | Refactor | Removes legacy ESPnet1 support, consolidates to espnet2.legacy |
| 6178 | [espnet3‑6] Add evaluation scripts | Feature | Modularizes inference & evaluation pipelines in espnet3 |
| 6177 | Merge master into espnet3 | Merge | Syncs espnet3 with master, fixing CI issues |
| 6175 | [espnet3‑5] Add parallel module and collect_stats | Feature | Adds Dask‑based parallel processing and collect_stats for data stats collection |
| 6174 | LID‑7: VoxLingua107 recipe | Recipe | Adds a new spoken‑language‑identification recipe for VoxLingua107 |
Note: The table above summarizes the most impactful PRs for this release. Several PRs are grouped by shared functionality (e.g., SpeechLM, Docker, and LID). Contributors for these changes include dependabot[bot], whr-a, chinjouli, jctian98, Fhrozen, Masao-Someki, KanTakahiro, akreal, pre-commit-ci[bot], Qingzheng-Wang, Shikhar-S, SanderGi, sw005320, and ZhuoyanTao.
Key Takeaways
- Parallelism & Scalability – Dask‑based `espnet3.parallel`, `collect_stats`, and new callbacks enable efficient distributed training, inference, and checkpoint ensembling.
- SpeechLM Maturity – Core modules, DeepSpeed integration, multimodal IO, and data infrastructure create a solid foundation for large‑scale speech‑language models.
- Stability & Security – Updated dependencies (js‑yaml, PyTorch), Docker images rebuilt on Ubuntu 24.04 with CUDA 12.6 and Miniforge; bugfixes for codec EMA, MPS device handling, and category sampling.
- CI & Packaging – Modernized GitHub Actions, improved pip install flags, and new Docker images for Ubuntu 24.04.
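The checkpoint averaging behind the new `AverageCheckpointsCallback` can be illustrated with a small sketch. This is not the callback's actual implementation: the function name `average_checkpoints` is hypothetical, and plain Python lists stand in for model tensors. It only shows the general idea of taking an element-wise mean over several saved parameter sets.

```python
from typing import Dict, List, Sequence


def average_checkpoints(
    states: Sequence[Dict[str, List[float]]],
) -> Dict[str, List[float]]:
    """Return the element-wise mean of parameter dicts with identical keys.

    Hypothetical helper for illustration; lists stand in for tensors.
    """
    if not states:
        raise ValueError("need at least one checkpoint")
    keys = states[0].keys()
    for state in states[1:]:
        if state.keys() != keys:
            raise ValueError("checkpoints must share the same parameter names")
    n = len(states)
    # zip(*...) walks the same position of each checkpoint's parameter in lockstep.
    return {
        key: [sum(vals) / n for vals in zip(*(state[key] for state in states))]
        for key in keys
    }


# Three toy "checkpoints" with one weight vector and one bias each.
ckpts = [
    {"w": [1.0, 2.0], "b": [0.0]},
    {"w": [3.0, 4.0], "b": [1.0]},
    {"w": [5.0, 6.0], "b": [2.0]},
]
averaged = average_checkpoints(ckpts)
```

Averaging the last few (or best few) checkpoints is a common way to reduce variance between individual training snapshots; which checkpoints the callback selects is not specified in these notes.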
What's Changed (Full changelog)
New Features
- [SpeechLM] model, preprocessor and collect_stats (See #6279, by @jctian98)
- [SpeechLM] Deepspeed trainer (See #6278, by @jctian98)
- SpeechLM Data Infra: multimodal IO (See #6258, by @jctian98)
- [espnet3-7] Add Callbacks (See #6249, by @Masao-Someki)
Recipe
- POWSM-2: update code for data preparation (See #6283, by @chinjouli)
- POWSM-1: renaming directory (See #6282, by @chinjouli)
- SpeechLM Data Infra: Data batchfy, sampling and iterator (See #6260, by @jctian98)
- SpeechLM Data Infra: dataset management (See #6257, by @jctian98)
- Update wham_noise link for LibriMix Recipe (See #6251, by @Fhrozen)
- LID-7: VoxLingua107 recipe (See #6174, by @Qingzheng-Wang)
Bugfix
- [espnet3-8] Bugfix for recipe (See #6270, by @Masao-Someki)
- Fix HF tests by switching them to upstream testing models (See #6261, by @akreal)
- Fix default batch sampler fallback for category iterator (See #6255, by @Qingzheng-Wang)
Documentation
- Bump js-yaml from 4.1.0 to 4.1.1 in /doc/vuepress (See #6300, by @dependabot[bot])
- [espnet3-5] (2) Add parallel module and collect_stats (See #6242, by @Masao-Someki)
- [Doc 1] Add AI-gen documentation to espnetez (See #6241, by @Fhrozen)
- [espnet-3] Merge master into espnet3 and fixed CI (See #6239, by @Masao-Someki)
Refactoring
- [espnet3] ESPnet1 Support Sunset and Migration to `espnet2.legacy` (See #6179, by @Masao-Someki)
Others
- codec fix: DDP logic and dead code revival logic (See #6284, by @whr-a)
- [SpeechLM] Minor fix on data loading (See #6280, by @jctian98)
- Docker Updates (See #6276, by @Fhrozen)
- CI Installation fix (See #6275, by @Fhrozen)
- [ESPnet-Codec] Bug fix on codec activation function (See #6273, by @jctian98)
- Add Pytorch version 2.9 (See #6272, by @Fhrozen)
- Codec codebase bug fixes: `detach()` in RVQ residual and `target_bandwidth` in inference (See #6268, by @whr-a)
- Add support for MPS devices in CTC prefix scoring (See #6266, by @KanTakahiro)
- [ESPnet-3] Merge master into espnet3 branch (See #6263, by @Masao-Someki)
- [pre-commit.ci] pre-commit autoupdate (See #6259, by @pre-commit-ci[bot])
- Restrict Docker Github Actions to Original Repo (See #6253, by @Fhrozen)
- Get forced alignments from CTC model (See #6248, by @Shikhar-S)
- MPS Support for loading float64 models like OWSM as float32 (See #6246, by @SanderGi)
- Package Build Patch (See #6240, by @Fhrozen)
- Upgrade pyopenjtalk to version 0.4.1 (See #6238, by @sw005320)
- Terry/parallelize spk emb extraction (See #6227, by @ZhuoyanTao)
- LID-8: CI and unit tests (See #6210, by @Qingzheng-Wang)
- [espnet3-6] Add evaluation scripts (See #6178, by @Masao-Someki)
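For readers unfamiliar with forced alignment from a CTC model (#6248 above), the following is a minimal sketch of the standard Viterbi search over a blank-extended label sequence. It is not the code added in that PR: the function name `ctc_forced_align`, its arguments, and the toy posteriors are all assumptions made for illustration, and plain nested lists replace tensors.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def ctc_forced_align(
    log_probs: Sequence[Sequence[float]], targets: Sequence[int], blank: int = 0
) -> List[int]:
    """Return the most likely per-frame label path that collapses to `targets`."""
    # Interleave blanks: [blank, t1, blank, t2, ..., blank].
    ext = [blank]
    for tok in targets:
        ext += [tok, blank]
    n_frames, n_states = len(log_probs), len(ext)
    score = [[NEG_INF] * n_states for _ in range(n_frames)]
    back = [[0] * n_states for _ in range(n_frames)]
    score[0][0] = log_probs[0][blank]
    if n_states > 1:
        score[0][1] = log_probs[0][ext[1]]
    for t in range(1, n_frames):
        for s in range(n_states):
            cands = [(score[t - 1][s], s)]  # stay in the same state
            if s >= 1:
                cands.append((score[t - 1][s - 1], s - 1))  # advance by one
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append((score[t - 1][s - 2], s - 2))
            best, prev = max(cands)
            if best == NEG_INF:
                continue
            score[t][s] = best + log_probs[t][ext[s]]
            back[t][s] = prev
    # A valid path ends in the last label or the trailing blank.
    ends = [n_states - 1] + ([n_states - 2] if n_states > 1 else [])
    s = max(ends, key=lambda i: score[n_frames - 1][i])
    if score[n_frames - 1][s] == NEG_INF:
        raise ValueError("not enough frames to align the target sequence")
    path = [blank] * n_frames
    for t in range(n_frames - 1, -1, -1):
        path[t] = ext[s]
        s = back[t][s]
    return path


# Toy posteriors over [blank, 'a', 'b'] for four frames.
probs = [[0.1, 0.8, 0.1], [0.1, 0.8, 0.1], [0.8, 0.1, 0.1], [0.1, 0.1, 0.8]]
toy_log_probs = [[math.log(p) for p in frame] for frame in probs]
alignment = ctc_forced_align(toy_log_probs, targets=[1, 2])
```

Here `alignment` assigns one label id per frame; converting frame indices to timestamps would additionally require the model's frame shift, which is not given in these notes.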
Acknowledgements
@Fhrozen, @KanTakahiro, @Masao-Someki, @Qingzheng-Wang, @SanderGi, @Shikhar-S, @ZhuoyanTao, @akreal, @chinjouli, @dependabot[bot], @jctian98, @sw005320, @whr-a.
ESPnet version 202509
Summary
The 202509 release strengthens ESPnet’s foundation on modern OS and Python environments, enhances training flexibility, and completes the LID ecosystem. With a broad suite of new recipes, we continue to support emerging speech‑related benchmarks and models while ensuring CI stability and developer productivity.
Overview
The 202509 release brings a major shift in our infrastructure and tooling. Key highlights include:
| Area | Change |
|---|---|
| Python / Dependencies | Dropped Python 3.7/3.8, upgraded to Python 3.9–3.13; numpy bumped to ≥ 2.2.0; removed Chainer‑related build steps. |
| OS Support | Ended Debian 11 support, switched CI to Debian 12 containers (ensuring GCC 11+ compatibility). |
| Warp‑Transducer | Adopted ljn7/warp-transducer (FastEmit, modern CUDA/CMake). |
| LID | Completed the LID subsystem (model, loss, pooling, balanced sampler, tri‑stage scheduler, inference tools). |
| Training | Added HybridOptim/HybridLRS for multi‑optimizer and scheduler configurations. |
| Recipes | New recipes for LongLibriHeavy, Qwen2‑Audio‑7B‑Chat, OWSM v4, Galaxy AVSR, and a LID template. |
| CI / DevOps | Updated Docker publishing, added automerge workflow, fixed GPU flag handling, and added a guard against conflicting speaker options. |
The release is backed by 10 core contributors, each driving critical modules or infrastructure changes.
Important Pull Requests
| # | Category | Title | Key Impact |
|---|---|---|---|
| 6228 | Deprecation | EOL of Debian 11 support in favor of Debian 12 | CI now runs on Debian 12; badges and docs updated; eliminates GCC‑10 GLIBCXX limitation. |
| 6226 | Bug‑Fix | Fix GPU flag handling in tts.sh script | Prevents Versa from erroneously enabling GPU when gpu_inference is False. |
| 6221 | Dependency | Update numpy version | Upgrades numpy to ≥ 2.2.0, raises the minimum Python version to 3.9, removes Chainer artifacts. |
| 6220 | Refactor | Remove old speechlm module | Cleaned out obsolete SpeechLM code. |
| 6187 | Core | Switch warp‑transducer to ljn7 fork | Adds FastEmit support, broader CUDA/CMake compatibility, improved build scripts. |
| 6159 | Core | LID‑5: Tri‑stage learning rate scheduler | Stabilizes training with warm‑up / hold / decay phases. |
| 6158 | Core | LID‑4: Category‑ and dataset‑aware balanced sampler | Addresses language and dataset imbalance via power‑law sampling. |
| 6156 | Feature | LID‑2: Model, loss and pooling modules | Introduces language‑identification models, custom losses, and pooling strategies. |
| 6208 | Bug‑Fix | Guard against having both use_sid and use_spk_embed set to true | Prevents conflicting speaker‑ID/embedding settings. |
| 6206 | DevOps | Add environment tag to publish docker image | Clarifies Docker publishing workflow. |
| 6202 | DevOps | Add Automerge Action | Enables automated PR merging with label and review checks. |
| 6205 | Recipe | LongLibriHeavy benchmark | Provides long‑form speech evaluation baseline. |
| 6194 | Recipe | Add recipe for Qwen2‑Audio‑7B‑Chat | Baseline for Dynamic‑SUPERB ASR task. |
| 6176 | Recipe | OWSM v4 Recipe | Adds OWSM v4 training/configuration. |
| 6160 | Recipe | LID‑6: LID recipe template | Offers ready‑to‑use LID experiment scaffold. |
| 6173 | Feature | [espnet3‑4] Add support for multiple optimizers and schedulers | Unified handling of multiple optimizers/schedulers (HybridOptim/HybridLRS). |
| 6172 | Feature | Additional integration for multi‑optimizer support (see linked PRs) | Supports advanced training strategies. |
| 6132 | Recipe | AVSR recipe for Galaxy Dataset | Adds AVSR training capability for Galaxy. |
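The warm-up / hold / decay shape of the tri-stage scheduler listed in the table (LID‑5) can be sketched as a plain function of the step count. This is an illustration only: the name `tri_stage_lr`, its parameters, the linear warm-up, and the exponential decay are assumptions, since the release notes name the three phases but not their exact form.

```python
import math


def tri_stage_lr(
    step: int,
    peak_lr: float,
    warmup_steps: int,
    hold_steps: int,
    decay_steps: int,
    init_scale: float = 0.01,
    final_scale: float = 0.01,
) -> float:
    """Learning rate at `step` for a warm-up / hold / decay schedule (sketch)."""
    init_lr = peak_lr * init_scale
    final_lr = peak_lr * final_scale
    # Stage 1: linear warm-up from init_lr to peak_lr (assumed shape).
    if step < warmup_steps:
        return init_lr + (peak_lr - init_lr) * step / warmup_steps
    step -= warmup_steps
    # Stage 2: hold at the peak learning rate.
    if step < hold_steps:
        return peak_lr
    step -= hold_steps
    # Stage 3: exponential decay from peak_lr towards final_lr (assumed shape).
    if step < decay_steps:
        return peak_lr * math.exp(math.log(final_scale) * step / decay_steps)
    # After the decay phase the rate stays at final_lr.
    return final_lr


# Learning rates over a 10 / 5 / 20 step schedule with peak 1.0.
schedule = [tri_stage_lr(s, 1.0, 10, 5, 20) for s in range(40)]
```

Holding the peak rate for a while before decaying is what distinguishes this shape from a plain warm-up-then-decay schedule.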
Full changelog
What's Changed
New Features
- LID-2: Model, loss and pooling modules (See #6156, by @Qingzheng-Wang)
Enhancement
- [espnet3-4] Add support for multiple optimizers and schedulers (See #6173, by @Masao-Someki)
- LID-4: Category- and dataset-aware balanced sampler (See #6158, by @Qingzheng-Wang)
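The power-law sampling used by the balanced sampler (LID-4) can be illustrated with a short sketch. The formula below, where each category is drawn with probability proportional to its utterance count raised to an exponent, is a common formulation and an assumption here; the function name `power_law_weights`, the exponent name `alpha`, and the toy counts are hypothetical and not taken from the PR.

```python
import random
from typing import Dict


def power_law_weights(counts: Dict[str, int], alpha: float) -> Dict[str, float]:
    """Sampling probability per category, proportional to count ** alpha.

    alpha = 1.0 keeps the natural distribution; alpha = 0.0 is uniform.
    """
    if any(n <= 0 for n in counts.values()):
        raise ValueError("every category needs a positive count")
    scaled = {name: n ** alpha for name, n in counts.items()}
    total = sum(scaled.values())
    return {name: value / total for name, value in scaled.items()}


# Toy language counts: one high-resource and one low-resource language.
counts = {"en": 900, "sw": 100}
natural = power_law_weights(counts, alpha=1.0)
smoothed = power_law_weights(counts, alpha=0.5)
uniform = power_law_weights(counts, alpha=0.0)

# Draw a batch of category labels with the smoothed distribution.
rng = random.Random(0)
batch = rng.choices(list(smoothed), weights=list(smoothed.values()), k=8)
```

With `alpha=0.5` the low-resource language rises from 10% to 25% of draws in this toy case, which is the kind of rebalancing the sampler aims for across languages and datasets.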
Recipe
- LongLibriHeavy benchmark (Basic Recipe without training for now) (See #6205, by @Miamoto)
- Add recipe for Qwen2-Audio-7B-Chat on Dynamic-SUPERB ASR task (See #6194, by @cyhuang-tw)
- OWSM v4 Recipe (See #6176, by @pyf98)
- LID-6: LID recipe template (See #6160, by @Qingzheng-Wang)
- AVSR recipe for Galaxy Dataset (See #6132, by @YJCX330)
Documentation
Refactoring
Others
- Fix GPU flag handling in tts.sh script (See #6226, by @ZhuoyanTao)
- Update numpy version (See #6221, by @Fhrozen)
- Add guard against having both use_sid and use_spk_embed set to true (See #6208, by @ZhuoyanTao)
- Add environment tag to publish docker image (See #6206, by @Fhrozen)
- Add Automerge Action (See #6202, by @Fhrozen)
- Switch warp-transducer to ljn7 fork with FastEmit and modern CUDA/CMa… (See #6187, by @ljn7)
- LID-5: Tri-stage learning rate scheduler (See #6159, by @Qingzheng-Wang)
- LID-3: Inference, embedding extraction and t-SNE visualization (See #6157, by @Qingzheng-Wang)
Acknowledgements
@Fhrozen, @Masao-Someki, @Miamoto, @Qingzheng-Wang, @YJCX330, @ZhuoyanTao, @cyhuang-tw, @jctian98, @ljn7, @pyf98.
Happy coding! 🚀
ESPnet version 202506
New Features
- [New Features][ESPnet2][ESPnet3][CI][size:XXL][lgtm] [espnet3-3] Add trainer and model #6172 by @Masao-Someki
- [New Features][ESPnet3][CI][size:XXL][lgtm] [espnet3-1] Add Data Organizer #6167 by @Masao-Someki
- [New Features][ESPnet2][size:XL] LID-1: Training and task setup #6155 by @Qingzheng-Wang
- [New Features][ESPnet2][SID][size:XL] Update SPK recipe for CN-celeb #6154 by @holvan
- [New Features][ESPnet2][SLU] Add code for training turn taking prediction model #5948 by @siddhu001
Recipe
- [Recipe][ESPnet2][size:XXL] S2T Recipe for IPAPack++: Data Preparation #6169 by @chinjouli
- [Recipe][ESPnet2][size:XL] S2T Recipe for IPAPack++: main recipe #6168 by @chinjouli
- [Recipe][ESPnet2][Codec] add: complete codec1 recipe for AudioSet and musdb18 #6068 by @whr-a
- [Recipe][ESPnet2][ASR] Additional results for the discrete ASR challenge #6067 by @juice500ml
- [Recipe][ESPnet2][Installation][SE] Add implementations of USES2 speech enhancement models #5761 by @Emrys365
Bugfix
- [Bugfix][ESPnet2][size:XS] Fix FutureWarning `torch.cuda.amp.autocast(args...)` is deprecated #6190 by @KanTakahiro
- [Bugfix][ESPnet2][ESPnet1] Resolve logger warnings #6117 by @emmanuel-ferdman
- [Bugfix][ESPnet2] Fix for issue #6112 Lagacy torch tensor constructor causes issue when… #6114 by @advaitvd
Documentation
- [Documentation][ESPnet1][size:S] docs: clarify CBHG encoder vs post‑net roles in Tacotron 1 #6188 by @ZhuoyanTao
- [Documentation][ESPnet3][Docker][CI][size:L] Add devcontainer change from Espnet3 #6145 by @sw005320
- [Documentation][CI][size:M] Update PULL_REQUEST_TEMPLATE.md #6144 by @sw005320
- [Documentation][CI][size:M] Update document to add tutorials + more easy connection to installation #6143 by @juice500ml
- [Documentation][ESPnet3][Docker][size:L][lgtm] Espnet3/devcontainer #6141 by @Masao-Someki
- [Documentation][Installation] Update Makefile #6124 by @sw005320
Refactoring
- [Refactoring][ESPnet2][size:L] Refactor ACESinger's audio segmentation #6151 by @Arllan-lanliu
- [Refactoring][ESPnet2][ESPnet1][CI][size:L][lgtm] Flake8 CI Fixes #6140 by @Fhrozen
Others
- [Others][CI][size:S][lgtm] Workaround for shellcheck v0.11.0 #6197 by @Masao-Someki
- [Others][Installation][size:XS] Update transformers installation #6191 by @Fhrozen
- [Others][ESPnet3][CI][size:L] [espnet3-2] Add Config Loading script #6171 by @Masao-Someki
- [Others][ESPnet2][ESPnet1][ESPnetEZ][Installation][size:L] [espnet3] Format files #6164 by @Masao-Someki
- [Others][ESPnet2][SE] Update BSRNN implementations to support more flexible band-split schemes #6123 by @Emrys365
- [Others][ESPnet2][Music] [SVS1] SingingGenerate and VISinger Inference Fix #6113 by @HANJionghao
- [Others][CI] FIX CI test_import #6111 by @Fhrozen
- [Others][ESPnet2] [Recipe] Create inference recipe for non-native English ASR benchmark (ALLSSTAR) #6110 by @chenehk
- [Others][Docker][Installation][CI] Torch Version Update #6095 by @Fhrozen
- [Others][ESPnet2][ASR] Add explicit typecheck for warning msg #6082 by @ftshijt
- [Others][ESPnet2][ESPnet1][SSL][size:XL] SSL Fine-tuning PR #6069 by @wanchichen
New Contributors
- @Arllan-lanliu made their first contribution in #6090
- @chinjouli made their first contribution in #6109
- @chenehk made their first contribution in #6110
- @advaitvd made their first contribution in #6114
- @whr-a made their first contribution in #6068
- @holvan made their first contribution in #6126
- @Qingzheng-Wang made their first contribution in #6155
- @ZhuoyanTao made their first contribution in #6188
- @KanTakahiro made their first contribution in #6190
Acknowledgements
Special thanks to @Arllan-lanliu, @Emrys365, @Fhrozen, @HANJionghao, @KanTakahiro, @Masao-Someki, @Qingzheng-Wang, @ZhuoyanTao, @advaitvd, @chenehk, @chinjouli, @emmanuel-ferdman, @ftshijt, @holvan, @juice500ml, @siddhu001, @sw005320, @wanchichen, @whr-a.
Full Changelog: v.202503...v.202506
ESPnet version 202503
New Features
Enhancement
- [Enhancement][ESPnet2][ESPnet1][OWSM] Improving efficiency of large-scale training #6024 by @pyf98
- [Enhancement][ESPnet2][Codec] Update scoring config to support WER/CER information with VERSA #6001 by @ftshijt
- [Enhancement][ESPnet1] Add Scaled Dot Product Attention (SDPA) from PyTorch #5994 by @pyf98
- [Enhancement][ESPnet2][ESPnet1][Installation] Support PyTorch Lightning Trainer in ESPnet2 #5954 by @pyf98
Recipe
- [Recipe][ESPnet2][ASR] cmu_kids #6017 by @wangpuup
- [Recipe][ESPnet2][ASR] EDACC dataset automatic speech recognition #5996 by @uwanny
- [Recipe][ESPnet2][ASR] ml-superb 2024 recipe #5989 by @wanchichen
- [Recipe][ESPnet2] Clotho_v2 Audio Captioning (DCASE 2023 implementation) #5967 by @Shikhar-S
Bugfix
- [Bugfix][Installation] Downgrade Transformers version #6071 by @Fhrozen
- [Bugfix][ESPnet2] Docs Fix #6065 by @Fhrozen
- [Bugfix][ESPnet2][ST] A quick fix for type error when dealing with multi-decoder (ST) #6064 by @ftshijt
- [Bugfix][ESPnet2][SID] fixed few typos on egs2/spk template #6060 by @yigitcatak
- [Bugfix][ESPnet2] Bugfix #6057 #6058 by @Masao-Someki
- [Bugfix][ESPnet2][SID] fix some minor errors in SID recipe #6045 by @shimhz
- [Bugfix][ESPnet2] Fix the deprecated amp interface #6036 by @ftshijt
- [Bugfix][ESPnet2] Add explicit weights_only=False for checkpoint loading #6035 by @ftshijt
- [Bugfix][Installation] Fix boost URL #6034 by @sw005320
- [Bugfix][Installation] Fix minor bug in Makefile #6031 by @juice500ml
- [Bugfix][ESPnet2] Logging bugfix, skip import #6023 by @Shikhar-S
- [Bugfix][ESPnet2][OWSM] Fix minor bug in OWSM-CTC preprocessor #6005 by @pyf98
- [Bugfix][ESPnet2][ASR] Minor formatting fixes in mlsuperb 2 recipe #6003 by @wanchichen
Documentation
Others
- [Others][Installation] Transformers version check #6076 by @Fhrozen
- [Others][ESPnet2][ESPnet1] New SSL Recipe #6053 by @wanchichen
- [Others][Installation] Update tools/README.md #6030 by @popcornell
- [Others][ESPnet2][OWSM] doc: update OWSM data preparation instructions #6026 by @kalvinchang
- [Others][ESPnet2][OWSM] fix: OWSM v3.1 - remove flash attention args #6025 by @kalvinchang
- [Others][ESPnet2][SED] BEATs Tokenizer Inference #6008 by @Shikhar-S
- [Others][ESPnet2][ESPnet1] Implement unified batch decode interface for OWSM-CTC #6007 by @pyf98
- [Others][ESPnet2][TTS] [feature]finish versa eval in TTS recipe #6002 by @Whale-Dolphin
- [Others][ESPnet2][ESPnet1][Installation][CI][SED] Classification Task and AudioSet-20K #5998 by @Shikhar-S
- [Others][ESPnet2][ESPnet1][Installation][CI] remove gtn in setup.py #5982 by @sw005320
- [Others][ESPnet2][ESPnet1][SED] ESC-50 classification with BEATs #5977 by @Shikhar-S
- [Others][ESPnet2][TTS][ASR][SLU] Spoken dialogue systems demo recipe #5975 by @siddhu001
- [Others][ESPnet2][SE] fix: gradient truncation bug in pit_solver.py #5974 by @YuzhuWang-code
Acknowledgements
Special thanks to @Fhrozen, @Masao-Someki, @Shikhar-S, @Whale-Dolphin, @YuzhuWang-code, @ftshijt, @juice500ml, @kalvinchang, @popcornell, @pyf98, @shimhz, @siddhu001, @sw005320, @taiqihe, @uwanny, @wanchichen, @wangpuup, @yigitcatak.
ESPnet version 202412
New Features
Enhancement
- [Enhancement][ESPnetEZ] Add missing functionalities for espnetez #5890 by @Masao-Someki
Recipe
- [Recipe][ESPnet2][ASR] My Science Tutor (MyST) Children's Conversational Speech Corpus #5964 by @eric102004
- [Recipe][ESPnet2] Feature/improve is24 asr2 #5938 by @juice500ml
- [Recipe][ESPnet2][ASR] Add asr1 recipe for libriheavy_small #5932 by @Miamoto
- [Recipe][ESPnet2][SID] Add RATS dataset for SV task #5840 by @shimhz
Bugfix
- [Bugfix][ESPnet2][Diarization] [Bugfix] fix keyword argument error in stage 7 of diar.sh #5969 by @eric102004
- [Bugfix][ESPnetEZ] Bug fixed for #5949 #5950 by @Masao-Someki
- [Bugfix][ESPnet2][ASR] removed ''continue'' statement from the for loop in run_mono.sh #5946 by @Trikaldarshi
- [Bugfix][ESPnet2] Add SWBD text processing fix #5941 by @siddhu001
- [Bugfix][ESPnet2][ESPnet1] Training code patches #5931 by @wanchichen
Documentation
- [Documentation] Fix bug in document that overflows the page #5940 by @juice500ml
- [Documentation] Update CI reference #5939 by @emmanuel-ferdman
- [Documentation] fix: collcate_fn -> collate_fn #5925 by @kalvinchang
- [Documentation][Docker][Installation][CI] Migration from Anaconda to conda-forge #5924 by @yoshipon
Others
- [Others][ESPnet2][Codec] Fix versa interface #5951 by @ftshijt
- [Others][ESPnet2][ESPnet1] Add OWSM-CTC #5933 by @pyf98
- [Others][ESPnet2] Recipe/ogi kids speech #5916 by @anyuyay
Acknowledgements
Special thanks to @Masao-Someki, @Miamoto, @RayYuki, @Trikaldarshi, @anyuyay, @emmanuel-ferdman, @eric102004, @ftshijt, @juice500ml, @kalvinchang, @pyf98, @shimhz, @siddhu001, @wanchichen, @yoshipon.
ESPnet version 202409
New Features
- [New Features][ESPnet2][TTS][Codec] Support Codec feature for TTS2 task #5857 by @wyh2000
- [New Features][ESPnet2][Codec] Codec downstream task support: TTS #5763 by @jctian98
- [New Features][ESPnet2][Codec] Add Encodec features for Codec toolkit #5758 by @jctian98
- [New Features][ESPnet2][Installation][TTS] Add evaluation scripts with DiscreteSpeechMetrics. #5661 by @Takaaki-Saeki
- [New Features][ESPnet2][ASR] Integrate adapter for s3prl frontend #5609 by @Stanwang1210
- [New Features][ESPnet2][CI][OWSM] Support external dataset library for ESPnetEasy #5584 by @Masao-Someki
- [New Features][ESPnet2][CI][LM] Pr voxtlm #5472 by @soumimaiti
Enhancement
- [Enhancement][ESPnet2][SLM] MT Task in SpeechLM #5899 by @ftshijt
- [Enhancement][ESPnet2][Codec] Categorical Balnced Chunk iterator #5894 by @ftshijt
- [Enhancement][ESPnet2][ESPnet1] TransformerDecoder forward_one_step with memory_mask #5679 by @albertz
- [Enhancement][ESPnet2] Update espnet_model.py #5646 by @shen9712
Recipe
- [Recipe][ESPnet2][Music] Fixed KiSing Data Preparation #5895 by @HANJionghao
- [Recipe][ESPnet2][ASR] CORAAL asr1 recipe #5882 by @kalvinchang
- [Recipe][ESPnet2][ASR] ml_superb asr2 recipe #5866 by @Stanwang1210
- [Recipe][ESPnet2] Add more download links for ML-SUPERB #5863 by @ftshijt
- [Recipe][ESPnet2][ASR] Fix bug in asr2.sh #5859 by @juice500ml
- [Recipe][ESPnet2][Music] fix bugs in SVS1 #5851 by @South-Twilight
- [Recipe][ESPnet2][TTS] New Recipe of tts2+aishell3 #5849 by @Tsukasane
- [Recipe][ESPnet2][ASR] Espnet Multi-convformer implementation #5832 by @Darshan7575
- [Recipe][ESPnet2][SE] Update of SE functions #5825 by @Emrys365
- [Recipe][ESPnet2] SPRING-INX Recipe (Speech Lab, IIT, Madras) #5811 by @arjun-gangwar
- [Recipe][ESPnet2][TTS] Adding Hifitts recipe for espnet #5784 by @coding-phoenix-12
- [Recipe][ESPnet2][ASR] Updated results for CHiME-8 DASR baseline with new notsofar1 dev set #5771 by @popcornell
- [Recipe][ESPnet2][SE] Final model scores for TF-GridNetV2 on the Kinect-WSJ dataset #5754 by @atharva253
- [Recipe][ESPnet2] Apply normalization on validation set for CHiME-8 recipe #5749 by @popcornell
- [Recipe][ESPnet2][Need review][Codec] ESPnet-Codec decoding and Scoring #5747 by @ftshijt
- [Recipe][ESPnet2][CI][ST] Add recipe for IWSLT 2024 shared task Indic track #5744 by @cromz22
- [Recipe][ESPnet2][Music] [SVS] VISinger Plus #5741 by @jerryuhoo
- [Recipe][ESPnet2][Need review][Codec] ESPnet-codec Training and Setup #5732 by @ftshijt
- [Recipe][ESPnet2][ASR] ESPnet Recipe for ASR on the Makerere Radio Speech Corpus #5730 by @satvik-dixit
- [Recipe][ESPnet2][SE] ESPnet recipe for the Kinect-WSJ dataset #5711 by @atharva253
- [Recipe][ESPnet2][TTS][ASR][Music] Update bitrate calculation scripts for the IS24 discrete speech challenge #5677 by @ftshijt
- [Recipe][ESPnet2][ASR] Add some documents for JTubeSpeech #5663 by @sw005320
- [Recipe][ESPnet2][SID] ESPnet-SPK: add SdSV 2021 recipe #5659 by @Alexgichamba
- [Recipe][ESPnet2][ASR] Add E-Branchformer model for FLEURS #5657 by @wanchichen
- [Recipe][ESPnet2][Installation][CI][ASR] CHiME-8 DASR recipe based on CHiME-7 DASR baseline #5641 by @popcornell
- [Recipe][ESPnet2][ASR] add interspeech2024_dsu_challenge/asr2 #5627 by @simpleoier
- [Recipe][ESPnet2][Installation][TTS] Discrete token-based TTS implementation #5626 by @ftshijt
Bugfix
- [Bugfix] fix: replace ellipses (...) in ESPnet-EZ Trainer documentation #5911 by @kalvinchang
- [Bugfix] Bugfix/homepage #5885 by @Masao-Someki
- [Bugfix][ESPnet2] Fix absolute paths in aishell3_tts2 #5884 by @Tsukasane
- [Bugfix] Bug fix for source link #5883 by @Masao-Someki
- [Bugfix][Installation] [CI] Add required file for g2p_en #5869 by @Fhrozen
- [Bugfix][ESPnet2] A fix to newer torch version (compatible to old version with typecheck) #5830 by @ftshijt
- [Bugfix][ESPnet2] Revert change to abs_task to keep the consistency behavior #5789 by @ftshijt
- [Bugfix][ESPnet2] Fix Whisper frontend #5760 by @siddhu001
- [Bugfix][ESPnet2][SE] Update TSE recipe egs2/librimix/tse1 #5731 by @Emrys365
- [Bugfix][ESPnet2] Fix LoRA issues when saving all parameters. #5722 by @simpleoier
- [Bugfix][ESPnet2] Fix tts packing with new spk embedding #5715 by @ftshijt
- [Bugfix][ESPnet2][TTS] Fix stage references in generated run.sh in TTS recipes #5714 by @G-Thor
- [Bugfix][ESPnet2][OWSM] fix a small issue in OWSM decode_long #5703 by @jctian98
- [Bugfix][ESPnet2][Installation] Upgrade typeguard #5702 by @sw005320
- [Bugfix][ESPnet2] Quick fix to calculation of bitrate #5692 by @ftshijt
- [Bugfix][ESPnet2][SSUM] Fix typo in summarization scoring #5688 by @YoshikiMas
- [Bugfix][ESPnet2] Update egs2/TEMPLATE/asr2/asr2.sh #5682 by @simpleoier
- [Bugfix][ESPnet2][ASR] Fix over-lengthy audio in ml_superb data prep #5678 by @ftshijt
- [Bugfix][ESPnet2] fix typo #5673 by @hiranoyu0830
- [Bugfix][Installation][ST] Fix CI Multilingual ST test #5672 by @Fhrozen
- [Bugfix][ESPnet2][SLU] Fix speed perturbation when not using transcript in slu.sh #5671 by @siddhu001
- [Bugfix][ESPnet2][SLU] Fix loading pre-trained model from transformers #5668 by @siddhu001
- [Bugfix][ESPnet2] Correct the argument errors in the whisper tokenizer language. #5666 by @pengchengguo
Documentation
- [Documentation][ESPnet2][Music] Fixed SingingGenerate docstring examples #5889 by @HANJionghao
- [Documentation][ESPnet2][CI] Separate packing and uploading stages #5752 by @cromz22
- [Documentation] Add script to make release note from milestone #5653 by @kan-bayashi
Refactoring
- [Refactoring] Modified easy to ez #5719 by @Masao-Someki
Others
- [Others][CI] Bugfix for the paper publish workflow #5909 by @juice500ml
- [Others][ESPnet2] Revision on Speechlm vocabulary extension script #5906 by @jctian98
- [Others][ESPnet2][TTS] Fix tts.sh path in aishell3 tts2 #5879 by @sw005320
- [Others][ESPnet2][Installation] Add DeepSpeed trainer for large-scale training #5856 by @jctian98
- [Others] Update README info #5852 by @ftshijt
- [Others][ESPnet2][ESPnet1][Installation] Add flash-attn #5839 by @wanchichen
- [Others][ESPnet2][Music] [SVS] fix VISinger2 typecheck error #5838 by @jerryuhoo
- [Others][ESPnet2] Fixed kising/acesinger google drive download #5834 by @HANJionghao
- [Others][ESPnet2][SID] update MFA-Conformer performance after fixing the bug in #5797 #5826 by @Jungjee
- [Others][ESPnet2][CI][SE] SE function updates: new models and support for handling various sampling frequencies #5800 by @Emrys365
- [Others][ESPnet2][SID] fix spk mfa-conformer forwarding #5797 by @series2
- [Others][ESPnet2][CI][Music] [SVS] Add CI tests for VISinger Plus #5786 by @jerryuhoo
- [Others][ESPnet2][LM] Bug fix for VoxtLM v1 recipe #5782 by @cromz22
- [Others][ESPnet2][ESPnet1] Added partially auto-regressive decoding #5769 by @Masao-Someki
- [Others][Installation][CI] Fix minor issue in anaconda downloading #5753 by @ftshijt
- [Others] [pre-commit.ci] pre-commit autoupdate #5738 by @pre-commit-ci[bot]
- [Others][ESPnet2][Installation][CI] Upgrade typeguard [Subst.] #5724 by @Fhrozen
- [Others][ESPnet2][SE] TF-GridNet training recipe for DNS Interspeech 2020 dataset #5710 by @nateanl
- [Others][ESPnet2][LM] Adding transformer_opt #5709 by @soumimaiti
- [Others][ESPnet2] Add Readme for Voxtlm #5693 by @wyh2000
- [Others][ESPnet2][SID] ESPnet-SPK: add ASVspoof19 SASV recipe #5687 by @Alexgichamba
Acknowledgements
Special thanks to @Alexgichamba, @Darshan7575, @Emrys365, @Fhrozen, @G-Thor, @HANJionghao, @Jungjee, @Masao-Someki, @South-Twilight, @Stanwang1210, @Takaaki-Saeki, @Tsukasane, @YoshikiMas, @albertz, @arjun-gangwar, @atharva253, @coding-phoenix-12, @cromz22, @ftshijt, @hiranoyu0830, @jctian98, @jerryuhoo, @juice500ml, @kalvinchang, @kan-bayashi, @nateanl, @pengchengguo, @popcornell, @pre-commit-ci[bot], @satvik-dixit, @series2, @shen9712, @siddhu001, @simpleoier, @soumimaiti, @sw005320, @wanchichen, @wyh2000.
ESPnet version 202402
News
We're thrilled to announce that our latest update brings two groundbreaking features to our project: espnetez and ESPnet-SPK!
New Features
- [New Features][ESPnet2][ESPnet1][Installation][SE] Add diffusion-base SE model to ESPnet-SE #5572 by @LiChenda
- [New Features][ESPnet2][ESPnet1][CI][ASR] Add Bayes Risk CTC (reworked) #5519 by @jctian98
- [New Features][ESPnet2][TTS] TTS evaluation script and monitoring functionality using MOS prediction model #5485 by @Takaaki-Saeki
- [New Features][ESPnet2][SE] Add USES model for speech enhancement in diverse conditions #5482 by @Emrys365
- [New Features][ESPnet2][CI][SID] ESPnet-SPk: major update #5408 by @Jungjee
- [New Features][ESPnet2][TTS][ASR] Add espnetez #5372 by @Masao-Someki
Enhancement
- [Enhancement][ESPnet2][OWSM] Improving OWSM inference interface #5618 by @pyf98
- [Enhancement][ESPnet2][OWSM] Add OWSM v3.1 #5611 by @pyf98
- [Enhancement][ESPnet2][CI] ESPnet-SPK: Additional models, supplement readme #5559 by @Jungjee
- [Enhancement][ESPnet2][CI][SE] Add PyTorch & GPU support for DNSMOS calculation #5548 by @Emrys365
- [Enhancement][ESPnet2][TTS][SID] Speaker embedding extractor (with ESPnet pre-trained speaker model) #5579 by @ftshijt
Recipe
- [Recipe][ESPnet2][Music] Fix relative setting of train-dev-test #5623 by @ftshijt
- [Recipe][ESPnet2][SID] ESPnet-SPK: add Voxblink recipe #5583 by @Jungjee
- [Recipe][ESPnet2][SID] ESPnet-SPK: Model upload and result generation #5558 by @Jungjee
- [Recipe][ESPnet2][Music] ACE singer recipe fixing #5551 by @ftshijt
- [Recipe][ESPnet2][TTS] TTS2 Template #5541 by @ftshijt
- [Recipe][ESPnet2][ASR] fix kaldi dependency in asr2 #5540 by @ftshijt
- [Recipe][ESPnet2][CI][S2ST] CI test for s2st #5526 by @ftshijt
- [Recipe][ESPnet2][ASR] Added data.sh to SPRING-INX IITM Recipe #5522 by @arjun-gangwar
- [Recipe][ESPnet2][ASR] Add Libriheavy small and medium ASR2 recipes #5512 by @akreal
- [Recipe][ESPnet2][ASR] SPRING-INX IITM RECIPE #5505 by @arjun-gangwar
- [Recipe][ESPnet2][ASR][RNNT] Add transducer conformer configuration to commonvoice recipe #5503 by @zuazo
- [Recipe][ESPnet2][ESPnet1] add centralized data preparation for OWSM #5478 by @jctian98
- [Recipe][ESPnet1] Added clean speech results #5649 by @linan2
- [Recipe][ESPnet2][Installation][AV] AVSR recipe for Easycom Dataset #5630 by @ms-dot-k
- [Recipe][ESPnet2] Update CHiME-7 ASR1 recipe #5555 by @popcornell
- [Recipe][ESPnet2] Add E-Branchformer model checkpoint in OWSM v2 #5517 by @pyf98
- [Recipe][ESPnet2][SLU] Slue PR configs #5087 by @siddhu001
Bugfix
- [Bugfix][ESPnet2] Fix path dependency in ESPnet tutorial #5645 by @siddhu001
- [Bugfix][ESPnet2] Fix ESPnet tutorial #5644 by @siddhu001
- [Bugfix] Fix CI #5642 by @siddhu001
- [Bugfix][ESPnet2] Fixed bug by copying missing Kaldi scripts #5636 by @VicentCano
- [Bugfix][ESPnet1][ASR] CTC prefix score, fix if blank == eos #5620 by @albertz
- [Bugfix][ESPnet2] Fix minor OWSM data prep bug #5607 by @juice500ml
- [Bugfix][ESPnet2][ESPnet1][CI] E721 #5589 by @sw005320
- [Bugfix][ESPnet2][ESPnet1] Make minlenratio effective #5581 by @jctian98
- [Bugfix][ESPnet2] Fix except #5567 by @takenori-y
- [Bugfix][ESPnet1][Installation][CI] Improve error robustness of unit tests #5535 by @Emrys365
- [Bugfix][ESPnet2][AV] Fix bug in lrs3 data preprocessing #5520 by @ms-dot-k
- [Bugfix][ESPnet1] replace old mustc links with new instructions #5516 by @brianyan918
- [Bugfix][ESPnet2][ST] Fix s2st HF model uploading #5504 by @tjysdsg
- [Bugfix][ESPnet2][ESPnet1] bug fixes for must_c v2 recipe #5640 by @jasonmusespresso
Documentation
- [Documentation][ESPnet2] Add instructions for finetuning owsm #5539 by @pyf98
- [Documentation] Updated the reference of the accepted JOSS paper #5515 by @neillu23
Others
- [Others] Update Discord Invitation Link #5578 by @Fhrozen
- [Others][ESPnet2][CI] Improve error robustness of unit tests #5523 by @Emrys365
Acknowledgements
Special thanks to @Emrys365, @Fhrozen, @Jungjee, @LiChenda, @Masao-Someki, @Takaaki-Saeki, @VicentCano, @akreal, @albertz, @arjun-gangwar, @brianyan918, @ftshijt, @jasonmusespresso, @jctian98, @juice500ml, @linan2, @ms-dot-k, @neillu23, @popcornell, @pyf98, @siddhu001, @sw005320, @takenori-y, @tjysdsg, @zuazo.
ESPnet version 202310
What's Changed
- Support arbitrary language finetune for Whisper models. by @pengchengguo in #5344
- Update Dipco Data URL by @Fhrozen in #5391
- Update readme in TEMPLATE/svs1 by @linyueqian in #5394
- add gramvaani asr recipe by @bloodraven66 in #5366
- ESPnet-SPK: sampler by @Jungjee in #5365
- Adding general data augmentation methods for speech preprocessing by @Emrys365 in #5370
- Update of several SE recipes and some minor fixes by @Emrys365 in #5401
- Reproducing MIMOIRIS by @YoshikiMas in #5409
- Kathbath asr by @bloodraven66 in #5369
- Add pytorch2.0.1 to CI by @kamo-naoyuki in #5413
- [skip ci] Update README.md by @kamo-naoyuki in #5417
- In spec_augment.py, check whether an array is writeable before modifying it inplace by @mdecerbo in #5416
- Docker updates for local builds by @Fhrozen in #5406
- fix typo in TEMPLATE/svs1/README.md by @linyueqian in #5426
- Update install_mwerSegmenter.sh by @sw005320 in #5437
- Support Whisper-style training as a new task S2T by @pyf98 in #5120
- fix twice numpy installation issue by @kan-bayashi in #5447
- Add Whisper SOT recipe for Librimix by @LiChenda in #5371
- Update for the JOSS paper editor review by @neillu23 in #5418
- Add the VOiCES recipe for ASR by @Emrys365 in #5448
- Improve diacritic compatibility in data_prep.pl preprocessing scripts by @zuazo in #5445
- [WIP] create recipe for acesinger by @linyueqian in #5431
- Add BibleTTS recipe by @wyh2000 in #5436
- ASR2 CHiME4 & Gigaspeech Recipes by @yichen14 in #5434
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #5427
- Simple fix to reduce test_slu_inference time by @siddhu001 in #5460
- Do not use root logger in Beamsearch by @vsd-vector in #5454
- Fix whisper test by @siddhu001 in #5464
- Add doc for OWSM by @pyf98 in #5463
- Speech-to-speech translation Task by @ftshijt in #4859
- AVSR recipes on LRS3 using pre-trained AV-HuBERT model by @ms-dot-k in #5456
- Support LoRA based large model finetuning. by @pengchengguo in #5400
- Multilingual Librispeech (MLS) refactor ASR1 recipe by @juice500ml in #5323
- Add phonemized LibriTTS ASR recipe by @akreal in #5466
- Update the Enh framework to support training with variable numbers of speakers by @Emrys365 in #5414
- speed up TFGridNet code by @zqwang7 in #5395
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #5468
- ASR2 recipe on Tedlium3 dataset by @kohei0209 in #5331
- Create README.md in OWSM v1 by @pyf98 in #5489
- Update setup.py by @sw005320 in #5490
- Fix default value in ML-SUPERB by @ftshijt in #5492
- Fix bugs of Whisper SOT. by @pengchengguo in #5494
- Multilingual Librispeech ASR2 + ASR1 baselines by @juice500ml in #5441
- Add a new SE recipe combining five public corpora by @Emrys365 in #5484
- Update .mergify.yml by @kamo-naoyuki in #5502
- update version to 202310 by @kan-bayashi in #5501
New Contributors
- @linyueqian made their first contribution in #5394
- @mdecerbo made their first contribution in #5416
- @zuazo made their first contribution in #5445
- @wyh2000 made their first contribution in #5436
- @yichen14 made their first contribution in #5434
- @vsd-vector made their first contribution in #5454
- @ms-dot-k made their first contribution in #5456
- @juice500ml made their first contribution in #5323
- @kohei0209 made their first contribution in #5331
Full Changelog: v.202308...v.202310
ESPnet version 202308
What's Changed
- Update tutorial by @ftshijt in #4648
- Update tutorials by @ftshijt in #4898
- add e-branchformer result for tedlium3 and add checker for text output length by @Some-random in #5130
- Limit the Numpy version (<1.24) to fix CI error temporarily. by @simpleoier in #5162
- [SVS] Add new recipes by @A-Quarter-Mile in #5158
- Update README.md of CHiME-7 DASR: fixing typos by @popcornell in #5166
- Fix typo in CONTRIBUTING.md by @eltociear in #5167
- CHiME-7 DASR: Update install_dependencies.sh, fix lhotse version by @popcornell in #5168
- Update TD-SpeakerBeam by @Emrys365 in #5155
- Add pre-trained causal speech separation model and streaming demo by @LiChenda in #5172
- KSC recipe by @khassanoff in #5171
- [SVS] Add new recipe by @A-Quarter-Mile in #5173
- Update AphasiaBank Recipe by @tjysdsg in #5104
- fix the gradient backward issue when joint training with s3prl frontend by @simpleoier in #5159
- Add installer for ParallelWaveGAN by @ftshijt in #4052
- [GAN SVS] Add VISinger2, UHifiGAN, Avocodo by @jerryuhoo in #5123
- [SVS] Update docs README.md by @South-Twilight in #5178
- Update SVS README.md by @jerryuhoo in #5180
- Adding eendss models by @soumimaiti in #5157
- 2022fall new task tutorial by @ftshijt in #5186
- [SVS] Updates for recipes by @A-Quarter-Mile in #5187
- [GAN SVS] fix phoneme predictor by @jerryuhoo in #5188
- Update generate_librimix_sd.sh by @leepeiying in #5182
- Bug fix for #5195 by @YosukeHiguchi in #5196
- [SVS] Update on recipes by @A-Quarter-Mile in #5197
- Update preprocessor.py by @sw005320 in #5200
- Minor fixes for ML-SUPERB by @ftshijt in #5202
- Quick fix for whisper specaug by @siddhu001 in #5206
- espnet-spk data preparation part by @Jungjee in #5184
- Fix M4singer multi-spk recipe by @ftshijt in #5201
- Update Dataset link for mlsuperb by @ftshijt in #5216
- Fix bug when score_type is set to normal in ml_superb by @ftshijt in #5217
- Add new functions and fix some bugs in SE by @Emrys365 in #5193
- Update import order by @ftshijt in #5229
- Closed CHiME-7 DASR adding evaluation inference + adding support to use diarization baseline "pre-computed" JSONs (new PR) by @popcornell in #5228
- Standalone Transducer v1.1 by @b-flo in #5140
- Small fixes for Transducer by @b-flo in #5247
- add asr2 task and librispeech recipe as an example. by @simpleoier in #5181
- fix norm compatibility in scale discriminator by @kan-bayashi in #5240
- CFSD, SECS metrics for TTS by @imdanboy in #5235
- Add new SE recipes: chime1/enh1, chime2/enh1, reverb/enh1, and wsj0_2mix/tse1 by @Emrys365 in #5246
- Fix bugs in mfa_format.py by @G-Thor in #5223
- New features for SVS by @ftshijt in #5245
- re-fix norm compatibility in scale discriminator by @kan-bayashi in #5249
- add conv1d subsampling 3 and egs2/librispeech/asr2 wavlm_large_21 kmeans (1000/2000) results by @simpleoier in #5252
- Revise the ESPnet-SE++ Joss paper to incorporate the feedback from the reviewer. by @neillu23 in #5212
- Fix a bug in score script for ML-SUPERB by @ftshijt in #5254
- Refactor prep_segments in SVS by @jerryuhoo in #5210
- A minor fix for num_splits_ssl for training by @ftshijt in #5262
- [SVS] add singing tacotron by @A-Quarter-Mile in #5233
- Add script to use speaker averaged xvectors in TTS training by @G-Thor in #5244
- Fix filling of waveform_buffer with samples for streaming inference by @espnetUser in #5267
- Some name update for ml-superb by @ftshijt in #5276
- Add support for K2 pruned transducer loss by @b-flo in #5268
- Fix Transducer doc by @b-flo in #5306
- Update installation.md by @kamo-naoyuki in #5291
- Update install_nkf.sh by @sw005320 in #5300
- Fix Cython version to pass the installation of libraries with Cython by @kan-bayashi in #5310
- Update README.md by @sw005320 in #5315
- Update setup.py by @sw005320 in #5316
- Migrate recipe for nit_song070 from Muskit by @wwwbxy123 in #5251
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #5294
- A few updates for asr2 and hubert by @simpleoier in #5285
- Add decode_options and hyp_cleaner in evaluate_whisper_inference by @pyf98 in #5272
- update pyworld version by @kan-bayashi in #5319
- fix a data preparation issue for librimix recipe. by @LiChenda in #5322
- Update README.md in egs2/librimix/tse1 and egs2/wsj0_2mix/tse1 by @Emrys365 in #5289
- fix the s3prl frontend gradient backprop bug, ensuring feature_grad_mult=1.0 by @simpleoier in #5297
- ESPNet-SPK part 2 - training by @Jungjee in #5258
- remove some tests in espnet1 integration test by @sw005320 in #5328
- Fix random segments by @iamanigeeit in #5274
- Skip CI for draft PR by @ftshijt in #5333
- Update cancel.yml by @kan-bayashi in #5334
- Update several SE recipes and bash scripts by @Emrys365 in #5327
- Add PULL_REQUEST_TEMPLATE.md by @kan-bayashi in #5340
- ESPnet-Spk part 3 - inference every epoch using EER by @Jungjee in #5314
- Minimize espnet2 integration test by @kan-bayashi in #5324
- PR Labels for CI control by @Fhrozen in #5320
- Split ci into several jobs by @kan-bayashi in #5343
- Update CONTRIBUTING.md by @sw005320 in #5335
- Update Scoring for Speech Summarization from NLG-Eval to Huggingface Evaluate by @roshansh-cmu in #5341
- Fix documentation skip CI by @Fhrozen in #5351
- Update the usage by @sw005320 in #5349
- Docker Update by @Fhrozen in #5321
- Update installation.md by @sw005320 in #5348
- Fix doc condition by @kan-bayashi in #5355
- Update issue templates by @sw005320 in #5357
- Update Contribution.md by @Fhrozen in #5352
- Fix .mergify condition by @kan-bayashi in #5354
- Reduce ffmpeg installation time in ci by @kan-bayashi in #5356
- Update CI table by @kan-bayashi in #5359
- Clean workflow files by @kan-bayashi in #5360
- Couple of tweaks for asr2.sh for the HF hub upload by @akreal in #5362
- Update TEMPLATE_HF_Readme.md (fix bash typo) by @akreal in #5361
- Add discrete-token ASR for LibriSpeech 100h by @akreal in #5350
- Whisper fine-tuning recipes for CHiME-4 and WSJ by @YoshikiMas in #5342
- Fix bug in ngram training in slu.sh by @siddhu001 in #5364
- Add musdb18 recipe for music source separation by @Emrys365 in #5338
- Bugfix: JETS CTCLoss by @imdanboy in #5288
- Check the value of n_shift==upsample_factor in GAN_TTS by @i...
ESPnet version 202304
What's Changed
- Update collect stats stage so that less memory cost in Utt_mvn by @simpleoier in #4888
- Apply the latest black by @kamo-naoyuki in #4907
- Add pytorch=1.13.1 to CI configuration by @kamo-naoyuki in #4906
- How2 fix README, incorrect url by @roshansh-cmu in #4902
- standardized inference and number of iterations for mSuperb single lang track by @DanBerrebbi in #4905
- Fix typo in lrs/README.md by @eltociear in #4911
- MSUPERB setting update by @ftshijt in #4913
- Update test_import.yaml to install numba by @kamo-naoyuki in #4918
- update pyopenjtalk version to 0.3.0 by @kan-bayashi in #4912
- CHiME-7 Task1 recipe by @popcornell in #4894
- Update CHiME-7 Task 1 README.md by @popcornell in #4920
- Use native CPU version of STFT on newer pytorch versions, fix librosa window size < fft by @bmilde in #4922
- Add few shot subset for mSuperb multilingual setting by @guapaQAQ in #4923
- Fix existing bugs in the TSE task by @Emrys365 in #4915
- IAM OCR recipe updates by @kenzheng99 in #4927
- Fixing some issues with chime7-task1 baseline by @popcornell in #4925
- set default none decoder for ASR by @ftshijt in #4917
- Update inference and training setting for mSuperb multilingual model by @guapaQAQ in #4932
- Add E-Branchformer Transducer results by @pyf98 in #4933
- add tf-gridnet by @zqwang7 in #4864
- Fixes + Channel Selection for CHiME-7 Task by @popcornell in #4934
- fix extracted feature dummy generation by @roshansh-cmu in #4926
- Fix device mismatch error in GPU decoding with PyTorch 1.13 by @pyf98 in #4941
- CHiME-7 DASR MD5 checksum fix for mixer6/train_call by @popcornell in #4942
- Update show_asr_result.sh by @kamo-naoyuki in #4943
- CHiME-7 DASR correct development results by @popcornell in #4946
- Fix 'floordiv is deprecated' warnings by @fujimotos in #4945
- Added WSLII installation instruction by @sw005320 in #4949
- Update Muskits by @A-Quarter-Mile in #4931
- Set a longer time execution threshold for related failed time-outs CI by @ftshijt in #4962
- Modify data prep for mSUPERB multilingual by @guapaQAQ in #4965
- Add E-Branchformer results in some recipes by @pyf98 in #4958
- Add 'six' as a required Python module by @fujimotos in #4964
- add msuperb linguistic analysis by @hhhaaahhhaa in #4938
- Fix a 'ref_channel'-related issue in espnet2/bin/enh_inference.py by @Emrys365 in #4972
- Add E-Branchformer results in slurp_entity by @pyf98 in #4971
- Add Conformer and E-Branchformer results in fisher_spanish_callhome ASR by @pyf98 in #4976
- [SVS] Add Joint-training by @A-Quarter-Mile in #4977
- Update the chunk iterator for the TSE task by @Emrys365 in #4929
- update msuperb LID scoring script by @hhhaaahhhaa in #4979
- add multilingual+lid lid score generation by @hhhaaahhhaa in #4982
- Add python=3.10 to CI by @kamo-naoyuki in #4627
- LID score v2 by @hhhaaahhhaa in #4983
- Fix ci by @kamo-naoyuki in #4985
- Change to use Ubuntu-latest instead of Ubuntu-18.04 in CI by @kamo-naoyuki in #4986
- Remove six by @kamo-naoyuki in #4988
- Modify format_wav_scp.py to support PCM of uint8, int32, float32, float64, etc. by @kamo-naoyuki in #4997
- Fix Whisper tokenizer CI error by @slSeanWU in #5004
- fix s3prl upstream attribute bug by @jwrh in #5003
- [Recipe] Add iwslt22 low resource speech translation task for egs2 by @freddy5566 in #4994
- Fix typeguard version by @silvanocerza in #5009
- Add .pre-commit-config.yaml by @kamo-naoyuki in #5011
- Copy Kaldi utils/steps/sid and add a new github action to check the consistency by @kamo-naoyuki in #4998
- Modify .pre-commit-config.yaml by @kamo-naoyuki in #5012
- Modify .pre-commit-config.yaml by @kamo-naoyuki in #5014
- Modify .pre-commit-config.yaml by @kamo-naoyuki in #5015
- [Tuning] iwslt22 low-resource ST decode configuration tuning by @freddy5566 in #5019
- Modify asr.sh by @kamo-naoyuki in #5020
- [SVS] Improve visinger by @jerryuhoo in #5022
- Use scripts/utils/print_args.sh instead of pyscripts/utils/print_args.py by @kamo-naoyuki in #5025
- Add docstring in extra_path.sh by @kamo-naoyuki in #5028
- Update installation.md by @kamo-naoyuki in #5029
- Update README.md by @kamo-naoyuki in #5030
- Update README.md by @kamo-naoyuki in #5031
- Change bc to python by @kamo-naoyuki in #5032
- Update tools/Makefile and path.sh by @kamo-naoyuki in #5027
- Fix for format_wav_scp.py by @kamo-naoyuki in #5038
- Add execute permission to install_ice_g2p.sh by @kamo-naoyuki in #5040
- Bug fix of #5025 by @kamo-naoyuki in #5039
- [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #5041
- Update README.md by @kamo-naoyuki in #5042
- Update README.md by @kamo-naoyuki in #5043
- Update README.md by @kamo-naoyuki in #5045
- Fix in gen_task1_data.sh from CHiME7 by @boeddeker in #4953
- Update README.md by @eml914 in #5044
- Add installers/install_ffmpeg.sh by @kamo-naoyuki in #5046
- Fix broken links reported by #5048 by @ShigekiKarita in #5050
- fix: resolve upgrade issues with praatio 6.0; lock praatio version by @timmahrt in #4978
- Add miniconda in gitignore by @pyf98 in #5052
- CHiME-7 DASR fixes from participants feedback by @popcornell in #4999
- Fix the condition for maxlen warning in beam search by @pyf98 in #5055
- Fixed SQLalchemy version for MFA by @Fhrozen in #5059
- Support Multi-Blank Transducer in Espnet2 by @jctian98 in #4876
- Fix chime7 DASR task1 run.sh by @kamo-naoyuki in #5060
- CHiME-7 DASR recipe, fix display bug for scenario-wide DER and JER by @popcornell in #5061
- Add test_format_wav_scp_sh.bats by @kamo-naoyuki in #5062
- Update documentation by @kamo-naoyuki in #5063
- Support SOT training on LibriMix data. by @pengchengguo in #4861
- Update check_install.py by @kamo-naoyuki in #5066
- Tedlium3 recipe by @Some-random in #5068
- Bug Fix: pretrained s3prl-frontend based models loaded with parameters key mismatch error by @simpleoier in #5074
- Mechanism for multi channels input using multi columns wav.scp by @kamo-naoyuki in #5075
- Clean ML-SUPERB by @ftshijt in #5067
- CHiME-7 DASR: first diarization system based on Pyannote. by @popcornell in #5054
- Chime7-task1 diarization (updated results) by @popcornell in #5088
- Add InterCTC to E-Branchformer encoder, and the ability to save InterCTC inference output to files by @tjysdsg in #5084
- [SVS] Bug fix: sample rate by @A-Quarter-Mile in https://github.com/espnet/espnet/pu...