Summary

@jctian98

Release date: 2025‑11‑17
Milestone: 202511, a release with many new features, stability improvements, and a refreshed CI and Docker workflow.

ESPnet 202511 introduces robust parallel processing primitives, a fully‑refactored inference & evaluation pipeline, extensive SpeechLM support, and a modernized Docker/CI stack. The release also resolves lingering bugs in codec EMA logic, MPS device handling, and category‑balanced batching while tightening dependency management and documentation quality.

Highlighted Pull Requests

#	Title	Category	Key Impact
6300	Bump js-yaml from 4.1.0 to 4.1.1 in `/doc/vuepress`	Dep-Update	Secures the documentation build against a prototype-pollution vulnerability in js-yaml's merge-key handling
6284	codec fix: DDP logic and dead code revival logic	Bugfix	Restores EMA state for dead-code recovery and synchronizes codec updates across all DDP workers
6279	[SpeechLM] model, preprocessor and collect_stats	New Feature	Core SpeechLM components: job templates, preprocessing, multimodal IO, and stats collection
6278	[SpeechLM] Deepspeed trainer	New Feature	Adds full DeepSpeed support (train.py + deepspeed_trainer.py) for large-scale SpeechLM training
6276	Docker Updates	Refactor	Upgrades to Ubuntu 24.04, CUDA 12.6, and PyTorch 2.8.0, transitions to Miniforge, and modernizes Dockerfile syntax
6275	CI Installation fix	Bugfix	Adds `--no-build-isolation` for editable installs, improving reproducibility across CI environments
6273	[ESPnet-Codec] Bug fix on codec activation function	Bugfix	Enables BF16 inference by registering `torch.ones` for auto-cast
6272	Add Pytorch version 2.9	Dep-Update	Extends supported PyTorch releases (2.5.1, 2.7.1, 2.8.0, 2.9.0) in CI and docs
6263	[ESPnet-3] Merge master into espnet3 branch	Merge	Syncs espnet3 with master, fixing CI and dependency mismatches
6259	pre-commit.ci autoupdate	Tooling	Updates black and isort to their latest stable versions
6257	SpeechLM Data Infra: dataset management	New Feature	Implements data registry, dataset loaders, and configuration templates for SpeechLM
6255	Fix default batch sampler fallback for category iterator	Bugfix	Restores the legacy `folded` → `catbel` mapping, improving backward compatibility
6253	Restrict Docker Github Actions to Original Repo	Security	Prevents accidental image publishing from forks or non-master branches
6249	[espnet3-7] Add Callbacks	New Feature	Adds `AverageCheckpointsCallback` and a standard callback factory for Lightning trainers
6248	Get forced alignments from CTC model	Feature	Enables forced-alignment extraction for any CTC-based S2T model
6246	MPS Support for loading float64 models	Bugfix	Converts float64 weights to float32 on MPS devices, avoiding dtype errors
6240	Package Build Patch	Build	Moves `g2p_en` and `ctc-segmentation` installation to the Makefile, fixing the pip package build
6239	[espnet-3] Merge master into espnet3 and fixed CI	Merge	Syncs espnet3 with master, removing the `underthesea` dependency
6238	Upgrade pyopenjtalk to 0.4.1	Dep-Update	Updates the pyopenjtalk installer to the latest version
6227	Terry/parallelize spk emb extraction	Feature	Parallel speaker-embedding extraction for TTS recipes
6210	LID-8: CI and unit tests	Test	Adds comprehensive unit tests for LID functionality
6179	[espnet3] ESPnet1 Support Sunset	Refactor	Removes legacy ESPnet1 support and consolidates it into espnet2.legacy
6178	[espnet3-6] Add evaluation scripts	Feature	Modularizes the inference and evaluation pipelines in espnet3
6177	Merge master into espnet3	Merge	Syncs espnet3 with master, fixing CI issues
6175	[espnet3-5] Add parallel module and collect_stats	Feature	Adds Dask-based parallel processing and `collect_stats` for data statistics collection
6174	LID-7: VoxLingua107 recipe	Recipe	Adds a new spoken-language-identification recipe for VoxLingua107

Note: The table above summarizes the most impactful PRs for this release. Several PRs are grouped by shared functionality (e.g., SpeechLM, Docker, and LID). Contributors for these changes include dependabot[bot], whr-a, chinjouli, jctian98, Fhrozen, Masao-Someki, KanTakahiro, akreal, pre-commit-ci[bot], Qingzheng-Wang, Shikhar-S, SanderGi, sw005320, and ZhuoyanTao.
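The codec fix in #6284 concerns EMA codebook updates and dead-code revival in vector quantization. For readers unfamiliar with the mechanism, here is a minimal pure-Python sketch of an EMA-updated codebook that re-seeds rarely used ("dead") codes from the current batch and resets their EMA statistics. It is not ESPnet's code: the class name `EMACodebook`, its parameters, and the thresholds are hypothetical, and production implementations commonly add Laplace smoothing to the counts, which is omitted here.

```python
import random
from typing import List

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


class EMACodebook:
    """Hypothetical VQ codebook with EMA updates and dead-code revival."""

    def __init__(self, codes: List[Vector], decay: float = 0.99,
                 eps: float = 1e-5, dead_threshold: float = 1e-3, seed: int = 0):
        self.codes = [list(c) for c in codes]
        self.decay = decay
        self.eps = eps
        self.dead_threshold = dead_threshold
        # EMA statistics: smoothed assignment counts and smoothed vector sums
        self.cluster_size = [1.0] * len(codes)
        self.embed_sum = [list(c) for c in codes]
        self.rng = random.Random(seed)

    def nearest(self, x: Vector) -> int:
        return min(range(len(self.codes)),
                   key=lambda i: squared_distance(x, self.codes[i]))

    def update(self, batch: List[Vector]) -> List[int]:
        assign = [self.nearest(x) for x in batch]
        dim = len(self.codes[0])
        counts = [0.0] * len(self.codes)
        sums = [[0.0] * dim for _ in self.codes]
        for x, k in zip(batch, assign):
            counts[k] += 1.0
            for j in range(dim):
                sums[k][j] += x[j]
        # Under DDP, `counts` and `sums` would typically be all-reduced across
        # workers at this point so every replica keeps an identical codebook.
        d = self.decay
        for k in range(len(self.codes)):
            self.cluster_size[k] = d * self.cluster_size[k] + (1 - d) * counts[k]
            self.embed_sum[k] = [d * s + (1 - d) * t
                                 for s, t in zip(self.embed_sum[k], sums[k])]
            denom = self.cluster_size[k] + self.eps
            self.codes[k] = [s / denom for s in self.embed_sum[k]]
        # Dead-code revival: re-seed rarely used codes from the batch and reset
        # their EMA state so the next update does not pull them back.
        for k in range(len(self.codes)):
            if self.cluster_size[k] < self.dead_threshold:
                new_code = list(self.rng.choice(batch))
                self.codes[k] = new_code
                self.cluster_size[k] = 1.0
                self.embed_sum[k] = list(new_code)
        return assign
```

Resetting `cluster_size` and `embed_sum` together with the code vector matters: if only the vector were replaced, the stale EMA statistics would recompute the old position on the next update.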

Key Takeaways

Parallelism & Scalability – Dask-based espnet3.parallel, collect_stats, and new callbacks enable efficient distributed training, inference, and checkpoint averaging.
SpeechLM Maturity – Core modules, DeepSpeed integration, multimodal IO, and data infrastructure create a solid foundation for large‑scale speech‑language models.
Stability & Security – Updated dependencies (js-yaml, PyTorch), Docker images built on CUDA 12.6 with Miniforge, and bugfixes for codec EMA, MPS device handling, and category sampling.
CI & Packaging – Modernized GitHub Actions, improved pip install flags, and new Docker images for Ubuntu 24.04.
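The new `AverageCheckpointsCallback` (#6249) targets checkpoint averaging, i.e. taking the element-wise mean of model weights from several saved checkpoints. A minimal framework-free sketch of that operation follows, assuming each checkpoint is a dict mapping parameter names to flat lists of floats; the helper `average_state_dicts` is hypothetical and is not the callback's actual code.

```python
from typing import Dict, List

StateDict = Dict[str, List[float]]


def average_state_dicts(checkpoints: List[StateDict]) -> StateDict:
    """Element-wise mean of parameters across checkpoints (hypothetical helper)."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    names = set(checkpoints[0])
    for ckpt in checkpoints[1:]:
        if set(ckpt) != names:
            raise ValueError("checkpoints must share the same parameter names")
    n = len(checkpoints)
    averaged: StateDict = {}
    for name in checkpoints[0]:
        values = [ckpt[name] for ckpt in checkpoints]
        if len({len(v) for v in values}) != 1:
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # zip(*) groups the i-th element of this parameter across checkpoints
        averaged[name] = [sum(col) / n for col in zip(*values)]
    return averaged
```

With real framework state dicts the same loop runs over tensors; integer buffers (for example BatchNorm's `num_batches_tracked`) typically need separate handling rather than a plain mean.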

What's Changed (Full changelog)

New Features

[SpeechLM] model, preprocessor and collect_stats (See #6279, by @jctian98)
[SpeechLM] Deepspeed trainer (See #6278, by @jctian98)
SpeechLM Data Infra: multimodal IO (See #6258, by @jctian98)
[espnet3-7] Add Callbacks (See #6249, by @Masao-Someki)

Recipe

POWSM-2: update code for data preparation (See #6283, by @chinjouli)
POWSM-1: renaming directory (See #6282, by @chinjouli)
SpeechLM Data Infra: Data batchfy, sampling and iterator (See #6260, by @jctian98)
SpeechLM Data Infra: dataset management (See #6257, by @jctian98)
Update wham_noise link for LibriMix Recipe (See #6251, by @Fhrozen)
LID-7: VoxLingua107 recipe (See #6174, by @Qingzheng-Wang)

Bugfix

[espnet3-8] Bugfix for recipe (See #6270, by @Masao-Someki)
Fix HF tests by switching them to upstream testing models (See #6261, by @akreal)
Fix default batch sampler fallback for category iterator (See #6255, by @Qingzheng-Wang)

Documentation

Bump js-yaml from 4.1.0 to 4.1.1 in /doc/vuepress (See #6300, by @dependabot[bot])
[espnet3-5] (2) Add parallel module and collect_stats (See #6242, by @Masao-Someki)
[Doc 1] Add AI-gen documentation to espnetez (See #6241, by @Fhrozen)
[espnet-3] Merge master into espnet3 and fixed CI (See #6239, by @Masao-Someki)

Refactoring

[espnet3] ESPnet1 Support Sunset and Migration to espnet2.legacy (See #6179, by @Masao-Someki)

Others

codec fix: DDP logic and dead code revival logic (See #6284, by @whr-a)
[SpeechLM] Minor fix on data loading (See #6280, by @jctian98)
Docker Updates (See #6276, by @Fhrozen)
CI Installation fix (See #6275, by @Fhrozen)
[ESPnet-Codec] Bug fix on codec activation function (See #6273, by @jctian98)
Add Pytorch version 2.9 (See #6272, by @Fhrozen)
Codec codebase bug fixes: detach() in RVQ residual and target_bandwidth in inference (See #6268, by @whr-a)
Add support for MPS devices in CTC prefix scoring (See #6266, by @KanTakahiro)
[ESPnet-3] Merge master into espnet3 branch (See #6263, by @Masao-Someki)
[pre-commit.ci] pre-commit autoupdate (See #6259, by @pre-commit-ci[bot])
Restrict Docker Github Actions to Original Repo (See #6253, by @Fhrozen)
Get forced alignments from CTC model (See #6248, by @Shikhar-S)
MPS Support for loading float64 models like OWSM as float32 (See #6246, by @SanderGi)
Package Build Patch (See #6240, by @Fhrozen)
Upgrade pyopenjtalk to version 0.4.1 (See #6238, by @sw005320)
Terry/parallelize spk emb extraction (See #6227, by @ZhuoyanTao)
LID-8: CI and unit tests (See #6210, by @Qingzheng-Wang)
[espnet3-6] Add evaluation scripts (See #6178, by @Masao-Someki)
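Forced alignment from a CTC model (#6248) amounts to finding the single best path through the CTC trellis for a known transcript. As a rough illustration of that technique, here is a standalone Viterbi alignment in pure Python, assuming frame-wise log-probabilities given as nested lists (T frames by V tokens) and blank id 0 by default; `ctc_forced_align` is a hypothetical name, and ESPnet's own API and inputs may differ.

```python
import math
from typing import List, Sequence

NEG_INF = -math.inf


def ctc_forced_align(log_probs: Sequence[Sequence[float]],
                     targets: Sequence[int], blank: int = 0) -> List[int]:
    """Viterbi forced alignment over the CTC trellis (hypothetical helper).

    log_probs: T x V frame-wise log-probabilities.
    Returns one label per frame (a target token or the blank) on the best path.
    """
    num_frames = len(log_probs)
    if num_frames == 0:
        raise ValueError("log_probs must contain at least one frame")
    # Interleave blanks: [blank, y1, blank, y2, ..., yL, blank]
    ext = [blank]
    for tok in targets:
        ext += [tok, blank]
    num_states = len(ext)

    score = [[NEG_INF] * num_states for _ in range(num_frames)]
    back = [[0] * num_states for _ in range(num_frames)]
    # A path may start on the leading blank or on the first token.
    score[0][0] = log_probs[0][ext[0]]
    if num_states > 1:
        score[0][1] = log_probs[0][ext[1]]

    for t in range(1, num_frames):
        prev = score[t - 1]
        for s in range(num_states):
            # Allowed predecessors: stay, advance one state, or skip a blank
            # (the skip is illegal between two identical tokens).
            cands = [s]
            if s >= 1:
                cands.append(s - 1)
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append(s - 2)
            best = max(cands, key=lambda p: prev[p])
            if prev[best] == NEG_INF:
                continue
            score[t][s] = prev[best] + log_probs[t][ext[s]]
            back[t][s] = best

    # The path must end on the last token or on the trailing blank.
    finals = [num_states - 1] + ([num_states - 2] if num_states > 1 else [])
    s = max(finals, key=lambda p: score[-1][p])
    if score[-1][s] == NEG_INF:
        raise ValueError("too few frames to align the target sequence")
    path = [0] * num_frames
    for t in range(num_frames - 1, -1, -1):
        path[t] = ext[s]
        s = back[t][s]
    return path
```

Collapsing repeated labels and dropping blanks from the returned path recovers `targets`, which makes a quick sanity check; the frame indices where each token appears give its time span.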

Acknowledgements

@Fhrozen, @KanTakahiro, @Masao-Someki, @Qingzheng-Wang, @SanderGi, @Shikhar-S, @ZhuoyanTao, @akreal, @chinjouli, @dependabot[bot], @jctian98, @sw005320, @whr-a.

@Qingzheng-Wang

Summary

The 202509 release strengthens ESPnet’s foundation on modern OS and Python environments, enhances training flexibility, and completes the LID ecosystem. With a broad suite of new recipes, we continue to support emerging speech‑related benchmarks and models while ensuring CI stability and developer productivity.

Overview

The 202509 release brings a major shift in our infrastructure and tooling. Key highlights include:

Area	Change
Python / Dependencies	Dropped Python 3.7/3.8; supported versions are now Python 3.9–3.13; `numpy` bumped to ≥ 2.2.0; removed Chainer-related build steps.
OS Support	Ended Debian 11 support, switched CI to Debian 12 containers (ensuring GCC 11+ compatibility).
Warp‑Transducer	Adopted `ljn7/warp-transducer` (FastEmit, modern CUDA/CMake).
LID	Completed the LID subsystem (model, loss, pooling, balanced sampler, tri‑stage scheduler, inference tools).
Training	Added `HybridOptim`/`HybridLRS` for multi‑optimizer and scheduler configurations.
Recipes	New recipes for LongLibriHeavy, Qwen2‑Audio‑7B‑Chat, OWSM v4, Galaxy AVSR, and a LID template.
CI / DevOps	Updated Docker publishing, added an automerge workflow, fixed GPU flag handling, and added a guard against conflicting speaker options.
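The tri-stage scheduler in the LID work (#6159) splits training into warm-up, hold, and decay phases. The function below sketches one common form of that schedule (linear warm-up from a small initial rate, a constant peak, then exponential decay toward a floor), assuming step-based phase lengths; the function name, arguments, and default scales are hypothetical, and ESPnet's scheduler may parameterize the phases differently.

```python
import math


def tri_stage_lr(step: int, peak_lr: float, warmup_steps: int, hold_steps: int,
                 decay_steps: int, init_lr_scale: float = 0.01,
                 final_lr_scale: float = 0.01) -> float:
    """Hypothetical tri-stage schedule: linear warm-up, hold, exponential decay."""
    init_lr = peak_lr * init_lr_scale
    final_lr = peak_lr * final_lr_scale
    if step < warmup_steps:
        # Stage 1: linear ramp from init_lr up to peak_lr
        return init_lr + (peak_lr - init_lr) * step / warmup_steps
    step -= warmup_steps
    if step < hold_steps:
        # Stage 2: hold the peak learning rate
        return peak_lr
    step -= hold_steps
    if step < decay_steps:
        # Stage 3: exponential decay that reaches final_lr after decay_steps
        decay_factor = -math.log(final_lr_scale) / decay_steps
        return peak_lr * math.exp(-decay_factor * step)
    # After all three stages, stay at the floor
    return final_lr
```

For example, with `peak_lr=1.0`, 10 warm-up, 5 hold, and 10 decay steps, the rate climbs from 0.01 to 1.0, stays there for 5 steps, and falls to about 0.1 halfway through the decay phase.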

The release includes contributions from 10 developers, listed in the Acknowledgements.

Important Pull Requests

#	Category	Title	Key Impact
6228	Deprecation	EOL of Debian 11 support in favor of Debian 12	CI now runs on Debian 12; badges and docs updated; removes the GLIBCXX limitation imposed by GCC 10.
6226	Bug-Fix	Fix GPU flag handling in tts.sh script	Prevents Versa from erroneously enabling GPU when `gpu_inference` is False.
6221	Dependency	Update numpy version	Requires `numpy` ≥ 2.2.0, raises the minimum Python version to 3.9, and removes Chainer artifacts.
6220	Refactor	Remove old speechlm module	Removes obsolete SpeechLM code.
6187	Core	Switch warp‑transducer to ljn7 fork	Adds FastEmit support, broader CUDA/CMake compatibility, improved build scripts.
6159	Core	LID‑5: Tri‑stage learning rate scheduler	Stabilizes training with warm‑up / hold / decay phases.
6158	Core	LID‑4: Category‑ and dataset‑aware balanced sampler	Addresses language and dataset imbalance via power‑law sampling.
6156	Feature	LID‑2: Model, loss and pooling modules	Introduces language‑identification models, custom losses, and pooling strategies.
6208	Bug‑Fix	Guard against having both use_sid and use_spk_embed set to true	Prevents conflicting speaker‑ID/embedding settings.
6206	DevOps	Add environment tag to publish docker image	Clarifies Docker publishing workflow.
6202	DevOps	Add Automerge Action	Enables automated PR merging with label and review checks.
6205	Recipe	LongLibriHeavy benchmark	Provides long‑form speech evaluation baseline.
6194	Recipe	Add recipe for Qwen2‑Audio‑7B‑Chat	Baseline for Dynamic‑SUPERB ASR task.
6176	Recipe	OWSM v4 Recipe	Adds OWSM v4 training/configuration.
6160	Recipe	LID‑6: LID recipe template	Offers ready‑to‑use LID experiment scaffold.
6173	Feature	[espnet3‑4] Add support for multiple optimizers and schedulers	Unified handling of multiple optimizers/schedulers (HybridOptim/HybridLRS).
6132	Recipe	AVSR recipe for Galaxy Dataset	Adds AVSR training capability for Galaxy.
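The balanced sampler in LID-4 (#6158) counters imbalance with power-law sampling. A common form sets each category's sampling probability proportional to its size raised to an exponent alpha: alpha = 1 keeps the natural distribution, alpha = 0 is uniform, and values in between upsample rare languages. The sketch below shows a single level of that idea, assuming per-category utterance counts; ESPnet's sampler is additionally dataset-aware, and the helper names here are hypothetical.

```python
import random
from typing import Dict, List


def power_law_weights(counts: Dict[str, int], alpha: float = 0.5) -> Dict[str, float]:
    """Per-category sampling probability p_i proportional to n_i ** alpha."""
    if not counts or any(n <= 0 for n in counts.values()):
        raise ValueError("counts must be a non-empty mapping of positive integers")
    scaled = {name: n ** alpha for name, n in counts.items()}
    total = sum(scaled.values())
    return {name: w / total for name, w in scaled.items()}


def sample_categories(counts: Dict[str, int], num_draws: int,
                      alpha: float = 0.5, seed: int = 0) -> List[str]:
    """Draw category labels with replacement according to power_law_weights."""
    weights = power_law_weights(counts, alpha)
    names = list(weights)
    rng = random.Random(seed)
    return rng.choices(names, weights=[weights[n] for n in names], k=num_draws)
```

With 900 English and 100 Swahili utterances, alpha = 0.5 yields probabilities of 0.75 and 0.25 instead of the natural 0.9 and 0.1.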

Full changelog

What's Changed

New Features

LID-2: Model, loss and pooling modules (See #6156, by @Qingzheng-Wang)

Enhancement

[espnet3-4] Add support for multiple optimizers and schedulers (See #6173, by @Masao-Someki)
LID-4: Category- and dataset-aware balanced sampler (See #6158, by @Qingzheng-Wang)

Recipe

LongLibriHeavy benchmark (Basic Recipe without training for now) (See #6205, by @Miamoto)
Add recipe for Qwen2-Audio-7B-Chat on Dynamic-SUPERB ASR task (See #6194, by @cyhuang-tw)
OWSM v4 Recipe (See #6176, by @pyf98)
LID-6: LID recipe template (See #6160, by @Qingzheng-Wang)
AVSR recipe for Galaxy Dataset (See #6132, by @YJCX330)

Documentation

EOL of debian 11 support in favor of debian 12 (See #6228, by @Fhrozen)

Refactoring

Remove old speechlm module (See #6220, by @jctian98)

Others

Fix GPU flag handling in tts.sh script (See #6226, by @ZhuoyanTao)
Update numpy version (See #6221, by @Fhrozen)
Add guard against having both use_sid and use_spk_embed set to true (See #6208, by @ZhuoyanTao)
Add environment tag to publish docker image (See #6206, by @Fhrozen)
Add Automerge Action (See #6202, by @Fhrozen)
Switch warp-transducer to ljn7 fork with FastEmit and modern CUDA/CMa… (See #6187, by @ljn7)
LID-5: Tri-stage learning rate scheduler (See #6159, by @Qingzheng-Wang)
LID-3: Inference, embedding extraction and t-SNE visualization (See #6157, by @Qingzheng-Wang)

Acknowledgements

@Fhrozen, @Masao-Someki, @Miamoto, @Qingzheng-Wang, @YJCX330, @ZhuoyanTao, @cyhuang-tw, @jctian98, @ljn7, @pyf98.

Happy coding! 🚀

@Masao-Someki

New Features

[New Features][ESPnet2][ESPnet3][CI][size:XXL][lgtm] [espnet3-3] Add trainer and model #6172 by @Masao-Someki
[New Features][ESPnet3][CI][size:XXL][lgtm] [espnet3-1] Add Data Organizer #6167 by @Masao-Someki
[New Features][ESPnet2][size:XL] LID-1: Training and task setup #6155 by @Qingzheng-Wang
[New Features][ESPnet2][SID][size:XL] Update SPK recipe for CN-celeb #6154 by @holvan
[New Features][ESPnet2][SLU] Add code for training turn taking prediction model #5948 by @siddhu001

Recipe

[Recipe][ESPnet2][size:XXL] S2T Recipe for IPAPack++: Data Preparation #6169 by @chinjouli
[Recipe][ESPnet2][size:XL] S2T Recipe for IPAPack++: main recipe #6168 by @chinjouli
[Recipe][ESPnet2][Codec] add: complete codec1 recipe for AudioSet and musdb18 #6068 by @whr-a
[Recipe][ESPnet2][ASR] Additional results for the discrete ASR challenge #6067 by @juice500ml
[Recipe][ESPnet2][Installation][SE] Add implementations of USES2 speech enhancement models #5761 by @Emrys365

Bugfix

[Bugfix][ESPnet2][size:XS] Fix FutureWarning torch.cuda.amp.autocast(args...) is deprecated #6190 by @KanTakahiro
[Bugfix][ESPnet2][ESPnet1] Resolve logger warnings #6117 by @emmanuel-ferdman
[Bugfix][ESPnet2] Fix for issue #6112 Lagacy torch tensor constructor causes issue when… #6114 by @advaitvd

Documentation

[Documentation][ESPnet1][size:S] docs: clarify CBHG encoder vs post‑net roles in Tacotron 1 #6188 by @ZhuoyanTao
[Documentation][ESPnet3][Docker][CI][size:L] Add devcontainer change from Espnet3 #6145 by @sw005320
[Documentation][CI][size:M] Update PULL_REQUEST_TEMPLATE.md #6144 by @sw005320
[Documentation][CI][size:M] Update document to add tutorials + more easy connection to installation #6143 by @juice500ml
[Documentation][ESPnet3][Docker][size:L][lgtm] Espnet3/devcontainer #6141 by @Masao-Someki
[Documentation][Installation] Update Makefile #6124 by @sw005320

Refactoring

[Refactoring][ESPnet2][size:L] Refactor ACESinger's audio segmentation #6151 by @Arllan-lanliu
[Refactoring][ESPnet2][ESPnet1][CI][size:L][lgtm] Flake8 CI Fixes #6140 by @Fhrozen

Others

[Others][CI][size:S][lgtm] Workaround for shellcheck v0.11.0 #6197 by @Masao-Someki
[Others][Installation][size:XS] Update transformers installation #6191 by @Fhrozen
[Others][ESPnet3][CI][size:L] [espnet3-2] Add Config Loading script #6171 by @Masao-Someki
[Others][ESPnet2][ESPnet1][ESPnetEZ][Installation][size:L] [espnet3] Format files #6164 by @Masao-Someki
[Others][ESPnet2][SE] Update BSRNN implementations to support more flexible band-split schemes #6123 by @Emrys365
[Others][ESPnet2][Music] [SVS1] SingingGenerate and VISinger Inference Fix #6113 by @HANJionghao
[Others][CI] FIX CI test_import #6111 by @Fhrozen
[Others][ESPnet2] [Recipe] Create inference recipe for non-native English ASR benchmark (ALLSSTAR) #6110 by @chenehk
[Others][Docker][Installation][CI] Torch Version Update #6095 by @Fhrozen
[Others][ESPnet2][ASR] Add explicit typecheck for warning msg #6082 by @ftshijt
[Others][ESPnet2][ESPnet1][SSL][size:XL] SSL Fine-tuning PR #6069 by @wanchichen

New Contributors

@Arllan-lanliu made their first contribution in #6090
@chinjouli made their first contribution in #6109
@chenehk made their first contribution in #6110
@advaitvd made their first contribution in #6114
@whr-a made their first contribution in #6068
@holvan made their first contribution in #6126
@Qingzheng-Wang made their first contribution in #6155
@ZhuoyanTao made their first contribution in #6188
@KanTakahiro made their first contribution in #6190

Acknowledgements

Special thanks to @Arllan-lanliu, @Emrys365, @Fhrozen, @HANJionghao, @KanTakahiro, @Masao-Someki, @Qingzheng-Wang, @ZhuoyanTao, @advaitvd, @chenehk, @chinjouli, @emmanuel-ferdman, @ftshijt, @holvan, @juice500ml, @siddhu001, @sw005320, @wanchichen, @whr-a.

Full Changelog: v.202503...v.202506

@taiqihe

New Features

[New Features][ESPnet2] Add Hugging Face Front End #5913 by @taiqihe

Enhancement

[Enhancement][ESPnet2][ESPnet1][OWSM] Improving efficiency of large-scale training #6024 by @pyf98
[Enhancement][ESPnet2][Codec] Update scoring config to support WER/CER information with VERSA #6001 by @ftshijt
[Enhancement][ESPnet1] Add Scaled Dot Product Attention (SDPA) from PyTorch #5994 by @pyf98
[Enhancement][ESPnet2][ESPnet1][Installation] Support PyTorch Lightning Trainer in ESPnet2 #5954 by @pyf98
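#5994 adopts PyTorch's Scaled Dot Product Attention (`torch.nn.functional.scaled_dot_product_attention`), which computes softmax(Q K^T / sqrt(d_k)) V and can dispatch to fused kernels. As a reference for what that call computes, here is a single-head, unmasked, dropout-free sketch in plain Python; `sdpa` is a hypothetical name, and this slow illustration is not a substitute for the fused kernel.

```python
import math
from typing import List

Matrix = List[List[float]]


def sdpa(q: Matrix, k: Matrix, v: Matrix) -> Matrix:
    """Reference softmax(Q K^T / sqrt(d_k)) V for one head, no mask or dropout."""
    d_k = len(q[0])
    scale = 1.0 / math.sqrt(d_k)
    out: Matrix = []
    for q_row in q:
        # Scaled dot product of this query with every key
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        # Subtract the max before exponentiating for numerical stability
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Attention-weighted sum of value rows
        out.append([sum(w * v_row[j] for w, v_row in zip(weights, v))
                    for j in range(len(v[0]))])
    return out
```

Each output row is a convex combination of the value rows, weighted by how strongly that query matches each key.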

Recipe

[Recipe][ESPnet2][ASR] cmu_kids #6017 by @wangpuup
[Recipe][ESPnet2][ASR] EDACC dataset automatic speech recognition #5996 by @uwanny
[Recipe][ESPnet2][ASR] ml-superb 2024 recipe #5989 by @wanchichen
[Recipe][ESPnet2] Clotho_v2 Audio Captioning (DCASE 2023 implementation) #5967 by @Shikhar-S

Bugfix

[Bugfix][Installation] Downgrade Transformers version #6071 by @Fhrozen
[Bugfix][ESPnet2] Docs Fix #6065 by @Fhrozen
[Bugfix][ESPnet2][ST] A quick fix for type error when dealing with multi-decoder (ST) #6064 by @ftshijt
[Bugfix][ESPnet2][SID] fixed few typos on egs2/spk template #6060 by @yigitcatak
[Bugfix][ESPnet2] Bugfix #6057 #6058 by @Masao-Someki
[Bugfix][ESPnet2][SID] fix some minor errors in SID recipe #6045 by @shimhz
[Bugfix][ESPnet2] Fix the deprecated amp interface #6036 by @ftshijt
[Bugfix][ESPnet2] Add explicit weights_only=False for checkpoint loading #6035 by @ftshijt
[Bugfix][Installation] Fix boost URL #6034 by @sw005320
[Bugfix][Installation] Fix minor bug in Makefile #6031 by @juice500ml
[Bugfix][ESPnet2] Logging bugfix, skip import #6023 by @Shikhar-S
[Bugfix][ESPnet2][OWSM] Fix minor bug in OWSM-CTC preprocessor #6005 by @pyf98
[Bugfix][ESPnet2][ASR] Minor formatting fixes in mlsuperb 2 recipe #6003 by @wanchichen

Documentation

[Documentation][ESPnet2][CI] [Doc] Update parser on lightning_train #6020 by @Fhrozen

Others

[Others][Installation] Transformers version check #6076 by @Fhrozen
[Others][ESPnet2][ESPnet1] New SSL Recipe #6053 by @wanchichen
[Others][Installation] Update tools/README.md #6030 by @popcornell
[Others][ESPnet2][OWSM] doc: update OWSM data preparation instructions #6026 by @kalvinchang
[Others][ESPnet2][OWSM] fix: OWSM v3.1 - remove flash attention args #6025 by @kalvinchang
[Others][ESPnet2][SED] BEATs Tokenizer Inference #6008 by @Shikhar-S
[Others][ESPnet2][ESPnet1] Implement unified batch decode interface for OWSM-CTC #6007 by @pyf98
[Others][ESPnet2][TTS] [feature]finish versa eval in TTS recipe #6002 by @Whale-Dolphin
[Others][ESPnet2][ESPnet1][Installation][CI][SED] Classification Task and AudioSet-20K #5998 by @Shikhar-S
[Others][ESPnet2][ESPnet1][Installation][CI] remove gtn in setup.py #5982 by @sw005320
[Others][ESPnet2][ESPnet1][SED] ESC-50 classification with BEATs #5977 by @Shikhar-S
[Others][ESPnet2][TTS][ASR][SLU] Spoken dialogue systems demo recipe #5975 by @siddhu001
[Others][ESPnet2][SE] fix: gradient truncation bug in pit_solver.py #5974 by @YuzhuWang-code

Acknowledgements

Special thanks to @Fhrozen, @Masao-Someki, @Shikhar-S, @Whale-Dolphin, @YuzhuWang-code, @ftshijt, @juice500ml, @kalvinchang, @popcornell, @pyf98, @shimhz, @siddhu001, @sw005320, @taiqihe, @uwanny, @wanchichen, @wangpuup, @yigitcatak.

@RayYuki

New Features

[New Features][ESPnet2][Codec] Add HiFiCodec model #5898 by @RayYuki

Enhancement

[Enhancement][ESPnetEZ] Add missing functionalities for espnetez #5890 by @Masao-Someki

Recipe

[Recipe][ESPnet2][ASR] My Science Tutor (MyST) Children's Conversational Speech Corpus #5964 by @eric102004
[Recipe][ESPnet2] Feature/improve is24 asr2 #5938 by @juice500ml
[Recipe][ESPnet2][ASR] Add asr1 recipe for libriheavy_small #5932 by @Miamoto
[Recipe][ESPnet2][SID] Add RATS dataset for SV task #5840 by @shimhz

Bugfix

[Bugfix][ESPnet2][Diarization] [Bugfix] fix keyword argument error in stage 7 of diar.sh #5969 by @eric102004
[Bugfix][ESPnetEZ] Bug fixed for #5949 #5950 by @Masao-Someki
[Bugfix][ESPnet2][ASR] removed ''continue'' statement from the for loop in run_mono.sh #5946 by @Trikaldarshi
[Bugfix][ESPnet2] Add SWBD text processing fix #5941 by @siddhu001
[Bugfix][ESPnet2][ESPnet1] Training code patches #5931 by @wanchichen

Documentation

[Documentation] Fix bug in document that overflows the page #5940 by @juice500ml
[Documentation] Update CI reference #5939 by @emmanuel-ferdman
[Documentation] fix: collcate_fn -> collate_fn #5925 by @kalvinchang
[Documentation][Docker][Installation][CI] Migration from Anaconda to conda-forge #5924 by @yoshipon

Others

[Others][ESPnet2][Codec] Fix versa interface #5951 by @ftshijt
[Others][ESPnet2][ESPnet1] Add OWSM-CTC #5933 by @pyf98
[Others][ESPnet2] Recipe/ogi kids speech #5916 by @anyuyay

Acknowledgements

Special thanks to @Masao-Someki, @Miamoto, @RayYuki, @Trikaldarshi, @anyuyay, @emmanuel-ferdman, @eric102004, @ftshijt, @juice500ml, @kalvinchang, @pyf98, @shimhz, @siddhu001, @wanchichen, @yoshipon.

@wyh2000

New Features

[New Features][ESPnet2][TTS][Codec] Support Codec feature for TTS2 task #5857 by @wyh2000
[New Features][ESPnet2][Codec] Codec downstream task support: TTS #5763 by @jctian98
[New Features][ESPnet2][Codec] Add Encodec features for Codec toolkit #5758 by @jctian98
[New Features][ESPnet2][Installation][TTS] Add evaluation scripts with DiscreteSpeechMetrics. #5661 by @Takaaki-Saeki
[New Features][ESPnet2][ASR] Integrate adapter for s3prl frontend #5609 by @Stanwang1210
[New Features][ESPnet2][CI][OWSM] Support external dataset library for ESPnetEasy #5584 by @Masao-Someki
[New Features][ESPnet2][CI][LM] Pr voxtlm #5472 by @soumimaiti

Enhancement

[Enhancement][ESPnet2][SLM] MT Task in SpeechLM #5899 by @ftshijt
[Enhancement][ESPnet2][Codec] Categorical Balnced Chunk iterator #5894 by @ftshijt
[Enhancement][ESPnet2][ESPnet1] TransformerDecoder forward_one_step with memory_mask #5679 by @albertz
[Enhancement][ESPnet2] Update espnet_model.py #5646 by @shen9712

Recipe

[Recipe][ESPnet2][Music] Fixed KiSing Data Preparation #5895 by @HANJionghao
[Recipe][ESPnet2][ASR] CORAAL asr1 recipe #5882 by @kalvinchang
[Recipe][ESPnet2][ASR] ml_superb asr2 recipe #5866 by @Stanwang1210
[Recipe][ESPnet2] Add more download links for ML-SUPERB #5863 by @ftshijt
[Recipe][ESPnet2][ASR] Fix bug in asr2.sh #5859 by @juice500ml
[Recipe][ESPnet2][Music] fix bugs in SVS1 #5851 by @South-Twilight
[Recipe][ESPnet2][TTS] New Recipe of tts2+aishell3 #5849 by @Tsukasane
[Recipe][ESPnet2][ASR] Espnet Multi-convformer implementation #5832 by @Darshan7575
[Recipe][ESPnet2][SE] Update of SE functions #5825 by @Emrys365
[Recipe][ESPnet2] SPRING-INX Recipe (Speech Lab, IIT, Madras) #5811 by @arjun-gangwar
[Recipe][ESPnet2][TTS] Adding Hifitts recipe for espnet #5784 by @coding-phoenix-12
[Recipe][ESPnet2][ASR] Updated results for CHiME-8 DASR baseline with new notsofar1 dev set #5771 by @popcornell
[Recipe][ESPnet2][SE] Final model scores for TF-GridNetV2 on the Kinect-WSJ dataset #5754 by @atharva253
[Recipe][ESPnet2] Apply normalization on validation set for CHiME-8 recipe #5749 by @popcornell
[Recipe][ESPnet2][Need review][Codec] ESPnet-Codec decoding and Scoring #5747 by @ftshijt
[Recipe][ESPnet2][CI][ST] Add recipe for IWSLT 2024 shared task Indic track #5744 by @cromz22
[Recipe][ESPnet2][Music] [SVS] VISinger Plus #5741 by @jerryuhoo
[Recipe][ESPnet2][Need review][Codec] ESPnet-codec Training and Setup #5732 by @ftshijt
[Recipe][ESPnet2][ASR] ESPnet Recipe for ASR on the Makerere Radio Speech Corpus #5730 by @satvik-dixit
[Recipe][ESPnet2][SE] ESPnet recipe for the Kinect-WSJ dataset #5711 by @atharva253
[Recipe][ESPnet2][TTS][ASR][Music] Update bitrate calculation scripts for the IS24 discrete speech challenge #5677 by @ftshijt
[Recipe][ESPnet2][ASR] Add some documents for JTubeSpeech #5663 by @sw005320
[Recipe][ESPnet2][SID] ESPnet-SPK: add SdSV 2021 recipe #5659 by @Alexgichamba
[Recipe][ESPnet2][ASR] Add E-Branchformer model for FLEURS #5657 by @wanchichen
[Recipe][ESPnet2][Installation][CI][ASR] CHiME-8 DASR recipe based on CHiME-7 DASR baseline #5641 by @popcornell
[Recipe][ESPnet2][ASR] add interspeech2024_dsu_challenge/asr2 #5627 by @simpleoier
[Recipe][ESPnet2][Installation][TTS] Discrete token-based TTS implementation #5626 by @ftshijt

Bugfix

[Bugfix] fix: replace ellipses (...) in ESPnet-EZ Trainer documentation #5911 by @kalvinchang
[Bugfix] Bugfix/homepage #5885 by @Masao-Someki
[Bugfix][ESPnet2] Fix absolute paths in aishell3_tts2 #5884 by @Tsukasane
[Bugfix] Bug fix for source link #5883 by @Masao-Someki
[Bugfix][Installation] [CI] Add required file for g2p_en #5869 by @Fhrozen
[Bugfix][ESPnet2] A fix to newer torch version (compatible to old version with typecheck) #5830 by @ftshijt
[Bugfix][ESPnet2] Revert change to abs_task to keep the consistency behavior #5789 by @ftshijt
[Bugfix][ESPnet2] Fix Whisper frontend #5760 by @siddhu001
[Bugfix][ESPnet2][SE] Update TSE recipe egs2/librimix/tse1 #5731 by @Emrys365
[Bugfix][ESPnet2] Fix LoRA issues when saving all parameters. #5722 by @simpleoier
[Bugfix][ESPnet2] Fix tts packing with new spk embedding #5715 by @ftshijt
[Bugfix][ESPnet2][TTS] Fix stage references in generated run.sh in TTS recipes #5714 by @G-Thor
[Bugfix][ESPnet2][OWSM] fix a small issue in OWSM decode_long #5703 by @jctian98
[Bugfix][ESPnet2][Installation] Upgrade typeguard #5702 by @sw005320
[Bugfix][ESPnet2] Quick fix to calculation of bitrate #5692 by @ftshijt
[Bugfix][ESPnet2][SSUM] Fix typo in summarization scoring #5688 by @YoshikiMas
[Bugfix][ESPnet2] Update egs2/TEMPLATE/asr2/asr2.sh #5682 by @simpleoier
[Bugfix][ESPnet2][ASR] Fix over-lengthy audio in ml_superb data prep #5678 by @ftshijt
[Bugfix][ESPnet2] fix typo #5673 by @hiranoyu0830
[Bugfix][Installation][ST] Fix CI Multilingual ST test #5672 by @Fhrozen
[Bugfix][ESPnet2][SLU] Fix speed perturbation when not using transcript in slu.sh #5671 by @siddhu001
[Bugfix][ESPnet2][SLU] Fix loading pre-trained model from transformers #5668 by @siddhu001
[Bugfix][ESPnet2] Correct the argument errors in the whisper tokenizer language. #5666 by @pengchengguo

Documentation

[Documentation][ESPnet2][Music] Fixed SingingGenerate docstring examples #5889 by @HANJionghao
[Documentation][ESPnet2][CI] Separate packing and uploading stages #5752 by @cromz22
[Documentation] Add script to make release note from milestone #5653 by @kan-bayashi

Refactoring

[Refactoring] Modified easy to ez #5719 by @Masao-Someki

Others

[Others][CI] Bugfix for the paper publish workflow #5909 by @juice500ml
[Others][ESPnet2] Revision on Speechlm vocabulary extension script #5906 by @jctian98
[Others][ESPnet2][TTS] Fix tts.sh path in aishell3 tts2 #5879 by @sw005320
[Others][ESPnet2][Installation] Add DeepSpeed trainer for large-scale training #5856 by @jctian98
[Others] Update README info #5852 by @ftshijt
[Others][ESPnet2][ESPnet1][Installation] Add flash-attn #5839 by @wanchichen
[Others][ESPnet2][Music] [SVS] fix VISinger2 typecheck error #5838 by @jerryuhoo
[Others][ESPnet2] Fixed kising/acesinger google drive download #5834 by @HANJionghao
[Others][ESPnet2][SID] update MFA-Conformer performance after fixing the bug in #5797 #5826 by @Jungjee
[Others][ESPnet2][CI][SE] SE function updates: new models and support for handling various sampling frequencies #5800 by @Emrys365
[Others][ESPnet2][SID] fix spk mfa-conformer forwarding #5797 by @series2
[Others][ESPnet2][CI][Music] [SVS] Add CI tests for VISinger Plus #5786 by @jerryuhoo
[Others][ESPnet2][LM] Bug fix for VoxtLM v1 recipe #5782 by @cromz22
[Others][ESPnet2][ESPnet1] Added partially auto-regressive decoding #5769 by @Masao-Someki
[Others][Installation][CI] Fix minor issue in anaconda downloading #5753 by @ftshijt
[Others] [pre-commit.ci] pre-commit autoupdate #5738 by @pre-commit-ci[bot]
[Others][ESPnet2][Installation][CI] Upgrade typeguard [Subst.] #5724 by @Fhrozen
[Others][ESPnet2][SE] TF-GridNet training recipe for DNS Interspeech 2020 dataset #5710 by @nateanl
[Others][ESPnet2][LM] Adding transformer_opt #5709 by @soumimaiti
[Others][ESPnet2] Add Readme for Voxtlm #5693 by @wyh2000
[Others][ESPnet2][SID] ESPnet-SPK: add ASVspoof19 SASV recipe #5687 by @Alexgichamba

Acknowledgements

Special thanks to @Alexgichamba, @Darshan7575, @Emrys365, @Fhrozen, @G-Thor, @HANJionghao, @Jungjee, @Masao-Someki, @South-Twilight, @Stanwang1210, @Takaaki-Saeki, @Tsukasane, @YoshikiMas, @albertz, @arjun-gangwar, @atharva253, @coding-phoenix-12, @cromz22, @ftshijt, @hiranoyu0830, @jctian98, @jerryuhoo, @juice500ml, @kalvinchang, @kan-bayashi, @nateanl, @pengchengguo, @popcornell, @pre-commit-ci[bot], @satvik-dixit, @series2, @shen9712, @siddhu001, @simpleoier, @soumimaiti, @sw005320, @wanchichen, @wyh2000.

@LiChenda

News

We're thrilled to announce that our latest update brings two groundbreaking features to our project: espnetez and ESPnet-SPK!

New Features

[New Features][ESPnet2][ESPnet1][Installation][SE] Add diffusion-base SE model to ESPnet-SE #5572 by @LiChenda
[New Features][ESPnet2][ESPnet1][CI][ASR] Add Bayes Risk CTC (reworked) #5519 by @jctian98
[New Features][ESPnet2][TTS] TTS evaluation script and monitoring functionality using MOS prediction model #5485 by @Takaaki-Saeki
[New Features][ESPnet2][SE] Add USES model for speech enhancement in diverse conditions #5482 by @Emrys365
[New Features][ESPnet2][CI][SID] ESPnet-SPk: major update #5408 by @Jungjee
[New Features][ESPnet2][TTS][ASR] Add espnetez #5372 by @Masao-Someki

Enhancement

[Enhancement][ESPnet2][OWSM] Improving OWSM inference interface #5618 by @pyf98
[Enhancement][ESPnet2][OWSM] Add OWSM v3.1 #5611 by @pyf98
[Enhancement][ESPnet2][CI] ESPnet-SPK: Additional models, supplement readme #5559 by @Jungjee
[Enhancement][ESPnet2][CI][SE] Add PyTorch & GPU support for DNSMOS calculation #5548 by @Emrys365
[Enhancement][ESPnet2][TTS][SID] Speaker embedding extractor (with ESPnet pre-trained speaker model) #5579 by @ftshijt

Recipe

[Recipe][ESPnet2][Music] Fix relative setting of train-dev-test #5623 by @ftshijt
[Recipe][ESPnet2][SID] ESPnet-SPK: add Voxblink recipe #5583 by @Jungjee
[Recipe][ESPnet2][SID] ESPnet-SPK: Model upload and result generation #5558 by @Jungjee
[Recipe][ESPnet2][Music] ACE singer recipe fixing #5551 by @ftshijt
[Recipe][ESPnet2][TTS] TTS2 Template #5541 by @ftshijt
[Recipe][ESPnet2][ASR] fix kaldi dependency in asr2 #5540 by @ftshijt
[Recipe][ESPnet2][CI][S2ST] CI test for s2st #5526 by @ftshijt
[Recipe][ESPnet2][ASR] Added data.sh to SPRING-INX IITM Recipe #5522 by @arjun-gangwar
[Recipe][ESPnet2][ASR] Add Libriheavy small and medium ASR2 recipes #5512 by @akreal
[Recipe][ESPnet2][ASR] SPRING-INX IITM RECIPE #5505 by @arjun-gangwar
[Recipe][ESPnet2][ASR][RNNT] Add transducer conformer configuration to commonvoice recipe #5503 by @zuazo
[Recipe][ESPnet2][ESPnet1] add centralized data preparation for OWSM #5478 by @jctian98
[Recipe][ESPnet1] Added clean speech results #5649 by @linan2
[Recipe][ESPnet2][Installation][AV] AVSR recipe for Easycom Dataset #5630 by @ms-dot-k
[Recipe][ESPnet2] Update CHiME-7 ASR1 recipe #5555 by @popcornell
[Recipe][ESPnet2] Add E-Branchformer model checkpoint in OWSM v2 #5517 by @pyf98
[Recipe][ESPnet2][SLU] Slue PR configs #5087 by @siddhu001

Bugfix

[Bugfix][ESPnet2] Fix path dependency in ESPnet tutorial #5645 by @siddhu001
[Bugfix][ESPnet2] Fix ESPnet tutorial #5644 by @siddhu001
[Bugfix] Fix CI #5642 by @siddhu001
[Bugfix][ESPnet2] Fixed bug by copying missing Kaldi scripts #5636 by @VicentCano
[Bugfix][ESPnet1][ASR] CTC prefix score, fix if blank == eos #5620 by @albertz
[Bugfix][ESPnet2] Fix minor OWSM data prep bug #5607 by @juice500ml
[Bugfix][ESPnet2][ESPnet1][CI] E721 #5589 by @sw005320
[Bugfix][ESPnet2][ESPnet1] Make minlenratio effective #5581 by @jctian98
[Bugfix][ESPnet2] Fix except #5567 by @takenori-y
[Bugfix][ESPnet1][Installation][CI] Improve error robustness of unit tests #5535 by @Emrys365
[Bugfix][ESPnet2][AV] Fix bug in lrs3 data preprocessing #5520 by @ms-dot-k
[Bugfix][ESPnet1] replace old mustc links with new instructions #5516 by @brianyan918
[Bugfix][ESPnet2][ST] Fix s2st HF model uploading #5504 by @tjysdsg
[Bugfix][ESPnet2][ESPnet1] bug fixes for must_c v2 recipe #5640 by @jasonmusespresso

Documentation

[Documentation][ESPnet2] Add instructions for finetuning owsm #5539 by @pyf98
[Documentation] Updated the reference of the accepted JOSS paper #5515 by @neillu23

Others

[Others] Update Discord Invitation Link #5578 by @Fhrozen
[Others][ESPnet2][CI] Improve error robustness of unit tests #5523 by @Emrys365

Acknowledgements

Special thanks to @Emrys365, @Fhrozen, @Jungjee, @LiChenda, @Masao-Someki, @Takaaki-Saeki, @VicentCano, @akreal, @albertz, @arjun-gangwar, @brianyan918, @ftshijt, @jasonmusespresso, @jctian98, @juice500ml, @linan2, @ms-dot-k, @neillu23, @popcornell, @pyf98, @siddhu001, @sw005320, @takenori-y, @tjysdsg, @zuazo.

@pengchengguo

What's Changed

Support arbitrary language finetune for Whisper models. by @pengchengguo in #5344
Update Dipco Data URL by @Fhrozen in #5391
Update readme in TEMPLATE/svs1 by @linyueqian in #5394
add gramvaani asr recipe by @bloodraven66 in #5366
ESPnet-SPK: sampler by @Jungjee in #5365
Adding general data augmentation methods for speech preprocessing by @Emrys365 in #5370
Update of several SE recipes and some minor fixes by @Emrys365 in #5401
Reproducing MIMOIRIS by @YoshikiMas in #5409
Kathbath asr by @bloodraven66 in #5369
Add pytorch2.0.1 to CI by @kamo-naoyuki in #5413
[skip ci] Update README.md by @kamo-naoyuki in #5417
In spec_augment.py, check whether an array is writeable before modifying it inplace by @mdecerbo in #5416
Docker updates for local builds by @Fhrozen in #5406
fix typo in TEMPLATE/svs1/README.md by @linyueqian in #5426
Update install_mwerSegmenter.sh by @sw005320 in #5437
Support Whisper-style training as a new task S2T by @pyf98 in #5120
fix twice numpy installation issue by @kan-bayashi in #5447
Add Whisper SOT recipe for Librimix by @LiChenda in #5371
Update for the JOSS paper editor review by @neillu23 in #5418
Add the VOiCES recipe for ASR by @Emrys365 in #5448
Improve diacritic compatibility in data_prep.pl preprocessing scripts by @zuazo in #5445
[WIP] create recipe for acesinger by @linyueqian in #5431
Add BibleTTS recipe by @wyh2000 in #5436
ASR2 CHiME4 & Gigaspeech Recipes by @yichen14 in #5434
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #5427
Simple fix to reduce test_slu_inference time by @siddhu001 in #5460
Do not use root logger in Beamsearch by @vsd-vector in #5454
Fix whisper test by @siddhu001 in #5464
Add doc for OWSM by @pyf98 in #5463
Speech-to-speech translation Task by @ftshijt in #4859
AVSR recipes on LRS3 using pre-trained AV-HuBERT model by @ms-dot-k in #5456
Support LoRA based large model finetuning. by @pengchengguo in #5400
Multilingual Librispeech (MLS) refactor ASR1 recipe by @juice500ml in #5323
Add phonemized LibriTTS ASR recipe by @akreal in #5466
Update the Enh framework to support training with variable numbers of speakers by @Emrys365 in #5414
speed up TFGridNet code by @zqwang7 in #5395
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #5468
ASR2 recipe on Tedlium3 dataset by @kohei0209 in #5331
Create README.md in OWSM v1 by @pyf98 in #5489
Update setup.py by @sw005320 in #5490
Fix default value in ML-SUPERB by @ftshijt in #5492
Fix bugs of Whisper SOT. by @pengchengguo in #5494
Multilingual Librispeech ASR2 + ASR1 baselines by @juice500ml in #5441
Add a new SE recipe combining five public corpora by @Emrys365 in #5484
Update .mergify.yml by @kamo-naoyuki in #5502
update version to 202310 by @kan-bayashi in #5501

New Contributors

@linyueqian made their first contribution in #5394
@mdecerbo made their first contribution in #5416
@zuazo made their first contribution in #5445
@wyh2000 made their first contribution in #5436
@yichen14 made their first contribution in #5434
@vsd-vector made their first contribution in #5454
@ms-dot-k made their first contribution in #5456
@juice500ml made their first contribution in #5323
@kohei0209 made their first contribution in #5331

Full Changelog: v.202308...v.202310

@ftshijt

What's Changed

Update tutorial by @ftshijt in #4648
Update tutorials by @ftshijt in #4898
add e-branchformer result for tedlium3 and add checker for text output length by @Some-random in #5130
Limit the Numpy version (<1.24) to fix CI error temporarily. by @simpleoier in #5162
[SVS] Add new recipes by @A-Quarter-Mile in #5158
Update README.md of CHiME-7 DASR: fixing typos by @popcornell in #5166
Fix typo in CONTRIBUTING.md by @eltociear in #5167
CHiME-7 DASR: Update install_dependencies.sh, fix lhotse version by @popcornell in #5168
Update TD-SpeakerBeam by @Emrys365 in #5155
Add pre-trained causal speech separation model and streaming demo by @LiChenda in #5172
KSC recipe by @khassanoff in #5171
[SVS] Add new recipe by @A-Quarter-Mile in #5173
Update AphasiaBank Recipe by @tjysdsg in #5104
fix the gradient backward issue when joint training with s3prl frontend by @simpleoier in #5159
Add installer for ParallelWaveGAN by @ftshijt in #4052
[GAN SVS] Add VISinger2, UHifiGAN, Avocodo by @jerryuhoo in #5123
[SVS] Update docs README.md by @South-Twilight in #5178
Update SVS README.md by @jerryuhoo in #5180
Adding eendss models by @soumimaiti in #5157
2022fall new task tutorial by @ftshijt in #5186
[SVS] Updates for recipes by @A-Quarter-Mile in #5187
[GAN SVS] fix phoneme predictor by @jerryuhoo in #5188
Update generate_librimix_sd.sh by @leepeiying in #5182
Bug fix for #5195 by @YosukeHiguchi in #5196
[SVS] Update on recipes by @A-Quarter-Mile in #5197
Update preprocessor.py by @sw005320 in #5200
Minor fixes for ML-SUPERB by @ftshijt in #5202
Quick fix for whisper specaug by @siddhu001 in #5206
espnet-spk data preparation part by @Jungjee in #5184
Fix M4singer multi-spk recipe by @ftshijt in #5201
Update Dataset link for mlsuperb by @ftshijt in #5216
Fix bug when score_type is set to normal in ml_superb by @ftshijt in #5217
Add new functions and fix some bugs in SE by @Emrys365 in #5193
Update import order by @ftshijt in #5229
Closed CHiME-7 DASR adding evaluation inference + adding support to use diarization baseline "pre-computed" JSONs (new PR) by @popcornell in #5228
Standalone Transducer v1.1 by @b-flo in #5140
Small fixes for Transducer by @b-flo in #5247
add asr2 task and librispeech recipe as an example. by @simpleoier in #5181
fix norm compatibility in scale discriminator by @kan-bayashi in #5240
CFSD, SECS metrics for TTS by @imdanboy in #5235
Add new SE recipes: chime1/enh1, chime2/enh1, reverb/enh1, and wsj0_2mix/tse1 by @Emrys365 in #5246
Fix bugs in mfa_format.py by @G-Thor in #5223
New features for SVS by @ftshijt in #5245
re-fix norm compatibility in scale discriminator by @kan-bayashi in #5249
add conv1d subsampling 3 and egs2/librispeech/asr2 wavlm_large_21 kmeans (1000/2000) results by @simpleoier in #5252
Revise the ESPnet-SE++ Joss paper to incorporate the feedback from the reviewer. by @neillu23 in #5212
Fix a bug in score script for ML-SUPERB by @ftshijt in #5254
Refactor prep_segments in SVS by @jerryuhoo in #5210
A minor fix for num_splits_ssl for training by @ftshijt in #5262
[SVS] add singing tacotron by @A-Quarter-Mile in #5233
Add script to use speaker averaged xvectors in TTS training by @G-Thor in #5244
Fix filling of waveform_buffer with samples for streaming inference by @espnetUser in #5267
Some name update for ml-superb by @ftshijt in #5276
Add support for K2 pruned transducer loss by @b-flo in #5268
Fix Transducer doc by @b-flo in #5306
Update installation.md by @kamo-naoyuki in #5291
Update install_nkf.sh by @sw005320 in #5300
Fix Cython version to pass the installation of libraries with Cython by @kan-bayashi in #5310
Update README.md by @sw005320 in #5315
Update setup.py by @sw005320 in #5316
Migrate recipe for nit_song070 from Muskit by @wwwbxy123 in #5251
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #5294
A few updates for asr2 and hubert by @simpleoier in #5285
Add decode_options and hyp_cleaner in evaluate_whisper_inference by @pyf98 in #5272
update pyworld version by @kan-bayashi in #5319
fix a data preparation issue for librimix recipe. by @LiChenda in #5322
Update README.md in egs2/librimix/tse1 and egs2/wsj0_2mix/tse1 by @Emrys365 in #5289
fix the s3prl frontend gradient backprop bug, ensuring feature_grad_mult=1.0 by @simpleoier in #5297
ESPnet-SPK part 2 - training by @Jungjee in #5258
remove some tests in espnet1 integration test by @sw005320 in #5328
Fix random segments by @iamanigeeit in #5274
Skip CI for draft PR by @ftshijt in #5333
Update cancel.yml by @kan-bayashi in #5334
Update several SE recipes and bash scripts by @Emrys365 in #5327
Add PULL_REQUEST_TEMPLATE.md by @kan-bayashi in #5340
ESPnet-Spk part 3 - inference every epoch using EER by @Jungjee in #5314
Minimize espnet2 integration test by @kan-bayashi in #5324
PR Labels for CI control by @Fhrozen in #5320
Split ci into several jobs by @kan-bayashi in #5343
Update CONTRIBUTING.md by @sw005320 in #5335
Update Scoring for Speech Summarization from NLG-Eval to Huggingface Evaluate by @roshansh-cmu in #5341
Fix documentation skip CI by @Fhrozen in #5351
Update the usage by @sw005320 in #5349
Docker Update by @Fhrozen in #5321
Update installation.md by @sw005320 in #5348
Fix doc condition by @kan-bayashi in #5355
Update issue templates by @sw005320 in #5357
Update Contribution.md by @Fhrozen in #5352
Fix .mergify condition by @kan-bayashi in #5354
Reduce ffmpeg installation time in ci by @kan-bayashi in #5356
Update CI table by @kan-bayashi in #5359
Clean workflow files by @kan-bayashi in #5360
Couple of tweaks for asr2.sh for the HF hub upload by @akreal in #5362
Update TEMPLATE_HF_Readme.md (fix bash typo) by @akreal in #5361
Add discrete-token ASR for LibriSpeech 100h by @akreal in #5350
Whisper fine-tuning recipes for CHiME-4 and WSJ by @YoshikiMas in #5342
Fix bug in ngram training in slu.sh by @siddhu001 in #5364
Add musdb18 recipe for music source separation by @Emrys365 in #5338
Bugfix: JETS CTCLoss by @imdanboy in #5288
Check the value of n_shift == upsample_factor in GAN_TTS by @i...

@simpleoier

What's Changed

Update collect stats stage so that less memory cost in Utt_mvn by @simpleoier in #4888
Apply the latest black by @kamo-naoyuki in #4907
Add pytorch=1.13.1 to CI configuration by @kamo-naoyuki in #4906
How2 fix README, incorrect url by @roshansh-cmu in #4902
standardized inference and number of iterations for mSuperb single lang track by @DanBerrebbi in #4905
Fix typo in lrs/README.md by @eltociear in #4911
MSUPERB setting update by @ftshijt in #4913
Update test_import.yaml to install numba by @kamo-naoyuki in #4918
update pyopenjtalk version to 0.3.0 by @kan-bayashi in #4912
CHiME-7 Task1 recipe by @popcornell in #4894
Update CHiME-7 Task 1 README.md by @popcornell in #4920
Use native CPU version of STFT on newer pytorch versions, fix librosa window size < fft by @bmilde in #4922
Add few shot subset for mSuperb multilingual setting by @guapaQAQ in #4923
Fix existing bugs in the TSE task by @Emrys365 in #4915
IAM OCR recipe updates by @kenzheng99 in #4927
Fixing some issues with chime7-task1 baseline by @popcornell in #4925
set default none decoder for ASR by @ftshijt in #4917
Update inference and training setting for mSuperb multilingual model by @guapaQAQ in #4932
Add E-Branchformer Transducer results by @pyf98 in #4933
add tf-gridnet by @zqwang7 in #4864
Fixes + Channel Selection for CHiME-7 Task by @popcornell in #4934
fix extracted feature dummy generation by @roshansh-cmu in #4926
Fix device mismatch error in GPU decoding with PyTorch 1.13 by @pyf98 in #4941
CHiME-7 DASR MD5 checksum fix for mixer6/train_call by @popcornell in #4942
Update show_asr_result.sh by @kamo-naoyuki in #4943
CHiME-7 DASR correct development results by @popcornell in #4946
Fix 'floordiv is deprecated' warnings by @fujimotos in #4945
Added WSLII installation instruction by @sw005320 in #4949
Update Muskits by @A-Quarter-Mile in #4931
Set a longer time execution threshold for related failed time-outs CI by @ftshijt in #4962
Modify data prep for mSUPERB multilingual by @guapaQAQ in #4965
Add E-Branchformer results in some recipes by @pyf98 in #4958
Add 'six' as a required Python module by @fujimotos in #4964
add msuperb linguistic analysis by @hhhaaahhhaa in #4938
Fix a 'ref_channel'-related issue in espnet2/bin/enh_inference.py by @Emrys365 in #4972
Add E-Branchformer results in slurp_entity by @pyf98 in #4971
Add Conformer and E-Branchformer results in fisher_spanish_callhome ASR by @pyf98 in #4976
[SVS] Add Joint-training by @A-Quarter-Mile in #4977
Update the chunk iterator for the TSE task by @Emrys365 in #4929
update msuperb LID scoring script by @hhhaaahhhaa in #4979
add multilingual+lid lid score generation by @hhhaaahhhaa in #4982
Add python=3.10 to CI by @kamo-naoyuki in #4627
LID score v2 by @hhhaaahhhaa in #4983
Fix ci by @kamo-naoyuki in #4985
Change to use Ubuntu-latest instead of Ubuntu-18.04 in CI by @kamo-naoyuki in #4986
Remove six by @kamo-naoyuki in #4988
Modify format_wav_scp.py to support PCM of uint8, int32, float32, float64, etc. by @kamo-naoyuki in #4997
Fix Whisper tokenizer CI error by @slSeanWU in #5004
fix s3prl upstream attribute bug by @jwrh in #5003
[Recipe] Add iwslt22 low resource speech translation task for egs2 by @freddy5566 in #4994
Fix typeguard version by @silvanocerza in #5009
Add .pre-commit-config.yaml by @kamo-naoyuki in #5011
Copy Kaldi utils/steps/sid and add a new github action to check the consistency by @kamo-naoyuki in #4998
Modify .pre-commit-config.yaml by @kamo-naoyuki in #5012
Modify .pre-commit-config.yaml by @kamo-naoyuki in #5014
Modify .pre-commit-config.yaml by @kamo-naoyuki in #5015
[Tuning] iwslt22 low-resource ST decode configuration tuning by @freddy5566 in #5019
Modify asr.sh by @kamo-naoyuki in #5020
[SVS] Improve visinger by @jerryuhoo in #5022
Use scripts/utils/print_args.sh instead of pyscripts/utils/print_args.py by @kamo-naoyuki in #5025
Add docstring in extra_path.sh by @kamo-naoyuki in #5028
Update installation.md by @kamo-naoyuki in #5029
Update README.md by @kamo-naoyuki in #5030
Update README.md by @kamo-naoyuki in #5031
Change bc to python by @kamo-naoyuki in #5032
Update tools/Makefile and path.sh by @kamo-naoyuki in #5027
Fix for format_wav_scp.py by @kamo-naoyuki in #5038
Add execute permission to install_ice_g2p.sh by @kamo-naoyuki in #5040
Bug fix of #5025 by @kamo-naoyuki in #5039
[pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #5041
Update README.md by @kamo-naoyuki in #5042
Update README.md by @kamo-naoyuki in #5043
Update README.md by @kamo-naoyuki in #5045
Fix in gen_task1_data.sh from CHiME7 by @boeddeker in #4953
Update README.md by @eml914 in #5044
Add installers/install_ffmpeg.sh by @kamo-naoyuki in #5046
Fix broken links reported by #5048 by @ShigekiKarita in #5050
fix: resolve upgrade issues with praatio 6.0; lock praatio version by @timmahrt in #4978
Add miniconda in gitignore by @pyf98 in #5052
CHiME-7 DASR fixes from participants feedback by @popcornell in #4999
Fix the condition for maxlen warning in beam search by @pyf98 in #5055
Fixed SQLalchemy version for MFA by @Fhrozen in #5059
Support Multi-Blank Transducer in Espnet2 by @jctian98 in #4876
Fix chime7 DASR task1 run.sh by @kamo-naoyuki in #5060
CHiME-7 DASR recipe, fix display bug for scenario-wide DER and JER by @popcornell in #5061
Add test_format_wav_scp_sh.bats by @kamo-naoyuki in #5062
Update documentation by @kamo-naoyuki in #5063
Support SOT training on LibriMix data. by @pengchengguo in #4861
Update check_install.py by @kamo-naoyuki in #5066
Tedlium3 recipe by @Some-random in #5068
Bug Fix: pretrained s3prl-frontend based models loaded with parameters key mismatch error by @simpleoier in #5074
Mechanism for multi channels input using multi columns wav.scp by @kamo-naoyuki in #5075
Clean ML-SUPERB by @ftshijt in #5067
CHiME-7 DASR: first diarization system based on Pyannote. by @popcornell in #5054
Chime7-task1 diarization (updated results) by @popcornell in #5088
Add InterCTC to E-Branchformer encoder, and the ability to save InterCTC inference output to files by @tjysdsg in #5084
[SVS] Bug fix: sample rate by @A-Quarter-Mile in https://github.com/espnet/espnet/pu...

ESPnet version 202511

Summary

Highlighted Pull Requests

Key Takeaways

What's Changed (Full changelog)

New Features

Recipe

Bugfix

Documentation

Refactoring

Others

Acknowledgements

Contributors

ESPnet version 202509

Summary

Overview

Important Pull Requests

Full changelog

What's Changed

New Features

Enhancement

Recipe

Documentation

Refactoring

Others

Acknowledgements

Contributors

ESPnet version 202506

New Features

Recipe

Bugfix

Documentation

Refactoring

Others

New Contributors

Acknowledgements

Contributors

ESPnet version 202503

New Features

Enhancement

Recipe

Bugfix

Documentation

Others

Acknowledgements

Contributors

ESPnet version 202412

New Features

Enhancement

Recipe

Bugfix

Documentation

Others

Acknowledgements

Contributors

ESPnet version 202409

New Features

Enhancement

Recipe

Bugfix

Documentation

Refactoring

Others

Acknowledgements

Contributors

ESPnet version 202402

News

New Features

Enhancement

Recipe

Bugfix

Documentation

Others