
Releases: espnet/espnet

ESPnet version 202511

17 Nov 12:46
2d139b9


Summary

Release date: 2025‑11‑17
Milestone: 202511 – A version that brings a large number of new features, stability improvements, and a refreshed CI & Docker workflow.

ESPnet 202511 introduces robust parallel processing primitives, a fully‑refactored inference & evaluation pipeline, extensive SpeechLM support, and a modernized Docker/CI stack. The release also resolves lingering bugs in codec EMA logic, MPS device handling, and category‑balanced batching while tightening dependency management and documentation quality.


Highlighted Pull Requests

# | Title | Category | Key Impact
6300 | Bump js‑yaml from 4.1.0 to 4.1.1 in /doc/vuepress | Dep‑Update | Secures the documentation build against a prototype‑pollution CVE in YAML merge
6284 | codec fix: DDP logic and dead code revival logic | Bugfix | Restores EMA state for dead‑code recovery and synchronizes codec updates across all DDP workers
6286 | [SpeechLM] Deepspeed trainer | New Feature | Adds full DeepSpeed support (train.py + deepspeed_trainer.py) for large‑scale SpeechLM training
6279 | [SpeechLM] model, preprocessor and collect_stats | New Feature | Core SpeechLM components: job templates, preprocessing, multimodal IO, and stats collection
6278 | [SpeechLM] Deepspeed trainer | New Feature | See 6286: DeepSpeed integration for SpeechLM workflows
6276 | Docker Updates | Refactor | Upgrades to Ubuntu 24.04, CUDA 12.6, and PyTorch 2.8.0, transitions to Miniforge, and modernizes Dockerfile syntax
6275 | CI Installation fix | Bugfix | Adds --no-build-isolation for editable installs, improving reproducibility across CI environments
6273 | [ESPnet‑Codec] Bug fix on codec activation function | Bugfix | Enables BF16 inference by registering torch.ones for auto‑cast
6272 | Add Pytorch version 2.9 | Dep‑Update | Extends supported PyTorch releases (2.5.1, 2.7.1, 2.8.0, 2.9.0) in CI and docs
6263 | [ESPnet‑3] Merge master into espnet3 branch | Merge | Syncs espnet3 with master, fixing CI and dependency mismatches
6260 | SpeechLM Data Infra: dataset management | New Feature | Implements data registry, dataset loaders, and configuration templates for SpeechLM
6259 | pre‑commit.ci autoupdate | Tooling | Updates black and isort to the latest stable versions
6255 | Fix default batch sampler fallback for category iterator | Bugfix | Restores legacy foldedcatbel mapping, improving backward compatibility
6253 | Restrict Docker Github Actions to Original Repo | Security | Prevents accidental image publishing from forks or non‑master branches
6249 | [espnet3‑7] Add Callbacks | New Feature | Adds AverageCheckpointsCallback and a standard callback factory for Lightning trainers
6248 | Get forced alignments from CTC model | Feature | Enables forced alignment extraction for any CTC‑based S2T model
6246 | MPS Support for loading float64 models | Bugfix | Handles float‑64 to float‑32 conversion for the MPS device, avoiding dtype errors
6244 | LID‑7: VoxLingua107 recipe | Recipe | Adds a new spoken‑language‑identification recipe for VoxLingua107
6243 | [espnet‑3] Merge master into espnet3 and fixed CI | Merge | Syncs espnet3 with master, removing the underthesea dependency
6239 | Upgrade pyopenjtalk to 0.4.1 | Dep‑Update | Updates the pyopenjtalk installer to the latest version
6238 | Package Build Patch | Build | Moves g2p_en & ctc‑segmentation installation to Makefile, fixing pip package build
6227 | Terry/parallelize spk emb extraction | Feature | Parallel speaker‑embedding extraction for TTS recipes
6210 | LID‑8: CI and unit tests | Test | Adds comprehensive unit tests for LID functionality
6178 | [espnet3‑6] Add evaluation scripts | Feature | Modularizes inference & evaluation pipelines in espnet3
6179 | [espnet3] ESPnet1 Support Sunset | Refactor | Sunsets legacy ESPnet1 support and migrates the remaining code to espnet2.legacy
6177 | Merge master into espnet3 | Merge | Syncs espnet3 with master, fixing CI issues
6175 | [espnet3‑5] Add parallel module and collect_stats | Feature | Adds Dask‑based parallel processing and collect_stats for data statistics collection

Note: The table above summarizes the most impactful PRs for this release. Several PRs are grouped by shared functionality (e.g., SpeechLM, Docker, and LID). Contributors for these changes include dependabot[bot], whr-a, chinjouli, jctian98, Fhrozen, Masao‑Someki, KanTakahiro, akreal, pre‑commit‑ci[bot], Qingzheng‑Wang, Shikhar‑S, SanderGi, sw005320, and ZhuoyanTao.
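PR 6248 adds forced-alignment extraction for CTC-based S2T models. These notes do not describe its API, so the sketch below only illustrates the underlying technique. It runs Viterbi decoding over the standard CTC label graph (blank, y1, blank, ..., yN, blank) to find the most likely frame-level path that collapses to a given token sequence. The sketch assumes frame-wise log-probabilities supplied as nested Python lists and a blank id of 0. The name `ctc_forced_align` is hypothetical and is not ESPnet's interface.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def ctc_forced_align(
    log_probs: Sequence[Sequence[float]],
    tokens: Sequence[int],
    blank: int = 0,
) -> List[int]:
    """Return the best frame-level label path that collapses to `tokens`.

    log_probs[t][v] is the log-probability of vocabulary item v at frame t.
    """
    if not log_probs:
        raise ValueError("need at least one frame")
    num_frames = len(log_probs)
    # Extended CTC label sequence: blank, y1, blank, y2, ..., yN, blank
    ext = [blank]
    for tok in tokens:
        ext.extend([tok, blank])
    num_states = len(ext)

    # score[t][s]: best log-score of a path that is in state s at frame t
    score = [[NEG_INF] * num_states for _ in range(num_frames)]
    back = [[0] * num_states for _ in range(num_frames)]
    score[0][0] = log_probs[0][ext[0]]
    if num_states > 1:
        score[0][1] = log_probs[0][ext[1]]

    for t in range(1, num_frames):
        for s in range(num_states):
            # Allowed moves: stay in s, advance from s-1, or skip a blank
            # from s-2 (only legal between two different non-blank labels).
            cands = [(score[t - 1][s], s)]
            if s >= 1:
                cands.append((score[t - 1][s - 1], s - 1))
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                cands.append((score[t - 1][s - 2], s - 2))
            best, prev = max(cands)
            if best == NEG_INF:
                continue  # state unreachable at this frame
            score[t][s] = best + log_probs[t][ext[s]]
            back[t][s] = prev

    # A valid path must end on the last label or on the trailing blank.
    last = num_frames - 1
    s = num_states - 1
    if num_states > 1 and score[last][s - 1] > score[last][s]:
        s -= 1
    if score[last][s] == NEG_INF:
        raise ValueError("too few frames to align this token sequence")

    # Follow back-pointers from the last frame to the first.
    path = [0] * num_frames
    for t in range(last, -1, -1):
        path[t] = ext[s]
        s = back[t][s]
    return path


if __name__ == "__main__":
    probs = [  # columns: blank, "a" (id 1), "b" (id 2)
        [0.1, 0.8, 0.1],
        [0.2, 0.7, 0.1],
        [0.7, 0.1, 0.2],
        [0.1, 0.1, 0.8],
    ]
    log_probs = [[math.log(p) for p in row] for row in probs]
    print(ctc_forced_align(log_probs, [1, 2]))  # [1, 1, 0, 2]
```

Each frame receives exactly one label, so runs of the same non-blank label give each token's start and end frames. A practical implementation would vectorize this over tensors. The nested loops here give up speed in exchange for readability.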


Key Takeaways

  • Parallelism & Scalability – Dask‑based espnet3.parallel, collect_stats, and new callbacks (including AverageCheckpointsCallback) enable efficient distributed training and inference as well as checkpoint averaging.
  • SpeechLM Maturity – Core modules, DeepSpeed integration, multimodal IO, and data infrastructure create a solid foundation for large‑scale speech‑language models.
  • Stability & Security – Updated dependencies (js‑yaml, PyTorch, CUDA 12.6), Miniforge‑based Docker images, and bugfixes for codec EMA, MPS device handling, and category‑balanced sampling.
  • CI & Packaging – Modernized GitHub Actions, improved pip install flags (--no-build-isolation), and new Docker images based on Ubuntu 24.04.
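Checkpoint averaging comes from AverageCheckpointsCallback (PR 6249). Its internals are not described in these notes, so the following is only a sketch of the general technique. It keeps the n checkpoints with the best validation score and takes the element-wise mean of every parameter. To keep the example dependency-free, state dicts are modeled as plain dicts of float lists. The names `select_nbest` and `average_checkpoints` are hypothetical. The sketch also assumes that a higher score is better; a loss-based criterion would sort the other way.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical stand-in for a model state dict: parameter name -> flat values.
StateDict = Dict[str, List[float]]


def select_nbest(scored: Sequence[Tuple[float, StateDict]], n: int) -> List[StateDict]:
    """Keep the n checkpoints with the highest validation score."""
    ranked = sorted(scored, key=lambda item: item[0], reverse=True)
    return [sd for _, sd in ranked[:n]]


def average_checkpoints(state_dicts: Sequence[StateDict]) -> StateDict:
    """Element-wise mean of every parameter across the given checkpoints."""
    if not state_dicts:
        raise ValueError("need at least one checkpoint")
    names = set(state_dicts[0])
    for sd in state_dicts[1:]:
        if set(sd) != names:
            raise ValueError("checkpoints have different parameter names")
    n = len(state_dicts)
    averaged: StateDict = {}
    for name in state_dicts[0]:
        values = [sd[name] for sd in state_dicts]
        if len({len(v) for v in values}) != 1:
            raise ValueError(f"shape mismatch for parameter {name!r}")
        averaged[name] = [sum(col) / n for col in zip(*values)]
    return averaged


if __name__ == "__main__":
    ckpts = [
        (0.90, {"w": [1.0, 2.0], "b": [0.0]}),
        (0.92, {"w": [3.0, 4.0], "b": [1.0]}),
        (0.80, {"w": [100.0, 100.0], "b": [9.0]}),  # worst score, dropped
    ]
    print(average_checkpoints(select_nbest(ckpts, n=2)))
    # {'w': [2.0, 3.0], 'b': [0.5]}
```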

What's Changed (Full changelog)

New Features

Recipe

Bugfix

Documentation

Refactoring

  • [espnet3] ESPnet1 Support Sunset and Migration to espnet2.legacy (See #6179, by @Masao-Someki)

Others

Acknowledgements

@Fhrozen, @KanTakahiro, @Masao-Someki, @Qingzheng-Wang, @SanderGi, @Shikhar-S, @ZhuoyanTao, @akreal, @chinjouli, @dependabot[bot], @jctian98, @sw005320, @whr-a.

ESPnet version 202509

13 Sep 09:11
eaf9a83


Summary

The 202509 release strengthens ESPnet’s foundation on modern OS and Python environments, enhances training flexibility, and completes the LID ecosystem. With a broad suite of new recipes, we continue to support emerging speech‑related benchmarks and models while ensuring CI stability and developer productivity.

Overview

The 202509 release brings a major shift in our infrastructure and tooling. Key highlights include:

Area | Change
Python / Dependencies | Dropped Python 3.7/3.8 (Python 3.9–3.13 are now supported); numpy bumped to ≥ 2.2.0; removed Chainer‑related build steps.
OS Support | Ended Debian 11 support and switched CI to Debian 12 containers (ensuring GCC 11+ compatibility).
Warp‑Transducer | Adopted the ljn7/warp-transducer fork (FastEmit support, modern CUDA/CMake).
LID | Completed the LID subsystem (model, loss, pooling, balanced sampler, tri‑stage scheduler, inference tools).
Training | Added HybridOptim/HybridLRS for multi‑optimizer and multi‑scheduler configurations.
Recipes | New recipes for LongLibriHeavy, Qwen2‑Audio‑7B‑Chat, OWSM v4, Galaxy AVSR, and a LID template.
CI / DevOps | Updated Docker publishing, added an automerge workflow, fixed GPU flag handling, and added a guard against conflicting speaker options.
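The release lists a tri-stage learning rate scheduler for LID (PR 6159) and describes it only as having warm-up, hold, and decay phases. The sketch below assumes a common shape for such schedules. The rate rises linearly from a small fraction of the peak, holds at the peak, and then decays exponentially to a final fraction of the peak. The function name, parameter names, and default scales are illustrative assumptions, not ESPnet's configuration keys.

```python
import math


def tri_stage_lr(
    step: int,
    peak_lr: float,
    warmup_steps: int,
    hold_steps: int,
    decay_steps: int,
    init_lr_scale: float = 0.01,   # assumed: warm-up starts at 1% of peak
    final_lr_scale: float = 0.05,  # assumed: decay ends at 5% of peak
) -> float:
    """Learning rate at `step` for a warm-up / hold / decay schedule."""
    init_lr = peak_lr * init_lr_scale
    final_lr = peak_lr * final_lr_scale

    # Stage 1: linear warm-up from init_lr to peak_lr.
    if step < warmup_steps:
        return init_lr + (peak_lr - init_lr) * step / warmup_steps
    step -= warmup_steps

    # Stage 2: hold at peak_lr.
    if step < hold_steps:
        return peak_lr
    step -= hold_steps

    # Stage 3: exponential decay that reaches final_lr exactly after
    # decay_steps; the rate then stays at final_lr.
    if step < decay_steps:
        decay_rate = -math.log(final_lr_scale) / decay_steps
        return peak_lr * math.exp(-decay_rate * step)
    return final_lr


if __name__ == "__main__":
    schedule = [round(tri_stage_lr(s, 1.0, 10, 5, 10), 3) for s in (0, 5, 10, 15, 20, 25)]
    print(schedule)  # [0.01, 0.505, 1.0, 1.0, 0.224, 0.05]
```

The schedule is continuous at each stage boundary. The decay constant is chosen so that step `decay_steps` of the third stage lands exactly on `final_lr`.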

The release is backed by 10 core contributors, each driving critical modules or infrastructure changes.


Important Pull Requests

# | Category | Title | Key Impact
6228 | Deprecation | EOL of Debian 11 support in favor of Debian 12 | CI now runs on Debian 12; badges and docs updated; eliminates the GCC‑10 GLIBCXX limitation.
6226 | Bug‑Fix | Fix GPU flag handling in tts.sh script | Prevents VERSA from erroneously enabling GPU when gpu_inference is False.
6221 | Dependency | Update numpy version | Requires numpy ≥ 2.2.0, raises the minimum Python version to 3.9, and removes Chainer artifacts.
6220 | Refactor | Remove old speechlm module | Cleans out obsolete SpeechLM code.
6187 | Core | Switch warp‑transducer to ljn7 fork | Adds FastEmit support, broader CUDA/CMake compatibility, and improved build scripts.
6159 | Core | LID‑5: Tri‑stage learning rate scheduler | Stabilizes training with warm‑up / hold / decay phases.
6158 | Core | LID‑4: Category‑ and dataset‑aware balanced sampler | Addresses language and dataset imbalance via power‑law sampling.
6156 | Feature | LID‑2: Model, loss and pooling modules | Introduces language‑identification models, custom losses, and pooling strategies.
6208 | Bug‑Fix | Guard against having both use_sid and use_spk_embed set to true | Prevents conflicting speaker‑ID/embedding settings.
6206 | DevOps | Add environment tag to publish docker image | Clarifies the Docker publishing workflow.
6202 | DevOps | Add Automerge Action | Enables automated PR merging with label and review checks.
6205 | Recipe | LongLibriHeavy benchmark | Provides a long‑form speech evaluation baseline.
6194 | Recipe | Add recipe for Qwen2‑Audio‑7B‑Chat | Baseline for the Dynamic‑SUPERB ASR task.
6176 | Recipe | OWSM v4 Recipe | Adds OWSM v4 training/configuration.
6160 | Recipe | LID‑6: LID recipe template | Offers a ready‑to‑use LID experiment scaffold.
6173 | Feature | [espnet3‑4] Add support for multiple optimizers and schedulers | Unified handling of multiple optimizers/schedulers (HybridOptim/HybridLRS).
6172 | Feature | Additional integration for multi‑optimizer support (see linked PRs) | Supports advanced training strategies.
6132 | Recipe | AVSR recipe for Galaxy Dataset | Adds AVSR training capability for Galaxy.
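The LID‑4 sampler (PR 6158) addresses imbalance "via power‑law sampling". This sketch assumes one common form of that idea rather than taking it from the PR. Category i is drawn with probability proportional to n_i^α, where n_i is its number of examples and 0 ≤ α ≤ 1. With α = 1 the natural imbalance is kept, and with α = 0 every category is drawn uniformly. The function names are hypothetical. A dataset-aware variant could apply the same weighting twice: first over datasets, then over categories within each dataset.

```python
import random
from typing import Dict, List


def power_law_weights(counts: Dict[str, int], alpha: float) -> Dict[str, float]:
    """Sampling probability p_i proportional to n_i ** alpha."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Categories with no examples cannot be sampled at all.
    scaled = {name: n ** alpha for name, n in counts.items() if n > 0}
    total = sum(scaled.values())
    return {name: w / total for name, w in scaled.items()}


def sample_categories(
    counts: Dict[str, int], alpha: float, num_samples: int, seed: int = 0
) -> List[str]:
    """Draw category labels (e.g. languages) with power-law-smoothed weights."""
    weights = power_law_weights(counts, alpha)
    names = list(weights)
    rng = random.Random(seed)  # seeded for reproducible batches
    return rng.choices(names, weights=[weights[n] for n in names], k=num_samples)


if __name__ == "__main__":
    counts = {"en": 900, "sw": 100}
    print(power_law_weights(counts, alpha=0.5))  # {'en': 0.75, 'sw': 0.25}
    print(sample_categories(counts, alpha=0.5, num_samples=8))
```

In this example, α = 0.5 moves the low-resource category from a 10% share to a 25% share without making the distribution fully uniform.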

Full changelog

What's Changed

New Features

Enhancement

Recipe

Documentation

  • EOL of debian 11 support in favor of debian 12 (See #6228, by @Fhrozen)

Refactoring

Others

Acknowledgements

@Fhrozen, @Masao-Someki, @Miamoto, @Qingzheng-Wang, @YJCX330, @ZhuoyanTao, @cyhuang-tw, @jctian98, @ljn7, @pyf98.

Happy coding! 🚀

ESPnet version 202506

11 Aug 20:33
7d395a8


New Features

  • [New Features][ESPnet2][ESPnet3][CI][size:XXL][lgtm] [espnet3-3] Add trainer and model #6172 by @Masao-Someki
  • [New Features][ESPnet3][CI][size:XXL][lgtm] [espnet3-1] Add Data Organizer #6167 by @Masao-Someki
  • [New Features][ESPnet2][size:XL] LID-1: Training and task setup #6155 by @Qingzheng-Wang
  • [New Features][ESPnet2][SID][size:XL] Update SPK recipe for CN-celeb #6154 by @holvan
  • [New Features][ESPnet2][SLU] Add code for training turn taking prediction model #5948 by @siddhu001

Recipe

  • [Recipe][ESPnet2][size:XXL] S2T Recipe for IPAPack++: Data Preparation #6169 by @chinjouli
  • [Recipe][ESPnet2][size:XL] S2T Recipe for IPAPack++: main recipe #6168 by @chinjouli
  • [Recipe][ESPnet2][Codec] add: complete codec1 recipe for AudioSet and musdb18 #6068 by @whr-a
  • [Recipe][ESPnet2][ASR] Additional results for the discrete ASR challenge #6067 by @juice500ml
  • [Recipe][ESPnet2][Installation][SE] Add implementations of USES2 speech enhancement models #5761 by @Emrys365

Bugfix

  • [Bugfix][ESPnet2][size:XS] Fix FutureWarning torch.cuda.amp.autocast(args...) is deprecated #6190 by @KanTakahiro
  • [Bugfix][ESPnet2][ESPnet1] Resolve logger warnings #6117 by @emmanuel-ferdman
  • [Bugfix][ESPnet2] Fix for issue #6112 Lagacy torch tensor constructor causes issue when… #6114 by @advaitvd

Documentation

  • [Documentation][ESPnet1][size:S] docs: clarify CBHG encoder vs post‑net roles in Tacotron 1 #6188 by @ZhuoyanTao
  • [Documentation][ESPnet3][Docker][CI][size:L] Add devcontainer change from Espnet3 #6145 by @sw005320
  • [Documentation][CI][size:M] Update PULL_REQUEST_TEMPLATE.md #6144 by @sw005320
  • [Documentation][CI][size:M] Update document to add tutorials + more easy connection to installation #6143 by @juice500ml
  • [Documentation][ESPnet3][Docker][size:L][lgtm] Espnet3/devcontainer #6141 by @Masao-Someki
  • [Documentation][Installation] Update Makefile #6124 by @sw005320

Refactoring

  • [Refactoring][ESPnet2][size:L] Refactor ACESinger's audio segmentation #6151 by @Arllan-lanliu
  • [Refactoring][ESPnet2][ESPnet1][CI][size:L][lgtm] Flake8 CI Fixes #6140 by @Fhrozen

Others

  • [Others][CI][size:S][lgtm] Workaround for shellcheck v0.11.0 #6197 by @Masao-Someki
  • [Others][Installation][size:XS] Update transformers installation #6191 by @Fhrozen
  • [Others][ESPnet3][CI][size:L] [espnet3-2] Add Config Loading script #6171 by @Masao-Someki
  • [Others][ESPnet2][ESPnet1][ESPnetEZ][Installation][size:L] [espnet3] Format files #6164 by @Masao-Someki
  • [Others][ESPnet2][SE] Update BSRNN implementations to support more flexible band-split schemes #6123 by @Emrys365
  • [Others][ESPnet2][Music] [SVS1] SingingGenerate and VISinger Inference Fix #6113 by @HANJionghao
  • [Others][CI] FIX CI test_import #6111 by @Fhrozen
  • [Others][ESPnet2] [Recipe] Create inference recipe for non-native English ASR benchmark (ALLSSTAR) #6110 by @chenehk
  • [Others][Docker][Installation][CI] Torch Version Update #6095 by @Fhrozen
  • [Others][ESPnet2][ASR] Add explicit typecheck for warning msg #6082 by @ftshijt
  • [Others][ESPnet2][ESPnet1][SSL][size:XL] SSL Fine-tuning PR #6069 by @wanchichen

New Contributors

Acknowledgements

Special thanks to @Arllan-lanliu, @Emrys365, @Fhrozen, @HANJionghao, @KanTakahiro, @Masao-Someki, @Qingzheng-Wang, @ZhuoyanTao, @advaitvd, @chenehk, @chinjouli, @emmanuel-ferdman, @ftshijt, @holvan, @juice500ml, @siddhu001, @sw005320, @wanchichen, @whr-a.

Full Changelog: v.202503...v.202506

ESPnet version 202503

27 Mar 15:18
92f6cbc


New Features

  • [New Features][ESPnet2] Add Hugging Face Front End #5913 by @taiqihe

Enhancement

  • [Enhancement][ESPnet2][ESPnet1][OWSM] Improving efficiency of large-scale training #6024 by @pyf98
  • [Enhancement][ESPnet2][Codec] Update scoring config to support WER/CER information with VERSA #6001 by @ftshijt
  • [Enhancement][ESPnet1] Add Scaled Dot Product Attention (SDPA) from PyTorch #5994 by @pyf98
  • [Enhancement][ESPnet2][ESPnet1][Installation] Support PyTorch Lightning Trainer in ESPnet2 #5954 by @pyf98
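PR 5994 brings PyTorch's Scaled Dot Product Attention (SDPA) to ESPnet1. In PyTorch, the fused operator is `torch.nn.functional.scaled_dot_product_attention`. For reference, the sketch below spells out the computation that operator performs, softmax(Q Kᵀ / √d_k) V, for a single unbatched head in plain Python. It illustrates the math only and is not ESPnet's code. The optional boolean mask follows PyTorch's convention, in which True marks positions allowed to take part in attention.

```python
import math
from typing import List, Optional

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(
    q: Matrix, k: Matrix, v: Matrix, mask: Optional[List[List[bool]]] = None
) -> Matrix:
    """softmax(q k^T / sqrt(d_k)) v for one head; mask[i][j]=True lets query i see key j."""
    d_k = len(q[0])
    scale = 1.0 / math.sqrt(d_k)
    out: Matrix = []
    for i, qi in enumerate(q):
        # Scaled dot product between query i and every key.
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        if mask is not None:
            scores = [s if mask[i][j] else -math.inf for j, s in enumerate(scores)]
        if all(s == -math.inf for s in scores):
            raise ValueError(f"query {i} cannot attend to any key")
        weights = softmax(scores)
        # Output row = attention-weighted sum of value rows.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out


if __name__ == "__main__":
    q = k = [[1.0, 0.0], [0.0, 1.0]]
    v = [[1.0, 2.0], [3.0, 4.0]]
    causal = [[True, False], [True, True]]
    out = scaled_dot_product_attention(q, k, v, mask=causal)
    print(out[0])  # [1.0, 2.0]: the first query can only see the first key
```

The practical benefit of the PyTorch operator is that it can dispatch to fused kernels such as FlashAttention or memory-efficient attention, which avoid materializing the full score matrix. The loop above does materialize it.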

Recipe

  • [Recipe][ESPnet2][ASR] cmu_kids #6017 by @wangpuup
  • [Recipe][ESPnet2][ASR] EDACC dataset automatic speech recognition #5996 by @uwanny
  • [Recipe][ESPnet2][ASR] ml-superb 2024 recipe #5989 by @wanchichen
  • [Recipe][ESPnet2] Clotho_v2 Audio Captioning (DCASE 2023 implementation) #5967 by @Shikhar-S

Bugfix

  • [Bugfix][Installation] Downgrade Transformers version #6071 by @Fhrozen
  • [Bugfix][ESPnet2] Docs Fix #6065 by @Fhrozen
  • [Bugfix][ESPnet2][ST] A quick fix for type error when dealing with multi-decoder (ST) #6064 by @ftshijt
  • [Bugfix][ESPnet2][SID] fixed few typos on egs2/spk template #6060 by @yigitcatak
  • [Bugfix][ESPnet2] Bugfix #6057 #6058 by @Masao-Someki
  • [Bugfix][ESPnet2][SID] fix some minor errors in SID recipe #6045 by @shimhz
  • [Bugfix][ESPnet2] Fix the deprecated amp interface #6036 by @ftshijt
  • [Bugfix][ESPnet2] Add explicit weights_only=False for checkpoint loading #6035 by @ftshijt
  • [Bugfix][Installation] Fix boost URL #6034 by @sw005320
  • [Bugfix][Installation] Fix minor bug in Makefile #6031 by @juice500ml
  • [Bugfix][ESPnet2] Logging bugfix, skip import #6023 by @Shikhar-S
  • [Bugfix][ESPnet2][OWSM] Fix minor bug in OWSM-CTC preprocessor #6005 by @pyf98
  • [Bugfix][ESPnet2][ASR] Minor formatting fixes in mlsuperb 2 recipe #6003 by @wanchichen

Documentation

  • [Documentation][ESPnet2][CI] [Doc] Update parser on lightning_train #6020 by @Fhrozen

Others

  • [Others][Installation] Transformers version check #6076 by @Fhrozen
  • [Others][ESPnet2][ESPnet1] New SSL Recipe #6053 by @wanchichen
  • [Others][Installation] Update tools/README.md #6030 by @popcornell
  • [Others][ESPnet2][OWSM] doc: update OWSM data preparation instructions #6026 by @kalvinchang
  • [Others][ESPnet2][OWSM] fix: OWSM v3.1 - remove flash attention args #6025 by @kalvinchang
  • [Others][ESPnet2][SED] BEATs Tokenizer Inference #6008 by @Shikhar-S
  • [Others][ESPnet2][ESPnet1] Implement unified batch decode interface for OWSM-CTC #6007 by @pyf98
  • [Others][ESPnet2][TTS] [feature]finish versa eval in TTS recipe #6002 by @Whale-Dolphin
  • [Others][ESPnet2][ESPnet1][Installation][CI][SED] Classification Task and AudioSet-20K #5998 by @Shikhar-S
  • [Others][ESPnet2][ESPnet1][Installation][CI] remove gtn in setup.py #5982 by @sw005320
  • [Others][ESPnet2][ESPnet1][SED] ESC-50 classification with BEATs #5977 by @Shikhar-S
  • [Others][ESPnet2][TTS][ASR][SLU] Spoken dialogue systems demo recipe #5975 by @siddhu001
  • [Others][ESPnet2][SE] fix: gradient truncation bug in pit_solver.py #5974 by @YuzhuWang-code

Acknowledgements

Special thanks to @Fhrozen, @Masao-Someki, @Shikhar-S, @Whale-Dolphin, @YuzhuWang-code, @ftshijt, @juice500ml, @kalvinchang, @popcornell, @pyf98, @shimhz, @siddhu001, @sw005320, @taiqihe, @uwanny, @wanchichen, @wangpuup, @yigitcatak.

ESPnet version 202412

04 Dec 04:40
cccc290


New Features

  • [New Features][ESPnet2][Codec] Add HiFiCodec model #5898 by @RayYuki

Enhancement

Recipe

  • [Recipe][ESPnet2][ASR] My Science Tutor (MyST) Children's Conversational Speech Corpus #5964 by @eric102004
  • [Recipe][ESPnet2] Feature/improve is24 asr2 #5938 by @juice500ml
  • [Recipe][ESPnet2][ASR] Add asr1 recipe for libriheavy_small #5932 by @Miamoto
  • [Recipe][ESPnet2][SID] Add RATS dataset for SV task #5840 by @shimhz

Bugfix

Documentation

Others

Acknowledgements

Special thanks to @Masao-Someki, @Miamoto, @RayYuki, @Trikaldarshi, @anyuyay, @emmanuel-ferdman, @eric102004, @ftshijt, @juice500ml, @kalvinchang, @pyf98, @shimhz, @siddhu001, @wanchichen, @yoshipon.

ESPnet version 202409

01 Oct 06:28
6bae9d2


New Features

  • [New Features][ESPnet2][TTS][Codec] Support Codec feature for TTS2 task #5857 by @wyh2000
  • [New Features][ESPnet2][Codec] Codec downstream task support: TTS #5763 by @jctian98
  • [New Features][ESPnet2][Codec] Add Encodec features for Codec toolkit #5758 by @jctian98
  • [New Features][ESPnet2][Installation][TTS] Add evaluation scripts with DiscreteSpeechMetrics. #5661 by @Takaaki-Saeki
  • [New Features][ESPnet2][ASR] Integrate adapter for s3prl frontend #5609 by @Stanwang1210
  • [New Features][ESPnet2][CI][OWSM] Support external dataset library for ESPnetEasy #5584 by @Masao-Someki
  • [New Features][ESPnet2][CI][LM] Pr voxtlm #5472 by @soumimaiti

Enhancement

  • [Enhancement][ESPnet2][SLM] MT Task in SpeechLM #5899 by @ftshijt
  • [Enhancement][ESPnet2][Codec] Categorical Balnced Chunk iterator #5894 by @ftshijt
  • [Enhancement][ESPnet2][ESPnet1] TransformerDecoder forward_one_step with memory_mask #5679 by @albertz
  • [Enhancement][ESPnet2] Update espnet_model.py #5646 by @shen9712

Recipe

  • [Recipe][ESPnet2][Music] Fixed KiSing Data Preparation #5895 by @HANJionghao
  • [Recipe][ESPnet2][ASR] CORAAL asr1 recipe #5882 by @kalvinchang
  • [Recipe][ESPnet2][ASR] ml_superb asr2 recipe #5866 by @Stanwang1210
  • [Recipe][ESPnet2] Add more download links for ML-SUPERB #5863 by @ftshijt
  • [Recipe][ESPnet2][ASR] Fix bug in asr2.sh #5859 by @juice500ml
  • [Recipe][ESPnet2][Music] fix bugs in SVS1 #5851 by @South-Twilight
  • [Recipe][ESPnet2][TTS] New Recipe of tts2+aishell3 #5849 by @Tsukasane
  • [Recipe][ESPnet2][ASR] Espnet Multi-convformer implementation #5832 by @Darshan7575
  • [Recipe][ESPnet2][SE] Update of SE functions #5825 by @Emrys365
  • [Recipe][ESPnet2] SPRING-INX Recipe (Speech Lab, IIT, Madras) #5811 by @arjun-gangwar
  • [Recipe][ESPnet2][TTS] Adding Hifitts recipe for espnet #5784 by @coding-phoenix-12
  • [Recipe][ESPnet2][ASR] Updated results for CHiME-8 DASR baseline with new notsofar1 dev set #5771 by @popcornell
  • [Recipe][ESPnet2][SE] Final model scores for TF-GridNetV2 on the Kinect-WSJ dataset #5754 by @atharva253
  • [Recipe][ESPnet2] Apply normalization on validation set for CHiME-8 recipe #5749 by @popcornell
  • [Recipe][ESPnet2][Need review][Codec] ESPnet-Codec decoding and Scoring #5747 by @ftshijt
  • [Recipe][ESPnet2][CI][ST] Add recipe for IWSLT 2024 shared task Indic track #5744 by @cromz22
  • [Recipe][ESPnet2][Music] [SVS] VISinger Plus #5741 by @jerryuhoo
  • [Recipe][ESPnet2][Need review][Codec] ESPnet-codec Training and Setup #5732 by @ftshijt
  • [Recipe][ESPnet2][ASR] ESPnet Recipe for ASR on the Makerere Radio Speech Corpus #5730 by @satvik-dixit
  • [Recipe][ESPnet2][SE] ESPnet recipe for the Kinect-WSJ dataset #5711 by @atharva253
  • [Recipe][ESPnet2][TTS][ASR][Music] Update bitrate calculation scripts for the IS24 discrete speech challenge #5677 by @ftshijt
  • [Recipe][ESPnet2][ASR] Add some documents for JTubeSpeech #5663 by @sw005320
  • [Recipe][ESPnet2][SID] ESPnet-SPK: add SdSV 2021 recipe #5659 by @Alexgichamba
  • [Recipe][ESPnet2][ASR] Add E-Branchformer model for FLEURS #5657 by @wanchichen
  • [Recipe][ESPnet2][Installation][CI][ASR] CHiME-8 DASR recipe based on CHiME-7 DASR baseline #5641 by @popcornell
  • [Recipe][ESPnet2][ASR] add interspeech2024_dsu_challenge/asr2 #5627 by @simpleoier
  • [Recipe][ESPnet2][Installation][TTS] Discrete token-based TTS implementation #5626 by @ftshijt

Bugfix

  • [Bugfix] fix: replace ellipses (...) in ESPnet-EZ Trainer documentation #5911 by @kalvinchang
  • [Bugfix] Bugfix/homepage #5885 by @Masao-Someki
  • [Bugfix][ESPnet2] Fix absolute paths in aishell3_tts2 #5884 by @Tsukasane
  • [Bugfix] Bug fix for source link #5883 by @Masao-Someki
  • [Bugfix][Installation] [CI] Add required file for g2p_en #5869 by @Fhrozen
  • [Bugfix][ESPnet2] A fix to newer torch version (compatible to old version with typecheck) #5830 by @ftshijt
  • [Bugfix][ESPnet2] Revert change to abs_task to keep the consistency behavior #5789 by @ftshijt
  • [Bugfix][ESPnet2] Fix Whisper frontend #5760 by @siddhu001
  • [Bugfix][ESPnet2][SE] Update TSE recipe egs2/librimix/tse1 #5731 by @Emrys365
  • [Bugfix][ESPnet2] Fix LoRA issues when saving all parameters. #5722 by @simpleoier
  • [Bugfix][ESPnet2] Fix tts packing with new spk embedding #5715 by @ftshijt
  • [Bugfix][ESPnet2][TTS] Fix stage references in generated run.sh in TTS recipes #5714 by @G-Thor
  • [Bugfix][ESPnet2][OWSM] fix a small issue in OWSM decode_long #5703 by @jctian98
  • [Bugfix][ESPnet2][Installation] Upgrade typeguard #5702 by @sw005320
  • [Bugfix][ESPnet2] Quick fix to calculation of bitrate #5692 by @ftshijt
  • [Bugfix][ESPnet2][SSUM] Fix typo in summarization scoring #5688 by @YoshikiMas
  • [Bugfix][ESPnet2] Update egs2/TEMPLATE/asr2/asr2.sh #5682 by @simpleoier
  • [Bugfix][ESPnet2][ASR] Fix over-lengthy audio in ml_superb data prep #5678 by @ftshijt
  • [Bugfix][ESPnet2] fix typo #5673 by @hiranoyu0830
  • [Bugfix][Installation][ST] Fix CI Multilingual ST test #5672 by @Fhrozen
  • [Bugfix][ESPnet2][SLU] Fix speed perturbation when not using transcript in slu.sh #5671 by @siddhu001
  • [Bugfix][ESPnet2][SLU] Fix loading pre-trained model from transformers #5668 by @siddhu001
  • [Bugfix][ESPnet2] Correct the argument errors in the whisper tokenizer language. #5666 by @pengchengguo

Documentation

  • [Documentation][ESPnet2][Music] Fixed SingingGenerate docstring examples #5889 by @HANJionghao
  • [Documentation][ESPnet2][CI] Separate packing and uploading stages #5752 by @cromz22
  • [Documentation] Add script to make release note from milestone #5653 by @kan-bayashi

Refactoring

Others

  • [Others][CI] Bugfix for the paper publish workflow #5909 by @juice500ml
  • [Others][ESPnet2] Revision on Speechlm vocabulary extension script #5906 by @jctian98
  • [Others][ESPnet2][TTS] Fix tts.sh path in aishell3 tts2 #5879 by @sw005320
  • [Others][ESPnet2][Installation] Add DeepSpeed trainer for large-scale training #5856 by @jctian98
  • [Others] Update README info #5852 by @ftshijt
  • [Others][ESPnet2][ESPnet1][Installation] Add flash-attn #5839 by @wanchichen
  • [Others][ESPnet2][Music] [SVS] fix VISinger2 typecheck error #5838 by @jerryuhoo
  • [Others][ESPnet2] Fixed kising/acesinger google drive download #5834 by @HANJionghao
  • [Others][ESPnet2][SID] update MFA-Conformer performance after fixing the bug in #5797 #5826 by @Jungjee
  • [Others][ESPnet2][CI][SE] SE function updates: new models and support for handling various sampling frequencies #5800 by @Emrys365
  • [Others][ESPnet2][SID] fix spk mfa-conformer forwarding #5797 by @series2
  • [Others][ESPnet2][CI][Music] [SVS] Add CI tests for VISinger Plus #5786 by @jerryuhoo
  • [Others][ESPnet2][LM] Bug fix for VoxtLM v1 recipe #5782 by @cromz22
  • [Others][ESPnet2][ESPnet1] Added partially auto-regressive decoding #5769 by @Masao-Someki
  • [Others][Installation][CI] Fix minor issue in anaconda downloading #5753 by @ftshijt
  • [Others] [pre-commit.ci] pre-commit autoupdate #5738 by @pre-commit-ci[bot]
  • [Others][ESPnet2][Installation][CI] Upgrade typeguard [Subst.] #5724 by @Fhrozen
  • [Others][ESPnet2][SE] TF-GridNet training recipe for DNS Interspeech 2020 dataset #5710 by @nateanl
  • [Others][ESPnet2][LM] Adding transformer_opt #5709 by @soumimaiti
  • [Others][ESPnet2] Add Readme for Voxtlm #5693 by @wyh2000
  • [Others][ESPnet2][SID] ESPnet-SPK: add ASVspoof19 SASV recipe #5687 by @Alexgichamba

Acknowledgements

Special thanks to @Alexgichamba, @Darshan7575, @Emrys365, @Fhrozen, @G-Thor, @HANJionghao, @Jungjee, @Masao-Someki, @South-Twilight, @Stanwang1210, @Takaaki-Saeki, @Tsukasane, @YoshikiMas, @albertz, @arjun-gangwar, @atharva253, @coding-phoenix-12, @cromz22, @ftshijt, @hiranoyu0830, @jctian98, @jerryuhoo, @juice500ml, @kalvinchang, @kan-bayashi, @nateanl, @pengchengguo, @popcornell, @pre-commit-ci[bot], @satvik-dixit, @series2, @shen9712, @siddhu001, @simpleoier, @soumimaiti, @sw005320, @wanchichen, @wyh2000.

ESPnet version 202402

06 Feb 03:28
6ddbdf3


News

We're thrilled to announce that our latest update brings two groundbreaking features to our project: espnetez and ESPnet-SPK!

New Features

  • [New Features][ESPnet2][ESPnet1][Installation][SE] Add diffusion-base SE model to ESPnet-SE #5572 by @LiChenda
  • [New Features][ESPnet2][ESPnet1][CI][ASR] Add Bayes Risk CTC (reworked) #5519 by @jctian98
  • [New Features][ESPnet2][TTS] TTS evaluation script and monitoring functionality using MOS prediction model #5485 by @Takaaki-Saeki
  • [New Features][ESPnet2][SE] Add USES model for speech enhancement in diverse conditions #5482 by @Emrys365
  • [New Features][ESPnet2][CI][SID] ESPnet-SPk: major update #5408 by @Jungjee
  • [New Features][ESPnet2][TTS][ASR] Add espnetez #5372 by @Masao-Someki

Enhancement

  • [Enhancement][ESPnet2][OWSM] Improving OWSM inference interface #5618 by @pyf98
  • [Enhancement][ESPnet2][OWSM] Add OWSM v3.1 #5611 by @pyf98
  • [Enhancement][ESPnet2][CI] ESPnet-SPK: Additional models, supplement readme #5559 by @Jungjee
  • [Enhancement][ESPnet2][CI][SE] Add PyTorch & GPU support for DNSMOS calculation #5548 by @Emrys365
  • [Enhancement][ESPnet2][TTS][SID] Speaker embedding extractor (with ESPnet pre-trained speaker model) #5579 by @ftshijt

Recipe

  • [Recipe][ESPnet2][Music] Fix relative setting of train-dev-test #5623 by @ftshijt
  • [Recipe][ESPnet2][SID] ESPnet-SPK: add Voxblink recipe #5583 by @Jungjee
  • [Recipe][ESPnet2][SID] ESPnet-SPK: Model upload and result generation #5558 by @Jungjee
  • [Recipe][ESPnet2][Music] ACE singer recipe fixing #5551 by @ftshijt
  • [Recipe][ESPnet2][TTS] TTS2 Template #5541 by @ftshijt
  • [Recipe][ESPnet2][ASR] fix kaldi dependency in asr2 #5540 by @ftshijt
  • [Recipe][ESPnet2][CI][S2ST] CI test for s2st #5526 by @ftshijt
  • [Recipe][ESPnet2][ASR] Added data.sh to SPRING-INX IITM Recipe #5522 by @arjun-gangwar
  • [Recipe][ESPnet2][ASR] Add Libriheavy small and medium ASR2 recipes #5512 by @akreal
  • [Recipe][ESPnet2][ASR] SPRING-INX IITM RECIPE #5505 by @arjun-gangwar
  • [Recipe][ESPnet2][ASR][RNNT] Add transducer conformer configuration to commonvoice recipe #5503 by @zuazo
  • [Recipe][ESPnet2][ESPnet1] add centralized data preparation for OWSM #5478 by @jctian98
  • [Recipe][ESPnet1] Added clean speech results #5649 by @linan2
  • [Recipe][ESPnet2][Installation][AV] AVSR recipe for Easycom Dataset #5630 by @ms-dot-k
  • [Recipe][ESPnet2] Update CHiME-7 ASR1 recipe #5555 by @popcornell
  • [Recipe][ESPnet2] Add E-Branchformer model checkpoint in OWSM v2 #5517 by @pyf98
  • [Recipe][ESPnet2][SLU] Slue PR configs #5087 by @siddhu001

Bugfix

Documentation

  • [Documentation][ESPnet2] Add instructions for finetuning owsm #5539 by @pyf98
  • [Documentation] Updated the reference of the accepted JOSS paper #5515 by @neillu23

Others

  • [Others] Update Discord Invitation Link #5578 by @Fhrozen
  • [Others][ESPnet2][CI] Improve error robustness of unit tests #5523 by @Emrys365

Acknowledgements

Special thanks to @Emrys365, @Fhrozen, @Jungjee, @LiChenda, @Masao-Someki, @Takaaki-Saeki, @VicentCano, @akreal, @albertz, @arjun-gangwar, @brianyan918, @ftshijt, @jasonmusespresso, @jctian98, @juice500ml, @linan2, @ms-dot-k, @neillu23, @popcornell, @pyf98, @siddhu001, @sw005320, @takenori-y, @tjysdsg, @zuazo.

ESPnet version 202310

25 Oct 11:52
76b318e


What's Changed

New Contributors

Full Changelog: v.202308...v.202310

ESPnet version 202308

03 Aug 13:36
01d7df7


What's Changed


ESPnet version 202304

01 May 12:53
2219358


What's Changed
