CoVoST2 ASR2 recipe and new ST2 recipe #5318
simpleoier wants to merge 3 commits into espnet:master
Conversation
ftshijt
left a comment
The main implementation looks good. My only concern is the potential confusion around "src" in the ST task.
In the ST task we usually do multi-task training, where we also predict the source-language transcript, which we refer to as "src". That is different from the input speech. Could you please factor out that part if possible instead of reusing it? Otherwise it could cause a lot of confusion between the st1 and st2 implementations.
Force-pushed from d4e8eb1 to 8babca8
Codecov Report
❌ Patch coverage is

```text
@@            Coverage Diff             @@
##           master    #5318      +/-   ##
==========================================
- Coverage   77.17%   77.14%   -0.03%
==========================================
  Files         684      684
  Lines       62643    62735      +92
==========================================
+ Hits        48345    48399      +54
- Misses      14298    14336      +38
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
This pull request is now in conflict :(
Thanks!
What does it look like?
Could you paste an example?
It is like the tqdm progress bar, showing the percentage progress, current_batch / total_batches, time_so_far < est_time_to_finish, and time_per_batch:
```text
0%| | 1/1577 [00:20<8:49:13, 20.15s/it]
```
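Those fields can be reproduced with a small helper; a minimal sketch (the function `format_progress` is my own, not part of ESPnet or tqdm):

```python
def format_progress(batch_idx, total_batches, elapsed_sec):
    """Format a tqdm-style line: percent, batch count, elapsed<ETA, s/it."""
    per_batch = elapsed_sec / batch_idx                 # time_per_batch
    eta = per_batch * (total_batches - batch_idx)       # est_time_to_finish

    def hms(t):
        h, rem = divmod(int(t), 3600)
        m, s = divmod(rem, 60)
        return f"{h}:{m:02d}:{s:02d}" if h else f"{m:02d}:{s:02d}"

    pct = 100 * batch_idx // total_batches
    return (f"{pct}%| | {batch_idx}/{total_batches} "
            f"[{hms(elapsed_sec)}<{hms(eta)}, {per_batch:.2f}s/it]")

print(format_progress(1, 1577, 20.15))
```

With one batch done after 20.15 s, the ETA is simply 20.15 s multiplied by the 1576 remaining batches.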
egs2/TEMPLATE/asr2/README.md
Outdated
```text
8. cp ../../librispeech/asr2/conf/tuning/train_discrete_asr_e_branchformer1.yaml conf/ # copy training conf
9. cp ../../librispeech/asr2/conf/decode_ctc0.3.yaml conf/ # copy confs
10. EDIT run.sh by checking ../asr1/run.sh
    a. We may skip an LM
```
```diff
- a. We may skip an LM
+ a. We may skip an LM by adding an option `--use_lm false`
```
egs2/TEMPLATE/asr2/README.md
Outdated
* SSL model choice can affect the performance a lot; e.g., wavlm models may not work for non-English data.
* Layer selection is also important: different layers retain different information. For example, based on the training criterion, the 24th layer of HuBERT_large tries to match the information from HuBERT_base layer 9. If you don't have prior experience, please refer to Fig. 4 of this [CCA paper](https://arxiv.org/pdf/2211.03929.pdf), which is usually helpful.
* The number of kmeans clusters also affects the variance in pronunciation, etc.
* Please check the kmeans labels in `dump/extracted/{kmeans_feat_type}/layer{layer}/{dset}/pseudo_label_km{nclusters}.txt`. In my experience, a good km result for ASR should have an obvious pattern of repetitions, e.g.
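One quick sanity check (my own sketch, not an ESPnet script): measure how much a label sequence shrinks when consecutive repeats are collapsed. An ASR-friendly clustering tends to hold the same label across several frames of a phone, so the deduplicated sequence is much shorter; near-random frame labels barely compress.

```python
from itertools import groupby

def repetition_ratio(labels):
    """Fraction of frames removed when collapsing consecutive repeats.
    Higher means a more obvious repetition pattern."""
    dedup = [k for k, _ in groupby(labels)]
    return 1.0 - len(dedup) / len(labels)

good = [3, 3, 3, 7, 7, 12, 12, 12, 12, 5]  # repetitive, phone-like runs
bad = [3, 9, 1, 7, 4, 12, 8, 2, 6, 5]      # near-random frame labels
assert repetition_ratio(good) > repetition_ratio(bad)
```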
Maybe you can add a bad example as well?
```diff
  ./asr2.sh \
-     --kmeans_opts "--batch_bins 4800000" \
+     --kmeans_opts "--batch_bins 4800000 --nj 4" \
```
You may need to increase --num_threads to deal with the large memory consumption in scikit-learn?
(Ideally, I want you to solve it by avoiding such a less-refined implementation.)
OK. Working on this item.
espnet2/bin/mt_inference.py
Outdated
```python
if quantize_mt_model or quantize_lm:
    if quantize_dtype == "float16" and torch.__version__ < LooseVersion(
        "1.5.0"
    ):
        raise ValueError(
            "float16 dtype for dynamic quantization is not supported with "
            "torch version < 1.5.0. Switch to qint8 dtype instead."
        )
```
Since CI does not support 1.5.0, we can remove these lines.
The CTC BPE token part looks complicated and tricky.
It requires some documentation (in the source code and the asr2 or st2 documents).
The idea is simple; maybe my implementation is a bit complicated. In st2, we use different text targets for CTC and the attention decoder:
- CTC target: ASR transcriptions for ASR or ST, while <not_available> is used in MT.
- Att-Dec target: ASR transcriptions for ASR, translation transcriptions for ST / MT.
For this purpose, we need different text inputs as data. In the ESPnet preprocessor, the number of tokenizers should match the number of text inputs:
```python
assert (
    len(token_type) == len(token_list) == len(bpemodel) == len(text_name)
), "token_type, token_list, bpemodel, or processing text_name mismatched"
```
But in practice, the bpe model for CTC and Att-Dec is the same: we combine the vocabularies of the ASR language and the translation language. However, I made the CTC text tokenizer an explicit option, which is easy to change.
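To make the per-task target assignment concrete, here is a toy sketch (the names `NOT_AVAILABLE` and `pick_targets` are mine, not the actual ESPnet code) of how the CTC and attention-decoder targets could be chosen, with the placeholder symbol used for MT's CTC branch:

```python
NOT_AVAILABLE = "<not_available>"  # hypothetical placeholder symbol

def pick_targets(task, transcript, translation):
    """Return (ctc_target, attdec_target) following the scheme above."""
    if task == "asr":
        return transcript, transcript
    if task == "st":
        return transcript, translation
    if task == "mt":
        return NOT_AVAILABLE, translation
    raise ValueError(f"unknown task: {task}")

assert pick_targets("st", "hola mundo", "hello world") == ("hola mundo", "hello world")
assert pick_targets("mt", "hola mundo", "hello world")[0] == NOT_AVAILABLE
```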
```python
speech_name: str = "speech",
text_name: List[str] = ["text"],
tokenizer_encode_conf: List[Dict] = [dict(), dict()],
not_available_symbol: str = None,
```
Can you explain it and embed the explanation in the source code?
I'll put the following explanation down below.
```python
# not_available_symbol is a special symbol used as a placeholder in the text, e.g.
#     "utt_id <na>"  # an item in the text input
# Such samples will not have the corresponding text signals.
# The resulting tensor will be processed as torch.LongTensor([-1])
```
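A minimal sketch of that behavior (plain Python lists standing in for torch tensors; `tokens_or_na` is a hypothetical helper, not the actual preprocessor):

```python
def tokens_or_na(text, token2id, not_available_symbol="<na>"):
    """Map text to token ids; a placeholder utterance becomes [-1],
    signalling 'no target for this branch' downstream."""
    if text.strip() == not_available_symbol:
        return [-1]  # the preprocessor would emit torch.LongTensor([-1])
    return [token2id[tok] for tok in text.split()]

vocab = {"hello": 0, "world": 1}
assert tokens_or_na("hello world", vocab) == [0, 1]
assert tokens_or_na("<na>", vocab) == [-1]
```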
```bash
speech_token_lang="wavlm_large_21_km2000"  # speech discrete token type abbrev. id (e.g., wavlm_large_21_km2000)
src_tgt_text_case="lc.rm"  # source / target transcript case. Note, all source / target text should use the same case for now.
src_tgt_text_lang=en  # source / target language abbrev. id (e.g., en). Multiple langs are supported to support multiple tasks, with space between (e.g., "es/en"); from the data's perspective, src_lang of text is the first.
tgt_tasks="asr/st"  # task abbrev. id (e.g., st). Multiple tasks are supported, with space between (e.g., "asr/st")
```
space between or "/" between?
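For reference, the example values above ("es/en", "asr/st") use "/" as the separator; a bash sketch (my own, not from st.sh) of splitting such a field into its parts:

```shell
#!/usr/bin/env bash
# Split a "/"-separated field like tgt_tasks="asr/st" into an array.
tgt_tasks="asr/st"
IFS='/' read -r -a task_array <<< "${tgt_tasks}"
for task in "${task_array[@]}"; do
    echo "task: ${task}"
done
```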
Since it is for multi-task, is it a good idea to put it in the st task, or in a more general one?
I'm asking because the major reasons we split st from asr in the previous context were:
- the different architectures (for st-only tasks, we also have a few unique architecture designs for each component, e.g., separate asr/mt decoders, a two-pass framework with multi-decoder, etc.)
- data preparation (designed for tgt_text, src_text, and src_speech)
- specific evaluation (bleu calculation and multi-reference support)
However, I feel many of the above parts are not shared here (e.g., the architecture is still discrete asr; we skip the joint-task framework and instead do either asr or st in a single run; there is no support for multi-reference scoring).
Given the above reasons, I lean toward calling it s2t2 instead of st2. Please let me know your thoughts!
```diff
@@ -0,0 +1,2066 @@
+#!/usr/bin/env bash
```
The script looks great!
One issue remaining on my side is the support of multi-reference scenarios for ST evaluation (I double-checked, but it seems not supported in the preparation yet). Please refer to https://github.com/espnet/espnet/blob/master/egs2/TEMPLATE/st1/st.sh#L547-L553 for some details on how we process those.
For more information on the name:
https://github.com/espnet/espnet/blob/master/egs2/TEMPLATE/st1/st.sh#L1585C75-L1585C75
```bash
hf_task=automatic-speech-recognition
# shellcheck disable=SC2034
espnet_task=ASR
```
This PR is stale because it has been open for 90 days with no activity.
This PR is closed. Please re-open if needed.
ST2 recipe
A combination of ASR + ST + MT tasks, which can use speech discrete tokens and text transcriptions together.
(More details will be filled in.)
- `numel_sampler`: to support same-task data in a single batch.

ASR2 recipe
Misc.
- `pyscripts/feats/ssl_feature_utils.py` and `pyscripts/feats/dump_km_labels.py`
- `librispeech/asr2`, `covost2/asr2`, `covost2/st2`
- `--speech_feats_type extracted`
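A toy sketch of the single-task batching idea behind `numel_sampler` (pure Python; the function name and structure are my own illustration, not ESPnet's actual sampler):

```python
from collections import defaultdict

def single_task_batches(samples, batch_size):
    """Group utterance ids by task so each batch contains one task only.

    samples: list of (utt_id, task) pairs; batch_size: max ids per batch.
    """
    by_task = defaultdict(list)
    for uid, task in samples:
        by_task[task].append(uid)
    batches = []
    for task, uids in by_task.items():
        for i in range(0, len(uids), batch_size):
            batches.append(uids[i:i + batch_size])
    return batches

samples = [("u1", "asr"), ("u2", "st"), ("u3", "asr"), ("u4", "st")]
assert single_task_batches(samples, 2) == [["u1", "u3"], ["u2", "u4"]]
```

Keeping each mini-batch to a single task means the loss computation never has to mix CTC/attention target layouts from different tasks within one forward pass.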