Conversation
This pull request is now in conflict :(
Codecov Report — Attention: additional details and impacted files.

```diff
@@            Coverage Diff             @@
##           master    #5581      +/-   ##
==========================================
+ Coverage   70.62%   76.53%   +5.91%
==========================================
  Files         719      720       +1
  Lines       66513    66623     +110
==========================================
+ Hits        46972    50992    +4020
+ Misses      19541    15631    -3910
```

Flags with carried forward coverage won't be shown.
LGTM!
Why does it not include the normalization?
I think the `batch_beam_search` class inherits the sort function from the `beam_search` class, so we don't need to handle the normalization separately in `batch_beam_search`.
@siddhu001, can you check the case of negative `minlenratio`?
After @siddhu001's confirmation, I'll merge this PR.
It's good to test with an attention-only model, as CTC can greatly calibrate the length estimation, so `minlenratio` would not be tested properly otherwise.
Hi @sw005320, yes, I believe the case of negative `minlenratio` is the expected behaviour. We follow a similar approach for negative `maxlenratio` (i.e., `maxlenratio = -1 * maxlenratio` when `maxlenratio < 0`). @jctian98, thanks a lot for your efforts on this PR. This is very useful!
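The negative-ratio convention discussed above could be sketched as follows. This is an illustrative helper, not espnet's actual API: the function name `resolve_length_bounds` and its exact rounding behaviour are assumptions; the point is only that a negative ratio is reinterpreted as an absolute length via `-1 * ratio`.

```python
def resolve_length_bounds(maxlenratio: float, minlenratio: float, input_len: int):
    """Hypothetical sketch: turn ratio hyperparameters into (minlen, maxlen).

    A negative ratio is treated as an absolute length (the `-1 * ratio`
    convention mentioned in the discussion); a non-negative ratio is a
    fraction of the input length.
    """
    if maxlenratio == 0:
        maxlen = input_len  # unconstrained up to the encoder output length
    elif maxlenratio < 0:
        maxlen = int(-1 * maxlenratio)  # negative value: absolute max length
    else:
        maxlen = max(1, int(maxlenratio * input_len))

    if minlenratio < 0:
        minlen = int(-1 * minlenratio)  # same convention for the lower bound
    else:
        minlen = int(minlenratio * input_len)
    return minlen, maxlen
```

For example, with a 100-frame input, `minlenratio=0.1` gives `minlen=10`, while `maxlenratio=-10` pins `maxlen` to exactly 10 tokens regardless of the input length.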
What?
This PR responds to issues #5573 and #5580 by making the decoding hyper-parameter `minlenratio` effective in beam search.

Previously, `minlenratio` and the variable `minlen` were calculated but never applied, which led to the following issue: some models tend to predict the special symbol `<eos>` in the first several decoding steps. As long as the invalid `<eos>` remains in the top-k list, these invalid hypotheses are added to the ended-hypothesis list. Since the invalid hypotheses are short, their accumulated posterior is also very high compared to valid hypotheses with much longer text. The invalid hypotheses are then selected in the finalization stage, and most of them are empty strings. As a consequence, the overall decoding results can be empty or very short. The issue is more likely to be observed when the beam size is large, because the `<eos>` token is then more likely to stay in the top-k list.

To overcome this issue, this PR simply makes the `minlenratio` hyperparameter effective by adding an `if` condition in the `post_process` function. A small heuristic value (e.g., 0.1) can then be passed to it, forcing all ended hypotheses to have a length of at least 10% of the input length. Although heuristic, this works well in general.

This PR also adds a length-normalization feature to the beam search object. If activated, the best hypothesis is selected by the best token-level (length-normalized) score rather than the accumulated score.
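The two changes described above can be sketched roughly as follows. This is a simplified illustration, not espnet's actual implementation: the real `post_process` also manages scorer states, batching, and more, and the `Hypothesis` class here is a stripped-down stand-in.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Hypothesis:
    yseq: List[int]   # token ids; ends with <eos> once the hypothesis finishes
    score: float      # accumulated log-probability

def post_process(running: List[Hypothesis], ended: List[Hypothesis],
                 eos: int, minlen: int) -> List[Hypothesis]:
    """Move finished hypotheses to `ended`, applying the minlen guard.

    The added `if` condition drops hypotheses that emit <eos> before
    reaching `minlen` tokens instead of adding them to the ended list.
    """
    remained = []
    for hyp in running:
        if hyp.yseq[-1] == eos:
            if len(hyp.yseq) >= minlen:   # the guard this PR introduces
                ended.append(hyp)
            # else: discard the too-short hypothesis entirely
        else:
            remained.append(hyp)
    return remained

def select_best(ended: List[Hypothesis], normalize_length: bool) -> Hypothesis:
    """Pick the final output; with normalization, compare per-token scores
    so short hypotheses lose their accumulated-score advantage."""
    if normalize_length:
        return max(ended, key=lambda h: h.score / len(h.yseq))
    return max(ended, key=lambda h: h.score)
```

With normalization off, a two-token hypothesis with score -1.0 beats a four-token one with score -1.2; with normalization on, the comparison becomes -0.5 vs. -0.3 per token and the longer hypothesis wins, which is exactly the bias the feature is meant to correct.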