Conversation
This pull request is now in conflict :(
Codecov Report — Attention: additional details and impacted files.

```diff
@@            Coverage Diff             @@
##           master    #5581      +/-   ##
==========================================
+ Coverage   70.62%   76.53%   +5.91%
==========================================
  Files         719      720       +1
  Lines       66513    66623     +110
==========================================
+ Hits        46972    50992    +4020
+ Misses      19541    15631    -3910
```

Flags with carried forward coverage won't be shown.
LGTM!
Why does it not include the normalization?
I think the `batch_beam_search` class inherits the sort function from the `beam_search` class, so we don't need to handle the normalization separately in `batch_beam_search`.
@siddhu001, can you check the case of negative `minlenratio`?
After @siddhu001's confirmation, I'll merge this PR.
It's good to test with an attention-only model, as CTC can greatly calibrate the length estimation, so `minlenratio` would not be tested properly otherwise.
Hi @sw005320, yes, I believe the case of negative `minlenratio` is the expected behaviour. We follow a similar approach for negative `maxlenratio` (i.e., `maxlenratio = -1 * maxlenratio` when `maxlenratio < 0`). @jctian98, thanks a lot for your efforts on this PR. This is very useful!
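The negative-ratio convention discussed above could be sketched as follows. This is an illustrative helper, not espnet's actual API: the function name `resolve_length_bounds` and its exact rounding behaviour are assumptions; the point is only that a negative ratio is reinterpreted as an absolute length via `-1 * ratio`.

```python
def resolve_length_bounds(maxlenratio: float, minlenratio: float, input_len: int):
    """Hypothetical sketch: turn ratio hyperparameters into (minlen, maxlen).

    A negative ratio is treated as an absolute length (the `-1 * ratio`
    convention mentioned in the discussion); a non-negative ratio is a
    fraction of the input length.
    """
    if maxlenratio == 0:
        maxlen = input_len  # unconstrained up to the encoder output length
    elif maxlenratio < 0:
        maxlen = int(-1 * maxlenratio)  # negative value: absolute max length
    else:
        maxlen = max(1, int(maxlenratio * input_len))

    if minlenratio < 0:
        minlen = int(-1 * minlenratio)  # same convention for the lower bound
    else:
        minlen = int(minlenratio * input_len)
    return minlen, maxlen
```

For example, with a 100-frame input, `minlenratio=0.1` gives `minlen=10`, while `maxlenratio=-10` pins `maxlen` to exactly 10 tokens regardless of the input length.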
What?
This PR responds to issues #5573 and #5580 by making the decoding hyper-parameter `minlenratio` effective in beam search.

Previously, `minlenratio` and the variable `minlen` were calculated but never applied, which led to the following issue: some models tend to predict the special symbol `<eos>` in the first several decoding steps. As long as the invalid `<eos>` remains in the top-k list, these invalid hypotheses are added to the ended-hypothesis list. Since the invalid hypotheses are short, their accumulated posterior is also very high compared to valid hypotheses with much longer text. The invalid hypotheses are then selected in the finalization stage, and most of them are empty strings. As a consequence, the overall decoding results can be empty or very short. The issue is more likely to be observed when the beam size is large, because the `<eos>` token is then more likely to stay in the top-k list.

To overcome this issue, this PR simply makes the `minlenratio` hyperparameter effective by adding an `if` condition in the `post_process` function. A small heuristic value (e.g., 0.1) can then be passed to it, forcing all ended hypotheses to have a length of at least 10% of the input length. Although heuristic, this works well in general.

This PR also adds a length-normalization feature to the beam search object. If activated, the best hypothesis is selected by the best token-level (length-normalized) score rather than the accumulated score.
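The two changes described above can be sketched roughly as follows. This is a simplified illustration, not espnet's actual implementation: the real `post_process` also manages scorer states, batching, and more, and the `Hypothesis` class here is a stripped-down stand-in.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Hypothesis:
    yseq: List[int]   # token ids; ends with <eos> once the hypothesis finishes
    score: float      # accumulated log-probability

def post_process(running: List[Hypothesis], ended: List[Hypothesis],
                 eos: int, minlen: int) -> List[Hypothesis]:
    """Move finished hypotheses to `ended`, applying the minlen guard.

    The added `if` condition drops hypotheses that emit <eos> before
    reaching `minlen` tokens instead of adding them to the ended list.
    """
    remained = []
    for hyp in running:
        if hyp.yseq[-1] == eos:
            if len(hyp.yseq) >= minlen:   # the guard this PR introduces
                ended.append(hyp)
            # else: discard the too-short hypothesis entirely
        else:
            remained.append(hyp)
    return remained

def select_best(ended: List[Hypothesis], normalize_length: bool) -> Hypothesis:
    """Pick the final output; with normalization, compare per-token scores
    so short hypotheses lose their accumulated-score advantage."""
    if normalize_length:
        return max(ended, key=lambda h: h.score / len(h.yseq))
    return max(ended, key=lambda h: h.score)
```

With normalization off, a two-token hypothesis with score -1.0 beats a four-token one with score -1.2; with normalization on, the comparison becomes -0.5 vs. -0.3 per token and the longer hypothesis wins, which is exactly the bias the feature is meant to correct.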