Improving OWSM inference interface#5618
Conversation
for more information, see https://pre-commit.ci
Codecov ReportAttention:
Additional details and impacted files@@ Coverage Diff @@
## master #5618 +/- ##
===========================================
+ Coverage 24.25% 76.06% +51.81%
===========================================
Files 720 735 +15
Lines 66641 68696 +2055
===========================================
+ Hits 16161 52255 +36094
+ Misses 50480 16441 -34039
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
|
Hi, can we considering merging this soon? I will provide example usage in the project page based on this new interface. As reported previously, we can significantly improve the long-form ASR performance with some rules and heuristics |
|
Yes, but currently, CI is broken. |
|
@sw005320 The CI has passed the previous issue. Should be good to go ahead |
|
Cool! @pyf98, please deal with the CI error |
|
Thanks! It seems that some TTS tests failed |
|
@jctian98, this is a reminder. |
espnet2/bin/s2t_inference.py
Outdated
|
|
||
|
|
||
| class ScoreFilter(BatchScorerInterface, torch.nn.Module): | ||
| """Filter scores based on pre-defined rules.""" |
There was a problem hiding this comment.
Can you add more explanations here or in other places about what kind of pre-defined rules?
There was a problem hiding this comment.
I added some more comments below
|
The code is very clear. I'm ok with it. A simple discussion for future development: |
|
Thanks @jctian98 for the comment. In the previous version of
I think we can keep this separate design for now. |
|
One more note is that the order of tokens is: The language token is between In terms of the implementation, we have to set both |
|
Thanks, @pyf98! |
What?
preprocessor_conf. We no longer need to provide them manually.lang_sym,task_symandpredict_timecan be passed as additional arguments when calling__call__, which will overwrite the default values in__init__. This is more convenient to use.s2t_inference_language.BeamSearchis also removed. Only a single decoder forward step is performed. And it can return an N-best list of language and (normalized) probability.scorerthat assigns certain tokens with a-infscore duringsearch. With this modification, it can now predict the first timestamp by itself. Previously, the first timestamp has to be manually set, which is usually inaccurate.