Inference trick from "Improving Cross-Attention based on Positional Alignment during Inference for Robust Long-form Speech Recognition" (#6339)
Conversation
Code Review
This pull request implements an inference trick from the paper "Improving Cross-Attention based on Positional Alignment during Inference for Robust Long-form Speech Recognition". The changes introduce a new decoding option and modify several components to apply a Gaussian bias to cross-attention scores during inference. My review identified a critical issue where the new feature would be silently ignored when using optimized attention mechanisms like Flash Attention. I have also pointed out a minor type hint inconsistency. Overall, the implementation of the core logic appears sound, but the interaction with existing optimizations needs to be addressed to ensure the feature works correctly in all configurations.
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
What did you change?
Why did you make this change?
Is your PR small enough?
yes
Additional Context