[BUG] remove customized attentionLSTM layer. Replace with keras attention #8721
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Reference Issues/PRs
Fixes #8696
May clash with #8710
What does this implement/fix? Explain your changes.
This PR fixes a critical dimension mismatch issue in the LSTM-FCN network's attention mechanism. The original custom attention implementation was designed for sequence-level processing but was being used in an LSTM cell context, causing the attention mechanism to fail when processing individual timesteps.
Changes made:
AttentionLSTM
implementation with the standard KerasAttention
layer_time_distributed_dense
function that expected timesteps that didn't exist at the cell levelattention=True
parameter still works as expectedWhy this fix was needed:
The original attention mechanism tried to process individual timesteps
(batch, features)
but expected full sequences(batch, timesteps, features)
. This caused the_time_distributed_dense
function to fail when trying to reshape inputs that had no timestep dimension.Does your contribution introduce a new dependency? If yes, which one?
No new dependencies. The Keras
Attention
layer is already available in TensorFlow, which is already a dependency of sktime.What should a reviewer concentrate their feedback on?
Did you add any tests for the change?
Yes, the existing tests should continue to pass. The fix resolves the core issue that was preventing the attention mechanism from working, so existing LSTM-FCN tests with
attention=True
should now work correctly. (not sure if there were any of these tests though(Any other comments?
This fix addresses a fundamental architectural issue where the attention mechanism was misapplied. The original custom implementation was overly complex for the use case and introduced bugs. The Keras
Attention
layer provides the same functionality in a simpler, more reliable way.The fix also makes the codebase more maintainable by removing custom attention code that was difficult to debug and maintain.