[Jetmoe] Fix RoPE #40819
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Was a test missing, or just not being fetched? (If you could check whether we need a patch for this, that would be nice!)
run-slow: jetmoe
Looks like our integration tests were just failing, which would have indicated this - not sure how long it has been like this, tbh. The set values in the common tests were good enough that this wasn't discovered. Checking whether the slow tests run now; will likely add a `for patch` label here.
This comment contains run-slow, running the specified jobs: models: ['models/jetmoe']
run-slow: jetmoe
This comment contains run-slow, running the specified jobs: models: ['models/jetmoe']
[For maintainers] Suggested jobs to run (before merge): run-slow: jetmoe
if attention_mask is not None and self._attn_implementation == "flash_attention_2" and use_cache:
    batch_size = inputs_embeds.shape[0]
    is_padding_right = attention_mask[:, -1].sum().item() != batch_size
    if is_padding_right:
        raise ValueError(
            "You are attempting to perform batched generation with padding_side='right'"
            " this may lead to unexpected behaviour for Flash Attention version of JetMoe. Make sure to "
            " call `tokenizer.padding_side = 'left'` before tokenizing the input. "
        )
We already have this in transformers/src/transformers/generation/utils.py, lines 2414 to 2427 in dfae7dd:
# decoder-only models must use left-padding for batched generation.
if not self.config.is_encoder_decoder:
    # If `input_ids` was given, check if the last id in any sequence is `pad_token_id`
    # Note: If using, `inputs_embeds` this check does not work, because we want to be more hands-off.
    if (
        generation_config._pad_token_tensor is not None
        and batch_size > 1
        and len(inputs_tensor.shape) == 2
        and torch.sum(inputs_tensor[:, -1] == generation_config._pad_token_tensor) > 0
    ):
        logger.warning(
            "A decoder-only architecture is being used, but right-padding was detected! For correct "
            "generation results, please set `padding_side='left'` when initializing the tokenizer."
        )
Doesn't make sense to keep this check (it occasionally fails the test, since we can't guarantee the padding side with randomly initialized masks).
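For context, a minimal sketch of the left-padding setup that the existing warning points to; the checkpoint name is a placeholder and not taken from this PR.

```python
# Minimal sketch: batched generation with left padding (placeholder checkpoint name).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jetmoe/jetmoe-8b"  # placeholder; any decoder-only checkpoint works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Decoder-only models should be padded on the left for batched generation;
# otherwise the warning quoted above is emitted and outputs can be off.
tokenizer.padding_side = "left"
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

inputs = tokenizer(["Hello", "A much longer prompt"], padding=True, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True))
```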
run-slow: jetmoe
This comment contains run-slow, running the specified jobs: models: ['models/jetmoe']
All tests pass, no more killed processes! Merging
Thank you for fixing 🙏
Seems like the integration tests on our CI also died for a while.
Fixes the RoPE dimension for JetMoe by setting a respective attribute mapping - the normal calculation, i.e. `hidden_dim / num_attn_heads`, is not valid here. For reference on why this is valid, see transformers/src/transformers/models/jetmoe/modeling_jetmoe.py, line 494 in 895b3eb. There could be better solutions, but I'm not sure it's worth the effort.
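As an illustration of the mechanism (a toy config, not the actual JetMoe change): `PretrainedConfig` subclasses can declare an `attribute_map` so that an attribute like `head_dim` resolves to a dedicated config field instead of being recomputed from `hidden_size / num_attention_heads`. The field name `kv_channels` below is assumed for the sake of the example.

```python
# Toy sketch (not the real JetMoeConfig): redirect `head_dim` through `attribute_map`
# so RoPE picks up a dedicated dimension instead of hidden_size // num_attention_heads.
from transformers import PretrainedConfig

class ToyConfig(PretrainedConfig):
    # Reads/writes of `head_dim` are transparently redirected to `kv_channels`.
    attribute_map = {"head_dim": "kv_channels"}

    def __init__(self, hidden_size=2048, num_attention_heads=32, kv_channels=128, **kwargs):
        self.hidden_size = hidden_size
        self.num_attention_heads = num_attention_heads
        self.kv_channels = kv_channels
        super().__init__(**kwargs)

config = ToyConfig()
print(config.hidden_size // config.num_attention_heads)  # 64 - the naive per-head dim
print(config.head_dim)                                    # 128 - resolved via attribute_map
```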
Fixes #40817
cc @gante @ArthurZucker