
Conversation

vasqu
Contributor

@vasqu vasqu commented Sep 11, 2025

Seems like the integration tests on our CI also died for a while.

Fixes the rope dimension for jetmoe by setting a respective attribute mapping; the usual calculation, i.e. hidden_size / num_attention_heads, is not valid here. For reference on why the mapping is correct, see

self.head_dim = config.kv_channels

There could be better solutions, but I'm not sure it's worth the effort.
Fixes #40817
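A minimal sketch of the idea, assuming the `attribute_map` mechanism on `PretrainedConfig` (the exact diff may differ):

```python
from transformers import PretrainedConfig

class JetMoeConfig(PretrainedConfig):
    # Redirect `config.head_dim` lookups to `config.kv_channels`.
    # Without a mapping, generic rope utilities fall back to
    # hidden_size // num_attention_heads, which is wrong for JetMoe.
    attribute_map = {"head_dim": "kv_channels"}
```

Rope utilities typically resolve the dimension roughly as `getattr(config, "head_dim", config.hidden_size // config.num_attention_heads)`, so mapping `head_dim` to `kv_channels` makes that lookup return the correct value.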

cc @gante @ArthurZucker

@vasqu vasqu mentioned this pull request Sep 11, 2025
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@ArthurZucker ArthurZucker left a comment


Was a test missing or just not being fetched? (If you can check whether we need a patch for this, that would be nice!)

@vasqu
Contributor Author

vasqu commented Sep 11, 2025

run-slow: jetmoe

@vasqu
Contributor Author

vasqu commented Sep 11, 2025

Looks like our integration tests were just failing, which would've indicated this; not sure how long it has been like this, tbh.

The values set in the common tests were good enough that this wasn't discovered. Checking whether the slow tests run now; I will likely add a for patch label here.

Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/jetmoe']
quantizations: [] ...

@vasqu
Contributor Author

vasqu commented Sep 11, 2025

run-slow: jetmoe

@vasqu vasqu added the for patch label Sep 11, 2025
Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/jetmoe']
quantizations: [] ...

Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: jetmoe

Comment on lines -935 to -943
if attention_mask is not None and self._attn_implementation == "flash_attention_2" and use_cache:
    batch_size = inputs_embeds.shape[0]
    is_padding_right = attention_mask[:, -1].sum().item() != batch_size
    if is_padding_right:
        raise ValueError(
            "You are attempting to perform batched generation with padding_side='right'"
            " this may lead to unexpected behaviour for Flash Attention version of JetMoe. Make sure to "
            " call `tokenizer.padding_side = 'left'` before tokenizing the input. "
        )
Contributor Author


We already have this check:

# decoder-only models must use left-padding for batched generation.
if not self.config.is_encoder_decoder:
    # If `input_ids` was given, check if the last id in any sequence is `pad_token_id`
    # Note: If using, `inputs_embeds` this check does not work, because we want to be more hands-off.
    if (
        generation_config._pad_token_tensor is not None
        and batch_size > 1
        and len(inputs_tensor.shape) == 2
        and torch.sum(inputs_tensor[:, -1] == generation_config._pad_token_tensor) > 0
    ):
        logger.warning(
            "A decoder-only architecture is being used, but right-padding was detected! For correct "
            "generation results, please set `padding_side='left'` when initializing the tokenizer."
        )

It doesn't make sense to keep this duplicate check (it occasionally fails the test, since left-padding can't be guaranteed with randomly initialized masks).
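For context, a minimal sketch of the left-padded batched generation setup that the warning asks for (checkpoint name and pad-token choice are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("jetmoe/jetmoe-8b")
model = AutoModelForCausalLM.from_pretrained("jetmoe/jetmoe-8b")

# Decoder-only models must be left-padded for batched generation;
# otherwise new tokens would be appended after padding tokens.
tokenizer.padding_side = "left"
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

inputs = tokenizer(["Hello", "A much longer prompt"], return_tensors="pt", padding=True)
out = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```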

@vasqu
Contributor Author

vasqu commented Sep 11, 2025

run-slow: jetmoe

Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/jetmoe']
quantizations: [] ...

@vasqu
Contributor Author

vasqu commented Sep 11, 2025

All tests pass, no more killed processes! Merging

@vasqu vasqu merged commit cf084f5 into huggingface:main Sep 11, 2025
18 checks passed
@vasqu vasqu deleted the fix-jetmoe branch September 11, 2025 17:36
@gante
Member

gante commented Sep 12, 2025

Thank you for fixing 🙏

LysandreJik pushed a commit that referenced this pull request Sep 17, 2025
* fix

* remove prints

* why was this there...
Labels
for patch
Development

Successfully merging this pull request may close these issues.

Got shape error when running JetMoe