[Jetmoe] Fix RoPE #40819
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Was a test missing, or just not being fetched? (If you could check whether we need a patch for this, that would be nice!)
run-slow: jetmoe
Looks like our integration tests were just failing, which would have indicated this - not sure how long it has been like this, tbh. The set values in the common tests were good enough that this wasn't discovered. Checking whether the slow tests run now; will likely add a `for patch` label here.
This comment contains run-slow, running the specified jobs: models: ['models/jetmoe']
run-slow: jetmoe
This comment contains run-slow, running the specified jobs: models: ['models/jetmoe']
[For maintainers] Suggested jobs to run (before merge): run-slow: jetmoe
if attention_mask is not None and self._attn_implementation == "flash_attention_2" and use_cache:
    batch_size = inputs_embeds.shape[0]
    is_padding_right = attention_mask[:, -1].sum().item() != batch_size
    if is_padding_right:
        raise ValueError(
            "You are attempting to perform batched generation with padding_side='right'"
            " this may lead to unexpected behaviour for Flash Attention version of JetMoe. Make sure to "
            " call `tokenizer.padding_side = 'left'` before tokenizing the input. "
        )
We already have this in transformers/src/transformers/generation/utils.py, lines 2414 to 2427 in dfae7dd:
# decoder-only models must use left-padding for batched generation.
if not self.config.is_encoder_decoder:
    # If `input_ids` was given, check if the last id in any sequence is `pad_token_id`
    # Note: If using, `inputs_embeds` this check does not work, because we want to be more hands-off.
    if (
        generation_config._pad_token_tensor is not None
        and batch_size > 1
        and len(inputs_tensor.shape) == 2
        and torch.sum(inputs_tensor[:, -1] == generation_config._pad_token_tensor) > 0
    ):
        logger.warning(
            "A decoder-only architecture is being used, but right-padding was detected! For correct "
            "generation results, please set `padding_side='left'` when initializing the tokenizer."
        )
Doesn't make sense to keep this check (it occasionally fails the test, since we can't guarantee the padding side with randomly initialized masks).
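For context, a minimal sketch of the left-padding setup that the existing warning points to; the checkpoint name is a placeholder and not taken from this PR.

```python
# Minimal sketch: batched generation with left padding (placeholder checkpoint name).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "jetmoe/jetmoe-8b"  # placeholder; any decoder-only checkpoint works
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Decoder-only models should be padded on the left for batched generation;
# otherwise the warning quoted above is emitted and outputs can be off.
tokenizer.padding_side = "left"
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

inputs = tokenizer(["Hello", "A much longer prompt"], padding=True, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.batch_decode(output_ids, skip_special_tokens=True))
```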
run-slow: jetmoe
This comment contains run-slow, running the specified jobs: models: ['models/jetmoe']
All tests pass, no more killed processes! Merging
Thank you for fixing 🙏
Seems like the integration tests on our CI also died for a while.
Fixes the RoPE dimension for JetMoe by setting a respective attribute mapping - the normal calculation, i.e. `hidden_dim / num_attn_heads`, is not valid here. For reference on why this is valid, see transformers/src/transformers/models/jetmoe/modeling_jetmoe.py, line 494 in 895b3eb. There could be better solutions, but I'm not sure it's worth the effort.
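As an illustration of the mechanism (a toy config, not the actual JetMoe change): `PretrainedConfig` subclasses can declare an `attribute_map` so that an attribute like `head_dim` resolves to a dedicated config field instead of being recomputed from `hidden_size / num_attention_heads`. The field name `kv_channels` below is assumed for the sake of the example.

```python
# Toy sketch (not the real JetMoeConfig): redirect `head_dim` through `attribute_map`
# so RoPE picks up a dedicated dimension instead of hidden_size // num_attention_heads.
from transformers import PretrainedConfig

class ToyConfig(PretrainedConfig):
    # Reads/writes of `head_dim` are transparently redirected to `kv_channels`.
    attribute_map = {"head_dim": "kv_channels"}

    def __init__(self, hidden_size=2048, num_attention_heads=32, kv_channels=128, **kwargs):
        self.hidden_size = hidden_size
        self.num_attention_heads = num_attention_heads
        self.kv_channels = kv_channels
        super().__init__(**kwargs)

config = ToyConfig()
print(config.hidden_size // config.num_attention_heads)  # 64 - the naive per-head dim
print(config.head_dim)                                    # 128 - resolved via attribute_map
```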
Fixes #40817
cc @gante @ArthurZucker