Fix Llama4 offset #37414

Cyrilvallez · 2025-04-10T08:37:18Z

What does this PR do?

See title. It's the same as before, plus 1. We need to add the +1 so that the positions match correctly. This is also what gemma2 does here https://github.com/huggingface/transformers/blob/main/src/transformers/models/gemma2/modeling_gemma2.py#L912-L913, since it uses the length (which equals `cache_position[-1] + 1`).
I used `first_cache_position - attention_chunk_size + 1` here because I find it simpler to understand when looking at the cache code (see the 1-offset here https://github.com/huggingface/transformers/blob/main/src/transformers/cache_utils.py#L1921-L1922), but the two are strictly equivalent. That is, we always have `max(first_cache_position - attention_chunk_size + 1, 0) == max(last_cache_position + 1 - key_length, 0)`.
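To illustrate why the +1 matters, here is a toy sketch (not the actual Llama4 or `cache_utils` code) that models a sliding-window cache with a Python `deque` and checks both offset formulations during single-token decoding. The helper names `retained_positions`, `offset_from_first` and `offset_from_last` are hypothetical. The sketch assumes the cache keeps exactly the last `attention_chunk_size` positions and does not model the prefill case.

```python
from collections import deque


def retained_positions(num_tokens: int, window: int) -> list[int]:
    """Hypothetical toy cache: positions still stored after decoding
    tokens 0..num_tokens-1 one at a time with a window of `window`."""
    cache = deque(maxlen=window)  # the oldest position is evicted automatically
    for pos in range(num_tokens):
        cache.append(pos)
    return list(cache)


def offset_from_first(first_cache_position: int, attention_chunk_size: int) -> int:
    # Formulation described in the PR: the +1 keeps the current token inside the window
    return max(first_cache_position - attention_chunk_size + 1, 0)


def offset_from_last(last_cache_position: int, key_length: int) -> int:
    # Length-based formulation: the length is last_cache_position + 1
    return max(last_cache_position + 1 - key_length, 0)


if __name__ == "__main__":
    W = 4  # toy attention_chunk_size
    p = 9  # current decoding position (first == last for a single token)
    print(retained_positions(p + 1, W))  # [6, 7, 8, 9]
    print(offset_from_first(p, W))       # 6
    print(max(p - W, 0))                 # 5: without the +1, points at an evicted position
```

In this toy model, the first retained position always equals both expressions, while dropping the +1 shifts the offset back by one slot, onto a key the cache has already discarded.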

github-actions · 2025-04-10T08:37:31Z

Hi 👋, thank you for opening this pull request! The pull request is converted to draft by default. The CI will be paused while the PR is in draft mode. When it is ready for review, please click the Ready for review button (at the bottom of the PR page). This will assign reviewers and trigger CI.

HuggingFaceDocBuilderDev · 2025-04-10T09:02:25Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ArthurZucker

This seems to work better and is simpler, thanks!

* add +1
* Update modeling_llama4.py

Cyrilvallez added 2 commits April 10, 2025 10:27

add +1 (5c8c2b5)

Update modeling_llama4.py (2f37c15)

github-actions bot marked this pull request as draft April 10, 2025 08:37

Cyrilvallez marked this pull request as ready for review April 10, 2025 08:53

github-actions bot requested review from ArthurZucker and Rocketknight1 April 10, 2025 08:53

ArthurZucker approved these changes Apr 10, 2025


ArthurZucker merged commit 6d8b0b3 into main Apr 10, 2025
24 checks passed

ArthurZucker deleted the llama4-offset branch April 10, 2025 09:41

ArthurZucker pushed a commit that referenced this pull request Apr 10, 2025

Fix Llama4 offset (#37414) (894783c)

* add +1
* Update modeling_llama4.py

cyr0930 pushed a commit to cyr0930/transformers that referenced this pull request Apr 18, 2025

Fix Llama4 offset (huggingface#37414) (cdac989)

* add +1
* Update modeling_llama4.py
