[Pipelines] Handle null num_global_key_value_heads in Gemma 4 configs #6666

Closed
msaelices wants to merge 3 commits into modular:main from msaelices:fix-gemma4-null-num-global-kv-heads

@msaelices msaelices commented Jun 11, 2026

Contributor

Summary

Fixes the TypeError: unsupported operand type(s) for %: 'NoneType' and 'int' crash when serving google/gemma-4-E4B-it (fixes #6665).

The google/gemma-4-E*B checkpoints ship "num_global_key_value_heads": null in text_config. Hugging Face defines null/absent as "defaults to num_key_value_heads" (configuration_gemma4.py); its modeling code only consults the field when attention_k_eq_v is true (false on E*B). MAX read it without a fallback in Gemma4TextConfig.initialize_from_config and Gemma4ForConditionalGenerationConfig.construct_kv_params.

  • Adds a shared _resolve_num_global_kv_heads() helper implementing the HF fallback, used at both call sites. Checkpoints with an explicit value (gemma-4-31b-it, gemma-4-26B-A4B-it) are unaffected.
  • Adds a CPU-only test_config_init.py with a local gemma-4-E4B-it config.json exercising the crash path (mirrors the qwen2_5vl test_config_init pattern), registered as its own bazel target and excluded from the GPU tests glob.
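
A minimal sketch of the fallback described above, assuming the helper reads the field with `getattr(..., None)` as mentioned in the review thread. The function name `resolve_num_global_kv_heads` and the `SimpleNamespace` configs are illustrative stand-ins, not the code merged in this PR:

```python
from types import SimpleNamespace
from typing import Any


def resolve_num_global_kv_heads(text_config: Any) -> int:
    """Return the global KV head count, or num_key_value_heads if unset.

    Hypothetical stand-in for the PR's _resolve_num_global_kv_heads helper.
    """
    # The None default covers an absent attribute; the `is None` check
    # below also covers an explicit JSON null.
    value = getattr(text_config, "num_global_key_value_heads", None)
    if value is None:
        return text_config.num_key_value_heads
    return value


# Explicit null, as shipped by the E*B checkpoints.
null_cfg = SimpleNamespace(
    num_global_key_value_heads=None, num_key_value_heads=2
)
# Explicit value wins (num_key_value_heads=16 is a made-up value here).
explicit_cfg = SimpleNamespace(
    num_global_key_value_heads=4, num_key_value_heads=16
)
# Attribute missing entirely.
absent_cfg = SimpleNamespace(num_key_value_heads=1)

print(resolve_num_global_kv_heads(null_cfg))
print(resolve_num_global_kv_heads(explicit_cfg))
print(resolve_num_global_kv_heads(absent_cfg))
```

Resolving once at the config level means both call sites receive an integer, so neither has to repeat the null check.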

Scope note

E*B variants still do not serve end to end after this fix: their per-layer-embedding (MatFormer) weights are not implemented in the gemma4 graph and now surface as strict=True unexpected keys during weight load (details in #6665). This PR removes the first blocker and makes the real gap visible.

The same latent pattern exists in unified_mtp_gemma4/model.py: tc.get("num_global_key_value_heads", 4) still returns None for an explicit JSON null, because .get applies its default only when the key is absent. It is deliberately left to a follow-up, since that architecture is exercised by speculative-decoding paths I cannot test here. It is flagged in #6665.
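
The pitfall can be reproduced with the standard library alone. In this sketch the JSON snippet and the `get_or_default` helper are hypothetical; which fallback value the follow-up should use in unified_mtp_gemma4 is not decided here:

```python
import json
from typing import Any, Mapping

# A text_config fragment with an explicit null, as in the E*B checkpoints.
tc = json.loads(
    '{"num_global_key_value_heads": null, "num_key_value_heads": 2}'
)

# The key is present, so dict.get never falls back to the default of 4.
naive = tc.get("num_global_key_value_heads", 4)
print(naive)  # None


def get_or_default(mapping: Mapping[str, Any], key: str, default: Any) -> Any:
    """Hypothetical helper: treat an explicit None like a missing key."""
    value = mapping.get(key)
    return default if value is None else value


safe = get_or_default(tc, "num_global_key_value_heads", 4)
print(safe)  # 4
```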

Verification

  • New tests: 4/4 pass (run against a patched 26.3.0 install with --noconftest; the conftest needs torch). Reverting the two call sites reproduces the original TypeError, so the tests genuinely cover the crash path.

  • max serve --model google/gemma-4-E4B-it with the fix applied to 26.3.0 on an A10G (g5.xlarge) progresses past config resolution, builds the pipeline, and reaches weight loading, where the separate per-layer-embedding (PLE) gap is reported. For that serve check, the hardcoded 15 GiB vision-activation headroom (estimate_activation_memory, blocker 2 in #6665) was reduced locally; otherwise E4B is rejected on a 24 GB card at memory estimation, before loading.

  • _resolve_num_global_kv_heads exercised against the real config.json of all five google/gemma-4 checkpoints:

    | Checkpoint | Raw value | Resolved | Outcome |
    |---|---|---|---|
    | gemma-4-26B-A4B-it | 2 | 2 | unchanged |
    | gemma-4-31b-it | 4 | 4 | unchanged |
    | gemma-4-12b-it | 1 | 1 | unchanged |
    | gemma-4-E4B-it | null | 2 (= num_key_value_heads) | fallback |
    | gemma-4-E2B-it | null | 1 (= num_key_value_heads) | fallback |

Assisted-by: AI

Copilot AI left a comment

Contributor

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds an integration test and fixture config to ensure Gemma4 config initialization handles num_global_key_value_heads: null (HF fallback semantics) without crashing.

Changes:

  • Introduces _resolve_num_global_kv_heads() and uses it in Gemma4 text config initialization + KV param construction.
  • Adds a local Gemma4 E4B config.json fixture and integration tests that exercise the previously crashing code path.
  • Updates Bazel targets to include the new test and its data dependency.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| max/python/max/pipelines/architectures/gemma4/model_config.py | Adds null/absent fallback logic for num_global_key_value_heads and applies it in two call sites. |
| max/tests/integration/architectures/gemma4/test_config_init.py | Adds integration tests covering the null-global-KV-heads scenario using a local HF config. |
| max/tests/integration/architectures/gemma4/configs/gemma4_e4b/config.json | Adds a local HF-style config fixture with num_global_key_value_heads: null. |
| max/tests/integration/architectures/gemma4/BUILD.bazel | Registers the new test and wires in the config fixture as test data. |


Comment on lines +41 to +42
def _load_hf_config() -> AutoConfig:
    return AutoConfig.from_pretrained(str(CONFIG_DIR), trust_remote_code=True)

@msaelices msaelices Jun 13, 2026

Contributor Author

Fixed. Annotation is now PretrainedConfig.

Comment on lines +38 to +42
CONFIG_DIR = Path(__file__).parent / "configs" / "gemma4_e4b"


def _load_hf_config() -> AutoConfig:
    return AutoConfig.from_pretrained(str(CONFIG_DIR), trust_remote_code=True)

Contributor Author

Kept the directory form: AutoConfig.from_pretrained expects a directory (or repo id), not a config.json path, and passing the file directly is the less-supported route. The fixture's config.json is a data dep, so it materializes in runfiles, and the test finds it and passes in CI.



def _load_hf_config() -> AutoConfig:
    return AutoConfig.from_pretrained(str(CONFIG_DIR), trust_remote_code=True)

@msaelices msaelices Jun 13, 2026

Contributor Author

Fixed. Dropped trust_remote_code. The fixture has no auto_map, and importing model_config registers the gemma4 config shim, so AutoConfig resolves it natively without it.

Comment on lines +62 to +65
def test_resolve_num_global_kv_heads() -> None:
    """null/absent falls back to num_key_value_heads; explicit value wins."""
    null_config = Mock(num_global_key_value_heads=None, num_key_value_heads=2)
    assert _resolve_num_global_kv_heads(null_config) == 2

@msaelices msaelices Jun 13, 2026

Contributor Author

Already covered. test_resolve_num_global_kv_heads includes an AbsentFieldConfig (a plain class, no num_global_key_value_heads attribute) exercising the getattr(..., None) default branch, alongside the explicit-null and explicit-int cases.
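
Why a plain class is needed for the absent-field case can be sketched as follows. A bare `Mock` fabricates any attribute on first access, so `getattr(..., None)` never returns its default. The `resolve` function below is an illustrative stand-in for the helper, and this `AbsentFieldConfig` is a guess at the shape of the test class, not a copy of it:

```python
from unittest.mock import Mock


def resolve(config):
    """Illustrative stand-in for _resolve_num_global_kv_heads."""
    value = getattr(config, "num_global_key_value_heads", None)
    return config.num_key_value_heads if value is None else value


class AbsentFieldConfig:
    """Plain class with no num_global_key_value_heads attribute."""

    num_key_value_heads = 2


# A bare Mock auto-creates the missing attribute as another Mock, so the
# getattr default is never used and the fallback branch is skipped.
bare_mock = Mock(num_key_value_heads=2)
from_mock = resolve(bare_mock)
print(isinstance(from_mock, Mock))  # True

# The plain class really lacks the attribute, so the default branch runs.
from_plain = resolve(AbsentFieldConfig())
print(from_plain)  # 2
```

Setting `num_global_key_value_heads=None` on the `Mock` explicitly, as the quoted test does, still covers the explicit-null case; only the absent case needs the plain class.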

@msaelices msaelices marked this pull request as draft June 11, 2026 20:49
@msaelices msaelices marked this pull request as ready for review June 11, 2026 21:06
The google/gemma-4-E*B checkpoints ship "num_global_key_value_heads": null
in text_config. Hugging Face defines null/absent as "defaults to
num_key_value_heads" (configuration_gemma4.py), and its modeling code only
consults the field when attention_k_eq_v is true (false on E*B).

MAX read the field without a fallback in two places, so serving
google/gemma-4-E4B-it crashed during config resolution:

    construct_kv_params -> kv_cache_config.to_params(n_kv_heads=None, ...)
    TypeError: unsupported operand type(s) for %: 'NoneType' and 'int'

Add a shared _resolve_num_global_kv_heads() helper implementing the HF
fallback and use it in Gemma4TextConfig.initialize_from_config and
Gemma4ForConditionalGenerationConfig.construct_kv_params. Behavior for
checkpoints with an explicit value (gemma-4-31b-it, gemma-4-26B-A4B-it) is
unchanged.

Add a CPU-only config-init test with a local gemma-4-E4B-it config.json
reproducing the crash path, mirroring the qwen2_5vl test_config_init
pattern.

Note: E*B variants still do not serve end to end after this fix -- their
per-layer-embedding (MatFormer) weights are not yet implemented in the
gemma4 graph and weight loading reports them as unexpected keys. This
change removes the first blocker and surfaces the real gap.
END_PUBLIC

Assisted-by: AI
Signed-off-by: Manuel Saelices <[email protected]>
… fix

- test_config_init.py: cover the absent-attribute branch with a plain
  object (bare Mocks auto-create attributes), pin the resolved fallback
  into the cache params (global cache n_kv_heads=2 / head_dim=512,
  sliding 2/256) instead of a vacuous not-None assert, and add a test
  for the framework entry point
  Gemma4ForConditionalGenerationConfig.initialize_from_config.
- model_config.py: note in the helper docstring that transformers only
  documents the null fallback (its code never applies it; modeling
  consults the field only when attention_k_eq_v is true), so readers
  don't go looking for HF resolution code that doesn't exist.
- ruff format clean at the repo's 80-column config.

Assisted-by: AI
Signed-off-by: Manuel Saelices <[email protected]>
@msaelices msaelices force-pushed the fix-gemma4-null-num-global-kv-heads branch from 92d3f4b to ccc8fcf Compare June 11, 2026 21:10

@k-w-w k-w-w left a comment

Contributor

Thanks for the PR!

the transformers ``configuration_gemma4.py`` docstring defines null/absent
as "defaults to ``num_key_value_heads``". Note transformers never applies
that fallback in code -- its modeling only consults the field when
``attention_k_eq_v`` is true (false on E*B), so resolving here matches the

Contributor

The ModelConfig already has attention_k_eq_v, why not add the same self.use_alternative_attention check in Gemma4Attention, to match the transformers reference implementation?

@msaelices msaelices Jun 13, 2026

Contributor Author

The config-level resolve produces the HF-equivalent value: the field is only consumed for full-attention (global) layers, and on E*B (attention_k_eq_v=False) the resolved value equals num_key_value_heads, which is exactly transformers' "null defaults to num_key_value_heads" semantics. Gemma4Attention already keys its V-projection on attention_k_eq_v (_has_v_proj), so the attention layer's structure already follows the reference. Resolving in the config keeps the null handling in one place rather than threading it through the layer. Happy to move the resolution into Gemma4Attention instead if you'd prefer the structural parity there; let me know.

Contributor

Makes sense, thanks for the explanation!

- Correct the _load_hf_config return annotation (PretrainedConfig, not
  AutoConfig).
- Drop trust_remote_code=True: the fixture has no auto_map and importing
  model_config registers the gemma4 shim, so AutoConfig resolves it
  natively (review: @Copilot).

Assisted-by: AI
Signed-off-by: Manuel Saelices <[email protected]>
@k-w-w

k-w-w commented Jun 15, 2026

Contributor

!sync

@modularbot modularbot added the imported-internally Signals that a given pull request has been imported internally. label Jun 15, 2026
@modularbot

Collaborator

✅🟣 This contribution has been merged 🟣✅

Your pull request has been merged to the internal upstream Mojo sources. It will be reflected here in the Mojo repository on the main branch during the next Mojo nightly release, typically within the next 24-48 hours.

We use Copybara to merge external contributions.

@modularbot modularbot added the merged-internally (Indicates that this pull request has been merged internally) and merged-externally (Merged externally in public mojo repo) labels Jun 16, 2026
@modularbot

Collaborator

Landed in 30f15cf! Thank you for your contribution 🎉


Development

Successfully merging this pull request may close these issues.

[BUG] Serving google/gemma-4-E4B-it crashes: num_global_key_value_heads is null -> TypeError in construct_kv_params

4 participants