msaelices · 2026-06-11T20:46:53Z

Summary

Fixes the TypeError: unsupported operand type(s) for %: 'NoneType' and 'int' crash when serving google/gemma-4-E4B-it (fixes #6665).

The google/gemma-4-E*B checkpoints ship "num_global_key_value_heads": null in text_config. Hugging Face defines null/absent as "defaults to num_key_value_heads" (configuration_gemma4.py); its modeling code only consults the field when attention_k_eq_v is true (false on E*B). MAX read it without a fallback in Gemma4TextConfig.initialize_from_config and Gemma4ForConditionalGenerationConfig.construct_kv_params.

Adds a shared _resolve_num_global_kv_heads() helper implementing the HF fallback, used at both call sites. Checkpoints with an explicit value (gemma-4-31b-it, gemma-4-26B-A4B-it) are unaffected.
Adds a CPU-only test_config_init.py with a local gemma-4-E4B-it config.json exercising the crash path (mirrors the qwen2_5vl test_config_init pattern), registered as its own bazel target and excluded from the GPU tests glob.
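
The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in the PR: the helper body is an assumption reconstructed from the description, `SimpleNamespace` stands in for the Hugging Face text config, and the `num_key_value_heads=8` on the explicit-value config is a made-up number.

```python
from types import SimpleNamespace


def _resolve_num_global_kv_heads(text_config) -> int:
    """Return num_global_key_value_heads, else num_key_value_heads.

    Assumed body: an absent attribute and an explicit None (JSON null)
    are both treated as "use num_key_value_heads".
    """
    value = getattr(text_config, "num_global_key_value_heads", None)
    if value is None:
        return text_config.num_key_value_heads
    return value


# E*B-style config: the field is present but null.
null_cfg = SimpleNamespace(num_global_key_value_heads=None, num_key_value_heads=2)
# Config with an explicit value (num_key_value_heads=8 is hypothetical).
explicit_cfg = SimpleNamespace(num_global_key_value_heads=4, num_key_value_heads=8)

# Using the raw null value in modulo arithmetic reproduces the reported error.
try:
    null_cfg.num_global_key_value_heads % 2
except TypeError as exc:
    error = str(exc)

resolved_null = _resolve_num_global_kv_heads(null_cfg)
resolved_explicit = _resolve_num_global_kv_heads(explicit_cfg)
```

Checking `is None` rather than truthiness keeps the two cases distinct: only a missing or null field triggers the fallback, and any explicit integer is returned unchanged.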

Scope note

E*B variants still do not serve end to end after this fix: their per-layer-embedding (MatFormer) weights are not implemented in the gemma4 graph and now surface as strict=True unexpected keys during weight load (details in #6665). This PR removes the first blocker and makes the real gap visible.

The same latent pattern in unified_mtp_gemma4/model.py (tc.get("num_global_key_value_heads", 4) — .get with a default still returns None for an explicit JSON null) is deliberately left to a follow-up: that architecture is exercised by speculative-decoding paths I cannot test here. It is flagged in #6665.
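
The `.get` pitfall mentioned above is easy to reproduce with the standard library alone. The snippet below is a sketch: the JSON fragment is a two-key stand-in for `text_config`, and the "possible fix" is an assumption, not the follow-up's actual code.

```python
import json

# Two-key stand-in for a text_config that carries an explicit JSON null.
tc = json.loads('{"num_global_key_value_heads": null, "num_key_value_heads": 2}')

# The key exists, so the default of 4 is never consulted: this is None.
raw = tc.get("num_global_key_value_heads", 4)

# One possible fix (assumption): test for None explicitly, then fall back.
value = tc.get("num_global_key_value_heads")
resolved = value if value is not None else tc["num_key_value_heads"]
```

`dict.get` only applies its default when the key is missing, so a config that spells out `null` bypasses it entirely.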

Verification

New tests: 4/4 pass (run against a patched 26.3.0 install with --noconftest; the conftest needs torch). Reverting the two call sites reproduces the original TypeError, so the tests genuinely cover the crash path.
max serve --model google/gemma-4-E4B-it with the fix applied to 26.3.0 on an A10G (g5.xlarge) progresses past config resolution, builds the pipeline, and reaches weight loading (where the separate PLE gap is reported). For that serve check, the hardcoded 15 GiB vision-activation headroom (estimate_activation_memory, blocker 2 in #6665) was locally reduced; it otherwise rejects E4B on a 24 GB card at memory estimation, before loading.

_resolve_num_global_kv_heads exercised against the real config.json of all five google/gemma-4 checkpoints:

| Checkpoint | Raw value | Resolved | Outcome |
| --- | --- | --- | --- |
| `gemma-4-26B-A4B-it` | `2` | `2` | unchanged |
| `gemma-4-31b-it` | `4` | `4` | unchanged |
| `gemma-4-12b-it` | `1` | `1` | unchanged |
| `gemma-4-E4B-it` | `null` | `2` (= `num_key_value_heads`) | fallback |
| `gemma-4-E2B-it` | `null` | `1` (= `num_key_value_heads`) | fallback |

Assisted-by: AI

Copilot

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Adds an integration test and fixture config to ensure Gemma4 config initialization handles num_global_key_value_heads: null (HF fallback semantics) without crashing.

Changes:

Introduces _resolve_num_global_kv_heads() and uses it in Gemma4 text config initialization + KV param construction.
Adds a local Gemma4 E4B config.json fixture and integration tests that exercise the previously crashing code path.
Updates Bazel targets to include the new test and its data dependency.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| max/python/max/pipelines/architectures/gemma4/model_config.py | Adds null/absent fallback logic for `num_global_key_value_heads` and applies it at two call sites. |
| max/tests/integration/architectures/gemma4/test_config_init.py | Adds integration tests covering the null-global-KV-heads scenario using a local HF config. |
| max/tests/integration/architectures/gemma4/configs/gemma4_e4b/config.json | Adds a local HF-style config fixture with `num_global_key_value_heads: null`. |
| max/tests/integration/architectures/gemma4/BUILD.bazel | Registers the new test and wires in the config fixture as test data. |

msaelices · 2026-06-13T21:39:36Z

+def _load_hf_config() -> AutoConfig:
+    return AutoConfig.from_pretrained(str(CONFIG_DIR), trust_remote_code=True)


Fixed. Annotation is now PretrainedConfig.

msaelices · 2026-06-13T21:39:37Z

+CONFIG_DIR = Path(__file__).parent / "configs" / "gemma4_e4b"
+
+
+def _load_hf_config() -> AutoConfig:
+    return AutoConfig.from_pretrained(str(CONFIG_DIR), trust_remote_code=True)


Kept the directory form: AutoConfig.from_pretrained expects a directory (or repo id), not a config.json path — passing the file directly is the less-supported route. The fixture's config.json is a data dep so it materializes in runfiles, and the test resolves it green in CI.

msaelices · 2026-06-13T21:39:39Z

+
+
+def _load_hf_config() -> AutoConfig:
+    return AutoConfig.from_pretrained(str(CONFIG_DIR), trust_remote_code=True)


Fixed. Dropped trust_remote_code. The fixture has no auto_map, and importing model_config registers the gemma4 config shim, so AutoConfig resolves it natively without it.

msaelices · 2026-06-13T21:39:40Z

+def test_resolve_num_global_kv_heads() -> None:
+    """null/absent falls back to num_key_value_heads; explicit value wins."""
+    null_config = Mock(num_global_key_value_heads=None, num_key_value_heads=2)
+    assert _resolve_num_global_kv_heads(null_config) == 2


Already covered. test_resolve_num_global_kv_heads includes an AbsentFieldConfig (a plain class, no num_global_key_value_heads attribute) exercising the getattr(..., None) default branch, alongside the explicit-null and explicit-int cases.
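
The reason a plain class is needed for the absent-attribute case can be shown in isolation. This sketch uses only `unittest.mock`; the `AbsentFieldConfig` below is an illustrative stand-in and not necessarily identical to the class in the PR's test file.

```python
from unittest.mock import Mock

# A bare Mock auto-creates any attribute on first access, so the
# getattr default is never used and a child Mock comes back instead.
bare = Mock(num_key_value_heads=2)
leaked = getattr(bare, "num_global_key_value_heads", None)


class AbsentFieldConfig:
    """Plain object that really lacks num_global_key_value_heads."""

    num_key_value_heads = 2


# Here the attribute is truly missing, so the default branch is taken.
missing = getattr(AbsentFieldConfig(), "num_global_key_value_heads", None)
```

With the bare `Mock`, a test of the "absent" branch would silently exercise a different path, which is why a plain object (or a `Mock` restricted with `spec`) is the safer fixture.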

The google/gemma-4-E*B checkpoints ship "num_global_key_value_heads": null in text_config. Hugging Face defines null/absent as "defaults to num_key_value_heads" (configuration_gemma4.py), and its modeling code only consults the field when attention_k_eq_v is true (false on E*B). MAX read the field without a fallback in two places, so serving google/gemma-4-E4B-it crashed during config resolution:

    construct_kv_params -> kv_cache_config.to_params(n_kv_heads=None, ...)
    TypeError: unsupported operand type(s) for %: 'NoneType' and 'int'

Add a shared _resolve_num_global_kv_heads() helper implementing the HF fallback and use it in Gemma4TextConfig.initialize_from_config and Gemma4ForConditionalGenerationConfig.construct_kv_params. Behavior for checkpoints with an explicit value (gemma-4-31b-it, gemma-4-26B-A4B-it) is unchanged.

Add a CPU-only config-init test with a local gemma-4-E4B-it config.json reproducing the crash path, mirroring the qwen2_5vl test_config_init pattern.

Note: E*B variants still do not serve end to end after this fix -- their per-layer-embedding (MatFormer) weights are not yet implemented in the gemma4 graph and weight loading reports them as unexpected keys. This change removes the first blocker and surfaces the real gap.

END_PUBLIC

Assisted-by: AI
Signed-off-by: Manuel Saelices <[email protected]>

… fix

- test_config_init.py: cover the absent-attribute branch with a plain object (bare Mocks auto-create attributes), pin the resolved fallback into the cache params (global cache n_kv_heads=2 / head_dim=512, sliding 2/256) instead of a vacuous not-None assert, and add a test for the framework entry point Gemma4ForConditionalGenerationConfig.initialize_from_config.
- model_config.py: note in the helper docstring that transformers only documents the null fallback (its code never applies it; modeling consults the field only when attention_k_eq_v is true), so readers don't go looking for HF resolution code that doesn't exist.
- ruff format clean at the repo's 80-column config.

Assisted-by: AI
Signed-off-by: Manuel Saelices <[email protected]>

k-w-w

Thanks for the PR!

k-w-w · 2026-06-13T00:18:28Z

+    the transformers ``configuration_gemma4.py`` docstring defines null/absent
+    as "defaults to ``num_key_value_heads``". Note transformers never applies
+    that fallback in code -- its modeling only consults the field when
+    ``attention_k_eq_v`` is true (false on E*B), so resolving here matches the


The ModelConfig already has attention_k_eq_v, why not add the same self.use_alternative_attention check in Gemma4Attention, to match the transformers reference implementation?

The config-level resolve produces the HF-equivalent value: the field is only consumed for full-attention (global) layers, and on E*B (attention_k_eq_v=False) the resolved value equals num_key_value_heads, which is exactly transformers' "null defaults to num_key_value_heads" semantics. Gemma4Attention already keys its V-projection on attention_k_eq_v (_has_v_proj), so the attention layer's structure already follows the reference. Resolving in the config keeps the null handling in one place rather than threading it through the layer. Happy to move the resolution into Gemma4Attention instead if you'd prefer the structural parity there; let me know.

Makes sense, thanks for the explanation!

- Correct the _load_hf_config return annotation (PretrainedConfig, not AutoConfig).
- Drop trust_remote_code=True: the fixture has no auto_map and importing model_config registers the gemma4 shim, so AutoConfig resolves it natively (review: @Copilot).

Assisted-by: AI
Signed-off-by: Manuel Saelices <[email protected]>

k-w-w · 2026-06-15T16:29:36Z

!sync

modularbot · 2026-06-16T21:13:37Z

✅🟣 This contribution has been merged 🟣✅

Your pull request has been merged to the internal upstream Mojo sources. It will be reflected here in the Mojo repository on the main branch during the next Mojo nightly release, typically within the next 24-48 hours.

We use Copybara to merge external contributions, click here to learn more.

modularbot · 2026-06-17T12:21:12Z

Landed in 30f15cf! Thank you for your contribution 🎉

msaelices requested a review from a team as a code owner June 11, 2026 20:46

Copilot AI review requested due to automatic review settings June 11, 2026 20:46

github-actions Bot added the waiting-on-review label Jun 11, 2026

msaelices mentioned this pull request Jun 11, 2026

[Feature Request] NVFP4 dequant fallback for pre-Blackwell NVIDIA GPUs (serve Gemma 4 NVFP4 on Ampere) #6667 (open)

Copilot AI reviewed Jun 11, 2026

msaelices marked this pull request as draft June 11, 2026 20:49

msaelices marked this pull request as ready for review June 11, 2026 21:06

msaelices added 2 commits June 11, 2026 21:08

msaelices force-pushed the fix-gemma4-null-num-global-kv-heads branch from 92d3f4b to ccc8fcf Compare June 11, 2026 21:10

k-w-w reviewed Jun 13, 2026

k-w-w approved these changes Jun 15, 2026

modular-automation Bot assigned k-w-w Jun 15, 2026

modular-automation Bot removed the waiting-on-review label Jun 15, 2026

modularbot added the imported-internally label (signals that a given pull request has been imported internally) Jun 15, 2026

modularbot added the merged-internally label (indicates that this pull request has been merged internally) and the merged-externally label (merged externally in public mojo repo) Jun 16, 2026

modularbot closed this in 30f15cf Jun 17, 2026

[Pipelines] Handle null num_global_key_value_heads in Gemma 4 configs #6666
msaelices wants to merge 3 commits into modular:main from msaelices:fix-gemma4-null-num-global-kv-heads


4 participants
