[Pipelines] Handle null num_global_key_value_heads in Gemma 4 configs #6666
msaelices wants to merge 3 commits into
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
Adds an integration test and fixture config to ensure Gemma4 config initialization handles num_global_key_value_heads: null (HF fallback semantics) without crashing.
Changes:
- Introduces `_resolve_num_global_kv_heads()` and uses it in Gemma4 text config initialization + KV param construction.
- Adds a local Gemma4 E4B `config.json` fixture and integration tests that exercise the previously crashing code path.
- Updates Bazel targets to include the new test and its data dependency.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| max/python/max/pipelines/architectures/gemma4/model_config.py | Adds null/absent fallback logic for num_global_key_value_heads and applies it in two call sites. |
| max/tests/integration/architectures/gemma4/test_config_init.py | Adds integration tests covering the null-global-KV-heads scenario using a local HF config. |
| max/tests/integration/architectures/gemma4/configs/gemma4_e4b/config.json | Adds a local HF-style config fixture with num_global_key_value_heads: null. |
| max/tests/integration/architectures/gemma4/BUILD.bazel | Registers the new test and wires in the config fixture as test data. |
def _load_hf_config() -> AutoConfig:
    return AutoConfig.from_pretrained(str(CONFIG_DIR), trust_remote_code=True)
Fixed. Annotation is now PretrainedConfig.
CONFIG_DIR = Path(__file__).parent / "configs" / "gemma4_e4b"

def _load_hf_config() -> AutoConfig:
    return AutoConfig.from_pretrained(str(CONFIG_DIR), trust_remote_code=True)
Kept the directory form: AutoConfig.from_pretrained expects a directory (or repo id), not a config.json path; passing the file directly is the less-supported route. The fixture's config.json is a data dep, so it materializes in runfiles, and the test resolves it and passes in CI.
def _load_hf_config() -> AutoConfig:
    return AutoConfig.from_pretrained(str(CONFIG_DIR), trust_remote_code=True)
Fixed. Dropped trust_remote_code. The fixture has no auto_map, and importing model_config registers the gemma4 config shim, so AutoConfig resolves it natively without it.
def test_resolve_num_global_kv_heads() -> None:
    """null/absent falls back to num_key_value_heads; explicit value wins."""
    null_config = Mock(num_global_key_value_heads=None, num_key_value_heads=2)
    assert _resolve_num_global_kv_heads(null_config) == 2
Already covered. test_resolve_num_global_kv_heads includes an AbsentFieldConfig (a plain class, no num_global_key_value_heads attribute) exercising the getattr(..., None) default branch, alongside the explicit-null and explicit-int cases.
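The difference between the absent and explicit-null cases can be sketched as follows. This is a minimal illustration rather than the PR's code: `resolve_num_global_kv_heads_sketch` and `AbsentFieldConfigSketch` are hypothetical stand-ins, assuming the helper reads the field with `getattr(..., None)` as described above. It also shows why a bare `Mock` cannot exercise the absent branch, since it auto-creates any attribute that is accessed.

```python
from unittest.mock import Mock


def resolve_num_global_kv_heads_sketch(text_config) -> int:
    """Hypothetical stand-in for the PR's helper: null/absent falls back."""
    value = getattr(text_config, "num_global_key_value_heads", None)
    if value is None:
        return text_config.num_key_value_heads
    return value


class AbsentFieldConfigSketch:
    """Plain class with no num_global_key_value_heads attribute at all."""

    num_key_value_heads = 2


# A bare Mock fabricates missing attributes, so the field is never "absent".
bare = Mock(num_key_value_heads=2)
bare_has_field = hasattr(bare, "num_global_key_value_heads")

absent_result = resolve_num_global_kv_heads_sketch(AbsentFieldConfigSketch())
null_result = resolve_num_global_kv_heads_sketch(
    Mock(num_global_key_value_heads=None, num_key_value_heads=2)
)
explicit_result = resolve_num_global_kv_heads_sketch(
    Mock(num_global_key_value_heads=4, num_key_value_heads=2)
)
```

Only the plain class reaches the `getattr` default; the two `Mock` cases cover the explicit-null and explicit-int paths.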
The google/gemma-4-E*B checkpoints ship "num_global_key_value_heads": null
in text_config. Hugging Face defines null/absent as "defaults to
num_key_value_heads" (configuration_gemma4.py), and its modeling code only
consults the field when attention_k_eq_v is true (false on E*B).
MAX read the field without a fallback in two places, so serving
google/gemma-4-E4B-it crashed during config resolution:
construct_kv_params -> kv_cache_config.to_params(n_kv_heads=None, ...)
TypeError: unsupported operand type(s) for %: 'NoneType' and 'int'
Add a shared _resolve_num_global_kv_heads() helper implementing the HF
fallback and use it in Gemma4TextConfig.initialize_from_config and
Gemma4ForConditionalGenerationConfig.construct_kv_params. Behavior for
checkpoints with an explicit value (gemma-4-31b-it, gemma-4-26B-A4B-it) is
unchanged.
Add a CPU-only config-init test with a local gemma-4-E4B-it config.json
reproducing the crash path, mirroring the qwen2_5vl test_config_init
pattern.
Note: E*B variants still do not serve end to end after this fix -- their
per-layer-embedding (MatFormer) weights are not yet implemented in the
gemma4 graph and weight loading reports them as unexpected keys. This
change removes the first blocker and surfaces the real gap.
END_PUBLIC
Assisted-by: AI
Signed-off-by: Manuel Saelices <[email protected]>
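The failure mode and the fallback described in this commit message can be reproduced in isolation. The sketch below rests on assumptions: `check_kv_heads_sketch` is a hypothetical stand-in for whatever modulo arithmetic runs inside `to_params` (the source only shows the resulting `TypeError`), and `SimpleNamespace` stands in for the HF text config.

```python
from types import SimpleNamespace


def check_kv_heads_sketch(n_kv_heads, n_devices: int = 1) -> bool:
    """Hypothetical modulo check; raises TypeError when n_kv_heads is None."""
    return n_kv_heads % n_devices == 0


# Mimics an E*B text_config: the global field is an explicit null.
text_config = SimpleNamespace(
    num_global_key_value_heads=None, num_key_value_heads=2
)

try:
    check_kv_heads_sketch(text_config.num_global_key_value_heads)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Applying the documented fallback before the arithmetic avoids the crash.
resolved = text_config.num_global_key_value_heads
if resolved is None:
    resolved = text_config.num_key_value_heads
fallback_ok = check_kv_heads_sketch(resolved)
```

Resolving once, before any arithmetic touches the value, is what lets both call sites share a single helper.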
… fix
- test_config_init.py: cover the absent-attribute branch with a plain object (bare Mocks auto-create attributes), pin the resolved fallback into the cache params (global cache n_kv_heads=2 / head_dim=512, sliding 2/256) instead of a vacuous not-None assert, and add a test for the framework entry point Gemma4ForConditionalGenerationConfig.initialize_from_config.
- model_config.py: note in the helper docstring that transformers only documents the null fallback (its code never applies it; modeling consults the field only when attention_k_eq_v is true), so readers don't go looking for HF resolution code that doesn't exist.
- ruff format clean at the repo's 80-column config.
Assisted-by: AI
Signed-off-by: Manuel Saelices <[email protected]>
92d3f4b to ccc8fcf
the transformers ``configuration_gemma4.py`` docstring defines null/absent
as "defaults to ``num_key_value_heads``". Note transformers never applies
that fallback in code -- its modeling only consults the field when
``attention_k_eq_v`` is true (false on E*B), so resolving here matches the
The ModelConfig already has attention_k_eq_v. Why not add the same self.use_alternative_attention check in Gemma4Attention, to match the transformers reference implementation?
The config-level resolve produces the HF-equivalent value: the field is only consumed for full-attention (global) layers, and on E*B (attention_k_eq_v=False) the resolved value equals num_key_value_heads, which is exactly transformers' "null defaults to num_key_value_heads" semantics. Gemma4Attention already keys its V-projection on attention_k_eq_v (_has_v_proj), so the attention layer's structure already follows the reference. Resolving in the config keeps the null-handling in one place rather than threading it through the layer. Happy to move the resolution into Gemma4Attention instead if you'd prefer the structural parity there; let me know.
Makes sense, thanks for the explanation!
- Correct the _load_hf_config return annotation (PretrainedConfig, not AutoConfig).
- Drop trust_remote_code=True: the fixture has no auto_map and importing model_config registers the gemma4 shim, so AutoConfig resolves it natively (review: @Copilot).
Assisted-by: AI
Signed-off-by: Manuel Saelices <[email protected]>
!sync
✅🟣 This contribution has been merged 🟣✅
Your pull request has been merged to the internal upstream Mojo sources. It will be reflected here in the Mojo repository on the main branch during the next Mojo nightly release, typically within the next 24-48 hours. We use Copybara to merge external contributions.
Landed in 30f15cf! Thank you for your contribution 🎉
Summary
Fixes the `TypeError: unsupported operand type(s) for %: 'NoneType' and 'int'` crash when serving `google/gemma-4-E4B-it` (fixes #6665).

The `google/gemma-4-E*B` checkpoints ship `"num_global_key_value_heads": null` in `text_config`. Hugging Face defines null/absent as "defaults to `num_key_value_heads`" (`configuration_gemma4.py`); its modeling code only consults the field when `attention_k_eq_v` is true (false on E*B). MAX read it without a fallback in `Gemma4TextConfig.initialize_from_config` and `Gemma4ForConditionalGenerationConfig.construct_kv_params`.

- `_resolve_num_global_kv_heads()` helper implementing the HF fallback, used at both call sites. Checkpoints with an explicit value (`gemma-4-31b-it`, `gemma-4-26B-A4B-it`) are unaffected.
- `test_config_init.py` with a local `gemma-4-E4B-it` `config.json` exercising the crash path (mirrors the `qwen2_5vl` `test_config_init` pattern), registered as its own bazel target and excluded from the GPU `tests` glob.

Scope note
E*B variants still do not serve end to end after this fix: their per-layer-embedding (MatFormer) weights are not implemented in the gemma4 graph and now surface as `strict=True` unexpected keys during weight load (details in #6665). This PR removes the first blocker and makes the real gap visible.

The same latent pattern in `unified_mtp_gemma4/model.py` (`tc.get("num_global_key_value_heads", 4)`; `.get` with a default still returns `None` for an explicit JSON null) is deliberately left to a follow-up: that architecture is exercised by speculative-decoding paths I cannot test here. It is flagged in #6665.

Verification
- New tests: 4/4 pass (run against a patched 26.3.0 install with `--noconftest`; the conftest needs torch). Reverting the two call sites reproduces the original `TypeError`, so the tests genuinely cover the crash path.
- `max serve --model google/gemma-4-E4B-it` with the fix applied to 26.3.0 on an A10G (g5.xlarge) progresses past config resolution, builds the pipeline, and reaches weight loading (where the separate PLE gap reports). For that serve check the hardcoded 15 GiB vision-activation headroom (`estimate_activation_memory`, blocker 2 in #6665) was locally reduced; it otherwise rejects E4B on a 24 GB card at memory estimation, before loading.
- `_resolve_num_global_kv_heads` exercised against the real `config.json` of all five google/gemma-4 checkpoints:

| Checkpoint | `num_global_key_value_heads` in config.json | Resolved value |
|---|---|---|
| gemma-4-26B-A4B-it | 2 | 2 |
| gemma-4-31b-it | 4 | 4 |
| gemma-4-12b-it | 1 | 1 |
| gemma-4-E4B-it | null | 2 (= num_key_value_heads) |
| gemma-4-E2B-it | null | 1 (= num_key_value_heads) |

Assisted-by: AI
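The `.get` pitfall mentioned in the scope note can be confirmed with the standard library alone. This is a minimal sketch, assuming `tc` is the parsed `text_config` dict; the JSON literal is illustrative and not taken from a real checkpoint, and the fallback to `num_key_value_heads` is one possible remedy, not necessarily what the follow-up will do.

```python
import json

tc = json.loads(
    '{"num_global_key_value_heads": null, "num_key_value_heads": 2}'
)

# The key is present with value None, so the default of 4 is never used.
naive = tc.get("num_global_key_value_heads", 4)

# Treating None the same as a missing key restores a usable integer.
value = tc.get("num_global_key_value_heads")
resolved = value if value is not None else tc["num_key_value_heads"]
```

`dict.get` applies its default only when the key is missing, which is why an explicit JSON null slips through as `None`.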