Add config for Gemma3-12B on v6e-256/512/1024 #2488

cfbh-google · 2025-10-12T23:59:53Z

Description

Add new configs for Gemma3-12B on v6e-256, 2x-v6e-256 (512 chips), and 4x-v6e-256 (1024 chips), derived from b/429070413.
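All three targets are built from whole v6e-256 slices, so chip count and per-step batch scale with the number of slices. The minimal sketch below shows that arithmetic. It assumes one JAX device per v6e chip and that every device holds `per_device_batch_size` sequences. The helper names and the example batch size are hypothetical and are not the settings used in this PR.

```python
# Hypothetical helpers: chip counts and per-step tokens for multi-slice v6e-256 runs.
# Assumes one device per v6e chip and that every device holds
# `per_device_batch_size` sequences. Example values are illustrative only.

CHIPS_PER_V6E_256_SLICE = 256


def total_chips(num_slices: int) -> int:
    """Chips across `num_slices` v6e-256 slices (1 -> 256, 2 -> 512, 4 -> 1024)."""
    return num_slices * CHIPS_PER_V6E_256_SLICE


def tokens_per_step(num_slices: int, per_device_batch_size: int, seq_len: int) -> int:
    """Global tokens processed per training step."""
    global_batch = per_device_batch_size * total_chips(num_slices)
    return global_batch * seq_len


if __name__ == "__main__":
    # 32K sequence length, 1 sequence per device (illustrative).
    for slices in (1, 2, 4):
        print(slices, total_chips(slices), tokens_per_step(slices, 1, 32768))
```

Doubling the slice count doubles tokens per step at a fixed per-device batch, which is why each topology gets its own tuned config rather than a single shared one.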

Why is this change being made?

This PR adds the missing Gemma3-12B model configs for several v6e slice configurations.

What is the problem being solved and any relevant context?

Users need to run Gemma3-12B on several v6e configurations. We found model parameters that result in good MFU (model FLOPs utilization). We will create recipes to share the findings.
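MFU compares the model FLOPs actually achieved per second with the hardware's peak. The sketch below uses the common approximation of 6 FLOPs per parameter per token for dense training and ignores attention FLOPs, so it will differ from MaxText's own TFLOP accounting. The function names are hypothetical, and the peak FLOPs per chip is a parameter you must supply from the TPU's published bf16 peak.

```python
# Sketch: MFU = achieved model FLOPs/s divided by aggregate peak FLOPs/s.
# Uses the ~6 * params * tokens approximation for dense forward + backward
# FLOPs and ignores attention FLOPs (an assumption; MaxText counts more precisely).


def approx_training_flops(num_params: float, tokens: float) -> float:
    """Approximate forward + backward FLOPs for a dense model."""
    return 6.0 * num_params * tokens


def mfu(num_params: float, tokens_per_step: float, step_time_s: float,
        num_chips: int, peak_flops_per_chip: float) -> float:
    """Fraction of peak compute spent on model FLOPs."""
    achieved = approx_training_flops(num_params, tokens_per_step) / step_time_s
    peak = num_chips * peak_flops_per_chip
    return achieved / peak
```

Because tokens per step and chip count scale together across slices, MFU stays comparable between the 256, 512, and 1024 chip runs even though absolute throughput differs.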

Why is this a good solution?

We thoroughly tested the different configurations (v6e-256, 2x-v6e-256 with 512 chips, and 4x-v6e-256 with 1024 chips) until we found settings with good MFU that met the customer's requirements (b/429070413). We improved results further by using the relatively new vocab tiling feature (#2242).

What would be some information about the specific implementation?

The configs use pure FSDP at a 32K sequence length, together with vocab tiling, custom splash attention block sizes, and an increased xla_tpu_scoped_vmem_limit_kib.
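Vocab tiling matters at this sequence length because the logits tensor grows with batch × sequence × vocabulary. The rough sketch below estimates the live logits memory per device, assuming tiling splits the logits and loss computation into `num_tiles` equal chunks so only one chunk is materialized at a time. The vocabulary size (about 262K entries), fp32 logits, and per-device batch of 1 are assumptions for illustration. The estimate ignores gradients and all other activations, and MaxText's actual implementation may chunk differently.

```python
# Rough sketch: peak logits memory per device with and without vocab tiling.
# Assumes tiling computes logits/loss in `num_tiles` equal chunks so only one
# chunk is live at a time. Ignores gradients and other activations.

GIB = 2**30


def logits_bytes(per_device_batch: int, seq_len: int, vocab_size: int,
                 bytes_per_elem: int = 4, num_tiles: int = 1) -> int:
    """Bytes of the largest live logits chunk."""
    if num_tiles < 1:
        raise ValueError("num_tiles must be >= 1")
    full = per_device_batch * seq_len * vocab_size * bytes_per_elem
    return -(-full // num_tiles)  # ceiling division


if __name__ == "__main__":
    # 32K tokens, an assumed ~262K-entry vocabulary, fp32 logits.
    for tiles in (1, 4, 8):
        print(tiles, logits_bytes(1, 32768, 262144, 4, tiles) / GIB, "GiB")
```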

What are the shortcomings of the solution and possible future improvements?

We could continue looking for better splash attention block sizes.
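A further block-size search could start from a grid of candidate (block_q, block_kv) pairs. The hypothetical helper below assumes that block sizes must divide the sequence length and be multiples of 128 (the TPU lane width), and it caps blocks at an arbitrary 4096. Check these constraints against the splash attention kernel before you rely on the grid.

```python
# Hypothetical sweep helper for a splash attention block-size search.
# Assumptions: block sizes divide seq_len, are multiples of 128, and are
# capped at `max_block`. Verify against the kernel's real constraints.

from itertools import product


def candidate_block_sizes(seq_len: int, multiple: int = 128,
                          max_block: int = 4096) -> list[int]:
    """Multiples of `multiple` up to `max_block` that evenly divide seq_len."""
    upper = min(seq_len, max_block)
    return [b for b in range(multiple, upper + 1, multiple) if seq_len % b == 0]


def block_grid(seq_len: int) -> list[tuple[int, int]]:
    """All (block_q, block_kv) pairs drawn from the candidate sizes."""
    sizes = candidate_block_sizes(seq_len)
    return list(product(sizes, sizes))


if __name__ == "__main__":
    print(candidate_block_sizes(32768))
    print(len(block_grid(32768)), "pairs to try")
```

Each pair would still need a full benchmark run to measure MFU, so pruning the grid with a few short runs first keeps the search affordable.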

If the change fixes a bug or a GitHub issue, please include a link, e.g.:

FIXES: b/429070413
FIXES: b/451397849

Notice 1: Once all tests pass, the "pull ready" label will automatically be assigned.
This label is used for administrative purposes. Please do not add it manually.

Notice 2: For external contributions, our settings currently require an approval from a MaxText maintainer to trigger CI tests.

Tests

Tested the following configurations using MaxText's benchmark_runner / XPK and verified the results in the Cloud logs:

Gemma3-12B on v6e-256
Gemma3-12B on 2x-v6e-256 (512)
Gemma3-12B on 4x-v6e-256 (1024)
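The runs above also depend on the increased xla_tpu_scoped_vmem_limit_kib from the implementation notes. XLA TPU flags like this one are commonly handed to libtpu through the LIBTPU_INIT_ARGS environment variable. The sketch below shows one way to build that string. Whether benchmark_runner / XPK use exactly this mechanism is an assumption, and the helper name and 65536 value are placeholders, not the setting used in this PR.

```python
# Sketch: render XLA flags into a LIBTPU_INIT_ARGS string.
# The helper name and the flag value are placeholders for illustration.

import os


def libtpu_init_args(flags: dict[str, object]) -> str:
    """Render {"flag": value} as space-separated --flag=value pairs."""
    parts = []
    for name, value in flags.items():
        if isinstance(value, bool):
            value = str(value).lower()  # XLA expects true/false
        parts.append(f"--{name}={value}")
    return " ".join(parts)


if __name__ == "__main__":
    args = libtpu_init_args({"xla_tpu_scoped_vmem_limit_kib": 65536})
    os.environ["LIBTPU_INIT_ARGS"] = args  # must be set before libtpu initializes
    print(args)
```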

Checklist

Before submitting this PR, please make sure (put X in square brackets):

I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
I have necessary comments in my code, particularly in hard-to-understand areas.
I have run end-to-end tests and provided workload links above if applicable.
I have made or will make corresponding changes to the doc if needed.

benchmarks/maxtext_trillium_model_configs.py

cfbh-google requested review from NuojCheng, RissyRan, SujeethJinesh, bvandermoon, gobbleturk, khatwanimohit, mitalisi, richjames0, shauryagup, shralex and vipannalla as code owners October 12, 2025 23:59

NuojCheng approved these changes Oct 13, 2025

bvandermoon reviewed Oct 13, 2025

benchmarks/maxtext_trillium_model_configs.py (outdated, resolved)

cfbh-google force-pushed the carlosbus/training_v6e_gemma3_12b branch 2 times, most recently from 396c59e to cbe1e3c, October 13, 2025 18:14

bvandermoon approved these changes Oct 13, 2025

cfbh-google force-pushed the carlosbus/training_v6e_gemma3_12b branch from cbe1e3c to 633f360, October 17, 2025 23:11

NuojCheng added the pull ready label Oct 18, 2025

Commit 291ec23: Add config for Gemma3-12B on v6e-256/512/1024

cfbh-google force-pushed the carlosbus/training_v6e_gemma3_12b branch from 633f360 to 291ec23, October 18, 2025 06:19

copybara-service bot merged commit 50bafeb into main Oct 18, 2025
27 checks passed

copybara-service bot deleted the carlosbus/training_v6e_gemma3_12b branch October 18, 2025 07:22

3 participants