
Conversation

cfbh-google (Collaborator) commented Oct 12, 2025

Description

Add new configs for Gemma3-12B on v6e-256, 2x-v6e-256 (512), and 4x-v6e-256 (1024) (derived from b/429070413).

Why is this change being made?

This PR adds the missing Gemma3-12B model configs on different v6e configurations.

What is the problem being solved and any relevant context?

Users need to run Gemma3-12B on different v6e configurations. We found model parameters that result in good MFU, and we will create recipes to share the findings.

Why is this a good solution?

We thoroughly tested different configurations (v6e-256, 2x-v6e-256 (512), and 4x-v6e-256 (1024)) until we found solutions with MFU good enough to satisfy the customer's requirements (b/429070413). We made further improvements by using the relatively new vocab tiling feature (#2242).
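
For context, MFU here means model FLOPs utilization. Below is a minimal sketch of how one might estimate it, assuming the common 6 * params * tokens FLOPs approximation for dense transformer training; all numbers in the example are made up and are not results from this PR.

```python
# Hypothetical back-of-the-envelope MFU estimate (not a result from this PR).
# Assumes the standard ~6 * params * tokens FLOPs-per-step rule for dense
# transformer training and a caller-supplied peak bf16 TFLOP/s per chip.

def estimate_mfu(params_billion: float,
                 tokens_per_step: float,
                 step_time_s: float,
                 num_chips: int,
                 peak_bf16_tflops: float) -> float:
    """Model FLOPs utilization = achieved training FLOP/s / peak hardware FLOP/s."""
    achieved_flops_per_s = 6.0 * params_billion * 1e9 * tokens_per_step / step_time_s
    peak_flops_per_s = num_chips * peak_bf16_tflops * 1e12
    return achieved_flops_per_s / peak_flops_per_s

# Made-up numbers for a 12B model on 256 chips with per-device batch 1 and
# a 32K sequence length (tokens_per_step = 256 * 1 * 32768).
print(f"MFU ~= {estimate_mfu(12, 256 * 1 * 32768, 20.0, 256, 900.0):.2%}")
```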

What would be some information about the specific implementation?

The configs use pure FSDP with a 32K sequence length, vocab tiling, custom splash attention block sizes, and an increased xla_tpu_scoped_vmem_limit_kib.
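
For illustration only, here is a rough sketch of the shape of such a run expressed as MaxText-style command-line overrides. The key names and values below are assumptions (in particular sa_block_q/sa_block_kv and the VMEM limit value); the authoritative settings are in the config files added by this PR.

```python
# Illustrative sketch only: override names and values are assumptions meant to
# show the shape of the configuration, not the exact keys/values in this PR.
import shlex

overrides = {
    "model_name": "gemma3-12b",
    "max_target_length": 32768,   # 32K sequence length
    "ici_fsdp_parallelism": -1,   # pure FSDP: shard over all available chips
    "sa_block_q": 1024,           # custom splash attention block sizes (assumed keys)
    "sa_block_kv": 2048,
    # Vocab tiling is enabled via its own config key (see #2242); omitted here.
}

# The scoped VMEM limit is passed through LIBTPU_INIT_ARGS rather than a
# MaxText config key; the value below is a placeholder.
env = 'LIBTPU_INIT_ARGS="--xla_tpu_scoped_vmem_limit_kib=98304"'

cmd = " ".join(
    [env, "python3", "MaxText/train.py", "MaxText/configs/base.yml"]
    + [f"{k}={shlex.quote(str(v))}" for k, v in overrides.items()]
)
print(cmd)
```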

What are the shortcomings of the solution and possible future improvements?

We could continue looking for better splash attention block sizes.
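
As a possible follow-up, one could sweep candidate block sizes and rank them by measured MFU. A hypothetical sketch (not part of this PR), assuming block sizes are powers of two that evenly divide the 32K sequence length:

```python
# Hypothetical sweep over splash-attention block sizes (not part of this PR).
# Each emitted candidate would be benchmarked separately and ranked by MFU.
from itertools import product

seq_len = 32768
candidates = [
    (bq, bkv)
    for bq, bkv in product([512, 1024, 2048], repeat=2)
    if seq_len % bq == 0 and seq_len % bkv == 0
]

for bq, bkv in candidates:
    # Emit override strings for a benchmarking harness to consume.
    print(f"sa_block_q={bq} sa_block_kv={bkv}")
```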

If the change fixes a bug or a GitHub issue, please include a link, e.g.:

FIXES: b/429070413
FIXES: b/451397849

Notice 1: Once all tests pass, the "pull ready" label will automatically be assigned.
This label is used for administrative purposes. Please do not add it manually.

Notice 2: For external contributions, our settings currently require an approval from a MaxText maintainer to trigger CI tests.

Tests

Tested the following configurations using MaxText's benchmark_runner / XPK and verified the Cloud logs:

  • Gemma3-12B on v6e-256
  • Gemma3-12B on 2x-v6e-256 (512)
  • Gemma3-12B on 4x-v6e-256 (1024)

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed.

@cfbh-google cfbh-google force-pushed the carlosbus/training_v6e_gemma3_12b branch 2 times, most recently from 396c59e to cbe1e3c Compare October 13, 2025 18:14
@cfbh-google cfbh-google force-pushed the carlosbus/training_v6e_gemma3_12b branch from cbe1e3c to 633f360 Compare October 17, 2025 23:11
@cfbh-google cfbh-google force-pushed the carlosbus/training_v6e_gemma3_12b branch from 633f360 to 291ec23 Compare October 18, 2025 06:19
@copybara-service copybara-service bot merged commit 50bafeb into main Oct 18, 2025
27 checks passed
@copybara-service copybara-service bot deleted the carlosbus/training_v6e_gemma3_12b branch October 18, 2025 07:22