awni · 2025-11-26T04:16:17Z

Benchmark for RMS norm and its VJP with a total input size of 1024*1024*8 elements, varying the last dimension D (the one that is normalized over):

Forward

D	Pre (ms)	Post (ms)
64	4.333	0.567
128	2.296	0.560
256	1.260	0.551
512	0.767	0.604
1024	0.718	0.607
2048	0.736	0.614
4096	0.772	0.625
8192	0.984	0.691

VJP

D	Pre (ms)	Post (ms)
64	12.24	3.865
128	6.532	2.844
256	3.321	2.279
512	2.452	2.141
1024	2.269	1.974
2048	2.277	1.982
4096	2.448	2.131
8192	2.884	2.362
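For readers who want the ratios rather than raw timings, here is a small sketch that computes the speedup (Pre / Post) per value of D from the two tables above. The numbers are copied verbatim from the tables; the `rows` column is an assumption that the input is viewed as (total / D) independent rows, which the benchmark description does not state explicitly.

```python
# Timings in ms, copied from the Forward and VJP tables above: D -> (pre, post).
TOTAL = 1024 * 1024 * 8  # total number of elements in the benchmark input

forward = {
    64: (4.333, 0.567), 128: (2.296, 0.560), 256: (1.260, 0.551),
    512: (0.767, 0.604), 1024: (0.718, 0.607), 2048: (0.736, 0.614),
    4096: (0.772, 0.625), 8192: (0.984, 0.691),
}
vjp = {
    64: (12.24, 3.865), 128: (6.532, 2.844), 256: (3.321, 2.279),
    512: (2.452, 2.141), 1024: (2.269, 1.974), 2048: (2.277, 1.982),
    4096: (2.448, 2.131), 8192: (2.884, 2.362),
}


def speedups(table):
    """Return {D: pre_ms / post_ms} for one table."""
    return {d: pre / post for d, (pre, post) in table.items()}


forward_speedup = speedups(forward)
vjp_speedup = speedups(vjp)

for d in forward:
    # Assumption: the input is treated as (TOTAL // d) rows of length d.
    rows = TOTAL // d
    print(f"D={d:5d} rows={rows:7d} "
          f"forward {forward_speedup[d]:.2f}x  vjp {vjp_speedup[d]:.2f}x")
```

The gain is largest at the smallest dimension (roughly 7.6x forward and 3.2x VJP at D=64) and shrinks to about 1.4x and 1.2x at D=8192, which matches the PR title's focus on small dimensions.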

awni · 2025-11-26T16:12:53Z

The improvement for pretraining 0.6B is ok but not as much as I was hoping:

On B200:

Pre: toks_per_sec: 96970.51
Post: toks_per_sec: 105429.75
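As a quick check of how large that end-to-end gain is, the relative throughput improvement can be computed from the two figures above (the variable names are only for illustration):

```python
# Tokens per second reported above for the 0.6B pretraining run on B200.
pre_tps = 96970.51
post_tps = 105429.75

# Relative improvement in throughput, as a fraction of the Pre value.
gain = post_tps / pre_tps - 1.0
print(f"throughput gain: {gain:.1%}")
```

That works out to a gain of a little under 9%, far smaller than the kernel-level speedups, which is consistent with RMS norm being only one part of the training step.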

zcbenz

Very nice improvement!

awni added 2 commits November 26, 2025 04:15

faster rms norm for small dimension

11b62ee

faster rms norm and vjp for small dimensions

cabf4cb

awni requested a review from zcbenz November 26, 2025 16:13

awni changed the title ~~[WIP][CUDA] Faster rms norm for small dimension~~ [CUDA] Faster rms norm for small dimension Nov 26, 2025

zcbenz approved these changes Nov 26, 2025

awni merged commit dd79d3c into ml-explore:main Nov 26, 2025
12 checks passed

awni deleted the faster_rms_norm branch December 3, 2025 15:10

BrewTestBot mentioned this pull request Dec 18, 2025

mlx 0.30.1 Homebrew/homebrew-core#259125

Merged

1 task

[CUDA] Faster rms norm for small dimension #2838

awni merged 2 commits into ml-explore:main from awni:faster_rms_norm
