register pressure #3116

Merged
awni merged 1 commit into ml-explore:main from nastya236:columnwise_speed_up
Feb 10, 2026

@nastya236
Collaborator

@nastya236 nastya236 commented Feb 9, 2026

There is register pressure in fp_quantize_columnwise. I was using 40 registers per thread and launching 1 block.
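To illustrate why per-thread register count can cap the number of resident blocks, here is a rough back-of-the-envelope sketch. It assumes a register file of 65536 32-bit registers per SM and a hypothetical block of 1024 threads; neither number is stated in this PR, and the helper name `register_limited_blocks` is made up for illustration. It also ignores register allocation granularity, shared memory, and per-SM thread limits, so treat it as an estimate only.

```python
def register_limited_blocks(regs_per_thread: int,
                            threads_per_block: int,
                            regs_per_sm: int = 65536) -> int:
    """Upper bound on resident blocks per SM from register usage alone.

    regs_per_sm defaults to 65536, an assumed register file size;
    check the target GPU before relying on it.
    """
    regs_per_block = regs_per_thread * threads_per_block
    return regs_per_sm // regs_per_block


# Hypothetical 1024-thread block (an assumption, not taken from the PR).
blocks_at_40 = register_limited_blocks(40, 1024)
blocks_at_32 = register_limited_blocks(32, 1024)
print(blocks_at_40, blocks_at_32)
```

Under these assumptions, 40 registers per thread leaves room for a single block, while 32 or fewer would allow two. Whether this matches the actual launch configuration of the kernel is not confirmed by the PR.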
mxfp8:

| Size (M×N) | Time after (µs) | Time before (µs) |
|---|---|---|
| 4096×4096 | 77.89 | 78.98 |
| 4096×8192 | 101.11 | 105.38 |
| 8192×4096 | 99.76 | 105.91 |
| 8192×8192 | 140.29 | 152.07 |
| 4096×16384 | 144.56 | 154.76 |
| 16384×4096 | 136.55 | 148.44 |
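
For a quick read of the mxfp8 timings above, the relative improvement can be computed directly from the before/after pairs. This is only a small helper sketch; the dictionary and variable names are illustrative and not part of the PR.

```python
# (after_us, before_us) pairs copied from the mxfp8 timings above.
timings_us = {
    "4096x4096": (77.89, 78.98),
    "4096x8192": (101.11, 105.38),
    "8192x4096": (99.76, 105.91),
    "8192x8192": (140.29, 152.07),
    "4096x16384": (144.56, 154.76),
    "16384x4096": (136.55, 148.44),
}

# Speedup is before / after; values above 1.0 mean the new kernel is faster.
speedups = {size: before / after
            for size, (after, before) in timings_us.items()}

for size, speedup in speedups.items():
    after, before = timings_us[size]
    saved_pct = (before - after) / before * 100
    print(f"{size}: {speedup:.3f}x faster, {saved_pct:.1f}% less time")
```

On these numbers the ratio ranges from about 1.01x for 4096×4096 to about 1.09x for 16384×4096, so the gain grows with problem size.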

I tried tuning it further; block_size.x = 16 seems to be the optimal value for the current kernel.

Member

@awni awni left a comment

Very nice!

@awni awni merged commit be52cf6 into ml-explore:main Feb 10, 2026
23 of 24 checks passed