
Conversation


@younesbelkada younesbelkada commented Jun 15, 2025

Owner

I thought about making it configurable. But I think we can just keep it fused for bitnet because of the increased performance.

Author

Makes sense, let's set it to True always then.
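For context, a minimal sketch of what "always fused" could look like, assuming an MLX-style module (the class and attribute names here are illustrative, not from the PR):

```python
import mlx.core as mx
import mlx.nn as nn


class AlwaysFusedQKV(nn.Module):
    """Illustrative only: the fused projection is unconditional, with no
    config flag to switch back to separate Q/K/V projections."""

    def __init__(self, dims: int):
        super().__init__()
        # A single (fused) linear layer produces Q, K and V together.
        self.qkv_proj = nn.Linear(dims, 3 * dims, bias=False)

    def __call__(self, x: mx.array):
        q, k, v = mx.split(self.qkv_proj(x), 3, axis=-1)
        return q, k, v
```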



-def apply_hf_quantization(model, config):
+def apply_hf_quantization(model, config, weights):
Owner

@Blaizzy Blaizzy Jun 15, 2025

Please ensure the bitnet_llama tests are passing, because I use this function there.
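For readers following along, a hedged sketch of how the new `weights` argument might be threaded through; only the signature comes from the diff above, while the body and the dict-shaped `config` are assumptions:

```python
def apply_hf_quantization(model, config, weights):
    """Sketch only: signature from the diff, body is a guess."""
    quant_cfg = config.get("quantization_config") if isinstance(config, dict) else None
    if not quant_cfg:
        return model  # nothing to do for non-quantized checkpoints
    # The extra `weights` mapping would let the function inspect checkpoint
    # tensors (e.g. packed BitNet weights) when swapping in quantized layers,
    # which is presumably why the bitnet_llama tests depend on it.
    return model
```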

Owner

@Blaizzy Blaizzy left a comment

This is awesome!

I just have a couple nits.

print(f"{name:<16}: {dt*1e3:.1f} ms | {(bs*sl)/dt:,.0f} tok/s")


class BitFusedAttention(nn.Module):
Owner

Let's also import this module in bitnet.py to avoid having duplicates.
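A sketch of what that could look like in bitnet.py; the source module path below is an assumption, only the class name comes from this diff:

```python
# bitnet.py -- reuse the shared implementation instead of duplicating it.
# The module path below is hypothetical.
from .fused_bitnet import BitFusedAttention
```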

@younesbelkada younesbelkada requested a review from Blaizzy June 15, 2025 11:50
@younesbelkada
Author

Addressed all comments! LMK wdyt.

@Blaizzy Blaizzy merged commit 02ee4f4 into Blaizzy:pc/fused-bitnet Jun 21, 2025