IBM Granite support#14

Merged
p-e-w merged 1 commit into p-e-w:master from Ai-Swat:feature/ibm-granite-support
Nov 18, 2025

Conversation

@Ooooze (Contributor) commented Nov 17, 2025

Add support for Granite MoE Hybrid in model.py by including down projections for shared MLP and MoE experts
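For context, a minimal illustrative sketch (not the actual change in this PR) of what those extra targets look like: it loads a Granite 4.0 checkpoint with transformers and lists every module whose name ends in a down/output projection. The checkpoint id and the name suffixes are assumptions chosen for demonstration.

```python
# Illustrative sketch only: enumerate the projection matrices that write back
# into the residual stream of a Granite 4.0 model, i.e. the kind of modules
# this PR adds as ablation targets. Requires a transformers version with
# Granite 4.0 support; the checkpoint id and suffix list are assumptions.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "ibm-granite/granite-4.0-micro",  # assumed checkpoint for demonstration
    torch_dtype="auto",
)

# Exact module names vary by architecture (dense MLP, shared MLP, MoE experts),
# so match on a few plausible suffixes rather than hard-coding full paths.
suffixes = ("down_proj", "out_proj", "output_linear")
for name, _module in model.named_modules():
    if name.endswith(suffixes):
        print(name)
```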

@p-e-w (Owner) commented Nov 17, 2025

Thanks! Does this actually work? I guess I never considered just adding those matrices blindly and letting try_add sort out the layers. Can you post the results of the Heretic run?
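The "adding those matrices blindly" idea can be sketched as follows. This is a hypothetical reconstruction, not heretic's actual model.py code; the layer container and attribute paths are assumptions for illustration.

```python
# Hypothetical sketch of "add the matrices unconditionally and let try_add
# sort out the layers": resolve a dotted attribute path on each decoder layer
# and silently skip layers that lack it. Not heretic's actual implementation.
import operator


def try_add(targets, layer, path):
    """Append layer.<path> to targets if the dotted attribute path exists."""
    try:
        targets.append(operator.attrgetter(path)(layer))
    except AttributeError:
        pass  # this layer type (e.g. a Mamba block) has no such projection


def collect_ablation_targets(model):
    targets = []
    for layer in model.model.layers:  # assumed container of decoder layers
        try_add(targets, layer, "self_attn.o_proj")                 # attention output
        try_add(targets, layer, "mlp.down_proj")                    # dense MLP
        try_add(targets, layer, "shared_mlp.output_linear")         # shared expert (assumed name)
        try_add(targets, layer, "block_sparse_moe.output_linear")   # MoE experts (assumed name)
    return targets
```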

@Ooooze (Contributor, Author) commented Nov 17, 2025

Sure! Here is the full log:
heretic.log

Model uploaded to Hugging Face:
https://huggingface.co/Biogenic/granite-4.0-micro-heretic-uncensored

@pszemraj commented Nov 17, 2025

Yeah, same here. I wrote some code that more or less does the same as this PR.

Abliteration parameters

| Parameter | Value |
| --- | --- |
| direction_index | 24.65 |
| attn.o_proj.max_weight | 1.13 |
| attn.o_proj.max_weight_position | 30.77 |
| attn.o_proj.min_weight | 0.50 |
| attn.o_proj.min_weight_distance | 17.16 |
| mamba.out_proj.max_weight | 1.44 |
| mamba.out_proj.max_weight_position | 26.90 |
| mamba.out_proj.min_weight | 0.91 |
| mamba.out_proj.min_weight_distance | 19.24 |
| mlp.shared_down_proj.max_weight | 0.86 |
| mlp.shared_down_proj.max_weight_position | 28.04 |
| mlp.shared_down_proj.min_weight | 0.00 |
| mlp.shared_down_proj.min_weight_distance | 13.42 |

Performance

| Metric | This model | Original model (ibm-granite/granite-4.0-h-1b) |
| --- | --- | --- |
| KL divergence | 0.03 | 0 (by definition) |
| Refusals | 12/100 | 93/100 |

Give this a spin: https://huggingface.co/pszemraj/granite-4.0-h-1b-heretic

LFM2 models, like https://hf.co/LiquidAI/LFM2-2.6B, are proving a bit trickier, FWIW: they seem to need more than simply adding the conv layers (I even tried expanding the ranges too).

@pszemraj

Oh oops, my bad. I realized this PR is for the transformer MoE version of Granite 4.0; I did the Mamba hybrid one(s) (as that was the whole point) plus LFM2.

If you like, I can submit a separate PR.

p-e-w merged commit 61fdf72 into p-e-w:master on Nov 18, 2025
@p-e-w (Owner) commented Nov 18, 2025

Thanks, great stuff!

I made the same error as @pszemraj in assuming that this is the Mamba version. I actually wasn't aware that Granite also has a traditional transformer variant. Either way, it's great to have broader model support, the more the merrier.

@pszemraj Yes please.

@Ooooze (Contributor, Author) commented Nov 18, 2025

Thank you guys, awesome project!

@pszemraj

Great! @p-e-w, I'll take a look later today or tomorrow and fork/reconcile my changes against the latest main.
