Add support for Granite MoE Hybrid in model.py by including down projections for shared MLP and MoE experts

Conversation
Thanks! Does this actually work? I guess I never considered just adding those matrices blindly and letting …
Sure! Here is the full log; the model is uploaded to Hugging Face:
Yeah, same here. I wrote some code that more or less does the same as this PR.

[Abliteration parameters and performance tables omitted]

Give this a spin: https://huggingface.co/pszemraj/granite-4.0-h-1b-heretic

LFM2 models, like https://hf.co/LiquidAI/LFM2-2.6B, are proving a bit trickier FWIW; it seems to need more than simply adding the conv layers (I even tried expanding the ranges too).
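As an aside on the LFM2 point above: one way to see which weight matrices an LFM2 checkpoint actually exposes (beyond the short-conv layers) is to enumerate its leaf modules and group them by name. A minimal sketch, assuming a transformers release with LFM2 support; none of this is code from this PR or from model.py.

```python
# Enumerate candidate weight matrices in an LFM2 checkpoint to see which
# projections (beyond the short-conv layers) might need to be included.
# Illustrative only; module names are whatever the implementation exposes.
from collections import Counter

import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "LiquidAI/LFM2-2.6B", torch_dtype=torch.bfloat16
)

# Collapse per-layer repeats into one entry per trailing module-name pattern,
# counting how many layers contain each kind of linear/conv module.
suffixes = Counter()
for name, module in model.named_modules():
    if isinstance(module, (nn.Linear, nn.Conv1d)):
        suffixes[".".join(name.split(".")[-2:])] += 1

for suffix, count in sorted(suffixes.items()):
    print(f"{count:3d} x {suffix}")
```

The printed patterns can then be compared against what model.py already targets, to decide which extra projections (and layer ranges) are worth including.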
Oh oops, my bad. I realized this PR is for the transformer MoE version of granite-4.0; I did the mamba hybrid one(s) (as... that was the whole point) + LFM2. If you like, I can submit a separate PR?
Thank you guys, awesome project!)
Great! @p-e-w I'll take a look later today or tomorrow & fork/reconcile my changes vs latest main.
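As a point of reference, here is a minimal sketch of what the down projections named in the title correspond to: enumerating the shared-MLP and expert output projections of a Granite 4.0 MoE hybrid checkpoint with plain transformers. The model ID and the parameter-name patterns are assumptions based on the GraniteMoe* implementations and may need adjusting; this is not the actual change made to model.py.

```python
# Locate the "down projection" weights for the shared MLP and the stacked
# MoE experts in a Granite 4.0 hybrid MoE checkpoint. Name patterns below
# are assumptions; verify them against model.named_parameters().
import torch
from transformers import AutoModelForCausalLM

MODEL_ID = "ibm-granite/granite-4.0-h-tiny"  # assumed example checkpoint

model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype=torch.bfloat16)

# Assumed parameter-name substrings:
#   "shared_mlp.output_linear"       -> shared MLP down projection per layer
#   "block_sparse_moe.output_linear" -> all experts' down projections, stacked
PATTERNS = ("shared_mlp.output_linear", "block_sparse_moe.output_linear")

targets = {
    name: param
    for name, param in model.named_parameters()
    if any(pattern in name for pattern in PATTERNS)
}

for name, param in sorted(targets.items()):
    print(f"{name}: {tuple(param.shape)}")
```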