Conversation

@pwilkin
Collaborator

@pwilkin pwilkin commented Oct 28, 2025

Implementation for Minimax M2 - not doing the chat template yet because not sure how to handle the interleaving thinking blocks.

@github-actions github-actions bot added testing Everything test related python python script changes labels Oct 28, 2025
@pwilkin pwilkin force-pushed the minimax-m2 branch 2 times, most recently from 48aab51 to 06ed421 Compare October 28, 2025 23:13
@pwilkin
Collaborator Author

pwilkin commented Oct 28, 2025

Closes #16798

@xldistance

xldistance commented Oct 30, 2025

minimax-m2's reply is missing the opening <think> tag.

@ubergarm

ubergarm commented Oct 30, 2025

I ran this PR with the q8_0 by DevQuasar and it seems to be working. Without --jinja it got stuck in a loop, likely due to chat template issues, but with --jinja and hitting /v1/chat/completions it seems to work okay.

It does not print an initial <think> as mentioned above, but does print the closing </think>. The original model card mentions that this model is unique in that the client should not strip the think blocks, which could lead to issues.

Full command and perplexity results (looks fine) here: https://huggingface.co/DevQuasar/MiniMaxAI.MiniMax-M2-GGUF/discussions/1

Collaborator

@CISC CISC left a comment


Remove the vocab files and test; if there is a good reason to test the vocab (which AFAICT there is not), we can add it to ggml-org/vocabs on HF.

@ark3

ark3 commented Oct 30, 2025

Tool calls don't work yet? Or is that just this particular GGUF (from bullerwins)?

Unknown argument ensure_ascii for function tojson at row 7, column 52:
{%- for tool in tool_list -%}
<tool>{{ tool.function | tojson(ensure_ascii=False) }}</tool>
                                                   ^
{% endfor -%}

@CISC
Collaborator

CISC commented Oct 30, 2025

Tool calls don't work yet? Or is that just this particular GGUF (from bullerwins)?

Unknown argument ensure_ascii for function tojson at row 7, column 52:
{%- for tool in tool_list -%}
<tool>{{ tool.function | tojson(ensure_ascii=False) }}</tool>
                                                   ^
{% endfor -%}

ensure_ascii is not yet supported by minja, see google/minja#84 but this model needs additional support for the chat template anyway, not within the scope of this PR, see OP.
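As a minimal sketch of the workaround GGUF uploaders apply (the helper name and regex are illustrative, not part of this PR or of minja), the unsupported kwarg can be stripped from the template before handing it to the parser:

```python
import re

def strip_ensure_ascii(template: str) -> str:
    # Rewrite `tojson(ensure_ascii=False)` to plain `tojson`; minja does not
    # accept the kwarg yet (see google/minja#84), and non-ASCII output is the
    # intended behaviour anyway.
    return re.sub(r"tojson\(\s*ensure_ascii\s*=\s*False\s*\)", "tojson", template)

line = "<tool>{{ tool.function | tojson(ensure_ascii=False) }}</tool>"
print(strip_ensure_ascii(line))
# <tool>{{ tool.function | tojson }}</tool>
```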

@pwilkin
Collaborator Author

pwilkin commented Oct 30, 2025

Remove the vocab files and test; if there is a good reason to test the vocab (which AFAICT there is not), we can add it to ggml-org/vocabs on HF.

Done.

@pwilkin pwilkin reopened this Oct 30, 2025
@pwilkin
Collaborator Author

pwilkin commented Oct 30, 2025

Argh, stupid codespaces.

@CISC rebased on current master, should be OK now.

@danielhanchen
Contributor

@pwilkin Fantastic work and thanks as always for your open source work!

@ggerganov
Member

not doing the chat template yet because not sure how to handle the interleaving thinking blocks.

Is it worth merging if this does not work?

@CISC
Collaborator

CISC commented Oct 31, 2025

not doing the chat template yet because not sure how to handle the interleaving thinking blocks.

Is it worth merging if this does not work?

I think the jinja template works if you just remove ensure_ascii; @danielhanchen and many other GGUF uploaders patch the templates to make them work.

@danielhanchen
Contributor

@CISC Yep I normally just remove it for now

@CISC
Collaborator

CISC commented Oct 31, 2025

@CISC Yep I normally just remove it for now

It's weird, too; I don't understand why some are using it in their templates since it's the default anyway. Makes no sense...

@CISC CISC added model Model specific hot Something that is hot and removed testing Everything test related labels Oct 31, 2025
Collaborator

@CISC CISC left a comment


Ready to merge when CIs are done.

pwilkin and others added 2 commits October 31, 2025 14:18
Co-authored-by: Sigbjørn Skjæret <[email protected]>
Co-authored-by: Sigbjørn Skjæret <[email protected]>
@pwilkin
Collaborator Author

pwilkin commented Oct 31, 2025

@CISC looks OK to me, the failures are unrelated (webgpu).

@CISC CISC merged commit 0de0a01 into ggml-org:master Oct 31, 2025
70 of 74 checks passed
@aldehir
Collaborator

aldehir commented Oct 31, 2025

not sure how to handle the interleaving thinking blocks.

Seems similar to gpt-oss in this regard, except for all messages and not just tool calls.

It should work if clients pass back assistant messages with reasoning_content intact. Or am I misunderstanding?

@gopinath87607

Guys, any idea why the thinking tag still hasn't been fixed?

@ServeurpersoCom
Collaborator

ServeurpersoCom commented Nov 1, 2025

Once --reasoning-format none is set on the backend, everything should work as the reasoning content will be passed back to the server; the rest is purely cosmetic, like adding a lightweight, dedicated front-end filter or toggle to handle multiple <think>...</think> blocks gracefully.

We could take a more modular approach: the backend could properly parse the blocks and send alternating delta reasoning_content / delta content, while a simple front-end option could resend "reasoning_content as content" with a configurable delimiter. It would fit nicely within the OpenAI-compat layer; it might be a bit of over-engineering, but it would cover all possible cases without needing any additional parsing logic or front-end hacks.
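The alternating-delta idea above could be sketched roughly like this (the function name and tuple shape are assumptions for illustration, not the actual llama.cpp server API):

```python
import re

def split_deltas(text: str):
    # Split a response with interleaved <think>...</think> blocks into
    # alternating ("reasoning_content", ...) and ("content", ...) deltas,
    # so clients can replay the reasoning back into context intact.
    deltas, pos = [], 0
    for m in re.finditer(r"<think>(.*?)</think>", text, flags=re.DOTALL):
        if m.start() > pos:
            deltas.append(("content", text[pos:m.start()]))
        deltas.append(("reasoning_content", m.group(1)))
        pos = m.end()
    if pos < len(text):
        deltas.append(("content", text[pos:]))
    return deltas

sample = "<think>plan</think>step one<think>revise</think>step two"
print(split_deltas(sample))
```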

@xldistance

Once --reasoning-format none is set on the backend, everything should work as the reasoning content will be passed back to the server; the rest is purely cosmetic, like adding a lightweight, dedicated front-end filter or toggle to handle multiple ... blocks gracefully.

We could take a more modular approach: the backend could properly parse the blocks and send alternating delta reasoning_content / delta content, while a simple front-end option could resend “reasoning_content as content” with a configurable delimiter. It would fit nicely within the OpenAI-Compat layer: though it might be a bit of overengineering... but it would cover all possible cases without needing any additional parsing logic or frontend-side hacks.

Adding --reasoning-format none still results in missing think tags.

@ServeurpersoCom
Collaborator

ServeurpersoCom commented Nov 1, 2025

Adding --reasoning-format none still results in missing think tags.

Are you running the latest version? If not, please rebase and try again:
[screenshot]
The AST was broken before the recent PRs; now the Svelte UI no longer drops XML tags, whether Markdown rendering is enabled or not. If it still doesn't display, the issue lies elsewhere.
I'll give the model a try, but running the IQ2 build with only 96 GB of RAM and 32 GB of VRAM will probably feel like watching a 320p DivX on a 486.

@ServeurpersoCom
Collaborator

ServeurpersoCom commented Nov 1, 2025

The Jinja template includes a generation prompt that pre-opens a <think> block ({{- ']~b]ai' ~ '\n' ~ '<think>' ~ '\n' }}), which can cause misaligned reasoning output.
[screenshot]
A detection codepath should be added to handle or skip this pre-opened block automatically.

37.2 tok/s on GPU+CPU. Please open an issue; I'll check the codepath tomorrow.
[screenshot]

@ServeurpersoCom
Collaborator

ServeurpersoCom commented Nov 1, 2025

OK : https://huggingface.co/MiniMaxAI/MiniMax-M2/blob/main/chat_template.jinja

MiniMax-M2 is the first model that actually requires this behavior (the reasoning_content must be preserved in context), so it deserves its own special option.
A new --reasoning-format minimax-m2 should behave like none, but emit an initial <think>\n chunk at the start of streaming.
It’s a pragmatic and minimalist solution: the change is lightweight, low-risk, and avoids any regression while keeping the reasoning block intact for context replay.
Only a small front-end toggle would be needed to display it properly.
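A rough sketch of what such a mode could look like on the streaming side (the function and the flag handling are hypothetical illustrations, not the actual llama-server code):

```python
def stream_with_reasoning_format(chunks, reasoning_format):
    # Behave like "none" (pass all chunks through untouched), but in the
    # proposed "minimax-m2" mode first re-emit the <think> tag that the
    # chat template pre-opened in the generation prompt, so clients see a
    # well-formed reasoning block.
    if reasoning_format == "minimax-m2":
        yield "<think>\n"
    yield from chunks

out = list(stream_with_reasoning_format(["reasoning", "</think>", "answer"], "minimax-m2"))
print(out)
```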

@gopinath87607

I don't know why they merged this PR; IMO this is not good.

@CISC
Collaborator

CISC commented Nov 1, 2025

I don't know why they merged this PR; IMO this is not good.

Not everything has to (or should) be done in a single PR.

@ServeurpersoCom
Collaborator

[screenshot]
@hksdpc255

For anyone interested in enabling tool calls for MiniMax M2, refer to PR #16932; I've managed to get tool calls working.
