model: add Qwen3-Omni Thinker support (qwen3omnimoe) #18420
Conversation
Nice job. I think deepcopy might not be needed since you're not modifying anything nested.
Add support for Qwen3-Omni Thinker, a 48-layer MoE model with 128 experts (8 active per token) and an optional shared expert. This enables text-only inference as the foundation for full multimodal support.

Key changes:
- New architecture: LLM_ARCH_QWEN3OMNIMOE
- GGUF conversion with nested thinker_config handling
- IMRoPE (Interleaved M-RoPE) with sections [24, 20, 20, 0]
- Shared expert support in the qwen3vl-moe graph builder
- Reuses llm_build_qwen3vlmoe for graph construction
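The M-RoPE sections mentioned above split the rotary dimensions across position components (temporal, height, width, plus an unused fourth section); the "interleaved" variant cycles through the components instead of giving each one a contiguous block. A rough sketch of that assignment, assuming a simple round-robin over sections with remaining budget (the function and layout here are illustrative, not llama.cpp's actual implementation):

```python
def imrope_section_order(sections):
    """Assign each rotary dim-pair to a position component (0=t, 1=h, 2=w, 3=extra).

    Interleaved M-RoPE cycles through the sections round-robin instead of
    giving each component one contiguous block of dimensions.
    """
    remaining = list(sections)
    order = []
    while any(remaining):
        for component, left in enumerate(remaining):
            if left > 0:
                order.append(component)
                remaining[component] -= 1
    return order

# Sections from the PR description; the fourth section is empty.
order = imrope_section_order([24, 20, 20, 0])
print(len(order))   # 64 dim-pairs in total
print(order[:9])    # [0, 1, 2, 0, 1, 2, 0, 1, 2] while all sections have budget
```

Once the two 20-dim sections are exhausted, the remaining four pairs all go to the temporal component.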
Address review feedback:
- Rename class to Qwen3OmniMoeModel, inherit from Qwen2MoeModel
- Remove __init__ override (thinker_config handled at L720-722)
- Remove set_gguf_parameters (mrope_section via rope_scaling)

Keep set_vocab for EOS/PAD: Qwen3-Omni lacks tokenizer.json (it uses vocab.json + merges.txt), so SpecialVocab can't discover the token IDs automatically.
5969085 to d4ee36e
```python
# Qwen3-Omni lacks tokenizer.json, so token IDs must be set explicitly
self.gguf_writer.add_eos_token_id(151645)  # <|im_end|> - required for generation
self.gguf_writer.add_pad_token_id(151643)  # <|endoftext|> - required for batching
```
The comment is incorrect: the real reason is that they are, for some reason, explicitly set to null in config.json.
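In other words, the hardcoded IDs are needed not because the tokenizer files are missing, but because a null in config.json blocks automatic discovery. A minimal sketch of that situation (the fallback IDs are the ones hardcoded in this PR; the lookup logic is illustrative, not gguf-py's actual code):

```python
import json

# Illustrative fragment of a config.json where the special token IDs
# are explicitly set to null rather than omitted or filled in.
config = json.loads('{"eos_token_id": null, "pad_token_id": null}')

# JSON null becomes Python None, so the key exists but carries no usable
# value, and the converter must fall back to hardcoded token IDs.
eos_id = config.get("eos_token_id")
if eos_id is None:
    eos_id = 151645  # <|im_end|>, as hardcoded in this PR
pad_id = config.get("pad_token_id")
if pad_id is None:
    pad_id = 151643  # <|endoftext|>

print(eos_id, pad_id)  # 151645 151643
```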
```cpp
            layer.ffn_up_exps = create_tensor(tn(LLM_TENSOR_FFN_UP_EXPS, "weight", i), { n_embd, n_ff_exp, n_expert }, 0);
        }
    } break;
case LLM_ARCH_QWEN3OMNIMOE:
```
Since this is only Qwen3VLMoe with shared experts added and you are adding shared experts support to qwen3vl-moe.cpp I suggest you do the same here instead of duplicating code.
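The shared-expert variant the reviewer describes adds an always-on expert on top of the routed top-k mixture. A toy sketch of that forward pass for a single token, assuming experts are plain callables on a vector (names and routing details are illustrative, not the qwen3vl-moe.cpp implementation):

```python
import numpy as np

def moe_ffn(x, experts, gate_w, shared_expert=None, n_active=2):
    """Routed MoE FFN for one token; shared_expert, if set, is always applied."""
    logits = x @ gate_w                    # one routing logit per expert
    top = np.argsort(logits)[-n_active:]   # indices of the top-k experts
    weights = np.exp(logits[top] - logits[top].max())
    weights /= weights.sum()               # softmax over the selected experts
    out = sum(w * experts[i](x) for w, i in zip(weights, top))
    if shared_expert is not None:
        out = out + shared_expert(x)       # the shared expert sees every token
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal(4)
experts = [lambda v, s=s: s * v for s in (1.0, 2.0, 3.0, 4.0)]
gate_w = rng.standard_normal((4, 4))

routed = moe_ffn(x, experts, gate_w, shared_expert=None)
with_shared = moe_ffn(x, experts, gate_w, shared_expert=lambda v: 10.0 * v)
print(np.allclose(with_shared, routed + 10.0 * x))  # True
```

This is why the suggestion avoids code duplication: the shared-expert branch is a small additive term in the existing routed-MoE graph, not a separate graph builder.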
If I understand correctly, qwen3 omni is just qwen3vl with a whisper encoder for audio, so there is no need to introduce this many changes; the conversion script can simply mark this info. Besides, I don't feel comfortable using AI for anything related to mtmd, it generates too much redundant and overkill code. I will replace this PR with another, much simpler approach.
Hello @ngxson, I'm back! How does this look for the first PR? I'm open to any feedback.
Original Model: https://huggingface.co/Qwen/Qwen3-Omni-30B-A3B-Instruct
GGUFs: https://huggingface.co/TrevorJS/Qwen3-Omni-30B-A3B-GGUF
This PR implements the `thinker` model only, providing just text -> text.

thinker-f16 on dgx-spark:

AI Disclosure
AI was used to write this code, but it was then reviewed, tested, and benchmarked by a human!