[Feature]: Improve GGUF loading from HuggingFace user experience like `repo_id:quant_type` (#29137)
Conversation
Code Review
This pull request introduces a user-friendly way to load GGUF models from Hugging Face using the repo_id:quant_type format. The changes are well-structured, touching configuration, argument parsing, and the GGUF model loader. The addition of unit tests is also a great practice. I've found one high-severity issue in the file searching logic that could prevent some models from loading correctly. My feedback includes a specific code suggestion to address this.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
@codex review
Isotr0py
left a comment
Thanks for adding this feature! I've left some initial comments. PTAL! :)
This pull request has merge conflicts that must be resolved before it can be merged.
Signed-off-by: Injae Ryou <[email protected]>
…ectly
- Changed `_prepare_weights` to take `ModelConfig` instead of a model path.
- Updated `download_gguf` to include an optional `revision` parameter.
- Adjusted `download_model` and `load_weights` methods to work with the new `_prepare_weights` signature.

Signed-off-by: Injae Ryou <[email protected]>
- Remove `gguf_quant_type` in `ModelConfig`
- Move `download_gguf` to `weight_utils.py`
- Strictly check `quant_type` in `is_remote_gguf`
- Leave `self.model` as `repo_id:quant_type`
- Raise an error in `split_remote_gguf` for:
  - an invalid remote GGUF model spec
  - an invalid `gguf_quant_type` (different from `GGMLQuantizationType`)

Signed-off-by: Injae Ryou <[email protected]>
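A minimal sketch of the strict split-and-validate behavior described in this commit. The quant-type set below is a small sample standing in for `GGMLQuantizationType`, and the function body is an assumption for illustration, not the PR's actual code:

```python
# Sample subset; the real check validates against gguf's GGMLQuantizationType.
KNOWN_QUANT_TYPES = {"IQ1_S", "Q4_K_M", "Q8_0"}

def split_remote_gguf(model: str) -> tuple[str, str]:
    """Split 'repo_id:quant_type', raising on an invalid spec."""
    repo_id, sep, quant_type = model.rpartition(":")
    if not sep or not repo_id or quant_type.upper() not in KNOWN_QUANT_TYPES:
        raise ValueError(f"Invalid remote GGUF model spec: {model!r}")
    return repo_id, quant_type.upper()
```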
Force-pushed from 7435750 to eca3898
@Isotr0py

Sorry for the delay! I've been quite busy recently; I'll try to take a look tomorrow ASAP! :)
Signed-off-by: Isotr0py <[email protected]>
Isotr0py
left a comment
Overall LGTM! Just left a nit.
Signed-off-by: Isotr0py <[email protected]>
Nice!!! Thanks for the PR!!!
The tokenizer argument is required for GGUF. As far as I know, current vLLM needs tokenizer files.
Strange — according to this diagram https://huggingface.co/docs/transformers/gguf the files can have the tokenizer embedded inside.
That's right, GGUF also includes tokenizer information. As far as I know, though, the current vLLM loads the tokenizer from the tokenizer files on the HF Hub.
… repo_id:quant_type (vllm-project#29137) Signed-off-by: Injae Ryou <[email protected]> Signed-off-by: Isotr0py <[email protected]> Co-authored-by: Isotr0py <[email protected]>
… repo_id:quant_type (vllm-project#29137) Signed-off-by: Injae Ryou <[email protected]> Signed-off-by: Isotr0py <[email protected]> Co-authored-by: Isotr0py <[email protected]> Signed-off-by: dsuhinin <[email protected]>
Purpose
Fixes #25182
Improve the user experience of loading GGUF models from HuggingFace, e.g. `repo_id:quant_type`.
Test Plan
Test Result
[BEFORE] `vllm serve unsloth/Qwen3-0.6B-GGUF:IQ1_S --tokenizer Qwen/Qwen3-0.6B`
[AFTER] `vllm serve unsloth/Qwen3-0.6B-GGUF:IQ1_S --tokenizer Qwen/Qwen3-0.6B`
`pytest tests/models/test_gguf_download.py`
`pytest tests/transformers_utils/test_utils.py`
Essential Elements of an Effective PR Description Checklist
Update `supported_models.md` and `examples` for a new model.