Is there support for loading a sharded gguf file? #1341
@jharonfe I haven't tested it personally, but according to the linked discussion it's automatically detected by One caveat is that this probably doesn't work with One thing that would help is a link to a small model that's been split and uploaded, preferably <7b.
I tested this using the latest version, 0.2.61, and the model appears to load correctly. Thanks for the feedback on this.
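For reference, a minimal sketch of what that looks like; the model path below is a hypothetical first-shard filename, and the expectation (per the linked discussion) is that llama.cpp finds the remaining *-of-00006.gguf pieces sitting next to it automatically:

from llama_cpp import Llama

# Point Llama at the first split file; llama.cpp should pick up the sibling
# shards (-00002-of-00006.gguf, etc.) from the same directory on its own.
llm = Llama(
    model_path="models/Llama-3-70B-Instruct-DPO-v0.4.IQ4_XS-00001-of-00006.gguf",  # hypothetical path
    n_gpu_layers=-1,
)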
I just hit this issue today. I had tried using a wildcard to specify all of the files, but it then complained about seeing multiple files. An option like
Prototype: 0e67a83
PR opened: #1457
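If that PR lands, usage would presumably look something like the sketch below; the additional_files keyword is an assumption based on this thread, not a confirmed parameter name, so check the PR itself for the actual interface:

from llama_cpp import Llama

# Hypothetical sketch: pass the first shard as `filename` and the remaining
# shards through an extra option (parameter name assumed, not confirmed).
llm = Llama.from_pretrained(
    repo_id="MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.4-GGUF",
    filename="*IQ4_XS-00001-of-00006*",
    additional_files=[f"*IQ4_XS-{i:05d}-of-00006*" for i in range(2, 7)],
    n_gpu_layers=-1,
)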
I would appreciate this feature! Currently I do something very hacky like the below, and it works (once all the files are downloaded), but it's pretty clunky:

from llama_cpp import Llama
for i in range(1, 6 + 1):  # assuming the model is split into 6 shards
    try:
        llm = Llama.from_pretrained(
            repo_id="MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.4-GGUF",
            filename=f"*IQ4_XS-{i:05d}*",
            verbose=True,
            n_gpu_layers=-1,
        )
    except Exception:
        # The load fails until every shard is present, but each call still
        # downloads the matching file.
        pass
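A somewhat less clunky version of the same workaround, sketched under the assumption that the shards follow the usual -NNNNN-of-00006.gguf naming (the exact local filename below is a guess), is to pull every shard in one call with huggingface_hub and then load the first split file directly:

from huggingface_hub import snapshot_download
from llama_cpp import Llama

# Download all IQ4_XS shards from the repo in one go.
local_dir = snapshot_download(
    repo_id="MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.4-GGUF",
    allow_patterns=["*IQ4_XS*"],
)

# Load the first split file; llama.cpp should locate the remaining shards
# in the same directory.
llm = Llama(
    model_path=f"{local_dir}/Llama-3-70B-Instruct-DPO-v0.4.IQ4_XS-00001-of-00006.gguf",  # assumed shard name
    n_gpu_layers=-1,
    verbose=True,
)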
Well, my PR has been sitting there for a month now. You might try patching the code additions in yourself to get rid of the jank.
Is your feature request related to a problem? Please describe.
I'm inquiring whether this project supports loading a "sharded" gguf model file. The llama.cpp project appears to have added tooling for splitting gguf files into pieces (more here). I was curious whether this project supports loading gguf files in that format, since I didn't see any mention of it in the documentation or issues.
If it is supported, could you point me to the documentation on this or provide a code example? If not, perhaps this feature could be added?