
Is there support for loading a sharded gguf file? #1341


Closed
jharonfe opened this issue Apr 12, 2024 · 7 comments

Comments

@jharonfe

Is your feature request related to a problem? Please describe.
Does this project support loading a "sharded" gguf model file? The llama.cpp project appears to have added tooling for splitting gguf files into pieces (more here). I was curious whether this project supports loading gguf files in that format, since I didn't see any mention of it in the documentation or issues.

If it is supported, could you point me to the documentation on this or provide a code example? If not, perhaps this feature could be added?

@abetlen
Owner

abetlen commented Apr 13, 2024

@jharonfe I haven't tested it personally but according to the linked discussion it's automatically detected by llama_load_model_from_file which llama-cpp-python uses.

One caveat is that this probably doesn't work with .from_pretrained yet, because that method looks for a single file to pull via the huggingface_hub library. I think adding an option like additional_files there would be good; I'll look into it.

One thing that would help is a link to a small model that's been split and uploaded, preferably <7B.
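
If the automatic detection works as described, loading a split model from local disk should just be a matter of pointing model_path at the first shard. An untested sketch (the path below is only an example, not a real file):

from llama_cpp import Llama

# Untested sketch: pass only the first shard; llama.cpp should pick up the
# remaining ...-00002-of-00006.gguf, ... shards from the same directory.
llm = Llama(
    model_path="./models/example-model-IQ4_XS-00001-of-00006.gguf",
    n_gpu_layers=-1,
)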

@jharonfe
Author

I tested this using the latest version, 0.2.61, and the model appears to load correctly. Thanks for the feedback on this.

@ryao

ryao commented Apr 18, 2024

I just hit this issue today. I had tried using a wildcard to specify all of the files, but it complained about matching multiple files. An option like additional_files would be a nice quality-of-life change.

@Gnurro
Contributor

Gnurro commented May 11, 2024

Prototype: 0e67a83
Will test and open a PR if it works.

@Gnurro
Contributor

Gnurro commented May 14, 2024

PR opened: #1457

@ozanciga

ozanciga commented Jun 7, 2024

I would appreciate this feature! Currently I do something very hacky like the below, and it works (once all files are downloaded), but it's pretty clunky.

from llama_cpp import Llama

# Hacky workaround: call from_pretrained once per shard pattern so that
# huggingface_hub downloads every shard; load attempts that fail before
# all shards are present are simply ignored.
for i in range(1, 6 + 1):  # assuming the model is split into 6 shards
    try:
        llm = Llama.from_pretrained(
            repo_id="MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.4-GGUF",
            filename=f"*IQ4_XS-{i:05d}*",
            verbose=True,
            n_gpu_layers=-1,
        )
    except Exception:
        pass

@Gnurro
Contributor

Gnurro commented Jun 19, 2024

I would appreciate this feature! Currently I do something very hacky like the below, and it works (once all files are downloaded), but it's pretty clunky.

Well, my PR has been sitting there for a month now. You might try editing the code additions in yourself to get rid of the jank.
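
With those additions applied, usage could look roughly like this. A sketch only, assuming the PR's additional_files parameter accepts extra filename patterns to download alongside the primary shard (repo and patterns reused from the comment above):

from llama_cpp import Llama

# Sketch assuming from_pretrained gains an additional_files argument:
# filename matches the first shard, and the extra patterns are downloaded
# into the same cache directory before the model is loaded.
llm = Llama.from_pretrained(
    repo_id="MaziyarPanahi/Llama-3-70B-Instruct-DPO-v0.4-GGUF",
    filename="*IQ4_XS-00001*",
    additional_files=[f"*IQ4_XS-{i:05d}*" for i in range(2, 6 + 1)],
    n_gpu_layers=-1,
    verbose=True,
)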
