@afrideva
Contributor

I encountered several repos containing only a single "model.safetensors" file that would fail to convert, for example: https://huggingface.co/mesolitica/malaysian-tinyllama-1.1b-16384-instructions/tree/main

Found #3097 while searching for existing PRs; it should be safe to close now.

      if path.is_dir():
          # Check if it's a set of safetensors files first
    -     files = list(path.glob("model-00001-of-*.safetensors"))
    +     globs = ["model-00001-of-*.safetensors", "model.safetensors"]
Contributor

@AlpinDale AlpinDale Nov 12, 2023

Why not just *.safetensors? That's the common approach.

Contributor

I thought the same thing, but that could, under some absurd circumstances, cause problems. We never know what people do with their stuff... Might wait for someone with another option, though.

Contributor

@AlpinDale The glob looks like it's deliberately trying to target the first part of the set with model-00001-of-*. If it was just *.safetensors then you could get model-99999-of-99999.safetensors which is probably not what you want to load.
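
To make the difference between the patterns concrete, here is a minimal sketch that builds a temporary directory with made-up shard names and compares what each pattern matches. The directory layout, the shard count, and the helper `match_names` are illustrative assumptions, not taken from the PR; only the three glob patterns come from the discussion.

```python
import tempfile
from pathlib import Path

# Hypothetical sharded checkpoint layout, created only for illustration.
SHARDS = [
    "model-00001-of-00003.safetensors",
    "model-00002-of-00003.safetensors",
    "model-00003-of-00003.safetensors",
]


def match_names(directory: Path, pattern: str) -> list[str]:
    """Return the sorted names of files in `directory` that match `pattern`."""
    return sorted(p.name for p in directory.glob(pattern))


with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp)
    for name in SHARDS:
        (model_dir / name).touch()

    # The broad pattern picks up every shard in the set.
    broad = match_names(model_dir, "*.safetensors")
    # The targeted pattern picks up only the first shard.
    first_shard = match_names(model_dir, "model-00001-of-*.safetensors")
    # The single-file pattern matches nothing in a sharded layout.
    single_file = match_names(model_dir, "model.safetensors")

print(broad)
print(first_shard)
print(single_file)
```

With a layout like this, the broad pattern returns the whole set, so any code that expects exactly one entry-point file would need extra filtering.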

Contributor

Python indexes the files alphabetically when using a glob, so that is a non-issue. I'm simply pointing out that this way of doing it is unconventional and I've not seen any other project do this.

Contributor

@KerfuffleV2 KerfuffleV2 Nov 13, 2023

> Python indexes the files alphabetically when using a glob

I'm pretty sure that's not the case. The documentation doesn't even mention order: https://docs.python.org/3/library/pathlib.html#pathlib.Path.glob

Note also that their examples are like sorted(Path('.').glob('*.py')) which would be redundant if it was guaranteed to be already sorted.

> I'm simply pointing out that this way of doing it is unconventional

That may be the case, but your proposed change would break it. There's actually a

    if len(files) > 1:
        raise # ...

a couple lines down. This is specifically supposed to pull in the first file of the set, not all of them.
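
The behaviour described in this comment can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in convert.py: the helper name `find_first_safetensors`, the exception type, and the error message are hypothetical, and `sorted()` is used only so the result does not depend on the order in which `Path.glob` happens to yield entries.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Patterns from the diff: the first shard of a set, or a single-file model.
GLOBS = ["model-00001-of-*.safetensors", "model.safetensors"]


def find_first_safetensors(model_dir: Path) -> Optional[Path]:
    """Hypothetical helper: return the one entry-point file, or None."""
    # Collect matches for every pattern; sort so the order is deterministic.
    files = sorted(f for pattern in GLOBS for f in model_dir.glob(pattern))
    if len(files) > 1:
        # Mirrors the guard quoted above; type and message are assumptions.
        raise ValueError(f"Found multiple candidate models: {files}")
    return files[0] if files else None


with tempfile.TemporaryDirectory() as tmp:
    # Case 1: a repo with only a single "model.safetensors" file.
    single = Path(tmp) / "single"
    single.mkdir()
    (single / "model.safetensors").touch()
    single_result = find_first_safetensors(single).name

    # Case 2: a sharded set; only the first shard should be selected.
    sharded = Path(tmp) / "sharded"
    sharded.mkdir()
    for i in (1, 2):
        (sharded / f"model-0000{i}-of-00002.safetensors").touch()
    sharded_result = find_first_safetensors(sharded).name

    # Case 3: no safetensors files at all.
    empty = Path(tmp) / "empty"
    empty.mkdir()
    empty_result = find_first_safetensors(empty)

print(single_result, sharded_result, empty_result)
```

Replacing the two patterns with `*.safetensors` in a sketch like this would make the sharded case hit the `len(files) > 1` guard, which is the breakage the comment points out.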

Contributor

@KerfuffleV2 KerfuffleV2 left a comment

This looks like a good improvement in its current state.

@afrideva Was there anything else you want to change before the pull gets merged?

@afrideva
Contributor Author

> This looks like a good improvement in its current state.
>
> @afrideva Was there anything else you want to change before the pull gets merged?

Looks good to me

@KerfuffleV2 KerfuffleV2 merged commit b46d12f into ggml-org:master Nov 14, 2023
KerfuffleV2 pushed a commit to KerfuffleV2/llama.cpp that referenced this pull request on Nov 17, 2023, and olexiyb pushed a commit to Sanctum-AI/llama.cpp that referenced it on Nov 23, 2023. Both commits carry the same message:
* add safetensors to convert.py help message

* Check for single-file safetensors model

* Update convert.py "model" option help message

* revert convert.py help message change