Prerequisites
Please answer the following questions for yourself before submitting an issue.
- I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
As stated in the docs, creating a `Llama` instance with `n_ctx=0` should default to the model's trained context length and work.
Current Behavior
Instead, llama_cpp crashes after loading the model.
Environment and Context
Linux 6.2 kernel, Python 3.11, latest llama_cpp installed with CUBLAS support.
Failure Information / Steps to Reproduce
```python
import llama_cpp

# n_ctx=0 is documented to fall back to the model's trained context length
model = llama_cpp.Llama(model_path="../models/llama-2-7b-chat.Q4_K_M.gguf", n_ctx=0)

# With the bug present, the crash happens before these lines are reached
print(model.n_ctx())
print(model("The quick brown fox jumps ", stop=["."])["choices"][0]["text"])
```
Extra Info
I think this is related to the following line inside `Llama.__init__()`, which effectively sets `self.n_batch = 0` because `n_ctx` is 0:

```python
self.n_batch = min(n_ctx, n_batch)  # ???
```
Replacing this line with `self.n_batch = n_batch` avoids the crash, and the script above then prints the model's trained `n_ctx`. However, inference later raises an exception and never completes:

```
ValueError: could not broadcast input array from shape (39,) into shape (0,)
```
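My guess (I haven't traced this fully) is that dropping the clamp is not enough because other context-sized buffers in `__init__` are still allocated from the raw `n_ctx=0` argument, which would explain the shape-`(0,)` broadcast failure. A proper fix would presumably create the llama.cpp context first, query the effective context length it resolved (the low-level API exposes `llama_n_ctx()`), and use that nonzero value both for the `n_batch` clamp and for buffer sizing.

In the meantime, this workaround sketch sidesteps the bug by passing the trained context length explicitly (4096 for the Llama-2 family) instead of 0:

```python
import llama_cpp

# Workaround: pass the trained context length explicitly instead of n_ctx=0,
# so min(n_ctx, n_batch) and all context-sized buffers see a nonzero value.
# 4096 is the trained context length of Llama-2 models.
model = llama_cpp.Llama(
    model_path="../models/llama-2-7b-chat.Q4_K_M.gguf",
    n_ctx=4096,
)

print(model.n_ctx())  # -> 4096
print(model("The quick brown fox jumps ", stop=["."])["choices"][0]["text"])
```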