Prerequisites
Please answer the following questions for yourself before submitting an issue.
- I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
As stated in the docs, creating a `Llama` instance with `n_ctx=0` should default to the model's trained context length and work.
Current Behavior
Instead, llama_cpp crashes after loading the model.
Environment and Context
Linux 6.2 kernel, Python 3.11, latest llama_cpp installed with CUBLAS support.
Failure Information / Steps to Reproduce
```python
import llama_cpp

# n_ctx=0 is documented to fall back to the model's trained context length
model = llama_cpp.Llama(model_path="../models/llama-2-7b-chat.Q4_K_M.gguf", n_ctx=0)

# With the bug present, the crash happens before these lines are reached
print(model.n_ctx())
print(model("The quick brown fox jumps ", stop=["."])["choices"][0]["text"])
```
Extra Info
I think this is related to the following line inside `Llama.__init__()`, which effectively sets `self.n_batch = 0` because `n_ctx` is 0:

```python
self.n_batch = min(n_ctx, n_batch)  # ???
```
Replacing this line with `self.n_batch = n_batch` avoids the crash, and the script above then prints the model's trained `n_ctx`. However, inference later raises an exception and never completes:

```
ValueError: could not broadcast input array from shape (39,) into shape (0,)
```
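My guess (I haven't traced this fully) is that dropping the clamp is not enough because other context-sized buffers in `__init__` are still allocated from the raw `n_ctx=0` argument, which would explain the shape-`(0,)` broadcast failure. A proper fix would presumably create the llama.cpp context first, query the effective context length it resolved (the low-level API exposes `llama_n_ctx()`), and use that nonzero value both for the `n_batch` clamp and for buffer sizing.

In the meantime, this workaround sketch sidesteps the bug by passing the trained context length explicitly (4096 for the Llama-2 family) instead of 0:

```python
import llama_cpp

# Workaround: pass the trained context length explicitly instead of n_ctx=0,
# so min(n_ctx, n_batch) and all context-sized buffers see a nonzero value.
# 4096 is the trained context length of Llama-2 models.
model = llama_cpp.Llama(
    model_path="../models/llama-2-7b-chat.Q4_K_M.gguf",
    n_ctx=4096,
)

print(model.n_ctx())  # -> 4096
print(model("The quick brown fox jumps ", stop=["."])["choices"][0]["text"])
```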