
Commit 97aa3a1

docs: Add information re: auto chat formats. Closes abetlen#1236
1 parent f062a7f commit 97aa3a1

File tree: 2 files changed (+13 −2 lines)

- README.md (+10 −1)
- llama_cpp/llama.py (+3 −1)

README.md (+10 −1)

@@ -286,7 +286,16 @@ By default [`from_pretrained`](https://llama-cpp-python.readthedocs.io/en/latest
 
 The high-level API also provides a simple interface for chat completion.
 
-Note that `chat_format` option must be set for the particular model you are using.
+Chat completion requires that the model know how to format the messages into a single prompt.
+The `Llama` class does this using pre-registered chat formats (e.g. `chatml`, `llama-2`, `gemma`) or by providing a custom chat handler object.
+
+The model will format the messages into a single prompt using the following order of precedence:
+- Use the `chat_handler` if provided
+- Use the `chat_format` if provided
+- Use the `tokenizer.chat_template` from the `gguf` model's metadata (should work for most new models; older models may not have this)
+- else, fall back to the `llama-2` chat format
+
+Set `verbose=True` to see the selected chat format.
 
 ```python
 >>> from llama_cpp import Llama
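The precedence rules described in the new README text are easy to exercise from the high-level API. The following is a minimal sketch; the model path and prompt are placeholders for illustration, not from the commit:

```python
from llama_cpp import Llama

# chat_format takes precedence over the gguf metadata template,
# and verbose=True makes the class report which format it selected.
llm = Llama(
    model_path="./models/model.Q4_K_M.gguf",  # placeholder path
    chat_format="chatml",  # pre-registered format; omit to trigger auto-detection
    verbose=True,
)

# Messages follow the OpenAI-style schema used by create_chat_completion.
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Hello!"},
    ]
)
print(response["choices"][0]["message"]["content"])
```

Omitting both `chat_format` and `chat_handler` exercises the rest of the precedence chain: the gguf `tokenizer.chat_template` is used if present, otherwise the `llama-2` fallback.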

llama_cpp/llama.py (+3 −1)

@@ -410,7 +410,7 @@ def __init__(
             bos_token = self._model.token_get_text(bos_token_id)
 
             if self.verbose:
-                print(f"Using chat template: {template}", file=sys.stderr)
+                print(f"Using gguf chat template: {template}", file=sys.stderr)
                 print(f"Using chat eos_token: {eos_token}", file=sys.stderr)
                 print(f"Using chat bos_token: {bos_token}", file=sys.stderr)
 
@@ -420,6 +420,8 @@ def __init__(
 
         if self.chat_format is None and self.chat_handler is None:
             self.chat_format = "llama-2"
+            if self.verbose:
+                print(f"Using fallback chat format: {self.chat_format}", file=sys.stderr)
 
     @property
     def ctx(self) -> llama_cpp.llama_context_p:
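For readers skimming the diff, the selection logic it instruments reduces to a chain of checks. The sketch below is a standalone restatement of that precedence, not the library's actual code; `resolve_chat_format` and its arguments are hypothetical names:

```python
import sys
from typing import Any, Optional

def resolve_chat_format(
    chat_handler: Optional[Any],
    chat_format: Optional[str],
    gguf_chat_template: Optional[str],
    verbose: bool = False,
) -> str:
    """Hypothetical restatement of the precedence from the README:
    chat_handler > chat_format > gguf tokenizer.chat_template > llama-2."""
    if chat_handler is not None:
        return "custom chat handler"  # the handler formats the messages itself
    if chat_format is not None:
        return chat_format            # explicitly chosen pre-registered format
    if gguf_chat_template is not None:
        if verbose:
            print(f"Using gguf chat template: {gguf_chat_template}", file=sys.stderr)
        return "gguf chat template"   # Jinja template from model metadata
    if verbose:
        # mirrors the message added by this commit
        print("Using fallback chat format: llama-2", file=sys.stderr)
    return "llama-2"                  # final fallback
```

Calling `resolve_chat_format(None, None, None, verbose=True)` prints the fallback message and returns `"llama-2"`, matching the behavior the commit now logs.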
