ljluestc · 2026-03-01T20:50:15Z

feat: Add DeepSeek R1 and distilled model support

Closes #1952

Summary

Adds full chat format support for DeepSeek R1, DeepSeek R1 Distill (Qwen), and DeepSeek R1 Distill (Llama) models. Updates the llama.cpp submodule to b8184, which includes native architecture support for DeepSeek R1/V2/V3.

Problem

DeepSeek R1 and its distilled variants are among the most popular open-weight reasoning models, but llama-cpp-python currently lacks both the inference backend support and the chat format handling required to run them correctly. Users attempting to load DeepSeek R1 GGUFs get incorrect prompt formatting, double BOS tokens, and missing architecture support at the C++ layer.

Changes

`llama_cpp/llama_chat_format.py`

- Added DEEPSEEK_R1_CHAT_TEMPLATE constant sourced from the official HuggingFace tokenizer config
- Added DEEPSEEK_R1_BOS_TOKEN and DEEPSEEK_R1_EOS_TOKEN constants using the non-ASCII Unicode characters in DeepSeek's special tokens (\uff5c fullwidth vertical line, \u2581 lower one eighth block)
- Registered three new chat formats:
  - deepseek-r1: primary format with correct special token handling (<｜User｜>, <｜Assistant｜>, <｜begin▁of▁sentence｜>, <｜end▁of▁sentence｜>)
  - deepseek-r1-distill-qwen: alias for Qwen-based distilled models
  - deepseek-r1-distill-llama: alias for Llama-based distilled models
- Updated guess_chat_format_from_gguf_metadata() to auto-detect DeepSeek R1 models via:
  - Exact template match against DEEPSEEK_R1_CHAT_TEMPLATE
  - Heuristic fallback checking for the characteristic <｜User｜> / <｜Assistant｜> tokens in the chat template
- Strips </think>-delimited reasoning content in multi-turn conversations: prior assistant turns have their chain-of-thought reasoning removed to keep the context clean
- Sets added_special=True in the formatter response to prevent double BOS token injection during tokenization
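
For reviewers, the prompt layout and reasoning stripping described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `format_prompt` and `strip_reasoning` are hypothetical names, and the exact layout (system text directly after BOS, EOS after each prior assistant turn, a trailing `<｜Assistant｜>` generation prompt) is an assumption based on the special tokens listed here.

```python
from typing import Dict, List

# Special tokens as listed in this PR: U+FF5C (fullwidth vertical line)
# and U+2581 (lower one eighth block) replace "|" and "_".
BOS = "<\uff5cbegin\u2581of\u2581sentence\uff5c>"
EOS = "<\uff5cend\u2581of\u2581sentence\uff5c>"
USER = "<\uff5cUser\uff5c>"
ASSISTANT = "<\uff5cAssistant\uff5c>"


def strip_reasoning(content: str) -> str:
    """Keep only the text after the last </think> tag, if one is present."""
    return content.split("</think>")[-1]


def format_prompt(messages: List[Dict[str, str]]) -> str:
    """Hypothetical stand-in for the formatter; layout details are assumed."""
    system = "".join(m["content"] for m in messages if m["role"] == "system")
    parts = [BOS, system]
    for m in messages:
        if m["role"] == "user":
            parts.append(USER + m["content"])
        elif m["role"] == "assistant":
            # Prior assistant turns lose their chain-of-thought reasoning.
            parts.append(ASSISTANT + strip_reasoning(m["content"]) + EOS)
    parts.append(ASSISTANT)  # generation prompt for the next reply
    return "".join(parts)
```

Because the prompt built this way already starts with the BOS token, the tokenizer must not add a second one, which is what the added_special=True item addresses.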

`llama_cpp/__init__.py`

Version bump from 0.3.16 → 0.3.17

`vendor/llama.cpp`

Updated submodule to b8184 (3191462), which adds native DeepSeek R1/V2/V3 architecture support in the inference backend

Testing

All 11 tests pass (2 existing + 9 new).

- Update llama.cpp submodule to latest (b8184) for full DeepSeek R1/V2/V3 architecture support
- Add 'deepseek-r1' chat format with correct special tokens (<｜User｜>, <｜Assistant｜>, <｜begin▁of▁sentence｜>, <｜end▁of▁sentence｜>)
- Add 'deepseek-r1-distill-qwen' and 'deepseek-r1-distill-llama' chat format aliases for distilled model variants
- Add DEEPSEEK_R1_CHAT_TEMPLATE constant from official HuggingFace tokenizer config
- Update guess_chat_format_from_gguf_metadata() to auto-detect DeepSeek R1 models via template matching and heuristic token detection
- Handle </think> reasoning content stripping for multi-turn conversations
- Bump version to 0.3.17

Closes abetlen#1952
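
The two-step auto-detection mentioned in this commit (exact template match, then a token heuristic) might look roughly like the sketch below. It is not the PR's implementation: `guess_deepseek_r1` is a hypothetical helper, the template constant is a placeholder rather than the real template text, the `tokenizer.chat_template` metadata key is assumed, and requiring both role tokens (rather than either) is also an assumption.

```python
from typing import Dict, Optional

# Placeholder only: the real constant holds the full official template text.
DEEPSEEK_R1_CHAT_TEMPLATE = "<placeholder for the official template>"

USER_TOKEN = "<\uff5cUser\uff5c>"
ASSISTANT_TOKEN = "<\uff5cAssistant\uff5c>"


def guess_deepseek_r1(metadata: Dict[str, str]) -> Optional[str]:
    """Return "deepseek-r1" if the GGUF metadata looks like a DeepSeek R1 model."""
    # Assumed metadata key for the embedded chat template.
    template = metadata.get("tokenizer.chat_template")
    if template is None:
        return None
    # Step 1: exact match against the known template.
    if template == DEEPSEEK_R1_CHAT_TEMPLATE:
        return "deepseek-r1"
    # Step 2: heuristic fallback on the characteristic role tokens.
    if USER_TOKEN in template and ASSISTANT_TOKEN in template:
        return "deepseek-r1"
    return None
```

The heuristic step lets re-exported or lightly edited templates still resolve to the same format, at the cost of possibly matching other templates that reuse these tokens.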

The format_deepseek_r1 function already includes the BOS token (<｜begin▁of▁sentence｜>) in the formatted prompt, but was not setting added_special=True in the ChatFormatterResponse. This caused chat_formatter_to_chat_completion_handler to pass add_bos=True to the tokenizer, resulting in a duplicate BOS token.

Also adds comprehensive tests for:

- Single-turn and multi-turn conversations
- System message handling
- </think> reasoning content stripping
- Distilled model aliases (qwen/llama)
- Auto-detection via exact match and heuristic
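
To make the duplicate-BOS issue from this commit concrete, here is a toy reproduction. Everything in it is illustrative: `toy_tokenize`, `tokenize_prompt`, and `BOS_ID` are hypothetical stand-ins, not llama.cpp or llama-cpp-python APIs, and the `add_bos = not added_special` relationship is inferred from the description above.

```python
from typing import List

BOS_TEXT = "<\uff5cbegin\u2581of\u2581sentence\uff5c>"
BOS_ID = 0  # hypothetical token id for BOS


def toy_tokenize(text: str, add_bos: bool) -> List[int]:
    """Toy tokenizer: literal BOS text maps to BOS_ID, other chars to ord + 1."""
    tokens = [BOS_ID] if add_bos else []
    i = 0
    while i < len(text):
        if text.startswith(BOS_TEXT, i):
            tokens.append(BOS_ID)
            i += len(BOS_TEXT)
        else:
            tokens.append(ord(text[i]) + 1)  # + 1 so no character collides with BOS_ID
            i += 1
    return tokens


def tokenize_prompt(prompt: str, added_special: bool) -> List[int]:
    # Assumed handler behaviour: only add BOS when the formatter has not
    # already placed special tokens in the prompt.
    return toy_tokenize(prompt, add_bos=not added_special)


prompt = BOS_TEXT + "Hi"
before_fix = tokenize_prompt(prompt, added_special=False)  # BOS appears twice
after_fix = tokenize_prompt(prompt, added_special=True)    # BOS appears once
```

In this toy setup, `before_fix` starts with two `BOS_ID` tokens while `after_fix` starts with one, which mirrors the behaviour the fix is meant to restore.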

ljluestc added 2 commits March 1, 2026 12:30

feat: Add DeepSeek R1 and distilled model support #2131

ljluestc wants to merge 2 commits into abetlen:main from ljluestc:feat/deepseek-r1-support
