How to use speculative decoding? #1164

Closed
irthomasthomas opened this issue Feb 6, 2024 · 3 comments

@irthomasthomas

Hello,

I've read the docs and tried a few different ways to start speculative decoding, but they all fail.
E.g.
error: unrecognized arguments: --draft_model=prompt-lookup-decoding --draft_model_num_pred_tokens=2

or
Extra inputs are not permitted [type=extra_forbidden, input_value='prompt-lookup-decoding', input_type=str]

So what is the correct way to start an openai server with speculative decoding?

Cheers.

@abetlen
Owner

abetlen commented Feb 6, 2024

Hey @irthomasthomas, that should work. Are you sure you have the latest version installed?

python3 -c "import llama_cpp; print(llama_cpp.__version__)"

When you run that, the printed version should be at least 0.2.38 for speculative decoding support.
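
For anyone juggling several Python environments, here is a slightly longer sketch of the same check. It prints which interpreter is in use, compares the installed version against the 0.2.38 minimum mentioned above, and assembles the server command with the flags from the original question. The helper names (`parse_version`, `supports_speculative_decoding`, `build_server_command`) and the model path are made up for illustration; only `llama_cpp.__version__`, the `llama_cpp.server` module, and the two `--draft_model...` flags come from this thread.

```python
import re
import sys

# Minimum version stated in this thread for speculative decoding support.
MIN_VERSION = (0, 2, 38)


def parse_version(version: str) -> tuple:
    """Turn '0.2.38' into (0, 2, 38); stops at the first non-numeric piece."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def supports_speculative_decoding(version: str) -> bool:
    """True if the given version is at least MIN_VERSION."""
    return parse_version(version) >= MIN_VERSION


def build_server_command(model_path: str, num_pred_tokens: int = 2) -> list:
    """Hypothetical helper: argv for the server, using the flags from the question."""
    return [
        sys.executable, "-m", "llama_cpp.server",
        "--model", model_path,
        "--draft_model=prompt-lookup-decoding",
        f"--draft_model_num_pred_tokens={num_pred_tokens}",
    ]


if __name__ == "__main__":
    # Shows which environment is actually being checked.
    print("interpreter:", sys.executable)
    try:
        import llama_cpp
    except ImportError:
        print("llama_cpp is not installed in this environment")
    else:
        version = llama_cpp.__version__
        if supports_speculative_decoding(version):
            # "path/to/model.gguf" is a placeholder, not a real path.
            print(" ".join(build_server_command("path/to/model.gguf")))
        else:
            print(f"llama_cpp {version} is older than 0.2.38; upgrade first")
```

Running the script with the same `python3` that launches the server makes it less likely that the check passes in one environment while the server starts from another.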

@irthomasthomas
Author

irthomasthomas commented Feb 7, 2024

Thanks @abetlen, I forgot I was running multiple python environments... Anyway, after updating, I'm now getting a new problem: ValueError: Attempt to split tensors that exceed maximum supported devices. Current LLAMA_MAX_DEVICES=1

Is that related to this? ggml-org/llama.cpp#5240

Here is the command I ran:
python3 -m llama_cpp.server --model /run/media/thomas/828ca7e5-381d-4149-8fc5-6d6aa26b90f2/Models/deepseek/deepseek-coder-33B-instruct-GGUF/deepseek-coder-33b-instruct.Q4_K_M.gguf --n_gpu_layers 56 --tensor_split 64 36 --offload_kqv false --n_ctx 8000 --n_batch 56 --chat_format chatml

@irthomasthomas
Author

I opened a new issue, #1166, as it's unrelated to this one.
