How to use speculative decoding? #1164
Comments
Hey @irthomasthomas, that should work. Are you sure you have the latest version installed?
If you run that, the version should be at least
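One way to check which version is actually installed (a minimal sketch; it assumes only that the package, if present, was installed under the name `llama-cpp-python` — `importlib.metadata` is standard library):

```python
# Print the installed llama-cpp-python version, or a notice if it is missing.
from importlib.metadata import PackageNotFoundError, version

try:
    print(version("llama-cpp-python"))
except PackageNotFoundError:
    print("llama-cpp-python is not installed")
```

Running this inside each Python environment is a quick way to catch the multiple-environments mix-up described below.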
Thanks @abetlen, I forgot I was running multiple Python environments. Anyway, after updating, I'm now getting a new problem. Is that related to this? ggml-org/llama.cpp#5240 Here is the command I ran:
I opened a new issue, as it's unrelated to this one: #1166
Hello,
I've read the docs and tried a few different ways to start speculative decoding, but they all fail.
E.g.
error: unrecognized arguments: --draft_model=prompt-lookup-decoding --draft_model_num_pred_tokens=2
or
Extra inputs are not permitted [type=extra_forbidden, input_value='prompt-lookup-decoding', input_type=str]
So what is the correct way to start an OpenAI-compatible server with speculative decoding?
Cheers.
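Based on the flag names quoted in the errors above (and the maintainer's confirmation that they should work on a recent version), a launch command would look roughly like this. This is a sketch, not a verified invocation: `./models/model.gguf` is a placeholder path, and the flags assume an up-to-date llama-cpp-python.

```shell
# Hypothetical launch of the llama-cpp-python OpenAI-compatible server with
# prompt-lookup speculative decoding; the model path is a placeholder.
python3 -m llama_cpp.server \
  --model ./models/model.gguf \
  --draft_model prompt-lookup-decoding \
  --draft_model_num_pred_tokens 2
```

If the server rejects these flags with "unrecognized arguments" or "Extra inputs are not permitted", that is consistent with an older installed version, as discussed above.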