
I managed to find the answer myself. For some reason the `logits_all` parameter defaults to true, and that tanks performance. Setting it to false brings performance back on par with pure llama-cpp. I'm not sure that's a sensible default, but at least the problem is solved. GPU load is also back to 100%.
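For anyone hitting the same thing, here is a minimal sketch of passing the flag explicitly. The post doesn't say which wrapper was in use, so this assumes the llama-cpp-python bindings and their `Llama` class. The helper names and the `model.gguf` path are hypothetical, and `n_gpu_layers=-1` (offload all layers) is my addition, not something from the post.

```python
from typing import Any, Dict


def build_llama_kwargs(model_path: str, n_gpu_layers: int = -1) -> Dict[str, Any]:
    """Collect constructor arguments with logits_all explicitly disabled.

    Assumption: these keyword names match llama-cpp-python's Llama class.
    """
    return {
        "model_path": model_path,
        # -1 asks llama-cpp-python to offload every layer to the GPU.
        "n_gpu_layers": n_gpu_layers,
        # Set explicitly rather than relying on whatever the default is.
        "logits_all": False,
    }


def load_model(model_path: str):
    """Load the model; the import is lazy so the helper above works without the package."""
    from llama_cpp import Llama  # assumes llama-cpp-python is installed

    return Llama(**build_llama_kwargs(model_path))


if __name__ == "__main__":
    # Hypothetical path; replace with a real GGUF file before calling load_model().
    print(build_llama_kwargs("model.gguf"))
```

Setting the flag explicitly means the behaviour doesn't depend on the default of whichever entry point loads the model.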

Answer selected by Mushoz