
41 feat batch inference for nitro #101

Merged

tikikun merged 6 commits into main from 41-feat-batch-inference-for-nitro on Nov 6, 2023
Conversation

@tikikun (Contributor) commented on Nov 3, 2023

**IMPORTANT**

To enable continuous batching (multithreading and multiple CCU, i.e. concurrent users), set `cont_batching` to `true` in the `loadModel` request.

Example:

```sh
curl -X POST 'http://localhost:3928/inferences/llamacpp/loadModel' \
     -H 'Content-Type: application/json' \
     -d '{
          "llama_model_path": "/Users/alandao/Documents/codes/nitro.cpp_temp/models/llama2_7b_chat_uncensored.Q4_0.gguf",
          "ctx_len": 2048,
          "ngl": 100,
          "cont_batching": true
     }'
```
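
With `cont_batching` enabled, the server can serve several in-flight requests in one decoding batch instead of strictly one at a time. As a minimal sketch of exercising this, assuming Nitro's `chat_completion` endpoint at the path below and a simple default payload (both are assumptions here; adjust to your build), two requests fired in parallel:

```sh
# Illustrative concurrency check: send two requests at once.
# With "cont_batching": true they should be processed together
# rather than queued strictly one after the other.
for i in 1 2; do
  curl -s -X POST 'http://localhost:3928/inferences/llamacpp/chat_completion' \
       -H 'Content-Type: application/json' \
       -d '{
            "messages": [{"role": "user", "content": "Say hello."}]
       }' &
done
wait
```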

@tikikun self-assigned this on Nov 3, 2023
@tikikun linked an issue on Nov 3, 2023 that may be closed by this pull request: feat: batch inference for nitro
@tikikun merged commit d358274 into main on Nov 6, 2023
@hiro-v deleted the 41-feat-batch-inference-for-nitro branch on Nov 20, 2023