
41 feat batch inference for nitro #101

Merged

tikikun merged 6 commits into main from 41-feat-batch-inference-for-nitro on Nov 6, 2023
Conversation

@tikikun (Contributor) commented on Nov 3, 2023

**IMPORTANT**

To enable continuous batching (multithreading and multiple CCU, i.e. concurrent users), set `cont_batching` to `true` in the `loadModel` request.

Example:

```sh
curl -X POST 'http://localhost:3928/inferences/llamacpp/loadModel' \
     -H 'Content-Type: application/json' \
     -d '{
          "llama_model_path": "/Users/alandao/Documents/codes/nitro.cpp_temp/models/llama2_7b_chat_uncensored.Q4_0.gguf",
          "ctx_len": 2048,
          "ngl": 100,
          "cont_batching": true
     }'
```
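
With `cont_batching` enabled, the server can serve several in-flight requests in one decoding batch instead of strictly one at a time. As a minimal sketch of exercising this, assuming Nitro's `chat_completion` endpoint at the path below and a simple default payload (both are assumptions here; adjust to your build), two requests fired in parallel:

```sh
# Illustrative concurrency check: send two requests at once.
# With "cont_batching": true they should be processed together
# rather than queued strictly one after the other.
for i in 1 2; do
  curl -s -X POST 'http://localhost:3928/inferences/llamacpp/chat_completion' \
       -H 'Content-Type: application/json' \
       -d '{
            "messages": [{"role": "user", "content": "Say hello."}]
       }' &
done
wait
```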

@tikikun self-assigned this on Nov 3, 2023
@tikikun linked an issue on Nov 3, 2023 that may be closed by this pull request: feat: batch inference for nitro
@tikikun merged commit d358274 into main on Nov 6, 2023
@hiro-v deleted the 41-feat-batch-inference-for-nitro branch on Nov 20, 2023