Misc. bug: Requests stuck forever with concurrent requests #15008

@davidef

Name and Version

version: 6056 (baad948)

Operating systems

Mac

Which llama.cpp modules do you know to be affected?

llama-server

Command line

llama-server -c 4096 --embedding --model mxbai-embed-large-v1/gguf/mxbai-embed-large-v1-f16.gguf

Problem description & steps to reproduce

With 3 or more concurrent clients sending back-to-back requests to the embedding endpoint (I think this applies to other endpoints too), one request (or more, if you add more concurrent clients) gets stuck in the deferred queue until you stop sending new requests (see logs below).

It seems that when the previous task completes and pop_deferred_task is called, there is already a newly received task in queue_tasks. The task popped from queue_tasks_deferred therefore ends up second in queue_tasks, so it is later deferred again and placed last in queue_tasks_deferred.

The following patch seems to fix the issue, but since I haven't worked with C++ for a while, I don't know whether this change has other implications.

diff --git a/tools/server/server.cpp b/tools/server/server.cpp
index 35d66104..b2dd4c7d 100644
--- a/tools/server/server.cpp
+++ b/tools/server/server.cpp
@@ -1703,7 +1703,7 @@ struct server_queue {
     void pop_deferred_task() {
         std::unique_lock<std::mutex> lock(mutex_tasks);
         if (!queue_tasks_deferred.empty()) {
-            queue_tasks.emplace_back(std::move(queue_tasks_deferred.front()));
+            queue_tasks.emplace_front(std::move(queue_tasks_deferred.front()));
             queue_tasks_deferred.pop_front();
         }
         condition_tasks.notify_one();
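
To make the reordering easier to see, here is a minimal sketch of the two re-queue policies in a toy model. It is not the real server_queue: it assumes a single slot and exactly one new request arriving per round just before the deferred task is moved back, and it ignores the other tasks visible in the real log. The names `requeue`, `sim_result` and `simulate` are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <deque>

// Which end of queue_tasks a deferred task is moved to (hypothetical name).
enum class requeue { back, front };

struct sim_result {
    int served_round;   // round in which the watched task got the slot, -1 if never
    int times_deferred; // how many times the watched task was deferred
};

// Toy model with ONE slot. Assumption: in every round after the first, a new
// request is already in queue_tasks when the deferred task is moved back.
sim_result simulate(requeue policy, int rounds, int watched_id) {
    assert(rounds >= 0);
    std::deque<int> queue_tasks = {1, 2, 0}; // arrival order taken from the log
    std::deque<int> queue_tasks_deferred;
    sim_result res = {-1, 0};
    int next_id = 3;

    for (int round = 0; round <= rounds; ++round) {
        if (round > 0) {
            // a client whose request just finished immediately sends another one
            queue_tasks.push_back(next_id++);
            // the slot is released: move one deferred task back to queue_tasks
            if (!queue_tasks_deferred.empty()) {
                int id = queue_tasks_deferred.front();
                queue_tasks_deferred.pop_front();
                if (policy == requeue::back) {
                    queue_tasks.push_back(id);  // behaviour before the patch
                } else {
                    queue_tasks.push_front(id); // behaviour with the patch
                }
            }
        }
        // process the queue: the first task takes the only slot,
        // every other task is deferred again
        bool slot_free = true;
        while (!queue_tasks.empty()) {
            int id = queue_tasks.front();
            queue_tasks.pop_front();
            if (slot_free) {
                slot_free = false;
                if (id == watched_id && res.served_round < 0) {
                    res.served_round = round;
                }
            } else {
                queue_tasks_deferred.push_back(id);
                if (id == watched_id) {
                    res.times_deferred++;
                }
            }
        }
    }
    return res;
}
```

In this toy model, `simulate(requeue::back, 8, 0)` never gives task 0 the slot while new requests keep arriving, whereas `simulate(requeue::front, 8, 0)` serves it in round 2. Moving the task to the front also places it ahead of anything else already queued, which may or may not be acceptable for other task types.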

First Bad Commit

No response

Relevant log output

Only the queue log is shown. In this run (with 3 concurrent requests), task 0 is always deferred again; it is only processed once we stop sending new requests.

que          post: new task, id = 1/1, front = 0
que          post: new task, id = 2/1, front = 0
que          post: new task, id = 0/1, front = 0
que    start_loop: processing new tasks
que    start_loop: processing task, id = 1
que    start_loop: processing task, id = 2
que         defer: defer task, id = 2
que    start_loop: processing task, id = 0
que         defer: defer task, id = 0
que    start_loop: update slots
que          post: new task, id = 3, front = 0
que    start_loop: waiting for new tasks
que    start_loop: processing new tasks
que    start_loop: processing task, id = 3
que    start_loop: processing task, id = 2
que    start_loop: update slots
que          post: new task, id = 4, front = 0
que          post: new task, id = 5/1, front = 0
que    start_loop: waiting for new tasks
que    start_loop: processing new tasks
que    start_loop: processing task, id = 4
que    start_loop: processing task, id = 5
que    start_loop: processing task, id = 0
que         defer: defer task, id = 0
que    start_loop: update slots
que          post: new task, id = 6, front = 0
que          post: new task, id = 7/1, front = 0
que    start_loop: waiting for new tasks
que    start_loop: processing new tasks
que    start_loop: processing task, id = 6
que    start_loop: processing task, id = 7
que    start_loop: processing task, id = 0
que         defer: defer task, id = 0
que    start_loop: update slots
que          post: new task, id = 8, front = 0
que          post: new task, id = 9/1, front = 0
que    start_loop: waiting for new tasks
que    start_loop: processing new tasks
que    start_loop: processing task, id = 8
que    start_loop: processing task, id = 9
que    start_loop: processing task, id = 0
que         defer: defer task, id = 0
que    start_loop: update slots
que          post: new task, id = 10, front = 0
que          post: new task, id = 11/1, front = 0
que    start_loop: waiting for new tasks
que    start_loop: processing new tasks
que    start_loop: processing task, id = 10
que    start_loop: processing task, id = 11
que    start_loop: processing task, id = 0
que         defer: defer task, id = 0
que    start_loop: update slots
que          post: new task, id = 12, front = 0
que          post: new task, id = 13/1, front = 0
que    start_loop: waiting for new tasks
que    start_loop: processing new tasks
que    start_loop: processing task, id = 12
que    start_loop: processing task, id = 13
que    start_loop: processing task, id = 0
que         defer: defer task, id = 0
que    start_loop: update slots
que          post: new task, id = 14, front = 0
que          post: new task, id = 15/1, front = 0
que    start_loop: waiting for new tasks
que    start_loop: processing new tasks
que    start_loop: processing task, id = 14
que    start_loop: processing task, id = 15
que    start_loop: processing task, id = 0
que         defer: defer task, id = 0
que    start_loop: update slots
que          post: new task, id = 16, front = 0
que          post: new task, id = 17/1, front = 0
que    start_loop: waiting for new tasks
que    start_loop: processing new tasks
que    start_loop: processing task, id = 16
que    start_loop: processing task, id = 17
que    start_loop: processing task, id = 0
que         defer: defer task, id = 0
que    start_loop: update slots
que          post: new task, id = 18, front = 0
que    start_loop: waiting for new tasks
que    start_loop: processing new tasks
que    start_loop: processing task, id = 18
que    start_loop: processing task, id = 0
que    start_loop: update slots
que          post: new task, id = 19, front = 0
que    start_loop: waiting for new tasks
que    start_loop: processing new tasks
que    start_loop: processing task, id = 19
que    start_loop: update slots
que    start_loop: waiting for new tasks


Labels: bug (Something isn't working)
