Description
Name and Version
version: 6056 (baad948)
Operating systems
Mac
Which llama.cpp modules do you know to be affected?
llama-server
Command line
llama-server -c 4096 --embedding --model mxbai-embed-large-v1/gguf/mxbai-embed-large-v1-f16.gguf
Problem description & steps to reproduce
With 3 or more concurrent clients sending back-to-back requests to the embedding endpoint (I think this applies to other endpoints too), one request (or more, if you add more concurrent clients) gets stuck in the deferred queue almost until you stop sending new requests (see logs below).
It seems that when the previous task completes and pop_deferred_task is called, there is already a newly received task in queue_tasks. The task popped from queue_tasks_deferred therefore lands second in queue_tasks, so it is later deferred again and goes to the back of queue_tasks_deferred.
The following patch seems to fix the issue, but as I haven't worked with C++ for a while, I don't know whether the change has other implications.
diff --git a/tools/server/server.cpp b/tools/server/server.cpp
index 35d66104..b2dd4c7d 100644
--- a/tools/server/server.cpp
+++ b/tools/server/server.cpp
@@ -1703,7 +1703,7 @@ struct server_queue {
     void pop_deferred_task() {
         std::unique_lock<std::mutex> lock(mutex_tasks);
         if (!queue_tasks_deferred.empty()) {
-            queue_tasks.emplace_back(std::move(queue_tasks_deferred.front()));
+            queue_tasks.emplace_front(std::move(queue_tasks_deferred.front()));
             queue_tasks_deferred.pop_front();
         }
         condition_tasks.notify_one();
First Bad Commit
No response
Relevant log output
Only the queue log is reported. In this run (with 3 concurrent requests), task 0 is always deferred again (it is eventually processed when we stop sending new requests).
que post: new task, id = 1/1, front = 0
que post: new task, id = 2/1, front = 0
que post: new task, id = 0/1, front = 0
que start_loop: processing new tasks
que start_loop: processing task, id = 1
que start_loop: processing task, id = 2
que defer: defer task, id = 2
que start_loop: processing task, id = 0
que defer: defer task, id = 0
que start_loop: update slots
que post: new task, id = 3, front = 0
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 3
que start_loop: processing task, id = 2
que start_loop: update slots
que post: new task, id = 4, front = 0
que post: new task, id = 5/1, front = 0
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 4
que start_loop: processing task, id = 5
que start_loop: processing task, id = 0
que defer: defer task, id = 0
que start_loop: update slots
que post: new task, id = 6, front = 0
que post: new task, id = 7/1, front = 0
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 6
que start_loop: processing task, id = 7
que start_loop: processing task, id = 0
que defer: defer task, id = 0
que start_loop: update slots
que post: new task, id = 8, front = 0
que post: new task, id = 9/1, front = 0
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 8
que start_loop: processing task, id = 9
que start_loop: processing task, id = 0
que defer: defer task, id = 0
que start_loop: update slots
que post: new task, id = 10, front = 0
que post: new task, id = 11/1, front = 0
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 10
que start_loop: processing task, id = 11
que start_loop: processing task, id = 0
que defer: defer task, id = 0
que start_loop: update slots
que post: new task, id = 12, front = 0
que post: new task, id = 13/1, front = 0
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 12
que start_loop: processing task, id = 13
que start_loop: processing task, id = 0
que defer: defer task, id = 0
que start_loop: update slots
que post: new task, id = 14, front = 0
que post: new task, id = 15/1, front = 0
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 14
que start_loop: processing task, id = 15
que start_loop: processing task, id = 0
que defer: defer task, id = 0
que start_loop: update slots
que post: new task, id = 16, front = 0
que post: new task, id = 17/1, front = 0
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 16
que start_loop: processing task, id = 17
que start_loop: processing task, id = 0
que defer: defer task, id = 0
que start_loop: update slots
que post: new task, id = 18, front = 0
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 18
que start_loop: processing task, id = 0
que start_loop: update slots
que post: new task, id = 19, front = 0
que start_loop: waiting for new tasks
que start_loop: processing new tasks
que start_loop: processing task, id = 19
que start_loop: update slots
que start_loop: waiting for new tasks