
server : add more clean up when cancel_tasks is called #11340


Merged 4 commits into ggml-org:master on Jan 23, 2025

Conversation

@ngxson ngxson (Collaborator) commented Jan 21, 2025

This PR fixes an edge case where multiple requests are sent to the server at once and get queued, but cancel_tasks fails to clean them up before they are processed.

Also fixes a small issue with condition_results.wait_for: the predicate overload returns the value of pred(), not the timeout state (std::cv_status::timeout). See the docs for std::condition_variable::wait_for.

ggerganov
ggerganov previously approved these changes Jan 22, 2025
Comment on lines 1603 to 1608
for (size_t i = 0; i < queue_results.size(); i++) {
if (queue_results[i]->id == id_task) {
queue_results.erase(queue_results.begin() + i);
i--;
}
}
@ggerganov ggerganov (Member) Jan 22, 2025

Either use std::remove_if or avoid the decrement of an unsigned integer like this:

Suggested change
for (size_t i = 0; i < queue_results.size(); i++) {
if (queue_results[i]->id == id_task) {
queue_results.erase(queue_results.begin() + i);
i--;
}
}
for (size_t i = 0; i < queue_results.size(); ) {
if (queue_results[i]->id == id_task) {
queue_results.erase(queue_results.begin() + i);
} else {
i++;
}
}

@ggerganov (Member)

Hm, I just started testing this branch using llama.vim and it deadlocked after a short while. This was the last message in the log:

srv  remove_waiti: remove task 2009 from waiting list. current waiting = 1 (before remove)

I will now do some testing with master branch to see if this bug is something caused by recent changes.

@ggerganov ggerganov dismissed their stale review January 22, 2025 09:31

llama-server deadlocks in certain cases

@ngxson (Collaborator, PR author)

ngxson commented Jan 22, 2025

Yeah, I realize there's a problem with my logic in the recv_with_timeout loop; having a look at it.

@ggerganov (Member)

ggerganov commented Jan 22, 2025

So, I think it was a false alarm, though I'm not 100% sure. Let me test this branch some more and I'll report back if any issue occurs.

Before (predicate overload, cr_res is the bool returned by pred()):

    });
    if (!cr_res) {

After (plain overload, cr_res reports the timeout directly):

    std::cv_status cr_res = condition_results.wait_for(lock, std::chrono::seconds(timeout));
    if (cr_res == std::cv_status::timeout) {
@ngxson ngxson (Collaborator, PR author) Jan 22, 2025

The problem is that if the result arrives at the exact moment the wait times out, this loop will wait again for another result, skipping the result already sitting in queue_results. It then times out again, and again, without ever checking for a result.

The simple fix is to move this check to the bottom, so we read the result queue before calling wait_for:

  • If we get a result, return it without calling wait_for
  • If we don't get any result, call wait_for
    • If wait_for times out, return nullptr so the caller can check the HTTP connection state
    • If it does NOT time out, continue to the next iteration of while (true), which again checks for an incoming result

@ggerganov (Member)

Ok, I'm testing the new version now. The previous one was definitely locking.

@ngxson (Collaborator, PR author)

@ggerganov Were you able to test this? Feel free to let me know if you need help btw

@ggerganov (Member)

Yup, been running without any issues for a while. It's good to merge.

@ngxson ngxson requested a review from ggerganov January 22, 2025 12:06
@ngxson ngxson merged commit 5845661 into ggml-org:master Jan 23, 2025
45 checks passed
anagri pushed a commit to BodhiSearch/llama.cpp that referenced this pull request Jan 26, 2025
* server : add more clean up when cancel_tasks is called

* fix recv_with_timeout

* std::remove_if

* fix std::remove_if
tinglou pushed a commit to tinglou/llama.cpp that referenced this pull request Feb 13, 2025 (same four commits as above)
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Feb 26, 2025 (same four commits as above)
mglambda pushed a commit to mglambda/llama.cpp that referenced this pull request Mar 8, 2025 (same four commits as above)
2 participants