server : add more clean up when cancel_tasks is called #11340
examples/server/server.cpp
for (size_t i = 0; i < queue_results.size(); i++) {
    if (queue_results[i]->id == id_task) {
        queue_results.erase(queue_results.begin() + i);
        i--;
    }
}
Either use std::remove_if, or avoid decrementing the unsigned integer, like this:
for (size_t i = 0; i < queue_results.size(); ) {
    if (queue_results[i]->id == id_task) {
        queue_results.erase(queue_results.begin() + i);
    } else {
        i++;
    }
}
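The std::remove_if alternative mentioned above could look roughly like the sketch below. The task_result struct, the result_ptr alias and the remove_results_for_task helper are hypothetical stand-ins introduced only for illustration; the real types in server.cpp may differ. The sketch uses the standard erase-remove idiom, which drops all matching entries in one linear pass and needs no index arithmetic at all.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-in for the server's result type; only the id matters here.
struct task_result {
    int id;
};

using result_ptr = std::unique_ptr<task_result>;

// Remove every queued result whose id equals id_task (erase-remove idiom).
// Returns the number of entries that were removed.
std::size_t remove_results_for_task(std::vector<result_ptr> & queue_results, int id_task) {
    const std::size_t before = queue_results.size();
    queue_results.erase(
        // remove_if moves the kept elements to the front and returns the new logical end
        std::remove_if(queue_results.begin(), queue_results.end(),
                       [id_task](const result_ptr & res) { return res->id == id_task; }),
        queue_results.end());
    assert(queue_results.size() <= before); // erasing can only shrink the vector
    return before - queue_results.size();
}
```

Compared with erasing inside an index loop, this version shifts each surviving element at most once, so the cost stays linear even when many results match.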
Hm, I just started testing this branch using
I will now do some testing with
llama-server
deadlocks in certain cases
Yeah I realize that there's a problem with my logic in the loop of
So, I think it was a false alarm - not 100% sure. Let me test this branch some more and I will report back if any issues occur.
examples/server/server.cpp
});
if (!cr_res) {
std::cv_status cr_res = condition_results.wait_for(lock, std::chrono::seconds(timeout));
if (cr_res == std::cv_status::timeout) {
The problem is that if a result arrives at the moment we time out, this loop will wait for another result again, skipping the one already in queue_results. It then times out again, and keeps timing out without ever checking for the result.
The simple fix is to move this check to the bottom, so that we read the result queue before calling wait_for:
- If we get a result, return without calling wait_for
- If we don't get any result, call wait_for
  - If wait_for times out, we return nullptr so that the caller can check the HTTP connection state
  - If it does NOT time out, we continue to the next iteration of while (true), which again checks for an incoming result
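The control flow described in the list above can be sketched as follows. This is a simplified, self-contained approximation, not the code from the PR: the result_queue struct, the send helper, the task_result type and the millisecond timeout parameter are assumptions made for illustration (the snippet under review uses std::chrono::seconds).

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical stand-in for the server's result type.
struct task_result {
    int id;
};

using result_ptr = std::unique_ptr<task_result>;

// Hypothetical, simplified result queue; member names mirror the review snippet.
struct result_queue {
    std::mutex mutex_results;
    std::condition_variable condition_results;
    std::vector<result_ptr> queue_results;

    // Producer side: enqueue a result and wake up any waiting receiver.
    void send(result_ptr res) {
        {
            std::lock_guard<std::mutex> lock(mutex_results);
            queue_results.push_back(std::move(res));
        }
        condition_results.notify_all();
    }

    // Returns a result for one of id_tasks, or nullptr if the wait timed out.
    result_ptr recv_with_timeout(const std::unordered_set<int> & id_tasks,
                                 std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_results);
        while (true) {
            // 1. Always scan the queue first, so a result that arrived while
            //    we were timing out is picked up on the next call.
            for (std::size_t i = 0; i < queue_results.size(); i++) {
                if (id_tasks.count(queue_results[i]->id) > 0) {
                    result_ptr res = std::move(queue_results[i]);
                    queue_results.erase(queue_results.begin() + i);
                    return res;
                }
            }
            // 2. Nothing matched: wait for a notification or for the timeout.
            const std::cv_status status = condition_results.wait_for(lock, timeout);
            assert(lock.owns_lock()); // wait_for re-acquires the mutex before returning
            if (status == std::cv_status::timeout) {
                return nullptr; // lets the caller check the HTTP connection state
            }
            // 3. Notified (or woken spuriously): loop and scan the queue again.
        }
    }
};
```

Because the scan happens before every wait, a caller that receives nullptr and calls recv_with_timeout again will find any result that was queued in the meantime without waiting for a new notification.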
Ok, I'm testing the new version now. The previous one was definitely locking.
@ggerganov Were you able to test this? Feel free to let me know if you need help btw
Yup, been running without any issues for a while. It's good to merge.
* server : add more clean up when cancel_tasks is called
* fix recv_with_timeout
* std::remove_if
* fix std::remove_if
This PR fixes an edge case where multiple requests are sent to the server at once and are queued but not yet processed, and cancel_tasks fails to clean them up.
It also fixes a small issue with condition_results.wait_for, where it returns the value of pred() instead of the timeout state (aka std::cv_status::timeout). See the docs here.
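To illustrate the difference between the two std::condition_variable::wait_for overloads, here is a minimal sketch. The helper names wait_with_pred and wait_plain are hypothetical; each one waits on a local condition variable that nobody notifies, so only the return-value semantics are on display.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Predicate overload: the return value is the final value of the predicate,
// so `false` means "timed out AND the predicate is still false".
bool wait_with_pred(bool ready, std::chrono::milliseconds timeout) {
    std::mutex m;
    std::condition_variable cv;
    std::unique_lock<std::mutex> lock(m);
    const bool res = cv.wait_for(lock, timeout, [&] { return ready; });
    assert(lock.owns_lock()); // the lock is held again once wait_for returns
    return res;
}

// Plain overload: the return value is the timeout state itself.
// Note that a spurious wakeup may yield std::cv_status::no_timeout.
std::cv_status wait_plain(std::chrono::milliseconds timeout) {
    std::mutex m;
    std::condition_variable cv;
    std::unique_lock<std::mutex> lock(m);
    return cv.wait_for(lock, timeout);
}
```

With the predicate overload, a call such as wait_with_pred(true, ...) returns true regardless of how much time has elapsed, which is why its result cannot be used as a timeout indicator.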