PHP-FPM: Killing idle child issue using pm=ondemand #12798

Open
pesc opened this issue Nov 27, 2023 · 2 comments
@pesc
Contributor

pesc commented Nov 27, 2023

Hi

Introduction

Recently, I was playing around with PHP-FPM using the ondemand process manager.

ondemand can only be used with kqueue (BSD) or epoll (Linux):

if (!fpm_event_support_edge_trigger()) {
    zlog(ZLOG_ALERT, "[pool %s] ondemand process manager can ONLY be used when events.mechanism is either epoll (Linux) or kqueue (*BSD).", wp->config->name);

The following is a snippet of my pool configuration:

....
pm = ondemand
pm.max_children = 6
pm.process_idle_timeout = 30s;
pm.max_requests = 500

This means that a child is killed once it has been idle for more than 30s. This works as expected when only one child is running.

# Last curl
Mon Nov 27 14:37:05 CET 2023

# Logs (30 seconds after the last request)
[27-Nov-2023 14:37:36.344310] DEBUG: pid 26359, fpm_got_signal(), line 82: received SIGCHLD
[27-Nov-2023 14:37:36.351210] DEBUG: pid 26359, fpm_children_bury(), line 283: [pool xy-82] child 26360 has been killed by the process management after 99.362241 seconds from start

Problem

So this works like it's supposed to. The problem arises when I have a burst of requests, as seen in the screenshot:
[Screenshot: burst ended]
At the beginning (right after the burst), I have 3 running children:

  1. PID: 38733 - 14:54:26 CET
  2. PID: 39039 - 14:54:59 CET
  3. PID: 39051 - 14:54:59 CET

[Screenshot: +1 minute]
After a minute, I still have 3 children, even though two of them (PID 39039 and 39051) did not receive any new requests (see counter).

[Screenshot: +4 minutes]
After 4 minutes, I still have 2 children, even though PID 39051 did not receive any new requests (see counter).

Reason

I dug into the php-fpm code and found the problematic code snippet. On these lines, php-fpm tries to find the last_idle_child. To do so, it iterates over its children and, among those that are idle, picks the "oldest" one based on the ->started time. In my case, PID 38733 is idle and is the oldest child, so it is selected as last_idle_child, even though it receives all the current requests (see counter).

for (child = wp->children; child; child = child->next) {
    if (fpm_request_is_idle(child)) {
        if (last_idle_child == NULL) {
            last_idle_child = child;
        } else {
            if (timercmp(&child->started, &last_idle_child->started, <)) {
                last_idle_child = child;
            }
        }
        idle++;
    } else {
        active++;
    }
}

And there is the problem. Because of how epoll/kqueue work (Cloudflare Blog: Why does one NGINX worker take all the load?), it is possible that one child gets all the load/requests. The selected last_idle_child (PID 38733) is then checked against pm.process_idle_timeout. It does not exceed the timeout, because it is the child that handles all the requests.

And that is the reason PID 39051 is not getting killed, even though it did not serve any request in the last 3 minutes.

if (last.tv_sec < now.tv_sec - wp->config->pm_process_idle_timeout) {
    fpm_pctl_kill_idle_child(last_idle_child);
}
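
To make the effect easier to reproduce outside of FPM, here is a minimal standalone sketch of the two snippets above. It is not the FPM code: struct child_model, pick_oldest_idle() and pid_to_kill() are hypothetical names, and last_request is my assumption for what the `last` timestamp in the check represents (the child's last activity).

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical, simplified stand-in for FPM's per-child bookkeeping. */
struct child_model {
    int    pid;
    time_t started;      /* when the child was forked */
    time_t last_request; /* assumed: time of the child's last activity */
    int    idle;         /* 1 if the child is currently idle */
};

/* Mirrors the quoted loop: among idle children, pick the earliest started. */
static const struct child_model *pick_oldest_idle(const struct child_model *c,
                                                  size_t n)
{
    const struct child_model *pick = NULL;
    for (size_t i = 0; i < n; i++) {
        if (!c[i].idle) {
            continue;
        }
        if (pick == NULL || c[i].started < pick->started) {
            pick = &c[i];
        }
    }
    return pick;
}

/* Returns the pid that one maintenance run would kill, or -1 for none.
 * Only the single selected child is compared against the timeout. */
static int pid_to_kill(const struct child_model *c, size_t n,
                       time_t now, time_t idle_timeout)
{
    const struct child_model *pick = pick_oldest_idle(c, n);
    if (pick != NULL && pick->last_request < now - idle_timeout) {
        return pick->pid;
    }
    return -1;
}
```

With three idle children where the oldest one served a request 5 seconds ago and the two newer ones have been idle for 200 seconds, pid_to_kill() with a 30 second timeout returns -1: the oldest child is selected, it is below the timeout, and the two stale children are never looked at.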

Expected behaviour

I would expect that every child exceeding the pm.process_idle_timeout is killed by the fpm-master, regardless of how long it has been alive.

Possible Solution

It is not that easy to find a solution for this problem. I came up with this idea:
Check if pm.process_idle_timeout is reached for each child on every run (for ondemand) instead of picking the last_idle_child. This may be a bit more CPU intensive. Any other ideas?
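
A rough sketch of what I mean, again as a standalone model rather than a patch. struct child_state and collect_expired() are hypothetical names, and last_request is an assumed field holding the child's last activity time.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical, simplified per-child state (not FPM's struct). */
struct child_state {
    int    pid;
    time_t last_request; /* assumed: time of the child's last activity */
    int    idle;         /* 1 if the child is currently idle */
};

/* Checks every idle child against the timeout instead of only one.
 * Writes the pids of all expired children to out (capacity >= n)
 * and returns how many were written. */
static size_t collect_expired(const struct child_state *c, size_t n,
                              time_t now, time_t idle_timeout, int *out)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (c[i].idle && c[i].last_request < now - idle_timeout) {
            out[count++] = c[i].pid;
        }
    }
    return count;
}
```

The cost is one timestamp comparison per idle child per run, which is where the extra CPU usage would come from. Whether all expired children should be killed in the same run or only one per run is a separate decision that this sketch leaves open.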

PHP Version

PHP 8.2.13

Operating System

FreeBSD 12.4

@bukka
Member

bukka commented Dec 1, 2023

Hi, there are already related PRs and reported bugs for this:

I just went through this again. Your proposed solution is #4104, and I'm not sure it will make much difference with the round-robin selection that we have now. Essentially, the plan would be to also introduce epoll / kevent based selection, which should improve things, but it's a bit of work to do.

@drsheep404

drsheep404 commented Jun 14, 2024

Hi,

I saw that the idle timeout is never reached (for example at 5s), because the FPM handler seems to hand each task "randomly" to one of the available processes.
This has the following bad side effect:

  • In peaks (worst case): pm.max_children is reached and all of those children "stay" alive, because within 5s each of them handles at least one task (even if there are 150 of them). Only if traffic drops extremely is there a chance that some workers reach the idle timeout.

What about this idea?

  • The FPM handler keeps a 32-bit array [PID | STATE] of active processes and selects from it like "PID ASC && idle===true". That way only the "oldest" processes (if idle) execute the requests, newer (spawned) processes reach the idle timeout, and "ondemand" works like it should.
  • Also, "max_requests" currently only results in "if the maximum is reached, kill the process and respawn a new one (even if it's not needed)".
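
The "PID ASC && idle===true" rule from the first point could look roughly like this. It is only an illustration of the ordering rule with hypothetical names (struct worker_slot, pick_lowest_idle_pid()); it does not model how FPM actually hands out connections.

```c
#include <stddef.h>

/* Hypothetical [PID | STATE] entry as described in the idea above. */
struct worker_slot {
    int pid;
    int idle; /* 1 if the worker is currently idle */
};

/* Returns the index of the idle worker with the lowest pid,
 * or -1 if every worker is busy. */
static int pick_lowest_idle_pid(const struct worker_slot *w, size_t n)
{
    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (!w[i].idle) {
            continue;
        }
        if (best < 0 || w[i].pid < w[best].pid) {
            best = (int)i;
        }
    }
    return best;
}
```

One caveat: the lowest PID is only the "oldest" process as long as PIDs have not wrapped around, so a real implementation would probably compare start times instead.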

Idea 2 (much faster and maybe easier):
If max_requests is reached, only kill the process. Don't spawn a new one automatically (directly); spawn one only if needed (which is what "ondemand" should do), via the regular cold-start path.
That way the user can choose, by setting max_requests, how "aggressively" or quickly the handler manages the running processes.

Best regards :)
