PHP-FPM: Killing idle child issue using pm=ondemand

Hi

# Introduction 
Recently, I was playing around with PHP-FPM while using the process manager `ondemand`.  

`ondemand` can only be used with kqueue (BSD) or epoll (Linux):

https://github.com/php/php-src/blob/d26068059e83fe40de3430a512471d194119bee0/sapi/fpm/fpm/fpm_conf.c#L922-L923

The following is a snippet of my pool configuration:
```
....
pm = ondemand
pm.max_children = 6
pm.process_idle_timeout = 30s;
pm.max_requests = 500
```

This means that each child is killed once it has been idle for more than 30 seconds. This works as expected when only one child is running:
```
# Last curl
Mon Nov 27 14:37:05 CET 2023

# Logs (30 seconds after the last request)
[27-Nov-2023 14:37:36.344310] DEBUG: pid 26359, fpm_got_signal(), line 82: received SIGCHLD
[27-Nov-2023 14:37:36.351210] DEBUG: pid 26359, fpm_children_bury(), line 283: [pool xy-82] child 26360 has been killed by the process management after 99.362241 seconds from start
```

# Problem
So far, this works as it is supposed to. The problem arises after a burst of requests, as seen in the screenshot:
![Screenshot burst ended](https://github.com/php/php-src/assets/2048399/29e41446-a85f-4e19-a63a-87d368cd2747)
In the beginning (after the burst), I have 3 running children:
1. PID: 38733 - 14:54:26 CET
2. PID: 39039 - 14:54:59 CET
3. PID: 39051 - 14:54:59 CET


![Screenshot +1 minute](https://github.com/php/php-src/assets/2048399/e1d174e8-4f25-42b2-a4ee-54c87bd6925d)
After a minute, I still have 3 children, even though two of them (PIDs 39039 and 39051) did not receive any new requests (see counter).

![Screenshot +4 minutes](https://github.com/php/php-src/assets/2048399/3c130b9b-ea85-459a-8960-3dd8d2e763e5)
After 4 minutes, I still have 2 children, even though PID 39051 did not receive any new requests (see counter).


## Reason
I dug into the php-fpm code and found the problematic snippet. On these lines, php-fpm tries to find the `last_idle_child`: it iterates over its children and, among those that are `idle`, picks the "oldest" one based on the `->started` time. In my case, PID 38733 is idle at the moment of the check and is the oldest child, so it is selected as `last_idle_child`, even though it handles all the current requests (see counter).
https://github.com/php/php-src/blob/d26068059e83fe40de3430a512471d194119bee0/sapi/fpm/fpm/fpm_process_ctl.c#L361-L374

And there is the problem. Because of how epoll/kqueue works ([Cloudflare Blog: Why does one NGINX worker take all the load?](https://blog.cloudflare.com/the-sad-state-of-linux-socket-balancing/)), it is possible that one child gets all the load/requests. The selected `last_idle_child` (PID 38733) is then checked against `pm.process_idle_timeout`. It never exceeds the timeout, because it is the child that handles all the requests.

That is why PID 39051 is not killed, even though it has not served any request in the last 3 minutes.

https://github.com/php/php-src/blob/d26068059e83fe40de3430a512471d194119bee0/sapi/fpm/fpm/fpm_process_ctl.c#L388-L390
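To make the effect easier to see, here is a minimal, self-contained model of the selection and the timeout check described above. It is only a sketch and not the php-fpm code: `struct demo_child`, `demo_pick_oldest_idle()` and `demo_kill_candidate()` are hypothetical names, timestamps are plain seconds, and I assume that the timeout is compared against the selected child's last activity.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical, simplified stand-in for a php-fpm child (not the real struct). */
struct demo_child {
    int pid;
    time_t started;      /* when the child was started */
    time_t last_request; /* assumed: time of its last activity */
    int idle;            /* non-zero if the child is idle right now */
};

/* Among the idle children, pick the one with the earliest start time. */
const struct demo_child *demo_pick_oldest_idle(const struct demo_child *c, size_t n)
{
    const struct demo_child *pick = NULL;

    assert(c != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!c[i].idle) {
            continue;
        }
        if (pick == NULL || c[i].started < pick->started) {
            pick = &c[i];
        }
    }
    return pick;
}

/* Only the selected child is compared against the idle timeout.
 * Returns the pid that would be killed in this run, or -1 if none. */
int demo_kill_candidate(const struct demo_child *c, size_t n,
                        time_t now, time_t idle_timeout)
{
    const struct demo_child *pick = demo_pick_oldest_idle(c, n);

    if (pick != NULL && pick->last_request < now - idle_timeout) {
        return pick->pid;
    }
    return -1;
}
```

With three idle children where the oldest one served a request a few seconds ago and the other two have been idle for minutes, `demo_kill_candidate()` returns -1 in this model: the oldest child is selected, it is below the timeout, and the other two are never looked at.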

# Expected behaviour
I would expect every child that exceeds the `pm.process_idle_timeout` to be killed by the fpm-master, regardless of how long it has been alive.

## Possible Solution
It is not that easy to find a solution for this problem. I came up with this idea:
Check whether `pm.process_idle_timeout` has been reached for each child on every run (for `ondemand`) instead of only checking the `last_idle_child`. This may be a bit more CPU intensive. Any other ideas?
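A rough sketch of that idea, again as a standalone model rather than a patch: every idle child is compared against the timeout on its own. `struct sketch_child` and `sketch_collect_expired()` are hypothetical names, timestamps are plain seconds, and how many children the master should actually kill per run is left open.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical, simplified stand-in for a php-fpm child (not the real struct). */
struct sketch_child {
    int pid;
    time_t last_request; /* assumed: time of its last activity */
    int idle;            /* non-zero if the child is idle right now */
};

/* Check every idle child against the timeout instead of a single candidate.
 * Writes the pids of expired children to out (at most out_cap entries)
 * and returns how many were written. */
size_t sketch_collect_expired(const struct sketch_child *c, size_t n,
                              time_t now, time_t idle_timeout,
                              int *out, size_t out_cap)
{
    size_t found = 0;

    assert(c != NULL || n == 0);
    assert(out != NULL || out_cap == 0);
    for (size_t i = 0; i < n && found < out_cap; i++) {
        if (c[i].idle && c[i].last_request < now - idle_timeout) {
            out[found++] = c[i].pid;
        }
    }
    return found;
}
```

In this model, a busy oldest child no longer shields the others: with the same three children as in my screenshots, the two that have not served a request for minutes are reported, and the busy one is left alone. The loop still visits each child once, so the extra cost would mostly be one additional comparison per idle child.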


### PHP Version

PHP 8.2.13

### Operating System

FreeBSD 12.4

Snippet referenced under Reason (`sapi/fpm/fpm/fpm_process_ctl.c`, lines 361-374):

	for (child = wp->children; child; child = child->next) {
		if (fpm_request_is_idle(child)) {
			if (last_idle_child == NULL) {
				last_idle_child = child;
			} else {
				if (timercmp(&child->started, &last_idle_child->started, <)) {
					last_idle_child = child;
				}
			}
			idle++;
		} else {
			active++;
		}
	}


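Snippet referenced under Introduction (`sapi/fpm/fpm/fpm_conf.c`, lines 922-923):

	if (!fpm_event_support_edge_trigger()) {
		zlog(ZLOG_ALERT, "[pool %s] ondemand process manager can ONLY be used when events.mechanism is either epoll (Linux) or kqueue (*BSD).", wp->config->name);

Snippet referenced under Reason (`sapi/fpm/fpm/fpm_process_ctl.c`, lines 388-390):

	if (last.tv_sec < now.tv_sec - wp->config->pm_process_idle_timeout) {
		fpm_pctl_kill_idle_child(last_idle_child);
	}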
