Conversation

@AlliBalliBaba (Contributor) commented Nov 11, 2025

Suggestion for #1955: this is what a solution without backoff would look like.

Startup failures will immediately return an error on Init().


threadworker.go Outdated
}

// wait a bit and try again
time.Sleep(time.Millisecond * 250)
Member


Isn't it better to use an exponential back off strategy here? 😅

Contributor Author


makes sense, I'll add a minimal version

Contributor Author


Thinking about it, I'm not sure if having an exponential wait backoff will help with either script failures when watching or external resource failures. In both cases the time-to-resolution would probably be in the range of seconds.

I'm not against keeping it though

Member


For external services, exponential backoff prevents those services from being flooded with requests when they come back up.

Sometimes (see the recent AWS issues), these outages take hours to be fixed.
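
To make the flooding concern concrete, here is a minimal, hypothetical sketch (not code from this PR) of a capped exponential backoff with jitter in Go; `backoffDelay` and its parameters are illustrative names only:

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// backoffDelay returns the wait before retry number "attempt": it doubles on
// every failure, is capped at maxWait so long outages don't cause unbounded
// sleeps, and adds up to ~25% random jitter so many workers retrying at the
// same time don't flood the service the moment it comes back up.
func backoffDelay(attempt int, base, maxWait time.Duration) time.Duration {
	d := base << attempt // base * 2^attempt
	if d <= 0 || d > maxWait {
		d = maxWait
	}
	return d + time.Duration(rand.Int63n(int64(d)/4+1))
}

func main() {
	for attempt := 0; attempt < 6; attempt++ {
		fmt.Println(backoffDelay(attempt, 250*time.Millisecond, 10*time.Second))
	}
}
```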

@henderkes (Contributor)

I think I like this better. Lower complexity, and the failure case (frankenphp_handle_request not reached) is extremely unlikely to be solved by trying again.

@dunglas (Member) commented Nov 12, 2025

I wonder if it's really a good idea to crash the server because a single worker script fails at startup. App booting may fail because an external service is down, for instance, and crashing the server just for that is a bit too much IMHO.

Also, apps may run dozens of different worker scripts; preventing the whole server from starting because one of them is crashing (for instance, because a remote API is down) could be unwanted.

@AlliBalliBaba (Contributor, Author)

This PR doesn't change the logic around startup failures; it will still just fail immediately. There aren't many alternatives: workers have to reach a 'ready' state, otherwise the server cannot start accepting requests.

@AlliBalliBaba (Contributor, Author) commented Nov 12, 2025

Just failing immediately is the easiest solution from our side IMO. Users can always configure some kind of process/container supervision if they expect random startup failures on deployments (which they should do anyway).

If an external service that is needed for startup is down, then the expected behavior should be to fail startup.

That being said, it would definitely be possible to start the server in a half-broken state where some workers might be failing. Maybe by marking some workers as 'essential' and others as 'non-essential', or by adding a frankenphp_set_ready() function. That's something that goes beyond this PR though.
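
As a purely hypothetical illustration of the 'essential' vs. 'non-essential' idea (none of these types or functions exist in FrankenPHP today), a startup loop could look roughly like this:

```go
package worker

import (
	"fmt"
	"log"
)

// Hypothetical sketch only: neither this workerScript type nor the
// "essential" flag exist in FrankenPHP; it just illustrates letting
// non-essential workers fail without blocking server startup.
type workerScript struct {
	name      string
	essential bool // a non-essential worker may fail without aborting startup
	start     func() error
}

func startWorkers(scripts []workerScript) error {
	for _, w := range scripts {
		if err := w.start(); err != nil {
			if w.essential {
				// an essential worker failing still aborts server startup
				return fmt.Errorf("worker %s failed to start: %w", w.name, err)
			}
			// non-essential failures leave the server up, but in a degraded state
			log.Printf("worker %s failed to start, continuing without it: %v", w.name, err)
		}
	}
	return nil
}
```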

@henderkes (Contributor)

> Just failing immediately is the easiest solution from our side IMO. Users can always configure some kind of process/container supervision if they expect random startup failures on deployments (which they should do anyway).

I agree. Not to mention, nobody should be connecting to (possibly failing) outside resources in their worker startup script.

> That being said, it would definitely be possible to start the server in a half-broken state where some workers might be failing. Maybe by marking some workers as 'essential' and others as 'non-essential', or by adding a frankenphp_set_ready() function.

I don't think we even need that extra complexity. Startup scripts should handle that on their own; we don't need an extra method for it.

@AlliBalliBaba (Contributor, Author)

True, if an external resource is allowed to fail, the application should actually handle that itself.

@dunglas (Member) commented Nov 12, 2025

@henderkes I've already seen a lot of apps retrieving secrets from HashiCorp Vault, config from etcd, cached data from Redis, feature flags from SaaS like Unleash, or translations from Lokalize when booting.

I think it's pretty common, and while failures can (and should) be handled in userland, it would be nice to be as convenient as possible when a service like that is down.

For instance, the non-worker mode will not hard-fail if something like that happens. It will just return an error (likely a 500) until the service is up again.

IMHO, it would be nice to have a similar behavior when using the worker mode.

@henderkes (Contributor)

> IMHO, it would be nice to have a similar behavior when using the worker mode.

I agree, but I don't see how it would be possible on our side. Should we automatically mark workers as ready even though they've never reached frankenphp_handle_request, or should we mark them as inactive, not ready to pass requests to, while periodically retrying their initial bootup?

FrankenPHP's current behaviour is to fail if a worker fails too often on startup. I can see the value in getting secrets and the like once on worker bootup, but apps that rely on that couldn't serve their sites in regular (non-worker) mode at all. So they'd be best advised to retry those calls during request handling and not rely on them solely at worker boot.

@henderkes (Contributor)

I suppose what we could do is retry booting the worker script on incoming requests, meaning FrankenPHP would stay up even though worker scripts have failed. But then we're not giving users any indication of what's wrong until they look in the logs.

@AlliBalliBaba (Contributor, Author)

> @henderkes I've already seen a lot of apps retrieving secrets from HashiCorp Vault, config from etcd, cached data from Redis, feature flags from SaaS like Unleash, or translations from Lokalize when booting.

I guess there's an argument to be made if you spam these services with 50 workers on startup 😅. Alright, we can keep the backoff.

I'll change it so we keep the error instead of panicking, since that's actually testable.

Comment on lines +161 to +166
backoffDuration := time.Duration(handler.failureCount*handler.failureCount*100) * time.Millisecond
if backoffDuration > time.Second {
	backoffDuration = time.Second
}
handler.failureCount++
time.Sleep(backoffDuration)
Contributor Author


The actual backoff logic is just 6 lines of code, so probably no extra module or library is necessary.
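
For context, a rough sketch of how those six lines could sit in a worker restart loop; restartLoop, workerHandler and runScript are hypothetical stand-ins for the PR's actual code. With failureCount starting at 0, the waits grow as failureCount² × 100 ms: 0 ms, 100 ms, 400 ms, 900 ms, then capped at one second:

```go
package worker

import "time"

type workerHandler struct {
	failureCount int
}

// Sketch only: restartLoop and runScript are hypothetical stand-ins for the
// PR's actual functions; the backoff in the middle is the quoted logic.
func restartLoop(handler *workerHandler, runScript func() error) {
	for {
		if err := runScript(); err == nil {
			handler.failureCount = 0 // a successful run resets the backoff
			continue
		}
		// wait failureCount² × 100ms, capped at one second, before retrying
		backoffDuration := time.Duration(handler.failureCount*handler.failureCount*100) * time.Millisecond
		if backoffDuration > time.Second {
			backoffDuration = time.Second
		}
		handler.failureCount++
		time.Sleep(backoffDuration)
	}
}
```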

@AlliBalliBaba (Contributor, Author)

Maybe there's merit in revisiting the failure logic at some point. This branch only makes the failure testable instead of panicking, so I'll merge it into #1955 for now.

@AlliBalliBaba merged commit a36547b into refator/cleanup-c on Nov 13, 2025
@AlliBalliBaba deleted the refactor/remove-exponential-backoff branch on November 13, 2025 at 22:38