Early page not found middleware based solely on page path. #12895

Open
aembler wants to merge 7 commits into concretecms:9.5.x from aembler:early-not-found-checker

aembler (Member) commented May 5, 2026

This pull request introduces a new Early404Middleware to optimize 404 handling by quickly returning cached 404 responses for requests that do not match any route or page path, reducing unnecessary processing. It also updates the ResponseFactory to support cached 404 responses and ensures the on_page_not_found event is dispatched in both cached and standard 404 scenarios. The changes are covered by new unit tests for the middleware.

This is a simpler, zero-configuration version of #12867.
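
A minimal sketch of the decision the description outlines, assuming the known route prefixes and page paths are available as plain arrays. `isEarly404Candidate` and its parameters are hypothetical names used for illustration only; the middleware in this pull request works against the real router and page data, which are not reproduced here.

```php
<?php
/**
 * Hypothetical helper: report whether a request path matches neither a known
 * page path nor the first segment of any registered route, so that a cached
 * 404 response could be returned without further processing.
 *
 * @param string   $rawPathInfo   Path as received (may be percent-encoded).
 * @param string[] $routePrefixes First segments of registered routes (assumed input).
 * @param string[] $pagePaths     Known page paths such as "/about/faq" (assumed input).
 */
function isEarly404Candidate(string $rawPathInfo, array $routePrefixes, array $pagePaths): bool
{
    // Normalize to a single leading slash and no trailing slash.
    $path = '/' . trim(rawurldecode($rawPathInfo), '/');

    // The home page always goes through normal handling.
    if ($path === '/') {
        return false;
    }

    // A known page path must be rendered normally.
    if (in_array($path, $pagePaths, true)) {
        return false;
    }

    // A path whose first segment belongs to a route may still be routable.
    $firstSegment = explode('/', ltrim($path, '/'), 2)[0];
    if (in_array($firstSegment, $routePrefixes, true)) {
        return false;
    }

    return true;
}
```

With route prefixes `['ccm', 'login']` and page paths `['/about/faq']`, a request for `/wp-login.php` would be a candidate for the cached 404, while `/about/faq` and `/ccm/system/x` would continue through the normal pipeline.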


public function process(Request $request, DelegateInterface $frame)
{
    $pathInfo = rawurldecode($request->getPathInfo());
Review comment (Contributor):

I read this "if" as "if the requested path contains path traversals, process it as usual".

Shouldn't it be the other way around?
I mean, in case of path traversals we should return an "early 404" response imho...

Reply (Member, Author):

Good catch, I'll fix that.
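
To illustrate the direction suggested in this review comment, here is a rough sketch of a traversal check whose positive result would lead to the early 404 rather than to normal processing. `containsPathTraversal` is a hypothetical name, and treating backslashes as separators is an assumption of this sketch, not something taken from the pull request.

```php
<?php
/**
 * Hypothetical helper: detect ".." segments in a request path after
 * percent-decoding. Backslashes are treated as separators (an assumption).
 */
function containsPathTraversal(string $rawPathInfo): bool
{
    $decoded = rawurldecode($rawPathInfo);

    // Split on both kinds of slash so "..\" is caught as well as "../".
    $segments = explode('/', str_replace('\\', '/', $decoded));

    foreach ($segments as $segment) {
        if ($segment === '..') {
            return true;
        }
    }

    return false;
}

// Intended use, per the review comment: a traversal attempt is answered
// with the early 404 instead of being passed on to the rest of the stack.
$respondWithEarly404 = containsPathTraversal('/files/%2e%2e/config');
```

Only whole `..` segments are flagged, so a path such as `/a..b/c` is left alone.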

private function hasPotentialRouteForFirstSegment(string $firstSegment): bool
{
    $pattern = '#^' . preg_quote($firstSegment, '#') . '(?:/|$)#';
    foreach ($this->router->getRoutes()->all() as $route) {
Review comment (Contributor):

Do we really want to iterate all routes for every single request?

Reply (Member, Author):

There might be a more performant way of doing this if we modify the router. There's a secondary concern though: do we really need to register all routes on every request (even when the user is logged out, for example)?
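
One possible shape for the "more performant way" mentioned here, sketched under assumptions: route paths are given as plain strings, the index is built once (and could be cached), and each request then needs a single array lookup instead of a regular expression match per route. `buildFirstSegmentIndex` and `indexHasFirstSegment` are hypothetical names, and the `{placeholder}` handling is a guess at how dynamic first segments would need to be treated.

```php
<?php
/**
 * Hypothetical helper: build a lookup table of the first path segment of
 * every route. A route whose first segment is a placeholder such as
 * "{slug}" can match anything, so it is recorded as a wildcard.
 *
 * @param string[] $routePaths Route paths such as "/ccm/system/foo" (assumed input).
 * @return array{segments: array<string|int, true>, wildcard: bool}
 */
function buildFirstSegmentIndex(array $routePaths): array
{
    $index = ['segments' => [], 'wildcard' => false];

    foreach ($routePaths as $routePath) {
        $first = explode('/', ltrim($routePath, '/'), 2)[0];

        if (strpos($first, '{') !== false) {
            $index['wildcard'] = true; // dynamic first segment: cannot rule anything out
        } else {
            $index['segments'][$first] = true;
        }
    }

    return $index;
}

/** Constant-time check against the prebuilt index. */
function indexHasFirstSegment(array $index, string $firstSegment): bool
{
    return $index['wildcard'] || isset($index['segments'][$firstSegment]);
}
```

The trade-off is that the index has to be rebuilt or invalidated whenever routes change, which ties into the route caching question raised in this thread.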

aembler (Member, Author) commented May 7, 2026

I thought I'd run some tests. This is on my local machine: an Atomik install out of the box, no additional caching config, 9.5.0. Each figure below is the ApacheBench mean over 10 requests.

First, visiting /about/faq (a valid page) without the early 404 middleware. I ran the test three times:

Time per request:       204.422 [ms] (mean)
Time per request:       200.430 [ms] (mean)
Time per request:       204.360 [ms] (mean)

Now WITH the middleware:

Time per request:       195.191 [ms] (mean)
Time per request:       197.789 [ms] (mean)
Time per request:       207.646 [ms] (mean)

Pretty highly variable. I find the abnormally fast requests to be very interesting, but I think it's probably just noise. Obviously parsing all routes will incur some performance penalty, but I think by this demonstration it's pretty clear that the difference is minimal and highly variable.

Now let's check against /index.php/wp-login.php (a completely invalid page), first without the middleware:

Time per request:       218.484 [ms] (mean)
Time per request:       206.094 [ms] (mean)
Time per request:       205.343 [ms] (mean)

Now with the middleware:

Time per request:       101.816 [ms] (mean)
Time per request:       105.530 [ms] (mean)
Time per request:       102.520 [ms] (mean)

I think the real benefits from the zero-configuration 404 checker are pretty obvious in this case.
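
For a quick summary of the figures above, the following sketch averages the three runs of each scenario and computes the ratio for the invalid page. The numbers are copied from this comment; the variable and function names are illustrative only.

```php
<?php
/** Arithmetic mean of a non-empty list of numbers. */
function mean(array $values): float
{
    return array_sum($values) / count($values);
}

// "Time per request" means in ms, three runs per scenario, as reported above.
$runs = [
    'valid page, without middleware'   => [204.422, 200.430, 204.360],
    'valid page, with middleware'      => [195.191, 197.789, 207.646],
    'invalid page, without middleware' => [218.484, 206.094, 205.343],
    'invalid page, with middleware'    => [101.816, 105.530, 102.520],
];

// array_map keeps the string keys when given a single array.
$averages = array_map('mean', $runs);

foreach ($averages as $label => $ms) {
    printf("%-34s %8.3f ms\n", $label, $ms);
}

$speedup = $averages['invalid page, without middleware']
    / $averages['invalid page, with middleware'];
printf("Invalid-page speedup: %.2fx\n", $speedup);
```

Averaged this way, the valid page comes out at roughly 203.1 ms without and 200.2 ms with the middleware, and the invalid page at roughly 210.0 ms without and 103.3 ms with it, which is about a 2x improvement for the 404 case.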

You do raise a reasonable concern about the routes, but I'd counter that we already have performance issues with routes, and I'd urge us to solve them separately. Route caching would bring significant benefit to Concrete outside of this pull request.

@aembler
Copy link
Copy Markdown
Member Author

aembler commented May 7, 2026

Found some additional bugs on a second pass: we obviously don't want to use or set the cached 404 page when the user is logged in, so I've added a check for that.
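
The guard described here could look roughly like the following. This is a self-contained sketch with an in-memory store; `Cached404Store` is a hypothetical class, and the real change presumably checks the logged-in state through Concrete's own user and cache services rather than a boolean argument.

```php
<?php
/**
 * Hypothetical in-memory store for a rendered 404 page. Logged-in users
 * never read from it and never write to it, so personalized markup
 * (toolbars, account links) cannot leak into the shared cached copy.
 */
final class Cached404Store
{
    /** @var string|null Cached response body, or null when nothing is cached. */
    private $body = null;

    /** Return the cached body, but only for logged-out visitors. */
    public function get(bool $userIsLoggedIn): ?string
    {
        return $userIsLoggedIn ? null : $this->body;
    }

    /** Cache the body, but only when it was rendered for a logged-out visitor. */
    public function set(string $body, bool $userIsLoggedIn): void
    {
        if (!$userIsLoggedIn) {
            $this->body = $body;
        }
    }
}
```

Checking on both the read and the write side matters: skipping only the read would still let a 404 page rendered for a logged-in user be cached and later served to anonymous visitors.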

mlocati (Contributor) commented May 8, 2026

See also #12867 (comment)

2 participants