Early page not found middleware based solely on page path. #12895
aembler wants to merge 7 commits into
public function process(Request $request, DelegateInterface $frame)
{
    $pathInfo = rawurldecode($request->getPathInfo());
I read this "if" as "if the requested path contains path traversals, process it as usual".
Shouldn't it be the other way around?
I mean, in case of path traversals we should return an "early 404" response imho...
Good catch, I'll fix that.
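For reference, the corrected direction of that check could look roughly like the following. This is only a sketch under stated assumptions, not the code in this pull request: `containsPathTraversalSketch()` and `shouldReturnEarly404ForTraversal()` are hypothetical names, and the real middleware applies further conditions before it decides.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not from the PR): reports whether a decoded
 * request path contains a ".." segment.
 */
function containsPathTraversalSketch(string $pathInfo): bool
{
    // Treat backslashes like forward slashes so "..\" is caught as well.
    $normalized = str_replace('\\', '/', $pathInfo);
    foreach (explode('/', $normalized) as $segment) {
        if ($segment === '..') {
            return true;
        }
    }
    return false;
}

/**
 * Hypothetical decision function: true means "answer with the early 404",
 * false means "pass the request on to the next middleware".
 */
function shouldReturnEarly404ForTraversal(string $rawPathInfo): bool
{
    // Decode first, as the snippet under review does, so "%2e%2e" is seen as "..".
    $pathInfo = rawurldecode($rawPathInfo);

    return containsPathTraversalSketch($pathInfo);
}
```

The point is only the direction of the condition: a traversal attempt leads to the early 404 instead of falling through to normal processing.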
private function hasPotentialRouteForFirstSegment(string $firstSegment): bool
{
    $pattern = '#^' . preg_quote($firstSegment, '#') . '(?:/|$)#';
    foreach ($this->router->getRoutes()->all() as $route) {
Do we really want to iterate all routes for every single request?
There might be a more performant way of doing this if we modify the router. There's a secondary concern though: do we really need to register all routes on every request, even when the user is logged out, for example?
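To make the trade-off concrete, here is a rough sketch of the per-request scan next to one possible alternative, a lookup table keyed by first segment. It assumes routes can be reduced to plain path strings and that the pattern is matched against the path without its leading slash; both function names and the sample routes are hypothetical, and routes whose first segment is a placeholder would need separate handling.

```php
<?php
declare(strict_types=1);

/**
 * Linear scan, similar in spirit to the snippet under review.
 *
 * @param list<string> $routePaths plain route paths (an assumption of this sketch)
 */
function hasPotentialRouteForFirstSegmentSketch(string $firstSegment, array $routePaths): bool
{
    // Match the segment only as a whole first segment: "dash" must not match "dashboard".
    $pattern = '#^' . preg_quote($firstSegment, '#') . '(?:/|$)#';
    foreach ($routePaths as $path) {
        if (preg_match($pattern, ltrim($path, '/')) === 1) {
            return true;
        }
    }
    return false;
}

/**
 * Hypothetical alternative: build the set of first segments once,
 * then answer each request with a single array lookup.
 *
 * @param list<string> $routePaths
 * @return array<string, true>
 */
function indexRoutesByFirstSegmentSketch(array $routePaths): array
{
    $index = [];
    foreach ($routePaths as $path) {
        $first = explode('/', ltrim($path, '/'), 2)[0];
        $index[$first] = true;
    }
    return $index;
}
```

With an index like this, the per-request check becomes `isset($index[$firstSegment])`, but it only pays off if the index itself is cached between requests.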
I thought I'd run some tests. On my local machine, atomik install out of the box, no additional caching config, 9.5.0. This is the ApacheBench average of 10 requests:

First, visiting /about/faq (a valid page) without the early 404 middleware. I'll run the test three times:

Now WITH the middleware:

Pretty highly variable. I find the abnormally fast requests to be very interesting, but I think it's probably just noise. Obviously parsing all routes will incur some performance penalty, but I think by this demonstration it's pretty clear that the difference is minimal and highly variable.

Now let's check against /index.php/wp-login.php (completely invalid page), first without the middleware:

Now with the middleware:

I think the real benefits from the zero-configuration 404 checker are pretty obvious in this case. You do raise a reasonable concern about the routes, but I'd counter that we already have performance issues with routes and I'd urge us to solve it separately. Route caching would bring significant benefit to Concrete outside of this pull request.
Found some additional bugs on a second pass. We obviously don't want to read the cached 404 page, or write it, while a user is logged in, so I've added a check for that.
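The rule described here can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `Cached404Store` and `resolve404Body()` are hypothetical names, the cache is a plain in-memory holder, and the logged-in state is passed in as a boolean.

```php
<?php
declare(strict_types=1);

/** Hypothetical in-memory holder standing in for the real 404 cache. */
final class Cached404Store
{
    private ?string $body = null;

    public function get(): ?string
    {
        return $this->body;
    }

    public function set(string $body): void
    {
        $this->body = $body;
    }
}

/**
 * Returns the 404 body. Logged-in requests neither read nor write the cache.
 *
 * @param callable():string $render renders a fresh 404 page
 */
function resolve404Body(bool $isLoggedIn, Cached404Store $cache, callable $render): string
{
    if ($isLoggedIn) {
        // A logged-in page may contain user-specific markup, so bypass the cache entirely.
        return $render();
    }

    $cached = $cache->get();
    if ($cached !== null) {
        return $cached;
    }

    // First anonymous miss: render once and remember the result.
    $body = $render();
    $cache->set($body);

    return $body;
}
```

Skipping the write matters as much as skipping the read: otherwise a page rendered for a logged-in user could be served to anonymous visitors later.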
See also #12867 (comment)
This pull request introduces a new Early404Middleware to optimize 404 handling by quickly returning cached 404 responses for requests that do not match any route or page path, reducing unnecessary processing. It also updates the ResponseFactory to support cached 404 responses and ensures the on_page_not_found event is dispatched in both cached and standard 404 scenarios. The changes are covered by new unit tests for the middleware.

This is a zero-configuration, simpler version of #12867.
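As an illustration of the event guarantee in the description, the sketch below dispatches on_page_not_found before choosing between the cached and the freshly rendered body, so listeners run in both cases. `TinyDispatcher` and `buildNotFoundResponseSketch()` are hypothetical stand-ins written for this example; they are not the project's dispatcher or ResponseFactory API.

```php
<?php
declare(strict_types=1);

/** Hypothetical minimal event dispatcher used only for this sketch. */
final class TinyDispatcher
{
    /** @var array<string, list<callable>> */
    private array $listeners = [];

    public function addListener(string $event, callable $listener): void
    {
        $this->listeners[$event][] = $listener;
    }

    /** @param array<string, mixed> $payload */
    public function dispatch(string $event, array $payload = []): void
    {
        foreach ($this->listeners[$event] ?? [] as $listener) {
            $listener($payload);
        }
    }
}

/**
 * Builds a 404 response description, firing the event on both paths.
 *
 * @param callable():string $render renders a fresh 404 page
 * @return array{status: int, body: string, fromCache: bool}
 */
function buildNotFoundResponseSketch(
    TinyDispatcher $events,
    ?string $cachedBody,
    callable $render,
    string $path
): array {
    // Dispatch first, so the event fires whether or not a cached body exists.
    $events->dispatch('on_page_not_found', ['path' => $path]);

    $fromCache = $cachedBody !== null;

    return [
        'status' => 404,
        'body' => $fromCache ? $cachedBody : $render(),
        'fromCache' => $fromCache,
    ];
}
```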