[Routing] dump static arrays instead of classes for both matcher and generator #25909
[
    [
        'match',
        '#^/rootprefix/(?P<var>[^/]++)$#s',
Maybe not entirely related to this PR, but would it also be an idea to optimize routes away that can't ever be matched? For example, I register a bunch of routes which are exposed via a bundle, but they link to an external application. Those routes will show up in the URL matcher, while I really only need them in the generator.
Example:
// frontend_routing.hostnetnl.search
if ('/zoeken' === $pathinfo) {
return $this->mergeDefaults(array_replace($hostMatches, array('_route' => 'frontend_routing.hostnetnl.search')), array ());
}
The reason I mention this is that, although this is a URL matcher fixture, it doesn't look like something that would ever be matched in a real-world application. There is currently not much overhead because these routes are loaded last in my scenario, but it's still 51 routes in the current version of my bundle. Since they would be added to the array but never used, an optimization would be very welcome!
@iltar we should certainly not do that in this PR as this is a separate concern. The current PR is big enough :)
{
    public function testRedirectWhenNoSlash()
    {
        $coll = new RouteCollection();
Not very important, but what if we rename the $coll variable to $routes?
$regex = $compiledRoute->getRegex();

if (!count($compiledRoute->getPathVariables()) && false !== preg_match('#^(.)\^(?P<url>.*?)\$\1#'.('u' === substr($regex, -1) ? 'u' : ''), $regex, $m)) {
    if ($supportsTrailingSlash && '/' === substr($m['url'], -1)) {
Thanks to PHP 7.1 we could remove a function call here by replacing
substr($m['url'], -1)
with
$m['url'][-1]
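To illustrate the suggestion above, here is a minimal sketch comparing the two ways of reading the last character of a string. The function names are hypothetical and exist only for this example. One difference worth keeping in mind: a negative offset on an empty string raises a notice or warning, while substr() does not, so the sketch guards against that case explicitly.

```php
<?php
// Sketch only: hypothetical helper names, not code from this PR.

function endsWithSlashSubstr(string $path): bool
{
    // substr() yields '' or false for an empty string, never '/', so no guard is needed.
    return '/' === substr($path, -1);
}

function endsWithSlashOffset(string $path): bool
{
    // Negative string offsets are available since PHP 7.1.
    // Reading offset -1 of an empty string raises a notice/warning, hence the guard.
    return '' !== $path && '/' === $path[-1];
}
```

Both helpers should agree for any input; the offset variant avoids a function call but needs the empty-string check when the input is not known to be non-empty.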
    $ret = $defaults;
}

if ($checkTrailingSlash && substr($pathinfo, -1) !== '/') {
Here too, replacing
substr($pathinfo, -1) !== '/'
with
'/' !== $pathinfo[-1]
@@ -0,0 +1,199 @@
<?php
I appreciate the efforts to generate somewhat readable matcher files, but looking at the generated files (like this one) I wonder if they are readable at all with each route definition spanning so many lines.
should be much better now
It's great now. Thank you!
@@ -642,6 +643,12 @@ private function registerRouterConfiguration(array $config, ContainerBuilder $co
        if (isset($config['type'])) {
            $argument['resource_type'] = $config['type'];
        }
        if (!class_exists(StaticUrlMatcher::class)) {
            $argument['generator_class'] = $argument['generator_base_class'] ;
            $argument['generator_dumper_class'] = 'Symfony\Component\Routing\Generator\Dumper\PhpGeneratorDumper';
Why not use ::class to help IDEs find these usages?
4223e2b to 244278c
PR ready on my side. A one-route app shows no measurable performance impact. I'd be happy to gather performance numbers on real apps with bigger routing tables. Maybe @frankdejonge, since you already worked on that topic? Anyone else too, really :)
@@ -40,5 +40,7 @@ return PhpCsFixer\Config::create()
            ->notPath('Symfony/Bridge/PhpUnit/Tests/DeprecationErrorHandler/default.phpt')
            ->notPath('Symfony/Bridge/PhpUnit/Tests/DeprecationErrorHandler/weak.phpt')
            ->notPath('Symfony/Component/Debug/Tests/DebugClassLoaderTest.php')
            // file autogenerated
            ->notPath('Symfony/Component/Routing/Tests/Fixtures/dumper/static_url_matcher3.php')
dunno why this is ignored by fabbot, which raises a false positive for now
Does fabbot use a separate config file maybe?
    $this->expressionLanguageProviders[] = $provider;
}

private function compileRoutes(RouteCollection $routes): array
@alexpott you might be interested in having this method public at some point for Drupal, as it allows building a static routing table that can be (JSON-)serialized and used together with StaticUrlMatcher.
Right now our entire routing system uses database tables. This was introduced for a couple of reasons:
- There might be a lot of possible routes
- User changes might trigger new routes
We then leverage the DB to do our filtering, instead of something purely regex-based. Does that make sense to you?
We explored the idea of doing some static optimizations, for the front page for instance, but quickly realized that there is too much flexibility in the existing system and, more importantly, that the bottleneck is not the routing system at all.
yep, makes sense, yet I don't know if it's actually better to model routes in the DB (some rows per routes, which is what you do now?), vs just putting one json that describes the full routing in the DB, which could be generated here. I don't have the answer, I just wanted to let you know :)
@nicolas-grekas I've anonymised the routes dump from the reference project I used. You can use this as a reference. I parsed it before to create the route collections: https://gist.github.com/frankdejonge/65e92a4bfa9e0daed6974e0a82ad0684
/**
 * @author Fabien Potencier <[email protected]>
 */
class StaticUrlMatcher extends BaseMatcher
I would name it RedirectableStaticUrlMatcher, as being redirectable is what distinguishes this class from the component one.
See reply in comments.
private function getMatcher(RouteCollection $routes, RequestContext $context)
{
    $dumper = new StaticUrlMatcherDumper($routes);
    $path = sys_get_temp_dir().DIRECTORY_SEPARATOR.'php_matcher.'.uniqid('StaticUrlMatcher').'.php';
You could just put a / in the string. No need to use DIRECTORY_SEPARATOR.
fixed (this was copy/paste from existing code)
 *
 * @author Nicolas Grekas <[email protected]>
 */
abstract class StaticUrlMatcher extends UrlMatcher implements RedirectableUrlMatcherInterface
Are we required to make this an abstract class, forcing the child to implement RedirectableUrlMatcherInterface? The redirectable part is the only unimplemented part.
This PR makes the Routing component unusable in a dumped context without FrameworkBundle (or without implementing your own class).
In the existing matcher, support for redirecting is optional. We could do the same here by removing abstract and the interface here (making the class concrete) and adding an if ($this instanceof RedirectableUrlMatcherInterface) check in the places that need to change the logic for redirections. What would be the performance impact of this check for the framework case, where it is redirectable?
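The idea in this comment can be sketched roughly as follows. All class and interface names below (RedirectAware, ArrayMatcher, RedirectingArrayMatcher) are hypothetical stand-ins, not the classes from this PR, and the route table is reduced to a simple path-to-name map to keep the example short. The base class is concrete and usable on its own; redirect handling only kicks in when a subclass opts in by implementing the interface.

```php
<?php
// Sketch only: hypothetical names, not the Symfony implementation.

interface RedirectAware
{
    public function redirect(string $path, string $route): array;
}

class ArrayMatcher
{
    /** @var array<string, string> map of static path => route name */
    private $staticRoutes;

    public function __construct(array $staticRoutes)
    {
        $this->staticRoutes = $staticRoutes;
    }

    public function match(string $pathinfo): array
    {
        if (isset($this->staticRoutes[$pathinfo])) {
            return ['_route' => $this->staticRoutes[$pathinfo]];
        }

        // Redirect logic is optional: it only runs when the subclass opted in.
        if ($this instanceof RedirectAware && '/' !== substr($pathinfo, -1)) {
            $candidate = $pathinfo.'/';
            if (isset($this->staticRoutes[$candidate])) {
                return $this->redirect($candidate, $this->staticRoutes[$candidate]);
            }
        }

        throw new \RuntimeException(sprintf('No route found for "%s".', $pathinfo));
    }
}

final class RedirectingArrayMatcher extends ArrayMatcher implements RedirectAware
{
    public function redirect(string $path, string $route): array
    {
        return ['_route' => $route, '_redirect_to' => $path];
    }
}
```

With this shape, the plain ArrayMatcher rejects /blog when only /blog/ is registered, while RedirectingArrayMatcher returns a redirect result. The cost of the instanceof check is a single type test on the miss path; whether that is measurable in the real matcher is exactly the open question raised above.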
        if (0 !== strpos($pathinfo, $conditions[1])) {
            continue 2;
        }
        // no break
Instead of doing a fallthrough and then picking the matched condition by changing the index with an int cast of the type check, it would be much more readable to duplicate the preg_match call, passing the right condition in each case.
updated
I would like to see benchmarks for both generating and matching here, as well as extensive testing that the new matcher behaves the same as the old one (as the logic is totally rewritten).
@stof out of curiosity, why would you want benchmarks for generating? The generation of the routes should be done before something is deployed and shipped in the artefact/container/deployable-zip/whatever. At least, that's my view on the subject; I'm curious to hear what reasons you'd have to measure the performance of this.
@frankdejonge I think Christophe refers to generating a URL from a route (
Ahhhhh that makes sense.
244278c to aed9bd7
@nicolas-grekas did you see the gist with the routes?
Fixed
The code for the generator is exactly the same - only the initialization logic has changed - so I'm not going to spend much time on it.
It would still make sense to have a benchmark to see whether we actually benefit from the static array caching.
It might make sense to work on building such a test suite first in a separate PR (which means it would run against the current implementation) and then rebase this PR on top of it (ensuring that the new implementation passes it too).
@frankdejonge your dump does not allow rebuilding the routes fully (requirements are not available in it, for instance).
@stof in my case I just used a wildcard matcher because the changes didn't happen on that front. Doesn't seem to be a factor here either. The difference is the lookup mechanism, not those checks.
@frankdejonge as the matching logic has been fully rewritten, I would like to see benchmarks for cases using all features too, not only for a single case.
aed9bd7 to 54c0ee2
don't! it's no fun :)
Fixed - was easier than anticipated, this adds no overhead.
the current code already benefits from static array caching, since it already dumps a static array - this is really 100% the same.
1c46df7 to 5c5efed
I did some benchmarks using one of my project routes (789 routes).
PhpGeneratorDumper vs
5c5efed to d03d116
@fancyweb benchmarking one dumper vs the other is irrelevant. Dumping happens during cache warmup, where we can spend time. What we need is to compare the matching using the dumped matcher vs the matching using the static matcher with dumped routes. And this is what you don't have in your comment.
@fancyweb: @stof is right, sorry if that wasn't clear enough. The perf comparison should be about both URL matchers: the one generated by PhpDumperMatcher vs a StaticUrlMatcher one. Since bootstrap time should be considered, I think the way to bench this is to take a skeleton, load it with some fixture routes, and measure the response time before/after this PR.
d03d116 to 33276ad
Matcher now tested, with the existing test cases from UrlMatcher.
193a085 to f8c7b98
f8c7b98 to d41db55
As far as I can tell from my tests on PHP 7.2, this is slower by a very tiny margin. The reason is simple: the matching loop has to do extra "ifs". But the difference does not grow with the table size and is barely measurable in practice, on both small and big tables. This means I did not hit the theoretical scalability issue with the current code. One might want to try with bigger tables (mine had ~1000 routes).
From a design point of view, I prefer a generic algorithm with data-driven input over dumped code, so I still think this should be seriously considered. E.g. by looking at the data, we can already see that there are more potential optimizations that could be driven by creating hash maps (a lot of routes could just be put into hashes and would not require the "strpos" check beforehand). Doing so at the code generation level is possible, e.g. using a switch on PHP 7.2, or inline hashes. But working at the data level is much more comprehensible IMHO.
The last benefit of this approach is the name of the router, which is stable. This can also be achieved in the code-generation approach by dumping an anonymous class, now that we can use them.
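The hash-map idea mentioned above could look roughly like the following sketch. The data layout ([name, pattern, isStatic]) and the function names are assumptions made for this example only; they are not the dumper's actual output. To stay conservative about route priority, which is discussed elsewhere in this thread, the sketch only hoists static routes that are declared before the first dynamic route, and leaves everything else in the ordered list.

```php
<?php
// Sketch only: hypothetical data layout and function names.

/**
 * Splits an ordered route list into a hash map of leading static routes
 * and an ordered remainder that still needs a linear scan.
 *
 * @param array $routes list of [name, pattern, isStatic]
 */
function splitRoutes(array $routes): array
{
    $map = [];
    $rest = [];
    foreach ($routes as [$name, $pattern, $isStatic]) {
        // Stop hoisting as soon as anything lands in $rest, so declaration order still decides priority.
        if ($isStatic && !$rest && !isset($map[$pattern])) {
            $map[$pattern] = $name;
        } else {
            $rest[] = [$name, $pattern, $isStatic];
        }
    }

    return [$map, $rest];
}

function matchPath(string $path, array $map, array $rest): ?string
{
    if (isset($map[$path])) {
        return $map[$path]; // constant-time lookup, no strpos or preg_match
    }
    foreach ($rest as [$name, $pattern, $isStatic]) {
        if ($isStatic ? $pattern === $path : 1 === preg_match($pattern, $path)) {
            return $name;
        }
    }

    return null;
}
```

A static route declared after a dynamic one (for example /contact after a catch-all /{slug}) stays in the ordered list, so the earlier dynamic route still wins, as it would in a purely sequential matcher. Hoisting more routes would require proving that the patterns are mutually exclusive.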
I feel like this new method allows optimizations in a way the old one did not. Instead of deprecating the current one, can we simply add this new compilation under a feature flag (for now), and try to optimize it before 4.1 is released? If it turns out we can't get it faster, we can always drop it before 4.1.
that's true even if we keep the existing approach. Until the RC release, we are free to drop this new class and undeprecate the old one if we decide to do it.
I'm not sure this holds true considering that the order of routes is important (as soon as a route has a dynamic part, it limits optimizations for any subsequent routes). And even if we find a way to rearrange routes to optimize them (figuring out that we can switch routes because their matching is totally exclusive and so the order does not matter), such rearranging would then be usable with the dumped class too, not only with the dumped array.
And in practice, my project has 589 routes (including the WebProfiler ones as this is in dev), among which 439 have placeholders (and so cannot be caught by a map). This leaves 150 routes which could be matched by a hashmap lookup (and I think a rearrangement can move them all to the beginning, given my URL convention and the fact that my requirements and route order are strict enough to avoid any ambiguity). So the rearranging might be worth it (it is required for any useful hashmap lookup, as these 150 routes are spread over the full list, with dynamic routes appearing early (the
For people wanting to perform a similar analysis, here is what I used as commands:
This makes me think that your promise (in the PR description) of making the router scale to a higher number of routes is actually not true, as there is no performance difference.
I know for a fact that sorting routes will mess up a lot, but if it means performance, I'm sure I could change my routes to not be messed up. However, there will always be some fallback routes (like wildcard
It almost sounds like a proper tree should be built and visualized so developers can detect URL problems and probably optimize by using certain URLs.
I'm closing because this demonstrated to me that this cannot be faster: traversing the static array is always going to be slower than loading + traversing the opcodes. |
@iltar this is why I'm talking about reordering routes as an optimization when we can be sure they are exclusive (and so their order does not matter). And when working on implementing such logic, building a debugging tool visualizing it might indeed make sense.
… without dumping PHP code (nicolas-grekas)

This PR was merged into the 4.3-dev branch.

Discussion
----------

[Routing] allow using compiled matchers and generators without dumping PHP code

| Q             | A
| ------------- | ---
| Branch?       | master
| Bug fix?      | no
| New feature?  | yes
| BC breaks?    | no
| Deprecations? | yes
| Tests pass?   | yes
| Fixed tickets | #29590
| License       | MIT
| Doc PR        | symfony/symfony-docs#10790

This is a resurrection of #25909 to make matcher+generator dumpers output PHP arrays instead of PHP code.
Don't be fooled by the diff stats, it's mostly fixtures.

This PR should contribute to making the Routing component easier to use standalone.

On the way back from SFLive USA.

Commits
-------

f0a519a [Routing] allow using compiled matchers and generators without dumping PHP code
Instead of dumping a class for the URL generator and matcher, I'd like to try dumping a static array, and use a generic class to walk through the array.
This should allow leveraging the PHP 7 cache for static arrays, and thus make the router matcher scale to a large number of routes without much performance penalty.
I'd happily accept help for benching this on your apps.
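As a rough illustration of the approach described above, the sketch below walks a plain array of compiled routes with a generic matcher class. The array layout ([static prefix, regex, route name, defaults]) and the class name GenericArrayMatcher are assumptions made for this example; the layout actually dumped by the PR differs. In a real setup the array would presumably live in its own PHP file that simply returns it, so that the opcode cache can keep it around between requests.

```php
<?php
// Sketch only: hypothetical array layout and class name, not the PR's dumped format.

// Each entry: [static prefix, regex, route name, defaults].
$compiledRoutes = [
    ['/blog/', '#^/blog/(?P<slug>[^/]++)$#', 'blog_show', []],
    ['/user/', '#^/user/(?P<id>\d+)$#', 'user_show', ['_format' => 'html']],
];

final class GenericArrayMatcher
{
    private $routes;

    public function __construct(array $routes)
    {
        $this->routes = $routes;
    }

    public function match(string $pathinfo): array
    {
        foreach ($this->routes as [$prefix, $regex, $name, $defaults]) {
            // Cheap prefix check first, to skip the regex for most routes.
            if (0 !== strpos($pathinfo, $prefix)) {
                continue;
            }
            if (1 !== preg_match($regex, $pathinfo, $matches)) {
                continue;
            }
            // Keep named captures only (string keys), dropping numeric groups.
            $params = array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);

            return ['_route' => $name] + $params + $defaults;
        }

        throw new \RuntimeException(sprintf('No route found for "%s".', $pathinfo));
    }
}
```

The matcher class itself never changes when routes change; only the data does. That is the stable class name benefit mentioned in the discussion, at the cost of the extra per-route checks in the loop.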