[Routing] dump static arrays instead of classes for both matcher and generator #25909

Closed
wants to merge 1 commit into from

@nicolas-grekas
Member

nicolas-grekas commented Jan 23, 2018

| Q             | A
| ------------- | ---
| Branch?       | master
| Bug fix?      | no
| New feature?  | yes
| BC breaks?    | no
| Deprecations? | yes
| Tests pass?   | yes
| Fixed tickets | -
| License       | MIT
| Doc PR        | -

Instead of dumping a class for the Url generator and matcher, I'd like to try dumping a static array, and use a generic class to walk through the array.

This should allow leveraging the PHP 7 cache for static arrays, and thus make the router matcher scale to a large number of routes without much performance penalty.
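
To make the idea concrete, here is a minimal sketch of a routing table expressed as a plain array and walked by one generic function. The array layout and the `matchStatic()` helper are hypothetical and are not the format dumped by this PR; the sketch assumes PHP 7.1+ for array destructuring in `foreach`.

```php
<?php

// Hypothetical routing table: a constant array that OPcache can keep in
// shared memory. In a real setup it would live in its own file and be
// loaded with `$routes = require 'routes.php';`.
$routes = [
    // [regex, route name, defaults]
    ['#^/rootprefix/(?P<var>[^/]++)$#s', 'dynamic', []],
    ['#^/about$#s', 'about', ['_controller' => 'AboutController']],
];

/**
 * Generic matcher: walks the table in order and returns the parameters of
 * the first route whose regex matches, or null when nothing matches.
 */
function matchStatic(array $routes, string $pathinfo): ?array
{
    foreach ($routes as [$regex, $name, $defaults]) {
        if (preg_match($regex, $pathinfo, $matches)) {
            // Keep only named captures (string keys), then add defaults.
            $vars = array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);

            return ['_route' => $name] + $vars + $defaults;
        }
    }

    return null;
}
```

With this table, `matchStatic($routes, '/rootprefix/foo')` returns the route name `dynamic` together with the captured `var` value, and an unknown path returns `null`.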

I'd happily accept help for benching this on your apps.

[
[
'match',
'#^/rootprefix/(?P<var>[^/]++)$#s',
Contributor

Maybe not entirely related to this PR, but would it also be an idea to optimize routes away that can't ever be matched? For example, I register a bunch of routes which are exposed via a bundle, but they link to an external application. Those routes will show up in the URL matcher, while I really only need them in the generator.

Example:

// frontend_routing.hostnetnl.search
if ('/zoeken' === $pathinfo) {
    return $this->mergeDefaults(array_replace($hostMatches, array('_route' => 'frontend_routing.hostnetnl.search')), array ());
}

I mention this because, while this is a URL matcher fixture, it doesn't look like a route that could actually be matched in a real-world application. It is not much overhead right now because these routes are loaded last in my scenario, but there are still 51 of them in the current version of my bundle. Since they are added to the array but never used, an optimization would be very welcome!

Member Author

@iltar we should certainly not do that in this PR as this is a separate concern. The current PR is big enough :)

{
public function testRedirectWhenNoSlash()
{
$coll = new RouteCollection();
Member

Not very important, but what if we rename the $coll variable to $routes?

$regex = $compiledRoute->getRegex();

if (!count($compiledRoute->getPathVariables()) && false !== preg_match('#^(.)\^(?P<url>.*?)\$\1#'.('u' === substr($regex, -1) ? 'u' : ''), $regex, $m)) {
if ($supportsTrailingSlash && '/' === substr($m['url'], -1)) {
Member

Thanks to PHP 7.1 we could remove a function call here:

substr($m['url'], -1)
$m['url'][-1]
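
A small standalone check of the equivalence suggested above (negative string offsets are available since PHP 7.1; the `$url` value is only an illustration):

```php
<?php

// Negative string offsets are available since PHP 7.1.
$url = '/blog/';

$viaSubstr = substr($url, -1); // function call
$viaOffset = $url[-1];         // direct offset read, same result here

var_dump($viaSubstr === $viaOffset); // bool(true)
```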

$ret = $defaults;
}

if ($checkTrailingSlash && substr($pathinfo, -1) !== '/') {
Member

Here too:

substr($pathinfo, -1) !== '/'
'/' !== $pathinfo[-1]
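
One difference worth keeping in mind with the offset form: reading offset -1 of an empty string triggers an "Uninitialized string offset" notice (a warning on PHP 8), whereas `substr()` stays silent. Whether `$pathinfo` can be empty at this point is not established by the diff context, so the guard below is only a precaution, and `endsWithSlash()` is a hypothetical helper name.

```php
<?php

// Hypothetical helper: true when $pathinfo ends with a slash.
// The '' guard avoids reading offset -1 of an empty string, which PHP
// reports as an "Uninitialized string offset" notice/warning.
function endsWithSlash(string $pathinfo): bool
{
    return '' !== $pathinfo && '/' === $pathinfo[-1];
}
```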

@@ -0,0 +1,199 @@
<?php
Member

I appreciate the effort to generate somewhat readable matcher files, but looking at the generated files (like this one) I wonder if they are readable at all with each route definition spanning so many lines.

Member Author

should be much better now

Member

It's great now. Thank you!

@@ -642,6 +643,12 @@ private function registerRouterConfiguration(array $config, ContainerBuilder $co
if (isset($config['type'])) {
$argument['resource_type'] = $config['type'];
}
if (!class_exists(StaticUrlMatcher::class)) {
$argument['generator_class'] = $argument['generator_base_class'] ;
$argument['generator_dumper_class'] = 'Symfony\Component\Routing\Generator\Dumper\PhpGeneratorDumper';
Member

Why not use ::class to help IDEs find these usages?
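
For illustration, the string literal from the diff and the `::class` form resolve to the same value. `::class` is resolved at compile time from the `use` statement, so this snippet runs even without the Routing component installed:

```php
<?php

use Symfony\Component\Routing\Generator\Dumper\PhpGeneratorDumper;

// ::class is resolved at compile time from the `use` statement, so this
// works even when the class is not autoloadable.
$fromString = 'Symfony\Component\Routing\Generator\Dumper\PhpGeneratorDumper';
$fromConstant = PhpGeneratorDumper::class;

var_dump($fromString === $fromConstant); // bool(true)
```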

nicolas-grekas force-pushed the route-static branch 4 times, most recently from 4223e2b to 244278c, on January 24, 2018 10:26
@nicolas-grekas
Member Author

PR ready on my side. A one-route app shows no measurable performance impact. I'd be happy to gather performance numbers on real apps with bigger routing tables.

Maybe @frankdejonge, since you already worked on that topic? Anyone else is welcome too :)

@@ -40,5 +40,7 @@ return PhpCsFixer\Config::create()
->notPath('Symfony/Bridge/PhpUnit/Tests/DeprecationErrorHandler/default.phpt')
->notPath('Symfony/Bridge/PhpUnit/Tests/DeprecationErrorHandler/weak.phpt')
->notPath('Symfony/Component/Debug/Tests/DebugClassLoaderTest.php')
// file autogenerated
->notPath('Symfony/Component/Routing/Tests/Fixtures/dumper/static_url_matcher3.php')
Member Author

dunno why this is ignored by fabbot, which raises a false positive for now

Contributor

Does fabbot use a separate config file maybe?

$this->expressionLanguageProviders[] = $provider;
}

private function compileRoutes(RouteCollection $routes): array
Member Author

@alexpott you might be interested in having this method public at some point for Drupal, as it allows building a static routing table that can be (json)serialized and used together with StaticUrlMatcher

Contributor

Right now our entire routing system uses database tables. This was introduced for a few reasons:

  • There might be a lot of possible routes
  • User changes might trigger new routes

We then leverage the DB to do our filtering, instead of something purely regex based. Does that make sense for you?

We explored the idea of doing some static optimizations, for example for the front page, but quickly realized that there is too much flexibility in the existing system and, more importantly, that the routing system is not the bottleneck at all.

Member Author

yep, makes sense, yet I don't know if it's actually better to model routes in the DB (some rows per routes, which is what you do now?), vs just putting one json that describes the full routing in the DB, which could be generated here. I don't have the answer, I just wanted to let you know :)

@frankdejonge
Contributor

@nicolas-grekas I've anonymised the routes dump from the reference project I used. You can use this as a reference. I parsed it before to create the route collections: https://gist.github.com/frankdejonge/65e92a4bfa9e0daed6974e0a82ad0684

/**
* @author Fabien Potencier <[email protected]>
*/
class StaticUrlMatcher extends BaseMatcher
Member

I would name it RedirectableStaticUrlMatcher, as being redirectable is what distinguishes this class from the component one.

Member Author

See reply in comments.

private function getMatcher(RouteCollection $routes, RequestContext $context)
{
$dumper = new StaticUrlMatcherDumper($routes);
$path = sys_get_temp_dir().DIRECTORY_SEPARATOR.'php_matcher.'.uniqid('StaticUrlMatcher').'.php';
Member

you could just put a / in the string. No need to use DIRECTORY_SEPARATOR

Member Author

fixed (this was copy/paste from existing code)

*
* @author Nicolas Grekas <[email protected]>
*/
abstract class StaticUrlMatcher extends UrlMatcher implements RedirectableUrlMatcherInterface
Member

Are we required to make this an abstract class forcing the child to implement RedirectableUrlMatcherInterface ? The redirectable part is the only un-implemented part.
This PR makes the Routing component unusable in a dumped context without using FrameworkBundle (or implementing your own class).

In the existing matcher, support for redirecting is optional. We could do the same here by removing the abstract modifier and the interface (making the class concrete) and adding an if ($this instanceof RedirectableUrlMatcherInterface) in places where the logic needs to change for redirections. What would be the performance impact of this check for the framework case where it is redirectable?
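
A rough sketch of that suggestion, using simplified stand-ins rather than the real Symfony classes: `RedirectableInterface`, `SketchMatcher` and `RedirectableSketchMatcher` are hypothetical names, and the matching logic is reduced to an exact path lookup.

```php
<?php

// Hypothetical stand-in for RedirectableUrlMatcherInterface.
interface RedirectableInterface
{
    public function redirect(string $path): array;
}

// Concrete base matcher: usable on its own, without redirect support.
class SketchMatcher
{
    /** @var array<string, string> path => route name */
    private $routes;

    public function __construct(array $routes)
    {
        $this->routes = $routes;
    }

    public function match(string $pathinfo): ?array
    {
        if (isset($this->routes[$pathinfo])) {
            return ['_route' => $this->routes[$pathinfo]];
        }

        // Only redirectable subclasses try the trailing-slash variant.
        if ($this instanceof RedirectableInterface && isset($this->routes[$pathinfo.'/'])) {
            return $this->redirect($pathinfo.'/');
        }

        return null;
    }
}

// Opt-in subclass that adds the redirect behaviour.
class RedirectableSketchMatcher extends SketchMatcher implements RedirectableInterface
{
    public function redirect(string $path): array
    {
        return ['_redirect' => $path];
    }
}
```

In this sketch the base class returns `null` for `/foo` when only `/foo/` is registered, while the redirectable subclass returns a redirect to `/foo/`. The cost in the base case is a single `instanceof` check on the miss path.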

if (0 !== strpos($pathinfo, $conditions[1])) {
continue 2;
}
// no break
Member

instead of doing a fallthrough and then changing the matched conditions by changing the index with an int cast of the type check, it would be much more readable to duplicate the preg_match call, passing the right condition in each case.

Member Author

updated

@stof
Member

stof commented Jan 24, 2018

I would like to see benchmarks for both generating and matching here, as well as extensive testing that the new matcher behaves the same as the old one (as the logic is totally rewritten).

@frankdejonge
Contributor

@stof out of curiosity, why would you want benchmarks for generating? The generation of the routes should be done before something is deployed and shipped in the artefact/container/deployable-zip/whatever. At least, that's my view on the subject, curious to hear what reasons you'd have to measure the performance of this.

@javiereguiluz
Member

@frankdejonge I think Christophe is referring to generating a URL from a route ({{ path('my_route') }}), not to generating these static files.

@frankdejonge
Contributor

Ahhhhh that makes sense.

@frankdejonge
Contributor

@nicolas-grekas did you see the gist with the routes?

@nicolas-grekas
Member Author

nicolas-grekas commented Jan 24, 2018

> Are we required to make this an abstract class forcing the child to implement RedirectableUrlMatcherInterface?

Fixed

> I would like to see benchmarks for both generating and matching here, as well as extensive testing that the new matcher behaves the same as the old one (as the logic is totally rewritten).

The code for the generator is exactly the same - only the initialization logic has changed - so I'm not going to spend much time on it.
For the matcher, I agree we're missing a test suite, too bad we didn't have it already.

@stof
Member

stof commented Jan 24, 2018

> The code for the generator is exactly the same - only the initialization logic has changed - so I'm not going to spend much time on it.

It would still make sense to have a benchmark to see whether we actually benefit from the static array caching.

> For the matcher, I agree we're missing a test suite, too bad we didn't have it already.

It might make sense to work on building such a test suite first in a separate PR (which means it would run against the current implementation) and then rebase this PR on top of it (ensuring that the new implementation passes it too).

@stof
Member

stof commented Jan 24, 2018

@frankdejonge your dump does not allow rebuilding the routes fully (requirements are not available in it for instance)

@frankdejonge
Contributor

@stof in my case I just used a wildcard matcher because the changes didn't happen on that front. Doesn't seem to be a factor here either. The difference is the lookup mechanism, not those checks.

@stof
Member

stof commented Jan 24, 2018

@frankdejonge as the matching logic has been fully rewritten, I would like to see benchmarks for cases using all features too, not only for a single case.

@nicolas-grekas
Member Author

I've deleted the route dump. Have fun building feature complete fixtures

don't! it's no fun :)

> Are we required to make this an abstract class forcing the child to implement RedirectableUrlMatcherInterface?

Fixed - was easier than anticipated, this adds no overhead.

> It would still make sense to have a benchmark to see whether we actually benefit from the static array caching.

the current code already benefits from static array caching, since it already dumps a static array - this is really 100% the same.

nicolas-grekas force-pushed the route-static branch 2 times, most recently from 1c46df7 to 5c5efed, on January 24, 2018 16:25
@fancyweb
Contributor

I did some benchmarks using one of my project routes (789 routes).
1000 revs, 3 iterations.
Winner is in bold.

PhpMatcherDumper vs StaticUrlMatcherDumper :

| benchmark | subject | groups | params | revs | iter | mem_peak | time_rev | comp_z_value | comp_deviation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MatcherDumperBench | benchPhpMatcherDumper | | [] | 1000 | 0 | 122,099,944b | 13,925.855μs | +1.37σ | +0.87% |
| MatcherDumperBench | benchPhpMatcherDumper | | [] | 1000 | 1 | 122,099,944b | 13,719.996μs | -0.98σ | -0.62% |
| MatcherDumperBench | benchPhpMatcherDumper | | [] | 1000 | 2 | 122,099,944b | 13,771.434μs | -0.39σ | -0.25% |
| MatcherDumperBench | benchStaticUrlMatcherDumper | | [] | 1000 | 0 | 122,238,000b | 12,447.356μs | -1.2σ | -0.61% |
| MatcherDumperBench | benchStaticUrlMatcherDumper | | [] | 1000 | 1 | 122,238,000b | 12,603.555μs | +1.25σ | +0.63% |
| MatcherDumperBench | benchStaticUrlMatcherDumper | | [] | 1000 | 2 | 122,238,000b | 12,521.185μs | -0.04σ | -0.02% |

PhpGeneratorDumper vs StaticUrlGeneratorDumper :

| benchmark | subject | groups | params | revs | iter | mem_peak | time_rev | comp_z_value | comp_deviation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GeneratorDumperBench | benchPhpGeneratorDumper | | [] | 1000 | 0 | 11,868,672b | 4,727.980μs | +0.55σ | +0.16% |
| GeneratorDumperBench | benchPhpGeneratorDumper | | [] | 1000 | 1 | 11,868,672b | 4,732.192μs | +0.85σ | +0.25% |
| GeneratorDumperBench | benchPhpGeneratorDumper | | [] | 1000 | 2 | 11,868,672b | 4,700.886μs | -1.4σ | -0.41% |
| GeneratorDumperBench | benchStaticUrlGeneratorDumper | | [] | 1000 | 0 | 11,439,640b | 5,867.291μs | +1.40σ | +0.09% |
| GeneratorDumperBench | benchStaticUrlGeneratorDumper | | [] | 1000 | 1 | 11,439,640b | 5,858.939μs | -0.87σ | -0.05% |
| GeneratorDumperBench | benchStaticUrlGeneratorDumper | | [] | 1000 | 2 | 11,439,640b | 5,860.187μs | -0.53σ | -0.03% |

UrlGenerator vs StaticUrlGenerator (each route is generated 10 times) :

| benchmark | subject | groups | params | revs | iter | mem_peak | time_rev | comp_z_value | comp_deviation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| GeneratorBench | benchUrlGenerator | | [] | 1000 | 0 | 16,935,960b | 26,957.715μs | +1.18σ | +2.76% |
| GeneratorBench | benchUrlGenerator | | [] | 1000 | 1 | 16,935,960b | 25,458.014μs | -1.27σ | -2.96% |
| GeneratorBench | benchUrlGenerator | | [] | 1000 | 2 | 16,935,960b | 26,287.435μs | +0.09σ | +0.20% |
| GeneratorBench | benchStaticUrlGenerator | | [] | 1000 | 0 | 16,935,960b | 25,434.804μs | +1.10σ | +2.79% |
| GeneratorBench | benchStaticUrlGenerator | | [] | 1000 | 1 | 16,935,960b | 23,923.030μs | -1.32σ | -3.32% |
| GeneratorBench | benchStaticUrlGenerator | | [] | 1000 | 2 | 16,935,960b | 24,877.948μs | +0.21σ | +0.54% |

@stof
Member

stof commented Jan 24, 2018

@fancyweb benchmarking a dumper vs the other is irrelevant. Dumping happens during cache warmup, where we can spend time.

What we need is to compare the matching using the dumped matcher vs the matching using the static matcher with dumped routes. And this is what you don't have in your comment.

@nicolas-grekas
Member Author

nicolas-grekas commented Jan 24, 2018

@fancyweb: @stof is right, sorry if that wasn't clear enough. The perf comparison should be about the two URL matchers: the one generated by PhpMatcherDumper vs. a StaticUrlMatcher one.

Since bootstrap time should be considered, I think the way to bench this is to take a skeleton, load it with some fixture routes, and measure the response time before and after this PR.

@nicolas-grekas
Member Author

Matcher now tested, with the existing test cases from UrlMatcher.

nicolas-grekas force-pushed the route-static branch 3 times, most recently from 193a085 to f8c7b98, on January 24, 2018 22:02
@nicolas-grekas
Member Author

nicolas-grekas commented Jan 25, 2018

Assuming I tested correctly, on PHP 7.2 this is slower by a very tiny margin. The reason is simple: the matching loop has to do extra "ifs". But the difference does not grow with the table size and is barely measurable in practice, on both small and big tables. This means I did not hit the theoretical scalability issue with the current code. One might want to try with bigger tables (mine had ~1000 routes).

From a design point of view, I prefer a generic algorithm with data-driven input over dumped code. So I still think this should be seriously considered. E.g. by looking at the data, we can already see that there are more potential optimizations that could be driven by creating hash-maps (a lot of routes could just be put into hashes and would not require the "strpos" check beforehand). Doing so at the code generation level is possible, e.g. using a switch on PHP 7.2, or inline hashes. But working at the data level is much more comprehensible IMHO.
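
A minimal sketch of the hash-map idea, under the assumption that purely static routes can be looked up before any dynamic route is tried (that is, it ignores route ordering). The `$staticRoutes`/`$dynamicRoutes` layout and the `matchWithHashMap()` helper are hypothetical and not part of this PR.

```php
<?php

// Hypothetical data layout: static paths in a hash map, dynamic ones as regexes.
$staticRoutes = [
    '/about' => 'about',
    '/contact' => 'contact',
];
$dynamicRoutes = [
    ['#^/blog/(?P<slug>[^/]++)$#s', 'blog_show'],
];

function matchWithHashMap(array $staticRoutes, array $dynamicRoutes, string $pathinfo): ?string
{
    // Hash lookup: no strpos/preg_match needed for purely static paths.
    if (isset($staticRoutes[$pathinfo])) {
        return $staticRoutes[$pathinfo];
    }

    // Fall back to a linear scan over the dynamic routes.
    foreach ($dynamicRoutes as [$regex, $name]) {
        if (preg_match($regex, $pathinfo)) {
            return $name;
        }
    }

    return null;
}
```

Here `/about` is resolved by the hash lookup alone, while `/blog/hello` falls through to the regex scan.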

The last benefit of this approach is the name of the router, which is stable. This can also be achieved in the code-generation approach, by dumping an anonymous class, now that we can use them.

@linaori
Contributor

linaori commented Jan 25, 2018

I feel like this new method allows optimizations in a way the old one did not. Instead of deprecating the current, can we simply add this new compilation under a feature flag (for now), and try to optimize it before 4.1 is released? If it turns out we can't get it faster, we can always drop it before 4.1.

@stof
Member

stof commented Jan 25, 2018

> If it turns out we can't get it faster, we can always drop it before 4.1.

that's true even if we keep the existing approach. Until the RC release, we are free to drop this new class and undeprecate the old one if we decide to do it.

@stof
Member

stof commented Jan 25, 2018

> we can already see that there are more potential optimizations that could be driven by creating hash-maps (a lot of routes could just be put into hashes and would not require the "strpos" check beforehand). Doing so at the code generation level is possible, e.g. using a switch on PHP 7.2, or inline hashes. But working at the data level is much more comprehensible IMHO.

I'm not sure this holds true considering that the order of routes is important (as soon as a route has a dynamic part, it limits optimizations for any subsequent routes). And even if we find a way to rearrange routes to optimize them (figuring out that we can switch routes because their matching is totally exclusive and so the order does not matter), such rearranging would then be usable with the dumped class too, not only with the dumped array.
This makes me think that working on re-arranging routes to unlock more optimizations might be more useful than a rewrite of battle-tested working code.

And in practice, my project has 589 routes (including the WebProfiler ones as this is in dev), among which 439 have placeholders (and so cannot be caught by a map). This leaves 150 routes which could be matched by a hashmap lookup (and I think a re-arranging can move them all to the beginning given my URL convention and the fact that my requirements and route order are strict enough to avoid any ambiguity). So the re-arranging might be worth it (it is required for any useful hashmap lookup, as these 150 routes are spread over the full list, with dynamic routes appearing early; the _wdt one is even first).

For people wanting to perform a similar analysis, here is what I used as commands:

  • counting routes: bin/console debug:router | grep ANY | wc if you don't use the scheme requirement or the host requirement (and so all routes have ANY for them). Otherwise, run bin/console debug:router | wc and subtract 5 for the output lines that are not routes (table header and table delimiters)
  • counting dynamic routes: bin/console debug:router | grep '{' | wc
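
The same split can be sketched in PHP on a list of route paths. The `$paths` array below is made-up sample data, not output from a real project.

```php
<?php

// Hypothetical list of route paths, as shown in the Path column of `debug:router`.
$paths = ['/_wdt/{token}', '/about', '/blog/{slug}', '/contact'];

// A path with a `{placeholder}` is dynamic; the rest could go in a hash map.
$dynamic = array_filter($paths, function (string $path): bool {
    return false !== strpos($path, '{');
});

printf("%d routes, %d dynamic, %d static\n", count($paths), count($dynamic), count($paths) - count($dynamic));
// prints "4 routes, 2 dynamic, 2 static"
```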

> Assuming I tested correctly, on PHP 7.2 this is slower by a very tiny margin. The reason is simple: the matching loop has to do extra "ifs". But the difference does not grow with the table size and is barely measurable in practice, on both small and big tables. This means I did not hit the theoretical scalability issue with the current code. One might want to try with bigger tables (mine had ~1000 routes).

This makes me think that your promise (in the PR description) of making the router scale to a higher number of routes does not actually hold, as there is no performance difference.

@linaori
Contributor

linaori commented Jan 25, 2018

I know for a fact that sorting routes will break a lot of things, but if it improves performance, I'm sure I could change my routes so that they don't break. However, there will always be some fallback routes (like a wildcard /.*) that need to be matched last.

It almost sounds like a proper tree should be built and visualized so developers can detect URL problems and possibly optimize by using certain URLs.

@nicolas-grekas
Member Author

I'm closing because this demonstrated to me that this cannot be faster: traversing the static array is always going to be slower than loading + traversing the opcodes.
The implementation is finished so that if someone wants to borrow it in situations where dumping code is not possible, it's still possible.

@nicolas-grekas nicolas-grekas deleted the route-static branch January 26, 2018 08:29
@stof
Member

stof commented Jan 29, 2018

@iltar this is why I'm talking about reordering routes as an optimization when we can be sure they are exclusive (and so their order does not matter).

And when working on implementing such logic, building a debugging tool visualizing it might indeed make sense.

nicolas-grekas added a commit that referenced this pull request Jan 26, 2019
… without dumping PHP code (nicolas-grekas)

This PR was merged into the 4.3-dev branch.

Discussion
----------

[Routing] allow using compiled matchers and generators without dumping PHP code

| Q             | A
| ------------- | ---
| Branch?       | master
| Bug fix?      | no
| New feature?  | yes
| BC breaks?    | no
| Deprecations? | yes
| Tests pass?   | yes
| Fixed tickets | #29590
| License       | MIT
| Doc PR        | symfony/symfony-docs#10790

This is a resurrection of #25909 to make matcher+generator dumpers output PHP arrays instead of PHP code.
Don't be fooled by the diff stats, it's mostly fixtures.

This PR should contribute to making the Routing component easier to use standalone.

On the way back from SFLive USA.

![image](https://user-images.githubusercontent.com/243674/46920076-784e1b80-cf9d-11e8-86e7-850fffb409de.png)

Commits
-------

f0a519a [Routing] allow using compiled matchers and generators without dumping PHP code