[RFC] Transforming Intl data from JSON to PHP #23545

Closed
javiereguiluz opened this issue Jul 17, 2017 · 39 comments
Labels
Intl, Performance, RFC (Request For Comments: proposals about features that you want to be discussed)

Comments

@javiereguiluz
Member

| Q                | A
| ---------------- | ---
| Bug report?      | no
| Feature request? | yes
| BC Break report? | no
| RFC?             | yes
| Symfony version  | 3.4

I was profiling the Symfony Demo app, and I realized that json_decode() took a lot of time to execute:

[Image: intl-data-json-decode]

The reason is that Intl data is provided as JSON files (see https://github.com/symfony/symfony/tree/master/src/Symfony/Component/Intl/Resources/data).

Would it make sense to transform that data into PHP (using var_export()) to avoid parsing it and to get free caching thanks to OPCache? By the way, Intl already provides a reader and writer of this data for PHP, so we could use it (see https://github.com/symfony/symfony/blob/master/src/Symfony/Component/Intl/Data/Bundle/Reader/PhpBundleReader.php).
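As a rough illustration of the idea, the conversion could look like the sketch below. The helper name `convertJsonResourceToPhp()` and the sample file contents are hypothetical and not part of symfony/intl; the component's own reader and writer classes would be the real starting point.

```php
<?php
// Hypothetical helper (not part of symfony/intl): converts one JSON resource
// file into a PHP file that returns the same data as a plain array.
function convertJsonResourceToPhp(string $jsonPath, string $phpPath): void
{
    $json = file_get_contents($jsonPath);
    if (false === $json) {
        throw new RuntimeException(sprintf('Cannot read "%s".', $jsonPath));
    }

    // Decode to associative arrays so var_export() emits nested array literals.
    $data = json_decode($json, true);
    if (null === $data && JSON_ERROR_NONE !== json_last_error()) {
        throw new RuntimeException(sprintf('Invalid JSON in "%s".', $jsonPath));
    }

    // The generated file simply returns the array; OPcache can then cache it.
    $code = "<?php\n\nreturn ".var_export($data, true).";\n";
    if (false === file_put_contents($phpPath, $code)) {
        throw new RuntimeException(sprintf('Cannot write "%s".', $phpPath));
    }
}

// Usage with made-up sample data written to a temporary directory.
$dir = sys_get_temp_dir().'/intl_sketch_'.uniqid();
mkdir($dir);
file_put_contents($dir.'/en.json', '{"Names":{"GB":"United Kingdom","FR":"France"}}');

convertJsonResourceToPhp($dir.'/en.json', $dir.'/en.php');

// Loading is a plain include; no parsing happens in userland.
$names = include $dir.'/en.php';
echo $names['Names']['FR'], "\n";
```

The generated file contains nothing but a `return` statement with an array literal, which is the shape OPcache can keep in shared memory.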

@javiereguiluz javiereguiluz added Intl RFC RFC = Request For Comments (proposals about features that you want to be discussed) labels Jul 17, 2017
@mvrhov

mvrhov commented Jul 17, 2017

Bernhard did a test before deciding on a format: https://github.com/webmozart/json-res-benchmark

@javiereguiluz
Member Author

@mvrhov thanks for the reference! According to the benchmark results, JSON is the worst format for performance.

@mvrhov

mvrhov commented Jul 17, 2017

The discussion is here #11920

@javiereguiluz
Member Author

javiereguiluz commented Jul 17, 2017

@mvrhov in the discussion, Bernhard demonstrated that JSON is not only the slowest format ... but also the biggest in size (because, unlike PHP, it can't store UTF-8 characters directly). And all this was before the massive improvements introduced by PHP 7. The difference between PHP and JSON would be even bigger today.

@mvrhov

mvrhov commented Jul 17, 2017

I'm not against it, and I have no idea why @webmozart decided to use JSON in the end.

@nicolas-grekas
Member

👍 to do it, static arrays in shared memory rulez.

@jakzal
Contributor

jakzal commented Oct 25, 2017

👍 I'll have a look

@stof
Member

stof commented Oct 25, 2017

Would it be possible to reuse the benchmark from https://github.com/webmozart/json-res-benchmark to see what happens when using static arrays in shared memory?

@jakzal
Contributor

jakzal commented Oct 27, 2017

PHP 7.1.10

Memory & Peak Memory 2048kB in all cases.

Notes:

  • used en_GI instead of en_GB as the latter is no longer generated
  • loaded an additional meta file in each benchmark as the root file doesn't contain Alpha3ToNumeric anymore

Previous benchmarks in brackets.

Results (opcache disabled)

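| Benchmark | .json            | .json array      | .res             | .php
| --------- | ---------------- | ---------------- | ---------------- | ----------------
| Time      | 3.12ms (13.5ms)  | 2.87ms (11.8ms)  | 2.38ms (5.66ms)  | 7.60ms (25.25ms)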

Results (opcache enabled)

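| Benchmark | .json            | .json array      | .res             | .php
| --------- | ---------------- | ---------------- | ---------------- | ----------------
| Time      | 2.84ms (13.4ms)  | 2.69ms (11.8ms)  | 2.40ms (5.9ms)   | 1.12ms (10ms)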

@jakzal
Contributor

jakzal commented Oct 27, 2017

I think the reason why JSON was chosen might have been @bojanz's comment:

I've been told multiple times that if the definitions are in PHP, the memory spent on the included data can never be reclaimed (unlike JSON), which is a point against it.

I still think JSON is fast enough to justify being the only option.

and @rszrama's

I could be misunderstanding what you're benchmarking or how we expect these files to be used, but if it's similar to the conversation bojanz and I had last week, my primary point in favor of JSON was the simplicity of using the same data in both the server side and client side code (assuming the raw data was used on the front end for some client side formatting).

@javiereguiluz
Member Author

@jakzal what would be your recommendation? Switch to PHP? Keep things unchanged? Thanks!

@jakzal
Contributor

jakzal commented Oct 27, 2017

@javiereguiluz research in progress ;)

@jakzal
Contributor

jakzal commented Dec 19, 2017

@javiereguiluz do you remember exactly which page you were profiling?

@javiereguiluz
Member Author

I don't remember, so I profiled several pages again. The time spent on json_decode() is pretty constant, so it has a high impact on fast pages:

Blog index: http://symfony-demo.test/en/blog/
Profile: https://blackfire.io/profiles/f50e1db0-1024-409c-b606-a0cd7461a7b5/graph
Total time: 191 ms
json_decode: 3.21 ms (1.6% of total)

Blog show: http://symfony-demo.test/en/blog/posts/lorem-ipsum-dolor-sit-amet-consectetur-adipiscing-elit
Profile: https://blackfire.io/profiles/fda8f579-6961-4b2c-bee6-ebe3af7b53a2/graph
Total time: 150 ms
json_decode: 3.03 ms (2% of total)

Backend index: http://symfony-demo.test/en/admin/post/
Profile: https://blackfire.io/profiles/88526f65-2c68-4a16-9873-376d9e51d197/graph
Total time: 95.7 ms
json_decode: 2.82 ms (3% of total)

@jakzal
Contributor

jakzal commented Dec 20, 2017

I did some profiling too, but with a 2-3 ms overhead for json_decode it's hard to observe significant differences.

Still, my previous raw tests show that the PHP format is more than twice as fast as JSON (even if it's just 1.12ms vs 2.84ms).
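For anyone who wants to reproduce this kind of raw comparison, here is a minimal micro-benchmark sketch. It is not the webmozart/json-res-benchmark code: the synthetic data, the `timeLoader()` helper and the iteration count are assumptions made for illustration, and the numbers will depend heavily on whether OPcache is enabled (it usually is not on the CLI).

```php
<?php
// Synthetic data standing in for an Intl resource bundle (an assumption).
$data = [];
for ($i = 0; $i < 2000; ++$i) {
    $data['Names']['key_'.$i] = 'Value '.$i;
}

// Write the same data once as JSON and once as a PHP file returning an array.
$dir = sys_get_temp_dir().'/intl_bench_'.uniqid();
mkdir($dir);
$jsonFile = $dir.'/data.json';
$phpFile = $dir.'/data.php';
file_put_contents($jsonFile, json_encode($data));
file_put_contents($phpFile, "<?php\n\nreturn ".var_export($data, true).";\n");

// Hypothetical helper: average wall-clock time of one call, in milliseconds.
function timeLoader(callable $loader, int $iterations): float
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; ++$i) {
        $loader();
    }

    return (microtime(true) - $start) * 1000 / $iterations;
}

$loadJson = function () use ($jsonFile) {
    return json_decode(file_get_contents($jsonFile), true);
};
$loadPhp = function () use ($phpFile) {
    return include $phpFile;
};

printf("json: %.3f ms per load\n", timeLoader($loadJson, 200));
printf("php:  %.3f ms per load\n", timeLoader($loadPhp, 200));
```

Without OPcache the PHP file has to be compiled on every include, which is consistent with the "opcache disabled" results above where .php was the slowest format.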

@nicolas-grekas
Member

nicolas-grekas commented Dec 20, 2017

After being loaded, is the translation array ever modified? Like merged with others, or entries added before being used? If yes, that should be fixed first: this would kill all the potential benefit.

@jakzal
Contributor

jakzal commented Dec 20, 2017

It's returned right away without modifications: https://github.com/symfony/intl/blob/master/Data/Bundle/Reader/PhpBundleReader.php#L54

@mvrhov

mvrhov commented Dec 20, 2017

Will sr_Latn_BA.json still get merged with sr_Latn.json then?

@nicolas-grekas
Member

It's surprising that the memory usage doesn't drop, that's why I'm asking. This should be double-checked to ensure the return value itself is not triggering COW. (Sorry, I can't check now.)

@jakzal
Contributor

jakzal commented Nov 1, 2018

I reran the benchmarks.

PHP 7.3-rc

| Format | Time   | Memory | Memory Peak
| ------ | ------ | ------ | -----------
| res    | 2.26ms | 2048kB | 2048kB
| php    | 0.91ms | 2048kB | 2048kB
| json-a | 2.58ms | 2048kB | 2048kB
| json   | 2.69ms | 2048kB | 2048kB

PHP 7.2

| Format | Time   | Memory | Memory Peak
| ------ | ------ | ------ | -----------
| res    | 2.38ms | 2048kB | 2048kB
| php    | 1.07ms | 2048kB | 2048kB
| json-a | 2.73ms | 2048kB | 2048kB
| json   | 2.76ms | 2048kB | 2048kB

PHP 7.1

| Format | Time   | Memory | Memory Peak
| ------ | ------ | ------ | -----------
| res    | 2.36ms | 2048kB | 2048kB
| php    | 1.03ms | 2048kB | 2048kB
| json-a | 2.53ms | 2048kB | 2048kB
| json   | 2.71ms | 2048kB | 2048kB

PHP 7.0

| Format | Time   | Memory | Memory Peak
| ------ | ------ | ------ | -----------
| res    | 2.4ms  | 2048kB | 2048kB
| php    | 1.01ms | 2048kB | 2048kB
| json-a | 2.64ms | 2048kB | 2048kB
| json   | 2.85ms | 2048kB | 2048kB

PHP 5.6

| Format | Time   | Memory | Memory Peak
| ------ | ------ | ------ | -----------
| res    | 2.62ms | 512kB  | 512kB
| php    | 3.59ms | 1024kB | 1024kB
| json-a | 6.05ms | 1024kB | 1024kB
| json   | 6.84ms | 1024kB | 1024kB

PHP 5.5

| Format | Time   | Memory | Memory Peak
| ------ | ------ | ------ | -----------
| res    | 2.73ms | 512kB  | 512kB
| php    | 3.58ms | 768kB  | 1280kB
| json-a | 5.99ms | 1024kB | 1024kB
| json   | 8.23ms | 1024kB | 1024kB

@jakzal
Contributor

jakzal commented Nov 1, 2018

After being loaded, is the translation array ever modified? Like merged with others, or entries added before being used? If yes, that should be fixed first: this would kill all the potential benefit.

Entries are being merged in some cases: https://github.com/symfony/intl/blob/master/Data/Bundle/Reader/BundleEntryReader.php#L101-L122

@jakzal
Contributor

jakzal commented Mar 15, 2019

@javiereguiluz @nicolas-grekas what are your opinions on this? I'd like to either make the conversion soon, or close this :) Any more tests/benchmarks I could do?

@javiereguiluz
Member Author

If no one can foresee technical issues by making this change, I'd say that the conversion from JSON to PHP is a no brainer. You get a significant performance improvement "for free".

@Simperfit
Contributor

I agree with @javiereguiluz if it's a free performance gain and it does not change anything in the memory spent then this can be done ;). cc @jakzal

@Pierstoval
Contributor

I made an app gain a +1000% performance boost on average by changing some config files from JSON to PHP.

Costless, painless, performance gain.

@Pierstoval
Contributor

And by the way, it seems that it's only a few lines of changes in the update-data.php script and the ResourceBundle class to default to PHP instead of JSON, so this seems to be a true no-brainer and clearly an easy-pick

@jakzal
Contributor

jakzal commented Aug 8, 2019

@Pierstoval thank you for sharing your results. Impressive!

Thank you for reminding me about this. It's been on my to-do list for a long time. Let me submit a PR to finally make this happen!

@Simperfit
Contributor

Nice thank you @jakzal

@Pierstoval
Contributor

Pierstoval commented Aug 8, 2019

@jakzal I started the work on my machine (this is also why I noticed the change is not "that big"), if you want I can submit it :)

@jakzal
Contributor

jakzal commented Aug 8, 2019

@Pierstoval it's fine. I also started it, just need to refresh and run performance tests again.

@fancyweb
Contributor

@jakzal @Pierstoval What is the state of your work on this issue? Is there a chance to have it in 4.4?

@jakzal
Contributor

jakzal commented Oct 29, 2019

Sorry for the delay. I was waiting for a new ICU release (which has happened now). I'll rerun my tests this weekend and send a PR.

@OskarStark
Contributor

It’s a very interesting thread, thank you all for the discussions, benchmarks and feedback.

I have sometimes read that using named classes with public properties is the "fastest" choice. Sorry, I cannot find the source, and to be honest I am not that deep into those topics. But would you consider using a class over an associative array?
If yes/no, why?

Thanks for the info 👍🏻

@Pierstoval
Contributor

Named classes can be faster because objects are always passed by reference, whereas arrays are passed by value, so this may improve performance a bit. However, with copy-on-write for arrays and the engine's array optimizations in recent PHP versions, I'm not sure it's really useful to use a class rather than a plain PHP array.

Maybe PHP experts could correct me if I'm wrong

@nicolas-grekas
Member

Arrays will always be faster thanks to copy-on-write. Classes in PHP 7.4 might be as fast, but not in earlier versions. Of course, this supposes we do NO operations on the loaded arrays. Anything that would trigger COW would break the perf benefit.
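To make the copy-on-write point concrete, here is a small sketch. The `loadNames()` function and its data are hypothetical stand-ins for an included resource file that returns a constant array; the comments describe PHP 7+ behaviour as I understand it.

```php
<?php
// Hypothetical loader: returns a constant array literal, much like an
// included resource file would.
function loadNames(): array
{
    return ['GB' => 'United Kingdom', 'FR' => 'France'];
}

$names = loadNames();     // No copy: the variable points at the shared array.
$readOnly = $names;       // Still no copy: both variables share one array.
$label = $readOnly['FR']; // Reads never trigger copy-on-write.

$modified = $names;
$modified['DE'] = 'Germany'; // A write forces PHP to duplicate the array first.

echo $label, "\n";
echo count($names), ' vs ', count($modified), "\n";
```

Functions such as `array_merge()` also build a new array, so merging loaded entries gives up the benefit of sharing the original data in the same way as the write above.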

@OskarStark
Contributor

Thank you very much for the clarification 🙂

@carsonbot

Thank you for this suggestion.
There has not been a lot of activity here for a while. Would you still like to see this feature?

@carsonbot

Could I get an answer? If I do not hear anything I will assume this issue is resolved or abandoned. Please get back to me <3

@carsonbot

Hey,

I didn't hear anything so I'm going to close it. Feel free to comment if this is still relevant, I can always reopen!

@jakzal jakzal reopened this Jan 15, 2021
@Nyholm Nyholm removed the Stalled label Feb 22, 2021
nicolas-grekas added a commit that referenced this issue Apr 14, 2021
This PR was merged into the 5.3-dev branch.

Discussion
----------

[Intl] Switch from json to php resources

| Q             | A
| ------------- | ---
| Branch?       | 5.x
| Bug fix?      | no
| New feature?  | yes
| Deprecations? | no
| Tickets       | Fix #23545
| License       | MIT
| Doc PR        | -

take over #34214

Commits
-------

24bfc3b [Intl] Switch from json to php resources