
[Cache] serialize objects using native arrays when possible #27543


Merged
merged 1 commit into symfony:master on Jun 18, 2018

Conversation


@nicolas-grekas nicolas-grekas commented Jun 7, 2018

| Q             | A
| ------------- | ---
| Branch?       | master
| Bug fix?      | no
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | -
| License       | MIT
| Doc PR        | -

This PR allows leveraging OPcache shared memory when storing objects in Php* pool storages (as done by default for all system caches). This improves performance a bit further when loading e.g. annotations, etc. (bench coming).

Instead of using native php serialization, this uses a marshaller that represents objects in plain static arrays. Unmarshalling these arrays is faster than unserializing the corresponding PHP strings (because it works with copy-on-write, while unserialize cannot.)

php-serialization is still a possible format because we have to use it when serializing structures with internal references or with objects implementing Serializable. The best serialization format is selected automatically so this is completely seamless.
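
As a rough, hand-written sketch of the idea (the class and method names below are illustrative, not the PR's actual code), such a marshaller flattens an object into a plain array and rebuilds it without going through serialize():

<?php
// Illustrative sketch only, not the PR's actual code: flatten an object into a
// plain array and rebuild it without serialize(). For simplicity this handles
// public properties only; the real marshaller also covers private/protected
// properties, __sleep, internal references, etc.
class ArrayMarshaller
{
    public function marshall($object): array
    {
        return array('class' => \get_class($object), 'properties' => (array) $object);
    }

    public function unmarshall(array $data)
    {
        $object = (new \ReflectionClass($data['class']))->newInstanceWithoutConstructor();

        foreach ($data['properties'] as $name => $value) {
            $object->$name = $value;
        }

        return $object;
    }
}

Because the marshalled form is a plain static array, var_export() can dump it into an OPcache-friendly PHP file, and loading it back benefits from copy-on-write.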

ping @palex-fpt since you gave me the push to work on this, and are pursuing a similar goal in #27484. I'd be thrilled to get some benchmarks on your scenarios.

@palex-fpt

I'd be thrilled to get some benchmarks on your scenarios.

https://gist.github.com/palex-fpt/82fc3deed09c2de2a9023080a9613ce3
largeObject - it is serialized; igbinary gives a 2x boost over php serialize.
largeExportableObject - in var_export form it loads 10x+ faster than when serialized.

@palex-fpt

Instead of using native php serialization, this uses a marshaller that represents objects in plain static arrays. Unmarshalling these arrays is faster than unserializing the corresponding PHP strings (because it works with copy-on-write, while unserialize cannot.)

There is native support for var_export serialization:
http://php.net/manual/en/language.oop5.magic.php#object.set-state

Classes implementing this method should be restorable from their var_export-ed form.
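
For reference, __set_state() is the hook that var_export()-ed code calls when it is evaluated again; a minimal, self-contained example (the Money class is made up for illustration):

<?php
// Minimal illustration of __set_state(): var_export() emits code that calls it.
class Money
{
    public $amount;
    public $currency;

    public static function __set_state(array $properties)
    {
        $money = new self();
        $money->amount = $properties['amount'];
        $money->currency = $properties['currency'];

        return $money;
    }
}

$money = new Money();
$money->amount = 10;
$money->currency = 'EUR';

// Writes something like: <?php return \Money::__set_state(array('amount' => 10, ...));
file_put_contents('/tmp/money.cache.php', '<?php return '.var_export($money, true).';');

$restored = require '/tmp/money.cache.php'; // a Money instance again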

@nicolas-grekas
Member Author

There is native support to var_export serialization:

I think I can make the marshaller handle these. Let me give it a try.
Thanks for the bench also.

@nicolas-grekas
Member Author

Here we are, the generated PHP files now use __set_state when possible.

} catch (\Exception $e) {
}
if (null !== $e) {
} catch (\Throwable $e) {
Member

this change should actually be done in the 4.0 branch (lots of places were cleaned already, but it looks like a bunch of them were missed)

// Store arrays serialized if they contain any objects or references
if ($unserialized !== $value || (false !== strpos($serialized, ';R:') && preg_match('/;R:[1-9]/', $serialized))) {
// Keep value serialized if it contains any "Serializable" objects or any internal references
if (0 === strpos($serialized, 'C:') || preg_match('/;[CRr]:[1-9]/', $serialized)) {
Member

do we need to check the first 2 chars, or would $serialized[0] be enough? The second char would always be a colon AFAICT (all PHP types are represented using a single char in the serialization format AFAIK)
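
For context, the prefixes that check targets look like this (a quick illustration; output comments abbreviated by hand):

<?php
// Objects implementing Serializable are serialized with the "C:" prefix:
echo serialize(new ArrayObject(array(1, 2, 3))), "\n"; // C:11:"ArrayObject":...

// Plain objects use "O:", scalars use other single-letter prefixes:
echo serialize(new stdClass()), "\n"; // O:8:"stdClass":0:{}

// Internal references show up as ";R:<n>":
$shared = 'shared';
$data = array('a' => &$shared, 'b' => &$shared);
echo serialize($data), "\n"; // a:2:{s:1:"a";s:6:"shared";s:1:"b";R:2;}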

* file that was distributed with this source code.
*/

namespace Symfony\Component\Cache\Traits;
Member

having a class in the Traits namespace looks confusing to me.

Member Author

I get that, yet that's internal and I don't think this deserves a dedicated namespace (nor being at the root), so here it is :)

$class = \get_class($value);
$data = array(self::COOKIE => $class);

if (self::$sleep[$class] ?? self::$sleep[$class] = \method_exists($class, '__sleep')) {
Member

is the key access in the static property faster than \method_exists($class, '__sleep') with its special opcode?

Member Author

confirmed: isset is 3x faster than method_exists here
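
The construct under discussion memoizes the method_exists() result per class in a static array, so only the first lookup for a class pays the function-call cost; a standalone sketch of that pattern (the class name is illustrative):

<?php
// Illustration of memoizing method_exists() results per class, as in the line above.
final class SleepChecker
{
    private static $hasSleep = array();

    public static function check(string $class): bool
    {
        // The ?? short-circuits to the cached entry; method_exists() only runs
        // the first time a class is seen.
        return self::$hasSleep[$class] ?? self::$hasSleep[$class] = \method_exists($class, '__sleep');
    }
}

// Subsequent calls for the same class hit only the static array.
var_dump(SleepChecker::check(\ArrayObject::class));
var_dump(SleepChecker::check(\ArrayObject::class));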

@nicolas-grekas
Member Author

Now green, comments addressed @stof, thanks.

@palex-fpt

Is there any chance to have an extension point to switch between php serialization and igbinary?
IMHO it would be good to have a method that an inheriting class could override to map variable types to the marshalling method used.

@nicolas-grekas nicolas-grekas force-pushed the cache-marshall branch 5 times, most recently from 006af24 to 4953591 Compare June 8, 2018 20:43
@nicolas-grekas
Member Author

nicolas-grekas commented Jun 8, 2018

I managed to remove the unmarshaller part of this PR and leverage __set_state instead.
This allows generating highly performant code that needs no userland logic, only minimal PHP code.
To avoid instantiating all objects in the pool for PhpArray* adapters, and to skip the wrapping logic around include for PhpFiles* adapters, this code is wrapped in closures that are called on demand.
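
Roughly, the generated entries described here take a shape like the following (a hand-written illustration using a made-up App\Config class, not the PR's exact output):

<?php
// Illustrative shape of a generated cache file: the value is wrapped in a
// closure, so the object graph is only rebuilt when the entry is actually read.
// "App\Config" is a hypothetical class assumed to implement __set_state().
return array(
    9999999999, // expiration timestamp
    static function () {
        return \App\Config::__set_state(array(
            'debug' => false,
            'locale' => 'en',
        ));
    },
);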

Combined with #27549, this results in a significantly faster value loader that benefits from OPcache shared memory as much as possible.

Is there any chance to have an extension point to switch between php serialization and igbinary?

What would be the benefit? serialize() is now used only in specific situations: when a structure contains internal references or when objects implementing Serializable are found. We could use igbinary() instead in these situations, but I would hardcode it to make things simpler. WDYT?

@nicolas-grekas nicolas-grekas force-pushed the cache-marshall branch 2 times, most recently from 467f0ae to eea30e0 Compare June 8, 2018 22:52
@palex-fpt

I wrote a benchmark to measure object load times: https://gist.github.com/palex-fpt/121a24e53ded4a0f6bb72307fc823a50

@palex-fpt

palex-fpt commented Jun 9, 2018

TBH I would like PhpFilesTrait to look like this:

// doSave()
...
            $ok = $this->write($file, '<?php return '.var_export(array($lifetime, $this->serialize($value)), true).';') && $ok;
...
// doFetch()
...
                    list($expiresAt, $values[$id]) = $this->unserialize(include $file);

This way we would move all responsibility for preparing 'the best' serialization form to the serializer, leaving PhpFilesTrait with only file handling and opcache usage.
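
The $this->serialize()/$this->unserialize() calls in the snippet are hypothetical hook methods, not existing Symfony API; a subclass-overridable pair might look like this (purely illustrative):

<?php
// Hypothetical hook methods assumed by the snippet above; a subclass could
// override them to switch to igbinary or to keep var_export-friendly structures.
trait SerializerHooks
{
    /** Turns a value into something var_export() can safely dump. */
    protected function serialize($value)
    {
        return serialize($value);
    }

    /** Reverses serialize() on the payload part of the included array. */
    protected function unserialize(array $data): array
    {
        list($expiresAt, $payload) = $data;

        return array($expiresAt, unserialize($payload));
    }
}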

@nicolas-grekas
Member Author

nicolas-grekas commented Jun 9, 2018

I wrote a benchmark to measure object load times: https://gist.github.com/palex-fpt/121a24e53ded4a0f6bb72307fc823a50

I was surprised by the speed of var_export so I looked at it more closely: the script uses var_export on already var_exported data, so PHP just has to load a simple string. That's not what you wanted to bench, of course. When removing this double var_export, you get a fatal error Call to undefined method stdClass::__set_state(), which is what I would have expected.

That leaves us with the marshaller as the best solution, great :)

@palex-fpt

palex-fpt commented Jun 9, 2018

Oops, my mistake. I updated the gist. Igbinary looks like the winner, and it uses three times less space.

@palex-fpt

I tested with a range of options. When the data uses a lot of object instances, it loses to igbinary. When the data is a composition of scalars, PhpMarshaller is the winner. It would be good to have an option to choose the marshaller based on the data in use.

@nicolas-grekas
Member Author

nicolas-grekas commented Jun 9, 2018

Igbinary looks like the winner, and it uses three times less space.

Actually, it doesn't look like the winner to me, because this misses a very important property of the native php format: it leverages OPcache shared memory. This means the php format uses zero extra memory past the first request. It also means zero memory copies to access the data: you just manipulate pointers to shared memory under the hood.

There is a pathological test case that illustrates this:
php -dopcache.enable_cli=1 -dapc.enable_cli=1 test.php

<?php

error_reporting(-1);
require 'vendor/autoload.php';

use Symfony\Component\Cache\Adapter as p;

$cache = new p\PhpFilesAdapter();
//$cache = new p\ApcuAdapter();

$mem = memory_get_usage();
$start = microtime(true);
$i = 10000;
$values = array();

while (--$i) {
    $values[] = $cache->get('foo', function ($item) {
        return str_repeat('-', 10000);
    });
}

echo 1000*(microtime(true) - $start), "ms\n";
echo memory_get_peak_usage() - $mem, "\n";

Takes 23ms + 123MB with Apcu (~mimics serialize/igbinary)
And 13ms + 555KB with native PHP.

@nicolas-grekas
Member Author

nicolas-grekas commented Jun 13, 2018

PR now tested and green. Ready.

@palex-fpt now that this is faster than igbinary, there is no reason anymore to allow any sort of extensibility here. On the contrary, I prefer this to stay closed and remain an internal detail.

@palex-fpt

palex-fpt commented Jun 14, 2018

Do you have benchmarks?

tree depth 3, values: 50, iterations: 1000

| test | store | first get | 2nd get
| --- | --- | --- | ---
| ArrayCache (nonserialized) | 15210.6 | 17.9 | 5.7
| ArrayCache (serialized) | 390256.1 | 305.5 | 299.7
| FilesystemCache | 797541.0 | 391.4 | 355.8
| PhpFilesCache | 1432501.9 | 1240.0 | 83.7

With my typical scenario (one access to a key-value pair per request) it does not look good.
As I tested against your branch, it's hard to configure the cache with igbinary here.

https://gist.github.com/palex-fpt/913fbe1b1def170c2785d3e26ce4f77e

@palex-fpt

palex-fpt commented Jun 14, 2018

It looks like the first call to the 'get' method is performed against an empty opcache. The 'set' method should populate opcache.

@nicolas-grekas
Member Author

nicolas-grekas commented Jun 14, 2018

To me, this looks really good: store and 1st get being slow calls is expected, as caches have a much higher read rate than write rate. For PhpFilesAdapter, that's even more the case: since it leverages opcache, it should be used in append-only scenarios. The reason is that opcache itself is append-only: it never frees memory until its buffer is full, in which case it just empties it and starts over. Obviously, this should never happen. This means that if one plans to store data that can expire/be deleted at some non-zero rate, this is not something we should encourage/support.
See http://blog.jpauli.tech/2015/03/05/opcache.html for details about this.
The best use case for PhpFilesAdapter is for system caches, and this is where we use it. In this scenario, data is append-only, which means per-request locality is also a great optimization with no downside.
With this reasoning, if you want to use igbinary, then this should be done with FilesystemAdapter, which your other PR will allow. The reason is that if your data is append-only, then there is no better backend than PhpFilesAdapter; and if not, then opcache is not a good fit, unless you're OK with emptying its memory periodically by design, which is a scenario I wouldn't support (so I have no incentive to add code to handle igbinary there).
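
A minimal sketch of the split being recommended here, assuming the standard Symfony adapters (the namespaces, lifetimes and directory paths are made up):

<?php
require 'vendor/autoload.php';

use Symfony\Component\Cache\Adapter\FilesystemAdapter;
use Symfony\Component\Cache\Adapter\PhpFilesAdapter;

// Append-only data (annotations, validator metadata, ...): opcache-backed storage.
// Note: PhpFilesAdapter may require OPcache to be enabled to be usable.
$systemCache = new PhpFilesAdapter('system', 0, '/tmp/cache/system');

// Data that expires or gets deleted at a non-zero rate: plain filesystem storage.
$appCache = new FilesystemAdapter('app', 3600, '/tmp/cache/app');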

@nicolas-grekas nicolas-grekas force-pushed the cache-marshall branch 2 times, most recently from 7845a55 to d45b23e Compare June 14, 2018 07:35
@nicolas-grekas
Member Author

'set' method should populate opcache.

done

@palex-fpt

The reason is that opcache itself is append-only: it never frees memory until its buffer is full, in which case it just empties it and starts over. Obviously, this should never happen. This means that if one plans to store data that can expire/be deleted at some non-zero rate, this is not something we should encourage/support.

Opcache is the fastest available cache with node locality. We reset opcache on a node on each 'switch to new build' deployment, which happens at least once per day. Any frequently accessed data whose change rate is lower than the deployment rate is a perfect candidate to be stored in opcache. Limiting opcache to append-only forces the use of slower caches, or unnecessary redeployments on data changes. For example, should a use case that allows changing the title of a site directory (which happens once per year or never) forbid using opcache for the site directory attributes?

Caching retrieved data adds some burden to backend services. A service that enumerates over a large portion of the opcached objects can go OOM, but that can be handled with $cache->reset() on each iteration.
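
An illustration of that workaround: freeing the per-request copies while enumerating many cached entries. This assumes the adapter at hand exposes reset() (Symfony's ResettableInterface); $keys, process() and the callback are made up:

<?php
foreach ($keys as $key) {
    $value = $cache->get($key, function () { return null; });
    process($value); // hypothetical consumer

    // Drop local references so memory does not accumulate across iterations.
    $cache->reset();
}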

I'm OK with the current PR. It is a big step in performance over opcaching serialized strings. Further improvements can be done in separate PRs.

@nicolas-grekas nicolas-grekas force-pushed the cache-marshall branch 2 times, most recently from 8c5cb74 to 45b6e35 Compare June 15, 2018 09:17
@nicolas-grekas
Member Author

Great, PR is ready then :)
ping @symfony/deciders

@nicolas-grekas nicolas-grekas force-pushed the cache-marshall branch 2 times, most recently from d7b1a72 to f67f900 Compare June 18, 2018 08:14
@fabpot
Member

fabpot commented Jun 18, 2018

Thank you @nicolas-grekas.

@fabpot fabpot merged commit 866420e into symfony:master Jun 18, 2018
fabpot added a commit that referenced this pull request Jun 18, 2018
…sible (nicolas-grekas)

This PR was merged into the 4.2-dev branch.

Discussion
----------

[Cache] serialize objects using native arrays when possible

| Q             | A
| ------------- | ---
| Branch?       | master
| Bug fix?      | no
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | -
| License       | MIT
| Doc PR        | -

This PR allows leveraging OPCache shared memory when storing objects in `Php*` pool storages (as done by default for all system caches). This improves performance a bit further when loading e.g. annotations, etc. (bench coming);

Instead of using native php serialization, this uses a marshaller that represents objects in plain static arrays. Unmarshalling these arrays is faster than unserializing the corresponding PHP strings (because it works with copy-on-write, while unserialize cannot.)

php-serialization is still a possible format because we have to use it when serializing structures with internal references or with objects implementing `Serializable`. The best serialization format is selected automatically so this is completely seamless.

ping @palex-fpt since you gave me the push to work on this, and are pursuing a similar goal in #27484. I'd be thrilled to get some benchmarks on your scenarios.

Commits
-------

866420e [Cache] serialize objects using native arrays when possible
@nicolas-grekas nicolas-grekas deleted the cache-marshall branch June 18, 2018 16:13
@palex-fpt

since you gave me the push to work on this, and are pursuing a similar goal in #27484. I'd be thrilled to get some benchmarks on your scenarios.

As it does not support objects with __set_state and cannot be overridden to add that support, I cannot benchmark it against my configuration. I hope it can be done later.
