[Cache] serialize objects using native arrays when possible #27543
https://gist.github.com/palex-fpt/82fc3deed09c2de2a9023080a9613ce3
There is native support for var_export serialization: classes implementing this method should be able to be restored from their var_export-ed form.
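The method name is missing from the comment above; a later comment in this thread mentions `__set_state`, which is the static method `var_export()` emits a call to when it exports an object. A minimal sketch of that round trip, assuming a hypothetical `Money` class that is not part of this PR, with `eval()` standing in for including a generated PHP file:

```php
<?php
// Hypothetical value object, used only to illustrate the var_export() round trip.
final class Money
{
    public $amount;
    public $currency;

    public function __construct(int $amount, string $currency)
    {
        $this->amount = $amount;
        $this->currency = $currency;
    }

    // var_export() emits a call to this static method when exporting an object.
    public static function __set_state(array $properties): self
    {
        return new self($properties['amount'], $properties['currency']);
    }
}

// $code holds PHP source that calls Money::__set_state() with the property array.
$code = var_export(new Money(500, 'EUR'), true);

// Evaluating the exported source rebuilds an equivalent object.
$restored = eval("return $code;");
```

A class without `__set_state` can still be exported, but evaluating the exported code fails, which is why a generic cache cannot rely on this mechanism alone.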
I think I can make the marshaller handle these. Let me give it a try.
Here we are, the generated PHP files now use
} catch (\Exception $e) {
}
if (null !== $e) {
} catch (\Throwable $e) {
this change should actually be done in the 4.0 branch (lots of places were cleaned already, but it looks like a bunch of them were missed)
// Store arrays serialized if they contain any objects or references
if ($unserialized !== $value || (false !== strpos($serialized, ';R:') && preg_match('/;R:[1-9]/', $serialized))) {
// Keep value serialized if it contains any "Serializable" objects or any internal references
if (0 === strpos($serialized, 'C:') || preg_match('/;[CRr]:[1-9]/', $serialized)) {
do we need to check the first 2 chars, or would `$serialized[0]` be enough? The second char would always be a colon AFAICT (all PHP types are represented using a single char in the serialization format AFAIK)
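To make the check under review concrete, here is a small sketch that applies the same condition to a few serialized payloads. The function name `mustStaySerialized` is hypothetical; only the condition itself comes from the diff above.

```php
<?php
// Hypothetical helper wrapping the condition from the diff: a payload must stay
// in PHP-serialized form when it starts with a "Serializable" object ("C:") or
// contains a nested "C:" entry or an internal reference ("R:" / "r:").
function mustStaySerialized(string $serialized): bool
{
    return 0 === strpos($serialized, 'C:') || 1 === preg_match('/;[CRr]:[1-9]/', $serialized);
}

// Scalars only: no reference or "C:" marker appears in the payload.
$plain = serialize([1, 'a']);

// The same object stored twice: the second slot is serialized as "r:<n>".
$shared = new stdClass();
$twice = serialize([$shared, $shared]);

// A PHP reference between two array slots is serialized as "R:<n>".
$values = [1];
$values[1] = &$values[0];
$withReference = serialize($values);

var_dump(mustStaySerialized($plain));         // no marker found
var_dump(mustStaySerialized($twice));         // ";r:" marker found
var_dump(mustStaySerialized($withReference)); // ";R:" marker found
```

Regarding the question above: since every type tag in the format is a single character followed by a colon, comparing `$serialized[0]` with `'C'` should give the same answer as the two-character `strpos()` test for the leading case.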
* file that was distributed with this source code.
*/

namespace Symfony\Component\Cache\Traits;
having a class in the `Traits` namespace looks confusing to me.
I get that, yet that's internal and I don't think this deserves a dedicated namespace (nor being at the root), so here it is :)
$class = \get_class($value);
$data = array(self::COOKIE => $class);

if (self::$sleep[$class] ?? self::$sleep[$class] = \method_exists($class, '__sleep')) {
is the key access in the static property faster than `\method_exists($class, '__sleep')` with its special opcode?
confirmed: isset is 3x faster than method_exists here
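The pattern being discussed memoizes the result of `method_exists()` per class in a static array, so that repeated lookups become a plain array access. A minimal sketch, assuming hypothetical names (`SleepCheck`, `hasSleep`, `WithSleep`) that do not come from the PR:

```php
<?php
// Hypothetical wrapper around the memoization idiom shown in the diff.
final class SleepCheck
{
    /** @var array<string, bool> cached method_exists() results, keyed by class name */
    private static $sleep = array();

    public static function hasSleep(string $class): bool
    {
        // "??" only falls through when the key is missing (a cached false is
        // not null), so method_exists() runs at most once per class.
        return self::$sleep[$class] ?? (self::$sleep[$class] = \method_exists($class, '__sleep'));
    }
}

// Hypothetical class that customizes which properties get serialized.
final class WithSleep
{
    public $kept = 1;
    public $dropped = 2;

    public function __sleep()
    {
        return array('kept');
    }
}

var_dump(SleepCheck::hasSleep(WithSleep::class)); // computed, then cached
var_dump(SleepCheck::hasSleep(WithSleep::class)); // served from the static array
var_dump(SleepCheck::hasSleep(\stdClass::class));
```

The 3x figure quoted above is the reviewer's own measurement; this sketch only shows the idiom and makes no claim about timing.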
Now green, comments addressed @stof, thanks.
Is there any chance of having an extension point to switch between PHP serialization and igbinary?
I managed to remove the unmarshaller part of this PR, and leverage Combined with #27549, this results in a significantly faster value loader that benefits from OPcache shared memory as much as possible.
What would be the benefit?
I wrote a benchmark to measure object load times: https://gist.github.com/palex-fpt/121a24e53ded4a0f6bb72307fc823a50
TBH I would like PhpFilesTrait to look like:
This way we would move all responsibility for preparing 'the best' serialization form to the serializer, leaving PhpFilesTrait with file handling and OPcache usage.
I was surprised by the speed of That leaves us with the marshaller as the best solution, great :)
Oops. My mistake. I updated the gist. Igbinary looks like the winner. And it uses three times less space.
I tested with a range of options. When the data uses a lot of object instances, it loses to igbinary. When the data is a composition of scalars, PhpMarshaller is the winner. It would be good to have an option to choose the marshaller based on the data used.
actually, it doesn't to me, because this is missing a very important property of the native php format: it leverages php shared memory. This means the php format uses zero extra memory past the first request. This also means zero memory transfer to access the data: you just manipulate pointers to shared memory under the hood. There is a pathological test case that illustrates this:

error_reporting(-1);
require 'vendor/autoload.php';
use Symfony\Component\Cache\Adapter as p;

$cache = new p\PhpFilesAdapter();
//$cache = new p\ApcuAdapter();
$mem = memory_get_usage();
$start = microtime(true);
$i = 10000;
$values = array();
while (--$i) {
    $values[] = $cache->get('foo', function ($item) {
        return str_repeat('-', 10000);
    });
}
echo 1000*(microtime(true) - $start), "ms\n";
echo memory_get_peak_usage() - $mem, "\n";

Takes 23ms + 123MB with Apcu (~mimics serialize/igbinary)
PR now tested and green. Ready. @palex-fpt now that this is faster than igbinary there is no reason anymore to allow any sort of extensibility here. On the contrary, I prefer this to be closed and remain an internal detail.
Do you have benchmarks? tree depth 3, values: 50, iterations: 1000
With my typical scenario (one access to a key-value pair per request) it does not look good. https://gist.github.com/palex-fpt/913fbe1b1def170c2785d3e26ce4f77e
It looks like the first call to the 'get' method is performed against an empty opcache. The 'set' method should populate the opcache.
To me, this looks really good: store
done
OPcache is the fastest available cache with node locality. We reset OPcache on a node on each 'switch to new build' deployment, which happens at least once per day. Any frequently accessed data whose change rate is lower than the deployment rate is a perfect candidate to be stored in OPcache. Limiting OPcache to append-only forces the use of slower caches or unnecessary redeployments on data change. For example, should a use case that allows changing the title of a site directory (which happens once per year or never) forbid using OPcache for site directory attributes? Caching retrieved data adds some burden to backend services. A service that enumerates over a large portion of opcached objects can go OOM, but that can be handled with $cache->reset() on each iteration. I'm OK with the current PR. It is a big step in performance compared to opcaching serialized strings. Further improvements can be done in separate PRs.
Great, PR is ready then :)
Thank you @nicolas-grekas.
…sible (nicolas-grekas)

This PR was merged into the 4.2-dev branch.

Discussion
----------

[Cache] serialize objects using native arrays when possible

| Q             | A
| ------------- | ---
| Branch?       | master
| Bug fix?      | no
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | -
| License       | MIT
| Doc PR        | -

This PR allows leveraging OPCache shared memory when storing objects in `Php*` pool storages (as done by default for all system caches). This improves performance a bit further when loading e.g. annotations, etc. (bench coming);

Instead of using native php serialization, this uses a marshaller that represents objects in plain static arrays. Unmarshalling these arrays is faster than unserializing the corresponding PHP strings (because it works with copy-on-write, while unserialize cannot.)

php-serialization is still a possible format because we have to use it when serializing structures with internal references or with objects implementing `Serializable`. The best serialization format is selected automatically so this is completely seamless.

ping @palex-fpt since you gave me the push to work on this, and are pursuing a similar goal in #27484. I'd be thrilled to get some benchmarks on your scenarios.

Commits
-------

866420e [Cache] serialize objects using native arrays when possible
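As a rough illustration of the idea described in the PR (representing an object as a plain array that can be written as a static literal in a PHP file), here is a deliberately simplified sketch. It is not the marshaller from this PR: the function names, the `__class` marker key and the `Point` class are all hypothetical, and only public properties are handled.

```php
<?php
// Hypothetical marker key holding the class name inside the exported array.
const CLASS_KEY = '__class';

// Turn an object into a plain array: class name plus public property values.
function marshalObject(object $object): array
{
    return array(CLASS_KEY => \get_class($object)) + \get_object_vars($object);
}

// Rebuild the object from the array without running its constructor.
function unmarshalObject(array $data): object
{
    $class = $data[CLASS_KEY];
    unset($data[CLASS_KEY]);

    $object = (new \ReflectionClass($class))->newInstanceWithoutConstructor();
    foreach ($data as $name => $value) {
        $object->$name = $value;
    }

    return $object;
}

// Hypothetical class with public state only, to keep the sketch short.
final class Point
{
    public $x = 0;
    public $y = 0;
}

$point = new Point();
$point->x = 3;
$point->y = 4;

$marshalled = marshalObject($point);

// This is the kind of static array literal that could be written to a PHP file.
echo var_export($marshalled, true), "\n";

$restored = unmarshalObject($marshalled);
```

Because the exported form is a constant array of scalars, OPcache can keep it in shared memory, which is the property the PR description relies on. Handling private properties, nested objects and `__sleep`/`__wakeup` is left out of this sketch.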
As it does not support objects with __set_state and cannot be overridden to add that support, I cannot benchmark it against my configuration. I hope it can be done later.