[Cache] Allow to configure serializator for cache instances. #27484

Closed
wants to merge 1 commit into from

palex-fpt

| Q             | A
| ------------- | ---
| Branch?       | master
| Bug fix?      | no
| New feature?  | yes
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | -
| Fixed tickets | -
| License       | MIT
| Doc PR        | -

Extract object serialization logic into SerializerInterface.
Allow opcache related caches to store __set_state enabled objects without serialization.


use Symfony\Component\Cache\SerializerInterface;

class NullSerializer implements SerializerInterface
Contributor

Maybe this should be called IdentitySerializer, I expected a NullSerializer to always return null.
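
To make the naming point concrete, here is a minimal sketch of the kind of contract the PR description talks about. The interface below is a hypothetical stand-in: the actual `SerializerInterface` signature is not shown in this thread, so the method names `serialize()` and `unserialize()` are assumptions.

```php
<?php

// Hypothetical stand-in for the PR's SerializerInterface; the real
// method signatures are not visible in this thread.
interface SketchSerializerInterface
{
    public function serialize($value);

    public function unserialize($data);
}

// Passes values through untouched, for backends that serialize natively.
final class IdentitySerializer implements SketchSerializerInterface
{
    public function serialize($value)
    {
        return $value;
    }

    public function unserialize($data)
    {
        return $data;
    }
}

// Delegates to PHP's native serialize()/unserialize().
final class NativeSerializer implements SketchSerializerInterface
{
    public function serialize($value)
    {
        return serialize($value);
    }

    public function unserialize($data)
    {
        return unserialize($data);
    }
}
```

With this shape, a serializer that returns its input unchanged reads naturally as an "identity", whereas "null" suggests it would discard the value.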

} catch (\Exception $e) {
throw new InvalidArgumentException(sprintf('Cache key "%s" has non-serializable array value.', $key), 0, $e);
$exportedValue = $exportSerializer->serialize($value);
$valuePart = 'array("isSerialized" => false, "value" => '.$exportedValue.')';
Member

For me, an important criterion for this PR should be to preserve the original data format, so that we keep compatibility with existing dumped values. Here, this changes it and adds overhead to the storage, filling it with metadata that wasn't needed before.

I think the opcache-based storages should just not allow configuration of the serializer. WDYT?

Author

The previous implementation favors scalars and arrays of scalars. Nulls and objects were stored in serialized form. That adds a performance hit on every cache read, or forces using opcache for primitive types only.
It is possible to store values directly, but all checks of the form isset($this->values, $key) would have to be rewritten.
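
One likely reason the `isset()` checks would need rewriting (an assumption; the comment does not spell it out) is that `isset()` reports `false` for a key whose value is `null`, so a directly stored `null` would look like a cache miss. A minimal sketch with a hypothetical `$values` array:

```php
<?php

// Hypothetical in-memory pool; null is stored directly rather than as "N;".
$values = ['greeting' => 'hello', 'nothing' => null];

// isset() cannot tell "stored null" from "missing key".
$hitViaIsset = isset($values['nothing']);

// array_key_exists() only looks at the key, so the null entry is a hit.
$hitViaKeyExists = array_key_exists('nothing', $values);

var_dump($hitViaIsset, $hitViaKeyExists); // bool(false), bool(true)
```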

Member

> The previous implementation favors scalars and arrays of scalars. Nulls and objects were stored in serialized form.

That's intended: only these benefit from opcache's shared memory.

> That adds a performance hit on every cache read, or forces using opcache for primitive types only.

Do you have numbers to illustrate this statement?

Author

Test: https://gist.github.com/palex-fpt/82fc3deed09c2de2a9023080a9613ce3

master results:
Duration (first/next) (largeArrayOfNulls): 14.200/2.300 s
Duration (first/next) (largeArrayOfStrings): 12.200/2.500 s
Duration (first/next) (largeArrayOfSmallObjects): 18.600/5.000 s
Duration (first/next) (largeObject): 130.400/112.300 s

branch results:
Duration (first/next) (largeArrayOfNulls): 16.500/3.500 s
Duration (first/next) (largeArrayOfStrings): 11.300/2.800 s
Duration (first/next) (largeArrayOfSmallObjects): 30.200/5.200 s
Duration (first/next) (largeObject): 145.700/3.400 s
Duration (first/next) (largeExportableObject): 7.500/3.300 s

@@ -65,36 +68,25 @@ public function warmUp(array $values)

EOF;

$exportSerializer = new PhpExportSerializer();
Member

PhpExportSerializer is less capable than the current code, which accepts all serializable objects.
We should preserve this property.

Author

PhpExportSerializer is backed by the instance serializer: when data cannot be var_export-ed, it is serialized.

Author

I can merge PhpExportSerializer into PhpArrayTrait. It is not used in other caches and it is not part of the serialization contract of PhpArrayTrait.

$unserializeCallbackHandler = ini_set('unserialize_callback_func', __CLASS__.'::handleUnserializeCallback');
try {
$value = unserialize($serialized);
if (false === $value && serialize(false) !== $serialized) {
Member

This is performance-critical code; the previous logic should be preserved (checking for the serialized "false" before unserializing).

Author

It does an unnecessary unserialize only for a false value. Is storing a false value as an opcache entry that frequent? But OK, I will change it back.
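
The ordering the reviewer asks for can be sketched as follows. This is an illustration only: `unserializeValue()` is a hypothetical helper, not the component's actual code. It relies on the fact that `serialize(false)` always yields the string `b:0;`.

```php
<?php

// Sketch only: hypothetical helper, not the component's actual code.
function unserializeValue(string $serialized)
{
    // Cheap string comparison first: a stored `false` never reaches unserialize().
    if ('b:0;' === $serialized) {
        return false;
    }

    // @ silences the notice/warning that unserialize() emits on bad input.
    $value = @unserialize($serialized);

    if (false === $value) {
        // At this point false can only mean that unserialize() failed.
        throw new \UnexpectedValueException('Failed to unserialize value.');
    }

    return $value;
}
```

The variant in the diff above calls `unserialize()` first and compares against `serialize(false)` afterwards; the result is the same, the difference is only whether a stored `false` pays for the `unserialize()` call.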

@@ -206,14 +206,7 @@ protected function doSave(array $values, $lifetime)
*/
protected function doFetch(array $ids)
{
$unserializeCallbackHandler = ini_set('unserialize_callback_func', __CLASS__.'::handleUnserializeCallback');
try {
return $this->checkResultCode($this->getClient()->getMulti($ids));
Member

Serialization is also handled by the extension itself, so Memcached should be provided an IdentitySerializer and this code should be kept as is, shouldn't it?

Author

I did not add SerializerTrait to caches that handle serialization by themselves: Memcached, Apcu, Doctrine. Adding an IdentitySerializer to them looks like a good idea.

Member

it would allow using a compressing serializer if needed, so would be great yes
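
As an illustration of what a compressing serializer could look like, here is a minimal sketch. The class name `CompressingSerializer` and its constructor are hypothetical, and it assumes the zlib extension (`gzdeflate()`/`gzinflate()`) is available; nothing like it appears in the PR itself.

```php
<?php

// Hypothetical decorator: wraps any serialize/unserialize pair with zlib
// compression. Assumes the zlib extension is installed.
final class CompressingSerializer
{
    private $serialize;
    private $unserialize;

    public function __construct(callable $serialize, callable $unserialize)
    {
        $this->serialize = $serialize;
        $this->unserialize = $unserialize;
    }

    public function serialize($value): string
    {
        // Serialize first, then compress the resulting string.
        return gzdeflate(($this->serialize)($value));
    }

    public function unserialize(string $data)
    {
        // @ silences the warning gzinflate() emits on corrupt input.
        $inflated = @gzinflate($data);

        if (false === $inflated) {
            throw new \UnexpectedValueException('Payload is not valid deflate data.');
        }

        return ($this->unserialize)($inflated);
    }
}
```

It could be used as `new CompressingSerializer('serialize', 'unserialize')`; since the backend only ever sees an opaque string, this works even when the backend extension does its own serialization.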

@nicolas-grekas nicolas-grekas added this to the next milestone Jun 4, 2018
@palex-fpt
Author

@nicolas-grekas
Member

I just realized we cannot use the var_export() strategy with PhpArrayAdapter, because this strategy requires instantiating all objects in the pool, when typically only a few are needed per run.
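
The concern can be reproduced with a small sketch. The `Entry` class and the three-item pool are hypothetical, and `eval()` stands in for including a dumped PHP file: because `var_export()` emits `__set_state()` calls inline, loading the array rebuilds every object even if only one is read.

```php
<?php

// Hypothetical cached object that counts how often it gets constructed.
final class Entry
{
    public static $instances = 0;
    public $name;

    public function __construct(string $name)
    {
        $this->name = $name;
        ++self::$instances;
    }

    // Called by the code that var_export() generates for objects.
    public static function __set_state(array $props)
    {
        return new self($props['name']);
    }
}

$pool = ['a' => new Entry('a'), 'b' => new Entry('b'), 'c' => new Entry('c')];
$code = 'return '.var_export($pool, true).';';

Entry::$instances = 0;
$loaded = eval($code);  // stands in for include'ing the dumped file
$needed = $loaded['a']; // only one entry is actually used...

// ...yet Entry::$instances is now 3: every object in the pool was rebuilt.
```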

@nicolas-grekas
Member

For PhpFileAdapter, var_export() poses another issue, which is deep cloning, which is broken. :(

@palex-fpt
Author

Was PhpArrayAdapter designed with that access strategy in mind?
PhpArrayAdapter: fast access to many cache entries, but heavy on memory usage.
PhpFilesAdapter: fast access to a single cache entry, but loses to PhpArrayAdapter on multiple entries.

@palex-fpt
Author

Objects should be tested against the serialization method. There is no universal serializer.

@nicolas-grekas
Member

Oh yes it was. PhpArray is basically free on memory usage, because opcache puts the structure in shared memory, which is out of quota.

@nicolas-grekas
Member

nicolas-grekas commented Jun 7, 2018

My recommendation for this PR would be to remove the export-based serializers and leave the PHP ones as is. That would make a first important step mergeable. Then, if we can find a better serialization strategy, let's do it in another PR.

@palex-fpt
Author

So, for PhpArrayCache we want to store scalar/serialized data and unserialize it on demand.
Is the same strategy required for PhpFilesAdapter?

@nicolas-grekas
Member

nicolas-grekas commented Jun 7, 2018

For PhpArray, we want to create the objects on-demand yes.
For PhpFile also, but this is already achieved by the file container. Still there is another thing we need: deep copying of potentially nested objects.

@palex-fpt
Author

PhpFile uses the same strategy as PhpArray. It stores non-scalar values in serialized form. That adds an unserialization performance hit on access.

@palex-fpt
Author

The main point of this PR was to speed up access to opcache entries. This can be achieved by removing the serialization step (allowing objects with __set_state to be var_exported) or by changing the serializer (e.g. igbinary). It is possible to add extension points to select which types are var_exported, or to select the serializer. But it is not possible to do that in PhpFilesCache while preserving the current data format, because PhpFilesCache detects types by inspecting the serialized content. Should it be done in a fresh implementation of CacheInterface? Can we change the current data format?

@nicolas-grekas
Member

nicolas-grekas commented Jun 7, 2018

> The main point of this PR was to speed up access to opcache entries

I had a different goal, which you already achieved here (having a swappable serialization format, of special interest when using remote backends IMHO, to allow using e.g. igbinary.)

For OPcache entries, I propose #27543 instead.

@palex-fpt
Author

> I had a different goal, which you already achieved here (having a swappable serialization format, of special interest when using remote backends IMHO, to allow using e.g. igbinary.)

OK. I will clean up all changes from PhpFilesCache and PhpArrayCache.

IMHO we don't need IdentitySerializers in Apcu, Doctrine, Memcached, Redis.
Some caches use a binary data backend (Filesystem, Pdo); those caches should be supplied with serializers. Other caches have native PHP support, possibly through an extension (Apcu, Memcached, Redis, Doctrine); there is no need to add IdentitySerializers to them.
If someone wants to transform data before sending it to the backend cache, they can use ProxyAdapter to change the data type.

@palex-fpt palex-fpt force-pushed the cache-marshaller branch 3 times, most recently from f1b3fe3 to b279460 Compare June 9, 2018 04:45
@nicolas-grekas
Member

How should we move this forward? Taking #27543 into account, I would suggest reverting all changes to Php* and Array* from this PR (moving the patch to a later PR maybe), and focusing on providing alternative serializers to adapters/caches that extend AbstractCache / AbstractAdapter. This would allow using igbinary/whatever where it's most useful.
About naming, I'd suggest using marshall again, so that there is no possible confusion with the Serializer component. I'm sorry about the flip-flop here, I know I asked for the opposite before.
I moved #27543 to the Marshaller namespace, so that this PR could use the same (the PhpSerializer class should keep its name, to not collide with PhpMarshaller from #27543.)
WDYT?

fabpot added a commit that referenced this pull request Jun 18, 2018
…sible (nicolas-grekas)

This PR was merged into the 4.2-dev branch.

Discussion
----------

[Cache] serialize objects using native arrays when possible

| Q             | A
| ------------- | ---
| Branch?       | master
| Bug fix?      | no
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | -
| License       | MIT
| Doc PR        | -

This PR allows leveraging OPCache shared memory when storing objects in `Php*` pool storages (as done by default for all system caches). This improves performance a bit further when loading e.g. annotations, etc. (bench coming);

Instead of using native php serialization, this uses a marshaller that represents objects in plain static arrays. Unmarshalling these arrays is faster than unserializing the corresponding PHP strings (because it works with copy-on-write, while unserialize cannot.)

php-serialization is still a possible format because we have to use it when serializing structures with internal references or with objects implementing `Serializable`. The best serialization format is selected automatically so this is completely seamless.

ping @palex-fpt since you gave me the push to work on this, and are pursuing a similar goal in #27484. I'd be thrilled to get some benchmarks on your scenarios.

Commits
-------

866420e [Cache] serialize objects using native arrays when possible
@nicolas-grekas
Member

#27543 is now merged, this can be rebased :)

@palex-fpt
Author

The Php* changes were already reverted. After reverting Array*, only two caches are left: FilesystemCache and PdoCache. My end goal is to use igbinary for the filesystem cache. Can we just add a boolean flag to the Filesystem* constructor to use igbinary?

/** @var SerializerInterface */
private $serializer;

public function setSerializer(SerializerInterface $serializer)
Contributor

does this need to be public? it's currently not used in a public way.

@nicolas-grekas
Member

nicolas-grekas commented Jun 20, 2018

I'm closing in favor of #27645, which provides auto-adaptive igbinary support. It does not provide serializer injection, but I'm not sure we actually need any. I would like to thank you for providing this PR anyway, and for your comments on the other cache-related PRs; they have helped A LOT to improve the component. Let's continue on #27645.
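
For readers wondering what "auto-adaptive" could mean in practice, here is a rough sketch. It is not the code from #27645: the function names `marshall()` and `unmarshall()` are hypothetical, and the format detection assumes that native PHP payloads other than `N;` have `:` as their second byte, while igbinary payloads start with a binary header.

```php
<?php

// Sketch under assumptions; not the code from #27645.
function marshall($value): string
{
    // Prefer igbinary when the extension is loaded, else fall back to native.
    if (\extension_loaded('igbinary')) {
        return igbinary_serialize($value);
    }

    return serialize($value);
}

function unmarshall(string $data)
{
    // Handle the two native payloads that need special care up front.
    if ('b:0;' === $data) {
        return false;
    }
    if ('N;' === $data) {
        return null;
    }

    // Native payloads such as "i:1;" or "a:0:{}" have ':' as second byte.
    if (':' === ($data[1] ?? '')) {
        $value = unserialize($data);
        if (false === $value) {
            throw new \UnexpectedValueException('Failed to unserialize native payload.');
        }

        return $value;
    }

    if (\extension_loaded('igbinary')) {
        // Error handling for corrupt igbinary payloads is omitted here.
        return igbinary_unserialize($data);
    }

    throw new \RuntimeException('Payload is not native PHP and igbinary is not loaded.');
}
```

Because the reader inspects the payload rather than a configuration flag, data written before igbinary was installed stays readable afterwards.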

5 participants