[RFC][Serializer] Serializer redesign #19330

Closed
theofidry opened this issue Jul 10, 2016 · 16 comments
Labels
RFC RFC = Request For Comments (proposals about features that you want to be discussed) Serializer

Comments

@theofidry
Contributor

theofidry commented Jul 10, 2016

The Serializer has grown a lot, especially since 2.7. However, it suffers (IMO) from one problem: its whole design is based on inheritance:

  • AbstractNormalizer
  • AbstractObjectNormalizer
  • ObjectNormalizer
  • GetSetMethodNormalizer

  • AbstractNormalizer contains a lot of logic for handling circular references and for instantiating objects
  • AbstractObjectNormalizer extends AbstractNormalizer to specify how to normalize/denormalize objects
  • ObjectNormalizer implements how to extract attributes (via reflection) from an object and delegates reading/writing attributes to the property accessor

Now this works OK as it is. However, there is a big assumption here: the objects you are handling are POPOs (plain old PHP objects). In the case of Eloquent ORM, for example, the entities are a very different beast and require their own Serializer and PropertyAccessor. As you can see in the repository, not much is required to get it working. But you lose one thing: all the child serializers. Indeed, if you relied on ObjectNormalizer in your library for all your serializers (as ApiPlatform does, for example), you cannot easily switch to another base serializer. Another implication is that the current design strongly encourages extending serializers, where sometimes you could just decorate one and use composition instead.
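A minimal sketch of the decoration idea, assuming PHP 8. Every name in it is hypothetical: `NormalizerInterface` is a simplified stand-in declared in the snippet, not Symfony's real interface, and `ArrayModel` only imitates a model that keeps its data in an internal array. The point is that the decorator works with either base normalizer, which an `extends ObjectNormalizer` child could not do.

```php
<?php

declare(strict_types=1);

// Simplified stand-in contract (hypothetical, not Symfony's real interface).
interface NormalizerInterface
{
    public function normalize(object $object): array;
}

// Base normalizer for plain objects: reads public properties only.
final class PublicPropertyNormalizer implements NormalizerInterface
{
    public function normalize(object $object): array
    {
        return get_object_vars($object);
    }
}

// Hypothetical non-POPO model that stores its data in an internal array.
final class ArrayModel
{
    public function __construct(private array $attributes)
    {
    }

    public function getAttributes(): array
    {
        return $this->attributes;
    }
}

// Base normalizer for the array-backed model.
final class ArrayModelNormalizer implements NormalizerInterface
{
    public function normalize(object $object): array
    {
        if (!$object instanceof ArrayModel) {
            throw new InvalidArgumentException('Expected an ArrayModel.');
        }

        return $object->getAttributes();
    }
}

// Decorator: adds a "_type" key around ANY normalizer instead of extending one.
final class TypeTaggingNormalizer implements NormalizerInterface
{
    public function __construct(private NormalizerInterface $inner)
    {
    }

    public function normalize(object $object): array
    {
        return ['_type' => get_class($object)] + $this->inner->normalize($object);
    }
}

$user = new stdClass();
$user->name = 'Alice';

$fromPopo = (new TypeTaggingNormalizer(new PublicPropertyNormalizer()))->normalize($user);
$fromModel = (new TypeTaggingNormalizer(new ArrayModelNormalizer()))
    ->normalize(new ArrayModel(['name' => 'Bob']));
```

Swapping the base normalizer only changes the constructor argument; the tagging behaviour is not tied to one parent class.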

The Serializer has had a lot of features added via inheritance, and IMO this is causing problems. It feels like it was quite simple at first, and one feature after another has been added, but still with the mindset of "keeping things simple" and having them a bit hidden. So instead of introducing new interfaces to provide new extension points, everything is done with inheritance and protected members. This is a bit limiting and makes the serializer a lot more complex.

Now the Serializer works well right now, so this whole post could be dismissed as "it's OK, it could have been designed better, but changing it is a lot of work, especially because of the BC promise". But given the rapid growth of the Serializer, it would also be fair to take a step back and consider deeper changes too (I am not really talking about adding new features, but rather about how the extension points are implemented).

Note: I edited the post a bit to reflect the discussion below.

@javiereguiluz javiereguiluz changed the title [Serializer] Serializer design [RFC][Serializer] Serializer redesign Jul 13, 2016
@javiereguiluz javiereguiluz added Serializer RFC RFC = Request For Comments (proposals about features that you want to be discussed) labels Jul 13, 2016
@goetas
Contributor

goetas commented Jul 17, 2016

Does it make sense to continue working on it? Projects such as JMS Serializer (even if not well maintained) are much more flexible... and have similar performance.

But if you really want to, why not remove the normalized representation? This would simplify the code a lot.

@theofidry
Contributor Author

Does it make sense to continue working on it?

The serializer grew up quickly: I don't think we could have thought much about how to do things before, because the need was not there. So to me it makes sense to work on it now.

Projects such as JMS Serializer (even if not well maintained) are much more flexible... and have similar performance.

Besides not being very well maintained, JmsSerializer does not offer more features and is less performant. It also has a license issue, which is one reason why more changes have been pushed to the Symfony Serializer rather than to JmsSerializer.

But if you really want to, why not remove the normalized representation? This would simplify the code a lot.

I don't really understand what you mean by that.

@goetas
Contributor

goetas commented Jul 18, 2016

But if you really want to, why not remove the normalized representation? This would simplify the code a lot.

I don't really understand what you mean by that.

Currently the whole implementation of the Symfony serializer suffers from the "normalization" bottleneck.

No matter how good the implementation is, a normalized array will never work equally well for JSON, XML, YAML... (I have added YAML since the aim of the Symfony serializer is to be a generic serialization platform, right?)

Output formats (XML, JSON) are really different and cannot be generalized with a normalized array... I will try to explain it a little:

Let's suppose we are serializing an object into JSON. The normalized array needs to be pretty simple, since the mapping with the JSON encoder can be almost 1:1.

Now think of the same normalized array in an XML context: it is obviously not enough. XML is way more complex than JSON and needs parameters that allow you to decide on attributes, elements, namespaces... This means that you need a really complex normalized array to encode all the XML special cases.
On the other hand, if we focus on XML, the normalized array will be really overcomplicated for what JSON needs.

Evidence for my statement is the difference in complexity visible in https://github.com/symfony/serializer/blob/master/Encoder/XmlEncoder.php#L368 vs https://github.com/symfony/serializer/blob/master/Encoder/JsonEncode.php

And that is without considering that the current state of the Symfony serializer completely ignores some fundamental XML concepts such as namespaces. This means that a "good enough" version of the XML encoder will require a really complex normalized array (which will be a total waste of time in the case of JSON-focused serialization).

No matter how good the normalizer is, it will always be a bottleneck.

The multi-format support is a lie, since each format requires a different normalized format.

Let's analyze this case:
[Image: encode]

In this case we are creating some normalizers to serialize our User into JSON and XML. To do so we probably have to implement some normalization classes, since it is unlikely that the default implementation will work. (Note: the normalized array for JSON is different from the one for XML; for example, XML attributes are denoted with an @ prefix.)
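To make the note about the @ prefix concrete, here is a small sketch that uses only PHP's built-in `json_encode`. The User data is made up for illustration. The `@id` key follows the attribute convention mentioned above, and the snippet shows what happens when an array shaped for XML is handed to a JSON encoder.

```php
<?php

declare(strict_types=1);

// Shape that maps almost 1:1 onto JSON.
$forJson = ['id' => 42, 'name' => 'Alice'];

// Shape aimed at an XML encoder: "@id" is meant to become an attribute
// of the element rather than a child node.
$forXml = ['@id' => 42, 'name' => 'Alice'];

$json = json_encode($forJson);

// Feeding the XML-oriented array to a JSON encoder leaks the "@" marker
// into the output, because JSON has no notion of attributes.
$leaky = json_encode($forXml);
```

The two arrays describe the same User, but neither is a good input for the other format's encoder, which is the coupling this comment is pointing at.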

But let's suppose that the default encoder is not good enough for our purpose (for XML this is obvious, since it does not support a lot of XML features such as namespaces, XIncludes, processing instructions...).

[Image: custom encoder]

We will have to create a new set of normalizers that inject special cases into the normalized array, and those special cases will have to be handled by our custom encoder implementation.
I would like to highlight that the new normalized array is probably not compatible with the default encoder.
This also shows that the relation between normalizer and encoder is one to one. Normalizers and encoders are tightly coupled: each time one changes, the other has to change too.

I can agree with the current architecture only if the Symfony serializer is called "json-serializer".
(Maybe YAML too... but YAML supports comments and references while JSON does not, so, again, there will be a different normalized array for different formats.)

@goetas
Contributor

goetas commented Jul 18, 2016

Judging from the pull requests for the serializer component, it is ending up as a re-implementation of the JMS serializer... but that will be hard to implement because of the initial architecture choice.

@theofidry
Contributor Author

@goetas if you see changes that are required at a deeper level, it is worth raising it. However I'm not taking things that far. Having a 1:1 relationship between Normalizers and Encoders is okay (IMO at least, unless you see a problem with that), the current architecture allows that very easily. If the Symfony Serializer is very lacking XML-wise and, say, JmsSerializer is much more advanced in that respect, I think that's OK as well. People get to choose, and unless there is an XML guru ready to devote a lot of time to the Serializer and maintain it, I highly doubt that the Serializer will become "XML focused"; it will rather stay "JSON focused".

The issue I was raising is that I see a lot of hidden contracts in the current implementation, related to handling cache keys, circular references, and instantiating or populating objects. I don't think it needs a lot more than what it has right now (at least in the case of JSON, I can't speak for XML); my complaint is about how those features are currently implemented, which IMO offers less flexibility than it could.

@goetas
Contributor

goetas commented Jul 18, 2016

if you see changes that are required at a deeper level, it is worth raising it. However I'm not taking things that far.

I always saw Symfony as a place with best practices, where what is offered is close to "really good" or "good enough", while at the moment the current implementation IMO is "good for a tutorial on a blog post".
But of course, with the goal clarified, I have nothing against the current implementation.

Having a 1:1 relationship between Normalizers and Encoders is okay (IMO at least, unless you see a problem with that), the current architecture allows that very easily

If they are 1:1, it means that they are solving the same problem and can be merged into a single concept. This would make development even easier, allowing developers to write code for a single concept: a "JSON serializer", an "XML serializer" or a "YAML serializer".

There is a risk that the Symfony serializer will drive too many developers down a dead-end road (as is already happening, seeing so many people implementing JMS-like features on top of the sf serializer).

Just one example: https://github.com/thephpleague/fractal clearly does conversion from object to JSON, simple and concise. I'm not saying that it is perfect, but the aim is well defined, and the architecture fits the proposed goal well.

The issue I was raising is that I see a lot of hidden contracts in the current implementation, related to handling cache keys, circular references, and instantiating or populating objects. I don't think it needs a lot more than what it has right now (at least in the case of JSON, I can't speak for XML); my complaint is about how those features are currently implemented, which IMO offers less flexibility than it could.

Personally I do not see a reason to refactor a broken architecture using some nice object composition or class inheritance model. I prefer to fix the architecture first.

What I can suggest is a slightly different architecture.

Serialization:

  • Having an object visitor that emits events for the objects that are visited.
  • Having a serializer (JSON, XML, YAML, CSV...) that listens to the events and builds an internal data structure suited to serializing the object into the desired data format.
  • At the end of the object visit, we get the serialization result.

(This approach allows, for example, low-memory streaming serialization...)

De-Serialization:

  • Having an object visitor that emits events for the objects that are visited.
  • Having a de-serializer (JSON, XML, YAML, CSV...) that listens to the events and uses some factories and the sf property accessor to create/populate the desired object.
  • At the end of the object visit, we get the de-serialization result.

interface Serializer
{
    // start event
    mixed load(mixed $data)

    // scalar visited event
    void visitScalarItem(scalar $data, object $type)

    // start visiting the object (creation ?)
    void startObjectVisit(object $class, mixed $data, object $type)

    // visit some kind of object item
    void visitObjectItem(object $item, mixed $data)

    // object visit complete event
    mixed endObjectVisit(object $class, mixed $data, object $type)

    // end event
    mixed getResult()
}

This could be the interface of the serializer, expressed as pseudo code. The visitor can keep track of depth and some filtering strategies, emitting events based on class reflection.
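For readers who want something executable, here is a rough sketch of the serialization half of that idea, assuming PHP 8. It only loosely follows the pseudo code: the signatures are simplified, every class and interface name is hypothetical, and only public properties holding scalars or nested objects are handled. One visitor drives two listeners, and no shared normalized array sits in between.

```php
<?php

declare(strict_types=1);

// Hypothetical listener contract, simplified from the pseudo code.
interface VisitListener
{
    public function visitScalarItem(string $name, int|float|string|bool|null $value): void;
    public function startObjectVisit(string $name): void;
    public function endObjectVisit(string $name): void;
    public function getResult(): string;
}

// Builds nested arrays on a stack and encodes them at the end.
final class JsonListener implements VisitListener
{
    private array $stack = [[]];

    public function visitScalarItem(string $name, int|float|string|bool|null $value): void
    {
        $this->stack[array_key_last($this->stack)][$name] = $value;
    }

    public function startObjectVisit(string $name): void
    {
        $this->stack[] = [];
    }

    public function endObjectVisit(string $name): void
    {
        $finished = array_pop($this->stack);
        $this->stack[array_key_last($this->stack)][$name] = $finished;
    }

    public function getResult(): string
    {
        return json_encode($this->stack[0]);
    }
}

// Writes XML directly as the events arrive.
final class XmlListener implements VisitListener
{
    private string $xml = '';

    public function visitScalarItem(string $name, int|float|string|bool|null $value): void
    {
        $this->xml .= sprintf('<%1$s>%2$s</%1$s>', $name, htmlspecialchars((string) $value, ENT_XML1));
    }

    public function startObjectVisit(string $name): void
    {
        $this->xml .= "<$name>";
    }

    public function endObjectVisit(string $name): void
    {
        $this->xml .= "</$name>";
    }

    public function getResult(): string
    {
        return '<root>' . $this->xml . '</root>';
    }
}

// Walks public properties and emits events; arrays and other types are skipped.
final class ObjectVisitor
{
    public function __construct(private VisitListener $listener)
    {
    }

    public function visit(object $object): string
    {
        $this->walk($object);

        return $this->listener->getResult();
    }

    private function walk(object $object): void
    {
        foreach (get_object_vars($object) as $name => $value) {
            $name = (string) $name;
            if (is_object($value)) {
                $this->listener->startObjectVisit($name);
                $this->walk($value);
                $this->listener->endObjectVisit($name);
            } elseif (is_scalar($value) || $value === null) {
                $this->listener->visitScalarItem($name, $value);
            }
        }
    }
}

$address = new stdClass();
$address->city = 'Paris';
$user = new stdClass();
$user->name = 'Alice';
$user->address = $address;

$json = (new ObjectVisitor(new JsonListener()))->visit($user);
$xml = (new ObjectVisitor(new XmlListener()))->visit($user);
```

The XML listener never holds the whole document as an array, which hints at how the streaming case mentioned above could work; a real implementation would also need depth tracking, arrays and type metadata.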

But... this approach obviously looks very similar to JMS.

@theofidry
Contributor Author

I always saw Symfony as a place with best practices, where what is offered is close to "really good" or "good enough", while at the moment the current implementation IMO is "good for a tutorial on a blog post".

Woua, that's harsh :P

Having a 1:1 relationship between Normalizers and Encoders is okay (IMO at least, unless you see a problem with that), the current architecture allows that very easily

If they are 1:1, it means that they are solving the same problem and can be merged into a single concept. This would make development even easier, allowing developers to write code for a single concept: a "JSON serializer", an "XML serializer" or a "YAML serializer".

I disagree that it's 1:1: the current JsonEncoder, for example, is more than enough as it is. I'm not an XML or JSON master, but it doesn't look like it's missing anything, or at least anything that should belong to the core. Apparently I can't say as much for the XmlEncoder, as according to you it is lacking in a lot of aspects.

There is a risk that the Symfony serializer will drive too many developers down a dead-end road (as is already happening, seeing so many people implementing JMS-like features on top of the sf serializer).

That's something I would avoid and I hope it's not gonna happen... The Serializer handles more than enough cases as it is IMO; more than that feels too specific to belong to the core. That said, I'm no judge there.

Just one example: https://github.com/thephpleague/fractal clearly does conversion from object to JSON, simple and concise. I'm not saying that it is perfect, but the aim is well defined, and the architecture fits the proposed goal well.

I don't feel like there is a problem at this level for the Serializer; the goal is simple: transform one data structure into another. The difference with fractal, for example, would be that it is not limited to the JSON format.

I see where this is going for the architecture, but (at least to me) it sounds overly complicated. By that I'm not implying it's bad, but the way the Serializer works right now is not bad and works well for most people (or at least from what I'm seeing, which is of course biased...). I can perfectly understand that it can be too limiting for some use cases (e.g. https://github.com/goetas/xsd2php), but that's where you could use a more advanced and bloated Serializer like JMS.

At the end of the day, the most common use case for the serializer is: I have a JSON/XML input, get an object out of it, or given an object, get a JSON/XML.

Note: I realise my wording is not always ideal. In short what I want to say is that the Serializer is "good enough" as it is IMO, I'm just unhappy with how the current extension points are done.

@ogizanagi
Contributor

ogizanagi commented Jul 18, 2016

Just saying, but to me Symfony's serializer does not especially suffer from a bad design, especially because its philosophy is far from that of the JMS serializer. The idea of Symfony's serializer is to keep things simple, not trying to answer every use case natively, but instead providing a great and simple architecture.

Despite the fact it offers some great features, I'm almost never using the ObjectNormalizer and rather rely on custom normalizers (sometimes encoders) for each of my needs. Because when I'm writing code for my application, I know exactly what the output should be. I do not need something answering everything, just good and simple interfaces over it. This is way less brainfucking than a JMS Serializer, as soon as you're not trying to answer very generic needs.

Now I can understand some of your concerns if you need to handle things in a very generic way, but to me nothing is impossible with the current architecture in this regard.
But anyway, I don't think Symfony's serializer should try to entirely replace JMS Serializer. Those are simply two different tools. Use them or don't.

@goetas
Contributor

goetas commented Jul 18, 2016

Note: I realise my wording is not always ideal. In short what I want to say is that the Serializer is "good enough" as it is IMO, I'm just unhappy with how the current extension points are done.

You are right, I understood this RFC as something more generic and as a point of discussion for a complete change of direction. If the idea is to keep a simple serializer, of course there are improvements to the class hierarchy that can help.

@dunglas
Member

dunglas commented Jul 27, 2016

@goetas Indeed the Symfony Serializer doesn't (and probably never will) support advanced XML features such as namespaces. You need to use another tool for that.
However, it is very good at exporting a PHP structure to a 1:1 structure in a standardized format. For instance, take a look at what we've done in v2 of API Platform: a graph of entities can be exposed in a bunch of standard formats using content negotiation: JSON-LD, HAL, raw JSON, YAML, CSV and even basic XML (even if it is not advertised right now because it's far from perfect due to the complexity of XML).

Another example of what is easily doable: storing any PHP structure in a JSON field of an RDBMS (1:1 too) to make it queryable: https://github.com/dunglas/doctrine-json-odm

It is possible and easy thanks to the Symfony Serializer, but it works because we want different representations of the same data structure without enrichment during the serialization process.

If you need some enrichment, the visitor pattern is the way to go and JMS Serializer (despite the license issue and its lack of maintenance) is probably the right tool.

However, the simple and easy-to-understand approach followed by the Symfony Serializer is enough for most use cases (especially in the web API field; most new features have been added in that context). It is easy to add new normalizers and encoders, easy to extend, and very powerful to fit complex business needs.

@theofidry indeed the design of the component can be improved by extracting most of the code of the abstract normalizers in separate classes, introducing interfaces and using composition. I'm sure that it can be done without introducing BC breaks (only by deprecating things) and it will help supporting tools using non-POPO objects like Eloquent and Pomm. It is definitely the path to follow.
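As a rough illustration of what such an extraction could look like, the sketch below pulls circular reference handling out of the normalizer into an injected collaborator. It assumes PHP 8, and all names (`CircularReferenceHandler`, `ClassNameHandler`, `ComposedNormalizer`) are hypothetical; it is not the design that was eventually implemented in Symfony.

```php
<?php

declare(strict_types=1);

// Hypothetical extension point: decides what to emit when an object is met
// again while it is still being normalized.
interface CircularReferenceHandler
{
    public function handle(object $object): mixed;
}

// One possible strategy: replace the repeated object with a small marker.
final class ClassNameHandler implements CircularReferenceHandler
{
    public function handle(object $object): mixed
    {
        return ['circular' => get_class($object)];
    }
}

// The normalizer only detects the cycle; the reaction is delegated.
final class ComposedNormalizer
{
    /** @var array<int, true> ids of the objects currently being normalized */
    private array $inProgress = [];

    public function __construct(private CircularReferenceHandler $onCircular)
    {
    }

    public function normalize(object $object): mixed
    {
        $id = spl_object_id($object);
        if (isset($this->inProgress[$id])) {
            return $this->onCircular->handle($object);
        }

        $this->inProgress[$id] = true;
        try {
            $data = [];
            foreach (get_object_vars($object) as $name => $value) {
                $data[$name] = is_object($value) ? $this->normalize($value) : $value;
            }

            return $data;
        } finally {
            // Always release the marker, even if a nested call throws.
            unset($this->inProgress[$id]);
        }
    }
}

$parent = new stdClass();
$child = new stdClass();
$parent->name = 'parent';
$parent->child = $child;
$child->parent = $parent;

$result = (new ComposedNormalizer(new ClassNameHandler()))->normalize($parent);
```

Changing the strategy (throwing, returning an identifier, and so on) then means passing another handler rather than overriding a protected method in a subclass.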

This decoupling can be done while implementing the new architecture I proposed here: #19374 (comment)

Those improvements impact several components (PropertyAccess, PropertyInfo and Serializer) but they will make the Serializer more performant, decoupled and extensible without introducing BC breaks. It's a winning operation.

@theofidry
Contributor Author

@dunglas I remember seeing this comment. It would indeed be a very nice opportunity to both drastically improve the performance of the Serializer and rework the extension points.

I think we should include #19291 in the loop as well. In your comment you are talking about changing some things in PropertyAccessor; I think it's worth checking how we could do that to add support for immutable objects to the serializer.

@ro0NL
Contributor

ro0NL commented Aug 7, 2016

If the Symfony serializer could replace JMS one day, I would be soooo happy.

I.e. I think in the long run it's worth standardizing this (mostly object hydration) once and for all, according to Symfony standards, preferably well designed 👼, but most importantly maintained.

@carsonbot

Thank you for this suggestion.
There has not been a lot of activity here for a while. Would you still like to see this feature?

@carsonbot

Could I get a reply or should I close this?

@dinamic

dinamic commented Mar 5, 2021

Shall we keep this open?

@carsonbot carsonbot removed the Stalled label Mar 5, 2021
@stof
Member

stof commented Mar 5, 2021

Given that a lot of refactoring has happened in the serializer since 2016 (including extracting some proper extension points), I'm closing this issue.

@stof stof closed this as completed Mar 5, 2021

10 participants