[Serializer] Add ability to collect denormalization errors #38472

Closed

Contributor

@julienfalque julienfalque commented Oct 7, 2020

| Q             | A
| ------------- | ---
| Branch?       | 5.x
| Bug fix?      | no
| New feature?  | yes
| Deprecations? | no
| Tickets       | #37419
| License       | MIT
| Doc PR        | -

This PR is a follow-up of #38165.

I tried a different approach: instead of throwing special exceptions, denormalizers/deserializers must always return an instance of the new DenormalizationResult class which wraps the denormalized value or the collected errors.

@greedyivan
Contributor

My objection to the proposed solution.

Collecting denormalization exceptions should be the usual way to use it because it is a very useful feature for the API.

Here we have to check the result with `null !== $result->getDenormalizedValue()`, or with `empty($result->getInvariantViolations())` in case null is a valid result of denormalization. It is not obvious and not OOP-style at all.

It is very likely that this is a return to the `json_last_error` universe.

@julienfalque
Contributor Author

julienfalque commented Nov 4, 2020

> Here we have to check the result with `null !== $result->getDenormalizedValue()`, or with `empty($result->getInvariantViolations())` in case null is a valid result of denormalization.

My first approach was to throw specific exceptions instead (see #38165). The implementation was not very different and there was still overhead to check what happened in the nested denormalizers. Instead of checking what kind of result was returned, one would have to check whether a specific exception was thrown.

My main concern is that throwing an exception breaks the regular execution flow, and you have no guarantee that it comes from the denormalizer that was just called. Maybe some deeper denormalizer actually threw it and it wasn't caught by the denormalizer you called. This could lead to unexpected results.

Moreover, using exceptions seemed more like a hack than an actual solution to me.

> It is not obvious and not OOP-style at all.

Can you elaborate on that?

One benefit of returning "result" objects is that this can more easily be checked: if a denormalizer doesn't return a DenormalizationResult instance, we could throw an exception immediately, which could help third-party denormalizers implement the new workflow. Later we could even decide that the new workflow should become the default and drop the collect_invariant_violations option.
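
The fail-fast check described here could look roughly like the following. This is a minimal sketch and not code from the PR: `DenormalizationResult` is reduced to a simplified stand-in, and `denormalizeChecked()` is a hypothetical helper name introduced only for illustration.

```php
<?php

// Simplified stand-in for the PR's DenormalizationResult (assumption).
final class DenormalizationResult
{
    private $value;

    public function __construct($value)
    {
        $this->value = $value;
    }

    public function getDenormalizedValue()
    {
        return $this->value;
    }
}

/**
 * Hypothetical helper: calls a denormalizer and fails immediately when it
 * does not follow the result-object workflow.
 */
function denormalizeChecked(callable $denormalizer, $data): DenormalizationResult
{
    $result = $denormalizer($data);

    if (!$result instanceof DenormalizationResult) {
        throw new \LogicException(sprintf(
            'Expected a DenormalizationResult, got "%s".',
            is_object($result) ? get_class($result) : gettype($result)
        ));
    }

    return $result;
}

// A denormalizer that follows the workflow passes through unchanged.
$checked = denormalizeChecked(static function ($data) {
    return new DenormalizationResult((int) $data);
}, '42');
```

A denormalizer that returns the raw value instead of a result object would trigger the `LogicException` on its first call, which is the kind of early signal the comment argues for.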

Maybe the "result" object API can be improved though. I'm open to suggestions.

@greedyivan
Contributor

greedyivan commented Nov 4, 2020

> My main concern is that throwing an exception breaks the regular execution flow, and you have no guarantee that it comes from the denormalizer that was just called. Maybe some deeper denormalizer actually threw it and it wasn't caught by the denormalizer you called. This could lead to unexpected results.

> Moreover, using exceptions seemed more like a hack than an actual solution to me.

That is a return to the time when you had to call json_last_error after using json_decode.

That boilerplate code should be encapsulated in the library and should not bother the client each time it wants to denormalize something.
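
For readers unfamiliar with the comparison, the two styles of JSON error handling in PHP look like this. It is a small illustration using only built-in functions; the payload is made up for the example.

```php
<?php

// Deliberately malformed payload (the closing brace is missing).
$payload = '{"name": "broken"';

// Older style: json_decode() returns null on failure, and the caller has to
// remember to ask json_last_error() afterwards.
$decoded = json_decode($payload, true);
$legacyError = null;
if (JSON_ERROR_NONE !== json_last_error()) {
    $legacyError = json_last_error_msg();
}

// Since PHP 7.3: JSON_THROW_ON_ERROR reports the failure as a JsonException.
$thrownError = null;
try {
    json_decode($payload, true, 512, JSON_THROW_ON_ERROR);
} catch (\JsonException $e) {
    $thrownError = $e->getMessage();
}
```

The objection in this comment is that a result object which must be inspected after every call resembles the first style, where forgetting the check goes unnoticed.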

I wrote another simple solution for that: #38968

  • It collects exceptions from both internal objects and types.
  • It collects only those exceptions that are explicitly marked for collection.
  • It collects exceptions from initialization with the constructor.
  • It is more backward compatible, because it only breaks cases that explicitly expect some kind of exception from the serializer: it introduces a new one.

I don't insist, but that solution is better suited to the JSON-to-DTO task, because it returns the full list of fields that cannot be denormalized.

public static function failure(array $invariantViolations): self
{
    $result = new self();
    $result->invariantViolations = $invariantViolations;
Contributor

It could be interesting to validate that the list really contains only instances of InvariantViolation

Contributor Author

IMO this should be covered with static analysis tools such as PHPStan or Psalm, not with runtime assertions.

Contributor

The fact that you added a string to the $invariantViolations variable when there are extra attributes seems to prove that static analysis was not enough.
I don't see any reason not to enforce types. If it's because you think it will add unwanted overhead in production, maybe we can go halfway by adding an assert(self::areViolationsValid($invariantViolations))?

Contributor Author

Whoops, indeed that's an epic fail :)

I still think this should not be a runtime assertion though (which would not have detected this mistake anyway). There is no static analysis on the Symfony codebase as far as I know, which means I have to write tests that cover those scenarios instead.

Contributor

> which would not have detected this mistake anyway

What do you mean?
Asserts should be enabled in the development environment and only disabled in production.
This means that when running your tests with a php.ini that enables assertions, an error would have been thrown.
In case the static method implementation was not clear:

private static function areViolationsValid(array $violations): bool
{
    foreach ($violations as $violation) {
        if (!$violation instanceof InvariantViolation) {
            return false;
        }
    }

    return true;
}
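
Wired into the factory, the suggested assertion might look like the sketch below. Both classes are simplified stand-ins written for this illustration (the real `InvariantViolation` and `DenormalizationResult` from the PR are not shown in full in this thread), so treat the property and method names beyond those quoted above as assumptions.

```php
<?php

// Hypothetical stand-in for the PR's InvariantViolation class.
final class InvariantViolation
{
    private $message;

    public function __construct(string $message)
    {
        $this->message = $message;
    }

    public function getMessage(): string
    {
        return $this->message;
    }
}

// Simplified stand-in for the PR's DenormalizationResult class.
final class DenormalizationResult
{
    private $invariantViolations = [];

    private function __construct()
    {
    }

    public static function failure(array $invariantViolations): self
    {
        // Checked only when assertions are enabled (zend.assertions=1).
        assert(self::areViolationsValid($invariantViolations));

        $result = new self();
        $result->invariantViolations = $invariantViolations;

        return $result;
    }

    public function getInvariantViolations(): array
    {
        return $this->invariantViolations;
    }

    private static function areViolationsValid(array $violations): bool
    {
        foreach ($violations as $violation) {
            if (!$violation instanceof InvariantViolation) {
                return false;
            }
        }

        return true;
    }
}

$result = DenormalizationResult::failure([new InvariantViolation('Expected an integer.')]);
```

With `zend.assertions=1` and `assert.exception` enabled (the default since PHP 8), calling `DenormalizationResult::failure(['oops'])` would raise an `AssertionError`; with assertions disabled the check is skipped.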

Contributor Author

> which would not have detected this mistake anyway

> What do you mean?

I mean adding those assertions would not have prevented me from making this mistake, because there are currently no tests that run that code, so I would have received no alert about it anyway.

> Asserts should be enabled in the development environment and only disabled in production.

I would not allow disabling runtime assertions in production: the point is to prevent the application from silently ignoring errors and running in an invalid state. IMO detecting those error scenarios is more efficiently done using static analysis if possible, with tests otherwise. Here we have no static analysis so I'll go with tests.

@julienfalque julienfalque force-pushed the serializer-error-collection branch from 112fdfc to f8e564c Compare November 22, 2020 09:24
Contributor

@camilledejoye camilledejoye left a comment

Good job!
Just a few more ideas :)

@julienfalque julienfalque force-pushed the serializer-error-collection branch from f8e564c to d2b9674 Compare November 22, 2020 10:50
@julienfalque julienfalque force-pushed the serializer-error-collection branch from d2b9674 to 2ef0003 Compare November 22, 2020 11:23
@julienfalque julienfalque force-pushed the serializer-error-collection branch 5 times, most recently from 436021c to b0a559f Compare November 23, 2020 21:09
@julienfalque
Contributor Author

I think I'm quite happy with the current API, but this PR is in competition with #38968. I'd like some feedback to know whether I should invest more time in it. @dunglas may I kindly request a quick review from you please?

@camilledejoye
Contributor

camilledejoye commented Nov 26, 2020

I personally like the operation result pattern. It's not common in PHP compared to languages with generics, but it adds some value in this case.

I think there is a strong difference between expected and exceptional errors.
We often hear two things:

  • Exceptions should be kept for exceptional cases
  • We should never trust user input

The errors we want to collect are errors related to invalid data in the payload provided by the user.
It's an input and therefore we expect it to fail relatively often; exceptions don't really fit in this case.

And we might also have errors related to an invalid configuration of the serializer or invalid arguments (there is no guarantee the supports method will always be called before the denormalize one); these are really exceptional to me.

The benefit of your approach is that it clearly separates those two concerns, which avoids having to wonder how to handle each error based only on its type.
For instance when denormalizing, do we have an InvalidArgumentException because the $type is invalid or the $data?
If it's the type then it should not be delayed, the exception should be propagated.
But if it's the data it might be better to collect it because it's a validation error.
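
The separation between the two kinds of error can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `DenormalizationResult` is a simplified stand-in, `isSuccess()` and `denormalizeInt()` are hypothetical names, and violations are plain strings for brevity where the PR uses `InvariantViolation` objects.

```php
<?php

// Simplified stand-in for the PR's DenormalizationResult (assumption).
final class DenormalizationResult
{
    private $value;
    private $violations;

    private function __construct($value, array $violations)
    {
        $this->value = $value;
        $this->violations = $violations;
    }

    public static function success($value): self
    {
        return new self($value, []);
    }

    public static function failure(array $violations): self
    {
        return new self(null, $violations);
    }

    public function isSuccess(): bool
    {
        return [] === $this->violations;
    }

    public function getDenormalizedValue()
    {
        return $this->value;
    }

    public function getInvariantViolations(): array
    {
        return $this->violations;
    }
}

// Hypothetical denormalizer that only supports the "int" type.
function denormalizeInt($data, string $type): DenormalizationResult
{
    // Invalid $type is a programming or configuration error: propagate it.
    if ('int' !== $type) {
        throw new \InvalidArgumentException(sprintf('Unsupported type "%s".', $type));
    }

    // Invalid $data is an expected failure on user input: collect it.
    if (!is_int($data)) {
        return DenormalizationResult::failure([
            sprintf('Expected an integer, got "%s".', gettype($data)),
        ]);
    }

    return DenormalizationResult::success($data);
}

$valid = denormalizeInt(5, 'int');
$invalid = denormalizeInt('five', 'int');
```

Here `denormalizeInt(5, 'string')` throws immediately, while `denormalizeInt('five', 'int')` returns a failure result that the caller can turn into a validation message.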

On the other hand, the exception approach handles more cases out of the box and is less intrusive.
With your approach, we must update all our custom denormalizers when migrating.
IMO it's a good trade-off in order to have fine-grained handling of the errors.

@julienfalque
Contributor Author

julienfalque commented Nov 26, 2020

Thanks @camilledejoye, you put this in better words than I would do :)

Another reason I don't like using exceptions is that it relies on the parent denormalizers handling them appropriately. If they don't, the whole process can easily fail or silently behave abnormally, without the developer being aware. With DenormalizationResult, any uncaught exception is a guarantee that some denormalizer didn't handle it appropriately, and makes this fact immediately obvious.

/**
* Denormalizes data back into an object of the given class.
*
* When context option `collect_invariant_violations` is enabled, the
Member

@dunglas dunglas Jan 3, 2021

Could we deprecate not enabling this option? So in 6.0 we will have only one code path again.

Also, shouldn't we add a similar API to NormalizerInterface for consistency?

@symfony/mergers I'm interested in your opinions about this PR because it's an important change in the API.

Contributor Author

> Could we deprecate not enabling this option? So in 6.0 we will have only one code path again.

I thought it would be better to have this as an experimental feature so that users don't get forced into a big API change. But thinking about it again, keeping it as an opt-in feature probably means third-party normalizers won't make a move to be compatible with the new API (because maintainers don't know about it or they can't/don't want to update). Deprecating the current API in favor of this one will also allow enforcing the API at type level by making DenormalizerInterface::denormalize() return a DenormalizationResult.

What I'm afraid of is that the new API might introduce some overhead to the denormalization process. Is that something we want to be the new default?

Also: if the new API is the only one in 6.0, this new option shall be deprecated in 6.0 and removed in 7.0, right?

> Also, shouldn't we add a similar API to NormalizerInterface for consistency?

The use case of the new API for denormalization is that input data can be untrusted and one might want to show detailed error messages rather than a technical exception to the user that provided that input. Is there a similar scenario for normalization? I'm totally OK with applying the same API changes to normalization, I just want to be sure it's actually worth it.

Member

It's OK to deprecate in 5.x and remove in 6.

I'm not sure about the use cases for NormalizerInterface, but I'm sure we can find some. For instance in API Platform we often have to store metadata along with the normalization result. An example: we store the IRIs of visited documents to generate cache tags. We currently use mighty tricks involving the serialization context; having a result object would allow us to do this more cleanly. We'll probably need an extension point for this result object by the way (it can start as a simple context map).

That being said, the key point here is consistency IMHO. Normalization and denormalization must work in a similar way. Consistent APIs are easier to learn and remember.

private $denormalizedValue;
private $invariantViolations = [];

private function __construct()
Member

Maybe we could use a public constructor with named arguments. It's more idiomatic than named factory methods in PHP 8.

Also, this would allow users to access both the normalized value and the errors. It could be useful in some cases to access the partially denormalized data even in case of error.

Contributor Author

I disagree: named factories are more expressive about the intention of the caller and thus simplify validating arguments. As an example, I could improve things here by validating that $invariantViolations is not empty in the failure() factory. It would not be possible with a single constructor that supports both cases because we lose the intention of the caller: since we don't know if we're in a success or failure case, we can only accept empty arrays in all cases and infer the use case depending on what the variable contains.

We could mitigate this by allowing e.g. null for $invariantViolations, but then validation would become more complicated and the API less clear.

I added a $partiallyDenormalizedValue argument to the failure() factory, which has a different name than the argument of success(); that would not be possible with a single constructor.
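
The argument for named factories can be made concrete with a short sketch. It is reconstructed from the fragments quoted in this thread and is not the PR's actual class: the non-empty check is the improvement the author says he could add, and `isSuccess()` is a hypothetical accessor included only to show that a null value no longer has to be used to detect failure.

```php
<?php

// Simplified stand-in for the PR's DenormalizationResult (assumption).
final class DenormalizationResult
{
    private $denormalizedValue;
    private $invariantViolations = [];

    private function __construct()
    {
    }

    public static function success($denormalizedValue): self
    {
        $result = new self();
        $result->denormalizedValue = $denormalizedValue;

        return $result;
    }

    public static function failure(array $invariantViolations, $partiallyDenormalizedValue = null): self
    {
        // The factory name states the caller's intent, so an empty list can
        // be rejected here; a shared constructor could not tell the cases apart.
        if ([] === $invariantViolations) {
            throw new \InvalidArgumentException('A failure needs at least one violation.');
        }

        $result = new self();
        $result->invariantViolations = $invariantViolations;
        $result->denormalizedValue = $partiallyDenormalizedValue;

        return $result;
    }

    // Hypothetical accessor: success is decided by the violations, not by null.
    public function isSuccess(): bool
    {
        return [] === $this->invariantViolations;
    }

    public function getDenormalizedValue()
    {
        return $this->denormalizedValue;
    }

    public function getInvariantViolations(): array
    {
        return $this->invariantViolations;
    }
}

// null is a legitimate denormalized value and still counts as success.
$nullResult = DenormalizationResult::success(null);

// A failure may carry whatever could be denormalized before the error.
$partial = DenormalizationResult::failure(['"int" must be an integer.'], ['string' => 'ok']);
```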

@julienfalque julienfalque force-pushed the serializer-error-collection branch from b0a559f to 934e707 Compare January 3, 2021 09:48
@fabpot
Member

fabpot commented Sep 10, 2021

see #42502

@julienfalque julienfalque deleted the serializer-error-collection branch September 10, 2021 07:27
hultberg pushed a commit to hultberg/symfony that referenced this pull request Sep 17, 2021
…ror during denormalization (lyrixx)

This PR was merged into the 5.4 branch.

Discussion
----------

[Serializer] Add support for collecting type error during denormalization

| Q             | A
| ------------- | ---
| Branch?       | 5.4
| Bug fix?      | no
| New feature?  | yes
| Deprecations? | no
| Tickets       | Fix symfony#27824, Fix symfony#42236, Fix symfony#38472, Fix symfony#37419 Fix symfony#38968
| License       | MIT
| Doc PR        |

---

There is something that I don't like about the (de)Serializer. It's about the way it deals with typed properties.
As soon as you add a type to a property, the API can return 500.

Let's consider the following code:
```php
class MyDto
{
    public string $string;
    public int $int;
    public float $float;
    public bool $bool;
    public \DateTime $dateTime;
    public \DateTimeImmutable $dateTimeImmutable;
    public \DateTimeZone $dateTimeZone;
    public \SplFileInfo $splFileInfo;
    public Uuid $uuid;
    public array $array;
    /** @var MyDto[] */
    public array $collection;
}
```

and the following JSON:

```json
{
	"string": null,
	"int": null,
	"float": null,
	"bool": null,
	"dateTime": null,
	"dateTimeImmutable": null,
	"dateTimeZone": null,
	"splFileInfo": null,
	"uuid": null,
	"array": null,
	"collection": [
		{
			"string": "string"
		},
		{
			"string": null
		}
	]
}
```

**By default**, I got a 500:
![image](https://user-images.githubusercontent.com/408368/129211588-0ce9064e-171d-42f2-89ac-b126fc3f9eab.png)

It's the same with the prod environment. This is far from perfect when you try to make a public API :/
ATM, the only solution is to remove all typehints and add assertions (validator component). With that, the public API is nice, but the internal PHP is not so good (PHP 7.4+ FTW!)

In APIP, they have support for transforming the exception into [something](https://github.com/api-platform/core/blob/53837eee3ebdea861ffc1c9c7f052eecca114757/src/Core/Serializer/AbstractItemNormalizer.php#L233-L237) they can handle gracefully. But the deserialization stops on the first error (so the end user must fix the error, try again, fix the second error, try again, etc.). And the raw exception message is leaked to the end user. So the API can return something like `The type of the "string" attribute for class "App\Dto\MyDto" must be one of "string" ("null" given).`. Really not cool :/

So ATM, building a nice public API is not cool.

That's why I propose this PR, which addresses all the issues reported:
* collect all errors
* with their associated property paths
* no longer leak internals

In order to not break the BC, I had to use some fancy code to make it work 🐒

With the following code, I'm able to collect all errors, transform them in `ConstraintViolationList` and render them properly, as expected.

![image](https://user-images.githubusercontent.com/408368/129215560-b0254a4e-fec7-4422-bee0-95cf9f9eda6c.png)

```php
    #[Route('/api', methods:['POST'])]
    public function apiPost(SerializerInterface $serializer, Request $request): Response
    {
        $context = ['not_normalizable_value_exceptions' => []];
        $exceptions = &$context['not_normalizable_value_exceptions'];

        $dto = $serializer->deserialize($request->getContent(), MyDto::class, 'json', $context);

        if ($exceptions) {
            $violations = new ConstraintViolationList();
            /** @var NotNormalizableValueException $exception */
            foreach ($exceptions as $exception) {
                $message = sprintf('The type must be one of "%s" ("%s" given).', implode(', ', $exception->getExpectedTypes()), $exception->getCurrentType());
                $parameters = [];
                if ($exception->canUseMessageForUser()) {
                    $parameters['hint'] = $exception->getMessage();
                }
                $violations->add(new ConstraintViolation($message, '', $parameters, null, $exception->getPath(), null));
            }

            return $this->json($violations, 400);
        }

        return $this->json($dto);
    }
```

If this PR gets accepted, the above code could be transferred to APIP to handle deserialization correctly.

Commits
-------

ebe6551 [Serializer] Add support for collecting type error during denormalization