[RFC][Validator] Move the filtering/sanitation/normalization out of the validators #35316


Closed
sylfabre opened this issue Jan 12, 2020 · 20 comments
Labels
RFC RFC = Request For Comments (proposals about features that you want to be discussed) Stalled Validator

Comments

@sylfabre
Contributor

Following up on #28232

Addressed issues

  1. Some validators alter the data they are validating before running the validation.
    For instance, IbanValidator removes all spaces and uppercases the string it is given before running validation.

  2. The behavior described in 1 is not consistent across validators.
    For instance, IbanValidator accepts lower-cased strings while BicValidator refuses them.

  3. The behavior described in 1 breaks the Single Responsibility Principle, as the role of a validator is to validate, not to sanitize, filter, or normalize the data it works on.
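To make issues 1 and 2 concrete, here is a minimal sketch assuming simplified, format-only checks. The function names are hypothetical and this is not the actual code of IbanValidator or BicValidator: the IBAN check normalizes its input before validating, while the BIC check validates the raw value.

```php
<?php

// Hypothetical lenient IBAN format check: normalizes (strips spaces,
// uppercases) before validating, as described for IbanValidator.
// Format only: the mod-97 checksum is deliberately left out.
function isIbanFormatLenient(string $value): bool
{
    $normalized = strtoupper(str_replace(' ', '', $value));

    return 1 === preg_match('/^[A-Z]{2}[0-9]{2}[A-Z0-9]{1,30}$/', $normalized);
}

// Hypothetical strict BIC format check: validates the raw value as-is,
// so lower-case input is rejected.
function isBicFormatStrict(string $value): bool
{
    return 1 === preg_match('/^[A-Z]{6}[A-Z0-9]{2}([A-Z0-9]{3})?$/', $value);
}

var_dump(isIbanFormatLenient('fr1420041010050500013m02606')); // bool(true)
var_dump(isBicFormatStrict('sogefrpp'));                       // bool(false)
```

The same kind of lower-case input is accepted by one check and rejected by the other, which is the inconsistency described in issue 2.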

Description
Accepting some commonly used formatting variants is good UX (like stripping spaces from an IBAN) because it widens the set of acceptable data, so it must be kept. The related use case is forms built with the Form component.

Each filtering should be placed in a dedicated class implementing the DataTransformerInterface of the Form component.

Dedicated form types should be created to register these DataTransformer implementations (as \Symfony\Component\Form\Extension\Core\Type\NumberType does with both NumberToLocalizedStringTransformer and \Symfony\Component\Form\Extension\Core\DataTransformer\StringToFloatTransformer).

In the next minor version, validators should gain a strict option, set to false by default to preserve backward compatibility (BC). Setting this option to true should skip normalization of the data under validation.

In the next major version:

  • Validators should drop normalization and only run strict validation.
  • Developers should use the new types in their forms.

Identified validators with this flaw:

  • BicValidator (removes spaces)
  • IsbnValidator (removes dashes)
  • IbanValidator (removes spaces, uppercases)
  • LocaleValidator (because of the canonicalize option)

All validators with a normalizer option should also be updated to remove this normalizer (cf. symfony/validator@fdf8dfc#diff-5b9cb67322b0684885ea5ae77e6932da).

UuidValidator already has a strict option, which may be out of scope here; I'm just wondering what the point of accepting non-compliant UUIDs is.

Example

For instance, we could rely on the following code for IBANs.

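<?php

use Symfony\Component\Form\DataTransformerInterface;
use Symfony\Component\Form\Exception\TransformationFailedException;

class StringToIbanTransformer implements DataTransformerInterface
{
    /**
     * Normalized value => view value (displayed in the form), e.g.
     * FR1420041010050500013M02606 => FR14 2004 1010 0505 0001 3M02 606
     */
    public function transform($value)
    {
        if (null === $value) {
            return '';
        }

        if (!\is_string($value)) {
            throw new TransformationFailedException('Expected a string.');
        }

        // Add a space every 4 characters for readability
        return implode(' ', str_split($value, 4));
    }

    /**
     * Submitted view value => normalized value: strip spaces and uppercase.
     */
    public function reverseTransform($value)
    {
        if (null === $value || '' === $value) {
            return null;
        }

        if (!\is_string($value)) {
            throw new TransformationFailedException('Expected a string.');
        }

        return str_replace(' ', '', strtoupper($value));
    }
}

<?php

use Symfony\Component\Form\AbstractType;
use Symfony\Component\Form\Extension\Core\Type\TextType;
use Symfony\Component\Form\FormBuilderInterface;

class IbanType extends AbstractType
{
    /**
     * {@inheritdoc}
     */
    public function buildForm(FormBuilderInterface $builder, array $options)
    {
        $builder->addViewTransformer(new StringToIbanTransformer());
    }

    // Render the field as a single text input
    public function getParent()
    {
        return TextType::class;
    }
}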
@chalasr chalasr added RFC RFC = Request For Comments (proposals about features that you want to be discussed) Validator labels Jan 12, 2020
@nicolas-grekas
Member

nicolas-grekas commented Jan 12, 2020

I've been writing validators/filters before, and my experience is that they are completely linked.
These are not two separate responsibilities but one: any validation requires some normalization first.
Splitting normalization into a separate piece of code only forces doing the normalization twice: once for validation and a second time for normalization (and it duplicates the logic, which is a big hint that SRP is being broken).

My preference would be to make this clear and have validation return the normalized version of what is processed. There could be a strict mode, where input would have to be identical to the normalized version. The robustness principle tells us: be tolerant when parsing input, and strict in what you output.
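A minimal sketch of this idea, assuming a hypothetical `validateIban()` function (not part of Symfony's API) that returns the normalized value on success and `null` on failure. In strict mode the input must already be identical to its normalized form. It also shows why normalization has to come first: the mod-97 check (ISO 7064 MOD 97-10) is only defined on the canonical, upper-case form.

```php
<?php

/**
 * Hypothetical validator returning the normalized IBAN, or null if invalid.
 * With $strict = true, the input must already be in normalized form.
 */
function validateIban(string $input, bool $strict = false): ?string
{
    // Tolerant parsing: drop spaces and uppercase.
    $normalized = strtoupper(str_replace(' ', '', $input));

    // Strict mode: reject anything that is not already normalized.
    if ($strict && $input !== $normalized) {
        return null;
    }

    // Structure: country code, check digits, up to 30 alphanumerics.
    if (!preg_match('/^[A-Z]{2}[0-9]{2}[A-Z0-9]{1,30}$/', $normalized)) {
        return null;
    }

    // Mod-97: move the first 4 chars to the end, map A..Z to 10..35, then
    // compute the remainder digit by digit to avoid integer overflow.
    $rearranged = substr($normalized, 4) . substr($normalized, 0, 4);
    $remainder = 0;
    foreach (str_split($rearranged) as $char) {
        $digits = ctype_alpha($char) ? (string) (ord($char) - 55) : $char;
        foreach (str_split($digits) as $digit) {
            $remainder = ($remainder * 10 + (int) $digit) % 97;
        }
    }

    return 1 === $remainder ? $normalized : null;
}

var_dump(validateIban('fr14 2004 1010 0505 0001 3m02 606'));
// string(27) "FR1420041010050500013M02606"
var_dump(validateIban('fr14 2004 1010 0505 0001 3m02 606', true)); // NULL
```

The caller gets the normalized value for free from the validation step, which is the point of returning it rather than a boolean.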

So, in my view, implementing this RFC would be a design mistake; I'd rather go with what I just described.

@ro0NL
Contributor

ro0NL commented Jan 12, 2020

Developers should use the new types in their forms.

Forms and validation are separate components; we should reason about them as such.

I agree with @nicolas-grekas: being able to get the normalized value from validation sounds like a good feature addition. Then, strict validation is simply $inputValue === $normalizedValue.

@sylfabre
Contributor Author

@ro0NL I get your point about forms and validation: maybe the DataTransformers / normalizers should then be part of another component, as they are required by both.

@nicolas-grekas I agree with you about these points:

  • I've been writing validators/filter before, and my experience is that they are completely linked.
  • any validation requires some normalization before
  • be tolerant when parsing input, and strict in what you output

I've done it too. But IMO it does not mean they have to be part of the same PHP class.
My point is that the validator should only focus on the validation part, while a normalizer should focus on the parsing/normalization part.

Besides, if both are mixed together, you have to run validation to get a normalized value.
That looks like a normalizer's job to me, and mixing both is overkill when you want to normalize some data without validating it.

Lastly, I don't understand why splitting forces doing the normalization twice. It just implies that validation will be strict. You can still be robust: just normalize your data before validation, since the validator will no longer do it as it used to.

Right now we have inconsistencies in the validation that I would like to address in the best possible way:

  • sogefrpp is an invalid BIC but fr1420041010050500013m02606 is a valid IBAN
  • FR142004          1010050500013M02606 is a valid IBAN but sylvain       @assoconnect.com is not a valid email

And not being able to get the normalized value can lead to this use case:

  1. The database is designed to accept up to 34 chars for an IBAN (as per the related current ISO standard)
  2. A user submits a form with a space-padded IBAN
  3. Validation says the IBAN is valid
  4. The database rejects the INSERT statement because the value is too long

Now let's say we update validators to return the normalized value:

  • How could it be used when validating an object through annotations, such as an entity or a DTO (I'm thinking about API Platform here)?
  • Should the Form component use this returned normalized value to update the "normalized" format (used for internal processing as defined in the class comment of Symfony\Component\Form\Form)?

I'm asking in order to find a way to implement your approach, because it solves the issue too.

@sylfabre
Contributor Author

@nicolas-grekas I share @fabpot's point of view, given here: #30272 (comment)

A validator should only validate the official way.

According to this, FR142004          1010050500013M02606 is an invalid IBAN and should be rejected by the validator.

@ro0NL
Contributor

ro0NL commented Jan 13, 2020

I think that if a constraint's normalization logic is wrong, that's a bug fix on a case-by-case basis.

Overall, IIUC, the bigger picture and actual feature request is to obtain normalized data from validation, and let forms populate the underlying model with it. Correct?

@sylfabre
Contributor Author

@ro0NL duly noted for bug reports. So far, I haven't found any wrong constraint's normalization logic, but only discrepancies between two given validators about accepted values.

My bigger picture and actual feature request is:

  • to end up with normalized and strictly validated data
  • a solution to keep using validation with annotations

I've mentioned the Form component here because I think it is a major use case for validation, but I don't actually use it in my project so far. I only use validation through annotations on DTOs with API Platform. And I expect my API to receive and accept only normalized data; that's why I think normalization should not be part of validation.

Any help on this matter is welcome and we (devs in my company) can help on submitting PRs to fix the issue. Another alternative is to write and publish strict validators based on the Symfony ones but I don't like this idea.

@ro0NL
Contributor

ro0NL commented Jan 13, 2020

I haven't found any wrong constraint's normalization logic, but only discrepancies between two given validators

That can be legit, can't it? Each constraint is bound to its own domain, so its specs may differ.

If I validate either FR142004 1010050500013M02606 or fr1420041010050500013m02606 at https://www.ibancalculator.com/, it's considered valid and displayed normalized (other tools confirm this as well).

So this still needs to be verified case by case.

And I expect my API to receive and accept only normalized data

that's against the robustness principle :)

If we can get normalized values from validation, then we can proceed with e.g. normalize=true|false for constraint annotations and form field options. Constraints would optionally return the normalized value, and forms would optionally populate the model using the obtained normalized value.

API Platform is out of scope and could implement the same technique when validating and deserializing payloads.

All validators with a normalizer option should be updated too to remove this normalizer

Why? Those are user-defined normalization rules, and out of scope as well :)

@sylfabre
Contributor Author

@ro0NL OK, I'll study the discrepancies more closely and open dedicated issues if I think they are legit.

You're right about robustness and my API. Actually, the real cause is that I don't know how to get the normalized data once API Platform has validated the payload. Your proposal would work, so I'll ask the API Platform team about this point (we are starting a project with Les Tilleuls, so that could help!).

I was thinking that

All validators with a normalizer option should be updated too to remove this normalizer

because I had in mind that validators should not normalize. If the solution is to update validators to return normalized data, then this option is 100% legit.

How should we move forward? I'm not familiar with the RFC process.
We can work on a PR to make validators return the normalized data.

@ro0NL
Contributor

ro0NL commented Jan 13, 2020

I think we should start with the Validator component. Consider the IBAN constraint:

Set the normalized value in the current context wherever the state is valid (e.g. after all violations): $this->context->setNormalizedValue($normalized) (edit: actually, I think we should always set it).

Then obtain the normalized values in RecursiveContextualValidator. I think it's best to pass them along to ConstraintViolationListInterface, as it's the return value for end users in both valid and invalid scenarios. At this point, e.g., ConstraintViolationListInterface::getNormalizedValues would return a structure holding each validated property path.

To provide a consistent set of values, we should provide the input value whenever a constraint doesn't set a normalized value, so that constraints can be upgraded gracefully one at a time.

The Form component and API Platform can iterate over each normalized property path and set its (normalized) value using the property accessor.
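The proposed flow can be sketched as follows. Everything here is hypothetical: `setNormalizedValue()` and `getNormalizedValues()` do not exist in the Validator component, and the plain `NormalizedValues` and `PaymentDto` classes merely stand in for the execution context, the violation list and a validated model. A constraint records a normalized value per property path, the input value is used as a fallback when no constraint sets one, and the caller then hydrates the model from the collected values.

```php
<?php

// Hypothetical stand-in for the execution context / violation list.
final class NormalizedValues
{
    /** @var array<string, mixed> normalized value per property path */
    private array $values = [];

    public function setNormalizedValue(string $propertyPath, $value): void
    {
        $this->values[$propertyPath] = $value;
    }

    // Fallback: the input value is kept when no constraint set one.
    public function getNormalizedValues(array $inputValues): array
    {
        return array_replace($inputValues, $this->values);
    }
}

final class PaymentDto
{
    public ?string $iban = null;
    public ?string $holder = null;
}

$dto = new PaymentDto();
$dto->iban = 'fr14 2004 1010 0505 0001 3m02 606';
$dto->holder = 'Sylvain';

// A hypothetical IBAN constraint normalizes and records its value; no
// constraint on "holder" sets one, so its input value is kept as-is.
$context = new NormalizedValues();
$context->setNormalizedValue('iban', strtoupper(str_replace(' ', '', $dto->iban)));

// Hydration step, e.g. done by the Form component or API Platform
// (a real implementation would rather use the PropertyAccess component).
foreach ($context->getNormalizedValues(get_object_vars($dto)) as $path => $value) {
    $dto->$path = $value;
}

echo $dto->iban; // FR1420041010050500013M02606
```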

Lastly, make everything conditional using normalize=true|false options. However, I'm not sure about such an option for constraints, as it breaks the robustness principle. IMHO we only need to solve hydrating normalized values where needed (thus we only need to be able to obtain them, which simplifies things in general).

@TristanPouliquen
Contributor

To chime in on the idea of robustness: isn't it the role of the Serializer (for an API; otherwise the Form component, with the DataTransformers mentioned earlier) to manage this aspect, as it is the interface with the end user?

In this case, it should be its responsibility to correctly map slightly invalid values to a canonical form. And once the data has gone through deserialization, we should be able to rely on it being normalized (after all, that's why we have objects named Normalizers, no?).

Therefore, I really think that normalization shouldn't be embedded in the validation process, but should live on its own. Symfony has done a fabulous job of separating concerns and processes; I would be glad to help continue down this road! 🎉

@sylfabre
Contributor Author

@nicolas-grekas @ro0NL Any feedback on Tristan's last message?

@stof
Member

stof commented May 20, 2020

Currently, validators can do what they want internally to simplify their implementation. But they cannot modify the validated value itself (well, for validators running on an object, they might mutate it if the object is mutable, but that's something unavoidable). To me, things should stay this way.

If your API expects normalized IBANs in the model, using normalizers in the Serializer component looks like the way to go to me.

@stof
Member

stof commented May 20, 2020

Moreover, adding support for mutating the validated value in the Validator component would not be simple. The validation process traverses the data graph. Any mutation would have to be propagated upwards (and objects might not have setters, as the Validator component does not require them).

@carsonbot

Thank you for this suggestion.
There has not been a lot of activity here for a while. Would you still like to see this feature?

@stof
Member

stof commented Feb 18, 2021

Maybe the listed validators should have a canonicalize option to control whether they canonicalize the value before validation or not (i.e. whether they accept sloppy values or not). This would allow projects to choose either the strict behavior or the current one depending on their needs.
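A minimal sketch of such an option, assuming hypothetical `IbanFormat` constraint and `IbanFormatValidator` classes (not Symfony's actual `Iban` constraint; only the format is checked, no checksum). The `canonicalize` option defaults to `true` so the current tolerant behavior is kept.

```php
<?php

// Hypothetical constraint: $canonicalize = true keeps the current, tolerant
// behavior; false enables strict validation of the raw value.
final class IbanFormat
{
    public bool $canonicalize;

    public function __construct(bool $canonicalize = true)
    {
        $this->canonicalize = $canonicalize;
    }
}

final class IbanFormatValidator
{
    public function isValid(string $value, IbanFormat $constraint): bool
    {
        if ($constraint->canonicalize) {
            // Accept "sloppy" input: strip spaces and uppercase first.
            $value = strtoupper(str_replace(' ', '', $value));
        }

        return 1 === preg_match('/^[A-Z]{2}[0-9]{2}[A-Z0-9]{1,30}$/', $value);
    }
}

$validator = new IbanFormatValidator();
var_dump($validator->isValid('fr14 2004 1010 0505 0001 3m02 606', new IbanFormat()));      // bool(true)
var_dump($validator->isValid('fr14 2004 1010 0505 0001 3m02 606', new IbanFormat(false))); // bool(false)
```

Defaulting to `true` preserves backward compatibility, in line with the BC concern of the original proposal, while `false` gives the strict behavior.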

@carsonbot carsonbot removed the Stalled label Feb 18, 2021
@carsonbot

Thank you for this suggestion.
There has not been a lot of activity here for a while. Would you still like to see this feature?

@sylfabre
Contributor Author

@stof I like your idea about a canonicalize option.

Do you think it's doable to have a global setting for it, so that it can be set at the project level for the whole codebase?

@carsonbot carsonbot removed the Stalled label Aug 30, 2021
@carsonbot

Thank you for this suggestion.
There has not been a lot of activity here for a while. Would you still like to see this feature?

@carsonbot

Just a quick reminder to make a comment on this. If I don't hear anything I'll close this.

@carsonbot

Hey,

I didn't hear anything so I'm going to close it. Feel free to comment if this is still relevant, I can always reopen!


7 participants