[Validator] Add the Charset constraint #53154

Merged
merged 1 commit into from
Dec 27, 2023

Conversation

alexandre-daubois
Member

| Q             | A
| ------------- | ---
| Branch?       | 7.1
| Bug fix?      | no
| New feature?  | yes
| Deprecations? | no
| Issues        | -
| License       | MIT

Our use case: we receive some file contents in our DTOs that we only want to process if their encoding matches UTF-8 and reject the whole thing at validation otherwise.

@carsonbot carsonbot added this to the 7.1 milestone Dec 20, 2023
@alexandre-daubois alexandre-daubois force-pushed the encoding-constraint branch 2 times, most recently from 2562ec4 to 5aed25d Compare December 20, 2023 09:51
Contributor

@94noni 94noni left a comment

I second this, exactly the day our production crashed on some file upload ^^
(excel file wrongly saved with bad encoding)

@alexandre-daubois alexandre-daubois changed the title [Validator] Add the Encoding constraint [Validator] Add the CharacterEncoding constraint Dec 21, 2023
@nicolas-grekas
Member

Just bikeshedding about the name: what about Charset?

@alexandre-daubois
Member Author

alexandre-daubois commented Dec 21, 2023

I like Charset very much! I find it clearer. I updated the PR. I kept the encodings constraint option name to match the name of the mb_detect_encoding() function argument.

@nicolas-grekas nicolas-grekas changed the title [Validator] Add the CharacterEncoding constraint [Validator] Add the Charset constraint Dec 26, 2023
@nicolas-grekas
Member

My thoughts: in all my apps, I'd want to reject any non-UTF-8 content. That might already happen when the DB refuses to persist invalid UTF-8. So this constraint looks both needed and not needed, since I think we might want to add this check earlier, when processing the input.
Does that make sense?

@alexandre-daubois can you share your use case more precisely? When can one submit non-UTF-8 in your case?

Member

@nicolas-grekas nicolas-grekas left a comment

we receive some file contents in our DTOs

hum, actually you already answered that sorry :)

here are some minor comments, LGTM

@alexandre-daubois
Member Author

Thank you for the review, comments are addressed 👍

Member

@dunglas dunglas left a comment

Could you maybe make the new classes final?

@alexandre-daubois
Member Author

final suits me fine, added 👍

@nicolas-grekas
Member

Thank you @alexandre-daubois.

@smnandre
Member

Hi @alexandre-daubois

There is a small problem in the implementation. I'm pinging you here, but I can open a proper issue if you prefer.

Because you pass the "valid" encodings to mb_detect_encoding(), the function always returns false when the value is not encoded in one of them.

This means the {{ detected }} placeholder in the message can never display anything.

if (!\in_array($detected = mb_detect_encoding($value, $constraint->encodings, true), (array) $constraint->encodings, true)) {
    $this->context->buildViolation($constraint->message)
        ->setParameter('{{ detected }}', $detected)
        ->setParameter('{{ encodings }}', implode(', ', $constraint->encodings))
        ->setCode(Charset::BAD_ENCODING_ERROR)
        ->addViolation();
}

With '😊' and ['ASCII'], the generated message will be

The detected character encoding is invalid (). Allowed encodings are ASCII.
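
The behaviour reported above can be reproduced outside Symfony with a minimal sketch that only needs the mbstring extension. The message template is reconstructed from the output quoted above, and the strtr() call is merely a stand-in for the violation builder's placeholder substitution; neither is taken from the Validator source.

```php
<?php
// Allowed encodings, as they would be configured on the constraint.
$allowed = ['ASCII'];
$value = '😊'; // a UTF-8 encoded emoji, which is not valid ASCII

// Strict detection limited to the allowed list: no candidate matches,
// so mb_detect_encoding() returns false instead of an encoding name.
$detected = mb_detect_encoding($value, $allowed, true);

// Stand-in for the placeholder substitution (assumption, not Symfony code).
// Casting false to string yields '', hence the empty parentheses.
$message = strtr(
    'The detected character encoding is invalid ({{ detected }}). Allowed encodings are {{ encodings }}.',
    [
        '{{ detected }}' => (string) $detected,
        '{{ encodings }}' => implode(', ', $allowed),
    ]
);

echo $message, PHP_EOL;
```

Since the candidate list and the allowed list are the same, a failed check can never carry a detected encoding name into the message.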

Oh, and one more (minor) thing: the mbstring extension does not seem to be required by Validator, but the mbstring polyfill only handles ASCII, UTF-8 and Latin-ish encodings. Maybe something to add to the docs, or to check at runtime?

@alexandre-daubois
Member Author

Hi! Thank you for letting me know Simon, I'll provide a fix for this 🙂

xabbuh added a commit that referenced this pull request Jan 16, 2024
…dator` (alexandre-daubois)

This PR was merged into the 7.1 branch.

Discussion
----------

[Validator] Fix charset encoding detection in `CharsetValidator`

| Q             | A
| ------------- | ---
| Branch?       | 7.1
| Bug fix?      | yes
| New feature?  | no
| Deprecations? | no
| Issues        | Fix #53154 (comment)
| License       | MIT

Following `@smnandre`'s suggestion, I updated the constraint to display a better message about the detected encoding. If the value's encoding is not among the provided ones, we now fall back on whatever encoding mbstring detects.

Commits
-------

6393e29 [Validator] Fix charset encoding detection in `CharsetValidator`
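
The fallback described in this follow-up fix can be sketched in plain PHP as follows. This is an illustration under assumptions, not the code from commit 6393e29: the helper name `charsetViolationMessage()` is hypothetical, the second detection pass is assumed to use mbstring's default detection order (passing null as the encodings argument), and the "unknown" label for an undetectable value is an addition made for this sketch.

```php
<?php
/**
 * Hypothetical helper, not Symfony's actual implementation.
 * Returns null when $value matches one of $allowed, or a violation
 * message naming the encoding mbstring detects otherwise.
 */
function charsetViolationMessage(string $value, array $allowed): ?string
{
    // First pass: strict check against the allowed encodings only.
    if (false !== mb_detect_encoding($value, $allowed, true)) {
        return null;
    }

    // Second pass, for reporting only: null means "use mbstring's
    // configured detection order" instead of the allowed list.
    $detected = mb_detect_encoding($value, null, true);

    return sprintf(
        'The detected character encoding is invalid (%s). Allowed encodings are %s.',
        false === $detected ? 'unknown' : $detected, // "unknown" is an assumption of this sketch
        implode(', ', $allowed)
    );
}

var_dump(charsetViolationMessage('hello', ['ASCII'])); // valid ASCII, so no violation
echo charsetViolationMessage('😊', ['ASCII']), PHP_EOL;
```

Keeping the two passes separate means the allowed list still decides validity, while the broader detection only affects what is shown in the message. The name reported by the second pass depends on the mbstring configuration of the running PHP installation.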
@fabpot fabpot mentioned this pull request May 2, 2024
@zerkms
Contributor

zerkms commented Sep 10, 2024

Why is the constraint named Charset when it validates the encoding? Why isn't it Encoding?

It even uses mb_detect_encoding internally, which is named ..._encoding, not ..._charset :-D

@OskarStark
Contributor

Can you please open a new issue to discuss this before the 7.2 release? Thanks

@derrabus
Member

@OskarStark This feature has been shipped with 7.1 already.

@OskarStark
Contributor

Indeed, time flies 😄
