[Yaml] dump non UTF-8 encoded strings as binary data #18294
xabbuh commented on Mar 24, 2016
Q | A |
---|---|
Branch? | master |
Bug fix? | yes |
New feature? | no |
BC breaks? | no |
Deprecations? | yes |
Tests pass? | yes |
Fixed tickets | #18241 |
License | MIT |
Doc PR |
Force-pushed from c96c24e to 3334856.
    }

    if (self::isBinaryString($value)) {
        @trigger_error('Dumping non UTF-8 data without passing the DUMP_BASE64_BINARY_DATA flag is deprecated since Symfony 3.1 and will be removed in 4.0.', E_USER_DEPRECATED);
I'm wondering if Yaml::DUMP_BASE64_BINARY_DATA should not be set by default in 4.0 and the option deprecated. That way, the YAML dumper always behaves as expected without having to change some options, which seems weird.
I think that's a good idea. I updated the deprecation message to reflect that.
Can we also deprecate
IMO that does not make much sense. We will introduce the flag in 3.1 and people will need it to be able to dump binary. Deprecating it in the same instance would be weird for users of the dumper. And we cannot dump them as binary strings by default as that would be a BC break, wouldn't it?
I forgot that this was introduced in 3.1. So, I propose to drop it altogether and always do the right thing by default. As Symfony YAML load is able to deal with binaries, I don't see any BC break.
Force-pushed from f7e2126 to cb52d8c.
Fair enough. I removed the flag, fixed the tests, and updated the changelog.
👍
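To illustrate the behaviour agreed on above (binary data is always dumped in encoded form, and the loader can read it back), here is a minimal sketch in plain PHP. It is not Symfony's actual implementation: `dumpScalarSketch()` and `parseScalarSketch()` are hypothetical helpers, and the only assumption about the output format is the standard YAML `!!binary` tag followed by base64 data.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (not Symfony's code): dump a scalar, switching to
// a base64-encoded "!!binary" scalar when the value is not valid UTF-8.
function dumpScalarSketch(string $value): string
{
    // preg_match() with the "u" modifier fails on invalid UTF-8 input.
    if (!preg_match('//u', $value)) {
        return '!!binary ' . base64_encode($value);
    }

    // Plain single-quoted YAML scalar; single quotes are doubled.
    return "'" . str_replace("'", "''", $value) . "'";
}

// Hypothetical counterpart that reads back what dumpScalarSketch() wrote.
function parseScalarSketch(string $yaml): string
{
    if (strncmp($yaml, '!!binary ', 9) === 0) {
        $decoded = base64_decode(substr($yaml, 9), true);
        if ($decoded === false) {
            throw new InvalidArgumentException('Invalid base64 payload.');
        }

        return $decoded;
    }

    // Strip the surrounding quotes and undo the quote doubling.
    return str_replace("''", "'", substr($yaml, 1, -1));
}

$raw = "\xff\xfe\x00\x01"; // not valid UTF-8
echo dumpScalarSketch($raw), PHP_EOL;
echo dumpScalarSketch("it's text"), PHP_EOL;
```

Because the reader already understands the `!!binary` tag, a value dumped this way round-trips without the caller passing any flag, which is the argument made for dropping the option.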
@@ -627,7 +627,7 @@ public static function evaluateBinaryScalar($scalar)

     private static function isBinaryString($value)
     {
-        return preg_match('/[^\x09-\x0d\x20-\xff]/', $value);
+        return !preg_match('//u', $value);
Maybe we should also include strings that contain control chars (with a few excluded, like CR, LF, TAB)?
In similar cases, I usually use:
return !preg_match('//u', $value) || preg_match('/[\x00-\x08\x0B\x0E-\x1A\x1C-\x1F\x7F]/', $value);
return !preg_match('//u', $value) || preg_match('/[^\x09-\x0d\x20-\xff]/', $value);
Do you have something like this in mind?
the above regexp is better I think (not invented here :) )
Oh sorry, I missed your second comment. The previous regex was borrowed from PHPUnit. But if you have good experience with the other one, I am fine with using it.
👍 for phpunit's one
@nicolas-grekas I updated the check.
Thank you @xabbuh.
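For reference, the combined check discussed in this thread can be tried in isolation. The sketch below copies the expression from the review comment (UTF-8 validity plus the PHPUnit-style byte range) into a standalone function. The name `isBinaryStringSketch()` is hypothetical, and the exact expression that was finally committed is not shown in this conversation.

```php
<?php
declare(strict_types=1);

// Hypothetical standalone version of the check proposed in the review.
function isBinaryStringSketch(string $value): bool
{
    // First part: '//u' only matches when the subject is valid UTF-8,
    // so a failed match means the string is not valid UTF-8.
    // Second part: any byte outside TAB..CR (0x09-0x0d) and 0x20-0xff,
    // i.e. most ASCII control characters.
    return !preg_match('//u', $value)
        || preg_match('/[^\x09-\x0d\x20-\xff]/', $value);
}

$samples = [
    'plain ASCII'      => 'hello',
    'tabs and newline' => "tab\there\n",
    'UTF-8 accent'     => "\xc3\xa9",
    'invalid UTF-8'    => "\xff\xfe",
    'embedded NUL'     => "a\x00b",
];

foreach ($samples as $label => $sample) {
    echo $label, ': ', var_export(isBinaryStringSketch($sample), true), PHP_EOL;
}
```

The two halves catch different inputs: the `u` modifier rejects byte sequences that are not UTF-8 at all, while the byte-range pattern flags strings that are valid UTF-8 but contain control characters.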
…xabbuh)

This PR was merged into the 3.1-dev branch.

Discussion
----------

[Yaml] dump non UTF-8 encoded strings as binary data

| Q | A |
| ------------- | --- |
| Branch? | master |
| Bug fix? | yes |
| New feature? | no |
| BC breaks? | no |
| Deprecations? | yes |
| Tests pass? | yes |
| Fixed tickets | #18241 |
| License | MIT |
| Doc PR | |

Commits
-------

86e4a6f dump non UTF-8 encoded strings as binary data
This breaks a feature in Drupal 8's configuration management that relies on Symfony's YAML to dump configuration. One of the main advantages of YAML is that it is human readable. We were encoding multiple plural forms like so:

We've hit this whilst trying to move Drupal 8 to Symfony 3.2.
Do you really rely on non-Unicode chars to split the plurals?
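A small sketch may help show why a readable string can end up base64-encoded. The original example is missing from the report, so the delimiter used here (`"\x03"`) and the sample text are assumptions for illustration only. The point is that a string can be perfectly valid UTF-8 and still be flagged by the control-character half of the check discussed earlier.

```php
<?php
declare(strict_types=1);

// Assumed delimiter, for illustration only; the reporter's actual
// example is not included in the comment above.
const PLURAL_DELIMITER_SKETCH = "\x03";

// Hypothetical helper: true when the string is not valid UTF-8.
function isInvalidUtf8Sketch(string $value): bool
{
    return !preg_match('//u', $value);
}

// Hypothetical helper: true when the string contains a byte outside
// 0x09-0x0d and 0x20-0xff (the PHPUnit-style range from the review).
function hasControlBytesSketch(string $value): bool
{
    return (bool) preg_match('/[^\x09-\x0d\x20-\xff]/', $value);
}

$plural = '1 item' . PLURAL_DELIMITER_SKETCH . '@count items';

var_dump(isInvalidUtf8Sketch($plural));   // the UTF-8 validity half
var_dump(hasControlBytesSketch($plural)); // the control-character half
echo base64_encode($plural), PHP_EOL;     // what a reader would see instead
```

Under these assumptions the UTF-8 check alone would leave the string readable, and only the control-character rule causes it to be treated as binary.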
@alexpott Thanks for reporting this. The thing is that