[Yaml] dump non UTF-8 encoded strings as binary data #18294
xabbuh commented Mar 24, 2016
| Q | A |
|---|---|
| Branch? | master |
| Bug fix? | yes |
| New feature? | no |
| BC breaks? | no |
| Deprecations? | yes |
| Tests pass? | yes |
| Fixed tickets | #18241 |
| License | MIT |
| Doc PR | |
Branch updated from c96c24e to 3334856.

    }

    if (self::isBinaryString($value)) {
        @trigger_error('Dumping non UTF-8 data without passing the DUMP_BASE64_BINARY_DATA flag is deprecated since Symfony 3.1 and will be removed in 4.0.', E_USER_DEPRECATED);

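The hunk under review emits a deprecation when non UTF-8 data is dumped without the new flag. A minimal sketch of that flag-gated behavior, assuming a hypothetical standalone `DUMP_BASE64_BINARY_DATA` constant and a hypothetical `dumpString()` function (Symfony's real dumper code is organized differently), with the non-binary branch simplified to single-quoted output:

```php
<?php
// Hypothetical flag value for this sketch only; Symfony declares its own constants.
const DUMP_BASE64_BINARY_DATA = 1;

// Hypothetical helper: dump one string scalar, respecting the flag.
function dumpString(string $value, int $flags = 0): string
{
    // preg_match() with the u modifier returns false for invalid UTF-8.
    $isBinary = !preg_match('//u', $value);

    if ($isBinary && ($flags & DUMP_BASE64_BINARY_DATA)) {
        // Opt-in path: tag the scalar as binary and base64-encode it.
        return '!!binary ' . base64_encode($value);
    }

    if ($isBinary) {
        // Legacy path: silenced deprecation (as in the diff), raw bytes are dumped.
        @trigger_error('Dumping non UTF-8 data without passing the DUMP_BASE64_BINARY_DATA flag is deprecated.', E_USER_DEPRECATED);
    }

    // Simplified quoting: wrap in single quotes and double embedded single quotes.
    return "'" . str_replace("'", "''", $value) . "'";
}

echo dumpString("it's"), "\n";                                  // 'it''s'
echo dumpString("\xff\xfe\x00", DUMP_BASE64_BINARY_DATA), "\n"; // !!binary //4A
```

Later in the thread the flag is dropped, so the opt-in branch becomes the only behavior for binary strings.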
I'm wondering whether Yaml::DUMP_BASE64_BINARY_DATA should instead be enabled by default in 4.0 and the option itself deprecated. That way, the YAML dumper always behaves as expected; having to set an option to get correct output seems weird.
I think that's a good idea. I updated the deprecation message to reflect that.
Can we also deprecate …
IMO that does not make much sense. We will introduce the flag in 3.1, and people will need it to be able to dump binary data. Deprecating it in the same release would be weird for users of the dumper. And we cannot dump such strings as binary by default, as that would be a BC break, wouldn't it?
I forgot that this was introduced in 3.1. So I propose to drop it altogether and always do the right thing by default. Since the Symfony YAML parser can already load binary data, I don't see any BC break.
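The no-BC-break argument rests on the loader understanding the `!!binary` tag, so a base64-dumped value reads back as the same bytes. A minimal round-trip sketch with a hypothetical `dumpBinary()`/`loadScalar()` pair that handles a single scalar only (this is not Symfony's `Yaml::dump()`/`Yaml::parse()`):

```php
<?php
// Hypothetical dumper: always emit the binary form.
function dumpBinary(string $value): string
{
    return '!!binary ' . base64_encode($value);
}

// Hypothetical loader for one scalar: decode !!binary, pass anything else through.
function loadScalar(string $scalar): string
{
    $prefix = '!!binary ';
    if (strncmp($scalar, $prefix, strlen($prefix)) === 0) {
        // Strict mode rejects characters outside the base64 alphabet.
        $decoded = base64_decode(substr($scalar, strlen($prefix)), true);
        if ($decoded === false) {
            throw new InvalidArgumentException('Invalid base64 data in !!binary scalar.');
        }
        return $decoded;
    }
    return $scalar;
}

// Invalid UTF-8 bytes plus a Latin-1 encoded "café".
$original = "\xff\xfe\x00 caf\xe9";
var_dump(loadScalar(dumpBinary($original)) === $original); // bool(true)
```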
Branch updated from f7e2126 to cb52d8c.
Fair enough. I removed the flag, fixed the tests, and updated the changelog.
👍

     private static function isBinaryString($value)
     {
    -    return preg_match('/[^\x09-\x0d\x20-\xff]/', $value);
    +    return !preg_match('//u', $value);

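The replacement check is a common PCRE idiom: with the `u` modifier, `preg_match()` validates the subject as UTF-8 and returns `false` when it is malformed, so an empty pattern works as a pure UTF-8 validator. The sketch below (function names are hypothetical) shows why the old byte-range check was not enough: a Latin-1 encoded "café" is invalid UTF-8, yet every one of its bytes falls inside `\x20-\xff`.

```php
<?php
// Empty pattern + u modifier: 1 for valid UTF-8, false for malformed input.
function isValidUtf8(string $value): bool
{
    return preg_match('//u', $value) === 1;
}

// The earlier, PHPUnit-derived check: any byte outside \x09-\x0d and \x20-\xff.
function hasBytesOutsidePrintableRange(string $value): bool
{
    return preg_match('/[^\x09-\x0d\x20-\xff]/', $value) === 1;
}

var_dump(isValidUtf8("caf\xc3\xa9"));               // bool(true): UTF-8 "café"
var_dump(isValidUtf8("caf\xe9"));                   // bool(false): Latin-1 "café"
var_dump(hasBytesOutsidePrintableRange("caf\xe9")); // bool(false): old check misses it
```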
Maybe we should also treat strings that contain control characters as binary (excluding a few, like CR, LF, and TAB)?
In similar cases, I usually use:

    return !preg_match('//u', $value) || preg_match('/[\x00-\x08\x0B\x0E-\x1A\x1C-\x1F\x7F]/', $value);

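Put together, this suggestion treats a string as binary when it is either invalid UTF-8 or contains DEL or a C0 control character other than TAB, LF, FF, CR, and ESC. A sketch with a hypothetical `looksBinary()` wrapper around that exact expression:

```php
<?php
// Hypothetical wrapper around the expression suggested in the review.
function looksBinary(string $value): bool
{
    // Invalid UTF-8, or a control byte other than \t, \n, \f, \r and ESC (\x1B);
    // DEL (\x7F) is flagged as well.
    return !preg_match('//u', $value)
        || preg_match('/[\x00-\x08\x0B\x0E-\x1A\x1C-\x1F\x7F]/', $value);
}

var_dump(looksBinary("line 1\nline 2\tindented")); // bool(false)
var_dump(looksBinary("one\x03two"));               // bool(true)
var_dump(looksBinary("\x1b[1mbold\x1b[0m"));       // bool(false): ESC is allowed
var_dump(looksBinary("caf\xe9"));                  // bool(true): not UTF-8
```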

    return !preg_match('//u', $value) || preg_match('/[^\x09-\x0d\x20-\xff]/', $value);

Do you have something like this in mind?
I think the regex above is better (not invented here :) ).
Oh sorry, I missed your second comment. The previous regex was borrowed from PHPUnit, but if you have had good experience with the other one, I am fine with using it.
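The two control-character classes discussed here differ only on a handful of bytes. A small sketch that runs every ASCII byte through both patterns and lists where they disagree (variable names are ours):

```php
<?php
// Control-character class derived from PHPUnit's check.
$phpunitStyle = '/[^\x09-\x0d\x20-\xff]/';
// Control-character class from the suggestion in the review.
$reviewStyle = '/[\x00-\x08\x0B\x0E-\x1A\x1C-\x1F\x7F]/';

$differences = [];
for ($byte = 0x00; $byte <= 0x7F; $byte++) {
    $char = chr($byte);
    // Record the byte when exactly one of the two patterns flags it.
    if ((bool) preg_match($phpunitStyle, $char) !== (bool) preg_match($reviewStyle, $char)) {
        $differences[] = sprintf('0x%02X', $byte);
    }
}

echo implode(', ', $differences), "\n"; // 0x0B, 0x1B, 0x7F
```

In other words, the PHPUnit-style class accepts VT (0x0B) and DEL (0x7F) but flags ESC (0x1B), while the suggested class does the opposite for all three.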
@nicolas-grekas I updated the check.
Thank you @xabbuh.
…xabbuh) This PR was merged into the 3.1-dev branch.
Commits: 86e4a6f dump non UTF-8 encoded strings as binary data
This breaks a feature in Drupal 8's configuration management, which relies on Symfony's YAML component to dump configuration. One of the main advantages of YAML is that it is human-readable. We were encoding multiple plural forms like so:
We've hit this whilst trying to move Drupal 8 to Symfony 3.2.
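Why this bites here: a string can be perfectly valid UTF-8 and still contain a control character used as a separator, and the control-character check suggested in the review classifies it as binary. An illustration, assuming for the sake of the example that the plural forms are joined with `\x03`; the delimiter and the strings are placeholders, not taken from Drupal:

```php
<?php
// Illustration only: two plural forms joined by an assumed \x03 separator.
$plurals = implode("\x03", ['1 item', '@count items']);

// Valid UTF-8, so the //u check alone would not call it binary...
$isUtf8 = preg_match('//u', $plurals) === 1;

// ...but the control-character check from the review does match it.
$hasControlChar = preg_match('/[\x00-\x08\x0B\x0E-\x1A\x1C-\x1F\x7F]/', $plurals) === 1;

var_dump($isUtf8);         // bool(true)
var_dump($hasControlChar); // bool(true)

// Dumped as binary, the value becomes base64 and is no longer human-readable.
echo '!!binary ', base64_encode($plurals), "\n";
```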
Do you really rely on non-Unicode characters to split the plurals?
@alexpott Thanks for reporting this. The thing is that …