[Yaml] dump non UTF-8 encoded strings as binary data #18294

Merged
merged 1 commit into from
Mar 30, 2016


xabbuh
Member

@xabbuh xabbuh commented Mar 24, 2016

Q A
Branch? master
Bug fix? yes
New feature? no
BC breaks? no
Deprecations? yes
Tests pass? yes
Fixed tickets #18241
License MIT
Doc PR

@xabbuh xabbuh added the Yaml label Mar 24, 2016
@xabbuh xabbuh force-pushed the issue-18241 branch 2 times, most recently from c96c24e to 3334856 Compare March 24, 2016 13:33
}

if (self::isBinaryString($value)) {
    @trigger_error('Dumping non UTF-8 data without passing the DUMP_BASE64_BINARY_DATA flag is deprecated since Symfony 3.1 and will be removed in 4.0.', E_USER_DEPRECATED);
Member

I'm wondering whether Yaml::DUMP_BASE64_BINARY_DATA should be set by default in 4.0 and the option deprecated. That way, the YAML dumper always behaves as expected without users having to change options, which seems weird.

Member Author

I think that's a good idea. I updated the deprecation message to reflect that.

@fabpot
Member

fabpot commented Mar 29, 2016

Can we also deprecate DUMP_BASE64_BINARY_DATA here as well?

@xabbuh
Member Author

xabbuh commented Mar 29, 2016

IMO that does not make much sense. We will introduce the flag in 3.1 and people will need it to be able to dump binary data. Deprecating it at the same time would be weird for users of the dumper. And we cannot dump such strings as binary by default as that would be a BC break, wouldn't it?

@fabpot
Member

fabpot commented Mar 29, 2016

I forgot that this was introduced in 3.1. So, I propose to drop it altogether and always do the right thing by default. As the Symfony YAML loader is able to deal with binary data, I don't see any BC break.

@xabbuh xabbuh changed the title [Yaml] deprecate dumping non UTF-8 strings [Yaml] dump non UTF-8 encoded strings as binary data Mar 29, 2016
@xabbuh xabbuh force-pushed the issue-18241 branch 3 times, most recently from f7e2126 to cb52d8c Compare March 29, 2016 20:21
@xabbuh
Member Author

xabbuh commented Mar 29, 2016

Fair enough. I removed the flag, fixed the tests, and updated the changelog.

@fabpot
Member

fabpot commented Mar 30, 2016

👍

@@ -627,7 +627,7 @@ public static function evaluateBinaryScalar($scalar)

     private static function isBinaryString($value)
     {
-        return preg_match('/[^\x09-\x0d\x20-\xff]/', $value);
+        return !preg_match('//u', $value);
Member

Maybe we should also include strings that contain control chars (with a few excluded, like CR, LF, TAB)?

Member

In similar cases, I usually use:
return !preg_match('//u', $value) || preg_match('/[\x00-\x08\x0B\x0E-\x1A\x1C-\x1F\x7F]/', $value);

Member Author

return !preg_match('//u', $value) || preg_match('/[^\x09-\x0d\x20-\xff]/', $value);

Do you have something like this in mind?

Member

The above regexp is better, I think (not invented here :) )

Member Author

Oh sorry, I missed your second comment. The previous regex was borrowed from PHPUnit. But if you have good experience with the other one, I am fine with using it.

Member

👍 for phpunit's one
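
To make the difference between the two control-character patterns quoted in this thread concrete, here is a minimal sketch in plain PHP. The function names are hypothetical and are not part of the Yaml component; only the regular expressions are taken from the comments above, and the sketch does not claim to be the check that was finally merged.

```php
<?php
// Hypothetical helpers for illustration only; not Symfony's actual code.

// True when $value is not valid UTF-8: with the /u modifier, preg_match()
// returns false for malformed input and 1 for the empty pattern otherwise.
function isInvalidUtf8(string $value): bool
{
    return !preg_match('//u', $value);
}

// Explicit list of control bytes, as quoted in the review comment.
function matchesExplicitControlList(string $value): bool
{
    return 1 === preg_match('/[\x00-\x08\x0B\x0E-\x1A\x1C-\x1F\x7F]/', $value);
}

// Negated byte class that the thread attributes to PHPUnit.
function matchesPhpunitPattern(string $value): bool
{
    return 1 === preg_match('/[^\x09-\x0d\x20-\xff]/', $value);
}
```

Both patterns flag a string containing "\x03" and leave TAB, CR and LF alone. They disagree on a few bytes: the PHPUnit pattern flags "\x1B" (ESC) but not "\x0B" (VT) or "\x7F" (DEL), while the explicit list does the opposite.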

@xabbuh
Member Author

xabbuh commented Mar 30, 2016

@nicolas-grekas I updated the check.

@fabpot
Member

fabpot commented Mar 30, 2016

Thank you @xabbuh.

@fabpot fabpot merged commit 86e4a6f into symfony:master Mar 30, 2016
fabpot added a commit that referenced this pull request Mar 30, 2016
…xabbuh)

This PR was merged into the 3.1-dev branch.

Discussion
----------

[Yaml] dump non UTF-8 encoded strings as binary data

| Q             | A
| ------------- | ---
| Branch?       | master
| Bug fix?      | yes
| New feature?  | no
| BC breaks?    | no
| Deprecations? | yes
| Tests pass?   | yes
| Fixed tickets | #18241
| License       | MIT
| Doc PR        |

Commits
-------

86e4a6f dump non UTF-8 encoded strings as binary data
@xabbuh xabbuh deleted the issue-18241 branch March 30, 2016 14:47
@fabpot fabpot mentioned this pull request May 13, 2016
@alexpott
Contributor

This breaks a feature in Drupal 8's configuration management that relies on Symfony's YAML to dump configuration. One of the main advantages of YAML is that it is human readable. We were encoding multiple plural forms like so:
format_plural_string: "1 place\x03@count places"
This becomes:
format_plural_string: !!binary MSBwbGFjZQNAY291bnQgcGxhY2Vz

We've hit this whilst trying to move Drupal 8 to Symfony 3.2
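
The output reported above can be reproduced with a small, self-contained sketch. The functions `dumpScalarSketch()` and `loadBinaryScalarSketch()` are hypothetical stand-ins, not the Yaml component's API, and the detection check is an assumption assembled from the patterns discussed in the review thread. The sketch only shows that a string containing "\x03" is treated as binary and Base64-encoded, and that decoding restores the original bytes.

```php
<?php
// Hypothetical sketch; the real dumper lives in Symfony's Yaml component
// and also handles quoting and escaping, which is omitted here.
function dumpScalarSketch(string $value): string
{
    // Assumed check: invalid UTF-8, or a control byte other than TAB..CR.
    $isBinary = !preg_match('//u', $value)
        || 1 === preg_match('/[^\x09-\x0d\x20-\xff]/', $value);

    if ($isBinary) {
        return '!!binary '.base64_encode($value);
    }

    return $value; // simplified: plain strings are returned unchanged
}

// Reverses the sketch above for scalars tagged with "!!binary ".
function loadBinaryScalarSketch(string $scalar): string
{
    $decoded = base64_decode(substr($scalar, strlen('!!binary ')), true);
    if (false === $decoded) {
        throw new InvalidArgumentException('Not valid Base64 data.');
    }

    return $decoded;
}

$dumped = dumpScalarSketch("1 place\x03@count places");
$restored = loadBinaryScalarSketch($dumped);
```

Here `$dumped` is `!!binary MSBwbGFjZQNAY291bnQgcGxhY2Vz`, matching the value in the report, and `$restored` equals the original string, so no data is lost even though the dumped form is no longer human readable.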

@stof
Member

stof commented Nov 24, 2016

Do you really rely on non-Unicode chars to split the plurals?

@xabbuh
Member Author

xabbuh commented Nov 24, 2016

@alexpott Thanks for reporting this. The thing is that \x03 is not a valid Unicode character. So the fact that this somehow worked before was more by accident, and it actually resulted in invalid YAML files (http://yaml-online-parser.appspot.com/, for example, will give you an error like unacceptable character #x0003: special characters are not allowed in "<unicode string>", position 30).
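
The parser error quoted above stems from YAML's restriction on which characters may appear unescaped in a stream. As a rough illustration, the sketch below checks a string against the printable character set of the YAML 1.2 specification (the `c-printable` production); control characters such as U+0003 fall outside that set. The function name is hypothetical, and the sketch assumes its input is already valid UTF-8.

```php
<?php
// Hypothetical helper; the character class follows the c-printable
// production of the YAML 1.2 specification. Input is assumed to be valid
// UTF-8: for malformed input preg_match() returns false, so this function
// returns false as well.
function containsNonPrintableYamlChar(string $value): bool
{
    return 1 === preg_match(
        '/[^\x09\x0A\x0D\x20-\x7E\x{85}\x{A0}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u',
        $value
    );
}
```

With this check, "1 place\x03@count places" is reported as containing a non-printable character, whereas the same text with an ordinary separator such as a comma is not.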
