[HttpFoundation] add HeaderUtils::parseQuery(): it does the same as parse_str() but preserves dots in variable names #37272

Merged
merged 1 commit into from
Jun 24, 2020

nicolas-grekas
Member

| Q | A |
| --- | --- |
| Branch? | master |
| Bug fix? | no |
| New feature? | yes |
| Deprecations? | no |
| Tickets | - |
| License | MIT |
| Doc PR | - |

Inspired by symfony/psr-http-message-bridge#80
/cc @drupol

Related to #9009, #29664, #26220 but also api-platform/core#509 and https://www.drupal.org/project/drupal/issues/2984272
/cc @dunglas @alexpott

@drupol
Contributor

drupol commented Jun 13, 2020

Oh, nice!

We should add a test with spaces in the key parameter.

return [
['a=b&c=d'],
['a.b=c'],
['a+b=c'],
Member Author

@drupol this is the test case with a space in the name

Contributor

@drupol drupol left a comment

I'm glad I inspired you for this feature, I couldn't agree more with it.

This will help us at the European Commission, because we use API Platform and CAS authentication.
Both rely on query parameters, which must not be altered during the request. That is why we used the symfony/psr-http-message-bridge middleware, so that we could override it with loophp/unaltered-psr-http-message-bridge-bundle.

Once this feature is merged, I guess it will help a bunch of people and remove some hacks here and there.

However, I really hope this behavior will be changed in PHP 8. I couldn't believe it worked this way when I discovered the issue. It's crazy!

I also tested the PR here: https://3v4l.org/VRCoh

The example compares the behavior of \parse_str() and the custom HeaderUtils::queryParseString() you're introducing.

I made this example because I wanted to check why you were testing the a\0b binary string. I guess this is unlikely to happen in the real world, right?

I'm also wondering whether it would be easier to convert every key with \bin2hex(), let \parse_str() do its job, and then convert back. It might also be faster than going through all those if conditions. WDYT?

Also, do you think this feature will be backported to Symfony < 5.2?

@nicolas-grekas
Member Author

the a\0b binary string. I guess this is unlikely to happen in the real world, right?

Yes, that's a very unlikely edge case of parse_str() that I duplicated for parity.

convert every key with \bin2hex(), let \parse_str() do its job, and then convert back. It might also be faster than going through all those if conditions. WDYT?

I'm not sure what you mean, sorry. About perf, those if checks are going to be very fast, much faster than using a regexp. But please prove me wrong if you want to give your idea a try!

Also, do you think this feature will be backported to Symfony < 5.2?

I don't think it should, that's a new feature.

Note also that I didn't change the createFromGlobals method. This means that dots will still be replaced by default, which is important to preserve BC. But at least people who care now have a helper to opt in to the fixed parser.
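
For readers who have not hit this before, here is a minimal sketch of the default behavior being discussed. It only uses the native parse_str() function; the sample query strings are made up for illustration.

```php
<?php

// parse_str() rewrites dots and spaces in top-level parameter names to underscores.
parse_str('a.b=c&d+e=f', $mangled);
var_dump($mangled); // keys come back as "a_b" and "d_e"

// Two distinct parameter names can therefore collapse into one, and the last value wins.
parse_str('a.b=1&a_b=2', $collided);
var_dump($collided); // a single "a_b" key holding "2"
```

Because the original names cannot be recovered once parse_str() has run, a parser that preserves them has to work from the raw query string.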

@gmponos
Contributor

gmponos commented Jun 14, 2020

Would you be interested in making this function also work with a query string like this?

https://3v4l.org/86Tu5 and return an array containing 'test' => ['what', 'what2']

Since this is a valid URL http://localhost/myendpoint?test=what&test=what2

Sorry if it already does that and I missed it.

Also related to my comment here.

@nicolas-grekas
Member Author

nicolas-grekas commented Jun 14, 2020

@gmponos I've been wondering about this too. This would be too big an API change when dealing with query/cookie/etc. bags. But I wish we'll find a way to get there eventually, yes (getting rid of [] to access multiple submitted values).

@dunglas
Member

dunglas commented Jun 14, 2020

But I wish we'll find a way to get there eventually, yes

The Go API handles these cases nicely: https://golang.org/pkg/net/url/#URL.Query
At a glance, we should always return an array of arrays:

["test" => ["what", "what2"]]
["foo" => ["bar"]]

Maybe this set of features could be included in the proposed URI component (#36999)?
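
A minimal sketch of the "always an array of arrays" shape, for illustration only. The function name parse_query_grouped() is hypothetical and not part of Symfony or of this PR; it assumes no bracket handling at all and decodes names and values with urldecode().

```php
<?php

/**
 * Hypothetical helper: groups every value under its decoded parameter name,
 * so repeated parameters are kept instead of overwriting each other.
 */
function parse_query_grouped(string $query, string $separator = '&'): array
{
    $result = [];

    foreach (explode($separator, $query) as $pair) {
        if ('' === $pair) {
            continue; // skip empty segments such as the one in "a=1&&b=2"
        }

        // Split on the first "=" only, so values may themselves contain "=".
        $parts = explode('=', $pair, 2);
        $name = urldecode($parts[0]);
        $value = urldecode($parts[1] ?? '');

        $result[$name][] = $value;
    }

    return $result;
}

$grouped = parse_query_grouped('test=what&test=what2&foo=bar');

// For comparison, native parse_str() keeps only the last "test" value.
parse_str('test=what&test=what2&foo=bar', $native);
```

With this shape a caller always iterates over a list, whether the parameter was sent once or several times.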

@drupol
Contributor

drupol commented Jun 14, 2020

Would you be interested in making this function also work with a query string like this?

https://3v4l.org/86Tu5 and return an array containing 'test' => ['what', 'what2']

Since this is a valid URL http://localhost/myendpoint?test=what&test=what2

Sorry if it already does that and I missed it.

Also related to my comment here.

I'm about to make a regex that does it all; I will probably submit my snippet tonight.

Today is Father's Day and I might be away from my computer...

@nicolas-grekas nicolas-grekas changed the title from "[HttpFoundation] add HeaderUtils::parseQueryString(): it does the same as parse_str() but preserves dots in variable names" to "[HttpFoundation] add HeaderUtils::parseQuery(): it does the same as parse_str() but preserves dots in variable names" on Jun 14, 2020

@nicolas-grekas
Member Author

At a glance, we should always return an array of arrays:

I'm not sure how this should play with the [] syntax. Eg how should a[b]=c&a[b]=d be parsed? There are several possible answers. This means this is not for this PR at least :) votes welcome.
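
To make the ambiguity concrete, this small sketch shows what native parse_str() does today with repeated bracketed keys. It only documents current PHP behavior; it does not suggest which answer a grouped mode should pick.

```php
<?php

// The same nested key sent twice: the second value overwrites the first.
parse_str('a[b]=c&a[b]=d', $overwritten);
var_dump($overwritten);

// Empty brackets are PHP's way of asking for a list: both values are kept.
parse_str('a[]=c&a[]=d', $appended);
var_dump($appended);
```

A mode that always returns lists would have to decide whether a[b] sent twice yields one list under the nested key b, or a list under the literal name a[b].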

@dunglas
Member

dunglas commented Jun 14, 2020

The [] syntax is a PHP oddity, which isn't covered by any RFC. In "standard mode" (the component), I suggest not supporting it. We could maybe also propose a PHP mode that just delegates to parse_str() and all its oddities (dot replacement, array syntax, etc.).

@nicolas-grekas
Member Author

The [] syntax is a PHP oddity, which isn't covered by any RFC

For sure - also no RFC should cover what happens server-side.

But we cannot migrate away from it without a serious FC/BC plan.
E.g. the form component relies on this syntax.

@dunglas
Member

dunglas commented Jun 14, 2020

The PHP mode could be the default (so it's 100% BC). The "standard-compliant" mode is useful only for a small subset of use cases after all. But, for instance, both Mercure and Vulcain, as well as many other I-Ds and RFCs, use repeated query parameters without the [] suffix, and parsing such URLs with PHP/Symfony currently requires custom code.

@nicolas-grekas
Member Author

nicolas-grekas commented Jun 14, 2020

Would this make sense to you? Then it would be up to users to use this function when they create the request object:

--- a/src/Symfony/Component/HttpFoundation/HeaderUtils.php
+++ b/src/Symfony/Component/HttpFoundation/HeaderUtils.php
@@ -196,7 +196,7 @@ class HeaderUtils
     /**
      * Like parse_str(), but preserves dots in variable names.
      */
-    public static function parseQuery(string $query, string $separator = '&'): array
+    public static function parseQuery(string $query, bool $phpMode = true, string $separator = '&'): array
     {
         $q = [];
 
@@ -217,6 +217,12 @@ class HeaderUtils
                 $k = substr($k, 0, $i);
             }
 
+            if (!$phpMode) {
+                $q[$k][] = urldecode(substr($v, 1));
+
+                continue;
+            }
+
             $k = ltrim($k, ' ');
 
             if (false === $i = strpos($k, '[')) {
@@ -226,6 +232,10 @@ class HeaderUtils
             }
         }
 
+        if (!$phpMode) {
+            return $q;
+        }
+
         parse_str(implode('&', $q), $q);
 
         $query = [];

@dunglas
Member

dunglas commented Jun 14, 2020

LGTM

@drupol
Contributor

drupol commented Jun 14, 2020

Here I am,

I just made a small example with a preg_replace_callback(): https://3v4l.org/d0W4Z

Basically, the regex will convert the relevant part of the keys to hexadecimal, then let parse_str() do its job, then convert back.

There is only one test (a\0b=c) which produces a different result, but I guess we will never have this in real life, and maybe we can decide all together to skip it if you prefer. Let me know what you think.
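
For readers without access to the 3v4l links, here is a rough sketch of the hex round-trip idea as described in this comment. It is not the linked snippet: the function name parse_query_via_hex() and the regex are assumptions made for illustration, and the sketch makes no attempt to match parse_str() on edge cases such as leading spaces or null bytes in keys.

```php
<?php

/**
 * Hypothetical sketch: hex-encode the name part of every key so that
 * parse_str() has nothing to mangle, then decode the top-level keys back.
 */
function parse_query_via_hex(string $query): array
{
    $encoded = preg_replace_callback(
        '/(^|&)([^=&]+)/', // a key is whatever follows the start or "&", up to "=" or "&"
        static function (array $m): string {
            $key = urldecode($m[2]);
            $i = strpos($key, '[');
            $head = false === $i ? $key : substr($key, 0, $i);
            $tail = false === $i ? '' : substr($key, $i);

            // Only the name is hex-encoded; the bracket part is passed through
            // (re-encoded) so parse_str() still builds nested arrays from it.
            return $m[1].bin2hex($head).rawurlencode($tail);
        },
        $query
    );

    // Values are left untouched, so parse_str() decodes them exactly once.
    parse_str((string) $encoded, $parsed);

    $result = [];
    foreach ($parsed as $hexKey => $value) {
        // PHP turns keys such as "64" into integers, hence the cast.
        $hex = (string) $hexKey;
        $isHex = ctype_xdigit($hex) && 0 === strlen($hex) % 2;
        $result[$isHex ? hex2bin($hex) : $hex] = $value;
    }

    return $result;
}

$roundTripped = parse_query_via_hex('a.b=c&d%5Be%5D=f&g=h%2520i');
```

The regex callback runs once per parameter, which is where the performance concern raised earlier in the thread comes from.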

@drupol
Contributor

drupol commented Jun 14, 2020

@drupol the code in https://3v4l.org/d0W4Z won't correctly parse the a%5Bb%5D=c query string, and it's going to be way slower.

Ok I fixed it... https://3v4l.org/Pam6Z

But ok if it's slower, you decide :-)

@nicolas-grekas
Member Author

urldecode($keyValue)

This line will lead to double decoding of the value (once here, then a second time by parse_str()) - the implem in symfony/psr-http-message-bridge#80 has the same issue btw.
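
A small sketch of the double-decoding problem, using only native functions. The sample value is made up: it is the URL-encoded form of the literal text "100%25".

```php
<?php

$raw = 'v=100%2525';

// Correct: parse_str() decodes the value exactly once.
parse_str($raw, $once);

// Buggy: decoding up front means parse_str() decodes a second time.
parse_str(urldecode($raw), $twice);

var_dump($once['v'], $twice['v']); // "100%25" versus "100%"
```

Any value that legitimately contains a percent sign followed by two hex digits is silently corrupted by the second pass.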

@drupol
Contributor

drupol commented Jun 14, 2020

urldecode($keyValue)

This line will lead to double decoding of the value (once here, then a second time by parse_str()) - the implem in symfony/psr-http-message-bridge#80 has the same issue btw.

Well done, I updated it: https://3v4l.org/p7uPM

Contributor

@drupol drupol left a comment

👍

@simonberger
Contributor

simonberger commented Jun 15, 2020

This change heads in a really good direction. Thank you. I have some random comments and questions.

  • Does $ignoreBrackets=true include the leading whitespace on purpose, and if so, why?
  • Most likely known but not documented: besides dots, parseQuery also preserves subsequent whitespace, while parse_str() replaces it with _
  • What about giving $ignoreBrackets a name that better reflects what it achieves? I would have trouble getting an idea of what it does without studying the documentation. Some deliberately (too) extreme name examples: groupAllParams, alwaysGroup, groupByParameterNames, groupWithoutBrackets, alwaysGroupAndIgnoreBrackets.

/**
* Like parse_str(), but preserves dots in variable names.
*/
public static function parseQuery(string $query, bool $ignoreBrackets = false, string $separator = '&'): array
Member

@dunglas dunglas Jun 17, 2020

Shouldn't we just use the native parse_str() if $ignoreBrackets is set to false? And so rename $ignoreBrackets to something else, such as $phpCompat = true or something like that?

Many PHP libraries rely on the fact that dots are replaced by underscores (CGI-like), and we may introduce issues by changing this. I would prefer to have a pure PHP mode (with all the oddities, including the dot replacement, etc.) and a "strict" mode doing nothing more than the URL class in JS, for instance. It feels wrong to me to have intermediary modes such as "replace the dots but ignore brackets".

Member Author

I'm sorry, I'm missing your logic. To me, it feels wrong to parse dots (replacing them with _) but ignore brackets.
The purpose of the method is to give you access to the original names and have a toggle to accept multiple values with or without using [].
Libs that expect keys mangled by PHP should be given arrays mangled by PHP, i.e. you wouldn't use this function with them.

@drupol
Contributor

drupol commented Jun 19, 2020

I noticed that Firefox and Chrome behave differently when parsing the string a%00b=c.

Try this JavaScript code:

    const queryString = new URLSearchParams('?a%00b=c');
    for (const [key, value] of queryString) {
        console.log('key =', key, 'value = ', value);
    }

Firefox: [screenshot not preserved]

Chrome: [screenshot not preserved]

Shouldn't we mimic that behavior here?

We could also check what the WHATWG spec says about this.

And we could check whether the web-platform-tests suite covers those cases.

@nicolas-grekas
Member Author

@drupol there is no use case for the null char in URLs. I'm not even sure it's legal from an RFC pov. php-src uses C-strings internally in parse_str(), that's why null chars cut the string. I think we should stick to the behavior of php for this one.

@nicolas-grekas
Member Author

Now rebased on top of #37271, ready.

@nicolas-grekas nicolas-grekas force-pushed the hf-parse-qs branch 2 times, most recently from 9f6ac40 to 0c00d95 Compare June 23, 2020 08:48
… `parse_str()` but preserves dots in variable names
@nicolas-grekas
Member Author

@simonberger I missed your comment:

Does $ignoreBrackets=true include the leading whitespace on purpose, and if so, why?

fixed, trimming now happens unconditionally.

Most likely known but not documented: besides dots, parseQuery also preserves subsequent whitespace, while parse_str() replaces it with _

true, but nobody is using spaces in var names so I skipped advertising this.

What about giving $ignoreBrackets a name that better reflects what it achieves? I would have trouble getting an idea of what it does without studying the documentation. Some deliberately (too) extreme name examples: groupAllParams, alwaysGroup, groupByParameterNames, groupWithoutBrackets, alwaysGroupAndIgnoreBrackets.

naming... :)
I'm sorry I'm not convinced by any of the current proposals...

@fabpot
Member

fabpot commented Jun 24, 2020

Thank you @nicolas-grekas.

8 participants