[HttpKernel] Correctly merging cache directives in HttpCache/ResponseCacheStrategy #26532
Some minor comments to start the review.
My most important comment is that this should be merged as a new feature to me, thus rebased against master.
If you have time @mpdude, I would love to get your review on this one.
I'd really like to help here! It's a bit difficult, though, as the description does not directly make clear what we're after. After reading the linked issues my understanding is that we want to enable the Before discussing the implementation, can we agree on what the merge policy needs to be? I have reviewed the existing tests and think that the merge rules could be as follows.
Does that make sense?
I still don't understand why Symfony generally adds a
You're somewhat right with my incorrect interpretation. I wasn't very sure about revalidation, so I looked it up: https://stackoverflow.com/questions/18148884/difference-between-no-cache-and-must-revalidate#19938619
Any progress here? The current implementation breaks client cache headers when working with ESI fragments :(
friendly ping @aschempp
I have updated the PR to fix some of the mentioned coding style issues and implemented point 1 in my comment. I know there's more, but I think we should discuss the general idea before I finish the implementation. However, most of the notes in my thoughts in the last comment still stand, so I guess we should also ping @mpdude 😁 Before we discuss the fine details of this though, we should agree or even write down our thoughts on the supposed default behavior. Symfony always adds cache control headers which I consider strange. Especially the second if-condition means I can have an According to the RFC, not having a
Is there anything I can do to push this forward?
@aschempp would you mind rebasing please?
5f3f951 to b98e25a
Rebase is done. The change from aschempp@3d27b59#diff-e5a339b48cec0fa0a4d0c5c4c705f568R75 has been dropped, but I preserved the unit tests and they are still passing.
I just looked at the disabled unit tests and will follow up with a few changes shortly.
Sorry for taking a while. I tried to implement merging of One thing I'm not sure about is the default Response behavior. If there are not First of all,
I have removed two assertions in the HttpCacheTest to fix tests. I'm very well aware that removing assertions to fix a test is not really a practical solution. However, as explained in #26245 (comment), I think the assertions (and Symfony behavior) were wrong in the first place …
Any follow-up questions for this PR I need to address?
(could you please rebase on latest 3.4?)
For me, this PR is too confusing to give meaningful feedback. Can we somehow simplify this or break it into smaller pieces we could discuss independently?
@mpdude did you look at the diff or at the new version of the file directly? What can I do to simplify the changes? I think they are all related, there are no "individual features" that could be split into smaller pieces. @nicolas-grekas I will rebase as soon as any questions are resolved. Previously all changes in the old file were irrelevant because it is basically a complete rewrite of the class.
Is there anything I can do to get this merged at some point?
@aschempp Can you rebase this pull request? We cannot merge a pull request with a merge commit. Thank you.
11a3699 to 77ae8bc
} elseif (null !== $maxAge = min($this->maxAges)) {
    $response->setSharedMaxAge($maxAge);
    $response->headers->set('Age', $maxAge - min($this->ttls));
// Last-Modified and Etag headers cannot be merged, they render the response uncacheable
This comment does not seem to correspond to the following code, as the following deals with responses not having Last-Modified and Etag
maybe formulate it as "If an embedded response uses Last-Modified or an Etag, the combined response is not cacheable."
hm, but the code is saying that if we have a success status AND no last modified / etag, it certainly is cacheable.
should we instead say that if there is either etag or last-modified, we return true?
and to simplify reasoning in this method, i suggest flipping this method over to `keepsFinalResponseCacheable` and switch true / false. and invert the bool where we call the method.
First, the method name was a lengthy discussion and finally was @nicolas-grekas's suggestion.
This step determines that according to RFC, a response is always cacheable if it has one of the given response codes. Not having any cache-control information does not make a response uncacheable, it just does not tell a cache what to do.
ok with the method name. i prefer "positive" naming over negations, but the field is also done that way round so it could add confusion.
to make this robust and easier to understand, how about saying that as soon as there is etag or last-modified, we return true. and move the status check code to the very end. instead of `return true`, `return !\in_array($response->getStatusCode(), array(...));`. that would be more explicit i think.
That would not be the same!
- Your suggestion means that a response with either `ETag` or `Last-Modified` `willMakeFinalResponseUncacheable`. That is not what I (tried to) implement.
- The final response is uncacheable if
  - (it is not a given status code OR it has `ETag` or `Last-Modified`)
  - AND it does not have any other caching info (like `max-age`).

The implementation just works the opposite way.
- If it has a given status code and none of the headers, we already know it will not `…MakeFinalResponseUncacheable`.
- If either it has a different status code OR it has one of the headers, we must also check for `Cache-Control` headers.
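The two-step check described in this comment can be sketched in isolation. This is only an illustration under stated assumptions: the function below is a hypothetical stand-in that takes plain values rather than a Symfony `Response`, the status codes are the ones quoted from RFC 2616 in the PR description, and the list of directives treated as "other caching info" is an assumption, not taken from the PR.

```php
<?php

/**
 * Hypothetical, dependency-free stand-in for the check discussed above.
 *
 * @param int      $statusCode       HTTP status code of the (sub)response
 * @param bool     $hasValidator     true if the response carries ETag or Last-Modified
 * @param string[] $cacheControlKeys names of the Cache-Control directives present
 */
function willMakeFinalResponseUncacheable(int $statusCode, bool $hasValidator, array $cacheControlKeys): bool
{
    // Status codes that RFC 2616 section 13.4 allows a cache to store by default.
    $cacheableByDefault = [200, 203, 206, 300, 301, 410];

    // Step 1: a given status code and no validator headers never blocks caching.
    if (in_array($statusCode, $cacheableByDefault, true) && !$hasValidator) {
        return false;
    }

    // Step 2: otherwise the response needs explicit caching information.
    // Which directives count here is an assumption made for this sketch.
    $explicit = ['max-age', 's-maxage', 'public', 'private'];

    return [] === array_intersect($explicit, $cacheControlKeys);
}

var_dump(willMakeFinalResponseUncacheable(200, false, []));          // bool(false)
var_dump(willMakeFinalResponseUncacheable(200, true, []));           // bool(true)
var_dump(willMakeFinalResponseUncacheable(302, false, ['max-age'])); // bool(false)
```

Returning `false` here only means that this response does not veto caching; whether the final response ends up cacheable still depends on the merged directives.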
right, this is not the same. hm, so if status is 200 and we have no etag/last-modified, we say its ok to cache, even if max-age is set to 0? ah, but then we would still mark the final result with max-age: 0, and same goes for private.
okay, then i agree this is correct, though somewhat counterintuitive. can you mention this in the phpdoc, that cache-control instructions are handled elsewhere?
also, if the fragment has no cache-control header but the master response has max-age: 1 day, would we end up with caching the combined response for a whole day? or should we assume a default max-age when caching something with status 200 and no cache-control instruction? varnish takes 2 minutes in that case, by default...
> hm, so if status is 200 and we have no etag/last-modified, we say its ok to cache, even if max-age is set to 0?

No, we don't say it's ok to cache. We just determine that this response does not make the final response uncacheable. The final response can still have no useful caching info and therefore be not cacheable.

> if the fragment has no cache-control header but the master response has max-age: 1 day, would we end up with caching the combined response for a whole day?

There is no difference between ESI fragments and the master response. They are all sent to the `add` method, and the result of ALL responses (regardless of their type) is added in the `update` method. So if any response, regardless of whether it is master or fragment, has no mergeable data, nothing will be added to the final response.
ah right, this happens here: https://github.com/symfony/symfony/pull/26532/files#diff-e5a339b48cec0fa0a4d0c5c4c705f568R212 - if one response has no max-age we set that info to `false`.
so this should indeed be fine. maybe put part of this explanation into the phpdoc?
> We just determine that this response does not make the final response uncacheable. The final response can still have no useful caching info and therefore be not cacheable.
can you port the gist of this discussion into the comment here?
i think its important that we improve things here.
it is quite confusing to reason about these things, as interactions are quite complicated with the many directives. i think we need to tweak the code to be easier to understand to be more confident we get it right this time.
could you have another iteration to improve the readability of this code?
 * we have to subtract the age so that the value is normalized for an age of 0.
 *
 * If the value is lower than the currently stored value, we update the value, to keep a rolling
 * minimal value of each instruction.
lets also mention that this method is always called and if there is no setting, the default value is null, and we won't set the information on the final response if it was not present on one of the responses.
formulation suggestion:
This method is always called, even for responses that do not have the respective instruction in
their `Cache-Control` header (or have no `Cache-Control` header at all). If any response does
not have an instruction, we do not set that instruction on the final response.
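The rolling minimum described in this doc comment could look roughly like the sketch below. The helper name and the idea of passing the stored state in and out (instead of keeping it on the strategy object) are assumptions made for illustration; `null` stands for "no response seen yet" and `false` for "at least one response lacked the instruction".

```php
<?php

/**
 * Hypothetical helper: keeps a rolling minimum of an age-normalized value.
 *
 * @param int|false|null $stored null = nothing stored yet, false = a response lacked the instruction
 * @param int|null       $value  value of the instruction on the current response, null if absent
 * @param int            $age    Age of the current response in seconds
 *
 * @return int|false
 */
function storeRelativeAge($stored, ?int $value, int $age)
{
    // One response without the instruction disables it for the final response.
    if (null === $value || false === $stored) {
        return false;
    }

    // Normalize the value for an age of 0, then keep the smallest one seen so far.
    $relative = $value - $age;

    return null === $stored ? $relative : min($stored, $relative);
}

$maxAge = null;
foreach ([[600, 100], [300, 0], [900, 700]] as [$value, $age]) {
    $maxAge = storeRelativeAge($maxAge, $value, $age);
}
var_dump($maxAge); // int(200)
```

Using `false` as a sticky marker means the order in which responses are added does not matter: once any response lacks the instruction, later values cannot bring it back.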
What's the status here?
This PR is still ready to me. Be aware that it currently points to 3.4 as a bugfix, so I'm not sure about short arrays? I have finished everything requested at the SymfonyCon, but we haven't made any more progress since. I'm not sure who's in charge of a final decision on what needs to be completed and actually merge this?
Short arrays have been applied to 3.4 also, that's why a rebase is needed :)
8fd595e to 76adb4a
Rebased now and updated to short arrays using the php-cs-fixer
Thanks! What about unanswered comments from David?
You mean this one? #26532 (comment)
the open comments are all about improving the phpdoc or doc comments. basically for the places where i misunderstood the code before, because i think the explanations given in the discussion here and in person in lisbon would merit to be in the code file for future reference. some of the interactions are quite intricate (i don't see a way to improve that) and therefore need to be well documented so we still remember why things are as they are in a couple of years.
Any suggestions maybe? :)
I'm going to trust you on this one :)
This implementation is the one we discussed during the SymfonyCon hackday, solving lots of weird cases. Great to see it finally validated 😄
5a32de9 to 893118f
Thank you @aschempp.
…he/ResponseCacheStrategy (aschempp)

This PR was squashed before being merged into the 3.4 branch (closes #26532).

Discussion
----------

[HttpKernel] Correctly merging cache directives in HttpCache/ResponseCacheStrategy

| Q             | A
| ------------- | ---
| Branch?       | 3.4
| Bug fix?      | yes
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | #26245, #26352, #28872
| License       | MIT
| Doc PR        | -

This PR is a first draft to fix the incorrect merging of private and other cache-related headers that are not meant for the shared cache but the browser (see mentioned issues).

The existing implementation of `HttpFoundation\Response` is very much tailored to the `HttpCache`, for example `isCacheable` returns `false` if the response is `private`, which is not true for a browser cache. That is why my implementation no longer uses much of the response methods. They are however still used by the `HttpCache` and we should keep them as-is. FYI, the `ResponseCacheStrategy` does **not** affect the stored data of `HttpCache` but is only applied to the result of multiple merged subrequests/ESI responses.

I did read up a lot on RFC2616 as a reference. [Section 13.4](https://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.4) gives an overall view of when a response MAY be cached. [Section 14.9.1](https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.1) has more insight into the `Cache-Control` directives.

Here's a summary of the relevant information I applied to the implementation:

- > Unless specifically constrained by a cache-control (section 14.9) directive, a caching system MAY always store a successful response (see section 13.8) as a cache entry, MAY return it without validation if it is fresh, and MAY return it after successful validation.

  A response without cache control headers is totally fine, and it's up to the cache (shared or private) to decide what to do with it. That is why the implementation no longer sets `no-cache` if no `Cache-Control` headers are present.

- > A response received with a status code of 200, 203, 206, 300, 301 or 410 MAY be stored […] unless a cache-control directive prohibits caching.

  > A response received with any other status code (e.g. status codes 302 and 307) MUST NOT be returned […] unless there are cache-control directives or another header(s) that explicitly allow it.

  This is what `ResponseCacheStrategy::isUncacheable` implements to decide whether a response is not cacheable at all. It differs from `Response::isCacheable` which only returns true if there are actual `Cache-Control` headers.

- > [Section 13.2.3](https://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.2.3): When a response is generated from a cache entry, the cache MUST include a single Age header field in the response with a value equal to the cache entry's current_age.

  That's why the implementation **always** adds the `Age` header. It takes the oldest age of any of the responses as common denominator for the content.

- > [Section 14.9.3](https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.3): If a response includes an s-maxage directive, then for a shared cache (but not for a private cache), the maximum age specified by this directive overrides the maximum age specified by either the max-age directive or the Expires header.

  This effectively means that `max-age`, `s-maxage` and `Expires` must all be kept on the response. My implementation assumes that we can only do that if they exist in **all** of the responses, and then takes the lowest value of any of them. Be aware the implementation might look confusing at first. Due to the fact that the `Age` header might come from another subresponse than the lowest expiration value, the values are stored relative to the current response date and then re-calculated based on the age header.

The Symfony implementation did not and still does not implement the full RFC. As an example, some of the `Cache-Control` headers (like `private` and `no-cache`) MAY actually have a string value, but the implementation only supports boolean. Also, [Custom `Cache-Control` headers](https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.6) are currently not merged into the final response.

**ToDo/Questions:**

1. [Section 13.5.2](https://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html#sec13.5.2) specifies that we must add a [`Warning 214 Transformation applied`](https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.46) if we modify the response headers.

2. Should we add an `Expires` header based on `max-age` if none is explicitly set in the responses? This would essentially provide the same information as `max-age` but with support for HTTP/1.0 proxies/clients.

3. I'm not sure about the implemented handling of the `private` directive. The directive is currently only added to the final response if it is present in all of the subresponses. This can effectively result in no cache-control directive, which does not tell a shared cache that the response must not be cached. However, adding a `private` might also tell a browser to actually cache it, even though none of the other responses asked for that.

4. > [Section 14.9.2](https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.2): The purpose of the `no-store` directive is to prevent the inadvertent release or retention of sensitive information […]. The `no-store` directive applies to the entire message, and MAY be sent either in a response or in a request. If sent in a request, a cache MUST NOT store any part of either this request or any response to it. If sent in a response, a cache MUST NOT store any part of either this response or the request that elicited it.

   I have not (yet) validated whether the `HttpCache` implementation respects any of this.

5. As far as I understand, the current implementation of [`ResponseHeaderBag::computeCacheControlValue`](https://github.com/symfony/symfony/blob/master/src/Symfony/Component/HttpFoundation/ResponseHeaderBag.php#L313) is incorrect. `no-cache` means a response [must not be cached by a shared or private cache](https://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.9.1), which overrides `private` automatically.

6. The unit tests are still very limited and I want to add plenty more to test and sort-of describe the implementation or assumptions on the RFC.

/cc @nicolas-grekas

#SymfonyConHackday2018

Commits
-------

893118f [HttpKernel] Correctly merging cache directives in HttpCache/ResponseCacheStrategy
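To make the merge rules from the description easier to follow, here is a minimal sketch of how `Age`, `max-age` and `private` might be combined across responses. It is not the code of this PR: the function name, the array shape used for each response and the output format are all hypothetical, and only the three rules stated in the description are modelled (oldest `Age` wins, `max-age` is kept only if every response has it and is re-calculated against the final age, `private` is kept only if every response is private).

```php
<?php

/**
 * Hypothetical merge of cache information from several responses.
 *
 * @param array<int, array{age: int, maxAge: ?int, private: bool}> $responses assumed input shape
 *
 * @return array<string, int|bool> merged header information
 */
function mergeCacheInfo(array $responses): array
{
    $age = 0;
    $relativeMaxAge = null;       // null = nothing seen yet, false = missing on some response
    $private = [] !== $responses; // only stays true if every response is private

    foreach ($responses as $r) {
        // The oldest age is the common denominator for the combined content.
        $age = max($age, $r['age']);
        $private = $private && $r['private'];

        if (null === $r['maxAge']) {
            $relativeMaxAge = false;
        } elseif (false !== $relativeMaxAge) {
            // Store the value relative to an age of 0 and keep the minimum.
            $relative = $r['maxAge'] - $r['age'];
            $relativeMaxAge = null === $relativeMaxAge ? $relative : min($relativeMaxAge, $relative);
        }
    }

    $merged = ['Age' => $age];
    if (is_int($relativeMaxAge)) {
        // Re-calculate the directive against the age of the final response.
        $merged['max-age'] = $relativeMaxAge + $age;
    }
    if ($private) {
        $merged['private'] = true;
    }

    return $merged;
}

print_r(mergeCacheInfo([
    ['age' => 10, 'maxAge' => 600, 'private' => true],
    ['age' => 60, 'maxAge' => 300, 'private' => false],
]));
```

In this example the second response has the shortest remaining lifetime (240 seconds) and also the oldest age (60), so the merged result carries `Age: 60` and `max-age=300`, and no `private` directive because only one of the two responses was private.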
Great job @aschempp and all others involved!