Builtin cache invalidation system aka make API Platform fast as hell #952
dunglas merged 1 commit into api-platform:master from
| * | ||
| * @param GetResponseForControllerResultEvent $event | ||
| */ | ||
| public function onKernelView(GetResponseForControllerResultEvent $event) |
There was a problem hiding this comment.
Probably useless because it is already handled for Doctrine and not reliable enough for non-Doctrine data providers.
| } | ||
| } | ||
|
|
||
| private function addResources(Request $request) |
There was a problem hiding this comment.
Probably useless because it is already handled for Doctrine and not reliable enough for non-Doctrine data providers.
|
This looks amazing, but is providing a solution coupled to a specific cache implementation as a first-class citizen a good idea? Maybe an intermediate solution built on standard HTTP headers would improve interoperability. |
|
Nice work @dunglas. This is a very similar strategy to what i've implemented previously.
|
| "doctrine/orm": "^2.5", | ||
| "doctrine/annotations": "^1.2", | ||
| "friendsofsymfony/user-bundle": "^2.0@dev", | ||
| "guzzlehttp/guzzle": "^6.0", |
There was a problem hiding this comment.
You should use php-http/httplug for better interop.
There was a problem hiding this comment.
Maybe when it becomes a PSR. But for now it has no advantage over using Guzzle 6 for this use case: it is harder to set up (it requires installing an extra Symfony bundle), adds complexity and a (small) performance overhead.
Guzzle 6 is already an abstraction layer, with built-in curl and stream implementations. We only depend on the ClientInterface, so it's easy to bridge other implementations (but I think it is not worth it).
As we support PHP 7 only, we don't target older libraries such as Guzzle 5.
There was a problem hiding this comment.
@dunglas IMHO, you should consider using Buzz. And it's not a joke:
- Guzzle has many versions, and they are not compatible with each other. So people using a version that is incompatible with Guzzle 6 will be blocked.
- Buzz is very simple and works well
There was a problem hiding this comment.
AFAIK it doesn't support async requests... and it has not been updated since 2015.
There was a problem hiding this comment.
When the Guzzle maintainers release version 7.0, you will be stuck with version 6.0. HTTP clients are a very common dependency, so conflicts may appear. But ATM you could go with Guzzle 6. HTTPlug support could be added later.
There was a problem hiding this comment.
I have a very strong (I think) argument for using HTTPlug: you do not get into hell like FOSHttpCache did (and still is) with their hard coupling to a specific Guzzle version.
|
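@fbourigault there is an abstraction layer and an implementation (Varnish). Basically, anyone can add support for any cache provider supporting cache invalidation (it's a matter of implementing an interface). I plan to add support for CloudFlare in core too. There is no standard (HTTP headers) for cache invalidation (only expiration is supported in RFCs) but I chose to use Cache-Tags because it's the one used by CloudFlare.
@bendavies regarding FosHttpCache, for 2 reasons:
- my implementation is a bit different (the md5 hash) and tied to the concepts of "resources" and "IRIs", not present in FosHttpCache
- the implementation is trivial (and it's possible to bridge it with FosHttpCache if wanted), and not depending on a 3rd party library we don't maintain will ease our maintenance process (our soft dependency on FosUser is a pain to maintain...).
It can work with any authentication mechanism (that's why I've introduced a way to configure the Vary HTTP header). If the resource varies depending on the Authorization header, just set api_platform.vary to ['Content-Type', 'Authorization'] (you'll need to tweak the Varnish config too). A key concept of REST is being explicit: a resource identified by an IRI should not vary depending on the currently logged-in user (in this case, the URL should not be the same).
3/ To avoid collisions (if a resource is tagged with /foos/,/foos1 and another with only /foos/2, sending a BAN request with /foos as parameter will ban both, and it's a bug). Using md5 hashes prevents this problem. A CRC32 hash should do the trick too, but md5 is usually faster on modern servers. |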
By standard HTTP headers, I mean revalidation. But maybe in such a case, caching is not efficient. |
|
> There is no standard (HTTP headers) for cache invalidation (only expiration is supported in RFCs) but I choose to use Cache-Tags because it's the one used by CloudFlare.
Custom HTTP headers should be prefixed with `X-`. CloudFlare is flouting the spec (bad).
> If the resource varies depending on the Authorization header, just set api_platform.vary to ['Content-Type', 'Authorization'] (you'll need to tweak the Varnish config too).
It's not so simple. An authorization context needs to be provided and hashed in Varnish. Blindly adding `Vary: Authorization` would kill hit rates if, for example, your intention is to keep a cache for each user.
> A key concept of REST is being explicit, a resource identified by an IRI should not vary depending on the currently logged-in user (in this case, the URL should not be the same).
False. A resource can vary by any header as declared in the Vary header.
|
|
> To avoid collisions (if a resource is tagged with /foos/,/foos1 and another with only /foos/2, sending a BAN request with /foos as parameter will ban both, and it's a bug). Using md5 hashes prevents this problem. A CRC32 hash should do the trick too, but md5 is usually faster on modern servers.
Just use regex. Varnish has good regex support. Using a hash is a nightmare when trying to debug cache tags. :)
On 20 Feb 2017 21:32, "Kévin Dunglas" ***@***.***> wrote:
> @fbourigault <https://github.com/fbourigault> there is an abstraction
> layer and an implementation (Varnish). Basically, anyone can add support
> for any cache provider supporting cache invalidation (it's a matter of
> implementing an interface). I plan to add support for CloudFlare in core too.
> There is no standard (HTTP headers) for cache invalidation (only
> expiration is supported in RFCs) but I chose to use Cache-Tags because
> it's the one used by CloudFlare.
> @bendavies <https://github.com/bendavies> regarding FosHttpCache, for 2
> reasons:
> - my implementation is a bit different (the md5 hash) and tied to the
> concepts of "resources" and "IRIs", not present in FosHttpCache
> - the implementation is trivial (and it's possible to bridge it with
> FosHttpCache if wanted), and not depending on a 3rd party library we don't
> maintain will ease our maintenance process (our soft dependency on
> FosUser is a pain to maintain...).
> It can work with any authentication mechanism (that's why I've
> introduced a way to configure the Vary HTTP header). If the resource
> varies depending on the Authorization header, just set api_platform.vary
> to ['Content-Type', 'Authorization'] (you'll need to tweak the Varnish
> config too). A key concept of REST is being explicit: a resource identified
> by an IRI should not vary depending on the currently logged-in user (in this
> case, the URL should not be the same).
> 3/ To avoid collisions (if a resource is tagged with /foos/,/foos1 and
> another with only /foos/2, sending a BAN request with /foos as parameter
> will ban both, and it's a bug). Using md5 hashes prevents this problem. A
> CRC32 hash should do the trick too, but md5 is usually faster on modern
> servers.
|
|
I'll make the header configurable but I'll keep CloudFlare compatibility by default, it's an invaluable feature. By the way this header should not be exposed to the end client. Regarding Vary, it's exactly what I explain in my post 🙂 Regarding the hit rate, doing more advanced things like hashing cannot be automated on the API Platform side, it requires a custom development and it's easy to implement using the new vary option. |
|
Regexes (at least simple ones) don't fix the issue in my example: /foos will match /foos/*, not only the collection response. |
|
/^\/foos$/
|
|
But maybe a more complex regex can do the trick, I'll give it a try (I agree that plain tags are better than hashes for debug). |
|
The comma must be handled too. |
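The point about anchoring and commas can be illustrated with a minimal sketch, assuming tags are plain IRIs joined by commas in a single header value. The helper name `exactTagPattern` is hypothetical and not part of this PR, and how the pattern would be wired into a Varnish ban is not shown here.

```php
<?php
declare(strict_types=1);

/**
 * Builds a regex matching one exact tag inside a comma-separated tag list.
 * Hypothetical helper for illustration only.
 */
function exactTagPattern(string $iri): string
{
    // The tag must be preceded by the start of the value or a comma,
    // and followed by a comma or the end of the value.
    return '#(^|,)' . preg_quote($iri, '#') . '($|,)#';
}

$pattern = exactTagPattern('/foos');

var_dump(preg_match($pattern, '/foos') === 1);         // tag alone
var_dump(preg_match($pattern, '/bars/1,/foos') === 1); // tag after a comma
var_dump(preg_match($pattern, '/foos/2') === 1);       // item IRI, must not match
```

With these boundaries, banning the collection tag does not also hit responses tagged only with item IRIs that share the same prefix.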
features/http_cache/tags.feature
Outdated
| """ | ||
| Then the response status code should be 200 | ||
| And the header "Cache-Tags" should not exist | ||
| And "/relation_embedders/1,/related_dummies/1,/third_levels/1" IRIs should be purged |
There was a problem hiding this comment.
A PUT may change the value of an attribute used to order a response. You should invalidate the collection too.
features/http_cache/tags.feature
Outdated
| And I send a "DELETE" request to "/relation_embedders/1" | ||
| Then the response status code should be 204 | ||
| And the header "Cache-Tags" should not exist | ||
| And "/relation_embedders/1,/relation_embedders" IRIs should be purged |
There was a problem hiding this comment.
What about the related_dummies that were previously linked to this entity? They should be invalidated too.
There was a problem hiding this comment.
They will be, because they are marked with /relation_embedders/1.
features/http_cache/tags.feature
Outdated
| """ | ||
| Then the response status code should be 201 | ||
| And the header "Cache-Tags" should not exist | ||
| And "/relation_embedders,/related_dummies,/third_levels,/relation_embedders/1,/related_dummies/1,/third_levels/1" IRIs should be purged |
There was a problem hiding this comment.
I don't understand how your code makes the link between relation_embedders and third_levels. What flagged it?
There was a problem hiding this comment.
It's because of this line: https://github.com/api-platform/core/pull/952/files#diff-0362bfae91549bcd08af0d06bd806b37R14
features/http_cache/tags.feature
Outdated
| Scenario: Tags must be set for items | ||
| When I send a "GET" request to "/relation_embedders/1" | ||
| Then the response status code should be 200 | ||
| And the header "Cache-Tags" should be equal to "aa9e2bee5be20590f7dcc520ce2dffca,12a0c94f947a680d68bd6f65e025457d,91774d67418192a057e25dae00345572" |
There was a problem hiding this comment.
md5 obfuscates the tests... :(
I wonder if a dedicated context could be more readable: the Cache-Tags header should be equal to "/relation_embedders, ...."
There was a problem hiding this comment.
True, I'm working on a fix to remove the hash and use plain IRIs. I'll update the PR when it's ready.
| ->end() | ||
| ->booleanNode('public')->defaultNull()->info('To make all responses public by default.')->end() | ||
| ->booleanNode('enable_tags')->defaultFalse()->info('Add cache tags to the response.')->end() | ||
| ->scalarNode('varnish_url')->defaultNull()->info('URL of the Varnish server to purge using cache tags when a resource is updated.')->end() |
There was a problem hiding this comment.
Don't forget users with several Varnish servers.
2ac0f1e to
603fa12
Compare
|
Thank you for the reviews everyone!
|
ce8419e to
29c2b4f
Compare
soyuka
left a comment
There was a problem hiding this comment.
Would be nice to add unit tests to all those listeners!
| public function onFlush(OnFlushEventArgs $eventArgs) | ||
| { | ||
| $iriConverter = $this->container->get('api_platform.iri_converter'); | ||
| $resourceManager = $this->container->get('api_platform.http_cache.resource_manager'); |
There was a problem hiding this comment.
Circular reference is only because of IriConverter no? Can't you inject the ResourceManager properly though?
| return; | ||
| } | ||
|
|
||
| $parts = array_map(function ($iri) { |
There was a problem hiding this comment.
A small comment to explain what goes below would be great for future readers!
| <argument type="service" id="api_platform.http_cache.resource_manager" /> | ||
| <argument type="service" id="api_platform.http_cache.purger" /> | ||
|
|
||
| <tag name="kernel.event_listener" event="kernel.terminate" method="onKernelTerminate" /> |
There was a problem hiding this comment.
I suggest moving the purge as close as possible to the transaction (Doctrine postFlush IMO).
I understand your wish to return a response to the user as soon as possible, but this introduces concurrency bugs.
Example:
- list all blogs (cache miss)
- post a blog
- list all blogs (cache hit when called before the kernel.terminate event)
I already have this issue in my CI (agreed, this is a particular case with a lot of stress tests), but the probability of a concurrency issue is higher when:
- you have several Varnish servers to purge
- you have other kernel.terminate listeners with a higher priority (sending an email, for instance)
There was a problem hiding this comment.
^ That does not actually address the race condition; it only tries to avoid it.
There was a problem hiding this comment.
Indeed, it fixes one case.
BTW (not related to the RC): a small improvement for the user would be to start the async requests on the postFlush event, and wait for them on the kernelFinish event.
| private $vary; | ||
| private $public; | ||
|
|
||
| public function __construct(int $maxAge = null, int $sharedMaxAge = null, array $vary, bool $public = null) |
There was a problem hiding this comment.
I suggest adding a Last-Modified header too, initialized to now() by default.
As a result, Varnish will handle it for free (by default) and return 304 responses.
This small change should not interfere with your app or with how you invalidate the cache, but it will save bandwidth between the user and Varnish.
There was a problem hiding this comment.
Varnish already returns 304 response properly. We should not send bogus Last-Modified headers.
There was a problem hiding this comment.
I thought this listener was used only with the "tag" thing. I apologize; agreed, this header doesn't make sense here.
But I think you missed my point: the idea, when using tagged responses, is to add a validation header (could be ETag or Last-Modified, don't care), to save bandwidth between users and Varnish when the cache is hit.
There was a problem hiding this comment.
IMO we should provide a good default config for this (in the standard edition) but let the user configure it as he wants.
There was a problem hiding this comment.
> the idea, when using tagged responses, is to add a validation header (could be ETag or Last-Modified, don't care), to save bandwidth between users and Varnish when the cache is hit.
That's incorrect. Varnish already sends 304 response with proper ETag in that case. There is no need for the backend to send an ETag or Last-Modified header unless the backend intends to return 304 on vcl_backend_fetch.
There was a problem hiding this comment.
Let me explain with a diagram (my last try)...
Here is the current implementation:
[sequence diagram: current implementation]
My objective is to turn the last response, in step 6, into a 304.
- To do so, Varnish needs an `if-none-match` header in step 5
- To do so, the client needs an `etag` header in step 4
- To do so, Varnish needs nothing, it just forwards the response from the app from step 3
- To do so, the app must send this header <= this is my point
And here is the target sequence:
[sequence diagram: target sequence]
Your last comment suggests that Varnish adds this header by itself, with something like this:
[sequence diagram: Varnish adding the header by itself]
Is that what you mean? Because I doubt Varnish adds such a header by itself (unless you implement this behavior in VCL, which is fine with me too).
There was a problem hiding this comment.
> Because I doubt Varnish adds such a header by itself
Varnish does it by default. Try it. At least it has been doing so since 4.0 😄
There was a problem hiding this comment.
I bootstrapped an empty api-platform from this PR api-platform/api-platform#238
curl test2_varnish_1.docker/foos -i
HTTP/1.1 200 OK
Server: nginx/1.11.10
Content-Type: application/ld+json; charset=utf-8
X-Powered-By: PHP/7.1.3
Vary: Content-Type
X-Content-Type-Options: nosniff
X-Frame-Options: deny
Cache-Control: public, s-maxage=3600
Link: <http://test2_varnish_1.docker/docs.jsonld>; rel="http://www.w3.org/ns/hydra/core#apiDocumentation"
Cache-Tags: /foos
Date: Wed, 22 Mar 2017 07:23:50 GMT
X-Varnish: 26 32789
Age: 3
Via: 1.1 varnish-v4
Accept-Ranges: bytes
Content-Length: 111
Connection: keep-alive
{"@context":"\/contexts\/Foo","@id":"\/foos","@type":"hydra:Collection","hydra:member":[],"hydra:totalItems":0}
Varnish doesn't add a validation header 😕
There was a problem hiding this comment.
@jderusse You're right. I overlooked that FOSHttpCacheBundle was the one adding it. My apologies...
I think we should use the same approach. Just use the MD5 hash of the Response content as ETag. MD5 is cheap, so no reason to use a meaningless Last-Modified. 😄
There was a problem hiding this comment.
Doing a md5 of the response is not ok (the response contains the date). We may do a md5 of the response's body.
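The distinction between hashing the whole response and hashing only its body can be sketched as follows. This is a minimal illustration in plain PHP, not the code of the PR or of FOSHttpCacheBundle; `bodyEtag` and `isNotModified` are hypothetical names, and the `If-None-Match` comparison is deliberately simplified (no weak validators, no lists of ETags).

```php
<?php
declare(strict_types=1);

/**
 * Derives a strong ETag from the response body only, so that volatile
 * headers such as Date do not change the validator.
 */
function bodyEtag(string $body): string
{
    // ETag values are quoted strings.
    return '"' . md5($body) . '"';
}

/**
 * Simplified check: true when the client already holds this representation.
 */
function isNotModified(?string $ifNoneMatch, string $etag): bool
{
    return $ifNoneMatch !== null && $ifNoneMatch === $etag;
}

$body = '{"@id":"/foos","hydra:member":[]}';
$etag = bodyEtag($body);

// A client replaying the ETag it received could be answered with a 304.
var_dump(isNotModified($etag, bodyEtag($body)));
```

Because only the body is hashed, two responses generated at different times for the same data yield the same validator.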
| { | ||
| $loader->load('http_cache.xml'); | ||
|
|
||
| if (true !== $config['http_cache']['enable_tags']) { |
There was a problem hiding this comment.
When enable_tags is true and maxAge > 0, we should warn the developer that the browser will cache responses, which removes the benefits of cache invalidation.
There was a problem hiding this comment.
It can be intended (I often do that). It allows always serving fresh data for the first request while reducing the server load/bandwidth usage thanks to expiration. It's often not a problem when the TTL is low. I would let the user do what they want without a warning.
| return; | ||
| } | ||
|
|
||
| $resources = $request->attributes->get('_resources', []); |
There was a problem hiding this comment.
This tightly couples the purge system to the HTTP request context, which is used like a data bag.
There was a problem hiding this comment.
Yes, I'll try to find a better solution but I'm not sure there is one. Anyway, it's called purge HTTP cache, so it's not a big deal to couple it with the HTTP request.
5359fa6 to
a32a049
Compare
|
All comments handled. Can you make a last review? |
|
|
||
| $data = ['_links' => ['self' => ['href' => $this->iriConverter->getIriFromItem($object)]]]; | ||
| $data = ['_links' => ['self' => ['href' => $context['iri']]]]; | ||
| $context['debug'] = true; |
There was a problem hiding this comment.
Is there any reason for this property in the context?
There was a problem hiding this comment.
Good catch, it has nothing to do here!
features/http_cache/tags.feature
Outdated
| @@ -0,0 +1,64 @@ | |||
| Feature: Cache invalidation trough HTTP Cache tags | |||
| !$request->isMethodCacheable() | ||
| || !$response->isCacheable() | ||
| || (!$attributes = RequestAttributesExtractor::extractAttributes($request)) | ||
| || !$resources = $request->attributes->get('_resources') |
There was a problem hiding this comment.
no parenthesis here, but the line above yes?
There was a problem hiding this comment.
The parentheses on the previous line are mandatory because of operator precedence.
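The precedence issue can be reproduced in isolation. In this minimal sketch, `extractAttributes()` is a hypothetical stand-in for the real extractor call; the point is only that `||` binds tighter than `=`, so an assignment followed by `||` needs parentheses, while the last operand of the chain does not.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the attribute extraction in the listener.
function extractAttributes(): array
{
    return ['resource_class' => 'Foo'];
}

// With parentheses: the array is assigned first, then negated.
$skipA = (!$withParens = extractAttributes()) || false;

// Without parentheses: `||` is evaluated before `=`, so the variable
// receives the boolean result of the whole `||` expression.
$skipB = !$withoutParens = extractAttributes() || false;

var_dump(is_array($withParens)); // the variable holds the attributes
var_dump($withoutParens);        // the variable holds a boolean
```

This is why the last line of the condition in the diff can omit the parentheses: nothing follows it that could be absorbed into the assignment.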
| $normalizer->setSerializer($serializerProphecy->reveal()); | ||
|
|
||
| $this->assertEquals(['name' => 'hello'], $normalizer->normalize($dummy)); | ||
| $this->assertEquals(['name' => 'hello'], $normalizer->normalize($dummy, null, ['resources' => []])); |
There was a problem hiding this comment.
Shouldn't we avoid changing those tests? It feels like the tests (this one and the others) won't pass without the 2 extra arguments (e.g. null, ['resources' => []]). Is that the case? Isn't this breaking things somehow?
There was a problem hiding this comment.
It passes without the arguments and without change, but I've modified it to test the new behavior.
There was a problem hiding this comment.
Okay, as long as there is a test that doesn't add those arguments.
|
I'm going to test this with a project; I'll report back if there are bugs. |
11dd011 to
6075fa2
Compare
|
Could you use http://httplug.io/ instead of Guzzle directly? |
d044bac to
17443aa
Compare
It has been raised before, and I'd like to once again echo this. |
|
HTTPlug introduces a lot of complexity (including an extra bundle to configure... until we get Flex) for no gain here. I'm still 👎 for now. |
|
And by the way, Guzzle is a soft dependency here. So someone wanting to use another client can do it; they just have to implement the |
|
As long as Guzzle is a soft dependency it's ok |
Builtin cache invalidation system aka make API Platform fast as hell
The usual quote:
> There are only two hard things in Computer Science: cache invalidation and naming things. (Phil Karlton)
Well, API Platform is an awesome name, so this PR tries to solve the other hard thing: caching, to make your API as fast as possible.
It introduces a builtin mechanism to always serve API responses from a cache (Varnish and CloudFlare are targeted), and to invalidate stale data in real time when a resource is updated, deleted or created.
With this mechanism, on my computer and using Docker, large and complex responses are served in ~15ms instead of ~700ms without cache.
The API Platform serializer has been tweaked to store the list of all resources included in a given response (the root document, but also embedded documents and documents appearing in lists).
The response is marked with all resources it contains in the `Cache-Tags` HTTP header. An md5 hash of each included IRI is generated to prevent collisions and reduce the header size.
Then, all API responses are stored with a high expiration time in the proxy cache. On all subsequent (read) requests from a client, the response is served from the proxy; the PHP application is not touched.
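As a rough illustration of the tagging step, here is a minimal sketch assuming the serializer has already collected the IRIs of every resource in the response. `buildCacheTagsHeader` is a hypothetical name, not the function used in this PR; the `$hash` switch covers both the md5 variant described here and the plain-IRI variant discussed in the review.

```php
<?php
declare(strict_types=1);

/**
 * Builds the value of a Cache-Tags style header from collected IRIs.
 * Hypothetical helper for illustration only.
 *
 * @param string[] $iris IRIs of the root, embedded and listed resources
 */
function buildCacheTagsHeader(array $iris, bool $hash = true): string
{
    // A resource may appear several times in one response; tag it once.
    $unique = array_values(array_unique($iris));

    // One md5 hash per IRI, or the plain IRIs when hashing is disabled.
    $tags = $hash ? array_map('md5', $unique) : $unique;

    return implode(',', $tags);
}

echo buildCacheTagsHeader(['/relation_embedders/1', '/related_dummies/1'], false), "\n";
echo buildCacheTagsHeader(['/relation_embedders/1', '/related_dummies/1']), "\n";
```

Plain IRIs are easier to read when debugging, while fixed-length hashes avoid prefix collisions between tags at the cost of readability.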
When a resource is modified (only changes made to Doctrine entities are supported for now), all responses containing or referencing it are purged from the cache.
A class to purge Varnish is provided in this PR, and a class to purge CloudFlare (enterprise plans only) will be provided later. An interface allows adding support for other cache providers.
Paged collections are also handled: if a resource is added or removed, all collection pages are purged. If a resource is edited, pages (and all other API responses, including those embedding it as a nested document) are purged.
Enabling this new feature doesn't require changing existing code. The following config enables the mechanism and makes your API instantly blazing fast:
I've also opened a PR to add Varnish with a compatible setup to the API Platform Docker setup: api-platform/api-platform#238.
Last but not least, this PR introduces some config options to set global cache settings. Example:
Note: for advanced needs, prefer the awesome FosHttpCache library.
As stated in the quote, cache invalidation is a hard thing and this PR probably contains bugs and edge cases. Please test it and report any problem.
TODO: