Releases: vimeo/galaxycache
v1.3.1 -- Peek Metrics fixes
What's Changed
Full Changelog: v1.3.0...v1.3.1
v1.3.0 -- Peeking and NotFoundErr
New Features
Peer Peeking
When new instances start, they invariably have an empty (cold) cache. Peer Peeking lets a new instance draw on entries that are still held in its peers' caches for the keys it is taking ownership of, rather than repopulating every cache entry from scratch.
When the galaxy is configured with a galaxycache.PeekPeerCfg, then for the first WarmTime after startup, a request with a short timeout (configured with PeekTimeout) is made to the "fallthrough owner" of each requested key (the host that would own the key if the receiving peer weren't in the hashring).
The configured timeout should be reasonably short, but longer than the expected network and queuing latency. (The initial recommendation is between 2 and 10 ms for galaxycache setups that don't span multiple regions.)
This feature provides a limited form of cache persistence when new instances start. In particular, it prevents newly started instances/peers from having much higher load (and worse latency) while they fill their "main" cache for the first few minutes.
WarmTime should be set long enough for the cache hit rate to settle after a new instance starts. (Depending on request rate, this may be minutes to hours, and is likely to be galaxy-specific.)
It's worth noting that because ownership is assigned by consistent hashing, Peek requests are spread across the entire set of peers, not just the one that's starting or stopping. (We plan to extend this mechanism to other parts of the instance lifecycle, but the startup of a new pod is the simplest, and most impactful, place to start.)
NotFoundErr local fetch suppression
Historically, Galaxycache had no way to distinguish between different errors when issuing a request to a peer, so a cache miss that received an error from the backend getter always fell back to a repeated local fetch.
This is fine when internal errors are the only source of such errors, but it has drawbacks when caching data that lives in a datastore that may return a "Not Found"/ENOENT error: in that case, repeated local fallbacks can be expensive and defeat single-flighting.
This release adds an error interface (NotFoundErr) that can be used with errors.As to test whether an error indicates a not-found from the backend.
e.g. the gRPC transport checks here:
To make integration with error-wrapping easy, there's a TrivialNotFoundErr struct{} error type that can be wrapped in whatever errors.As-compatible way is convenient.
e.g. here's the http transport's wrapping:
https://github.com/vimeo/galaxycache/blob/64af5bac10d9034540f9353983804534d8ed5db1/http/http.go#L287
What's Changed
- grpc: convert to opaque API by @dfinkel in #63
- define NotFoundErr interface by @dfinkel in #64
- peek: add stub handling in Galaxy & gRPC+HTTP handling by @dfinkel in #65
- peek: implement full galaxy-level Peek handling and automatic calls (opt-in) by @dfinkel in #66
- http: consistently drain the response body by @dfinkel in #67
- chtest: fix segments-per-key calculation & add layout explanation by @dfinkel in #68
- http & grpc peeking & NOT_FOUND fixes + tests by @dfinkel in #69
- compatibility tests: gRPC & HTTP tests by @dfinkel in #70
- PeekPeerCfg: add dialsdesc tags & Verify method by @dfinkel in #71
Full Changelog: v1.2.1...v1.3.0
v1.2.2 -- chtest segments-per-key fix
Full Changelog: v1.2.1...v1.2.2
v1.2.1 -- chtest test helper package
Note: unless a critical bug is discovered in older versions, this will be the last release that supports Go versions before Go 1.23.
What's Changed
- consistenthash: add chtest subpackage by @dfinkel in #61
- chtest: key pre-registration & non-test build tag by @dfinkel in #62
Full Changelog: v1.2.0...v1.2.1
v1.2.0 -- consistenthash auxiliary updates (bulk-check owner)
What's Changed
- LRU fuzzing overhaul by @dfinkel in #57
- consistenthash: fix GetReplicated owner-collision handling (add tests) by @dfinkel in #59
- consistenthash: fix owner-collision handling and introduce a way to bulk-check ownership of a specific peer by @dfinkel in #58
Full Changelog: v1.1.2...v1.2.0
v1.1.2 -- small byte buckets
What's Changed
- k8swatch: update galaxycache dep by @dfinkel in #55
- opencensus views: add small-byte buckets by @dfinkel in #56
Full Changelog: v1.1.1...v1.1.2
k8swatch/v0.2.0
What's Changed
Full Changelog: v1.1.1...k8swatch/v0.2.0
v1.1.1
v1.1.0
k8swatch/v0.1.0 -- initial release
What's Changed
Full Changelog: v1.1.0...k8swatch/v0.1.0