Add a backoff retry timer to the ConsulDtabStore observer #1742
…API less often Signed-off-by: Dennis Adjei-Baah <[email protected]>
This looks great! Let's also modify the CatalogApi and HealthApi apply constructors in ConsulApi.scala to accept backoffs as a parameter. That way, we can make backoffs configurable in ConsulConfig in ConsulInitializer.scala. This can be done by adding a backoff member to ConsulConfig of type Option[BackoffConfig].
As for a default value, a maximum backoff of 1 minute seems reasonable to me.
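A minimal sketch of the shape this suggestion implies, using only the Scala standard library. `BackoffSetting`, `ConstantBackoff`, `JitteredBackoff`, and `ConsulSettings` are hypothetical stand-ins invented for illustration; they are not linkerd's actual `BackoffConfig` or `ConsulConfig` classes, whose fields may differ.

```scala
import scala.concurrent.duration._

object BackoffConfigSketch {
  // Hypothetical stand-in for a backoff config type; not linkerd's BackoffConfig.
  sealed trait BackoffSetting { def maxDelay: FiniteDuration }

  final case class ConstantBackoff(delay: FiniteDuration) extends BackoffSetting {
    def maxDelay: FiniteDuration = delay
  }

  final case class JitteredBackoff(min: FiniteDuration, max: FiniteDuration) extends BackoffSetting {
    def maxDelay: FiniteDuration = max
  }

  // Default proposed in the review: cap the wait between retries at one minute.
  val DefaultBackoff: BackoffSetting = JitteredBackoff(1.millisecond, 1.minute)

  // Hypothetical config class: the backoff member is optional, so existing
  // configs that omit it keep working and fall back to the default.
  final case class ConsulSettings(
    host: String = "localhost",
    port: Int = 8500,
    backoff: Option[BackoffSetting] = None
  ) {
    def resolvedBackoff: BackoffSetting = backoff.getOrElse(DefaultBackoff)
  }
}
```

With this shape, `ConsulSettings().resolvedBackoff` yields the one-minute default, while a config that sets `backoff` overrides it.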
val run = Var.async[Activity.State[Set[Ns]]](Activity.Pending) { updates =>
  @volatile var running = true
  var retPending: Future[_] = Future.Unit
unused
updates() = Activity.Failed(e)
log.error("consul ns list observation error %s", e)
cycle(None)
val sleep #:: backoffs0 = backoffs
The convention is usually that backoffs0 represents the prior backoffs and backoffs1 represents the next backoffs. Alternatively, rest or tail is sometimes used to denote the tail of a stream. So this could be `val sleep #:: backoffs1 = backoffs0` or `val sleep #:: rest = backoffs`.
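The naming convention described here can be sketched with the standard library alone. This assumes Scala 2.13 or later and uses `LazyList` in place of the `Stream` seen in the diff; `takeDelays` is a hypothetical helper that walks the schedule the way a recursive retry loop would, and it uses a `match` instead of a `val` pattern so that an exhausted stream is handled explicitly.

```scala
import scala.annotation.tailrec
import scala.concurrent.duration._

object BackoffNamingSketch {
  // `backoffs0` is the schedule before this step; `backoffs1` is what remains
  // after the head (`sleep`) has been consumed.
  @tailrec
  def takeDelays(
    n: Int,
    backoffs0: LazyList[FiniteDuration],
    acc: Vector[FiniteDuration] = Vector.empty
  ): Vector[FiniteDuration] =
    if (n <= 0) acc
    else backoffs0 match {
      case sleep #:: backoffs1 => takeDelays(n - 1, backoffs1, acc :+ sleep)
      case _ => acc // schedule exhausted: stop early
    }
}
```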
log.error("consul ns list observation error %s", e)
cycle(None)
val sleep #:: backoffs0 = backoffs
Future.sleep(sleep).flatMap(_ => cycle(None, backoffs0))
TIOLI: flatMap is perfectly fine here but there is also Future.sleep(sleep).before(cycle(None, backoffs0)) that is equivalent. Take your pick.
@adleong after changing the ConsulConfig to add a configurable backoff setting, I realized that ConsulConfig is used for the consul Namer plugin in linkerd. This change is for the Namerd Dtabstore plugin. Do we need to also change the Namer plugin in Linkerd to reflect this behavior?
It would probably be a good idea for the consul namer to back off on errors too. But that doesn't have to be part of this change.
Signed-off-by: Dennis Adjei-Baah <[email protected]>
I was able to successfully test the config value by providing a max backoff of 1ms. The config was able to override the default value. Would love feedback on the config field name. I tried to make sure it made sense but also didn't want to have a really long field name.
Signed-off-by: Dennis Adjei-Baah <[email protected]>
object KvApi {
  def apply(c: Client): KvApi = new KvApi(c, s"/$versionString")
  val DefaultMaxBackOffDuration = 10000
Let's make this a Duration instead of an Int.
writeConsistencyMode: Option[ConsistencyMode] = None,
failFast: Option[Boolean] = None
failFast: Option[Boolean] = None,
maxBackoffDurationMs: Option[Int] = None
This should be an Option[BackoffConfig] instead of an Option[Int].
readConsistency: Option[ConsistencyMode] = None,
writeConsistency: Option[ConsistencyMode] = None
writeConsistency: Option[ConsistencyMode] = None,
_timer: Timer = DefaultTimer
I think you can do implicit val timer: Timer = DefaultTimer here
object KvApi {
  def apply(c: Client): KvApi = new KvApi(c, s"/$versionString")
  val DefaultMaxBackOffDuration = 10000
  def apply(c: Client, backoffMs: Int = DefaultMaxBackOffDuration): KvApi = new KvApi(c, s"/$versionString", Backoff.exponentialJittered(1.milliseconds, backoffMs.milliseconds))
Let's have this accept a Stream[Duration].
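A rough sketch of what accepting a whole schedule, rather than a millisecond count, might look like. It assumes Scala 2.13 or later and uses the standard library's `LazyList` and `FiniteDuration` in place of `Stream[Duration]`. `KvClient` is a hypothetical stand-in for `KvApi`, `"/v1"` in the usage note is a placeholder prefix, and the constant 10-second default is chosen only to keep the sketch deterministic.

```scala
import scala.concurrent.duration._

object KvClientSketch {
  // Hypothetical client; the real KvApi constructor takes different arguments.
  final class KvClient(val uriPrefix: String, val backoffs: LazyList[FiniteDuration])

  // Assumed default for this sketch only: wait a constant 10 seconds between retries.
  val DefaultBackoffs: LazyList[FiniteDuration] = LazyList.continually(10.seconds)

  // The caller supplies the retry schedule itself, so the factory does not need
  // to know whether it is constant, exponential, or jittered.
  def apply(uriPrefix: String, backoffs: LazyList[FiniteDuration] = DefaultBackoffs): KvClient =
    new KvClient(uriPrefix, backoffs)
}
```

`KvClientSketch("/v1")` picks up the default schedule, and `KvClientSketch("/v1", LazyList(1.second, 2.seconds))` supplies a custom one. Passing a stream also removes the unit ambiguity that an `Int` parameter carries.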
Signed-off-by: Dennis Adjei-Baah <[email protected]>
Signed-off-by: Dennis Adjei-Baah <[email protected]>
@@ -0,0 +1,26 @@
package io.buoyant.namerd
rm this file, I think
oops thought I deleted it. thanks
Signed-off-by: Dennis Adjei-Baah <[email protected]>
Signed-off-by: Dennis Adjei-Baah <[email protected]>
Some questions/minor nits.
object KvApi {
  def apply(c: Client): KvApi = new KvApi(c, s"/$versionString")
  def apply(c: Client, backoff: Stream[Duration]): KvApi = new KvApi(c, s"/$versionString", backoff)
Should the default value that was removed from the constructor be added here? I.e.:
def apply(
  c: Client,
  backoff: Stream[Duration] = Backoff.exponentialJittered(1.milliseconds, 5.seconds)
): KvApi = new KvApi(c, s"/$versionString", backoff)
this would also make it no longer necessary to change all the tests in KVApiTest...
I like that the default is defined in one place only.
Yeah, I saw that after scrolling through the rest of the PR; disregard that.
import com.twitter.finagle.buoyant.ParamsMaybeWith
import com.twitter.util.Duration
import io.buoyant.config.PolymorphicConfig
import io.buoyant.namer.BackoffConfig
I'm not sure if I'm a fan of having this live in the namer package, since I'm pretty sure the backoff config is used in places other than in namers? I think maybe it should be moved someplace where it's visible to the namer module, but not necessarily into that package?
Not sure there exists a package that's a perfect fit. It needs to be visible from both linkerd and namerd and it depends on finagle and config.
object ConsulConfig {
  val DefaultHost = "localhost"
  val DefaultPort = Port(8500)
  val DefaultBackoff = Backoff.decorrelatedJittered(1.millis, 1.minute)
Was the default backoff (for KvApi) not previously exponentialJittered rather than decorrelatedJittered? Is there a motivation behind this change?
BackoffConfig only produces decorrelatedJittered and constant. Never exponentialJittered. This is more consistent with elsewhere in linkerd.
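For readers unfamiliar with the distinction, the sketch below illustrates the general decorrelated-jitter idea: each delay is drawn at random between the starting delay and three times the previous delay, then capped. It is a standard-library approximation written for this discussion, not Finagle's `Backoff.decorrelatedJittered` implementation, and the exact sampling Finagle uses may differ. It assumes Scala 2.13 or later for `LazyList`.

```scala
import scala.concurrent.duration._
import scala.util.Random

object DecorrelatedJitterSketch {
  // Produces an infinite schedule: the first delay is `start`, and each later
  // delay is drawn uniformly from [start, min(max, 3 * previous)).
  def decorrelatedJittered(
    start: FiniteDuration,
    max: FiniteDuration,
    rng: Random = new Random
  ): LazyList[FiniteDuration] = {
    require(start > Duration.Zero && max >= start, "need 0 < start <= max")
    val lo = start.toNanos
    val cap = max.toNanos

    def next(prev: Long): LazyList[FiniteDuration] = {
      // Upper bound is 3 * prev, capped at `max`; the division avoids Long overflow.
      val hi = if (prev > cap / 3) cap else prev * 3
      val drawn = lo + (rng.nextDouble() * (hi - lo)).toLong
      drawn.nanos #:: next(drawn)
    }

    start #:: next(lo)
  }
}
```

Unlike a pure exponential schedule, the delays here do not grow monotonically; they wander between `start` and `max`, which helps spread out retries from many clients.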
Signed-off-by: Dennis Adjei-Baah <[email protected]>
🔙📴🔁⏱👍 (as soon as CI passes)
Signed-off-by: Dennis Adjei-Baah <[email protected]>
## 1.3.4 2017-12-15

Linkerd 1.3.4 continues the focus on reliability and stability. It includes a bugfix for HTTP/2 and gRPC routers, several improvements to the Consul namer and dtab store, fixes for 4xx responses in the Kubernetes namer, and more.

* Fix an issue where the `io.l5d.path` identifier would consume query parameters from the request URL, preventing them from reaching the downstream service ([#1734](#1734)).
* Several minor fixes to documentation and examples.
* Consul
  * Improve handling of invalid namespaces in Namerd's Consul dtab store ([#1739](#1739)).
  * Add backoffs to Consul dtab store observation retries ([#1742](#1742)).
  * Fix `io.l5d.consul` namer logging large numbers of spurious error messages during normal operation ([#1738](#1738)).
* HTTP/2 and gRPC
  * Fix buffer data corruption regression introduced in 1.3.3 ([#1751](#1751)). Thanks to [@vadimi](https://github.com/vadimi), who contributed to this fix!
* Kubernetes
  * Improve handling of Kubernetes API watch errors in `io.l5d.k8s` ([#1744](#1744), [#1752](#1752)).
* Namerd
  * Fix `NoHostsAvailable` exception thrown by `io.l5d.mesh` when Namerd has namers configured with transformers ([#1729](#1729)).

Signed-off-by: Eliza Weisman <[email protected]>
The Consul namer storage plugin retries a request to a Consul API instance when it receives an error response that is not 4XX. In this case the namer retries indefinitely, which is intended functionality. However, the retries happen in rapid succession, with a log message printed right before each one. This causes the namerd logs to grow rapidly. This PR adds the ability to exponentially increase the amount of time between successive retries so that log sizes are more manageable.
One thing I am not sure about is the `Backoff` parameter for the Consul API in `KvApi.scala`. Right now it is set to `Backoff.exponentialJittered(1.milliseconds, 5.seconds)`. The 5 seconds is the max duration the API will wait to retry. This still feels like a little too short of a window, so I am open to feedback on what duration would be appropriate in this scenario.

Signed-off-by: Dennis Adjei-Baah [email protected]
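To put the 1 ms start and 5 s cap in perspective, the sketch below computes how quickly a plain doubling schedule reaches the cap. It deliberately ignores jitter, so it only approximates what `Backoff.exponentialJittered` would produce; `cappedDoubling` is a hypothetical helper written with the standard library (Scala 2.13 or later), not a Finagle API.

```scala
import scala.concurrent.duration._

object BackoffCapSketch {
  // Un-jittered schedule: double the delay each time, never exceeding `max`.
  def cappedDoubling(start: FiniteDuration, max: FiniteDuration): LazyList[FiniteDuration] =
    LazyList.iterate(start)(d => (d * 2).min(max))

  val Cap: FiniteDuration = 5.seconds
  val delays: LazyList[FiniteDuration] = cappedDoubling(1.millisecond, Cap)

  // Delays strictly below the cap: 1 ms, 2 ms, ..., 4096 ms.
  val beforeCap: List[FiniteDuration] = delays.takeWhile(_ < Cap).toList

  // 13 retries happen before the cap is reached...
  val retriesBeforeCap: Int = beforeCap.size
  // ...and they take 8191 ms in total.
  val timeBeforeCap: FiniteDuration = beforeCap.foldLeft(Duration.Zero)(_ + _)
}
```

Under these assumptions the schedule hits the 5-second ceiling after roughly eight seconds of failures, after which a retry (and a log line) occurs every 5 seconds. A larger cap stretches that steady-state interval accordingly.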