
@dadjeibaah
Contributor

The Consul namer storage plugin retries a request to a Consul API instance when it receives an error response that is not 4XX. In this case the namer retries indefinitely, which is intended functionality. However, the retries happen in rapid succession, and a log message is printed right before each one. This causes the namerd logs to grow rapidly. This PR adds the ability to exponentially increase the amount of time between successive retries so that log sizes are more manageable.

One thing I am not sure about is the Backoff parameter for the Consul API in KvApi.scala. Right now it is set to Backoff.exponentialJittered(1.milliseconds, 5.seconds). The 5 seconds is the maximum duration the API will wait before retrying. This still feels like a little too short of a window, so I am open to feedback on what duration would be appropriate in this scenario.
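
For a rough sense of the schedule such a setting produces, here is a minimal sketch of an exponentially growing, jittered backoff stream using only the Scala standard library. It is not Finagle's Backoff.exponentialJittered: the object name BackoffSketch, the exact jitter rule (uniform between the start value and a doubling ceiling), and the injectable rng parameter are all assumptions made for illustration.

```scala
import scala.concurrent.duration._
import scala.util.Random

// Hypothetical helper for illustration only; NOT Finagle's implementation.
object BackoffSketch {
  // Produces an infinite stream of delays. The ceiling doubles on every step
  // until it reaches `max`; each delay is drawn uniformly from [start, ceiling).
  def exponentialJittered(
    start: FiniteDuration,
    max: FiniteDuration,
    rng: Random = new Random
  ): Stream[FiniteDuration] = {
    val startMs = start.toMillis
    val maxMs = max.toMillis
    require(startMs >= 1 && maxMs >= startMs, "need 1ms <= start <= max")

    def next(ceilingMs: Long): Stream[FiniteDuration] = {
      // Jitter: pick a point between the start value and the current ceiling.
      val delayMs = startMs + (rng.nextDouble() * (ceilingMs - startMs)).toLong
      // Exponential growth of the ceiling, capped at the maximum.
      val nextCeilingMs = math.min(maxMs, ceilingMs * 2)
      delayMs.millis #:: next(nextCeilingMs)
    }

    next(startMs)
  }
}
```

With a 1 millisecond start, the ceiling reaches a 5 second cap after about 13 doublings, so under this rule a persistently failing request settles at roughly one retry (and one log line) every few seconds on average.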

Signed-off-by: Dennis Adjei-Baah [email protected]

@dadjeibaah dadjeibaah self-assigned this Dec 11, 2017
Member

@adleong adleong left a comment

This looks great! Let's also modify the CatalogApi and HealthApi apply constructors in ConsulApi.scala to accept backoffs as a parameter. That way, we can make backoffs configurable in ConsulConfig in ConsulInitializer.scala. This can be done by adding a backoff member to ConsulConfig of type Option[BackoffConfig].

As for a default value, a maximum backoff of 1 minute seems reasonable to me.
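
The shape of that suggestion can be sketched as follows. This is a simplified stand-in that uses only the Scala standard library: BackoffSetting, ConstantBackoff, and StoreConfig are hypothetical names (the real BackoffConfig and ConsulConfig carry more fields and use Twitter's Duration), and the doubling default shown here is an assumption, not the value the PR ships.

```scala
import scala.concurrent.duration._

// Hypothetical stand-in for a backoff config type: it knows how to build a stream.
trait BackoffSetting { def mk: Stream[FiniteDuration] }

final case class ConstantBackoff(ms: Long) extends BackoffSetting {
  def mk: Stream[FiniteDuration] = Stream.continually(ms.millis)
}

object StoreConfig {
  // The default lives in exactly one place: doubling delays capped at 1 minute.
  val DefaultBackoff: Stream[FiniteDuration] =
    Stream.iterate(1L)(ms => math.min(ms * 2, 60000L)).map(_.millis)
}

// Hypothetical stand-in for the store config: the backoff member is optional,
// so an absent field falls back to the shared default.
final case class StoreConfig(backoff: Option[BackoffSetting] = None) {
  def backoffs: Stream[FiniteDuration] =
    backoff.map(_.mk).getOrElse(StoreConfig.DefaultBackoff)
}
```

Under these assumptions, StoreConfig() yields the capped default, while StoreConfig(Some(ConstantBackoff(1))) overrides it with a constant 1 millisecond delay.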


val run = Var.async[Activity.State[Set[Ns]]](Activity.Pending) { updates =>
@volatile var running = true
var retPending: Future[_] = Future.Unit
Member

unused

updates() = Activity.Failed(e)
log.error("consul ns list observation error %s", e)
cycle(None)
val sleep #:: backoffs0 = backoffs
Member

The convention is usually that backoffs0 represents the prior backoffs and backoffs1 represents the next backoffs. Alternatively, rest or tail is sometimes used to denote the tail of a stream. So this could be
val sleep #:: backoffs1 = backoffs0 or val sleep #:: rest = backoffs.
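
A minimal sketch of that naming convention in a retry loop, using only the Scala standard library. RetrySketch, the synchronous Try-based operation, and the injected pause function are assumptions made so the example is self-contained; the code under review is asynchronous and uses Future.sleep instead.

```scala
import scala.annotation.tailrec
import scala.concurrent.duration._
import scala.util.{Failure, Success, Try}

// Hypothetical stand-in for the observation loop; not the code under review.
object RetrySketch {
  // Retries `op` until it succeeds, pausing between attempts.
  // `pause` is injected so callers (and tests) decide how to wait.
  @tailrec
  def retry[A](backoffs0: Stream[FiniteDuration], pause: FiniteDuration => Unit)(
    op: () => Try[A]
  ): A =
    op() match {
      case Success(a) => a
      case Failure(e) =>
        backoffs0 match {
          // backoffs0 is the stream before this attempt's pause;
          // backoffs1 is what remains for the attempts that follow.
          case delay #:: backoffs1 =>
            pause(delay)
            retry(backoffs1, pause)(op)
          // A finite stream ran out: give up and surface the last error.
          case _ => throw e
        }
    }
}
```

The match form also covers a finite stream, which the one-line val pattern does not: destructuring an empty stream with val sleep #:: rest = backoffs throws a MatchError.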

log.error("consul ns list observation error %s", e)
cycle(None)
val sleep #:: backoffs0 = backoffs
Future.sleep(sleep).flatMap(_ => cycle(None, backoffs0))
Member

TIOLI: flatMap is perfectly fine here but there is also Future.sleep(sleep).before(cycle(None, backoffs0)) that is equivalent. Take your pick.

@siggy siggy added this to the 1.3.4 milestone Dec 12, 2017
@dadjeibaah
Contributor Author

@adleong after changing the ConsulConfig to add a configurable backoff setting, I realized that ConsulConfig is used for the consul Namer plugin in linkerd. This change is for the Namerd Dtabstore plugin. Do we need to also change the Namer plugin in Linkerd to reflect this behavior?

@adleong
Member

adleong commented Dec 13, 2017

It would probably be a good idea for the consul namer to back off on errors too. But that doesn't have to be part of this change.

@dadjeibaah
Contributor Author

I was able to successfully test the config value by providing a max backoff of 1ms. The config was able to override the default value. Would love feedback on the config field name. I tried to make sure it made sense but also didn't want to have a really long field name.


object KvApi {
def apply(c: Client): KvApi = new KvApi(c, s"/$versionString")
val DefaultMaxBackOffDuration = 10000
Member

Let's make this a Duration instead of an Int.

writeConsistencyMode: Option[ConsistencyMode] = None,
failFast: Option[Boolean] = None
failFast: Option[Boolean] = None,
maxBackoffDurationMs: Option[Int] = None
Member

This should be an Option[BackoffConfig] instead of an Option[Int].

readConsistency: Option[ConsistencyMode] = None,
writeConsistency: Option[ConsistencyMode] = None
writeConsistency: Option[ConsistencyMode] = None,
_timer: Timer = DefaultTimer
Member

I think you can do implicit val timer: Timer = DefaultTimer here

object KvApi {
def apply(c: Client): KvApi = new KvApi(c, s"/$versionString")
val DefaultMaxBackOffDuration = 10000
def apply(c: Client, backoffMs: Int = DefaultMaxBackOffDuration): KvApi = new KvApi(c, s"/$versionString", Backoff.exponentialJittered(1.milliseconds, backoffMs.milliseconds))
Member

Let's have this accept a Stream[Duration].

@@ -0,0 +1,26 @@
package io.buoyant.namerd
Member

rm this file, I think

Contributor Author

@dadjeibaah dadjeibaah Dec 14, 2017

oops thought I deleted it. thanks

Dennis Adjei-Baah added 2 commits December 14, 2017 15:52
Signed-off-by: Dennis Adjei-Baah <[email protected]>
Signed-off-by: Dennis Adjei-Baah <[email protected]>
Contributor

@hawkw hawkw left a comment

Some questions/minor nits.


object KvApi {
def apply(c: Client): KvApi = new KvApi(c, s"/$versionString")
def apply(c: Client, backoff: Stream[Duration]): KvApi = new KvApi(c, s"/$versionString", backoff)
Contributor

Should the default value removed from the constructor be added here? For example:

def apply(
  c: Client,
  backoff: Stream[Duration] = Backoff.exponentialJittered(1.milliseconds, 5.seconds)
): KvApi = new KvApi(c, s"/$versionString", backoff)

Contributor

this would also make it no longer necessary to change all the tests in KVApiTest...

Member

I like that the default is defined in one place only.

Contributor

Yeah, I saw that after scrolling through the rest of the PR; disregard that.

import com.twitter.finagle.buoyant.ParamsMaybeWith
import com.twitter.util.Duration
import io.buoyant.config.PolymorphicConfig
import io.buoyant.namer.BackoffConfig
Contributor

I'm not sure if I'm a fan of having this live in the namer package, since I'm pretty sure the backoff config is used in places other than in namers? I think maybe it should be moved someplace where it's visible to the namer module, but not necessarily into that package?

Member

Not sure there exists a package that's a perfect fit. It needs to be visible from both linkerd and namerd and it depends on finagle and config.

object ConsulConfig {
val DefaultHost = "localhost"
val DefaultPort = Port(8500)
val DefaultBackoff = Backoff.decorrelatedJittered(1.millis, 1.minute)
Contributor

Was the default backoff (for KvApi) not previously exponentialJittered rather than decorrelatedJittered? Is there a motivation behind this change?

Member

BackoffConfig only produces decorrelatedJittered and constant. Never exponentialJittered. This is more consistent with elsewhere in linkerd.
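
To illustrate the difference between the two stream shapes named here, the following is a minimal standard-library sketch of a constant backoff and a decorrelated-jitter backoff. It is not Finagle's Backoff code: JitterSketch, the injectable rng, and the specific rule used (each delay drawn uniformly between the start value and three times the previous delay, capped at the maximum) are assumptions based on the commonly described decorrelated-jitter scheme.

```scala
import scala.concurrent.duration._
import scala.util.Random

// Hypothetical helpers for illustration only; NOT Finagle's implementation.
object JitterSketch {
  // Constant backoff: the same delay forever.
  def constant(d: FiniteDuration): Stream[FiniteDuration] =
    Stream.continually(d)

  // Decorrelated jitter: each delay depends on the previous *delay* rather
  // than on the attempt count, drawn from [start, min(max, 3 * previous)).
  def decorrelatedJittered(
    start: FiniteDuration,
    max: FiniteDuration,
    rng: Random = new Random
  ): Stream[FiniteDuration] = {
    val startMs = start.toMillis
    val maxMs = max.toMillis
    require(startMs >= 1 && maxMs >= startMs, "need 1ms <= start <= max")

    def next(prevMs: Long): Stream[FiniteDuration] = {
      val upperMs = math.min(maxMs, prevMs * 3)
      val delayMs = startMs + (rng.nextDouble() * (upperMs - startMs)).toLong
      delayMs.millis #:: next(delayMs)
    }

    // The first delay is exactly the start value.
    start #:: next(startMs)
  }
}
```

Unlike an exponential schedule, a decorrelated schedule can shrink as well as grow from one retry to the next, since a small random draw lowers the range for the following delay.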

Contributor

@hawkw hawkw left a comment

🔙📴🔁⏱👍 (as soon as CI passes)

Signed-off-by: Dennis Adjei-Baah <[email protected]>
@siggy siggy merged commit fcea27d into master Dec 15, 2017
@siggy siggy deleted the deebo91/consul-infinite-log-backoff branch December 15, 2017 01:40
@hawkw hawkw mentioned this pull request Dec 15, 2017
hawkw added a commit that referenced this pull request Dec 15, 2017
## 1.3.4 2017-12-15

Linkerd 1.3.4 continues the focus on reliability and stability. It includes a bugfix for HTTP/2 and gRPC routers, several improvements to the Consul namer and dtab store, fixes for 4xx responses in the Kubernetes namer, and more.

* Fix an issue where the `io.l5d.path` identifier would consume query parameters from the request URL, preventing them from reaching the downstream service ([#1734](#1734)).
* Several minor fixes to documentation and examples.
* Consul
  * Improve handling of invalid namespaces in Namerd's Consul dtab store ([#1739](#1739)).
  * Add backoffs to Consul dtab store observation retries ([#1742](#1742)).
  * Fix `io.l5d.consul` namer logging large numbers of spurious error messages during normal operation ([#1738](#1738)).
* HTTP/2 and gRPC
  * Fix buffer data corruption regression introduced in 1.3.3 ([#1751](#1751)). Thanks to [@vadimi](https://github.com/vadimi), who contributed to this fix!
* Kubernetes
  * Improve handling of Kubernetes API watch errors in `io.l5d.k8s` ([#1744](#1744), [#1752](#1752)).
* Namerd
  * Fix `NoHostsAvailable` exception thrown by `io.l5d.mesh` when Namerd has namers configured with transformers ([#1729](#1729)).

Signed-off-by: Eliza Weisman <[email protected]>
Tim-Brooks pushed a commit to Tim-Brooks/linkerd that referenced this pull request Dec 20, 2018