add Namer for DNS SRV records #1611
Hi @ccmtaylor, this is great, thank you for your contribution! I'll find some reviewers for this PR; in the meantime, would you mind making sure you've signed our Contributor License Agreement so we can merge this?
This basically looks good to me, here are a handful of minor changes you might want to consider. I'd like @olix0r to sign off on this PR as well before we can merge this.
```scala
NameTree.Leaf(Name.Bound(Var.value(Addr.Bound(srvRecords: _*)), id))
}
case code =>
log.trace("unexpected RCODE: %d for %s", code, address)
```
I think anything that makes `NameTree.Fail` probably ought to be logged at the warning level.
```scala
assert(curator.dnsHosts === Some(Seq("localhost")))
}

test("can resolve some public SRV revord") {
```
I'd feel somewhat more comfortable if tests that create external network connections went in integration rather than unit tests.
```scala
extends Namer {

private val log = Logger.get("dnssrv")
private val cache = TrieMap.empty[Path, Var[State[NameTree[Name]]]]
```
Take it or leave it: consider `com.twitter.util.Memoize` as an alternative to `map.getOrElseUpdate`.
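A stdlib-only sketch of what either option buys here: each path gets exactly one cached value, so repeated or concurrent lookups of the same name share one result (in the namer, one `Activity` and its poller). `memoize`, `lookup`, and the string result are hypothetical stand-ins, not `com.twitter.util.Memoize` itself. This version simply holds one lock while computing, which is correct but serializes first-time lookups.

```scala
import scala.collection.mutable

object MemoizeSketch {
  // Hypothetical stand-in for com.twitter.util.Memoize: wraps `f` so each
  // distinct argument is computed at most once, even with concurrent callers.
  def memoize[A, B](f: A => B): A => B = {
    val cache = mutable.HashMap.empty[A, B]
    (a: A) => cache.synchronized {
      cache.getOrElseUpdate(a, f(a))
    }
  }

  // Counts how often the "expensive" function actually runs.
  @volatile var calls = 0

  // Hypothetical expensive lookup, e.g. building the Activity for a path.
  val lookup: String => String = memoize { (path: String) =>
    calls += 1
    s"activity-for-$path"
  }
}
```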
```scala
case Some(hosts) => new DNS.ExtendedResolver(hosts.toArray)
case None => new DNS.ExtendedResolver()
}
resolver.setEDNS(0, 2048, 0, Collections.EMPTY_LIST)
```
nit: maybe put the level, payload size, etc. in constants?
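A minimal sketch of that nit, assuming the values stay as they are in the diff; the object and constant names are hypothetical. The actual dnsjava call is left as a comment because it needs dnsjava on the classpath.

```scala
object DnsSrvDefaults {
  // EDNS version ("level") advertised to the server; 0 means EDNS0.
  val EdnsLevel: Int = 0
  // Advertised UDP payload size in bytes. This is above the classic 512-byte
  // limit for DNS over UDP, so responses carrying many SRV records are less
  // likely to be truncated.
  val EdnsPayloadSize: Int = 2048
  // No EDNS flags set.
  val EdnsFlags: Int = 0

  // With dnsjava available, the call from the diff would read:
  // resolver.setEDNS(EdnsLevel, EdnsPayloadSize, EdnsFlags, Collections.EMPTY_LIST)
}
```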
```scala
case _ => Activity.value(NameTree.Neg)
}

private[dnssrv] def lookupSrv(address: String, id: Path, residual: Path): Try[NameTree[Name]] = Try {
```
I've seen other namers hand around a `residual` path, but I'm not sure how to use it, or why they would.
@hawkw thanks for the quick review! I've addressed your comments; let me know if things look alright.
SoundCloud recently signed a CLA. I used my SC email in the commits, but if you prefer, I can close this PR and re-submit from a fork under github.com/soundcloud.
Oh, okay; I wasn't aware of that and "please sign the CLA" is just in my boilerplate response for all first-time contributors. You're fine, then!
This is really awesome! Very nice work! I've left a few minor comments below.
```scala
private val log = Logger.get("dnssrv")
private val memoizedLookup: (Path) => Activity[NameTree[Name]] = Memoize {
case path@Path.Utf8(address, _) =>
```
It looks like you're expecting the path to have exactly 2 segments here. I think this should call `.take(1)` before the match and then match that one segment.
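To see why the arity matters, the sketch below models a path as a plain `Seq[String]`, a hypothetical stand-in for Finagle's `Path`. Matching `Seq(address, _)` succeeds only for exactly two segments, while matching on `path.take(1)` finds the address however many segments follow.

```scala
object PathMatchSketch {
  // Paths modeled as plain segment lists (hypothetical stand-in for Path).
  type Path = Seq[String]

  // Mirrors `case Path.Utf8(address, _)`: only matches exactly two segments.
  def addressExact(path: Path): Option[String] = path match {
    case Seq(address, _) => Some(address)
    case _               => None
  }

  // Suggested form: keep only the first segment, then match on it alone.
  def addressFirst(path: Path): Option[String] = path.take(1) match {
    case Seq(address) => Some(address)
    case _            => None
  }
}
```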
ok. I actually wrote it that way first, but thought it looked a bit nicer like this because it's one fewer level of indentation (though I guess `Utf8.unapply()` might need to do more work).
```scala
val id = path.take(1)
Activity(Var.async[State[NameTree[Name]]](Activity.Pending) { state =>
timer.schedule(refreshInterval) {
val next = lookupSrv(address, prefix ++ id) match {
```
I think you should also pass the residual (`path.drop(1)`) into this method (see related comment below).
```scala
case _ => Activity.value(NameTree.Neg)
}

private[dnssrv] def lookupSrv(address: String, id: Path): Try[NameTree[Name]] = Try {
```
Which part of this method is expected to throw exceptions? It would be nice to scope the `Try` block more narrowly if possible.
`resolver.send(query)` throws `IOException`. I'll scope the `Try {}` to that call.
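A stdlib-only sketch of the narrower scope, where `send` and `countWords` are hypothetical stand-ins for `resolver.send` and the SRV-record handling. One caveat: `Try#map` and `Try#flatMap` also catch non-fatal exceptions thrown by the function passed to them, so narrowing the `Try` only changes behavior if the follow-up code runs outside them, as in the pattern match here.

```scala
import java.io.IOException
import scala.util.{Failure, Success, Try}

object NarrowTrySketch {
  // Hypothetical stand-in for resolver.send(query): the only call expected to throw.
  def send(query: String): String =
    if (query.isEmpty) throw new IOException("empty query")
    else s"response for $query"

  // Hypothetical post-processing of the response (e.g. reading SRV records).
  def countWords(response: String): Int = response.split(" ").length

  // Only the I/O call is wrapped in Try. Pattern matching (rather than
  // Try#map, which would also catch exceptions from countWords) keeps a bug
  // in the post-processing from being reported as a lookup failure.
  def lookup(query: String): Try[Int] =
    Try(send(query)) match {
      case Success(response) => Success(countWords(response))
      case Failure(e)        => Failure(e)
    }
}
```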
```scala
case _ => Activity.value(NameTree.Neg)
}

private[dnssrv] def lookupSrv(address: String, id: Path): Try[NameTree[Name]] = Try {
```
I think this method should take the path residual as a parameter as well.
```scala
val query = DNS.Message.newQuery(question)
log.debug("looking up %s", address)
val m = resolver.send(query)
log.debug("got response %s", address)
```
I don't think this log is correct
```scala
// valid DNS entry, but no instances.
// for some reason, NameTree.Empty doesn't work right
log.trace("empty response for %s", address)
NameTree.Neg
```
It's debatable whether you want `NameTree.Neg` here or a leaf with `Addr.Bound` containing an empty set. The former will consider the name invalid and will fall back to any alternatives in the `NameTree`. The latter will be considered a valid name, but any requests will fail because there are no addresses to send to.
A major linkerd use-case for us is to load-balance across two or more deployments of a service (with different SRV records), i.e. the dtab looks like this:

```
/svc/myservice =>
  /dnssrv/myservice1.srv.example.org &
  /dnssrv/myservice2.srv.example.org;
```

If `myservice1` resolves to `Addr.Bound(Set.empty)`, would linkerd still call `myservice2`?
If you use an empty address set then half of the time you'll pick the first branch of the union and fail the request, and half of the time you'll pick the second branch of the union and use `myservice2`.
If you use `NameTree.Neg` then you'll skip to `myservice2` every time.
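The two behaviors can be reproduced with a toy model. It assumes the union's branches have equal weight and one is picked uniformly at random per request, after branches that evaluated to `Neg` are removed; `Tree`, `Leaf`, `Union`, and `pick` are hypothetical and much simpler than Finagle's `NameTree`.

```scala
import scala.util.Random

object UnionSketch {
  // Simplified model of a name tree; not Finagle's NameTree API.
  sealed trait Tree
  case object Neg extends Tree
  final case class Leaf(addrs: Set[String]) extends Tree
  final case class Union(branches: Seq[Tree]) extends Tree

  // Neg branches are pruned before a branch is chosen. An empty Leaf is
  // still a valid branch, so it can be chosen (and then the request fails).
  def pick(tree: Tree, rng: Random): Option[Leaf] = tree match {
    case Neg        => None
    case leaf: Leaf => Some(leaf)
    case Union(branches) =>
      val live = branches.filter(_ != Neg)
      if (live.isEmpty) None
      else pick(live(rng.nextInt(live.size)), rng)
  }

  // Fraction of simulated requests that end up with no address to send to.
  def failureRate(tree: Tree, trials: Int, seed: Long): Double = {
    val rng = new Random(seed)
    val failures = (1 to trials).count { _ =>
      pick(tree, rng).forall(_.addrs.isEmpty)
    }
    failures.toDouble / trials
  }
}
```

With `Neg` for `myservice1` every request reaches `myservice2`; with an empty leaf roughly half of them fail.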
```scala
NameTree.Neg
} else {
log.trace("got %d results for %s", srvRecords.length, address)
NameTree.Leaf(Name.Bound(Var.value(Addr.Bound(srvRecords: _*)), id))
```
you want to set the residual as the `path` of the `Name.Bound` here
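A sketch of what binding with the residual could look like, using hypothetical case classes in place of Finagle's `Path` and `Name.Bound`: the namer consumes the first segment as the SRV name (`id`), and the remaining segments become the bound name's `path`, so they stay available to whatever handles the request next. The `/dnssrv` prefix in the usage check is an example value.

```scala
object ResidualSketch {
  // Simplified stand-ins for Finagle's Path and Name.Bound (hypothetical).
  final case class Path(segments: Seq[String]) {
    def take(n: Int): Path = Path(segments.take(n))
    def drop(n: Int): Path = Path(segments.drop(n))
    def ++(that: Path): Path = Path(segments ++ that.segments)
  }
  final case class Bound(addrs: Set[String], id: Path, path: Path)

  // The first segment names the SRV record; everything after it is the
  // residual, carried on the bound name instead of being discarded.
  def bind(prefix: Path, path: Path, addrs: Set[String]): Bound = {
    val id = path.take(1)
    val residual = path.drop(1)
    Bound(addrs, prefix ++ id, residual)
  }
}
```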
```scala
}
case code =>
log.warning("unexpected RCODE: %s for %s", DNS.Rcode.string(code), address)
NameTree.Fail
```
I think this should be a `Throw(...)` instead of a `Return(NameTree.Fail)`.
ok. I'm a little unclear on the behaviour of the different `NameTree.*` variants. When would I use `NameTree.Fail`?
It can be useful if you want to artificially halt evaluation of a name tree, but it's rarely used in practice.
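Putting the last two suggestions together, the response handling might look roughly like this. `Result`, `interpret`, and `UnexpectedRcode` are hypothetical; only the RCODE value comes from the DNS spec (0 is NOERROR). An empty NOERROR answer stays `Neg`, as in the diff, while any other RCODE becomes a `Failure` rather than a successful `Fail` tree. Logging the unexpected code at warning level would go in the last case.

```scala
import scala.util.{Failure, Success, Try}

object RcodeSketch {
  // Simplified result tree standing in for NameTree (hypothetical).
  sealed trait Result
  case object Neg extends Result
  final case class Leaf(addrs: Seq[String]) extends Result

  // DNS RCODE 0 is NOERROR (RFC 1035).
  val NoError = 0

  final class UnexpectedRcode(val code: Int, val address: String)
    extends Exception(s"unexpected RCODE $code for $address")

  // NOERROR with no records: valid but empty, so Neg lets a dtab fall back.
  // Any other RCODE: a real error, surfaced as a Failure.
  def interpret(code: Int, address: String, records: Seq[String]): Try[Result] =
    code match {
      case NoError if records.isEmpty => Success(Neg)
      case NoError                    => Success(Leaf(records))
      case other                      => Failure(new UnexpectedRcode(other, address))
    }
}
```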
```scala
@JsonIgnore
override def newNamer(params: Params): Namer = {
import org.xbill.DNS
```
move imports to top of file, please
```scala
class DnsSrvNamerIntegrationTest extends FunSuite with Matchers {
test("can resolve some public SRV revord") {
val namer = new DnsSrvNamer(Path.empty, new DNS.ExtendedResolver, new NullTimer, Duration.Zero, new NullStatsReceiver)
val result = namer.lookupSrv("_http._tcp.mxtoolbox.com.", Path.read("/foo"))
```
I think this would be a better black-box test if you use namer.lookup instead.
I agree, but I couldn't figure out how to do something like `Future.await()` for an `Activity`. The closest I could find is `Activity.sample`, but I'm not sure whether that would race if the lookup is too slow. I'll push something, but please take a look and check whether it's correct.
If you import `io.buoyant.namer.RichActivity` then you can do `await(namer.lookup(...).toFuture)`.
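On the race question: sampling reads whatever state is current, so it can see `Pending` if the first lookup has not completed yet, whereas a future that completes on the first non-`Pending` state cannot. The stdlib-only model below shows that difference; `Cell`, `sample`, and `toFuture` are hypothetical stand-ins, and the assumption is that `RichActivity`'s `toFuture` similarly waits for the first non-`Pending` state.

```scala
import scala.concurrent.{Future, Promise}

object FirstStateSketch {
  sealed trait State[+A]
  case object Pending extends State[Nothing]
  final case class Ok[+A](value: A) extends State[A]
  final case class Failed(error: Throwable) extends State[Nothing]

  // Minimal observable cell (hypothetical stand-in for Var/Activity).
  final class Cell[A](initial: State[A]) {
    private var current: State[A] = initial
    private var observers: List[State[A] => Unit] = Nil

    def update(next: State[A]): Unit = {
      val toNotify = synchronized { current = next; observers }
      toNotify.foreach(observer => observer(next))
    }

    def observe(observer: State[A] => Unit): Unit = {
      val now = synchronized { observers = observer :: observers; current }
      observer(now)
    }

    // Reads the current state only, so it may return Pending.
    def sample: State[A] = synchronized(current)

    // Completes with the first non-Pending state instead of racing on it.
    def toFuture: Future[A] = {
      val promise = Promise[A]()
      observe {
        case Pending       => ()
        case Ok(value)     => promise.trySuccess(value); ()
        case Failed(error) => promise.tryFailure(error); ()
      }
      promise.future
    }
  }
}
```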
not sure if this makes a difference in practice, since Try's `flatMap` also traps exceptions.
@hawkw @adleong: what kind of instrumentation would you expect on namers? I'm passing in a
```scala
class DnsSrvNamerIntegrationTest extends FunSuite with Awaits with Matchers {
test("can resolve some public SRV revord") {
val namer = new DnsSrvNamer(Path.empty, new DNS.ExtendedResolver, new NullTimer, Duration.Zero, new NullStatsReceiver)
Activity.sample(namer.lookup(Path.read("/_http._tcp.mxtoolbox.com."))) match {
```
As mentioned in a previous comment, I'd appreciate a +1 that this should work and doesn't just race on the `Activity` being in the `Pending` state.
```scala
test("parse config") {
val yaml = s"""
|kind: io.l5d.dnssrv
|experimental: true
```
should I make this required? If so, where?
Set `experimentalRequired = true` in `DnsSrvNamerInitializer`. See (for example) `K8sNamerInitializer`:

linkerd/namer/k8s/src/main/scala/io/buoyant/namer/k8s/K8sExternalInitializer.scala, lines 34 to 35 in 5d22b56:

```scala
@JsonIgnore
override val experimentalRequired = true
```
@ccmtaylor we don't currently collect any metrics from namers, although we probably should.
⭐ This is awesome! Very nice work.
massive thanks for this submission.
One small question...
```scala
)
val query = DNS.Message.newQuery(question)
log.debug("looking up %s", address)
Try(resolver.send(query)) flatMap { message =>
```
I imagine that resolver.send is a potentially blocking call? If so, then I think DnsSrvNamer should probably be constructed with a com.twitter.util.FuturePool so that these calls don't block the timer thread.
Furthermore, it may be a good idea to record a stat around the lookup times.
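A stdlib-only sketch of the suggestion, with Scala's `Future` on a dedicated thread pool standing in for `com.twitter.util.FuturePool`, a `Thread.sleep` standing in for `resolver.send`, and a queue standing in for the latency stat. The pool size and all names are assumptions. The caller only enqueues work, so a timer thread that triggers the lookup is never blocked on DNS.

```scala
import java.util.concurrent.{ConcurrentLinkedQueue, Executors, TimeUnit}
import scala.concurrent.{ExecutionContext, Future}

object OffloadSketch {
  // Dedicated pool for blocking DNS calls; the size of 4 is an arbitrary assumption.
  private val executor = Executors.newFixedThreadPool(4)
  implicit val dnsPool: ExecutionContext = ExecutionContext.fromExecutor(executor)

  // Hypothetical blocking lookup standing in for resolver.send(query).
  def blockingSend(query: String): String = {
    Thread.sleep(20)
    s"answer for $query"
  }

  // Latency samples in milliseconds (stand-in for a StatsReceiver stat).
  val latenciesMs = new ConcurrentLinkedQueue[Long]()

  // The blocking call and its timing both run on dnsPool, not on the caller's thread.
  def lookup(query: String): Future[String] = Future {
    val start = System.nanoTime()
    try blockingSend(query)
    finally latenciesMs.add(TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start))
  }

  def shutdown(): Unit = executor.shutdown()
}
```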
```scala
case Throw(e) => Activity.Failed(e)
}
state.update(next)
}
```
Just to confirm I understand this... `timer.schedule` returns a `Closable` and the timer is canceled when the `Var` observation is released?
yes, that is correct
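A rough stdlib analogue of that lifecycle, using a `ScheduledExecutorService` in place of the Twitter `Timer` and `AutoCloseable` in place of `Closable` (both substitutions are assumptions for illustration): the handle returned by the hypothetical `poll` stops future runs when closed, which is the role the `Closable` plays once the `Var.async` is no longer observed.

```scala
import java.util.concurrent.{Executors, TimeUnit}

object ScheduleSketch {
  private val scheduler = Executors.newSingleThreadScheduledExecutor()

  // Runs `task` every `intervalMs`; closing the returned handle cancels the
  // schedule (a run already in progress is allowed to finish).
  def poll(intervalMs: Long)(task: () => Unit): AutoCloseable = {
    val handle = scheduler.scheduleAtFixedRate(
      new Runnable { def run(): Unit = task() }, 0L, intervalMs, TimeUnit.MILLISECONDS)
    new AutoCloseable { def close(): Unit = handle.cancel(false) }
  }

  def shutdown(): Unit = scheduler.shutdown()
}
```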
lifting this to a top-level comment, so it doesn't get lost in revisions:
Very good point, thanks @olix0r! I tried (and failed) to do this today. It's straightforward to wrap the call in the pool:

```scala
futurePool(resolver.send(query)) flatMap { message =>
  //...
  Future.value(NameTree.whatever)
  //...
  Future.exception(e)
}
```

but I can't figure out how to fit `Future`s into the Activity/Var/Timer APIs :(. Here's what I tried:

I'm pretty fluent in using Futures, but the Activity and Var APIs are unfamiliar to me, so I'd appreciate any hints.
@ccmtaylor The Var/Activity APIs are a bit nuanced and are difficult at times for all of us ;) If you poke around some other namers you may find an idiom like the following:

```scala
implicit private[this] val timer = DefaultTimer.twitter
// ...
Activity(Var.async[Activity.State[NameTree[Name]]](Activity.Pending) { state =>
  @volatile var stopped: Boolean = false
  def loop(): Future[Unit] = {
    if (stopped) Future.Unit
    else pool(resolve(name)).transform { result =>
      result match {
        case Return(addrs) =>
          // `id` is the bound id, as in lookupSrv
          state() = Activity.Ok(NameTree.Leaf(Name.Bound(Var.value(Addr.Bound(addrs)), id)))
        case Throw(e) =>
          log.error(e, "resolution error: %s", name)
          // it may be appropriate to fail the resolution (i.e. Activity.Failed) if there was a pending state before
      }
      // Wait a TTL before resolving again.
      Future.sleep(ttl).before(loop())
    }
  }
  val pending = loop()
  Closable.make { _ =>
    stopped = true
    // Cancel the timer notification or pending requests
    pending.raise(...)
    Future.Unit
  }
})
```

... or something roughly like that.
@olix0r thanks for the feedback, that worked great! I found
use a FuturePool to avoid blocking the timer thread
this should help mitigate effects of transient DNS failures.
thanks for putting this together! looks really good
```scala
import com.twitter.util._
import org.xbill.DNS

class DnsSrvNamer(prefix: Path, resolver: DNS.Resolver, refreshInterval: Duration, stats: StatsReceiver, pool: FuturePool)(implicit val timer: Timer)
```
nit: prefer to break this very long line into a line per argument
```scala
val query = DNS.Message.newQuery(question)
log.debug("looking up %s", address)
pool {
val message = Stat.time(latency, TimeUnit.SECONDS)(resolver.send(query))
```
probably better to measure in milliseconds?
```scala
private val success = stats.counter("lookup_successes_total")
private val failure = stats.counter("lookup_failures_total")
private val zeroResults = stats.counter("lookup_zero_results_total")
private val latency = stats.stat("request_duration_seconds")
```
as mentioned below, this should probably be measured in millis (and have the suffix `_ms` like other Finagle metrics)
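One concrete reason to prefer milliseconds, assuming the stat records a whole number of the chosen unit: a typical sub-second DNS lookup truncates to 0 when converted to seconds. The sketch uses `java.util.concurrent.TimeUnit` conversions, which truncate; `elapsedIn` and `timeMs` are hypothetical helpers, not Finagle's `Stat.time`.

```scala
import java.util.concurrent.TimeUnit

object LatencyUnitSketch {
  // Converts an elapsed duration in nanoseconds to a whole number of `unit`;
  // TimeUnit conversions truncate, so 250 ms becomes 0 seconds.
  def elapsedIn(unit: TimeUnit, elapsedNanos: Long): Long =
    unit.convert(elapsedNanos, TimeUnit.NANOSECONDS)

  // Times a block and passes the elapsed whole milliseconds to `record`.
  def timeMs[A](record: Long => Unit)(f: => A): A = {
    val start = System.nanoTime()
    try f
    finally record(elapsedIn(TimeUnit.MILLISECONDS, System.nanoTime() - start))
  }
}
```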
Hi @ccmtaylor, thanks for those last commits yesterday! I'm going to go ahead and merge this, and it'll show up in Linkerd 1.2.0 (which we're expecting to release by September 14th, if not before). Thanks again for your contribution!
Thanks for the fix, @ccmtaylor!
While writing the docs for #1611, a line was accidentally removed from the Istio docs. This adds it back.
## 1.2.0 2017-09-07

* **Breaking Change**: `io.l5d.mesh`, `io.l5d.thriftNameInterpreter`, linkerd admin, and namerd admin now serve on 127.0.0.1 by default (instead of 0.0.0.0).
* **Breaking Change**: Removed support for PKCS#1-formatted keys. PKCS#1 formatted keys must be converted to PKCS#8 format.
* Added experimental `io.l5d.dnssrv` namer for DNS SRV records (#1611)
* Kubernetes
  * Added an experimental `io.l5d.k8s.configMap` interpreter for reading dtabs from a Kubernetes ConfigMap (#1603). This interpreter will respond to changes in the ConfigMap, allowing for dynamic dtab updates without the need to run Namerd.
  * Made ingress controller's ingress class annotation configurable (#1584).
  * Fixed an issue where Linkerd would continue routing traffic to endpoints of a service after that service was removed (#1622).
  * Major refactoring and performance improvements to `io.l5d.k8s` and `io.l5d.k8s.ns` namers (#1603).
  * Ingress controller now checks all available ingress resources before using a default backend (#1607).
  * Ingress controller now correctly routes requests with host headers that contain ports (#1607).
* HTTP/2
  * Fixed an issue where long-running H2 streams would eventually hang (#1598).
  * Fixed a memory leak on long-running H2 streams (#1598)
  * Added a user-friendly error message when a HTTP/2 router receives a HTTP/1 request (#1618)
* HTTP/1
  * Removed spurious `ReaderDiscarded` exception logged on HTTP/1 retries (#1609)
* Consul
  * Added support for querying Consul by specific service health states (#1601)
  * Consul namers and Dtab store now fall back to a last known good state on Consul observation errors (#1597)
  * Improved log messages for Consul observation errors (#1597)
* TLS
  * Removed support for PKCS#1 keys (#1590)
  * Added validation to prevent incompatible `disableValidation: true` and `clientAuth` settings in TLS client configurations (#1621)
* Changed `io.l5d.mesh`, `io.l5d.thriftNameInterpreter`, linkerd admin, and namerd admin to serve on 127.0.0.1 by default (instead of 0.0.0.0) (#1366)
* Deprecated `io.l5d.statsd` telemeter.
When scrollbars are set to always be visible in a browser, we see them appear in the sidebar component of the dashboard. This PR adds CSS that hides the scrollbar for WebKit browsers, i.e., Chrome and Safari and uses an overflow: hidden technique inspired by this solution to hide the scrollbar in Firefox. fixes linkerd#1611 Signed-off-by: Dennis Adjei-Baah <[email protected]>
This adds a plugin that allows linkerd to use SRV records for service discovery. A `linkerd.yaml` file using the plugin might contain something like this:

Fixes #1610