ccmtaylor
Contributor

@ccmtaylor ccmtaylor commented Aug 25, 2017

This adds a plugin that allows linkerd to use SRV records for service discovery. A linkerd.yaml file using the plugin might contain something like this:

routers:
- protocol: http
  dtab: |
    /dnssrv => /#/io.l5d.dnssrv;
    /svc/myservice =>
               /dnssrv/myservice.srv.example.org &
               /dnssrv/myservice2.srv.example.org;
    /svc/other => 
               /dnssrv/other.srv.example.org;

namers:
- kind: io.l5d.dnssrv
  experimental: true

Fixes #1610

@hawkw
Contributor

hawkw commented Aug 25, 2017

Hi @ccmtaylor, this is great, thank you for your contribution!

I'll find some reviewers for this PR; in the meantime, would you mind making sure you've signed our Contributor License Agreement so we can merge this?

@hawkw hawkw added this to the 1.2.0 milestone Aug 25, 2017
@hawkw hawkw requested review from hawkw and olix0r August 25, 2017 16:00
Contributor

@hawkw hawkw left a comment

This basically looks good to me; here are a handful of minor changes you might want to consider. I'd like @olix0r to sign off on this PR as well before we can merge this.

NameTree.Leaf(Name.Bound(Var.value(Addr.Bound(srvRecords: _*)), id))
}
case code =>
log.trace("unexpected RCODE: %d for %s", code, address)
Contributor

I think anything that makes NameTree.Fail probably ought to be logged at the warning level.

assert(curator.dnsHosts === Some(Seq("localhost")))
}

test("can resolve some public SRV record") {
Contributor

I'd feel somewhat more comfortable if tests that create external network connections went in integration rather than unit tests.

extends Namer {

private val log = Logger.get("dnssrv")
private val cache = TrieMap.empty[Path, Var[State[NameTree[Name]]]]
Contributor

Take it or leave it: consider com.twitter.util.Memoize as an alternative to map.getOrElseUpdate.

case Some(hosts) => new DNS.ExtendedResolver(hosts.toArray)
case None => new DNS.ExtendedResolver()
}
resolver.setEDNS(0, 2048, 0, Collections.EMPTY_LIST)
Contributor

nit: maybe put the level, payload size, etc, in a constant?

case _ => Activity.value(NameTree.Neg)
}

private[dnssrv] def lookupSrv(address: String, id: Path, residual: Path): Try[NameTree[Name]] = Try {
Contributor Author

I've seen other namers hand around a residual path, but I'm not sure how to use it, or why they would.

@ccmtaylor
Contributor Author

@hawkw thanks for the quick review! I've addressed your comments; let me know if things look alright.

would you mind making sure you've signed our Contributor License Agreement so we can merge this?

SoundCloud recently signed a CLA. I used my SC email in the commits, but if you prefer, I can close this PR and re-submit from a fork under github.com/soundcloud.

@hawkw
Contributor

hawkw commented Aug 28, 2017

@ccmtaylor:

SoundCloud recently signed a CLA. I used my SC email in the commits, but if you prefer, I can close this PR and re-submit from a fork under github.com/soundcloud.

Oh, okay; I wasn't aware of that and "please sign the CLA" is just in my boilerplate response for all first-time contributors. You're fine, then!

Member

@adleong adleong left a comment

This is really awesome! Very nice work! I've left a few minor comments below.


private val log = Logger.get("dnssrv")
private val memoizedLookup: (Path) => Activity[NameTree[Name]] = Memoize {
case path@Path.Utf8(address, _) =>
Member

It looks like you're expecting the path to have exactly 2 segments here. I think this should call .take(1) before the match and then match that one segment.

Contributor Author

ok. I actually wrote it that way first, but thought it looked a bit nicer like this because it's one fewer level of indentation (though I guess Utf8.unapply() might need to do more work).

val id = path.take(1)
Activity(Var.async[State[NameTree[Name]]](Activity.Pending) { state =>
timer.schedule(refreshInterval) {
val next = lookupSrv(address, prefix ++ id) match {
Member

I think you should also pass the residual, path.drop(1), into this method. (see related comment below)

case _ => Activity.value(NameTree.Neg)
}

private[dnssrv] def lookupSrv(address: String, id: Path): Try[NameTree[Name]] = Try {
Member

Which part of this method is expected to throw exceptions? It would be nice to scope the Try block more narrowly if possible

Contributor Author

resolver.send(query) throws IOException. I'll scope the Try{} to that call.
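
A minimal sketch of what that narrowing could look like. It uses scala.util.Try from the standard library as a stand-in for com.twitter.util.Try so that it runs without dependencies, and `send` and `lookup` are hypothetical names rather than the namer's actual methods.

```scala
import java.io.IOException
import scala.util.Try

object NarrowTry {
  // Hypothetical stand-in for a blocking resolver call that may throw IOException.
  def send(query: String): String =
    if (query.isEmpty) throw new IOException("empty query")
    else s"answer:$query"

  // Only the call that can throw is wrapped in Try; the rest is an ordinary map
  // over the result, so the failure-prone part is easy to spot.
  def lookup(query: String): Try[Int] =
    Try(send(query)).map(_.length)

  def main(args: Array[String]): Unit = {
    println(lookup("srv")) // a successful lookup
    println(lookup(""))    // the IOException is captured as a failure
  }
}
```

Note that `map` and `flatMap` on a Try also trap exceptions, so the narrower scope is mostly about making the throwing call explicit.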

case _ => Activity.value(NameTree.Neg)
}

private[dnssrv] def lookupSrv(address: String, id: Path): Try[NameTree[Name]] = Try {
Member

I think this method should take the path residual as a parameter as well.

val query = DNS.Message.newQuery(question)
log.debug("looking up %s", address)
val m = resolver.send(query)
log.debug("got response %s", address)
Member

I don't think this log is correct

// valid DNS entry, but no instances.
// for some reason, NameTree.Empty doesn't work right
log.trace("empty response for %s", address)
NameTree.Neg
Member

It's debatable if you want NameTree.Neg here or a leaf with Addr.Bound containing an empty set. The former will consider the name invalid and will fall back to any alternatives in the NameTree. The latter will be considered a valid name but any requests will fail because there are no addresses to send to.

Contributor Author

@ccmtaylor ccmtaylor Aug 29, 2017

a major linkerd use-case for us is to load-balance across two or more deployments of a service (with different SRV records), i.e. the dtab looks like this:

    /svc/myservice =>
               /dnssrv/myservice1.srv.example.org &
               /dnssrv/myservice2.srv.example.org;

if myservice1 resolves to Addr.Bound(Set.empty), would linkerd still call myservice2?

Member

If you use an empty address set, then half of the time you'll pick the first branch of the union and fail the request, and half of the time you'll pick the second branch of the union and use myservice2.

If you use NameTree.Neg then you'll skip to the myservice2 every time.
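
The difference between the two choices can be illustrated with a toy model. The types below only imitate the shape of Finagle's NameTree and are not its API; `simplify` and `candidates` are hypothetical helpers, written under the assumption that negative branches are dropped from a union before a branch is picked.

```scala
// Toy model only: these types imitate, but are not, Finagle's NameTree.
sealed trait Tree
case object Neg extends Tree                              // "no such name"
final case class Leaf(addrs: Set[String]) extends Tree    // bound name, possibly with no addresses
final case class Union(branches: List[Tree]) extends Tree

object UnionDemo {
  // Drop negative branches from a union; a union with nothing left is itself negative.
  def simplify(t: Tree): Tree = t match {
    case Union(bs) =>
      bs.map(simplify).filter(_ != Neg) match {
        case Nil           => Neg
        case single :: Nil => single
        case many          => Union(many)
      }
    case other => other
  }

  // Leaves that a request could still be routed to after simplification.
  def candidates(t: Tree): List[Leaf] = simplify(t) match {
    case l: Leaf   => List(l)
    case Union(bs) => bs.flatMap(candidates)
    case Neg       => Nil
  }

  def main(args: Array[String]): Unit = {
    val healthy = Leaf(Set("10.0.0.2:8080"))
    // Neg branch: only the healthy deployment remains a candidate.
    println(candidates(Union(List(Neg, healthy))))
    // Empty-but-bound branch: it stays a candidate, so some requests land on it and fail.
    println(candidates(Union(List(Leaf(Set.empty), healthy))))
  }
}
```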

Contributor Author

sounds like NameTree.Neg is the right choice then :). In general, why would linkerd not fall back/load-balance for empty sets? I see this has come up in #1612 and #1549, too.

NameTree.Neg
} else {
log.trace("got %d results for %s", srvRecords.length, address)
NameTree.Leaf(Name.Bound(Var.value(Addr.Bound(srvRecords: _*)), id))
Member

you want to set the residual as the path of the Name.Bound here
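
As a rough illustration of the id/residual split discussed in this review: the first path segment is consumed as the DNS name and becomes part of the bound id, while whatever follows is carried along as the residual. The sketch below models paths as plain `Seq[String]`; `Bound` and `bind` are hypothetical and are not Finagle's `Name.Bound` or the namer's real code.

```scala
object ResidualDemo {
  // Hypothetical stand-in for a bound name: the id that was resolved plus the unconsumed rest.
  final case class Bound(id: Seq[String], residual: Seq[String])

  // Consume exactly one segment (the SRV name); keep the remaining segments as the residual.
  def bind(prefix: Seq[String], path: Seq[String]): Option[Bound] = path match {
    case address +: rest => Some(Bound(prefix :+ address, rest))
    case _               => None // empty path: nothing to resolve
  }

  def main(args: Array[String]): Unit = {
    val prefix = Seq("#", "io.l5d.dnssrv")
    println(bind(prefix, Seq("myservice.srv.example.org", "extra", "segments")))
    println(bind(prefix, Seq.empty))
  }
}
```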

}
case code =>
log.warning("unexpected RCODE: %s for %s", DNS.Rcode.string(code), address)
NameTree.Fail
Member

I think this should be a Throw(...) instead of a Return(NameTree.Fail)

Contributor Author

ok. I'm a little unclear on the behaviour of the different NameTree.* variants. When would I use NameTree.Fail?

Member

It can be useful if you want to artificially halt evaluation of a name tree, but it's rarely used in practice.


@JsonIgnore
override def newNamer(params: Params): Namer = {
import org.xbill.DNS
Member

move imports to top of file, please

class DnsSrvNamerIntegrationTest extends FunSuite with Matchers {
test("can resolve some public SRV record") {
val namer = new DnsSrvNamer(Path.empty, new DNS.ExtendedResolver, new NullTimer, Duration.Zero, new NullStatsReceiver)
val result = namer.lookupSrv("_http._tcp.mxtoolbox.com.", Path.read("/foo"))
Member

I think this would be a better black-box test if you use namer.lookup instead.

Contributor Author

I agree, but I couldn't figure out how to do something like Future.await() for an Activity. The closest I could find is Activity.sample, but I'm not sure if that would race if the lookup is too slow. I'll push something, but please take a look if that's correct.

Member

if you import io.buoyant.namer.RichActivity then you can do await(namer.lookup(...).toFuture)

not sure if this makes a difference in practice, since Try's `flatMap`
also traps exceptions.
@ccmtaylor
Contributor Author

ccmtaylor commented Aug 29, 2017

@hawkw @adleong: what kind of instrumentation would you expect on namers? I'm passing in a StatsReceiver, but not using it atm. I couldn't see anything consistent across the other namers, but I'm not sure if there exist some metrics across all kinds of namers (e.g. lookup latency, success/failure counts)?

class DnsSrvNamerIntegrationTest extends FunSuite with Awaits with Matchers {
test("can resolve some public SRV record") {
val namer = new DnsSrvNamer(Path.empty, new DNS.ExtendedResolver, new NullTimer, Duration.Zero, new NullStatsReceiver)
Activity.sample(namer.lookup(Path.read("/_http._tcp.mxtoolbox.com."))) match {
Contributor Author

@ccmtaylor ccmtaylor Aug 29, 2017

as mentioned in a previous comment, I'd appreciate a +1 that this should work and doesn't just race on the Activity being in Pending state.

test("parse config") {
val yaml = s"""
|kind: io.l5d.dnssrv
|experimental: true
Contributor Author

should I make this required? If so, where?

Contributor

Set experimentalRequired = true in DnsSrvNamerInitializer. See (for example) K8sNamerInitializer:

@JsonIgnore
override val experimentalRequired = true

@hawkw
Contributor

hawkw commented Aug 29, 2017

@ccmtaylor we don't currently collect any metrics from namers, although we probably should.

Member

@adleong adleong left a comment

⭐ This is awesome! Very nice work.

Member

@olix0r olix0r left a comment

massive thanks for this submission.

One small question...

)
val query = DNS.Message.newQuery(question)
log.debug("looking up %s", address)
Try(resolver.send(query)) flatMap { message =>
Member

I imagine that resolver.send is a potentially blocking call? If so, then I think DnsSrvNamer should probably be constructed with a com.twitter.util.FuturePool so that these calls don't block the timer thread.

Furthermore, it may be a good idea to record a stat around the lookup times.

case Throw(e) => Activity.Failed(e)
}
state.update(next)
}
Member

@olix0r olix0r Sep 1, 2017

just to confirm i understand this... timer.schedule returns a closable and the timer is canceled when the Var observation is released?

Contributor Author

yes, that is correct

@ccmtaylor
Contributor Author

ccmtaylor commented Sep 4, 2017

lifting this to a top-level comment, so it doesn't get lost in revisions:

@olix0r: I imagine that resolver.send is a potentially blocking call? If so, then I think DnsSrvNamer should probably be constructed with a com.twitter.util.FuturePool so that these calls don't block the timer thread.

very good point, thanks @olix0r! I tried (and failed) to do this today. It's straightforward to make lookupSrv async with a FuturePool by doing

futurePool(resolver.send(query)) flatMap { message =>
  //...
  Future.value(NameTree.whatever)
  //...
  Future.exception(e)
}

but I can't figure out how to fit Futures into the Activity/Var/Timer APIs :(. Here's what I tried:

  • I found Activity.future(Future[T]), but that seems to run a future once, and then stop.
  • I tried to use val (act, witness) = Activity[NameTree[Name]]() like this:
    // with lookupSrv modified to return Future[NameTree[Name]] as outlined above
    case id@Path.Utf8(address) =>
      val (act, witness) = Activity[NameTree[Name]]()
      def loop(): Future[NameTree[Name]] = lookupSrv(address, prefix ++ id, path.drop(1)).respond { result =>
        witness.notify(result)
        timer.doLater(refreshInterval)(loop())
      }
      act
    but I'm left with no place to start and evaluate the loop -- calling Await.result(loop()) would miss the point :)

I'm pretty fluent in using Futures, but the Activity and Var APIs are unfamiliar to me, so I'd appreciate any hints.

@olix0r
Member

olix0r commented Sep 4, 2017

@ccmtaylor The Var/Activity apis are a bit nuanced and are difficult at times for all of us ;) If you poke around some other namers you may find an idiom like the following:

implicit private[this] val timer = DefaultTimer.twitter

// ...
  Activity(Var.async[Activity.State[NameTree[Name]]](Activity.Pending) { state =>

     @volatile var stopped: Boolean = false

      def loop(): Future[Unit] = {
        if (stopped) Future.Unit
        else pool(resolve(name)).transform { result =>
            result match {
              case Return(addrs) =>
                state() = Activity.Ok(NameTree.Leaf(Addr.Bound(addrs)))

              case Throw(e) =>
                log.error(e, "resolution error: %s", name)
                // it may be appropriate to fail the resolution (i.e. Activity.Failed) if there was a pending state before
            }

            // Wait a TTL before resolving again.
            Future.sleep(ttl).before(loop())
        }
      }

      val pending = loop()
      Closable.make { _ =>
        stopped = true

        // Cancel the timer notification or pending requests
        pending.raise(...)

        Future.Unit
      }
  })

... or something roughly like that.
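
For readers without twitter-util at hand, the same shape (resolve off the caller's thread, publish the latest result, wait an interval, stop on close) can be sketched with the standard library alone. This is an analogue under stated assumptions, not the idiom above: `Poller` is a hypothetical helper, an `AtomicReference` stands in for the `Var` state, and a single-threaded scheduler stands in for both the `FuturePool` and the timer.

```scala
import java.util.concurrent.{Executors, RejectedExecutionException, TimeUnit}
import java.util.concurrent.atomic.{AtomicBoolean, AtomicReference}
import scala.util.Try

// Hypothetical helper, not part of linkerd or twitter-util.
final class Poller[A](intervalMs: Long)(resolve: () => A) extends AutoCloseable {
  private val stopped = new AtomicBoolean(false)
  private val scheduler = Executors.newSingleThreadScheduledExecutor()

  // Latest observed result; None plays the role of the pending state.
  val latest = new AtomicReference[Option[Try[A]]](None)

  private val step: Runnable = new Runnable {
    def run(): Unit =
      if (!stopped.get) {
        // Failures are recorded as a value rather than thrown, so the loop keeps going.
        latest.set(Some(Try(resolve())))
        try {
          // Wait one interval before resolving again.
          scheduler.schedule(this, intervalMs, TimeUnit.MILLISECONDS)
          ()
        } catch {
          case _: RejectedExecutionException => () // close() won the race
        }
      }
  }

  // Start the loop on the scheduler thread, never on the caller's thread.
  scheduler.execute(step)

  // Stop the loop and cancel any pending refresh.
  def close(): Unit = {
    stopped.set(true)
    scheduler.shutdownNow()
    ()
  }
}
```

A caller would create the poller once per name, read `latest` whenever it needs the current answer, and call `close()` when nobody is observing the name any more.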

@ccmtaylor
Contributor Author

@olix0r thanks for the feedback, that worked great! I found Future.whileDo, which is a nice wrapper around the loop, and I used an AtomicBoolean as shown in the twitter-util cookbook.

use a FuturePool to avoid blocking the timer thread
this should help mitigate effects of transient DNS failures.
Member

@olix0r olix0r left a comment

thanks for putting this together! looks really good

import com.twitter.util._
import org.xbill.DNS

class DnsSrvNamer(prefix: Path, resolver: DNS.Resolver, refreshInterval: Duration, stats: StatsReceiver, pool: FuturePool)(implicit val timer: Timer)
Member

nit: prefer to break this very long line into a line per argument

val query = DNS.Message.newQuery(question)
log.debug("looking up %s", address)
pool {
val message = Stat.time(latency, TimeUnit.SECONDS)(resolver.send(query))
Member

probably better to measure in milliseconds?

private val success = stats.counter("lookup_successes_total")
private val failure = stats.counter("lookup_failures_total")
private val zeroResults = stats.counter("lookup_zero_results_total")
private val latency = stats.stat("request_duration_seconds")
Member

as mentioned below, this should probably be measured in millis (and have suffix _ms like other finagle metrics)
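
A small sketch of what recording the latency in milliseconds amounts to. The PR itself uses Finagle's Stat.time; the version below is a dependency-free stand-in, and `timeMs` and its `record` callback are hypothetical names.

```scala
import java.util.concurrent.TimeUnit

object TimedMs {
  // Runs `f` and passes the elapsed wall-clock time, in milliseconds, to `record`.
  // The time is recorded even when `f` throws.
  def timeMs[A](record: Long => Unit)(f: => A): A = {
    val start = System.nanoTime()
    try f
    finally record(TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start))
  }

  def main(args: Array[String]): Unit = {
    // The callback stands in for adding a sample to a stat with an `_ms` suffix.
    val answer = timeMs(ms => println(s"lookup took $ms ms")) {
      Thread.sleep(25) // pretend this is the blocking DNS query
      "response"
    }
    println(answer)
  }
}
```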

@hawkw
Contributor

hawkw commented Sep 6, 2017

Hi @ccmtaylor, thanks for those last commits yesterday! I'm going to go ahead and merge this, and it'll show up in Linkerd 1.2.0 (which we're expecting to release by September 14th, if not before). Thanks again for your contribution!

@hawkw hawkw merged commit 3d31437 into linkerd:master Sep 6, 2017
@ccmtaylor
Contributor Author

Thanks for your time and feedback, @hawkw, @adleong and @olix0r! I just noticed that I made an editing mistake while writing the docs. #1623 fixes it, sorry!

@hawkw
Contributor

hawkw commented Sep 6, 2017

Thanks for the fix, @ccmtaylor!

hawkw pushed a commit that referenced this pull request Sep 6, 2017
While writing the docs for #1611, a line was accidentally removed from the Istio docs. This adds it back.
@ccmtaylor ccmtaylor deleted the dnssrv-namer branch September 6, 2017 18:55
@hawkw hawkw mentioned this pull request Sep 7, 2017
hawkw added a commit that referenced this pull request Sep 7, 2017
## 1.2.0 2017-09-07

* **Breaking Change**: `io.l5d.mesh`, `io.l5d.thriftNameInterpreter`, linkerd
  admin, and namerd admin now serve on 127.0.0.1 by default (instead of
  0.0.0.0).
* **Breaking Change**: Removed support for PKCS#1-formatted keys. PKCS#1 formatted keys must be converted to PKCS#8 format.
* Added experimental `io.l5d.dnssrv` namer for DNS SRV records (#1611)
* Kubernetes
  * Added an experimental `io.l5d.k8s.configMap` interpreter for reading dtabs from a Kubernetes ConfigMap (#1603). This interpreter will respond to changes in the ConfigMap, allowing for dynamic dtab updates without the need to run Namerd.
  * Made ingress controller's ingress class annotation configurable (#1584).
  * Fixed an issue where Linkerd would continue routing traffic to endpoints of a service after that service was removed (#1622).
  * Major refactoring and performance improvements to `io.l5d.k8s` and `io.l5d.k8s.ns` namers (#1603).
  * Ingress controller now checks all available ingress resources before using a default backend (#1607).
  * Ingress controller now correctly routes requests with host headers that contain ports (#1607).
* HTTP/2
  * Fixed an issue where long-running H2 streams would eventually hang (#1598).
  * Fixed a memory leak on long-running H2 streams (#1598)
  * Added a user-friendly error message when a HTTP/2 router receives a HTTP/1 request (#1618)
* HTTP/1
  * Removed spurious `ReaderDiscarded` exception logged on HTTP/1 retries (#1609)
* Consul
  * Added support for querying Consul by specific service health states (#1601)
  * Consul namers and Dtab store now fall back to a last known good state on Consul observation errors (#1597)
  * Improved log messages for Consul observation errors (#1597)
* TLS
  * Removed support for PKCS#1 keys (#1590)
  * Added validation to prevent incompatible `disableValidation: true` and `clientAuth` settings in TLS client configurations (#1621)
* Changed `io.l5d.mesh`, `io.l5d.thriftNameInterpreter`, linkerd
  admin, and namerd admin to serve on 127.0.0.1 by default (instead of
  0.0.0.0) (#1366)
* Deprecated `io.l5d.statsd` telemeter.
Tim-Brooks pushed a commit to Tim-Brooks/linkerd that referenced this pull request Dec 20, 2018
When scrollbars are set to always be visible in a browser, we see them appear in the sidebar component of the dashboard.

This PR adds CSS that hides the scrollbar for WebKit browsers, i.e., Chrome and Safari and uses an overflow: hidden technique inspired by this solution to hide the scrollbar in Firefox.

fixes linkerd#1611

Signed-off-by: Dennis Adjei-Baah <[email protected]>