Conversation

@hawkw
Contributor

@hawkw hawkw commented Aug 22, 2017

This commit contains three major features: first, a refactor of the Watchable trait in io.buoyant.k8s; second, a new interpreter for reading dtabs from a Kubernetes ConfigMap; and third, refactoring of the Kubernetes EndpointsNamer and ServiceNamers to watch individual Kubernetes API resources rather than the entire namespace.

Make Kubernetes Objects watchable (#1527)

Previously, only Kubernetes list resources were watchable, since our code for establishing watches on Kubernetes resources used the ?watch=true path param.

I've rewritten our code to establish watches using the $API/$NS/watch/$RESOURCE endpoint, instead, thus making the KubeObject type watchable as well as KubeList. I've also factored out some commonly repeated code for creating an Activity to watch a Kubernetes resource, and put that in the Watchable trait.
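
To make the shape of that refactor concrete, here is a minimal sketch of the pattern (the trait and method names below are illustrative, not the actual io.buoyant.k8s API): a watchable resource exposes a one-shot get() plus a long-lived watch, and the shared helper folds the two into an Activity.

```scala
import com.twitter.util.{Activity, Closable, Future, Var}

// Illustrative sketch only; the trait and method names here are assumptions,
// not the actual io.buoyant.k8s API.
trait WatchableSketch[O, E] {
  /** Fetch the current state of the resource once. */
  def get(): Future[O]

  /** Open a long-lived watch; `onEvent` is invoked for each watch event. */
  def watch(onEvent: E => Unit): Closable

  /** The commonly repeated code, factored out: seed the Activity from get(),
    * then keep it up to date from the watch stream. */
  def activity[T](init: O => T)(update: (T, E) => T): Activity[T] = {
    val state = Var.async[Activity.State[T]](Activity.Pending) { u =>
      get().map { obj =>
        var current = init(obj)
        u() = Activity.Ok(current)
        watch { event =>
          current = update(current, event)
          u() = Activity.Ok(current)
        }
      }
      // A real implementation would close the underlying watch once the
      // Activity is no longer observed; elided here for brevity.
      Closable.nop
    }
    Activity(state)
  }
}
```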

The unit tests in EndpointsNamerTest, ServiceNamerTest, and ApiTest have all been updated to cover the new Watchable implementation. Additionally, I believe @klingerf has tested this somewhat on a live Kubernetes cluster.

Add interpreter for reading from a k8s ConfigMap (#1532)

This commit adds an interpreter for reading Dtabs from a Kubernetes ConfigMap – as described in issue #1506.

I've built on @klingerf's work in 9a7c55f and my changes to Watchable e8985e2 to refactor Kevin's old interpreter code. This interpreter uses the new Watchable API for Kubernetes object resources to establish a watch on the requested ConfigMap, access a key containing a dtab, and use that dtab to resolve routes. Updates to the dtab are reflected in the interpreter. The new interpreter is currently marked as experimental until it can be thoroughly tested in production.
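
As a rough illustration of the data flow (the helper below is hypothetical, not the interpreter's actual code), the watched ConfigMap data can be mapped into a Dtab so that edits to the ConfigMap flow straight through to routing:

```scala
import com.twitter.finagle.Dtab
import com.twitter.util.Activity

// Hypothetical helper, not the interpreter's actual code: project the watched
// ConfigMap's key/value data down to the dtab stored under `key`.
def dtabFromConfigMap(
  configMapData: Activity[Map[String, String]],
  key: String
): Activity[Dtab] =
  configMapData.map { data =>
    data.get(key) match {
      // Dtab.read throws on a malformed dtab; Activity surfaces that as a failure.
      case Some(raw) => Dtab.read(raw)
      case None => Dtab.empty // key missing: fall back to an empty dtab
    }
  }
```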

I've added unit tests to ApiTest for getting ConfigMaps, plus new unit tests for the ConfigMapInterpreterInitializer configuration. Additionally, I believe that Kevin has tested this a bit in a live Kubernetes cluster. I've also added documentation on using the new interpreter to linkerd/docs/interpreters.md.

K8s namers watch individual objects, rather than the entire ns (#1587)

Right now, the io.l5d.k8s and io.l5d.k8s.ns namers set up a watch on the List endpoints API, which streams update events for all of the endpoints in a given namespace. This can potentially lead to receiving a lot of events for endpoints that Linkerd is not actually trying to route to. It can also lead to a memory leak, in the case that Linkerd is watching a namespace with very frequent updates (e.g. kube-system; see #1361).

My PR #1527 added the ability to watch individual Kubernetes API objects, rather than watching every object in a namespace. This pull request rewrites the EndpointsNamer and ServiceNamers to use the new watch code added in #1527, both for the /namespaces/{ns}/endpoints/{service} API endpoint and for the /namespaces/{ns}/services/{service} endpoint that we use for resolving numbered port mappings.

The EndpointsNamer caches watches on each (namespace, service name) combination and reuses existing watches so that we don't create a whole bunch of threads all watching the same API response. Furthermore, the watch on .../services/... is only established if it is needed (i.e. the namer was given a numbered rather than named port). I've also attempted to simplify the caches used in EndpointsNamer as much as possible.
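
Reduced to a sketch (the class below is illustrative, not the namer's actual internals), the caching idea is simply one shared watch per (namespace, service) pair, created lazily and reused:

```scala
import scala.collection.mutable
import com.twitter.util.Activity

// Illustrative sketch of the per-(namespace, service) watch cache described
// above; not the EndpointsNamer's actual implementation.
class WatchCache[T](mkWatch: (String, String) => Activity[T]) {
  private[this] val watches = mutable.Map.empty[(String, String), Activity[T]]

  /** Return the existing watch for (ns, service), or create and cache a new one. */
  def get(ns: String, service: String): Activity[T] = synchronized {
    watches.getOrElseUpdate((ns, service), mkWatch(ns, service))
  }
}
```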

Note that I've also made a change to the Watchable.get() method signature; it now returns a Future[Option[G]] rather than Future[G]. The test suite for ServiceNamer dictates that an unknown service name should return a negative NameTree, but watch the service name in case it is created later. To support this, I modified Watchable.get() to return None if the requested path returns 404. Watchable.activity() still starts the watch if None is returned. This required some modification to other existing classes, but appears to not have broken anything.
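
For illustration, a sketch of that 404 handling (the function and the decode step are placeholders, not the actual API client code): a 404 resolves to None instead of a failed Future, so the namer can return a negative NameTree while the watch waits for the resource to appear.

```scala
import com.twitter.finagle.http.Response
import com.twitter.util.Future

// Sketch only: turning a single GET response into a Future[Option[G]].
// `decode` stands in for the real JSON decoding, which is elided here.
def parseGetResponse[G](rsp: Response)(decode: Response => G): Future[Option[G]] =
  rsp.status.code match {
    case 404  => Future.value(None)              // resource doesn't exist (yet)
    case 200  => Future.value(Some(decode(rsp)))
    case code => Future.exception(new IllegalStateException(s"unexpected status: $code"))
  }
```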

I've updated the existing unit tests in EndpointsNamerTest and ServiceNamerTest and confirmed they all work in dev. I've also modified other Kubernetes API tests to reflect the change to Watchable.get() I mentioned above.

Closes #1534
Closes #1575
Closes #1361

@hawkw hawkw added this to the 1.2.0 milestone Aug 22, 2017
@hawkw hawkw self-assigned this Aug 22, 2017
@hawkw hawkw requested review from klingerf and olix0r August 22, 2017 16:19
@hawkw
Contributor Author

hawkw commented Aug 22, 2017

Since this is a pretty big set of changes, it'll need some serious scrutiny before we can merge this into master.

Each individual feature contained in this branch has already been reviewed by @adleong and @klingerf, and tested on the Kubernetes test cluster by @klingerf.

Per Alex's recommendation, before we merge this branch, we should:

  • code review all the changes in this PR
  • run it for 24h in the test environment (at a minimum)
  • see if we can come up with a test that exercises rapidly-changing services?
  • see if we can get one or more of our users to test an RC1?

@hawkw hawkw changed the title Kubernetes API watch refactor Kubernetes watch refactor Aug 22, 2017
@hawkw hawkw changed the title Kubernetes watch refactor Kubernetes API refactor Aug 22, 2017
hawkw added 3 commits August 22, 2017 10:43
Previously, only Kubernetes list resources were watchable, since our code for establishing watches on Kubernetes resources used the `?watch=true` path param.

I've rewritten our code to establish watches using the `$API/$NS/watch/$RESOURCE` endpoint, instead, thus making the `KubeObject` type watchable as well as `KubeList`. I've also factored out some commonly repeated code for creating an `Activity` to watch a Kubernetes resource, and put that in the `Watchable` trait.

The unit tests in `EndpointsNamerTest`, `ServiceNamerTest`, and `ApiTest` have all been updated to cover the new `Watchable` implementation. Additionally, I believe @klingerf has tested this somewhat on a live Kubernetes cluster.
This commit adds an interpreter for reading Dtabs from a Kubernetes ConfigMap – as described in issue #1506.

I've built on @klingerf's work in 9a7c55f and my changes to `Watchable` e8985e2 to refactor Kevin's old interpreter code. This interpreter uses the new `Watchable` API for Kubernetes object resources to establish a watch on the requested ConfigMap, access a key containing a dtab, and use that dtab to resolve routes. Updates to the dtab are reflected in the interpreter. The new interpreter is currently marked as experimental until it can be thoroughly tested in production.

I've added unit tests to `ApiTest` for getting ConfigMaps, plus new unit tests for the `ConfigMapInterpreterInitializer` configuration. Additionally, I believe that Kevin has tested this a bit in a live Kubernetes cluster. I've also added documentation on using the new interpreter to `linkerd/docs/interpreters.md`.

Closes #1506
Right now, the `io.l5d.k8s` and `io.l5d.k8s.ns` namers set up a watch on the List endpoints API, which streams update events for all of the endpoints in a given namespace. This can potentially lead to receiving a lot of events for endpoints that Linkerd is not actually trying to route to. It can also lead to a memory leak, in the case that Linkerd is watching a namespace with very frequent updates (e.g. `kube-system`; see #1361).

My PR #1527 added the ability to watch individual Kubernetes API objects, rather than watching every object in a namespace. This pull request rewrites the `EndpointsNamer` and `ServiceNamer`s to use the new watch code added in #1527, both for the `/namespaces/{ns}/endpoints/{service}` API endpoint and for the `/namespaces/{ns}/services/{service}` endpoint that we use for resolving numbered port mappings.

The `EndpointsNamer` caches watches on each (namespace, service name) combination and reuses existing watches so that we don't create a whole bunch of threads all watching the same API response. Furthermore, the watch on `.../services/...` is only established if it is needed (i.e. the namer was given a numbered rather than named port). I've also attempted to simplify the caches used in `EndpointsNamer` as much as possible.

Note that I've also made a change to the `Watchable.get()` method signature; it now returns a `Future[Option[G]]` rather than `Future[G]`. The test suite for `ServiceNamer` dictates that an unknown service name should return a negative `NameTree`, _but_ watch the service name in case it is created later. To support this, I modified `Watchable.get()` to return `None` if the requested path returns 404. `Watchable.activity()` still starts the watch if `None` is returned. This required some modification to other existing classes, but appears to not have broken anything.

I've updated the existing unit tests in `EndpointsNamerTest` and `ServiceNamerTest` and confirmed they all work in dev. I've also modified other Kubernetes API tests to reflect the change to `Watchable.get()` I mentioned above.

Closes #1534
Closes #1575
hawkw and others added 2 commits August 24, 2017 10:21
This notice was displayed if dentries are from a configmap as well as
from namerd.
</h4>
<div id="dtab"></div>
<div class="namerd-dtab-warning hide">Note: the above Dentries are from namerd and can't be edited.</div>
<div class="namerd-dtab-warning hide">Note: the above Dentries are from an external source and can't be edited.</div>
Contributor

thanks, this has been bothering me

Contributor

(I think you'll need to recompile the templates in order for this change to take effect)

Contributor Author

yeah, I realized that: d215b98 :)

Contributor

@klingerf klingerf left a comment

⭐️ Have done some manual testing on this branch, and it all looks good to me.

@hawkw hawkw merged commit 2dcac1f into master Aug 25, 2017
hawkw added a commit that referenced this pull request Aug 31, 2017
This commit contains three major features: first, a refactor of the `Watchable` trait in `io.buoyant.k8s`; second, a new interpreter for reading dtabs from a Kubernetes ConfigMap; and third, refactoring of the Kubernetes `EndpointsNamer` and `ServiceNamer`s to watch individual Kubernetes API resources rather than the entire namespace.

### Make Kubernetes Objects watchable (#1527)

Previously, only Kubernetes list resources were watchable, since our code for establishing watches on Kubernetes resources used the `?watch=true` path param.

I've rewritten our code to establish watches using the `$API/$NS/watch/$RESOURCE` endpoint, instead, thus making the `KubeObject` type watchable as well as `KubeList`. I've also factored out some commonly repeated code for creating an `Activity` to watch a Kubernetes resource, and put that in the `Watchable` trait.

The unit tests in `EndpointsNamerTest`, `ServiceNamerTest`, and `ApiTest` have all been updated to cover the new `Watchable` implementation. Additionally, I believe @klingerf has tested this somewhat on a live Kubernetes cluster.

### Add interpreter for reading from a k8s ConfigMap (#1532)
This commit adds an interpreter for reading Dtabs from a Kubernetes ConfigMap – as described in issue #1506.

I've built on @klingerf's work in 9a7c55f and my changes to `Watchable` e8985e2 to refactor Kevin's old interpreter code. This interpreter uses the new `Watchable` API for Kubernetes object resources to establish a watch on the requested ConfigMap, access a key containing a dtab, and use that dtab to resolve routes. Updates to the dtab are reflected in the interpreter. The new interpreter is currently marked as experimental until it can be thoroughly tested in production.

I've added unit tests to `ApiTest` for getting ConfigMaps, plus new unit tests for the `ConfigMapInterpreterInitializer` configuration. Additionally, I believe that Kevin has tested this a bit in a live Kubernetes cluster. I've also added documentation on using the new interpreter to `linkerd/docs/interpreters.md`.

### K8s namers watch individual objects, rather than the entire ns (#1587)
Right now, the `io.l5d.k8s` and `io.l5d.k8s.ns` namers set up a watch on the List endpoints API, which streams update events for all of the endpoints in a given namespace. This can potentially lead to receiving a lot of events for endpoints that Linkerd is not actually trying to route to. It can also lead to a memory leak, in the case that Linkerd is watching a namespace with very frequent updates (e.g. `kube-system`; see #1361).

My PR #1527 added the ability to watch individual Kubernetes API objects, rather than watching every object in a namespace. This pull request rewrites the `EndpointsNamer` and `ServiceNamer`s to use the new watch code added in #1527, both for the `/namespaces/{ns}/endpoints/{service}` API endpoint and for the `/namespaces/{ns}/services/{service}` endpoint that we use for resolving numbered port mappings.

The `EndpointsNamer` caches watches on each (namespace, service name) combination and reuses existing watches so that we don't create a whole bunch of threads all watching the same API response. Furthermore, the watch on `.../services/...` is only established if it is needed (i.e. the namer was given a numbered rather than named port). I've also attempted to simplify the caches used in `EndpointsNamer` as much as possible.

Note that I've also made a change to the `Watchable.get()` method signature; it now returns a `Future[Option[G]]` rather than `Future[G]`. The test suite for `ServiceNamer` dictates that an unknown service name should return a negative `NameTree`, _but_ watch the service name in case it is created later. To support this, I modified `Watchable.get()` to return `None` if the requested path returns 404. `Watchable.activity()` still starts the watch if `None` is returned. This required some modification to other existing classes, but appears to not have broken anything.

I've updated the existing unit tests in `EndpointsNamerTest` and `ServiceNamerTest` and confirmed they all work in dev. I've also modified other Kubernetes API tests to reflect the change to `Watchable.get()` I mentioned above.

Closes #1534
Closes #1575
Closes #1361
@hawkw hawkw mentioned this pull request Sep 7, 2017
hawkw added a commit that referenced this pull request Sep 7, 2017
## 1.2.0 2017-09-07

* **Breaking Change**: `io.l5d.mesh`, `io.l5d.thriftNameInterpreter`, linkerd
  admin, and namerd admin now serve on 127.0.0.1 by default (instead of
  0.0.0.0).
* **Breaking Change**: Removed support for PKCS#1-formatted keys. PKCS#1-formatted keys must be converted to PKCS#8 format.
* Added experimental `io.l5d.dnssrv` namer for DNS SRV records (#1611)
* Kubernetes
  * Added an experimental `io.l5d.k8s.configMap` interpreter for reading dtabs from a Kubernetes ConfigMap (#1603). This interpreter will respond to changes in the ConfigMap, allowing for dynamic dtab updates without the need to run Namerd.
  * Made ingress controller's ingress class annotation configurable (#1584).
  * Fixed an issue where Linkerd would continue routing traffic to endpoints of a service after that service was removed (#1622).
  * Major refactoring and performance improvements to `io.l5d.k8s` and `io.l5d.k8s.ns` namers (#1603).
  * Ingress controller now checks all available ingress resources before using a default backend (#1607).
  * Ingress controller now correctly routes requests with host headers that contain ports (#1607).
* HTTP/2
  * Fixed an issue where long-running H2 streams would eventually hang (#1598).
  * Fixed a memory leak on long-running H2 streams (#1598)
  * Added a user-friendly error message when an HTTP/2 router receives an HTTP/1 request (#1618)
* HTTP/1
  * Removed spurious `ReaderDiscarded` exception logged on HTTP/1 retries (#1609)
* Consul
  * Added support for querying Consul by specific service health states (#1601)
  * Consul namers and Dtab store now fall back to a last known good state on Consul observation errors (#1597)
  * Improved log messages for Consul observation errors (#1597)
* TLS
  * Removed support for PKCS#1 keys (#1590)
  * Added validation to prevent incompatible `disableValidation: true` and `clientAuth` settings in TLS client configurations (#1621)
* Changed `io.l5d.mesh`, `io.l5d.thriftNameInterpreter`, linkerd
  admin, and namerd admin to serve on 127.0.0.1 by default (instead of
  0.0.0.0) (#1366)
* Deprecated `io.l5d.statsd` telemeter.
@obeattie
Contributor

Sorry to come so late to this, and I'm happy to discuss this elsewhere if that's better for you. I was just looking at upgrading linkerd in our environment to 1.2.0 and saw this change. The "k8s namers watch individual objects, rather than the entire ns" part of this PR concerns me a little. It would be good to understand better what kind of testing has been done on it.

We have ≈300 distinct service deployments running in k8s and we use linkerd to mediate RPC communication between all of them, running as a DaemonSet on each of our ≈250 nodes. If I understand correctly, before this change there would have been 250 watches (one per linkerd), but with this change that will become 300 * 250 = 75,000 unique watches established against the k8s API. Granted, this is a worst case where every linkerd is processing requests bound for every service, but given that linkerds are long-lived in our platform while services are fairly ephemeral, the number of watches does seem like it would trend toward this. As far as I can tell, once a watch is established for a destination it will never be evicted from the cache.

I've done no testing, but this seems like rather a lot of watches, which is likely to cause pressure on both the k8s apiservers and linkerd itself. I understand that in the mentioned ticket, linkerd is doing a lot of work processing events and caching name trees for destinations it'll never contact, but for environments in which linkerds are dealing with requests to a large number of destinations this change seems like it could worsen performance significantly.

Of course, I may be missing something significant here 😇

@adleong
Copy link
Member

adleong commented Sep 22, 2017

👋 Very valid concerns. We considered these factors when doing this refactor and I think performance should still be acceptable for reasons that I will explain. However, this is only based on our intuition and reasoning since we don't have a k8s cluster of this magnitude to test against. If you are able to try this out, any data that you can report back about performance would be extremely helpful.

On the linkerd side, this change went from having linkerd establish a single watch covering all services to establishing individual watches on the services it needs to route to. Even though this is a larger number of watches and connections, the data volume should be a subset of what the namespace-wide watch delivered, and should therefore decrease the load on linkerd. Furthermore, linkerd expires idle watches (with a TTL of 10 minutes by default), so it's unlikely that Linkerd will keep watches on every service in the cluster, even if Linkerd is long-running.
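
To illustrate the idle-expiry idea only (this is a generic sketch, not linkerd's actual cache code, and the names are made up): each cached watch records its last access time, and a periodic sweep closes watches that have sat idle longer than the TTL.

```scala
import scala.collection.mutable
import com.twitter.util.{Closable, Duration, JavaTimer, Time, Timer}

// Generic sketch of idle-based expiry; not linkerd's actual cache code.
class IdleExpiringWatches[T <: Closable](
  ttl: Duration = Duration.fromMinutes(10),
  timer: Timer = new JavaTimer()
) {
  private[this] case class Entry(watch: T, var lastAccess: Time)
  private[this] val entries = mutable.Map.empty[String, Entry]

  /** Return the cached watch for `key`, creating it if needed, and mark it as used. */
  def getOrElseUpdate(key: String)(mkWatch: => T): T = synchronized {
    val entry = entries.getOrElseUpdate(key, Entry(mkWatch, Time.now))
    entry.lastAccess = Time.now
    entry.watch
  }

  // Periodically close and evict watches that have been idle longer than `ttl`.
  timer.schedule(ttl) {
    synchronized {
      val cutoff = Time.now - ttl
      val expired = entries.filter { case (_, entry) => entry.lastAccess < cutoff }
      expired.foreach { case (key, entry) =>
        entry.watch.close()
        entries -= key
      }
    }
  }
}
```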

As for load on the k8s api, namerd will effectively be used as a cache and should only establish 1 watch per service. The Linkerds, in turn, establish watches on namerd.

Hopefully this makes sense. But, as I said, theory and reasoning are no substitute for real-world data, so please let us know if this doesn't match the actual behavior of Linkerd in prod.

Tim-Brooks pushed a commit to Tim-Brooks/linkerd that referenced this pull request Dec 20, 2018
If an input file is un-injectable, existing inject behavior is to simply
output a copy of the input.

Introduce a report, printed to stderr, that communicates the end state
of the inject command. Currently this includes checking for hostNetwork
and unsupported resources.

Malformed YAML documents will continue to produce no YAML output and return error code 1.

This change also modifies integration tests to handle stdout and stderr separately.

example outputs...

some pods injected, none with host networking:

```
hostNetwork: pods do not use host networking...............................[ok]
supported: at least one resource injected..................................[ok]

Summary: 4 of 8 YAML document(s) injected
  deploy/emoji
  deploy/voting
  deploy/web
  deploy/vote-bot
```

some pods injected, one host networking:

```
hostNetwork: pods do not use host networking...............................[warn] -- deploy/vote-bot uses "hostNetwork: true"
supported: at least one resource injected..................................[ok]

Summary: 3 of 8 YAML document(s) injected
  deploy/emoji
  deploy/voting
  deploy/web
```

no pods injected:

```
hostNetwork: pods do not use host networking...............................[warn] -- deploy/emoji, deploy/voting, deploy/web, deploy/vote-bot use "hostNetwork: true"
supported: at least one resource injected..................................[warn] -- no supported objects found

Summary: 0 of 8 YAML document(s) injected
```

TODO: check for UDP and other init containers

Part of linkerd#1516

Signed-off-by: Andrew Seigner <[email protected]>