Conversation

@hawkw
Contributor

@hawkw hawkw commented Aug 22, 2017

This commit contains three major features: first, a refactor of the Watchable trait in io.buoyant.k8s; second, a new interpreter for reading dtabs from a Kubernetes ConfigMap; and third, refactoring of the Kubernetes EndpointsNamer and ServiceNamers to watch individual Kubernetes API resources rather than the entire namespace.

Make Kubernetes Objects watchable (#1527)

Previously, only Kubernetes list resources were watchable, since our code for establishing watches on Kubernetes resources used the ?watch=true path param.

I've rewritten our code to establish watches using the $API/$NS/watch/$RESOURCE endpoint, instead, thus making the KubeObject type watchable as well as KubeList. I've also factored out some commonly repeated code for creating an Activity to watch a Kubernetes resource, and put that in the Watchable trait.
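
To make the shape of that refactor concrete, here is a minimal sketch of the pattern (the trait and method names below are illustrative, not the actual io.buoyant.k8s API): a watchable resource exposes a one-shot get() plus a long-lived watch, and the shared helper folds the two into an Activity.

```scala
import com.twitter.util.{Activity, Closable, Future, Var}

// Illustrative sketch only; the trait and method names here are assumptions,
// not the actual io.buoyant.k8s API.
trait WatchableSketch[O, E] {
  /** Fetch the current state of the resource once. */
  def get(): Future[O]

  /** Open a long-lived watch; `onEvent` is invoked for each watch event. */
  def watch(onEvent: E => Unit): Closable

  /** The commonly repeated code, factored out: seed the Activity from get(),
    * then keep it up to date from the watch stream. */
  def activity[T](init: O => T)(update: (T, E) => T): Activity[T] = {
    val state = Var.async[Activity.State[T]](Activity.Pending) { u =>
      get().map { obj =>
        var current = init(obj)
        u() = Activity.Ok(current)
        watch { event =>
          current = update(current, event)
          u() = Activity.Ok(current)
        }
      }
      // A real implementation would close the underlying watch once the
      // Activity is no longer observed; elided here for brevity.
      Closable.nop
    }
    Activity(state)
  }
}
```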

The unit tests in EndpointsNamerTest, ServiceNamerTest, and ApiTest have all been updated to cover the new Watchable implementation. Additionally, I believe @klingerf has tested this somewhat on a live Kubernetes cluster.

Add interpreter for reading from a k8s ConfigMap (#1532)

This commit adds an interpreter for reading Dtabs from a Kubernetes ConfigMap – as described in issue #1506.

I've built on @klingerf's work in 9a7c55f and my changes to Watchable e8985e2 to refactor Kevin's old interpreter code. This interpreter uses the new Watchable API for Kubernetes object resources to establish a watch on the requested ConfigMap, access a key containing a dtab, and use that dtab to resolve routes. Updates to the dtab are reflected in the interpreter. The new interpreter is currently marked as experimental until it can be thoroughly tested in production.
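
As a rough illustration of the data flow (the helper below is hypothetical, not the interpreter's actual code), the watched ConfigMap data can be mapped into a Dtab so that edits to the ConfigMap flow straight through to routing:

```scala
import com.twitter.finagle.Dtab
import com.twitter.util.Activity

// Hypothetical helper, not the interpreter's actual code: project the watched
// ConfigMap's key/value data down to the dtab stored under `key`.
def dtabFromConfigMap(
  configMapData: Activity[Map[String, String]],
  key: String
): Activity[Dtab] =
  configMapData.map { data =>
    data.get(key) match {
      // Dtab.read throws on a malformed dtab; Activity surfaces that as a failure.
      case Some(raw) => Dtab.read(raw)
      case None => Dtab.empty // key missing: fall back to an empty dtab
    }
  }
```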

I've added unit tests to ApiTest for getting ConfigMaps, plus new unit tests for the ConfigMapInterpreterInitializer configuration. Additionally, I believe that Kevin has tested this a bit in a live Kubernetes cluster. I've also added documentation on using the new interpreter to linkerd/docs/interpreters.md.

K8s namers watch individual objects, rather than the entire ns (#1587)

Right now, the io.l5d.k8s and io.l5d.k8s.ns namers set up a watch on the List endpoints API, which streams update events for all of the endpoints in a given namespace. This can potentially lead to receiving a lot of events for endpoints that Linkerd is not actually trying to route to. It can also lead to a memory leak, in the case that Linkerd is watching a namespace with very frequent updates (e.g. kube-system; see #1361).

My PR #1527 added the ability to watch individual Kubernetes API objects, rather than watching every object in a namespace. This pull request rewrites the EndpointsNamer and ServiceNamers to use the new watch code added in #1527, both for the /namespaces/{ns}/endpoints/{service} API endpoint and for the /namespaces/{ns}/services/{service} endpoint that we use for resolving numbered port mappings.

The EndpointsNamer caches watches on each (namespace, service name) combination and reuses existing watches so that we don't create a whole bunch of threads all watching the same API response. Furthermore, the watch on .../services/... is only established if it is needed (i.e. the namer was given a numbered rather than named port). I've also attempted to simplify the caches used in EndpointsNamer as much as possible.
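
Reduced to a sketch (the class below is illustrative, not the namer's actual internals), the caching idea is simply one shared watch per (namespace, service) pair, created lazily and reused:

```scala
import scala.collection.mutable
import com.twitter.util.Activity

// Illustrative sketch of the per-(namespace, service) watch cache described
// above; not the EndpointsNamer's actual implementation.
class WatchCache[T](mkWatch: (String, String) => Activity[T]) {
  private[this] val watches = mutable.Map.empty[(String, String), Activity[T]]

  /** Return the existing watch for (ns, service), or create and cache a new one. */
  def get(ns: String, service: String): Activity[T] = synchronized {
    watches.getOrElseUpdate((ns, service), mkWatch(ns, service))
  }
}
```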

Note that I've also made a change to the Watchable.get() method signature; it now returns a Future[Option[G]] rather than Future[G]. The test suite for ServiceNamer dictates that an unknown service name should return a negative NameTree, but watch the service name in case it is created later. To support this, I modified Watchable.get() to return None if the requested path returns 404. Watchable.activity() still starts the watch if None is returned. This required some modification to other existing classes, but appears to not have broken anything.
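
For illustration, a sketch of that 404 handling (the function and the decode step are placeholders, not the actual API client code): a 404 resolves to None instead of a failed Future, so the namer can return a negative NameTree while the watch waits for the resource to appear.

```scala
import com.twitter.finagle.http.Response
import com.twitter.util.Future

// Sketch only: turning a single GET response into a Future[Option[G]].
// `decode` stands in for the real JSON decoding, which is elided here.
def parseGetResponse[G](rsp: Response)(decode: Response => G): Future[Option[G]] =
  rsp.status.code match {
    case 404  => Future.value(None)              // resource doesn't exist (yet)
    case 200  => Future.value(Some(decode(rsp)))
    case code => Future.exception(new IllegalStateException(s"unexpected status: $code"))
  }
```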

I've updated the existing unit tests in EndpointsNamerTest and ServiceNamerTest and confirmed they all work in dev. I've also modified other Kubernetes API tests to reflect the change to Watchable.get() I mentioned above.

Closes #1534
Closes #1575
Closes #1361

@hawkw hawkw added this to the 1.2.0 milestone Aug 22, 2017
@hawkw hawkw self-assigned this Aug 22, 2017
@hawkw hawkw requested review from klingerf and olix0r August 22, 2017 16:19
@hawkw
Contributor Author

hawkw commented Aug 22, 2017

Since this is a pretty big set of changes, it'll need some serious scrutiny before we can merge this into master.

Each individual feature contained in this branch has already been reviewed by @adleong and @klingerf, and tested on the Kubernetes test cluster by @klingerf.

Per Alex's recommendation, before we merge this branch, we should:

  • code review all the changes in this PR
  • run it for 24h in the test environment (at a minimum)
  • see if we can come up with a test that exercises rapidly-changing services?
  • see if we can get one or more of our users to test an RC1?

@hawkw hawkw changed the title Kubernetes API watch refactor Kubernetes watch refactor Aug 22, 2017
@hawkw hawkw changed the title Kubernetes watch refactor Kubernetes API refactor Aug 22, 2017
hawkw added 3 commits August 22, 2017 10:43
Previously, only Kubernetes list resources were watchable, since our code for establishing watches on Kubernetes resources used the `?watch=true` path param.

I've rewritten our code to establish watches using the `$API/$NS/watch/$RESOURCE` endpoint, instead, thus making the `KubeObject` type watchable as well as `KubeList`. I've also factored out some commonly repeated code for creating an `Activity` to watch a Kubernetes resource, and put that in the `Watchable` trait.

The unit tests in `EndpointsNamerTest`, `ServiceNamerTest`, and `ApiTest` have all been updated to cover the new `Watchable` implementation. Additionally, I believe @klingerf has tested this somewhat on a live Kubernetes cluster.
This commit adds an interpreter for reading Dtabs from a Kubernetes ConfigMap – as described in issue #1506.

I've built on @klingerf's work in 9a7c55f and my changes to `Watchable` e8985e2 to refactor Kevin's old interpreter code. This interpreter uses the new `Watchable` API for Kubernetes object resources to establish a watch on the requested ConfigMap, access a key containing a dtab, and use that dtab to resolve routes. Updates to the dtab are reflected in the interpreter. The new interpreter is currently marked as experimental until it can be thoroughly tested in production.

I've added unit tests to `ApiTest` for getting ConfigMaps, plus new unit tests for the `ConfigMapInterpreterInitializer` configuration. Additionally, I believe that Kevin has tested this a bit in a live Kubernetes cluster. I've also added documentation on using the new interpreter to `linkerd/docs/interpreters.md`.

Closes #1506
Right now, the `io.l5d.k8s` and `io.l5d.k8s.ns` namers set up a watch on the List endpoints API, which streams update events for all of the endpoints in a given namespace. This can potentially lead to receiving a lot of events for endpoints that Linkerd is not actually trying to route to. It can also lead to a memory leak, in the case that Linkerd is watching a namespace with very frequent updates (e.g. `kube-system`; see #1361).

My PR #1527 added the ability to watch individual Kubernetes API objects, rather than watching every object in a namespace. This pull request rewrites the `EndpointsNamer` and `ServiceNamer`s to use the new watch code added in #1527, both for the `/namespaces/{ns}/endpoints/{service}` API endpoint and for the `/namespaces/{ns}/services/{service}` endpoint that we use for resolving numbered port mappings.

The `EndpointsNamer` caches watches on each (namespace, service name) combination and reuses existing watches so that we don't create a whole bunch of threads all watching the same API response. Furthermore, the watch on `.../services/...` is only established if it is needed (i.e. the namer was given a numbered rather than named port). I've also attempted to simplify the caches used in `EndpointsNamer` as much as possible.

Note that I've also made a change to the `Watchable.get()` method signature; it now returns a `Future[Option[G]]` rather than `Future[G]`. The test suite for `ServiceNamer` dictates that an unknown service name should return a negative `NameTree`, _but_ watch the service name in case it is created later. To support this, I modified `Watchable.get()` to return `None` if the requested path returns 404. `Watchable.activity()` still starts the watch if `None` is returned. This required some modification to other existing classes, but appears to not have broken anything.

I've updated the existing unit tests in `EndpointsNamerTest` and `ServiceNamerTest` and confirmed they all work in dev. I've also modified other Kubernetes API tests to reflect the change to `Watchable.get()` I mentioned above.

Closes #1534
Closes #1575
hawkw and others added 2 commits August 24, 2017 10:21
This notice was displayed if dentries are from a configmap as well as
from namerd.
</h4>
<div id="dtab"></div>
<div class="namerd-dtab-warning hide">Note: the above Dentries are from namerd and can't be edited.</div>
<div class="namerd-dtab-warning hide">Note: the above Dentries are from an external source and can't be edited.</div>
Contributor

thanks, this has been bothering me

Contributor

(I think you'll need to recompile the templates in order for this change to take effect)

Contributor Author

yeah, I realized that: d215b98 :)

Contributor

@klingerf klingerf left a comment

⭐️ Have done some manual testing on this branch, and it all looks good to me.

@hawkw hawkw merged commit 2dcac1f into master Aug 25, 2017
hawkw added a commit that referenced this pull request Aug 31, 2017
This commit contains three major features: first, a refactor of the `Watchable` trait in `io.buoyant.k8s`; second, a new interpreter for reading dtabs from a Kubernetes ConfigMap; and third, refactoring of the Kubernetes `EndpointsNamer` and `ServiceNamer`s to watch individual Kubernetes API resources rather than the entire namespace.

### Make Kubernetes Objects watchable (#1527)

Previously, only Kubernetes list resources were watchable, since our code for establishing watches on Kubernetes resources used the `?watch=true` path param.

I've rewritten our code to establish watches using the `$API/$NS/watch/$RESOURCE` endpoint, instead, thus making the `KubeObject` type watchable as well as `KubeList`. I've also factored out some commonly repeated code for creating an `Activity` to watch a Kubernetes resource, and put that in the `Watchable` trait.

The unit tests in `EndpointsNamerTest`, `ServiceNamerTest`, and `ApiTest` have all been updated to cover the new `Watchable` implementation. Additionally, I believe @klingerf has tested this somewhat on a live Kubernetes cluster.

### Add interpreter for reading from a k8s ConfigMap (#1532)
This commit adds an interpreter for reading Dtabs from a Kubernetes ConfigMap – as described in issue #1506.

I've built on @klingerf's work in 9a7c55f and my changes to `Watchable` e8985e2 to refactor Kevin's old interpreter code. This interpreter uses the new `Watchable` API for Kubernetes object resources to establish a watch on the requested ConfigMap, access a key containing a dtab, and use that dtab to resolve routes. Updates to the dtab are reflected in the interpreter. The new interpreter is currently marked as experimental until it can be thoroughly tested in production.

I've added unit tests to `ApiTest` for getting ConfigMaps, plus new unit tests for the `ConfigMapInterpreterInitializer` configuration. Additionally, I believe that Kevin has tested this a bit in a live Kubernetes cluster. I've also added documentation on using the new interpreter to `linkerd/docs/interpreters.md`.

### K8s namers watch individual objects, rather than the entire ns (#1587)
Right now, the `io.l5d.k8s` and `io.l5d.k8s.ns` namers set up a watch on the List endpoints API, which streams update events for all of the endpoints in a given namespace. This can potentially lead to receiving a lot of events for endpoints that Linkerd is not actually trying to route to. It can also lead to a memory leak, in the case that Linkerd is watching a namespace with very frequent updates (e.g. `kube-system`; see #1361).

My PR #1527 added the ability to watch individual Kubernetes API objects, rather than watching every object in a namespace. This pull request rewrites the `EndpointsNamer` and `ServiceNamer`s to use the new watch code added in #1527, both for the `/namespaces/{ns}/endpoints/{service}` API endpoint and for the `/namespaces/{ns}/services/{service}` endpoint that we use for resolving numbered port mappings.

The `EndpointsNamer` caches watches on each (namespace, service name) combination and reuses existing watches so that we don't create a whole bunch of threads all watching the same API response. Furthermore, the watch on `.../services/...` is only established if it is needed (i.e. the namer was given a numbered rather than named port). I've also attempted to simplify the caches used in `EndpointsNamer` as much as possible.

Note that I've also made a change to the `Watchable.get()` method signature; it now returns a `Future[Option[G]]` rather than `Future[G]`. The test suite for `ServiceNamer` dictates that an unknown service name should return a negative `NameTree`, _but_ watch the service name in case it is created later. To support this, I modified `Watchable.get()` to return `None` if the requested path returns 404. `Watchable.activity()` still starts the watch if `None` is returned. This required some modification to other existing classes, but appears to not have broken anything.

I've updated the existing unit tests in `EndpointsNamerTest` and `ServiceNamerTest` and confirmed they all work in dev. I've also modified other Kubernetes API tests to reflect the change to `Watchable.get()` I mentioned above.

Closes #1534
Closes #1575
Closes #1361
@hawkw hawkw mentioned this pull request Sep 7, 2017
hawkw added a commit that referenced this pull request Sep 7, 2017
## 1.2.0 2017-09-07

* **Breaking Change**: `io.l5d.mesh`, `io.l5d.thriftNameInterpreter`, linkerd
  admin, and namerd admin now serve on 127.0.0.1 by default (instead of
  0.0.0.0).
* **Breaking Change**: Removed support for PKCS#1-formatted keys. PKCS#1-formatted keys must be converted to PKCS#8 format.
* Added experimental `io.l5d.dnssrv` namer for DNS SRV records (#1611)
* Kubernetes
  * Added an experimental `io.l5d.k8s.configMap` interpreter for reading dtabs from a Kubernetes ConfigMap (#1603). This interpreter will respond to changes in the ConfigMap, allowing for dynamic dtab updates without the need to run Namerd.
  * Made ingress controller's ingress class annotation configurable (#1584).
  * Fixed an issue where Linkerd would continue routing traffic to endpoints of a service after that service was removed (#1622).
  * Major refactoring and performance improvements to `io.l5d.k8s` and `io.l5d.k8s.ns` namers (#1603).
  * Ingress controller now checks all available ingress resources before using a default backend (#1607).
  * Ingress controller now correctly routes requests with host headers that contain ports (#1607).
* HTTP/2
  * Fixed an issue where long-running H2 streams would eventually hang (#1598).
  * Fixed a memory leak on long-running H2 streams (#1598)
  * Added a user-friendly error message when an HTTP/2 router receives an HTTP/1 request (#1618)
* HTTP/1
  * Removed spurious `ReaderDiscarded` exception logged on HTTP/1 retries (#1609)
* Consul
  * Added support for querying Consul by specific service health states (#1601)
  * Consul namers and Dtab store now fall back to a last known good state on Consul observation errors (#1597)
  * Improved log messages for Consul observation errors (#1597)
* TLS
  * Removed support for PKCS#1 keys (#1590)
  * Added validation to prevent incompatible `disableValidation: true` and `clientAuth` settings in TLS client configurations (#1621)
* Changed `io.l5d.mesh`, `io.l5d.thriftNameInterpreter`, linkerd
  admin, and namerd admin to serve on 127.0.0.1 by default (instead of
  0.0.0.0) (#1366)
* Deprecated `io.l5d.statsd` telemeter.
@obeattie
Contributor

Sorry to come so late to this, and I'm happy to discuss this elsewhere if that's better for you. I was just looking at upgrading linkerd in our environment to 1.2.0 and saw this change. The "k8s namers watch individual objects, rather than the entire ns" part of this PR concerns me a little. It would be good to understand better what kind of testing has been done on it.

We have ≈300 distinct service deployments running in k8s and we use linkerd to mediate RPC communication between all of them, running as a DaemonSet on each of our ≈250 nodes. If I understand correctly, before this change there would have been 250 watches (one per linkerd), but with this change that will become 300 * 250 = 75,000 unique watches established against the k8s API. Granted, this is a worst case where every linkerd is processing requests bound for every service, but given that linkerds are long-lived in our platform while services are fairly ephemeral, the number of watches does seem like it would trend toward this. As far as I can tell, once a watch is established for a destination it will never be evicted from the cache.

I've done no testing, but this seems like rather a lot of watches, which is likely to cause pressure on both the k8s apiservers and linkerd itself. I understand that in the mentioned ticket, linkerd is doing a lot of work processing events and caching name trees for destinations it'll never contact, but for environments in which linkerds are dealing with requests to a large number of destinations this change seems like it could worsen performance significantly.

Of course, I may be missing something significant here 😇

@adleong
Copy link
Member

adleong commented Sep 22, 2017

👋 Very valid concerns. We considered these factors when doing this refactor and I think performance should still be acceptable for reasons that I will explain. However, this is only based on our intuition and reasoning since we don't have a k8s cluster of this magnitude to test against. If you are able to try this out, any data that you can report back about performance would be extremely helpful.

On the linkerd side, this change went from having linkerd establish a single watch covering all services to establishing individual watches on the services it needs to route to. Even though this is a larger number of watches and connections, the data volume should be a subset of what the namespace-wide watch delivered, and should therefore decrease the load on linkerd. Furthermore, linkerd expires idle watches (with a TTL of 10 minutes by default), so it's unlikely that Linkerd will keep watches on every service in the cluster, even if Linkerd is long-running.
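
To illustrate the idle-expiry idea only (this is a generic sketch, not linkerd's actual cache code, and the names are made up): each cached watch records its last access time, and a periodic sweep closes watches that have sat idle longer than the TTL.

```scala
import scala.collection.mutable
import com.twitter.util.{Closable, Duration, JavaTimer, Time, Timer}

// Generic sketch of idle-based expiry; not linkerd's actual cache code.
class IdleExpiringWatches[T <: Closable](
  ttl: Duration = Duration.fromMinutes(10),
  timer: Timer = new JavaTimer()
) {
  private[this] case class Entry(watch: T, var lastAccess: Time)
  private[this] val entries = mutable.Map.empty[String, Entry]

  /** Return the cached watch for `key`, creating it if needed, and mark it as used. */
  def getOrElseUpdate(key: String)(mkWatch: => T): T = synchronized {
    val entry = entries.getOrElseUpdate(key, Entry(mkWatch, Time.now))
    entry.lastAccess = Time.now
    entry.watch
  }

  // Periodically close and evict watches that have been idle longer than `ttl`.
  timer.schedule(ttl) {
    synchronized {
      val cutoff = Time.now - ttl
      val expired = entries.filter { case (_, entry) => entry.lastAccess < cutoff }
      expired.foreach { case (key, entry) =>
        entry.watch.close()
        entries -= key
      }
    }
  }
}
```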

As for load on the k8s api, namerd will effectively be used as a cache and should only establish 1 watch per service. The Linkerds, in turn, establish watches on namerd.

Hopefully this makes sense. But, as I said, theory and reasoning are no substitute for real-world data, so please let us know if this doesn't match the actual behavior of Linkerd in prod.

Tim-Brooks pushed a commit to Tim-Brooks/linkerd that referenced this pull request Dec 20, 2018
If an input file is un-injectable, existing inject behavior is to simply
output a copy of the input.

Introduce a report, printed to stderr, that communicates the end state
of the inject command. Currently this includes checking for hostNetwork
and unsupported resources.

Malformed YAML documents will continue to produce no YAML output and return error code 1.

This change also modifies integration tests to handle stdout and stderr separately.

example outputs...

some pods injected, none with host networking:

```
hostNetwork: pods do not use host networking...............................[ok]
supported: at least one resource injected..................................[ok]

Summary: 4 of 8 YAML document(s) injected
  deploy/emoji
  deploy/voting
  deploy/web
  deploy/vote-bot
```

some pods injected, one host networking:

```
hostNetwork: pods do not use host networking...............................[warn] -- deploy/vote-bot uses "hostNetwork: true"
supported: at least one resource injected..................................[ok]

Summary: 3 of 8 YAML document(s) injected
  deploy/emoji
  deploy/voting
  deploy/web
```

no pods injected:

```
hostNetwork: pods do not use host networking...............................[warn] -- deploy/emoji, deploy/voting, deploy/web, deploy/vote-bot use "hostNetwork: true"
supported: at least one resource injected..................................[warn] -- no supported objects found

Summary: 0 of 8 YAML document(s) injected
```

TODO: check for UDP and other init containers

Part of linkerd#1516

Signed-off-by: Andrew Seigner <[email protected]>