Tags: echallenge/akash
Tags
feat(hostname operator): ignore problematic namespaces (akash-network… …#1447)

> CRDs left over from zombie or otherwise abandoned deployments sometimes cause an error. Right now the hostname operator spins on that error.
>
> The goal of this PR is to let the hostname operator move past that error by eventually ignoring the problematic CRDs. They should still be removed, but for now this is probably the best available workaround.
>
> **Changes**
>
> 1. Detect errors indicating that a Kubernetes resource was not found, and track the count of them
> 2. Once the threshold is reached (3 by default), ignore subsequent events from those leases; they are probably zombies
> 3. Track ignored leases in a list that is pruned if it gets too large
> 4. Make all values that make sense configurable via flags, though they should probably be set through environment variables
> 5. Add a read-only HTTP interface
> 6. Add an endpoint for getting the state of the managed hostnames
> 7. Add an endpoint for getting the state of the ignored leases
>
> The only errors tracked are those corresponding to a resource not being found; in production this is usually a missing namespace. General errors such as network connectivity problems aren't considered permanent, and counting them could result in all CRDs being ignored, which is bad.
>
> The ignored leases list is trimmed periodically if it is too large. By default the limit is over 100k entries, so hopefully we don't have that many zombies running around. The pruning process tries to get rid of old entries first, but always makes sure to trim the list below the configured value. Otherwise an out-of-memory condition could happen.
>
> The HTTP listener is set up with just two endpoints that expose data that is already there. Presently we have no real way other than the logs to figure out what the operator is doing; this should help with that. It also gives the k8s namespace that would be associated with any potential entry, to aid in debugging.
>
> The HTTP responses are not rendered on each request. Instead, the operator periodically re-renders the responses if the data has changed. This avoids having to introduce any locks around the data and is probably more performant than rendering on each request anyway. By default the data is refreshed every 5 seconds. The `Last-Modified` header on the response indicates when the data was actually created, just to be clear to anyone looking at it.
>
> I updated the deployment stuff in `_docs` for the hostname operator to make sure the Ingress object exposes the new endpoint as intended.
feat(cli): provider get-cluster-ns command (akash-network#1452) fixes #1451
fix(provider): correct withdrawal period after 1st iteration
fix(provider): Count ports for a service only in reservations
fix(provider): prevent manifest watchdog from blocking on stop
feat: persistent storage Signed-off-by: Artur Troian <[email protected]>