stop logging killing connection/stream because serving request timed out and response had been started #95002
Conversation
/lgtm
maybe we should at least log why we dropped the request?
```go
if err := recover(); err != nil {
	klog.Error(err)
	panic(http.ErrAbortHandler)
}
```
From the Go docs: "To abort a handler so the client sees an interrupted response but the server doesn't log an error, panic with the value ErrAbortHandler."

Maybe we should actually panic with `http.ErrAbortHandler` instead of `errConnKilled`?

That would be seen as `timeout_test.go:220: Get "http://127.0.0.1:61805": EOF` on the client side.
Force-pushed from 7f11683 to 06643be.
Panicking with `http.ErrAbortHandler` (well, any panic would do) will actually close the underlying connection. Previously `errConnKilled` was captured and logged, and the connection was left intact, since calling `WriteHeader` multiple times is not supported by the std library.

I think that before closing the connection (before panicking) we could actually try to send a message to the client.
> I think that before closing the connection (before panicking) we could actually try to send a message to the client.

Even if we did, the message would be ignored by the client (Go) library: "On error, any Response can be ignored" (https://golang.org/src/net/http/client.go?s=20422:20474#L575).
Where did we catch `errConnKilled`?
Force-pushed from 06643be to 1cc75f8.
/assign @lavalamp
I'd prefer a metric, or a rate limit on this log message like that in https://github.com/kubernetes/kubernetes/pull/88600/files
+1 for metric.

What do we have now for errors like EOF or TLS errors with `--v=2`? They are also visible, aren't they? A warning here is definitely too heavy. We won't warn about EOFs either, will we?
What will we see in `--v=2` for timeouts in general?
/approve

I think my preference order is: metric, rate-limited log message, no log message, log message
Force-pushed from 40761c2 to 5eeda83.
/retest
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: lavalamp, p0lyn0mial. The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
nit: "due to a timeout, for each group ..." Otherwise it reads as if it timed out for every group.
..., but we want to suppress an unhelpful stack trace in the logs.
Force-pushed from 3be9663 to 9001d67.
/test pull-kubernetes-bazel-test
Aborted requests are the ones that were disrupted with http.ErrAbortHandler. For example, the timeout handler will panic with http.ErrAbortHandler when a response to the client has been already sent and the timeout elapsed. Additionally, a new metric requestAbortsTotal was defined to count aborted requests. The new metric allows for aggregation for each group, version, verb, resource, subresource and scope.
Force-pushed from 9001d67 to 057986e.
```go
		metrics.RecordRequestAbort(req, nil)
		return
	}
	metrics.RecordRequestAbort(req, info)
```
I'd like to see a rate limited log message about this. We rate limit runtime.HandleError by default, so I'll add that here as a followup because @p0lyn0mial is in Europe and on vacation this week.
/lgtm
What type of PR is this?
/kind cleanup

What this PR does / why we need it: it stops persisting the following error in the server's log: `killing connection/stream because serving request timed out and response had been started`.

From now on the timeout handler will panic with `http.ErrAbortHandler` when a response to the client has already been sent and the timeout elapsed. As a result, the connection will be forcefully closed (the client sees `EOF`) and we will record `requestAbortsTotal`. Previously the stack trace was persisted in the logs, which caused confusion and false alarms.

Additionally, a new metric `requestAbortsTotal` was defined to count aborted requests. The new metric allows for aggregation for each `group`, `version`, `verb`, `resource`, `subresource` and `scope`.

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer: Initially I wanted to use `errors.Is` and `errors.As`, but that would require an additional function anyway, as we need to cast `err interface{}` first. Other than that, it would require coupling `errConnKilled` with `http.ErrAbortHandler`, which I didn't like.

Does this PR introduce a user-facing change?:

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: