Closed
Labels
kind/bug: Categorizes issue or PR as related to a bug.
priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
sig/node: Categorizes an issue or PR as relevant to SIG Node.
Description
Running e2e tests on GCE, I saw a failure on "should provide DNS for the cluster". The test reported this error:
INFO: event for dns-test-4c6fa09e-022b-11e5-aab4-00224d56fdcf: {scheduler } scheduled: Successfully assigned dns-test-4c6fa09e-022b-11e5-aab4-00224d56fdcf to e2e-test-justinsb-minion-gbm3
INFO: event for dns-test-4c6fa09e-022b-11e5-aab4-00224d56fdcf: {kubelet e2e-test-justinsb-minion-gbm3} pulled: Successfully pulled image "gcr.io/google_containers/pause:0.8.0"
INFO: event for dns-test-4c6fa09e-022b-11e5-aab4-00224d56fdcf: {kubelet e2e-test-justinsb-minion-gbm3} created: Created with docker id 7a4b23f8a0ca05b35c284a7e606f5c378792567b8ffcac756bf3ff67fc895320
INFO: event for dns-test-4c6fa09e-022b-11e5-aab4-00224d56fdcf: {kubelet e2e-test-justinsb-minion-gbm3} failed: Failed to start with docker id 7a4b23f8a0ca05b35c284a7e606f5c378792567b8ffcac756bf3ff67fc895320 with error: API error (500): Cannot start container 7a4b23f8a0ca05b35c284a7e606f5c378792567b8ffcac756bf3ff67fc895320: no available ip addresses on network
INFO: event for dns-test-4c6fa09e-022b-11e5-aab4-00224d56fdcf: {kubelet e2e-test-justinsb-minion-gbm3} failedSync: Error syncing pod, skipping: API error (500): Cannot start container 7a4b23f8a0ca05b35c284a7e606f5c378792567b8ffcac756bf3ff67fc895320: no available ip addresses on network
...
This happened when the DNS test ran immediately after "ResizeNodes / should be able to delete nodes."
I SSHed into the minion and saw that the kubelet had restarted, but it logged these errors around this time in /var/log/kubelet.log:
I0524 15:40:54.847548 3009 manager.go:230] Starting recovery of all containers
I0524 15:40:54.851445 3009 manager.go:235] Recovery completed
I0524 15:40:54.853060 3009 status_manager.go:56] Starting to sync pod status with apiserver
I0524 15:40:54.853078 3009 kubelet.go:1596] Starting kubelet main sync loop.
E0524 15:40:54.859528 3009 kubelet.go:1518] error getting node: node e2e-test-justinsb-minion-gbm3 not found
E0524 15:40:54.866972 3009 kubelet.go:2089] Cannot get host IP: cannot get node: node e2e-test-justinsb-minion-gbm3 not found
I0524 15:40:54.866995 3009 manager.go:1347] Need to restart pod infra container for "fluentd-elasticsearch-e2e-test-justinsb-minion-gbm3_default" because it is not found
I0524 15:40:54.868450 3009 provider.go:91] Refreshing cache for provider: *credentialprovider.defaultDockerConfigProvider
I0524 15:40:54.868585 3009 provider.go:91] Refreshing cache for provider: *gcp_credentials.dockerConfigKeyProvider
I0524 15:40:54.869219 3009 config.go:119] body of failing http response: &{0xc208391140 {0 0} false <nil> 0x5ca9a0 0x5ca930}
E0524 15:40:54.869256 3009 metadata.go:109] while reading 'google-dockercfg' metadata: http status code: 404 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/attributes/google-dockercfg
I0524 15:40:54.869274 3009 provider.go:91] Refreshing cache for provider: *gcp_credentials.dockerConfigUrlKeyProvider
I0524 15:40:54.871380 3009 config.go:119] body of failing http response: &{0xc2083913c0 {0 0} false <nil> 0x5ca9a0 0x5ca930}
E0524 15:40:54.871408 3009 metadata.go:121] while reading 'google-dockercfg-url' metadata: http status code: 404 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/attributes/google-dockercfg-url
I0524 15:40:55.146249 3009 kubelet.go:1779] Recording NodeReady event message for node e2e-test-justinsb-minion-gbm3
I0524 15:40:55.146291 3009 kubelet.go:731] Attempting to register node e2e-test-justinsb-minion-gbm3
I0524 15:40:55.146353 3009 event.go:203] Event(api.ObjectReference{Kind:"Node", Namespace:"", Name:"e2e-test-justinsb-minion-gbm3", UID:"e2e-test-justinsb-minion-gbm3", APIVersion:"", ResourceVersion:"", FieldPath:""}): reason: 'NodeReady' Node e2e-test-justinsb-minion-gbm3 status is now: NodeReady
I0524 15:40:55.175255 3009 kubelet.go:751] Successfully registered node e2e-test-justinsb-minion-gbm3
I0524 15:40:55.175270 3009 kubelet.go:764] Starting node status updates
I0524 15:40:55.516960 3009 event.go:203] Event(api.ObjectReference{Kind:"Pod", Namespace:"default", Name:"fluentd-elasticsearch-e2e-test-justinsb-minion-gbm3", UID:"66cf140c101765011818758029a443b7", APIVersion:"v1beta3", ResourceVersion:"", FieldPath:"implicitly required container POD"}): reason: 'pulled' Successfully pulled image "gcr.io/google_containers/pause:0.8.0"
I0524 15:40:55.602978 3009 event.go:203] Event(api.ObjectReference{Kind:"Pod", Namespace:"default", Name:"fluentd-elasticsearch-e2e-test-justinsb-minion-gbm3", UID:"66cf140c101765011818758029a443b7", APIVersion:"v1beta3", ResourceVersion:"", FieldPath:"implicitly required container POD"}): reason: 'created' Created with docker id 434e69d5640632250ec29a1565daa2a3740664eaae36f339907c2b2038cc1fcc
I0524 15:40:55.750577 3009 event.go:203] Event(api.ObjectReference{Kind:"Pod", Namespace:"default", Name:"fluentd-elasticsearch-e2e-test-justinsb-minion-gbm3", UID:"66cf140c101765011818758029a443b7", APIVersion:"v1beta3", ResourceVersion:"", FieldPath:"implicitly required container POD"}): reason: 'started' Started with docker id 434e69d5640632250ec29a1565daa2a3740664eaae36f339907c2b2038cc1fcc
I0524 15:41:01.785023 3009 server.go:588] POST /stats/container/: (3.123839ms) 0 [[Go 1.1 package http] 10.245.1.5:50852]
I0524 15:41:02.206085 3009 manager.go:1347] Need to restart pod infra container for "dns-test-4c6fa09e-022b-11e5-aab4-00224d56fdcf_e2e-tests-dns-227bd4b9-c484-4584-8875-512747f44b24" because it is not found
I0524 15:41:02.208733 3009 event.go:203] Event(api.ObjectReference{Kind:"Pod", Namespace:"e2e-tests-dns-227bd4b9-c484-4584-8875-512747f44b24", Name:"dns-test-4c6fa09e-022b-11e5-aab4-00224d56fdcf", UID:"481fc3ad-022b-11e5-b444-42010af0abb3", APIVersion:"v1beta3", ResourceVersion:"11857", FieldPath:"implicitly required container POD"}): reason: 'pulled' Successfully pulled image "gcr.io/google_containers/pause:0.8.0"
I0524 15:41:02.303923 3009 event.go:203] Event(api.ObjectReference{Kind:"Pod", Namespace:"e2e-tests-dns-227bd4b9-c484-4584-8875-512747f44b24", Name:"dns-test-4c6fa09e-022b-11e5-aab4-00224d56fdcf", UID:"481fc3ad-022b-11e5-b444-42010af0abb3", APIVersion:"v1beta3", ResourceVersion:"11857", FieldPath:"implicitly required container POD"}): reason: 'created' Created with docker id 7a4b23f8a0ca05b35c284a7e606f5c378792567b8ffcac756bf3ff67fc895320
E0524 15:41:02.343079 3009 manager.go:1515] Failed to create pod infra container: API error (500): Cannot start container 7a4b23f8a0ca05b35c284a7e606f5c378792567b8ffcac756bf3ff67fc895320: no available ip addresses on network
; Skipping pod "dns-test-4c6fa09e-022b-11e5-aab4-00224d56fdcf_e2e-tests-dns-227bd4b9-c484-4584-8875-512747f44b24"
I0524 15:41:02.343306 3009 event.go:203] Event(api.ObjectReference{Kind:"Pod", Namespace:"e2e-tests-dns-227bd4b9-c484-4584-8875-512747f44b24", Name:"dns-test-4c6fa09e-022b-11e5-aab4-00224d56fdcf", UID:"481fc3ad-022b-11e5-b444-42010af0abb3", APIVersion:"v1beta3", ResourceVersion:"11857", FieldPath:"implicitly required container POD"}): reason: 'failed' Failed to start with docker id 7a4b23f8a0ca05b35c284a7e606f5c378792567b8ffcac756bf3ff67fc895320 with error: API error (500): Cannot start container 7a4b23f8a0ca05b35c284a7e606f5c378792567b8ffcac756bf3ff67fc895320: no available ip addresses on network
E0524 15:41:02.347866 3009 pod_workers.go:108] Error syncing pod 481fc3ad-022b-11e5-b444-42010af0abb3, skipping: API error (500): Cannot start container 7a4b23f8a0ca05b35c284a7e606f5c378792567b8ffcac756bf3ff67fc895320: no available ip addresses on network
I0524 15:41:02.347955 3009 event.go:203] Event(api.ObjectReference{Kind:"Pod", Namespace:"e2e-tests-dns-227bd4b9-c484-4584-8875-512747f44b24", Name:"dns-test-4c6fa09e-022b-11e5-aab4-00224d56fdcf", UID:"481fc3ad-022b-11e5-b444-42010af0abb3", APIVersion:"v1beta3", ResourceVersion:"11857", FieldPath:""}): reason: 'failedSync' Error syncing pod, skipping: API error (500): Cannot start container 7a4b23f8a0ca05b35c284a7e606f5c378792567b8ffcac756bf3ff67fc895320: no available ip addresses on network
I0524 15:41:05.262557 3009 container_bridge.go:32] Attempting to recreate cbr0 with address range: 10.245.0.1/24
I0524 15:41:05.337978 3009 container_bridge.go:62] Recreated cbr0 and restarted docker
W0524 15:41:11.461827 3009 manager.go:1527] Failed to pull image "gcr.io/google_containers/fluentd-elasticsearch:1.5" from pod "fluentd-elasticsearch-e2e-test-justinsb-minion-gbm3_default" and container "fluentd-elasticsearch": [unexpected EOF, dial unix /var/run/docker.sock: no such file or directory]
I0524 15:41:11.461855 3009 event.go:203] Event(api.ObjectReference{Kind:"Pod", Namespace:"default", Name:"fluentd-elasticsearch-e2e-test-justinsb-minion-gbm3", UID:"66cf140c101765011818758029a443b7", APIVersion:"v1beta3", ResourceVersion:"", FieldPath:"spec.containers{fluentd-elasticsearch}"}): reason: 'failed' Failed to pull image "gcr.io/google_containers/fluentd-elasticsearch:1.5": [unexpected EOF, dial unix /var/run/docker.sock: no such file or directory]
Things I think are suspicious:
- 404 while fetching url http://metadata.google.internal./computeMetadata/v1/instance/attributes/google-dockercfg
- It does somehow figure out that it should "recreate cbr0 with address range: 10.245.0.1/24", but it then spews a lot of errors while Docker restarts (unable to reach docker.sock). I feel we probably shouldn't start Docker at all until it has a valid cbr0.
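To make the last suggestion concrete, here is a minimal sketch of a "is cbr0 valid yet?" check that could gate starting Docker (or starting pods). This is not the kubelet's actual code: the helper names bridgeHasCIDR and bridgeReady are hypothetical, and the bridge name and address are simply the ones from the log above. It only uses the Go standard library net package.

```go
package main

import (
	"fmt"
	"net"
)

// bridgeHasCIDR (hypothetical helper) reports whether any entry in addrs,
// given in CIDR notation such as "10.245.0.1/24", has the same IP and the
// same prefix length as want.
func bridgeHasCIDR(addrs []string, want string) bool {
	wantIP, wantNet, err := net.ParseCIDR(want)
	if err != nil {
		return false
	}
	wantOnes, _ := wantNet.Mask.Size()
	for _, a := range addrs {
		ip, ipNet, err := net.ParseCIDR(a)
		if err != nil {
			continue // skip anything that is not a CIDR string
		}
		ones, _ := ipNet.Mask.Size()
		if ip.Equal(wantIP) && ones == wantOnes {
			return true
		}
	}
	return false
}

// bridgeReady (hypothetical helper) looks up the named interface on this
// host and checks whether it already carries the wanted address.
func bridgeReady(name, want string) (bool, error) {
	iface, err := net.InterfaceByName(name)
	if err != nil {
		return false, err // e.g. the bridge does not exist yet
	}
	addrs, err := iface.Addrs()
	if err != nil {
		return false, err
	}
	strs := make([]string, 0, len(addrs))
	for _, a := range addrs {
		strs = append(strs, a.String())
	}
	return bridgeHasCIDR(strs, want), nil
}

func main() {
	// Values taken from the kubelet log in this report.
	ok, err := bridgeReady("cbr0", "10.245.0.1/24")
	if err != nil {
		fmt.Println("cbr0 not ready:", err)
		return
	}
	fmt.Println("cbr0 ready:", ok)
}
```

The idea would be to poll something like bridgeReady and only start Docker once it returns true, so containers are never created against a bridge that has no usable address range.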