
sttts
Contributor

@sttts sttts commented Jun 26, 2015

Before this PR, the endpoint IP was used to identify endpoint addresses. This
wrongly unifies endpoints of different pods that have the same IP (e.g. a
non-container IP, as in the case of Mesos). This PR takes EndpointAddress.targetRef.UID
into consideration instead of relying on the IP alone.
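To make the identification problem concrete, here is a minimal Go sketch. The types and helpers (`Address`, `ipUIDKey`, `countDistinctByIP`, `countDistinctByIPAndUID`) are hypothetical simplifications, not the real `api.EndpointAddress` or the code changed by this PR. It only shows that keying addresses by IP alone collapses two pods behind one host IP, while adding the target UID to the key keeps them apart.

```go
package main

import "fmt"

// Address is a hypothetical, simplified stand-in for api.EndpointAddress.
type Address struct {
	IP        string
	TargetUID string // stands in for TargetRef.UID; empty if there is no TargetRef
}

// ipUIDKey identifies an address by its IP plus the UID of the pod it refers to.
type ipUIDKey struct {
	ip  string
	uid string
}

// countDistinctByIP counts addresses when only the IP is used as identity.
func countDistinctByIP(addrs []Address) int {
	seen := map[string]bool{}
	for _, a := range addrs {
		seen[a.IP] = true
	}
	return len(seen)
}

// countDistinctByIPAndUID counts addresses when IP and target UID together form the identity.
func countDistinctByIPAndUID(addrs []Address) int {
	seen := map[ipUIDKey]bool{}
	for _, a := range addrs {
		seen[ipUIDKey{ip: a.IP, uid: a.TargetUID}] = true
	}
	return len(seen)
}

func main() {
	// Two different pods reachable via the same (non-container) host IP, as under Mesos.
	addrs := []Address{
		{IP: "192.168.0.1", TargetUID: "1"},
		{IP: "192.168.0.1", TargetUID: "2"},
	}
	fmt.Println(countDistinctByIP(addrs))       // 1: the pods are wrongly unified
	fmt.Println(countDistinctByIPAndUID(addrs)) // 2: the pods stay distinct
}
```

In this sketch an address without a TargetRef carries an empty UID, so for such addresses the combined key degrades to plain IP identity.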

In combination with the follow-up PR mesosphere-backup#41, this fixes the following e2e tests for KUBERNETES_PROVIDER=mesos/docker:

@sttts
Contributor Author

sttts commented Jun 26, 2015

/cc @jdef

@k8s-bot

k8s-bot commented Jun 26, 2015

Can one of the admins verify that this patch is reasonable to test? (reply "ok to test", or if you trust the user, reply "add to whitelist")

If this message is too spammy, please complain to ixdy.

@jdef
Contributor

jdef commented Jun 26, 2015

/cc @thockin

@jdef jdef added area/platform/mesos sig/network Categorizes an issue or PR as relevant to SIG Network. labels Jun 26, 2015
@jdef
Contributor

jdef commented Jun 26, 2015

There's a documented known k8s-mesos issue that requires users to name service ports. Related?

@sttts
Contributor Author

sttts commented Jun 26, 2015

@jdef it's related. This PR does two things:

  1. it fixes the e2e endpoints tests so that they pass with this documented, special behaviour of k8sm
  2. it fixes the endpoint subset repacking algorithm used by the endpoint controller and the apiserver for situations where endpoints with the same IP belong to different pods

@karlkfi
Contributor

karlkfi commented Jun 26, 2015

Is this dependent on #10049?

@sttts
Contributor Author

sttts commented Jun 26, 2015

No dependency on #10049.

@jdef
Contributor

jdef commented Jun 26, 2015

@thockin
Member

thockin commented Jun 27, 2015

Can you show me an example input that produces wrong output and explain? I am terrified of touching this code, given the bugs we had in crafting it in the first place.

Member

I don't understand this batch of changes?

Contributor Author

If the endpoints for a pod do not inherit the container IP and container port, we have to use something else, like the service port name, to find the correct endpoints in the test. That's why I replaced the port number with the port name and adapted the test condition accordingly.

Member

I'm not quite sure I understand - is this because the port numbers in pod.spec.containers[].ports[].port are not the actual port numbers exposed by the endpoints (because of your host port remapping?)

This is a pretty significant change to this test and likely to get broken - most ports are NOT named, especially in tests - can you add a comment explaining this requirement near the place where we create pods in this test?

Contributor Author

The endpoint might have a different IP than the container IP and a different port than the service port. The change above uses named ports so that the test can correctly match the endpoint to the service, which is exactly what the test verifies. Because the port number is no longer suitable for this matching, we need the names as an alternative.

It's important to note that this change is only necessary for these very tests which check the existence of correct endpoints. Of course other services in different tests don't need named ports.

I will add proper comments.
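A rough Go illustration of the matching problem described above, assuming hypothetical simplified types and made-up port numbers (`EndpointPort`, `matchByNumber`, `matchByName`, and the values 80 and 31000 are illustrative, not taken from the e2e test): if the published endpoint port differs from the container port, matching by number finds nothing, while matching by the port name still succeeds.

```go
package main

import "fmt"

// EndpointPort is a hypothetical, simplified stand-in for api.EndpointPort.
type EndpointPort struct {
	Name string
	Port int
}

// matchByNumber returns the endpoint ports whose number equals the expected port.
func matchByNumber(ports []EndpointPort, want int) []EndpointPort {
	var out []EndpointPort
	for _, p := range ports {
		if p.Port == want {
			out = append(out, p)
		}
	}
	return out
}

// matchByName returns the endpoint ports whose name equals the expected service port name.
func matchByName(ports []EndpointPort, want string) []EndpointPort {
	var out []EndpointPort
	for _, p := range ports {
		if p.Name == want {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	// Illustrative only: container port 80 published under a remapped port 31000.
	observed := []EndpointPort{{Name: "portname1", Port: 31000}}
	fmt.Println(len(matchByNumber(observed, 80)))        // 0: number-based matching fails
	fmt.Println(len(matchByName(observed, "portname1"))) // 1: name-based matching still works
}
```

The trade-off raised in review still applies: name-based matching only works if the pods in the test declare named ports.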

@sttts
Contributor Author

sttts commented Jun 27, 2015

@thockin an example of input for RepackSubsets which fails:

[
  {
    "Addresses": [
      { "IP": "192.168.0.1", "TargetRef": { "Kind": "Pod", "Name": "pod1", "UID": "1" } }
    ],
    "Ports": [ { "Name": "http", "Port": 8080, "Protocol": "TCP" } ]
  },
  {
    "Addresses": [
      { "IP": "192.168.0.1", "TargetRef": { "Kind": "Pod", "Name": "pod2", "UID": "2" } }
    ],
    "Ports": [ { "Name": "redis", "Port": 6379, "Protocol": "TCP" } ]
  }
]

Before this PR, this is repacked to:

[
  {
    "Addresses": [
      { "IP": "192.168.0.1", "TargetRef": { "Kind": "Pod", "Name": "pod1", "UID": "1" } }
    ],
    "Ports": [
      { "Name": "http", "Port": 8080, "Protocol": "TCP" },
      { "Name": "redis", "Port": 6379, "Protocol": "TCP" }
    ]
  }
]

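Because both addresses are identified by their IP alone, pod2's address is merged into pod1's: pod2 disappears, and pod1 appears to offer both ports. In the log output of the endpoint e2e tests (in service.go) you will see: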

STEP: Unexpected number of endpoints: found map[podname1:[100, 101, 100]], expected map[podname1:[100], podname2: [101], podname3: [100]] (ignoring for 1 second)

With this PR, the UID is assumed to differ whenever the pod differs, and it is taken into consideration both when identifying addresses and when sorting them.
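The repacking idea can be sketched in Go as follows. This is a simplified illustration, not the actual `RepackSubsets` code: `Address`, `Port`, `Subset`, `addressKey` and `repack` are hypothetical stand-ins. It expands the subsets into per-address port sets keyed by (IP, UID), regroups addresses that offer identical port sets, and sorts addresses by IP and then UID, so the two pods from the example above stay separate.

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical, simplified stand-ins for api.EndpointAddress, api.EndpointPort
// and api.EndpointSubset.
type (
	Address struct {
		IP  string
		UID string // stands in for TargetRef.UID
	}
	Port struct {
		Name     string
		Port     int
		Protocol string
	}
	Subset struct {
		Addresses []Address
		Ports     []Port
	}
)

// addressKey identifies an address by IP and target UID, so that two pods
// sharing one IP are not unified.
type addressKey struct{ ip, uid string }

func addrLess(a, b Address) bool {
	if a.IP != b.IP {
		return a.IP < b.IP
	}
	return a.UID < b.UID
}

func portLess(a, b Port) bool {
	if a.Name != b.Name {
		return a.Name < b.Name
	}
	if a.Port != b.Port {
		return a.Port < b.Port
	}
	return a.Protocol < b.Protocol
}

// repack regroups subsets so that all addresses offering exactly the same set
// of ports share one subset; addresses, ports and subsets come out sorted.
func repack(subsets []Subset) []Subset {
	// 1. Expand: collect the set of ports offered by each distinct address.
	portsByAddr := map[addressKey]map[Port]bool{}
	for _, ss := range subsets {
		for _, p := range ss.Ports {
			for _, a := range ss.Addresses {
				k := addressKey{ip: a.IP, uid: a.UID}
				if portsByAddr[k] == nil {
					portsByAddr[k] = map[Port]bool{}
				}
				portsByAddr[k][p] = true
			}
		}
	}
	// 2. Regroup: addresses with an identical sorted port list form one subset.
	groups := map[string]*Subset{}
	for k, set := range portsByAddr {
		ports := make([]Port, 0, len(set))
		for p := range set {
			ports = append(ports, p)
		}
		sort.Slice(ports, func(i, j int) bool { return portLess(ports[i], ports[j]) })
		sig := fmt.Sprintf("%#v", ports) // sorted and quoted, so equal sets give equal signatures
		if groups[sig] == nil {
			groups[sig] = &Subset{Ports: ports}
		}
		groups[sig].Addresses = append(groups[sig].Addresses, Address{IP: k.ip, UID: k.uid})
	}
	// 3. Sort addresses (by IP, then UID) and subsets (by their first address).
	out := make([]Subset, 0, len(groups))
	for _, g := range groups {
		sort.Slice(g.Addresses, func(i, j int) bool { return addrLess(g.Addresses[i], g.Addresses[j]) })
		out = append(out, *g)
	}
	sort.Slice(out, func(i, j int) bool { return addrLess(out[i].Addresses[0], out[j].Addresses[0]) })
	return out
}

func main() {
	in := []Subset{
		{Addresses: []Address{{IP: "192.168.0.1", UID: "1"}}, Ports: []Port{{Name: "http", Port: 8080, Protocol: "TCP"}}},
		{Addresses: []Address{{IP: "192.168.0.1", UID: "2"}}, Ports: []Port{{Name: "redis", Port: 6379, Protocol: "TCP"}}},
	}
	for _, ss := range repack(in) {
		fmt.Println(ss.Addresses, ss.Ports)
	}
	// Prints:
	// [{192.168.0.1 1}] [{http 8080 TCP}]
	// [{192.168.0.1 2}] [{redis 6379 TCP}]
}
```

Keying step 1 on `a.IP` alone instead would fold both pods into a single address offering both `http` and `redis`, which matches the pre-PR result shown above.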

@thockin
Member

thockin commented Jun 29, 2015

I see the problem now, of course.

@sttts
Contributor Author

sttts commented Jun 29, 2015

@thockin can you send me or paste the shippable output which fails?

@jdef
Contributor

jdef commented Jun 29, 2015

!!! 'gofmt -s' needs to be run on the following files:
./pkg/api/endpoints/util_test.go
./pkg/api/endpoints/util.go
./test/e2e/service.go


@sttts
Contributor Author

sttts commented Jun 29, 2015

Fixed the gofmt issues.

@thockin
Member

thockin commented Jun 30, 2015

LGTM

@thockin
Member

thockin commented Jun 30, 2015

Shippable says:

I0629 22:47:15.651947 32268 integration.go:538] Starting to update (foo, bar)
I0629 22:46:14.853330 32268 integration.go:553] Conflict: (d, w)
I0629 22:47:15.652037 32268 integration.go:538] Starting to update (d, w)
I0629 22:47:15.652050 32268 factory.go:312] Attempting to bind nginx-controller-a2vtz to 127.0.0.1
I0629 22:46:14.853678 32268 controller_utils.go:259] Controller nginx-controller created pod nginx-controller-jmgnm
I0629 22:46:14.855566 32268 endpoints_controller.go:258] Finished syncing service "default/atomicservice" endpoints. (2.527722ms)
F0629 22:46:17.683250 32268 integration.go:803] FAILED: unexpected endpoints: timed out waiting for the condition
I0629 22:46:19.846107 32268 nodecontroller.go:291] Nodes ReadyCondition updated. Updating timestamp: {Capacity:map[memory:{Amount:0 Format:DecimalSI} pods:{Amount:32.000 Format:DecimalSI} cpu:{Amount:0 Format:DecimalSI}] Phase: Conditions:[{Type:Ready Status:True LastHeartbeatTime:2015-06-29 22:44:52 +0000 UTC LastTransitionTime:2015-06-29 22:44:52 +0000 UTC Reason:kubelet is posting ready status Message:}] Addresses:[{Type:LegacyHostIP Address:127.0.0.1}] NodeInfo:{MachineID: SystemUUID: BootID: KernelVersion: OsImage: ContainerRuntimeVersion:docker:// KubeletVersion:v0.20.0-145-gd1f50c4d555003 KubeProxyVersion:v0.20.0-145-gd1f50c4d555003}}
vs {Capacity:map[pods:{Amount:32.000 Format:DecimalSI} cpu:{Amount:0 Format:DecimalSI} memory:{Amount:0 Format:DecimalSI}] Phase: Conditions:[{Type:Ready Status:True LastHeartbeatTime:2015-06-29 22:46:18 +0000 UTC LastTransitionTime:2015-06-29 22:44:52 +0000 UTC Reason:kubelet is posting ready status Message:}] Addresses:[{Type:LegacyHostIP Address:127.0.0.1}] NodeInfo:{MachineID: SystemUUID: BootID: KernelVersion: OsImage: ContainerRuntimeVersion:docker:// KubeletVersion:v0.20.0-145-gd1f50c4d555003 KubeProxyVersion:v0.20.0-145-gd1f50c4d555003}}.
!!! Error in ./hack/test-integration.sh:49
Call stack:
I0629 22:47:15.652459 32268 controller_utils.go:322] Updating replica count for rc: nginx-controller, 0->0 (need 2), sequence No: 0->1
'"${KUBE_OUTPUT_HOSTBIN}/integration" --v=${LOG_LEVEL} --api-version="$1" --max-concurrency="${KUBE_INTEGRATION_TEST_MAX_CONCURRENCY}"' exited with status 255
1: ./hack/test-integration.sh:49 runTests(...)
2: ./hack/test-integration.sh:63 main(...)
Exiting with status 1

@sttts sttts force-pushed the non-unique-endpoint-ip branch 2 times, most recently from c8f22d7 to a50e228 on June 30, 2015 14:52
@sttts sttts changed the title WIP: Don't wrongly identify endpoint addresses only due to equal IP Don't wrongly identify endpoint addresses only due to equal IP Jun 30, 2015
@sttts sttts force-pushed the non-unique-endpoint-ip branch 2 times, most recently from 2df3501 to 6d73a04 on June 30, 2015 16:17
@zmerlynn
Member

zmerlynn commented Jul 1, 2015

Assigning to @thockin to sequester to post-v1 or review.

@bgrant0607 bgrant0607 removed this from the v1.0-post milestone Jul 24, 2015
@sttts sttts force-pushed the non-unique-endpoint-ip branch from ca65323 to 1a099e2 on July 27, 2015 07:04
@k8s-bot

k8s-bot commented Jul 27, 2015

GCE e2e build/test failed for commit 1a099e2ebe322c26bbafe39b4b2b6bed3aa1aadc.

@k8s-bot

k8s-bot commented Jul 27, 2015

GCE e2e build/test failed for commit d3131eb56b40540240bdeb8edfffcb7202c11cd0.

@sttts
Contributor Author

sttts commented Jul 30, 2015

@thockin could you paste the failing e2e test here?

@k8s-bot

k8s-bot commented Aug 3, 2015

GCE e2e build/test failed for commit 1a08e4b0613b379a58368b2b9da199efa80815ff.

@sttts sttts force-pushed the non-unique-endpoint-ip branch from 1a08e4b to 6da82a9 on August 3, 2015 10:48
@k8s-bot

k8s-bot commented Aug 3, 2015

GCE e2e build/test failed for commit 6da82a94d6d59304c2f4e77f3f9c8a41c2b1cb1c.

@gmarek
Contributor

gmarek commented Aug 3, 2015

@k8s-bot test this please

@k8s-bot

k8s-bot commented Aug 3, 2015

GCE e2e build/test failed for commit 6da82a94d6d59304c2f4e77f3f9c8a41c2b1cb1c.

@sttts
Contributor Author

sttts commented Aug 4, 2015

I removed everything not directly related to the actual IP->UID change in the endpoint repacking algorithm.

Everything related to fixing the e2e tests has been moved to the mesosphere-backup#41 PR.

@k8s-bot

k8s-bot commented Aug 4, 2015

GCE e2e build/test passed for commit 82b4c114c5c0cce3d5e3e003e3576c007507cc41.

@sttts
Contributor Author

sttts commented Aug 6, 2015

@jdef @karlkfi ptal

@jdef
Contributor

jdef commented Aug 7, 2015

lgtm

@thockin, does this still look good?

@thockin
Member

thockin commented Aug 7, 2015

needs rebase

@thockin thockin added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 7, 2015
sttts added 2 commits August 7, 2015 08:31
Before this patch, the endpoint IP was used to identify endpoint addresses. This
wrongly unifies endpoints of different pods that have the same IP (e.g. a
non-container IP, as in the case of Mesos). This patch takes EndpointAddress.targetRef.UID
into consideration as well.
@sttts sttts force-pushed the non-unique-endpoint-ip branch from 82b4c11 to 79e54c2 on August 7, 2015 06:35
@k8s-bot

k8s-bot commented Aug 7, 2015

GCE e2e build/test passed for commit 79e54c2.

@sttts
Contributor Author

sttts commented Aug 7, 2015

Rebased.

satnam6502 added a commit that referenced this pull request Aug 7, 2015
Don't wrongly identify endpoint addresses only due to equal IP
@satnam6502 satnam6502 merged commit fbb5ce6 into kubernetes:master Aug 7, 2015
@jdef jdef added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Sep 14, 2015
@sttts sttts deleted the non-unique-endpoint-ip branch September 14, 2015 09:15
Labels
area/platform/mesos lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/network Categorizes an issue or PR as relevant to SIG Network.

10 participants