Conversation

@ovidiutirla (Contributor) commented Jun 25, 2024

Add EnqueueTimeTracker and CIDDeletionTracker structures to manage enqueuing times and track CID deletion marks. This feature will be used by Operator Managing CIDs.

EnqueueTimeTracker is used only for metrics: it measures the duration from the moment a CID reconciliation is enqueued until the reconciliation is completed.
CIDDeletionTracker is used to handle CID deletion when a CID is no longer needed; it is used only by the operator. The deletion tracker allows us to implement a cidDeleteDelay, which is the delay before another CID event is enqueued for reconciliation after the CID is marked for deletion. This is required for simultaneous CID management by both cilium-operator and cilium-agent: without the delay, the operator might immediately clean up CIDs created by the agent, before the agent can finish CEP creation. The deletion tracker is not used in cilium-agent; it exists only to enforce the deletion delay on the operator side.
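
As a rough illustration only (this is a sketch, not the exact code in this PR; constructor and accessor names such as NewEnqueueTimeTracker, GetAndReset and MarkedTime are placeholders), the two trackers could look roughly like this:

import (
	"sync"
	"time"
)

// EnqueueTimeTracker records when a CID reconciliation was enqueued, so that
// the operator can report an enqueue-to-completion latency metric.
type EnqueueTimeTracker struct {
	mu       sync.Mutex
	enqueued map[string]time.Time
}

func NewEnqueueTimeTracker() *EnqueueTimeTracker {
	return &EnqueueTimeTracker{enqueued: make(map[string]time.Time)}
}

// Track stores the enqueue time for a CID, keeping the earliest time if the
// CID is enqueued again before being reconciled.
func (t *EnqueueTimeTracker) Track(cid string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	if _, ok := t.enqueued[cid]; !ok {
		t.enqueued[cid] = time.Now()
	}
}

// GetAndReset returns the recorded enqueue time for a CID and clears it; it is
// meant to be called when the CID reconciliation completes.
func (t *EnqueueTimeTracker) GetAndReset(cid string) (time.Time, bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	ts, ok := t.enqueued[cid]
	delete(t.enqueued, cid)
	return ts, ok
}

// CIDDeletionTracker remembers which CIDs the operator has marked for
// deletion, so that the actual delete can be postponed by cidDeleteDelay.
type CIDDeletionTracker struct {
	mu     sync.Mutex
	marked map[string]time.Time
}

func NewCIDDeletionTracker() *CIDDeletionTracker {
	return &CIDDeletionTracker{marked: make(map[string]time.Time)}
}

// Mark records that a CID is a candidate for deletion.
func (t *CIDDeletionTracker) Mark(cid string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.marked[cid] = time.Now()
}

// Unmark removes the deletion mark, e.g. when the CID becomes used again.
func (t *CIDDeletionTracker) Unmark(cid string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.marked, cid)
}

// MarkedTime reports whether the CID is marked for deletion and since when.
func (t *CIDDeletionTracker) MarkedTime(cid string) (time.Time, bool) {
	t.mu.Lock()
	defer t.mu.Unlock()
	ts, ok := t.marked[cid]
	return ts, ok
}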

Related CFP #27752

Draft full implementation: #33204

@maintainer-s-little-helper maintainer-s-little-helper bot added the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jun 25, 2024
@ovidiutirla ovidiutirla force-pushed the feature/op-id-cid-cache branch from 0557b1c to b2c6736 on June 25, 2024 13:06
@ovidiutirla ovidiutirla marked this pull request as ready for review June 25, 2024 13:22
@ovidiutirla ovidiutirla requested review from a team as code owners June 25, 2024 13:22
@ovidiutirla ovidiutirla requested review from asauber and pippolo84 June 25, 2024 13:22
@ovidiutirla ovidiutirla force-pushed the feature/op-id-cid-cache branch from b2c6736 to 67b62bc on June 25, 2024 13:25
@ovidiutirla ovidiutirla force-pushed the feature/op-id-cid-cache branch from 67b62bc to 0d299d2 on June 25, 2024 19:22
@ovidiutirla (Contributor, Author) commented:

/test

@pchaigno pchaigno added the release-note/misc This PR makes changes that have no direct user impact. label Jun 25, 2024
@maintainer-s-little-helper maintainer-s-little-helper bot removed the dont-merge/needs-release-note-label The author needs to describe the release impact of these changes. label Jun 25, 2024
@ovidiutirla ovidiutirla force-pushed the feature/op-id-cid-cache branch from 0d299d2 to 81ddfbb on June 26, 2024 09:18
@pchaigno pchaigno requested a review from asauber June 27, 2024 14:09
@joestringer (Member) commented:

Could you update the description to describe how this time tracker fits into the design, i.e. expand the following sentence to explain not just what it does but how it integrates into the broader solution?

This feature will be used by Operator Managing CIDs.

@ovidiutirla (Contributor, Author) commented Jun 27, 2024

I assume Andrew had a similar concern with CIDDeletionTracker. It shouldn't impact network policy (NP) enforcement, since NP enforcement happens at the agent level, while the tracker is only used to add a configurable delay to CID clean-up.

Hey @joestringer, thanks for the feedback. I added the following to the PR description:

EnqueueTimeTracker is used only for metrics: it measures the duration from the moment a CID reconciliation is enqueued until the reconciliation is completed.
CIDDeletionTracker is used to handle CID deletion when a CID is no longer needed; it is used only by the operator. The deletion tracker allows us to implement a cidDeleteDelay, which is the delay before another CID event is enqueued for reconciliation after the CID is marked for deletion. This is required for simultaneous CID management by both cilium-operator and cilium-agent: without the delay, the operator might immediately clean up CIDs created by the agent, before the agent can finish CEP creation. The deletion tracker is not used in cilium-agent; it exists only to enforce the deletion delay on the operator side.

@joestringer (Member) commented:

@ovidiutirla I still don't quite follow.

EnqueueTimeTracker is used only for metrics: it measures the duration from the moment a CID reconciliation is enqueued until the reconciliation is completed.

Which metrics? What does it mean to reconcile a CID?

@asauber (Member) commented Jun 27, 2024

In the draft implementation, there is a 30-second delay added to any CID deletion-via-operator here https://github.com/cilium/cilium/pull/33204/files#diff-a5b01710162b5dcb0c820096bd611ded7a00e14a8a60ebc382466ce30d9436caR249-R253

To me, this appears to introduce an [at least] 30-second delay between the operator first processing a delete event and the policy of that delete event being enforced by agents. I do see the point that even if strong distributed concurrency were in place for CIDs, that would need to be coupled somehow with CEP operations.

@asauber (Member) commented Jun 27, 2024

Apologies for the spam. The relevant diff is not linkable on GitHub because the diff is collapsed by default.

In the draft implementation, there is a 30-second delay added to any CID deletion-via-operator here https://github.com/cilium/cilium/pull/33204/files#diff-a5b01710162b5dcb0c820096bd611ded7a00e14a8a60ebc382466ce30d9436caR249-R253

if !isMarked {
	r.cidDeletionTracker.Mark(cidName)
	r.queueOps.enqueueCIDReconciliation(cidResourceKey(cidName), cidDeleteDelay)
	return nil
}

@ovidiutirla (Contributor, Author) commented:

The cidDeletionTracker is marked in handleCIDDeletion (handleCIDDeletion marks the CID, or deletes it if it is already marked).
handleCIDDeletion is called only when:

  • The CID only exists in the watcher's store and it isn't used (we could rewrite those if statements to improve clarity, though):
cidIsUsed := r.cidIsUsedInPods(cidName) || r.cidIsUsedInCEPOrCES(cidName)
if !existsInDesiredState {
	if cidIsUsed {
		return nil
	}
	r.cidCreateLock.Lock()
	defer r.cidCreateLock.Unlock()
	return r.handleCIDDeletion(cidName)
}
if !cidIsUsed {
	if existsInStore {
		r.cidCreateLock.Lock()
		defer r.cidCreateLock.Unlock()
		return r.handleCIDDeletion(cidName)
	}
	r.desiredCIDState.Remove(cidName)
	return nil
}

So time.Now() is only used to enforce the 30s delay for the CID deletion; we are not relying on the exact system time, we are just enforcing the delay. But indeed, once a CID is no longer used we mark it for deletion, and only after the delay do we propagate the deletion (of the unused CID) to all agents.
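
To make that concrete, a minimal sketch of how the delay check could look inside handleCIDDeletion (MarkedTime and deleteCID are hypothetical names used only for illustration; the draft PR may structure this differently):

markedAt, isMarked := r.cidDeletionTracker.MarkedTime(cidName)
if !isMarked {
	// First pass: mark the CID and re-enqueue it to be reconciled again
	// after cidDeleteDelay.
	r.cidDeletionTracker.Mark(cidName)
	r.queueOps.enqueueCIDReconciliation(cidResourceKey(cidName), cidDeleteDelay)
	return nil
}
if remaining := cidDeleteDelay - time.Since(markedAt); remaining > 0 {
	// The delay has not fully elapsed yet; wait for the remainder.
	r.queueOps.enqueueCIDReconciliation(cidResourceKey(cidName), remaining)
	return nil
}
// The delay elapsed and the CID is still unused: propagate the deletion.
return r.deleteCID(cidName)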

Similarly, in the current implementation, we keep the CID for ~30 minutes (2 * defaults.KVstoreLeaseTTL) until we delete it:

if !igc.heartbeatStore.isAlive(identity.Name) {
	ts, ok := identity.Annotations[identitybackend.HeartBeatAnnotation]
	if !ok {
		log.WithFields(logrus.Fields{
			logfields.Identity: identity.Name,
			logfields.K8sUID:   identity.UID,
		}).Info("Marking identity for later deletion")
		// Deep copy so we get a version we are allowed to update
		identity = identity.DeepCopy()
		if identity.Annotations == nil {
			identity.Annotations = make(map[string]string)
		}
		identity.Annotations[identitybackend.HeartBeatAnnotation] = timeNow.Format(time.RFC3339Nano)
		if err := igc.updateIdentity(ctx, identity); err != nil {
			log.WithError(err).
				WithField(logfields.Identity, identity).
				Error("Marking identity for later deletion")
			return err
		}
		continue
	}
	log.WithFields(logrus.Fields{
		logfields.Identity: identity,
	}).Debugf("Deleting unused identity; marked for deletion at %s", ts)
	err := igc.deleteIdentity(ctx, identity)
	if err != nil {
I might not have a full understanding of your concerns; I'm just wondering if this brought some clarity on how we plan to use it. I'm happy to walk you through our approach, or if you have better ideas on how to handle this part, we are happy to adapt our solution.

By "reconcile a CID" I mean that we ensure the desired state for the CID is reached:

// reconcileCID ensures that the desired state for the CID is reached, by
// comparing the CID in desired state cache and watcher's store and doing one of
// the following:
// 1. Nothing - If CID doesn't exist in both desired state cache and watcher's
// store.
// 2. Deletes CID - If CID only exists in the watcher's store, and it isn't used.
// 3. Creates CID - If CID only exists in the desired state cache.
// 4. Updates CID - If CIDs in the desired state cache and watcher's store are
// not the same.
func (r *reconciler) reconcileCID(cidResourceKey resource.Key) error {
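
For illustration, the four cases could map to code roughly as follows (a sketch only; the accessors are simplified and helpers such as createCID and updateCIDIfChanged are hypothetical):

func (r *reconciler) reconcileCID(key resource.Key) error {
	cidName := key.Name
	desired, existsInDesiredState := r.desiredCIDState.Lookup(cidName) // simplified accessor
	_, existsInStore := r.cidStore.GetByKey(cidName)                   // simplified accessor
	cidIsUsed := r.cidIsUsedInPods(cidName) || r.cidIsUsedInCEPOrCES(cidName)

	switch {
	case !existsInDesiredState && !existsInStore:
		return nil // 1. nothing to do
	case !existsInDesiredState && existsInStore:
		if cidIsUsed {
			return nil
		}
		return r.handleCIDDeletion(cidName) // 2. delete an unused CID
	case existsInDesiredState && !existsInStore:
		return r.createCID(cidName, desired) // 3. create the missing CID
	default:
		return r.updateCIDIfChanged(cidName, desired) // 4. update if the CIDs differ
	}
}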

@pippolo84 (Member) left a comment:

Thanks!

I've left some questions around some changes that were previously introduced and reviewed in #30649.
Also, some suggestions around the structure of the unit tests.

Comment on lines 338 to 340
// CIDDeletionTracker tracks which CIDs are marked for deletion.
// This is required for simultaneous CID management
// by both cilium-operator and cilium-agent.
Member:

Is it possible to expand this comment to explain why this is required? It would be nice to detail the expected sequence of events that shows the tracker is needed when both the operator and the agent manage CIDs.

Contributor (Author):

Updated the docs

Member:

I'm sorry but I'm not following the comment.

AFAIU we need the delay (and thus the marking and deletion-tracking) to avoid scenarios where the agent has created the CID but has not yet updated the CEP with the CID. If the operator sees the new CID and no CEP actually using it, it might be too aggressive and delete the CID. This is supposed to happen more frequently in case of high pod churn (I guess because churn increases the delay between the operator receiving the CID creation event and the operator receiving the CEP update event).
Is my understanding correct? Either way, I suggest rephrasing the comment to be clearer and more descriptive.

Contributor (Author):

It addresses two main concerns:

  1. It facilitates CID reuse in scenarios with high pod churn: we do not delete the CID, so we are able to reuse it.
  2. It ensures correct CID association in scenarios where the agent rapidly receives create and delete events, as spelled out step by step below.
    e.g. we have one CID used by one pod, and we churn this pod (the agent quickly receives create and delete CID events). If the agent receives the CID delete and schedules the CEP deletion, and then a new create event arrives and sees that the CEP already exists, the scheduled deletion still removes the CEP.
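
Roughly, the sequence for that second scenario is:

  1. A pod using CID X is deleted and an equivalent pod is created shortly after (pod churn), so the agent quickly receives a delete and a create event for CID X.
  2. On the delete event, the agent schedules the deletion of the corresponding CEP.
  3. The create event then arrives; the agent sees the CEP already exists and does nothing.
  4. The previously scheduled deletion fires and removes the CEP, breaking the CID association for the new pod.

The deletion delay keeps CID X around long enough for the churned pod to be re-associated with it, avoiding that interleaving.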

Contributor (Author):

@pippolo84, added more details in the comment. I think once we have the full implementation it will provide enough context.

Contributor:

I also consider the delayed deletion is useful in high pod churn scenarios as we do not have to delete and re-create the same CID over and over..

sure, but that's already implemented in the current GC.

@ovidiutirla (Contributor, Author) commented Jul 5, 2024:

Updated the docs.

@dlapcevic do you have any thoughts on the existing GC versus the one proposed as part of Operator Managing CIDs?

Contributor:

I will try to sum it up; let me know if my understanding is correct, so @joestringer and @dlapcevic don't have to read all of that conversation 😅

We are considering how GC behaves in two different cases:
(1) During the transition period from cilium-agent creating CIDs to operator creating CIDs when both operator and agent can create CIDs
(2) How GC behaves once operators is fully managing CIDs

Let's think about the current GC and how it would behave:
(1) if a CID was ever used, it will be marked for deletion with an annotation after 30 minutes of not being used, and deleted after another 15 minutes - this already addresses your concern with high-churn pods.
If a CID was never used by any CEP, it will be marked for deletion and deleted after 15 minutes.
Sounds reasonable both for identities created by the operator and the agent.
CIDs created by the agent and not yet used by any CEPs won't be deleted immediately.

(2) It will still work the same, but adding an annotation mark for deletion won't make much sense as the agent won't be able to remove it anymore.
So the proposed switch from the deletion-mark annotation to the DeletionTracker is a nice optimization in terms of API calls, but it is not necessary for correctness. It should be fairly easy to implement in the current GC, though.

It could probably even be enabled instantly (deletion-mark annotation -> DeletionTracker) when switching from the agent creating CIDs to the operator creating CIDs during migration, and the current GC would still behave correctly in both cases (1) and (2); it just would not react to the deletion-mark annotation, and would instead use a fixed delay when waiting on the CEP.

Of course the intervals/timeouts (30/15 minutes) can already be changed IIRC; I was just using the defaults as examples.

So the question is, what would be the purpose of implementing a new GC, instead of implementing this improvement to the current GC, which seems significantly easier?

Contributor:

In my opinion, GC is part of CID management, and there is no reason for it to be separated from the CID controller. The CID controller needs to know the desired state of CIDs anyway (when CIDs are used and when they are not), and adding deletion to it is not a complex effort.

Besides that, I see a few benefits:

  1. Performance and scalability
    a) Smaller number of calls to kube-apiserver without adding and removing the mark. Remember: every update needs to be sent to all nodes.
    b) Processing all CIDs at once (on each GC run) instead of continuously creates a spike of requests that can significantly affect the KCP and cilium-operator's k8s client rate limiting.
  2. Ease of use
    No need to configure GC.
  3. Security
    Remove write permission for CIDs from cilium-agent.
    When cilium-agent is not managing CIDs, it shouldn’t update CIDs, so it won’t need write permission to CIDs.

We had issues related to these points.
We also had issues where cilium-operator had connection instability with kube-apiserver, so it was restarting every 15-20 minutes (or the leader was changing); in that case no CIDs were cleaned up and we eventually hit the 65k CID limit.

@joestringer (Member) commented Jul 9, 2024:

I think that this topic would be a lot more approachable with a more incremental approach. Use the current GC first, propose improvements to the current GC implementation and push on that independently. Then propose the PR that refactors/removes the current GC implementation to merge it into this CID controller. Each of those steps would be self-contained, provide value, and could be individually assessed for their benefits and drawbacks.

If we were looking at a PR that was titled "Improve cilium-operator Identity garbage collection" with the PR description containing the text in the post immediately above, then I think we would be in a much better position to critique the design and implementation, and debate how it affects the implementation.

Given how critical these operations are and how nuanced they can be, I don't think we're interested in having multiple implementations in the tree at the same time. Not unless there's something critical to the design that I'm missing which makes the garbage collection improvements inherently different.

EDIT: Let me adjust a bit on the multiple-GC point: if we want to trial and incrementally roll out two implementations, then it could be useful to have a swappable implementation where you define which GC algorithm to use based on a flag. But for that, ideally we would structure it so that the GC aspect fits into the broader code in a consistent way, and the only difference is the details of the GC which are swapped in/out based on the flag. If that's what you're going for, I think it at least provides us a path from the current implementation to allowing the new implementation, enabling it by default, deprecating the old one, then removing the old one. However, if we have sufficient testing, I don't think we necessarily need to go through all of that; we could just stick to one implementation that is incrementally improved/changed.
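
Purely as an illustration of the swappable-GC idea (the interface, the stub types and the flag below are hypothetical), the wiring could look like:

import "context"

// identityGC is the common interface both GC implementations would satisfy.
type identityGC interface {
	Start(ctx context.Context) error
	Stop()
}

// Stubs standing in for the existing heartbeat-based GC and a future
// controller-integrated GC.
type legacyHeartbeatGC struct{}

func (g *legacyHeartbeatGC) Start(ctx context.Context) error { return nil }
func (g *legacyHeartbeatGC) Stop()                           {}

type controllerIntegratedGC struct{}

func (g *controllerIntegratedGC) Start(ctx context.Context) error { return nil }
func (g *controllerIntegratedGC) Stop()                           {}

// newIdentityGC picks an implementation based on a flag, so the rest of the
// operator code is unaware of which GC algorithm is running.
func newIdentityGC(useControllerGC bool) identityGC {
	if useControllerGC {
		return &controllerIntegratedGC{}
	}
	return &legacyHeartbeatGC{}
}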

@ovidiutirla ovidiutirla force-pushed the feature/op-id-cid-cache branch from 81ddfbb to 37db6ca on July 3, 2024 14:15
@ovidiutirla

This comment was marked as outdated.

@ovidiutirla ovidiutirla force-pushed the feature/op-id-cid-cache branch 4 times, most recently from ac7ac99 to c63e252 on July 4, 2024 11:57
ovidiutirla

This comment was marked as outdated.

@ovidiutirla ovidiutirla force-pushed the feature/op-id-cid-cache branch from c63e252 to 04e5547 on July 4, 2024 12:01
@ovidiutirla ovidiutirla requested a review from pippolo84 July 4, 2024 12:06
@pchaigno pchaigno enabled auto-merge July 4, 2024 12:15
auto-merge was automatically disabled July 4, 2024 16:08

Head branch was pushed to by a user without write access

@ovidiutirla ovidiutirla force-pushed the feature/op-id-cid-cache branch from 04e5547 to d8673e0 on July 4, 2024 16:08
@sypakine (Contributor) commented Jul 4, 2024

note: I had been commenting on a previous commit version: ac7ac99

I'm not a github expert, so not sure if there is a way to reflect them in this commit...

@pippolo84 (Member) left a comment:

Thanks for the follow-up.
Left some additional comments in the previously opened conversations.

@ovidiutirla (Contributor, Author) commented:

note: I had been commenting on a previous commit version: ac7ac99

I'm not a github expert, so not sure if there is a way to reflect them in this commit...

Thanks to both of you for the feedback!
I split it into a new commit and addressed your changes, Mark.

@ovidiutirla ovidiutirla force-pushed the feature/op-id-cid-cache branch 5 times, most recently from 9134c80 to 3524a4c on July 4, 2024 22:40
@ovidiutirla ovidiutirla requested a review from pippolo84 July 4, 2024 22:40
The field will be used mainly by operator managing CIDs.
Related cilium#27752

Signed-off-by: Ovidiu Tirla <[email protected]>
Replaces the existing logger with slog

Signed-off-by: Ovidiu Tirla <[email protected]>
@ovidiutirla ovidiutirla force-pushed the feature/op-id-cid-cache branch from 3524a4c to 8d420ba on July 5, 2024 11:58
@dlapcevic (Contributor) left a comment:

lgtm

Add EnqueueTimeTracker and CIDDeletionTracker structures
to manage enqueuing times and track CID deletion marks.

Signed-off-by: Ovidiu Tirla <[email protected]>
@ovidiutirla ovidiutirla force-pushed the feature/op-id-cid-cache branch from 8d420ba to 9b2e980 on July 5, 2024 14:17
@joestringer joestringer added the dont-merge/discussion A discussion is ongoing and should be resolved before merging, regardless of reviews & tests status. label Jul 9, 2024
@ovidiutirla (Contributor, Author) commented:

Closing this in favor of:

@ovidiutirla ovidiutirla closed this Aug 1, 2024