Infinite reconciliation loop when destination.create is false and Secret lacks VSO labels #1212

@sarikaj2

Describe the bug

When the Vault Secrets Operator (VSO) is configured to manage a pre-existing Kubernetes Secret (via spec.destination.create: false in the VaultStaticSecret), it can enter an infinite reconciliation loop.
This loop occurs if the pre-existing Secret does not have the VSO ownership labels (e.g., app.kubernetes.io/managed-by: hashicorp-vso).

The operator detects "drift" on every cycle because its label-filtered read client cannot see the unlabeled Secret, and this triggers a sync. The sync logic, however, intentionally does not add the required labels when create is false, so the Secret stays invisible and the loop repeats on every refreshAfter cycle.

This results in any configured rolloutRestartTargets being triggered on every reconciliation, causing constant restarts of the application pods.

This issue is particularly prevalent in two scenarios:

  1. When upgrading VSO from a version before v1.0.0 (which did not require these labels) to a newer version.
  2. When another system, such as an OAM (Open Application Model) platform, creates the initial Secret resource.

To Reproduce

Steps to reproduce the behavior:

  1. Deploy the Vault Secrets Operator v1.0.0 or later.
  2. Manually create a Kubernetes Secret in an application namespace. This secret should not have the VSO ownership labels.
  3. Deploy a VaultStaticSecret resource that targets the pre-existing secret. This VaultStaticSecret must be configured with:
       - spec.destination.create: false
       - spec.rolloutRestartTargets pointing to a workload (e.g., a Deployment or Argo Rollout).
  4. Observe the VSO controller logs. You will see that on every reconciliation cycle (defined by refreshAfter), it fails to find the secret for drift detection and proceeds to sync it.
  5. Observe the target application workload. You will see that pods are continuously being restarted.

Application deployment:

# The following resources demonstrate a cohesive application setup that triggers the bug.
#
# 1. A Deployment for the application workload.
# Its labels match the application identity, and it will be the target for restarts.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: myapp
  labels:
    app: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp-container
        image: nginx:1.21 # A simple, common image for the example

---
# 2. A pre-existing Kubernetes Secret created without VSO labels.
# This is the secret that VSO will manage the data for.
apiVersion: v1
kind: Secret
metadata:
  name: myapp-secret
  namespace: myapp
  labels:
    app: myapp
type: Opaque

---
# 3. A VaultStaticSecret targeting the secret above, with `create: false`.
# This is the resource that triggers the bug.
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultStaticSecret
metadata:
  name: myapp-vss
  namespace: myapp
  labels:
    app: myapp
spec:
  # This configuration tells VSO to manage a pre-existing secret
  destination:
    name: myapp-secret
    create: false # VSO is not the owner of the secret's lifecycle
    overwrite: true

  # Standard VSO configuration
  mount: applications
  path: myapp/0022
  type: kv-v2
  refreshAfter: 30s
  # Reference to the VaultAuth resource defined below.
  # This tells VSO which authentication method to use for this secret.
  vaultAuthRef: myapp-vaultauth

  # This target will be restarted on every reconciliation cycle
  rolloutRestartTargets:
    - kind: "Deployment"
      name: "myapp"

---
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultConnection
metadata:
  name: default
  namespace: myapp
spec:
  # The address of your Vault instance
  address: "https://vault.example.com:8200" # <-- Replace with your Vault address
  # Set to true to skip TLS verification, not recommended for production
  skipTLSVerify: true

---
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultAuth
metadata:
  labels:
    app: myapp
  name: myapp-vaultauth
  namespace: myapp
spec:
  kubernetes:
    role: myapp-role # <-- Role configured in Vault
    serviceAccount: default
    tokenExpirationSeconds: 600
  method: kubernetes
  mount: kubernetes # <-- Path where Kubernetes auth is enabled in Vault
  vaultConnectionRef: default

Expected behavior

The operator should successfully reconcile the VaultStaticSecret.
After the initial sync, the operator should stabilize. It should only trigger another sync and rollout restart if the secret data in Vault actually changes or if the Kubernetes Secret is tampered with.

Environment

  • Kubernetes version: 1.34
  • vault-secrets-operator version: v1.1.0 (and later)

Additional context

The root cause is a conflict between two design choices in the VSO codebase:

  1. The Two-Client System: For performance, the operator uses a special filtered secretsClient for reading/listing secrets during drift detection. This client's cache is filtered by the VSO ownership labels (app.kubernetes.io/managed-by: hashicorp-vso, etc.). For writing secrets, it uses a general, unfiltered client.

  2. Respect for External Ownership: The helpers.SyncSecret function contains logic that explicitly avoids adding labels or annotations to a Secret if spec.destination.create is false. The comment in the code explains this is to avoid interfering with metadata set by an external owner.

This creates a "catch-22":

  • The drift detection logic (hmacDestinationSecret) uses the filtered client and fails to find the secret, returning a NotFound error.
  • The reconciler interprets this as a drift and triggers a sync.
  • The sync logic (SyncSecret) uses the unfiltered client, finds the secret, but then explicitly refuses to add the labels that would make it visible to the drift detection logic.
  • The loop repeats.

We believe this can be fixed within VSO by either:

  • Option A: Modifying SyncSecret to always add the VSO ownership labels, even on an update when create: false. This would make the secret visible on the next cycle.
  • Option B: Modifying the VaultStaticSecretReconciler to conditionally use the unfiltered client for drift detection when create: false.

I would be happy to contribute a fix for this issue if the maintainers can provide some guidance on the preferred solution.
