Describe the bug
When the Vault Secrets Operator (VSO) is configured to manage a pre-existing Kubernetes Secret (via spec.destination.create: false in the VaultStaticSecret), it can enter an infinite reconciliation loop.
This loop occurs if the pre-existing Secret does not have the VSO ownership labels (e.g., app.kubernetes.io/managed-by: hashicorp-vso).
The operator continuously detects a "drift" because its read client cannot find the secret, which then triggers a sync. However, the sync logic intentionally does not add the required labels, perpetuating the loop on every refreshAfter cycle.
This results in any configured rolloutRestartTargets being triggered on every reconciliation, causing constant restarts of the application pods.
This issue is particularly prevalent in two scenarios:
- When upgrading VSO from a version before v1.0.0 (which did not require these labels) to a newer version.
- When another system, such as an OAM (Open Application Model) controller, creates the initial Secret resource.
To Reproduce
Steps to reproduce the behavior:
- Deploy the Vault Secrets Operator v1.0.0 or later.
- Manually create a Kubernetes Secret in an application namespace. This secret should not have the VSO ownership labels.
- Deploy a VaultStaticSecret resource that targets the pre-existing secret. This VaultStaticSecret must be configured with:
  - spec.destination.create: false
  - spec.rolloutRestartTargets pointing to a workload (e.g., a Deployment or Argo Rollout).
- Observe the VSO controller logs. You will see that on every reconciliation cycle (defined by refreshAfter), it fails to find the secret for drift detection and proceeds to sync it.
- Observe the target application workload. You will see that pods are continuously being restarted.
Application deployment:
# The following resources demonstrate a cohesive application setup that triggers the bug.
#
# 1. A Deployment for the application workload.
# Its labels match the application identity, and it will be the target for restarts.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: myapp
  labels:
    app: myapp
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp-container
          image: nginx:1.21 # A simple, common image for the example
---
# 2. A pre-existing Kubernetes Secret created without VSO labels.
# This is the secret that VSO will manage the data for.
apiVersion: v1
kind: Secret
metadata:
  name: myapp-secret
  namespace: myapp
  labels:
    app: myapp
type: Opaque
---
# 3. A VaultStaticSecret targeting the secret above, with `create: false`.
# This is the resource that triggers the bug.
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultStaticSecret
metadata:
  name: myapp-vss
  namespace: myapp
  labels:
    app: myapp
spec:
  # This configuration tells VSO to manage a pre-existing secret
  destination:
    name: myapp-secret
    create: false # VSO is not the owner of the secret's lifecycle
    overwrite: true
  # Standard VSO configuration
  mount: applications
  path: myapp/0022
  type: kv-v2
  refreshAfter: 30s
  # Add the missing reference to the VaultAuth resource.
  # This tells VSO which authentication method to use for this secret.
  vaultAuthRef: myapp-vaultauth
  # This target will be restarted on every reconciliation cycle
  rolloutRestartTargets:
    - kind: "Deployment"
      name: "myapp"
---
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultConnection
metadata:
  name: default
  namespace: myapp
spec:
  # The address of your Vault instance
  address: "https://vault.example.com:8200" # <-- Replace with your Vault address
  # Set to true to skip TLS verification, not recommended for production
  skipTLSVerify: true
---
apiVersion: secrets.hashicorp.com/v1beta1
kind: VaultAuth
metadata:
  labels:
    app: myapp
  name: myapp-vaultauth
  namespace: myapp
spec:
  kubernetes:
    role: myapp-role # <-- Role configured in Vault
    serviceAccount: default
    tokenExpirationSeconds: 600
  method: kubernetes
  mount: kubernetes # <-- Path where Kubernetes auth is enabled in Vault
  vaultConnectionRef: default
Expected behavior
The operator should successfully reconcile the VaultStaticSecret.
After the initial sync, the operator should stabilize. It should only trigger another sync and rollout restart if the secret data in Vault actually changes or if the Kubernetes Secret is tampered with.
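To make the expected steady state concrete, here is a minimal sketch of a drift check that only asks for a sync when the Vault data or the destination Secret data has actually changed. It assumes an HMAC-based comparison (suggested by the hmacDestinationSecret name mentioned under Additional context); dataMAC and needsSync are hypothetical helpers written for this report, not VSO functions, and the serialization format is an assumption.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
	"sort"
)

// dataMAC is a hypothetical helper: it computes an HMAC-SHA256 over the
// secret data, visiting keys in sorted order so the result is deterministic.
func dataMAC(key []byte, data map[string][]byte) []byte {
	names := make([]string, 0, len(data))
	for name := range data {
		names = append(names, name)
	}
	sort.Strings(names)

	mac := hmac.New(sha256.New, key)
	for _, name := range names {
		// Length prefixes keep adjacent fields from running into each other.
		fmt.Fprintf(mac, "%d:%s=%d:", len(name), name, len(data[name]))
		mac.Write(data[name])
	}
	return mac.Sum(nil)
}

// needsSync is a hypothetical helper: it reports true only when the Vault
// data differs from the last synced MAC, or when the Secret in the cluster
// no longer matches what Vault holds.
func needsSync(key, lastMAC []byte, vaultData, clusterData map[string][]byte) bool {
	want := dataMAC(key, vaultData)
	if !hmac.Equal(lastMAC, want) {
		return true // the data in Vault changed since the last sync
	}
	return !hmac.Equal(dataMAC(key, clusterData), want) // the Secret was modified
}

func main() {
	key := []byte("example-hmac-key") // placeholder key for the example
	synced := map[string][]byte{"password": []byte("s3cr3t")}
	last := dataMAC(key, synced)

	rotated := map[string][]byte{"password": []byte("rotated")}
	tampered := map[string][]byte{"password": []byte("edited")}

	fmt.Println(needsSync(key, last, synced, synced))   // false: nothing changed
	fmt.Println(needsSync(key, last, rotated, synced))  // true: Vault data changed
	fmt.Println(needsSync(key, last, synced, tampered)) // true: Secret was modified
}
```

With a check of this shape, a pre-existing Secret that is readable and unchanged would produce no sync and no rollout restart on a refreshAfter cycle.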
Environment
- Kubernetes version: 1.34
- vault-secrets-operator version: v1.1.0 (and later)
Additional context
The root cause is a conflict between two design choices in the VSO codebase:
- The Two-Client System: For performance, the operator uses a special filtered secretsClient for reading/listing secrets during drift detection. This client's cache is filtered by the VSO ownership labels (app.kubernetes.io/managed-by: hashicorp-vso, etc.). For writing secrets, it uses a general, unfiltered client.
- Respect for External Ownership: The helpers.SyncSecret function contains logic that explicitly avoids adding labels or annotations to a Secret if spec.destination.create is false. The comment in the code explains this is to avoid interfering with metadata set by an external owner.
This creates a "catch-22":
- The drift detection logic (hmacDestinationSecret) uses the filtered client and fails to find the secret, returning a NotFound error.
- The reconciler interprets this as a drift and triggers a sync.
- The sync logic (SyncSecret) uses the unfiltered client, finds the secret, but then explicitly refuses to add the labels that would make it visible to the drift detection logic.
- The loop repeats.
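The loop described above can be reproduced with a small self-contained model. This is only a sketch: the cluster, secret, filteredGet, syncSecret, and reconcile names are hypothetical stand-ins written for this report, not VSO code, and the model simplifies by counting every successful sync as one rollout restart.

```go
package main

import (
	"fmt"
	"reflect"
)

const (
	managedByLabel = "app.kubernetes.io/managed-by"
	managedByValue = "hashicorp-vso"
)

// secret is a minimal model of a Kubernetes Secret.
type secret struct {
	labels map[string]string
	data   map[string]string
}

// cluster models the Secrets as the API server stores them, keyed by name.
type cluster map[string]*secret

// filteredGet models a read through a label-filtered cache: a Secret without
// the ownership label is reported as not found even though it exists.
func (c cluster) filteredGet(name string) (*secret, bool) {
	s, ok := c[name]
	if !ok || s.labels[managedByLabel] != managedByValue {
		return nil, false
	}
	return s, true
}

// syncSecret models the write path: it sees every Secret, updates the data,
// and adds the ownership label only when addLabels is true. It returns false
// if the Secret does not exist (the model never creates one).
func (c cluster) syncSecret(name string, data map[string]string, addLabels bool) bool {
	s, ok := c[name]
	if !ok {
		return false
	}
	s.data = data
	if addLabels {
		if s.labels == nil {
			s.labels = map[string]string{}
		}
		s.labels[managedByLabel] = managedByValue
	}
	return true
}

// reconcile runs the given number of refresh cycles and returns how many
// syncs (and therefore rollout restarts, in this model) were triggered.
func reconcile(c cluster, name string, vaultData map[string]string, addLabels bool, cycles int) int {
	restarts := 0
	for i := 0; i < cycles; i++ {
		current, found := c.filteredGet(name)
		if found && reflect.DeepEqual(current.data, vaultData) {
			continue // no drift detected, nothing to do
		}
		if c.syncSecret(name, vaultData, addLabels) {
			restarts++
		}
	}
	return restarts
}

func main() {
	vault := map[string]string{"password": "s3cr3t"}
	preExisting := func() cluster {
		return cluster{"myapp-secret": &secret{labels: map[string]string{"app": "myapp"}}}
	}

	// create: false behavior today: labels are never added.
	fmt.Println("restarts without labels:", reconcile(preExisting(), "myapp-secret", vault, false, 5)) // 5
	// If the sync added the labels, only the first cycle would sync.
	fmt.Println("restarts with labels:", reconcile(preExisting(), "myapp-secret", vault, true, 5)) // 1
}
```

In the model, the unlabeled Secret is synced on all five cycles, while the labeled one is synced once and then stays stable.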
We believe this can be fixed within VSO by either:
- Option A: Modifying SyncSecret to always add the VSO ownership labels, even on an update when create: false. This would make the secret visible on the next cycle.
- Option B: Modifying the VaultStaticSecretReconciler to conditionally use the unfiltered client for drift detection when create: false.
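As a rough illustration of the two options, the sketch below models each one in isolation. All names here (withOwnerLabels, secretReader, mapReader, readerFor) are hypothetical and exist only for this example; the real change would live in SyncSecret or the reconciler and would use the operator's actual clients. Only the managed-by label is shown because it is the only ownership label named in this report.

```go
package main

import "fmt"

const (
	managedByLabel = "app.kubernetes.io/managed-by"
	managedByValue = "hashicorp-vso"
)

// Option A (sketch): merge the ownership label into the existing labels
// without dropping anything an external owner has set.
func withOwnerLabels(existing map[string]string) map[string]string {
	merged := make(map[string]string, len(existing)+1)
	for k, v := range existing {
		merged[k] = v
	}
	merged[managedByLabel] = managedByValue
	return merged
}

// secretReader is a hypothetical read interface that returns a Secret's labels.
type secretReader interface {
	Get(name string) (labels map[string]string, found bool)
}

// mapReader is an in-memory reader; when filtered is true it hides Secrets
// that lack the ownership label, like a label-filtered cache would.
type mapReader struct {
	secrets  map[string]map[string]string // secret name -> labels
	filtered bool
}

func (r mapReader) Get(name string) (map[string]string, bool) {
	labels, ok := r.secrets[name]
	if !ok {
		return nil, false
	}
	if r.filtered && labels[managedByLabel] != managedByValue {
		return nil, false
	}
	return labels, true
}

// Option B (sketch): choose the reader for drift detection based on
// spec.destination.create, so externally owned Secrets stay visible.
func readerFor(create bool, filtered, unfiltered secretReader) secretReader {
	if create {
		return filtered
	}
	return unfiltered
}

func main() {
	secrets := map[string]map[string]string{"myapp-secret": {"app": "myapp"}}
	filtered := mapReader{secrets: secrets, filtered: true}
	unfiltered := mapReader{secrets: secrets, filtered: false}

	_, found := readerFor(false, filtered, unfiltered).Get("myapp-secret")
	fmt.Println("option B finds the unlabeled secret:", found) // true

	secrets["myapp-secret"] = withOwnerLabels(secrets["myapp-secret"])
	_, found = filtered.Get("myapp-secret")
	fmt.Println("option A makes it visible to the filtered reader:", found) // true
}
```

The trade-off, as far as we can tell: Option A changes metadata on a Secret that VSO does not own, which the existing code deliberately avoids, while Option B leaves the Secret untouched but reads outside the filtered cache for these resources.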
I would be happy to contribute a fix for this issue if the maintainers can provide some guidance on the preferred solution.