fix(dotnet/policy): consolidate policies and external-backends under one lock #2181
Merged
imran-siddique merged 1 commit on May 12, 2026
…one lock
PolicyEngine held two separate locks for its two state collections:
- _policyLock -> _policies
- _externalBackendLock -> _externalBackends
Evaluate() snapshotted _policies under one lock, then later snapshotted
_externalBackends under the other. ClearPolicies() cleared each list
under its own lock in sequence. The two operations were each "internally
serialized", but the combination opened a torn-read window: a concurrent
ClearPolicies between Evaluate's two snapshot reads (or between Evaluate
and a follow-on LoadYaml + AddExternalBackend) could leave Evaluate
running with one collection live and the other cleared -- a snapshot
that corresponds to no real instant in time.
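The torn-read window described above can be reproduced deterministically with a small sketch. This is not the real PolicyEngine: only the four field names come from this commit message, while the string-typed lists, `AddPolicy`, `SnapshotCounts`, and the `betweenSnapshots` hook are hypothetical stand-ins chosen so the interleaving can be forced without relying on thread timing.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the pre-change engine. Only the four field names
// come from the PR text; everything else here is an illustrative assumption.
public sealed class TwoLockEngine
{
    private readonly object _policyLock = new object();
    private readonly object _externalBackendLock = new object();
    private readonly List<string> _policies = new List<string>();
    private readonly List<string> _externalBackends = new List<string>();

    public void AddPolicy(string policy)
    {
        lock (_policyLock) { _policies.Add(policy); }
    }

    public void AddExternalBackend(string backend)
    {
        lock (_externalBackendLock) { _externalBackends.Add(backend); }
    }

    // Clears each list under its own lock, one after the other.
    public void ClearPolicies()
    {
        lock (_policyLock) { _policies.Clear(); }
        lock (_externalBackendLock) { _externalBackends.Clear(); }
    }

    // Mirrors the two-step snapshot. betweenSnapshots is a test-only hook
    // (an assumption) standing in for whatever another thread does in the gap.
    public (int Policies, int Backends) SnapshotCounts(Action betweenSnapshots)
    {
        List<string> policies;
        lock (_policyLock) { policies = _policies.ToList(); }

        betweenSnapshots(); // the torn-read window: no lock is held here

        List<string> backends;
        lock (_externalBackendLock) { backends = _externalBackends.ToList(); }

        return (policies.Count, backends.Count);
    }
}

public static class TornReadDemo
{
    public static (int Policies, int Backends) Run()
    {
        var engine = new TwoLockEngine();
        engine.AddPolicy("p1");
        engine.AddExternalBackend("b1");

        // A clear lands between the two reads: policies were copied while
        // populated, backends are copied after being emptied.
        return engine.SnapshotCounts(() => engine.ClearPolicies());
    }
}
```

Under these assumptions `TornReadDemo.Run()` returns `(1, 0)`: one policy and zero backends, a pair the engine never held at any single instant.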
Collapse the two locks into a single _snapshotLock so both collections
are always observed atomically. Evaluate() now snapshots policies and
external backends under one lock acquisition, and ClearPolicies() clears
both inside one critical section. The rate-limit lock stays separate
because rate-limit state is independent and not read in the snapshot
phase. The unused GetExternalBackendsSnapshot() helper is removed.
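A minimal sketch of the consolidated shape, under the same caveats: `_snapshotLock` is the name given above, but the string-typed lists, `AddPolicy`, `SnapshotCounts`, and the `whileLocked` hook are hypothetical and exist only to show that a concurrent clear cannot land between the two reads once both sit inside one critical section.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// Hypothetical stand-in for the post-change engine; only _snapshotLock,
// _policies and _externalBackends are names taken from the PR text.
public sealed class SingleLockEngine
{
    private readonly object _snapshotLock = new object();
    private readonly List<string> _policies = new List<string>();
    private readonly List<string> _externalBackends = new List<string>();

    public void AddPolicy(string policy)
    {
        lock (_snapshotLock) { _policies.Add(policy); }
    }

    public void AddExternalBackend(string backend)
    {
        lock (_snapshotLock) { _externalBackends.Add(backend); }
    }

    // Both lists are cleared inside one critical section.
    public void ClearPolicies()
    {
        lock (_snapshotLock)
        {
            _policies.Clear();
            _externalBackends.Clear();
        }
    }

    // Both lists are copied under one lock acquisition. whileLocked is a
    // test-only hook (an assumption) that runs between the two copies.
    public (int Policies, int Backends) SnapshotCounts(Action whileLocked)
    {
        lock (_snapshotLock)
        {
            var policies = _policies.ToList();
            whileLocked();
            var backends = _externalBackends.ToList();
            return (policies.Count, backends.Count);
        }
    }
}

public static class SingleLockDemo
{
    public static ((int Policies, int Backends) During, (int Policies, int Backends) After) Run()
    {
        var engine = new SingleLockEngine();
        engine.AddPolicy("p1");
        engine.AddExternalBackend("b1");

        var clearer = new Thread(() => engine.ClearPolicies());
        var during = engine.SnapshotCounts(() =>
        {
            clearer.Start();
            // The clearer parks on _snapshotLock, so this join times out.
            clearer.Join(50);
        });

        clearer.Join(); // the clear proceeds once the snapshot releases the lock
        var after = engine.SnapshotCounts(() => { });
        return (during, after);
    }
}
```

In this sketch the snapshot taken while the clear is pending is `(1, 1)` and the one taken afterwards is `(0, 0)`, so every observed pair corresponds to a state the engine actually held. The clear has to run on a second thread because C# monitors are reentrant: a clear issued from the snapshotting thread itself would pass straight through the lock.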
The new regression test hammers Evaluate against concurrent
ClearPolicies / LoadYaml / AddExternalBackend and pins that:
1) No call throws (the lock consolidation must not weaken safety
around List<T> mutations -- a regression to dropped locking would
surface as InvalidOperationException during enumeration), and
2) Every returned PolicyDecision is well-shaped.
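The two pinned properties could be exercised by a stress harness along the following lines. This is a sketch, not the test added in this PR: `MiniEngine`, `Decision`, `LoadPolicy`, and the reading of "well-shaped" as "non-null with a non-empty reason" are all assumptions, since the real PolicyEngine, PolicyDecision, and test code are not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins for PolicyDecision and PolicyEngine.
public sealed class Decision
{
    public Decision(bool allowed, string reason) { Allowed = allowed; Reason = reason; }
    public bool Allowed { get; }
    public string Reason { get; }
}

public sealed class MiniEngine
{
    private readonly object _snapshotLock = new object();
    private readonly List<string> _policies = new List<string>();
    private readonly List<string> _externalBackends = new List<string>();

    public void LoadPolicy(string name) { lock (_snapshotLock) { _policies.Add(name); } }
    public void AddExternalBackend(string name) { lock (_snapshotLock) { _externalBackends.Add(name); } }
    public void ClearPolicies()
    {
        lock (_snapshotLock) { _policies.Clear(); _externalBackends.Clear(); }
    }

    public Decision Evaluate(string action)
    {
        string[] policies, backends;
        lock (_snapshotLock)
        {
            policies = _policies.ToArray();
            backends = _externalBackends.ToArray();
        }
        // Only the private copies are enumerated, outside the lock.
        var allowed = policies.Contains(action);
        return new Decision(allowed, $"{policies.Length} policies, {backends.Length} backends");
    }
}

public static class StressSketch
{
    public static (int Failures, int Malformed) Run(int readers = 4, int callsPerReader = 5000)
    {
        var engine = new MiniEngine();
        var failures = 0;
        var malformed = 0;
        using var stop = new CancellationTokenSource();

        // Writer: keeps mutating both collections until the readers finish.
        var writer = Task.Run(() =>
        {
            while (!stop.IsCancellationRequested)
            {
                engine.ClearPolicies();
                engine.LoadPolicy("read");
                engine.AddExternalBackend("backend-1");
            }
        });

        // Readers: count any exception and any decision that is not well-shaped.
        var readerTasks = Enumerable.Range(0, readers).Select(_ => Task.Run(() =>
        {
            for (var i = 0; i < callsPerReader; i++)
            {
                try
                {
                    var d = engine.Evaluate("read");
                    if (d == null || string.IsNullOrEmpty(d.Reason))
                        Interlocked.Increment(ref malformed);
                }
                catch (Exception)
                {
                    Interlocked.Increment(ref failures);
                }
            }
        })).ToArray();

        Task.WaitAll(readerTasks);
        stop.Cancel();
        writer.Wait();
        return (failures, malformed);
    }
}
```

With the locking in place both counters stay at zero. Removing the `lock` around the copies in `Evaluate` is the kind of regression property 1 guards against, because the copy could then race with a concurrent mutation of the underlying `List<T>`.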
🤖 AI Agent: code-reviewer
TL;DR: 1 blocker, 0 warnings. The lock consolidation introduces a potential race condition if not properly managed.
Action items: Ensure that all modifications to …
Warnings: None found, fine as follow-up PRs.
🟡 Contributor Check: MEDIUM
Automated check by AGT Contributor Check.
PR Review Summary
Verdict: ❌ Changes needed
MohammadHaroonAbuomar pushed a commit to MohammadHaroonAbuomar/agt-acs that referenced this pull request on Jun 1, 2026
…one lock (microsoft#2181)
Summary
PolicyEngine holds two separate locks for its two state collections:
- _policyLock -> _policies
- _externalBackendLock -> _externalBackends
Evaluate() snapshots _policies under one lock, then later snapshots _externalBackends under the other. ClearPolicies() clears each list under its own lock in sequence. The two operations are each internally serialized, but the combination opens a torn-read window: a concurrent ClearPolicies between Evaluate's two snapshot reads (or between Evaluate's snapshot and a follow-on LoadYaml + AddExternalBackend) can leave Evaluate running with one collection live and the other cleared -- a snapshot that corresponds to no real instant in time.
Change
Collapse the two locks into a single _snapshotLock so both collections are always observed atomically:
- Evaluate() snapshots policies and external backends under one lock acquisition.
- ClearPolicies() clears both inside one critical section.
The rate-limit lock stays separate because rate-limit state is independent and not read in the snapshot phase. The unused GetExternalBackendsSnapshot() helper is removed.
Tests
The new regression test hammers Evaluate against concurrent ClearPolicies / LoadYaml / AddExternalBackend and pins that:
1) No call throws (the lock consolidation must not weaken safety around List<T> mutations -- a regression to dropped locking would surface as InvalidOperationException during enumeration).
2) Every returned PolicyDecision is well-shaped.
The torn-read itself is subtle and not directly externally observable as a "wrong" decision -- the engine still self-consistently routes through whichever pair of snapshots it got -- but the consolidation is still the right shape: snapshot consistency is a foundational invariant for any future audit / replay logic that wants to reason about "what state was the engine in when this request was evaluated."
Test plan
- dotnet test -- 658 / 658 passing (previous baseline 657 + 1 new)
- ClearPolicies_* / Evaluate_* / ListExternalBackends_* tests unchanged
Surfaced during independent audit conducted by @finnoybu (Ken Tannenbaum, AEGIS Initiative); [LOW, .NET].