fix(dotnet/policy): consolidate policies and external-backends under one lock #2181

Merged
imran-siddique merged 1 commit into microsoft:main from aegis-initiative:fix/dotnet-policy-engine-consolidate-snapshot-lock on May 12, 2026

@finnoybu

Contributor

Summary

PolicyEngine holds two separate locks for its two state collections:

  • _policyLock -> _policies
  • _externalBackendLock -> _externalBackends

Evaluate() snapshots _policies under one lock, then later snapshots _externalBackends under the other. ClearPolicies() clears each list under its own lock in sequence. The two operations are each internally serialized, but the combination opens a torn-read window: a concurrent ClearPolicies between Evaluate's two snapshot reads (or between Evaluate's snapshot and a follow-on LoadYaml + AddExternalBackend) can leave Evaluate running with one collection live and the other cleared -- a snapshot that corresponds to no real instant in time.
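
The torn-read window described above can be reproduced deterministically with a small sketch. This is not the real PolicyEngine: the class name, the string element types, `AddPolicy`, `SnapshotCounts`, and the `betweenSnapshots` hook are all hypothetical, and the hook only stands in for whatever another thread could do between the two lock acquisitions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the pre-change engine. Strings replace the real
// policy and backend types, which are not shown in this PR description.
public sealed class TwoLockEngineSketch
{
    private readonly object _policyLock = new object();
    private readonly object _externalBackendLock = new object();
    private readonly List<string> _policies = new List<string>();
    private readonly List<string> _externalBackends = new List<string>();

    public void AddPolicy(string policy)
    {
        lock (_policyLock) { _policies.Add(policy); }
    }

    public void AddExternalBackend(string backend)
    {
        lock (_externalBackendLock) { _externalBackends.Add(backend); }
    }

    // Clears each list under its own lock, in sequence.
    public void ClearPolicies()
    {
        lock (_policyLock) { _policies.Clear(); }
        lock (_externalBackendLock) { _externalBackends.Clear(); }
    }

    // Takes the two snapshots under two separate lock acquisitions.
    // betweenSnapshots is an illustration-only hook that runs in the gap
    // where a concurrent writer could otherwise slip in.
    public (int PolicyCount, int BackendCount) SnapshotCounts(Action betweenSnapshots)
    {
        List<string> policies;
        lock (_policyLock) { policies = new List<string>(_policies); }

        betweenSnapshots();

        List<string> backends;
        lock (_externalBackendLock) { backends = new List<string>(_externalBackends); }

        return (policies.Count, backends.Count);
    }
}
```

With one policy and one backend loaded, calling `SnapshotCounts(() => engine.ClearPolicies())` returns one policy and zero backends. The engine state was (1, 1) before the clear and (0, 0) after it, so the observed pair matches neither.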

Change

Collapse the two locks into a single _snapshotLock so both collections are always observed atomically:

  • Evaluate() snapshots policies and external backends under one lock acquisition.
  • ClearPolicies() clears both inside one critical section.

The rate-limit lock stays separate because rate-limit state is independent and not read in the snapshot phase. The unused GetExternalBackendsSnapshot() helper is removed.
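
A minimal sketch of the consolidated shape, under the same assumptions as before: the class name, the string element types, `AddPolicy`, and `Snapshot` are hypothetical, and only `_snapshotLock`, `_policies`, `_externalBackends`, `ClearPolicies`, and `AddExternalBackend` are names taken from this PR.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the post-change engine: one lock guards both lists.
public sealed class SingleLockEngineSketch
{
    private readonly object _snapshotLock = new object();
    private readonly List<string> _policies = new List<string>();
    private readonly List<string> _externalBackends = new List<string>();

    public void AddPolicy(string policy)
    {
        lock (_snapshotLock) { _policies.Add(policy); }
    }

    public void AddExternalBackend(string backend)
    {
        lock (_snapshotLock) { _externalBackends.Add(backend); }
    }

    // Both lists are cleared inside one critical section.
    public void ClearPolicies()
    {
        lock (_snapshotLock)
        {
            _policies.Clear();
            _externalBackends.Clear();
        }
    }

    // Both copies are taken under a single lock acquisition, so the pair
    // always corresponds to one instant of engine state.
    public (IReadOnlyList<string> Policies, IReadOnlyList<string> Backends) Snapshot()
    {
        lock (_snapshotLock)
        {
            return (_policies.ToArray(), _externalBackends.ToArray());
        }
    }
}
```

The copies are returned as arrays so that evaluation can enumerate them outside the lock without being affected by later mutations.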

Tests

The new regression test hammers Evaluate against concurrent ClearPolicies / LoadYaml / AddExternalBackend and pins that:

  1. No call throws (the lock consolidation must not weaken safety around List<T> mutations -- a regression to dropped locking would surface as InvalidOperationException during enumeration).
  2. Every returned PolicyDecision is well-shaped.

The torn-read itself is subtle and not directly externally observable as a "wrong" decision -- the engine still self-consistently routes through whichever pair of snapshots it got -- but the consolidation is still the right shape: snapshot consistency is a foundational invariant for any future audit / replay logic that wants to reason about "what state was the engine in when this request was evaluated."
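
The two pinned properties can be sketched roughly as follows. This is not the test added in this PR: `MiniEngine`, `StressSketch`, the string decisions, and the stand-in `Evaluate` logic are hypothetical and exist only to show the shape of a reader loop racing a writer loop.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical miniature engine; strings stand in for real policies and decisions.
public sealed class MiniEngine
{
    private readonly object _snapshotLock = new object();
    private readonly List<string> _policies = new List<string>();
    private readonly List<string> _externalBackends = new List<string>();

    public void AddPolicy(string p) { lock (_snapshotLock) { _policies.Add(p); } }
    public void AddExternalBackend(string b) { lock (_snapshotLock) { _externalBackends.Add(b); } }
    public void ClearPolicies()
    {
        lock (_snapshotLock) { _policies.Clear(); _externalBackends.Clear(); }
    }

    // Snapshots both lists under one lock, then enumerates the copies outside it.
    public string Evaluate(string action)
    {
        string[] policies, backends;
        lock (_snapshotLock)
        {
            policies = _policies.ToArray();
            backends = _externalBackends.ToArray();
        }
        foreach (var p in policies) { if (p == action) return "allow"; }
        return backends.Length > 0 ? "defer" : "deny";
    }
}

public static class StressSketch
{
    // Returns the number of calls that threw or produced an unexpected decision.
    public static int Run(int iterations)
    {
        var engine = new MiniEngine();
        int failures = 0;

        var writer = Task.Run(() =>
        {
            for (int i = 0; i < iterations; i++)
            {
                engine.AddPolicy("read");
                engine.AddExternalBackend("backend");
                engine.ClearPolicies();
            }
        });

        var reader = Task.Run(() =>
        {
            for (int i = 0; i < iterations; i++)
            {
                try
                {
                    var d = engine.Evaluate("read");
                    if (d != "allow" && d != "defer" && d != "deny")
                        Interlocked.Increment(ref failures);
                }
                catch (Exception)
                {
                    Interlocked.Increment(ref failures);
                }
            }
        });

        Task.WaitAll(writer, reader);
        return failures;
    }
}
```

A run such as `StressSketch.Run(10000)` is expected to report zero failures. A stress loop of this kind can only surface a dropped lock probabilistically, so a clean run is evidence of safety rather than proof.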

Test plan

  • dotnet test -- 658 / 658 passing (previous baseline 657 + 1 new)
  • Existing ClearPolicies_* / Evaluate_* / ListExternalBackends_* tests unchanged

Surfaced during independent audit conducted by @finnoybu (Ken Tannenbaum, AEGIS Initiative); [LOW, .NET].

…one lock

PolicyEngine held two separate locks for its two state collections:

  - _policyLock  -> _policies
  - _externalBackendLock -> _externalBackends

Evaluate() snapshotted _policies under one lock, then later snapshotted
_externalBackends under the other. ClearPolicies() cleared each list
under its own lock in sequence. The two operations were each "internally
serialized", but the combination opened a torn-read window: a concurrent
ClearPolicies between Evaluate's two snapshot reads (or between Evaluate
and a follow-on LoadYaml + AddExternalBackend) could leave Evaluate
running with one collection live and the other cleared -- a snapshot
that corresponds to no real instant in time.

Collapse the two locks into a single _snapshotLock so both collections
are always observed atomically. Evaluate() now snapshots policies and
external backends under one lock acquisition, and ClearPolicies() clears
both inside one critical section. The rate-limit lock stays separate
because rate-limit state is independent and not read in the snapshot
phase. The unused GetExternalBackendsSnapshot() helper is removed.

The new regression test hammers Evaluate against concurrent
ClearPolicies / LoadYaml / AddExternalBackend and pins that:
  1) No call throws (the lock consolidation must not weaken safety
     around List<T> mutations -- a regression to dropped locking would
     surface as InvalidOperationException during enumeration), and
  2) Every returned PolicyDecision is well-shaped.
@github-actions github-actions Bot added the tests label May 12, 2026
@github-actions

🤖 AI Agent: code-reviewer — View details

TL;DR: 1 blocker, 0 warnings. The lock consolidation introduces a potential race condition if not properly managed.

| # | Sev | Issue | Where |
|---|-----|-------|-------|
| 1 | CRITICAL | Potential race condition due to lock consolidation allowing concurrent modifications | PolicyEngine.cs |

Action items: Ensure that all modifications to _policies and _externalBackends are properly synchronized under the new _snapshotLock to prevent race conditions.

Warnings: None found, fine as follow-up PRs.

@github-actions github-actions Bot added the size/M Medium PR (< 200 lines) label May 12, 2026
@github-actions

🤖 AI Agent: docs-sync-checker — Docs Sync

Docs Sync

  • Evaluate() in PolicyEngine.cs -- missing docstring
  • ClearPolicies() in PolicyEngine.cs -- missing docstring
  • README.md -- section on concurrency and locking needs update
  • CHANGELOG.md -- missing entry for behavioral change regarding lock consolidation in PolicyEngine

@github-actions

🤖 AI Agent: breaking-change-detector — API Compatibility

API Compatibility

| Severity | Change | Impact |
|----------|--------|--------|
| Breaking | Consolidated two separate locks (_policyLock and _externalBackendLock) into a single lock (_snapshotLock) for both _policies and _externalBackends. | Changes the locking mechanism, which may affect thread safety and concurrency behavior in existing implementations that rely on the previous locking strategy. |

@github-actions

🤖 AI Agent: security-scanner — Security Review

Security Review

| Severity | Finding | Fix |
|----------|---------|-----|
| LOW | Potential race condition due to separate locks for _policies and _externalBackends leading to torn reads in Evaluate() and ClearPolicies(). | Consolidated locks into a single _snapshotLock to ensure atomicity during state reads and modifications. |

@github-actions

🤖 AI Agent: test-generator — `PolicyEngine.cs`

PolicyEngine.cs

  • Evaluate_ConcurrentWithClearPolicies_DoesNotThrow_AndDecisionsRemainShaped -- Validates that concurrent modifications do not cause exceptions and decisions remain valid during policy evaluations.

@github-actions


🟡 Contributor Check: MEDIUM

| Check | Result |
|-------|--------|
| Profile | MEDIUM |
| Credential | NONE |
| Overall | MEDIUM |

Automated check by AGT Contributor Check.

@github-actions github-actions Bot added the needs-review:MEDIUM Contributor check flagged MEDIUM risk label May 12, 2026
@github-actions


PR Review Summary

| Check | Status | Details |
|-------|--------|---------|
| 🔍 Code Review | ❌ Failed | Issues detected |
| 🛡️ Security Scan | ✅ Completed | Analysis complete |
| 🔄 Breaking Changes | ✅ Completed | Analysis complete |
| 📝 Docs Sync | ✅ Completed | Analysis complete |
| 🧪 Test Coverage | ✅ Completed | Analysis complete |

Verdict: ❌ Changes needed

@imran-siddique imran-siddique merged commit 8578c42 into microsoft:main May 12, 2026
13 of 14 checks passed
MohammadHaroonAbuomar pushed a commit to MohammadHaroonAbuomar/agt-acs that referenced this pull request Jun 1, 2026
…one lock (microsoft#2181)


Labels

  • needs-review:MEDIUM -- Contributor check flagged MEDIUM risk
  • size/M -- Medium PR (< 200 lines)
  • tests


2 participants