@rhnvrm
Collaborator

@rhnvrm rhnvrm commented Aug 22, 2025

Summary

Root Cause

The koanf struct's internal maps (confMap, confMapFlat, keyMap) were accessed concurrently without synchronization, causing race conditions when multiple goroutines performed read/write operations.

Additionally, the file provider had a race condition when re-watching files after unwatching due to concurrent access to the fsnotify.Watcher field.

Solution

Core Koanf Thread Safety

  • Add sync.RWMutex to Koanf struct for optimal read-heavy performance
  • Exclusive locks (Lock) for write operations: merge(), Delete()
  • Shared locks (RLock) for read operations: Get(), Keys(), All(), etc.
  • Prevent deadlocks in Sprint() and Get("") methods with inline implementations
  • Zero breaking changes - identical public API
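The locking split described in this list can be sketched as follows. This is a minimal illustration, not koanf's actual code: the `Store` type, its `data` field, and the `Merge`/`Get`/`Keys` bodies are hypothetical stand-ins for the Koanf struct and its internal maps.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Store is a hypothetical stand-in for the Koanf struct: one flat map
// guarded by a sync.RWMutex. It is not koanf's real implementation.
type Store struct {
	mu   sync.RWMutex
	data map[string]interface{}
}

// NewStore returns an empty, ready-to-use Store.
func NewStore() *Store {
	return &Store{data: make(map[string]interface{})}
}

// Merge mutates the map, so it takes the exclusive lock.
func (s *Store) Merge(in map[string]interface{}) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for k, v := range in {
		s.data[k] = v
	}
}

// Get only reads, so it takes the shared lock; many readers may hold it at once.
func (s *Store) Get(key string) interface{} {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.data[key]
}

// Keys returns a sorted copy of the keys, built under the shared lock.
func (s *Store) Keys() []string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	out := make([]string, 0, len(s.data))
	for k := range s.data {
		out = append(out, k)
	}
	sort.Strings(out)
	return out
}

func main() {
	s := NewStore()
	var wg sync.WaitGroup
	// Eight goroutines write distinct keys and read concurrently.
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.Merge(map[string]interface{}{fmt.Sprintf("key%d", i): i})
			_ = s.Get("key0")
		}(i)
	}
	wg.Wait()
	fmt.Println(len(s.Keys())) // prints 8
}
```

Without the mutex, the concurrent `Merge` calls above are the kind of access that the Go runtime reports as "concurrent map writes".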

File Provider Thread Safety

  • Add sync.Mutex to File struct to protect watcher state
  • Synchronize Watch() and Unwatch() methods
  • Protect cleanup code in watch goroutine
  • Fix TestUnwatchFile race using atomic operations
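A rough sketch of guarding the watcher field with a mutex is shown here. To stay dependency-free it uses a hypothetical `fakeWatcher` in place of `*fsnotify.Watcher`, and the `File` type is a simplified assumption rather than the provider's real struct.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// fakeWatcher is a hypothetical stand-in for *fsnotify.Watcher.
type fakeWatcher struct{ closed bool }

// Close marks the fake watcher as closed.
func (w *fakeWatcher) Close() error {
	w.closed = true
	return nil
}

// File is a simplified, hypothetical provider. mu guards every access to w.
type File struct {
	mu sync.Mutex
	w  *fakeWatcher
}

// Watch installs a new watcher unless one is already active.
func (f *File) Watch() error {
	f.mu.Lock()
	defer f.mu.Unlock()
	if f.w != nil {
		return errors.New("file is already being watched")
	}
	f.w = &fakeWatcher{}
	return nil
}

// Unwatch closes and clears the watcher. The nil check makes it safe to
// call when nothing is being watched.
func (f *File) Unwatch() error {
	f.mu.Lock()
	defer f.mu.Unlock()
	if f.w == nil {
		return nil
	}
	err := f.w.Close()
	f.w = nil
	return err
}

func main() {
	f := &File{}
	var wg sync.WaitGroup
	// Several goroutines watch and unwatch at the same time.
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = f.Watch()
			_ = f.Unwatch()
		}()
	}
	wg.Wait()
	fmt.Println("still watching:", f.w != nil)
}
```

Because assignment to `f.w` and the `Close()` call both happen under the same mutex, a re-watch cannot interleave with the teardown of the previous watcher.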

Testing

Performance Impact

  • Negligible overhead for single-threaded usage (uncontended lock and unlock operations only)
  • Optimal for concurrent reads (multiple readers proceed simultaneously)
  • Minimal contention cost due to RWMutex design

Compatibility

  • Zero breaking changes
  • All existing functionality tests continue to pass
  • Thread safety is now guaranteed for all operations

rhnvrm added 2 commits August 22, 2025 18:23
Fixes knadh#305 and knadh#335 by implementing proper synchronization in koanf core.

## Issues Fixed
- Issue knadh#305: File watcher race condition causing empty string reads
- Issue knadh#335: "concurrent map writes" panic during concurrent Load() calls

## Root Cause
The koanf struct's internal maps (confMap, confMapFlat, keyMap) were
accessed concurrently without synchronization, causing race conditions
when multiple goroutines performed read/write operations.

## Solution
- Add sync.RWMutex to Koanf struct for thread safety
- Exclusive locks (Lock) for write operations: merge(), Delete()
- Shared locks (RLock) for read operations: Get(), Keys(), All(), etc.
- Zero breaking changes - identical public API
- Optimized for read-heavy workloads (RWMutex allows concurrent reads)

## Testing
- Added comprehensive race condition tests that reproduce both issues
- TestConcurrentLoadRaceCondition: Reproduces issue knadh#335
- TestFileWatcherRaceCondition: Reproduces issue knadh#305
- TestConcurrentReadWriteMix: Mixed read/write scenarios
- TestConcurrentEdgeCases: Edge case methods (Cut, Copy, etc.)
- All tests pass with -race flag
- All existing functionality tests continue to pass

## Performance Impact
- Negligible overhead for single-threaded usage (uncontended lock and unlock operations only)
- Optimal for concurrent reads (multiple readers can proceed simultaneously)
- Minimal contention cost due to RWMutex design

Created using prompts: "both options are not correct, can you look at the underlying code, and see if adding a mutex somewhere is needed" and "we should begin by writing tests that replicate this. and any other race conditions that could occur."

Fixes race conditions detected when running tests with -race flag.

## Root Cause
The file provider had a race condition where:
1. Watch() assigns to f.w (fsnotify.Watcher field)
2. Previous watcher's cleanup goroutine calls f.w.Close()
3. These operations could happen concurrently when re-watching after unwatch

Additionally, TestUnwatchFile had a race between:
- Main test goroutine reading/writing `reloaded` boolean variable
- Watch callback goroutine writing to `reloaded` variable

## Solution
1. **File Provider**: Add mutex protection around watcher state changes
   - Added `mu sync.Mutex` to File struct
   - Protected Watch() and Unwatch() methods with mutex
   - Protected cleanup code in watch goroutine with mutex
   - Added nil checks for defensive programming

2. **TestUnwatchFile**: Use atomic operations instead of plain boolean
   - Changed `reloaded bool` to `reloaded int32`
   - Use atomic.StoreInt32/LoadInt32 for thread-safe access
   - Test now properly verifies re-watching capability after unwatch
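The atomic-flag pattern from item 2 might look roughly like this. The helper names `markReloaded`, `waitForReload`, and `pollTimeout` are hypothetical and are not taken from the koanf test suite; only `atomic.StoreInt32` and `atomic.LoadInt32` come from the commit message.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// pollTimeout is an arbitrary upper bound chosen for this sketch.
const pollTimeout = time.Second

// markReloaded is what a watch callback would do on its own goroutine.
func markReloaded(flag *int32) {
	atomic.StoreInt32(flag, 1)
}

// waitForReload polls the flag until it is set or the timeout elapses.
func waitForReload(flag *int32, timeout time.Duration) bool {
	deadline := time.Now().Add(timeout)
	for time.Now().Before(deadline) {
		if atomic.LoadInt32(flag) == 1 {
			return true
		}
		time.Sleep(time.Millisecond)
	}
	return atomic.LoadInt32(flag) == 1
}

func main() {
	var reloaded int32 // was a plain bool before the fix

	// The callback goroutine writes; the main goroutine reads. Both go
	// through sync/atomic, so the race detector has nothing to report.
	go markReloaded(&reloaded)
	fmt.Println("reloaded:", waitForReload(&reloaded, pollTimeout))

	// Reset before testing a second watch cycle.
	atomic.StoreInt32(&reloaded, 0)
	fmt.Println("after reset:", atomic.LoadInt32(&reloaded))
}
```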

## Testing
- All watch-related tests pass with race detector: TestWatchFile, TestWatchFileSymlink, TestWatchFileDirectorySymlink, TestUnwatchFile
- All submodule tests continue to pass with race detection
- CI pattern `github.com/knadh/koanf...` properly tests all submodules

Created using prompt: "alright, lets then do a separate commit now to fix the test unwatch file race" and "maybe a better approach will be to add a mutex for file watcher instad?"
@rhnvrm rhnvrm force-pushed the fix-race-conditions-issues-305-335 branch from 6d74100 to 4f38b0b Compare August 22, 2025 13:09
@rhnvrm rhnvrm requested a review from Copilot August 22, 2025 13:14

@rhnvrm rhnvrm force-pushed the fix-race-conditions-issues-305-335 branch from 4f38b0b to 6991e4f Compare August 22, 2025 13:24
@rhnvrm rhnvrm requested a review from Copilot August 22, 2025 13:25

Copilot AI left a comment

Pull Request Overview

Adds comprehensive thread safety to the koanf library to resolve race conditions in file watching and concurrent map access. The changes implement proper synchronization without breaking the existing API.

  • Adds sync.RWMutex to the Koanf struct for thread-safe concurrent operations
  • Replaces atomic operations in file provider with sync.Mutex for better state protection
  • Includes extensive race condition tests and CI integration for ongoing safety verification

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| koanf.go | Adds RWMutex synchronization to all read/write operations on internal maps |
| providers/file/file.go | Replaces atomic operations with mutex-based synchronization for watcher state |
| tests/koanf_test.go | Adds comprehensive race condition tests and converts existing test to use atomic operations |
| .github/workflows/test.yml | Adds dedicated CI job for race detection testing |

- Remove atomic operations in favor of simple boolean flags
- Fix potential deadlock in Watch() by releasing lock before goroutine spawn
- Use minimal locking in cleanup goroutine
- Cleaner, more maintainable synchronization pattern
- All tests pass with race detector
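One possible reading of these points is sketched below: plain boolean state touched only under a mutex, the lock released before the goroutine starts, and the goroutine re-acquiring the lock only briefly during cleanup. The `Watcher` type and its `stop`/`done` channels are hypothetical and do not mirror the file provider's real fields.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Watcher is a hypothetical stand-in for the file provider.
type Watcher struct {
	mu         sync.Mutex
	isWatching bool          // plain bool, only accessed while holding mu
	stop       chan struct{} // closed by Unwatch to end the goroutine
	done       chan struct{} // closed by the goroutine once cleanup is finished
}

// Watch updates state under the lock, then releases it before spawning.
func (w *Watcher) Watch() error {
	w.mu.Lock()
	if w.isWatching {
		w.mu.Unlock()
		return errors.New("already watching")
	}
	stop, done := make(chan struct{}), make(chan struct{})
	w.stop, w.done, w.isWatching = stop, done, true
	w.mu.Unlock() // the goroutine below never starts with mu held

	go func() {
		defer close(done)
		<-stop // a real watcher would loop over file events here

		// Cleanup holds the lock only long enough to flip the flag.
		w.mu.Lock()
		w.isWatching = false
		w.mu.Unlock()
	}()
	return nil
}

// Unwatch signals the goroutine and waits for cleanup without holding mu,
// so the cleanup code can acquire the lock.
func (w *Watcher) Unwatch() {
	w.mu.Lock()
	stop, done := w.stop, w.done
	w.stop, w.done = nil, nil
	w.mu.Unlock()
	if stop == nil {
		return
	}
	close(stop)
	<-done
}

// IsWatching reports the current state under the lock.
func (w *Watcher) IsWatching() bool {
	w.mu.Lock()
	defer w.mu.Unlock()
	return w.isWatching
}

func main() {
	w := &Watcher{}
	fmt.Println(w.Watch()) // first watch succeeds
	fmt.Println(w.Watch()) // second watch is rejected
	w.Unwatch()
	fmt.Println(w.IsWatching())
	fmt.Println(w.Watch()) // re-watching after unwatch works
	w.Unwatch()
}
```

If `Unwatch` waited on `done` while still holding `mu`, the cleanup goroutine would block on `w.mu.Lock()` and neither side could make progress, which is why the wait happens after the unlock.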
@rhnvrm rhnvrm force-pushed the fix-race-conditions-issues-305-335 branch from 6991e4f to f37d738 Compare August 22, 2025 13:31
@rhnvrm rhnvrm requested a review from knadh August 22, 2025 13:32
… patterns

Switch from global defer-based locking to localized locking in read methods to reduce code duplication and improve readability. This eliminates the need for duplicate logic that was added to avoid deadlocks between Sprint() and Keys() methods.

- Keys(), KeyMap(), Get(), Exists() now use minimal lock duration
- Sprint() can safely call Keys() without deadlock concerns
- Remove duplicate key extraction logic from Sprint()
- Maintain defer pattern only where needed for maps.Copy() operations
- All existing tests pass including race condition and deadlock tests
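The localized-locking idea can be sketched as follows. The background is that Go's `sync.RWMutex` documentation prohibits recursive read locking: if a method holds `RLock` and calls another method that also takes `RLock`, a writer waiting in between can block both. Keeping each lock scope local avoids that. The `Conf` type, `Set`, and `Summary` are hypothetical names used only for this illustration.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Conf is a hypothetical stand-in for the Koanf struct.
type Conf struct {
	mu   sync.RWMutex
	flat map[string]interface{}
}

// Keys holds the read lock only while copying keys out of the map.
// Sorting happens after the lock has been released.
func (c *Conf) Keys() []string {
	c.mu.RLock()
	out := make([]string, 0, len(c.flat))
	for k := range c.flat {
		out = append(out, k)
	}
	c.mu.RUnlock()
	sort.Strings(out)
	return out
}

// Exists holds the read lock for a single map lookup.
func (c *Conf) Exists(key string) bool {
	c.mu.RLock()
	_, ok := c.flat[key]
	c.mu.RUnlock()
	return ok
}

// Set takes the exclusive lock for the write.
func (c *Conf) Set(key string, v interface{}) {
	c.mu.Lock()
	c.flat[key] = v
	c.mu.Unlock()
}

// Summary composes Keys and Exists without holding c.mu itself, so no
// read lock is ever acquired while another one is already held.
func (c *Conf) Summary() string {
	keys := c.Keys()
	return fmt.Sprintf("%d keys, has a.b: %v", len(keys), c.Exists("a.b"))
}

func main() {
	c := &Conf{flat: make(map[string]interface{})}
	c.Set("a.b", 1)
	c.Set("a.a", 2)
	fmt.Println(c.Keys())
	fmt.Println(c.Summary())
}
```

The trade-off is that `Summary` is no longer a single atomic snapshot: a writer may run between the `Keys` and `Exists` calls.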
koanf.go Outdated
b := bytes.Buffer{}
for _, k := range ko.Keys() {
b.WriteString(fmt.Sprintf("%s -> %v\n", k, ko.confMapFlat[k]))
ko.mu.RLock()
Owner

Is there a specific reason to lock/unlock continuously inside the loop? That must work out to be more expensive than a single lock/unlock outside the loop.

- Move RLock/RUnlock outside the iteration loop to reduce lock overhead
- Maintain alphabetical sorting behavior for API compatibility
- Eliminate repeated lock acquisitions from O(n) to O(1)
- All tests pass including race condition detection
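A sketch of a `Sprint`-style method that takes the read lock once, outside the loop, is shown here. The `Conf` type and its `flat` field are hypothetical; only the `"%s -> %v\n"` output format is taken from the snippet under review.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// Conf is a hypothetical stand-in for the Koanf struct.
type Conf struct {
	mu   sync.RWMutex
	flat map[string]interface{}
}

// Sprint acquires the read lock once, collects and sorts the keys inline,
// and formats every entry before releasing the lock. Lock acquisitions
// stay at one per call regardless of how many keys exist.
func (c *Conf) Sprint() string {
	c.mu.RLock()
	defer c.mu.RUnlock()

	keys := make([]string, 0, len(c.flat))
	for k := range c.flat {
		keys = append(keys, k)
	}
	sort.Strings(keys) // keep the alphabetical output order

	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%s -> %v\n", k, c.flat[k])
	}
	return b.String()
}

func main() {
	c := &Conf{flat: map[string]interface{}{"b": 2, "a": 1}}
	fmt.Print(c.Sprint())
}
```

Holding one lock across the whole loop also means the output is a consistent snapshot, which per-iteration locking would not guarantee.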

"optimize Sprint method performance while maintaining thread safety and sorting"
@knadh knadh merged commit 4e55089 into knadh:master Sep 4, 2025
3 checks passed

2 participants