Add thread safety to resolve race conditions in Issues #305 and #335 #377
Fixes knadh#305 and knadh#335 by implementing proper synchronization in koanf core.

## Issues Fixed
- Issue knadh#305: File watcher race condition causing empty string reads
- Issue knadh#335: "concurrent map writes" panic during concurrent Load() calls

## Root Cause
The koanf struct's internal maps (confMap, confMapFlat, keyMap) were accessed concurrently without synchronization, causing race conditions when multiple goroutines performed read/write operations.

## Solution
- Add sync.RWMutex to Koanf struct for thread safety
- Exclusive locks (Lock) for write operations: merge(), Delete()
- Shared locks (RLock) for read operations: Get(), Keys(), All(), etc.
- Zero breaking changes - identical public API
- Optimized for read-heavy workloads (RWMutex allows concurrent reads)

## Testing
- Added comprehensive race condition tests that reproduce both issues
  - TestConcurrentLoadRaceCondition: Reproduces issue knadh#335
  - TestFileWatcherRaceCondition: Reproduces issue knadh#305
  - TestConcurrentReadWriteMix: Mixed read/write scenarios
  - TestConcurrentEdgeCases: Edge case methods (Cut, Copy, etc.)
- All tests pass with -race flag
- All existing functionality tests continue to pass

## Performance Impact
- Negligible overhead for single-threaded usage (uncontended lock and unlock only)
- Optimal for concurrent reads (multiple readers can proceed simultaneously)
- Minimal contention cost due to RWMutex design

Created using prompts: "both options are not correct, can you look at the underlying code, and see if adding a mutex somewhere is needed" and "we should begin by writing tests that replicate this. and any other race conditions that could occur."
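The locking scheme described above can be sketched in a few lines. This is a minimal illustration only: `Store`, `NewStore`, `Merge`, and `Get` are hypothetical names chosen for the example and are not koanf's actual struct or methods.

```go
package main

import (
	"fmt"
	"sync"
)

// Store is a hypothetical stand-in for a config container.
// It is NOT koanf's real struct; it only mirrors the idea of
// guarding an internal map with a sync.RWMutex.
type Store struct {
	mu   sync.RWMutex
	data map[string]any
}

// NewStore returns an empty, ready-to-use Store.
func NewStore() *Store {
	return &Store{data: make(map[string]any)}
}

// Merge mutates the map, so it takes the exclusive lock.
func (s *Store) Merge(src map[string]any) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for k, v := range src {
		s.data[k] = v
	}
}

// Get only reads, so it takes the shared lock; several readers
// may hold it at the same time.
func (s *Store) Get(key string) (any, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	v, ok := s.data[key]
	return v, ok
}

func main() {
	s := NewStore()
	var wg sync.WaitGroup
	// Concurrent writers and readers; without the mutex this pattern
	// can panic with "concurrent map writes".
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func(n int) {
			defer wg.Done()
			s.Merge(map[string]any{fmt.Sprintf("key%d", n): n})
			s.Get("key0")
		}(i)
	}
	wg.Wait()
	v, ok := s.Get("key3")
	fmt.Println(v, ok)
}
```

Running a sketch like this with `go run -race` is a quick way to confirm that the race detector stays quiet once every map access goes through the lock.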
Fixes race conditions detected when running tests with -race flag.

## Root Cause
The file provider had a race condition where:
1. Watch() assigns to f.w (fsnotify.Watcher field)
2. Previous watcher's cleanup goroutine calls f.w.Close()
3. These operations could happen concurrently when re-watching after unwatch

Additionally, TestUnwatchFile had a race between:
- Main test goroutine reading/writing `reloaded` boolean variable
- Watch callback goroutine writing to `reloaded` variable

## Solution
1. **File Provider**: Add mutex protection around watcher state changes
   - Added `mu sync.Mutex` to File struct
   - Protected Watch() and Unwatch() methods with mutex
   - Protected cleanup code in watch goroutine with mutex
   - Added nil checks for defensive programming
2. **TestUnwatchFile**: Use atomic operations instead of plain boolean
   - Changed `reloaded bool` to `reloaded int32`
   - Use atomic.StoreInt32/LoadInt32 for thread-safe access
   - Test now properly verifies re-watching capability after unwatch

## Testing
- All watch-related tests pass with race detector: TestWatchFile, TestWatchFileSymlink, TestWatchFileDirectorySymlink, TestUnwatchFile
- All submodule tests continue to pass with race detection
- CI pattern `github.com/knadh/koanf...` properly tests all submodules

Created using prompts: "alright, lets then do a separate commit now to fix the test unwatch file race" and "maybe a better approach will be to add a mutex for file watcher instead?"
Pull Request Overview
Adds comprehensive thread safety to the koanf library to resolve race conditions in file watching and concurrent map access. The changes implement proper synchronization without breaking the existing API.
- Adds `sync.RWMutex` to the Koanf struct for thread-safe concurrent operations
- Replaces atomic operations in file provider with `sync.Mutex` for better state protection
- Includes extensive race condition tests and CI integration for ongoing safety verification
Reviewed Changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| koanf.go | Adds RWMutex synchronization to all read/write operations on internal maps |
| providers/file/file.go | Replaces atomic operations with mutex-based synchronization for watcher state |
| tests/koanf_test.go | Adds comprehensive race condition tests and converts existing test to use atomic operations |
| .github/workflows/test.yml | Adds dedicated CI job for race detection testing |
- Remove atomic operations in favor of simple boolean flags
- Fix potential deadlock in Watch() by releasing lock before goroutine spawn
- Use minimal locking in cleanup goroutine
- Cleaner, more maintainable synchronization pattern
- All tests pass with race detector
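A rough sketch of the pattern this commit describes: a boolean flag guarded by a mutex, the lock released before the goroutine is spawned, and only a brief lock during cleanup. The `Watcher` type, its fields, and the channel-based shutdown are assumptions made for illustration; the real file provider wraps an fsnotify watcher and differs in detail.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// Watcher is hypothetical. It mimics guarding watcher state with a
// mutex and a plain bool; it is not the file provider's real code.
type Watcher struct {
	mu       sync.Mutex
	watching bool
	stop     chan struct{}
	done     chan struct{}
}

// Watch marks the watcher active and starts the background goroutine.
func (w *Watcher) Watch() error {
	w.mu.Lock()
	if w.watching {
		w.mu.Unlock()
		return errors.New("already watching")
	}
	w.watching = true
	stop := make(chan struct{})
	done := make(chan struct{})
	w.stop, w.done = stop, done
	w.mu.Unlock() // release before spawning, so the goroutine never waits on its parent

	go func() {
		defer close(done)
		<-stop // a real watcher would process file events here

		// Cleanup holds the lock only long enough to flip the flag.
		w.mu.Lock()
		w.watching = false
		w.mu.Unlock()
	}()
	return nil
}

// Unwatch signals the goroutine to stop and waits for its cleanup.
func (w *Watcher) Unwatch() {
	w.mu.Lock()
	stop, done := w.stop, w.done
	w.stop, w.done = nil, nil
	w.mu.Unlock()

	if stop == nil {
		return // not watching; nil check keeps repeated calls safe
	}
	close(stop)
	<-done // wait without holding the lock, so cleanup can acquire it
}

func main() {
	w := &Watcher{}
	fmt.Println(w.Watch()) // first call succeeds
	fmt.Println(w.Watch()) // second call is rejected while active
	w.Unwatch()
	fmt.Println(w.Watch()) // re-watching after unwatch succeeds
	w.Unwatch()
}
```

The key design choice is that no code path blocks on a channel or another goroutine while holding `mu`, which is what keeps Watch(), Unwatch(), and the cleanup step from deadlocking each other.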
… patterns

Switch from global defer-based locking to localized locking in read methods to reduce code duplication and improve readability. This eliminates the need for duplicate logic that was added to avoid deadlocks between Sprint() and Keys() methods.

- Keys(), KeyMap(), Get(), Exists() now use minimal lock duration
- Sprint() can safely call Keys() without deadlock concerns
- Remove duplicate key extraction logic from Sprint()
- Maintain defer pattern only where needed for maps.Copy() operations
- All existing tests pass including race condition and deadlock tests
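The deadlock concern comes from the fact that Go's `sync.RWMutex` does not support recursive read locking: if a writer is waiting, a second RLock on the same goroutine can block forever. A minimal sketch of the localized-locking idea follows, using a hypothetical `Conf` type with `Keys` and `Describe` methods rather than koanf's real ones.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// Conf is a hypothetical container, not koanf's struct.
type Conf struct {
	mu   sync.RWMutex
	flat map[string]any
}

// Keys holds the read lock only while copying keys out of the map,
// then releases it before sorting.
func (c *Conf) Keys() []string {
	c.mu.RLock()
	out := make([]string, 0, len(c.flat))
	for k := range c.flat {
		out = append(out, k)
	}
	c.mu.RUnlock()

	sort.Strings(out)
	return out
}

// Describe calls Keys() while holding no lock itself, so there is no
// nested RLock and no need to duplicate the key-extraction logic.
func (c *Conf) Describe() string {
	return strings.Join(c.Keys(), ",")
}

func main() {
	c := &Conf{flat: map[string]any{"b": 1, "a": 2}}
	fmt.Println(c.Describe())
}
```

The trade-off is that a method built from several short critical sections no longer sees a single consistent snapshot; that is acceptable for helpers like this one but worth keeping in mind for multi-step reads.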
koanf.go
Outdated
    b := bytes.Buffer{}
    for _, k := range ko.Keys() {
        b.WriteString(fmt.Sprintf("%s -> %v\n", k, ko.confMapFlat[k]))
        ko.mu.RLock()
Is there a specific reason to lock/unlock continuously inside the loop? That must work out to be more expensive than a single lock/unlock outside the loop.
- Move RLock/RUnlock outside the iteration loop to reduce lock overhead
- Maintain alphabetical sorting behavior for API compatibility
- Reduce lock acquisitions from O(n) to O(1)
- All tests pass including race condition detection

Created using prompt: "optimize Sprint method performance while maintaining thread safety and sorting"
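One way to take the lock once for the whole loop, while keeping the sorted output, is sketched below. `FlatConf` and its `Sprint` method are hypothetical names for this example, and the body is an assumption about the shape of the fix, not the merged koanf code.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// FlatConf is a hypothetical container used only for this sketch.
type FlatConf struct {
	mu   sync.RWMutex
	flat map[string]any
}

// Sprint acquires the read lock once, collects and sorts the keys,
// and formats every entry before releasing the lock.
func (c *FlatConf) Sprint() string {
	c.mu.RLock()
	defer c.mu.RUnlock()

	keys := make([]string, 0, len(c.flat))
	for k := range c.flat {
		keys = append(keys, k)
	}
	sort.Strings(keys) // keep alphabetical output

	var b strings.Builder
	for _, k := range keys {
		fmt.Fprintf(&b, "%s -> %v\n", k, c.flat[k])
	}
	return b.String()
}

func main() {
	c := &FlatConf{flat: map[string]any{"b": 2, "a": 1}}
	fmt.Print(c.Sprint())
}
```

Besides avoiding one lock/unlock pair per key, holding a single read lock means the keys and values printed all come from the same snapshot of the map.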
Summary
Root Cause
The koanf struct's internal maps (confMap, confMapFlat, keyMap) were accessed concurrently without synchronization, causing race conditions when multiple goroutines performed read/write operations.
Additionally, the file provider had a race condition when re-watching files after unwatching due to concurrent access to the fsnotify.Watcher field.
Solution
Core Koanf Thread Safety
- Added `sync.RWMutex` to Koanf struct for optimal read-heavy performance
File Provider Thread Safety
- Added `sync.Mutex` to File struct to protect watcher state
Testing
Performance Impact
Compatibility