Eliminate entry cloning when flushing index #8330
Conversation
Force-pushed 5a37af8 to cb65e7d (Compare)
```diff
  }
- possible_evictions.insert(0, *k, Arc::clone(v));
+ possible_evictions.insert(0, *k);
```
my current idea is to put the dirty flag in as an extra value and use that to decide whether to obtain the entry under the map lock, clear dirty, and write to disk
yeah. it is a great idea.
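For illustration, a minimal self-contained sketch of that idea, with stand-in names (`Pubkey`, `Entry`, `write_to_disk`, and `flush` here are hypothetical simplifications, not the real agave types): the scan records only `(key, dirty)` pairs instead of cloning an `Arc`'d entry, and the flush pass re-acquires the map lock, clears `dirty`, and writes.

```rust
use std::{
    collections::HashMap,
    sync::{
        atomic::{AtomicBool, Ordering},
        RwLock,
    },
};

type Pubkey = u64; // stand-in for the real 32-byte pubkey

struct Entry {
    dirty: AtomicBool,
    data: u64, // stand-in for the slot list / account info
}

fn write_to_disk(key: Pubkey, data: u64) {
    println!("flushed {key} -> {data}");
}

fn flush(map: &RwLock<HashMap<Pubkey, Entry>>) {
    // Scan phase: capture (key, dirty) only; no Arc::clone of entries.
    let possible_evictions: Vec<(Pubkey, bool)> = {
        let guard = map.read().unwrap();
        guard
            .iter()
            .map(|(k, v)| (*k, v.dirty.load(Ordering::Acquire)))
            .collect()
    };

    // Flush phase: re-take the lock only for entries that were dirty at
    // scan time, clear dirty under the lock, then write to disk.
    for (key, was_dirty) in possible_evictions {
        if !was_dirty {
            continue;
        }
        let guard = map.read().unwrap();
        if let Some(entry) = guard.get(&key) {
            entry.dirty.store(false, Ordering::Release);
            write_to_disk(key, entry.data);
        }
    }
}

fn main() {
    let map = RwLock::new(HashMap::from([(
        1,
        Entry { dirty: AtomicBool::new(true), data: 42 },
    )]));
    flush(&map);
}
```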
Force-pushed cb65e7d to 996d94a (Compare)
```rust
return None;
}
let evictions_age: Vec<_> = {
    let map = self.map_internal.read().unwrap();
```
Here is the tradeoff - the read lock is now held during disk I/O operations...
To alleviate this, we chunk up the items and release the lock after each chunk.
Force-pushed b8fb32e to 74ff955 (Compare)
Codecov Report

❌ Patch coverage is …

Additional details and impacted files

```
@@            Coverage Diff            @@
##           master    #8330     +/-   ##
=========================================
  Coverage    83.2%     83.2%
=========================================
  Files         838       838
  Lines      368496    368539      +43
=========================================
+ Hits       306667    306742      +75
+ Misses      61829     61797      -32
```
Process evictions_age_possible in chunks of 20 items instead of holding the map_internal read lock for the entire eviction set. This reduces lock contention by releasing and re-acquiring the lock between chunks.
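A hedged sketch of that chunking pattern under simplified types (`flush_in_chunks` and the map's value type are assumptions; only the chunk size of 20 comes from the commit above): the read lock is scoped to one chunk at a time.

```rust
use std::{collections::HashMap, sync::RwLock};

const CHUNK_SIZE: usize = 20;

fn flush_in_chunks(map: &RwLock<HashMap<u64, u64>>, candidates: &[u64]) {
    for chunk in candidates.chunks(CHUNK_SIZE) {
        // The read lock is taken once per chunk...
        let guard = map.read().unwrap();
        for key in chunk {
            if let Some(value) = guard.get(key) {
                let _ = value; // stand-in for the real per-entry flush work
            }
        }
        // ...and dropped here at the end of the loop body, letting writers
        // interleave before the next chunk.
    }
}

fn main() {
    let map = RwLock::new(HashMap::from([(1u64, 10u64), (2, 20)]));
    flush_in_chunks(&map, &[1, 2, 3]);
}
```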
Force-pushed 74ff955 to f1d6a0d (Compare)
Force-pushed 8526a05 to 7f6cf59 (Compare)
```diff
- possible_evictions.insert(0, *k, Arc::clone(v));
+ // Capture dirty and ref_count early during scan
+ let is_dirty = v.dirty();
+ let ref_count = v.ref_count();
```
I think in 99% of cases flush_internal will skip the entry if ref count != 1:
- for the dirty case we re-check the ref_count in should_evict_from_mem and return false for ref count != 1, so the only situation where we would write to disk with ref count != 1 is when it changed to 1 between gathering evictions and flushing
- for the non-dirty case you actually do the check `if *ref_count != 1 {` in that branch in flush_internal

so I think we could just skip it here and not put the ref_count into the vector at all
yeah. done in f8577c5
```rust
if !should_evict {
    // not evicting, so don't write, even if dirty
    drop(map);
```
It might help readability a bit, but at the same time it's kind of confusing that we manually drop the map guard just before `return None`, which will drop it anyway...
done in f8577c5
```rust
// Entry was dirty at scan time, need to write to disk
// Lock the map briefly to get the full entry reference
let lock_measure = Measure::start("flush_read_lock");
let map = self.map_internal.read().unwrap();
```
call it something like map_read_guard to highlight that we are holding locks
done in 5f62787
```rust
// since we know slot_list.len() == 1, we can create a stack-allocated array for a single element.
let (slot, info) = slot_list[0];
let disk_entry = [(slot, info.into())];
let disk_ref_count = ref_count;
```
I think a cleaner way to handle the lifetime of the map guard would be to do something like:
```rust
let (disk_entry, disk_ref_count) = {
    let map_read_guard = ...;
    if !... {
        return None;
    }
    ([(slot, info.into())], ref_count)
};
// unconditionally write to disk
loop {
}
```
alternatively, we could move the map guard outside of the outer for loop as an Option, to re-use the lock for entries that we skip writing, so something like:
```rust
let mut map_read_guard = Some(self.map_internal.read().unwrap());
for (k, v) in possible_evictions {
    ..
    if ..should write.. {
        map_read_guard = None;
        ..write..
        map_read_guard = Some(self.map_internal.read().unwrap());
    }
}
```
yes. I think option 1 is better in that we yield the read lock like before.
done in 8c566a0
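For reference, a compilable sketch of option 1 with simplified stand-in types (the map's value tuple, `write_to_disk`, and `flush_one` are illustrative, not the real code): the guard lives only inside the block, so it is necessarily dropped before any disk I/O.

```rust
use std::{collections::HashMap, sync::RwLock};

fn write_to_disk(entries: &[(u64, u64)], ref_count: u64) {
    println!("wrote {entries:?} with ref_count {ref_count}");
}

fn flush_one(map: &RwLock<HashMap<u64, (u64, u64)>>, key: u64) -> Option<()> {
    // Everything borrowed from the map lives only inside this block.
    let (disk_entry, disk_ref_count) = {
        let map_read_guard = map.read().unwrap();
        let (slot_info, ref_count) = *map_read_guard.get(&key)?;
        ([(key, slot_info)], ref_count)
    }; // map_read_guard is dropped here, before any I/O

    // Unconditionally write to disk with no lock held.
    write_to_disk(&disk_entry, disk_ref_count);
    Some(())
}

fn main() {
    let map = RwLock::new(HashMap::from([(7u64, (100u64, 1u64))]));
    let _ = flush_one(&map, 7);
}
```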
Optimize eviction candidate filtering by rejecting entries with ref_count != 1 during the initial scan phase, before they reach flush_internal or evict_from_cache.

Changes:
- Updated FlushScanResult to store (Pubkey, bool) instead of (Pubkey, bool, RefCount)
- Modified gather_possible_evictions to filter ref_count != 1 early with clear rationale
- Removed ref_count parameter from PossibleEvictions::insert()
- Simplified flush_internal non-dirty path (no ref_count check needed)
- Removed redundant drop() calls before return statements (locks release automatically)
- Added comments explaining automatic lock release for better readability
- Updated test to match new tuple structure

Rationale: in 99% of cases, entries with ref_count != 1 will be rejected later by:
- should_evict_from_mem() for dirty entries
- evict_from_cache() for non-dirty entries

By filtering early, we:
1. Reduce unnecessary work processing candidates that will be rejected
2. Avoid write lock contention in evict_from_cache for non-dirty entries
3. Simplify the code by removing redundant checks and explicit drops
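A small sketch of that early filter, assuming simplified stand-in types (`Entry`, its fields, and this exact `gather_possible_evictions` signature are illustrative): candidates with `ref_count != 1` never enter the result, and the stored tuple shrinks to `(Pubkey, bool)`.

```rust
use std::collections::HashMap;

type Pubkey = u64;

struct Entry {
    ref_count: u64,
    dirty: bool,
}

// Early rejection: entries with ref_count != 1 would be refused later by
// should_evict_from_mem (dirty path) or evict_from_cache (non-dirty path)
// anyway, so skip them at scan time and store only (key, dirty).
fn gather_possible_evictions(map: &HashMap<Pubkey, Entry>) -> Vec<(Pubkey, bool)> {
    map.iter()
        .filter(|(_, v)| v.ref_count == 1)
        .map(|(k, v)| (*k, v.dirty))
        .collect()
}

fn main() {
    let mut map = HashMap::new();
    map.insert(1, Entry { ref_count: 1, dirty: true });
    map.insert(2, Entry { ref_count: 3, dirty: true }); // filtered out
    println!("{:?}", gather_possible_evictions(&map));
}
```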
Rename the variable holding the read guard from 'map' to 'map_read_guard' to make it explicit that it's a guard holding a read lock on map_internal. This improves code readability and follows Rust naming conventions for guards.
Refactor to use a scope block for managing the map_read_guard lifetime, making the control flow clearer and ensuring locks are released as soon as data extraction is complete.

Changes:
- Wrapped map access and data extraction in a scope block
- Locks (map_read_guard and slot_list) automatically release at block end
- Removed explicit drop() calls - no longer needed
- Disk write unconditionally happens after lock release
- Inverted clear_dirty() check for early return on non-dirty path

Benefits:
- Cleaner control flow with explicit scope boundaries
- Impossible to accidentally hold locks during disk I/O
- More idiomatic Rust with automatic guard drop
- Easier to understand lock lifetime at a glance
Is this PR running on a validator against mnb that has the disk index enabled?
Co-authored-by: Brooks <[email protected]>
yes. it is running and the id is 9eNX7h5wHH4GTqESybhHWZVvEX7nzxB6e7Q4J8RfAH2u
Also, I appreciate the PR title change! IMO when I see "refactor", I interpret that to mean no behavioral change. For this PR though, we are changing behavior quite a bit. Wdyt about a title like: "Eliminate entry cloning when flushing index"
Preserve and reword the important comment from the original code that explains how concurrent modifications are handled when clearing the dirty flag and writing to disk.
Cool, looks good!
Problem

We want to avoid using Arc in the in-memory index to save 16 bytes per entry. However, the current disk flush implementation clones entries (requiring Arc), preventing this optimization. This PR changes disk flush to avoid entry cloning and refactors the eviction logic to minimize lock contention.

Summary of Changes
Refactor disk flush to avoid cloning entries and minimize read lock contention, enabling future memory optimization by removing Arc from in-memory index entries.
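As a rough illustration of the 16 bytes mentioned under Problem (my reading, not confirmed by the PR: an `Arc`'s heap allocation carries strong and weak counters, 2 × `usize` = 16 bytes on 64-bit targets; `EntryStandIn` below is made up, not the real index entry):

```rust
use std::sync::Arc;

struct EntryStandIn {
    slot: u64,
    info: u64,
}

fn main() {
    // The Arc itself is one pointer in the map...
    println!(
        "Arc<EntryStandIn>: {} bytes in the map",
        std::mem::size_of::<Arc<EntryStandIn>>()
    );
    // ...but its heap allocation prepends strong + weak counts
    // (2 * usize = 16 bytes on 64-bit) to the payload.
    println!(
        "inline EntryStandIn: {} bytes, no refcount header",
        std::mem::size_of::<EntryStandIn>()
    );
}
```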
Performance Measurement
The mainnet results indicate that the read lock is held for only ~3K µs, compared to 300K–3.5M µs of total update time.
This shows that lock contention is minimal: the time spent under the read lock is roughly 0.1–1% of the overall update duration.
Fixes #