dbsp: tracking bloom filter #5500
Using the following SQL: ... varying runtime configuration
(The above vary per run.) A possible alteration to the program is to increase the tuple size by adding:
I was picturing this being reported as a metric on spines, in AsyncMerger::metadata, rather than through the log. If it's reported in the per-spine metadata then we'll get it in profiles "for free" without having to look through the log.
I think I'd use a pair of AtomicU64s instead of Mutexes. They should be cheaper.
f28c4d6 to 14c154a
Pull request overview
This PR introduces tracking capabilities for Bloom filters used in DBSP batch operations, capturing hit/miss statistics for performance analysis. The changes enable monitoring of Bloom filter effectiveness across the storage layer and make these metrics visible in the Web Console profiler.
Changes:
- Adds `TrackingBloomFilter` wrapper around `BloomFilter` to count hits and misses using atomic counters
- Refactors `filter_size()` method to `filter_stats()` throughout the codebase to return comprehensive statistics
- Updates the profiler to display three new metadata fields: hits, misses, and hit rate
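The wrapper idea in the first bullet, combined with the review suggestion to prefer a pair of AtomicU64s over Mutexes, can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `ToyBloomFilter`, `TrackingFilter`, and all method names are hypothetical stand-ins, and the toy filter uses two seeded `DefaultHasher` hashes purely so the example is self-contained.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::atomic::{AtomicU64, Ordering};

/// Toy stand-in for the real Bloom filter (hypothetical, illustration only).
struct ToyBloomFilter {
    bits: Vec<bool>,
}

impl ToyBloomFilter {
    fn new(num_bits: usize) -> Self {
        Self { bits: vec![false; num_bits] }
    }

    /// Derive two bit positions per key by hashing it with two seeds.
    fn positions<K: Hash>(&self, key: &K) -> [usize; 2] {
        let mut out = [0usize; 2];
        for (seed, slot) in out.iter_mut().enumerate() {
            let mut hasher = DefaultHasher::new();
            (seed as u64).hash(&mut hasher);
            key.hash(&mut hasher);
            *slot = (hasher.finish() % self.bits.len() as u64) as usize;
        }
        out
    }

    fn insert<K: Hash>(&mut self, key: &K) {
        for pos in self.positions(key) {
            self.bits[pos] = true;
        }
    }

    fn contains<K: Hash>(&self, key: &K) -> bool {
        self.positions(key).iter().all(|&pos| self.bits[pos])
    }
}

/// Wrapper that counts lookup outcomes (assumed shape, not the PR's type).
/// Atomics let `contains` take `&self`, so readers need no lock.
struct TrackingFilter {
    inner: ToyBloomFilter,
    hits: AtomicU64,
    misses: AtomicU64,
}

impl TrackingFilter {
    fn new(inner: ToyBloomFilter) -> Self {
        Self { inner, hits: AtomicU64::new(0), misses: AtomicU64::new(0) }
    }

    /// A "hit" means the filter reports the key as possibly present.
    fn contains<K: Hash>(&self, key: &K) -> bool {
        let found = self.inner.contains(key);
        let counter = if found { &self.hits } else { &self.misses };
        // Relaxed suffices: the counters are statistics, not synchronization.
        counter.fetch_add(1, Ordering::Relaxed);
        found
    }

    /// Returns `(hits, misses)` as observed so far.
    fn counts(&self) -> (u64, u64) {
        (self.hits.load(Ordering::Relaxed), self.misses.load(Ordering::Relaxed))
    }
}
```

With this shape, a caller builds the inner filter, wraps it once, and queries through the wrapper; `counts()` can then be read at any time, for example when the statistics are collected for a profile.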
Reviewed changes
Copilot reviewed 23 out of 23 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| js-packages/profiler-lib/src/profile.ts | Adds new Bloom filter metrics to measurement categories |
| crates/dbsp/src/storage/tracking_bloom_filter.rs | Implements new TrackingBloomFilter with hit/miss tracking |
| crates/dbsp/src/storage/file/writer.rs | Updates writer to use TrackingBloomFilter |
| crates/dbsp/src/storage/file/reader.rs | Updates reader to use TrackingBloomFilter and expose stats |
| crates/dbsp/src/storage/file/format.rs | Updates format conversions for TrackingBloomFilter |
| crates/dbsp/src/trace.rs | Changes BatchReader trait method from filter_size() to filter_stats() |
| crates/dbsp/src/trace/spine_async.rs | Aggregates filter stats across batches and adds new metadata fields |
| crates/dbsp/src/trace/spine_async/snapshot.rs | Updates snapshot to use filter_stats() |
| crates/dbsp/src/trace/test/test_batch.rs | Updates test batch to return default stats |
| crates/dbsp/src/trace/ord/vec/*.rs | Updates vec-based batches to return default stats |
| crates/dbsp/src/trace/ord/file/*.rs | Updates file-based batches to delegate to file reader |
| crates/dbsp/src/trace/ord/fallback/*.rs | Updates fallback batches to delegate to inner implementation |
| crates/dbsp/src/circuit/metadata.rs | Adds Float variant to MetaItem enum |
| crates/dbsp/src/storage.rs | Exports new tracking_bloom_filter module |
Ready for review!
"bounds",
"Bloom filter size",
"Bloom filter bits/key"]);
"Bloom filter bits/key",
these will need to change once we merge #5514, but let's see which PR lands first
Let's see, this one is currently queued, so if it (hopefully) passes CI, the other PR will land after.
14c154a to 9e31ddf
"Bloom filter bits/key",
"Bloom filter hits",
"Bloom filter misses",
"Bloom filter hit rate"]);
hit rate should move below, at the PercentValue case.
Thanks! I think you meant line 489 needs to be moved upward, line 380 is the category mapping.
9e31ddf to ba8cca4
blp left a comment
Well done, thank you
Tracks for each Bloom filter the number of hits and misses. They are included in the statistics that can be retrieved by the spine. The spine sums up over all the Bloom filter statistics it receives from its batches, which loses insight into individual batch statistics.
The spine reports three new metadata fields:
- "Bloom filter hits": usize integer
- "Bloom filter misses": usize integer
- "Bloom filter hit rate": percentage
The Web Console profiler is updated to display the new metadata fields.
Signed-off-by: Simon Kassing <[email protected]>
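The aggregation described in this commit message (summing per-batch statistics in the spine and deriving a hit rate) might look roughly like the sketch below. It is an assumption-laden illustration rather than the PR's code: the `FilterStats` struct, its field names, and the `aggregate` helper are hypothetical. It also shows why per-batch insight is lost: only the sums survive.

```rust
use std::ops::AddAssign;

/// Hypothetical per-batch Bloom filter statistics; field names are assumptions.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct FilterStats {
    size_bytes: usize,
    hits: usize,
    misses: usize,
}

impl FilterStats {
    /// Hit rate as a percentage, or `None` if the filter was never queried.
    fn hit_rate_percent(&self) -> Option<f64> {
        let total = self.hits + self.misses;
        if total == 0 {
            None
        } else {
            Some(100.0 * self.hits as f64 / total as f64)
        }
    }
}

impl AddAssign for FilterStats {
    fn add_assign(&mut self, other: Self) {
        self.size_bytes += other.size_bytes;
        self.hits += other.hits;
        self.misses += other.misses;
    }
}

/// Sum the statistics of all batches; individual batch values are not kept.
fn aggregate(batches: &[FilterStats]) -> FilterStats {
    let mut total = FilterStats::default();
    for stats in batches {
        total += *stats;
    }
    total
}
```

Returning `Option<f64>` for the rate is one way to avoid dividing by zero for a spine whose filters were never queried; how the actual implementation reports that case is not stated here.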
ba8cca4 to 13d975d

