Use trie abstractions for batch implementations #616
Looks great; let's go!
Pull Request Overview
This PR refactors the bespoke trie logic in `ord_neu.rs` into reusable "layer" abstractions (`Vals`, `Upds`, and the `UpdsBuilder`). It updates both the value-batch and key-batch implementations (and the `columnar.rs` example) to use these new layers instead of manual offset handling and per-implementation copies of the singleton optimization.
- Introduce a `layers` module with `Vals` and `Upds` containers and an `UpdsBuilder` helper
- Replace manual offset arrays and singleton-optimization code in `OrdValStorage`, `OrdKeyStorage`, and their builders
- Update the `columnar.rs` example to use the new abstractions
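As a rough picture of what a "layer" is, here is a minimal sketch of a `Vals`-style container. It assumes a simplified form with a `Vec<usize>` of offsets and a `Vec<V>` of values; the real layers are generic over their containers (the review below quotes `Upds<O, T, D>`), and the method names `push_list`, `len`, and `get` are hypothetical.

```rust
/// Hypothetical, simplified trie layer: a list of lists stored as one flat
/// `vals` vector plus `offs`, where list `i` is `vals[offs[i]..offs[i + 1]]`.
pub struct Vals<V> {
    offs: Vec<usize>,
    vals: Vec<V>,
}

impl<V> Vals<V> {
    pub fn new() -> Self {
        // A leading zero means every list `i` has both `offs[i]` and `offs[i + 1]`.
        Vals { offs: vec![0], vals: Vec::new() }
    }

    /// Appends one list of values and records where it ends.
    pub fn push_list<I: IntoIterator<Item = V>>(&mut self, list: I) {
        self.vals.extend(list);
        self.offs.push(self.vals.len());
    }

    /// Number of lists in the layer.
    pub fn len(&self) -> usize {
        self.offs.len() - 1
    }

    /// Lower and upper bounds in `self.vals` of the `index`-th list.
    pub fn bounds(&self, index: usize) -> (usize, usize) {
        (self.offs[index], self.offs[index + 1])
    }

    /// The `index`-th list as a slice.
    pub fn get(&self, index: usize) -> &[V] {
        let (lower, upper) = self.bounds(index);
        &self.vals[lower..upper]
    }
}

fn main() {
    let mut layer = Vals::new();
    layer.push_list(vec!["a", "b"]);
    layer.push_list(vec!["c"]);
    assert_eq!(layer.len(), 2);
    assert_eq!(layer.bounds(1), (2, 3));
    assert_eq!(layer.get(0), &["a", "b"]);
    assert_eq!(layer.get(1), &["c"]);
}
```

Keeping the values flat and the grouping in a separate offsets vector means a layer is just two contiguous vectors, with no per-list allocation.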
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `differential-dataflow/src/trace/implementations/ord_neu.rs` | Extracted trie layers into `layers::{Vals, Upds}` and rewrote `val_batch`/`key_batch` to use them |
| `differential-dataflow/examples/columnar.rs` | Updated example builders to leverage `Vals`, `Upds`, and `UpdsBuilder` |
Comments suppressed due to low confidence (2)
differential-dataflow/src/trace/implementations/ord_neu.rs:161
- The doc comment refers to `self.vals`, but this method actually uses `self.offs` to compute offsets into `times`/`diffs`. Update the comment to accurately describe what is being bounded (e.g., "offsets into `times` and `diffs`").
/// Lower and upper bounds in `self.vals` of the indexed list.
differential-dataflow/src/trace/implementations/ord_neu.rs:143
- [nitpick] The abbreviation `Upds` may be unclear to new readers. Consider renaming it to a more descriptive name (e.g., `Updates`) to align with `Vals` and improve readability.
pub struct Upds<O, T, D> {
Our `ord_neu.rs` batch/trace implementations use a manual implementation of a short trie. Rather than doing the work in bespoke methods that we copy and paste, this PR extracts the logic into trie "layers" that can be composed. For example, the "singleton optimization" for updates lived in four locations, with a fifth in `rhh.rs`. This change consolidates it into one location, the update trie layer, used by all four.

This is the first step in trying to make these types more "trie-forward", revealing their layered structure rather than hiding it behind abstractions that conceal the structure. The goal for the moment is to get a sense of what the code looks like when you compartmentalize and modularize the logic and data. So far, pretty good!
Historically we had something similar, though it was more complicated than it needed to be. The reason seems to be that we previously represented trie layers as pairs `(Vec<T>, Vec<usize>)`: a list of keys and their offsets into the next layer. It turns out that `(Vec<usize>, Vec<T>)` is a better representation, with fewer cross-layer dependencies.
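One way to see the composition is a hypothetical three-layer trie (keys, vals, diffs) built from sorted triples, with each layer in the `(offsets, values)` form. Under this sketch's assumptions, a layer's offsets only describe how its own values are grouped into lists, and value `j` of one layer owns list `j` of the layer below; that shared index is the only link between layers. The `Layer`, `build`, and `flatten` names are illustrative, not the PR's API.

```rust
use std::ops::Range;

/// Hypothetical layer in the `(offsets, values)` form: list `i` is
/// `vals[offs[i]..offs[i + 1]]`. A layer knows nothing about its neighbors.
struct Layer<T> {
    offs: Vec<usize>,
    vals: Vec<T>,
}

impl<T> Layer<T> {
    fn new() -> Self {
        Layer { offs: vec![0], vals: Vec::new() }
    }
    /// Index range in `vals` of list `i`.
    fn range(&self, i: usize) -> Range<usize> {
        self.offs[i]..self.offs[i + 1]
    }
    /// Closes the list currently being filled.
    fn seal(&mut self) {
        self.offs.push(self.vals.len());
    }
}

/// Builds a key -> val -> diff trie from triples sorted by `(key, val)`.
fn build<K: PartialEq + Clone, V: PartialEq + Clone, D: Clone>(
    sorted: &[(K, V, D)],
) -> (Layer<K>, Layer<V>, Layer<D>) {
    let (mut keys, mut vals, mut diffs) = (Layer::new(), Layer::new(), Layer::new());
    for (i, (k, v, d)) in sorted.iter().enumerate() {
        let new_key = i == 0 || sorted[i - 1].0 != *k;
        let new_val = new_key || sorted[i - 1].1 != *v;
        if i > 0 && new_val {
            diffs.seal(); // close the previous (key, val)'s diffs
        }
        if i > 0 && new_key {
            vals.seal(); // close the previous key's vals
        }
        if new_key {
            keys.vals.push(k.clone());
        }
        if new_val {
            vals.vals.push(v.clone());
        }
        diffs.vals.push(d.clone());
    }
    if !sorted.is_empty() {
        diffs.seal();
        vals.seal();
    }
    keys.seal(); // the key layer holds exactly one list
    (keys, vals, diffs)
}

/// Walks the trie top-down, recovering the original triples.
fn flatten<K: Clone, V: Clone, D: Clone>(
    keys: &Layer<K>,
    vals: &Layer<V>,
    diffs: &Layer<D>,
) -> Vec<(K, V, D)> {
    let mut out = Vec::new();
    for k in keys.range(0) {
        for v in vals.range(k) {
            for d in diffs.range(v) {
                out.push((keys.vals[k].clone(), vals.vals[v].clone(), diffs.vals[d].clone()));
            }
        }
    }
    out
}

fn main() {
    let data = vec![("a", "x", 1), ("a", "x", 5), ("a", "y", 2), ("b", "x", 3)];
    let (keys, vals, diffs) = build(&data);
    assert_eq!(keys.offs, vec![0, 2]);
    assert_eq!(vals.offs, vec![0, 2, 3]);
    assert_eq!(diffs.offs, vec![0, 2, 3, 4]);
    assert_eq!(flatten(&keys, &vals, &diffs), data);
}
```

Each layer is sealed on its own schedule, and navigating from one layer to the next needs only an index, not knowledge of the next layer's contents.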