chore: fine tune frequencies and theta sketches#51
Conversation
Signed-off-by: tison <[email protected]>
Signed-off-by: tison <[email protected]>
Signed-off-by: tison <[email protected]>
| mod serialization; | ||
| mod sketch; | ||
|
|
||
| pub use self::serde::ItemsSerde; |
There was a problem hiding this comment.
To @PsiACE,
No need ItemsSerde if we already implement serde over concrete FrequentItemsSketch<i64> and FrequentItemsSketch<String>.
But the C++ impl does have a Serde abstraction you may investigate in https://github.com/apache/datasketches-cpp/blob/7bb979d3/common/include/serde.hpp
There was a problem hiding this comment.
cc @AlexanderSaydakov I'd appreciate it if you can share the story and usage of serde.hpp.
| load_threshold: usize, | ||
| keys: Vec<Option<T>>, | ||
| values: Vec<i64>, | ||
| values: Vec<u64>, |
There was a problem hiding this comment.
All offsets and counts are non-negative. Use u64 for all these places.
| /// Shifts all values by `adjust_amount`. | ||
| /// | ||
| /// This is used during purges to decrement counters. | ||
| pub fn adjust_all_values_by(&mut self, adjust_amount: i64) { |
There was a problem hiding this comment.
Revoke pub when it's only used in the same file. Less publicity, easier reasoning.
| #[should_panic(expected = "count may not be negative")] | ||
| fn test_longs_negative_count_panics() { | ||
| let mut sketch: FrequentItemsSketch<i64> = FrequentItemsSketch::new(8); | ||
| sketch.update_with_count(1, -1); | ||
| } | ||
|
|
||
| #[test] | ||
| #[should_panic(expected = "count may not be negative")] | ||
| fn test_items_negative_count_panics() { | ||
| let mut sketch = FrequentItemsSketch::new(8); | ||
| sketch.update_with_count("a".to_string(), -1); | ||
| } | ||
|
|
There was a problem hiding this comment.
Never panic by type check.
Signed-off-by: tison <[email protected]>
| pub use self::hash_table::ResizeFactor; | ||
| pub use self::sketch::ThetaSketch; | ||
| pub use self::sketch::ThetaSketchBuilder; |
There was a problem hiding this comment.
To @ZENOTME,
ThetaSketchBuilder and ResizeFactor are needed because ThetaSketch::builder exposes the builder, which has a resize_factor method that only accepts a ResizeFactor instance.
Signed-off-by: tison <[email protected]>
| #[derive(Debug, Clone, Copy, PartialEq, Eq)] | ||
| pub enum ResizeFactor { | ||
| /// Do not resize. Sketch will be configured to full size. | ||
| X1, | ||
| /// Resize by factor of 2 | ||
| X2, | ||
| /// Resize by factor of 4 | ||
| X4, | ||
| /// Resize by factor of 8 | ||
| X8, | ||
| } |
There was a problem hiding this comment.
Moved ResizeFactor as a top-level concept. This is shared among ThetaSketch, TupleSketch, and several sampling sketches.
cc @leerho for double checking that put ResizeFactor a top-level exported symbol looks good.
|
@leerho this PR is mergable. I'll have a follow-up to use the new codec utils for frequencies serde impls. But I don't extend this PR further to make it too mixed. |
cc @PsiACE @ZENOTME