Commit c9c8fb7

Pushdown some expressions to Dict layout reader (#8341)
When we access the values of a Dict layout reader, it canonicalizes them and stores them in a SharedArray. This means we always pay the cost of canonicalization, which in turn means we can't do #8310.

To solve this, we need to apply some expressions to the values array before canonicalizing it. However, we can't push down arbitrary expressions, because for some of them it may be more beneficial to apply them over the canonicalized array. One example is LIKE over a Dict array with few codes used: applying LIKE to the whole values array is not beneficial.

This PR adds a hardcoded internal is_negative_cost estimation for expressions that we want to push down before canonicalization. A hint for identifying them is that they don't depend on the size of an individual input. For example, for every string, len(string) doesn't read the string itself but only its metadata, and is thus O(1) per input.

We also don't push down fallible (like cast) or null-sensitive (like IS NULL) expressions, because we want to propagate errors at the call site rather than upfront.

Signed-off-by: Mikhail Kot <[email protected]>
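The pushdown idea in the message can be illustrated with a toy model. The following is a minimal sketch, not Vortex code: `ToyDict`, `apply_after_decode`, `apply_pushed_down`, and `byte_len` are hypothetical names, and the "canonicalization" here is just materializing one string slice per row. It shows that a function which is cheap per value gives the same result whether it runs once per dictionary value or once per decoded row.

```rust
/// Toy dictionary-encoded string array: `codes[i]` indexes into `values`.
/// Hypothetical type for illustration only; this is not Vortex's Dict array.
struct ToyDict {
    values: Vec<String>,
    codes: Vec<usize>,
}

impl ToyDict {
    /// "Canonicalize first": materialize one entry per row, then apply `f` per row.
    fn apply_after_decode<T>(&self, f: impl Fn(&str) -> T) -> Vec<T> {
        let decoded: Vec<&str> = self
            .codes
            .iter()
            .map(|&c| self.values[c].as_str())
            .collect();
        decoded.into_iter().map(|s| f(s)).collect()
    }

    /// "Push down": apply `f` once per dictionary value, then gather by code.
    fn apply_pushed_down<T: Clone>(&self, f: impl Fn(&str) -> T) -> Vec<T> {
        let mapped: Vec<T> = self.values.iter().map(|v| f(v.as_str())).collect();
        self.codes.iter().map(|&c| mapped[c].clone()).collect()
    }
}

/// Reads only the length metadata of the string, not its bytes.
fn byte_len(s: &str) -> usize {
    s.len()
}
```

In this model the pushed-down variant calls `f` once per dictionary value instead of once per row, and the gather then moves small integers rather than strings. Whether that is a win for a given function is exactly what the message's cost estimation is meant to decide.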
1 parent 31fda42 commit c9c8fb7

4 files changed

Lines changed: 283 additions & 9 deletions

vortex-array/src/expr/analysis/annotation.rs

Lines changed: 23 additions & 0 deletions
@@ -42,6 +42,25 @@ pub fn descendent_annotations<A: AnnotationFn>(
     let mut visitor = AnnotationVisitor {
         annotations: Default::default(),
         annotate,
+        propagate_up: true,
+    };
+    expr.accept(&mut visitor).vortex_expect("Infallible");
+    visitor.annotations
+}
+
+/// Walk the expression tree and annotate each expression with zero or more
+/// annotations.
+///
+/// Returns a map of each expression to all annotations. Annotations of
+/// children are not propagated to parents.
+pub fn direct_annotations<A: AnnotationFn>(
+    expr: &Expression,
+    annotate: A,
+) -> Annotations<'_, A::Annotation> {
+    let mut visitor = AnnotationVisitor {
+        annotations: Default::default(),
+        annotate,
+        propagate_up: false,
     };
     expr.accept(&mut visitor).vortex_expect("Infallible");
     visitor.annotations
@@ -50,6 +69,7 @@ pub fn descendent_annotations<A: AnnotationFn>(
 struct AnnotationVisitor<'a, A: AnnotationFn> {
     annotations: Annotations<'a, A::Annotation>,
     annotate: A,
+    propagate_up: bool,
 }

 impl<'a, A: AnnotationFn> NodeVisitor<'a> for AnnotationVisitor<'a, A> {
@@ -70,6 +90,9 @@ impl<'a, A: AnnotationFn> NodeVisitor<'a> for AnnotationVisitor<'a, A> {
     }

     fn visit_up(&mut self, node: &'a Expression) -> VortexResult<TraversalOrder> {
+        if !self.propagate_up {
+            return Ok(TraversalOrder::Continue);
+        }
         let child_annotations = node
             .children()
             .iter()
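The effect of the `propagate_up` flag can be sketched on a toy tree. This is a simplified model under stated assumptions, not the Vortex visitor: `Node`, `annotations`, `walk`, `leaf_name`, and `sample_tree` are hypothetical, nodes are keyed by unique names, and every node is visited (the real `visit_down` logic is not shown in this diff and may differ).

```rust
use std::collections::{BTreeSet, HashMap};

/// Toy expression node; hypothetical stand-in for Vortex's `Expression`.
struct Node {
    name: &'static str,
    children: Vec<Node>,
}

type AnnotationMap = HashMap<&'static str, BTreeSet<&'static str>>;

/// Post-order walk: record each node's own annotations and, only when
/// `propagate_up` is true, merge in the annotations of its descendants.
fn walk(
    node: &Node,
    annotate: &dyn Fn(&Node) -> Vec<&'static str>,
    propagate_up: bool,
    out: &mut AnnotationMap,
) -> BTreeSet<&'static str> {
    let mut own: BTreeSet<&'static str> = annotate(node).into_iter().collect();
    for child in &node.children {
        let child_set = walk(child, annotate, propagate_up, out);
        if propagate_up {
            own.extend(child_set);
        }
    }
    out.insert(node.name, own.clone());
    own
}

/// Annotate a whole tree; node names are assumed unique in this sketch.
fn annotations(
    root: &Node,
    annotate: &dyn Fn(&Node) -> Vec<&'static str>,
    propagate_up: bool,
) -> AnnotationMap {
    let mut out = HashMap::new();
    walk(root, annotate, propagate_up, &mut out);
    out
}

/// Example annotation function: leaves are annotated with their own name.
fn leaf_name(n: &Node) -> Vec<&'static str> {
    if n.children.is_empty() { vec![n.name] } else { vec![] }
}

/// and(col_a, col_b)
fn sample_tree() -> Node {
    Node {
        name: "and",
        children: vec![
            Node { name: "col_a", children: vec![] },
            Node { name: "col_b", children: vec![] },
        ],
    }
}
```

With `propagate_up = true` the root of `sample_tree()` collects both leaf annotations, which mirrors `descendent_annotations`. With `propagate_up = false` the root keeps only what the annotation function returned for it directly, which mirrors `direct_annotations`.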

vortex-array/src/expr/transform/partition.rs

Lines changed: 14 additions & 2 deletions
@@ -3,6 +3,7 @@

 use std::fmt::Display;
 use std::fmt::Formatter;
+use std::hash::Hash;

 use itertools::Itertools;
 use vortex_error::VortexExpect;
@@ -49,11 +50,22 @@ where
 {
     // Annotate each expression with the annotations that any of its descendent expressions have.
     let annotations = descendent_annotations(&expr, annotate_fn);
+    partition_annotations(expr.clone(), scope, annotations)
+}

+pub fn partition_annotations<A>(
+    expr: Expression,
+    scope: &DType,
+    annotations: Annotations<A>,
+) -> VortexResult<PartitionedExpr<A>>
+where
+    A: Display + Clone + Eq + Hash,
+    FieldName: From<A>,
+{
     // Now we split the original expression into sub-expressions based on the annotations, and
     // generate a root expression to re-assemble the results.
-    let mut splitter = StructFieldExpressionSplitter::<A::Annotation>::new(&annotations);
-    let root = expr.clone().rewrite(&mut splitter)?.value;
+    let mut splitter = StructFieldExpressionSplitter::<A>::new(&annotations);
+    let root = expr.rewrite(&mut splitter)?.value;

     let mut partitions = Vec::with_capacity(splitter.sub_expressions.len());
     let mut partition_annotations = Vec::with_capacity(splitter.sub_expressions.len());

vortex-array/src/scalar_fn/mod.rs

Lines changed: 18 additions & 0 deletions
@@ -9,6 +9,10 @@

 use vortex_session::registry::Id;

+use crate::scalar_fn::fns::byte_length::ByteLength;
+use crate::scalar_fn::fns::get_item::GetItem;
+use crate::scalar_fn::fns::literal::Literal;
+
 mod vtable;
 pub use vtable::*;

@@ -48,3 +52,17 @@ mod sealed {
     /// This can be the **only** implementor for [`super::typed::DynScalarFn`].
     impl<V: ScalarFnVTable> Sealed for TypedScalarFnInstance<V> {}
 }
+
+/// A scalar function has a negative cost if applying it to an array and
+/// canonicalizing is cheaper than canonicalizing an array and applying it.
+///
+/// Example of negative cost expressions are byte_length() and get_item() since
+/// they don't depend on input size.
+///
+/// Example of non-negative cost expression is like() as it's linear over
+/// individual input.
+pub fn is_negative_cost(id: ScalarFnId) -> bool {
+    id == ScalarFnVTable::id(&ByteLength)
+        || id == ScalarFnVTable::id(&GetItem)
+        || id == ScalarFnVTable::id(&Literal)
+}
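How such an allowlist might gate a pushdown decision can be sketched as follows. The reader-side code that consumes `is_negative_cost` is not shown on this page, so everything here is an assumption: `FnId`, `Expr`, and `fully_pushable` are hypothetical, and the toy `is_negative_cost` only mirrors the three functions named in the diff using a plain enum instead of `ScalarFnId`.

```rust
/// Hypothetical function identifiers; the real code compares `ScalarFnId`s.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum FnId {
    ByteLength,
    GetItem,
    Literal,
    Like,
    Cast,
}

/// Toy mirror of the allowlist in the diff above.
fn is_negative_cost(id: FnId) -> bool {
    matches!(id, FnId::ByteLength | FnId::GetItem | FnId::Literal)
}

/// Toy expression tree: a function applied to child expressions.
struct Expr {
    id: FnId,
    children: Vec<Expr>,
}

/// Assumed policy for this sketch: a subtree may be applied before
/// canonicalization only if every function in it is negative-cost.
fn fully_pushable(expr: &Expr) -> bool {
    is_negative_cost(expr.id) && expr.children.iter().all(fully_pushable)
}
```

Under this assumed policy, `byte_length(get_item(..))` would be pushed down as a whole, while an expression containing `like` or `cast` anywhere in the subtree would be left to run over the canonicalized array.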
