-
Notifications
You must be signed in to change notification settings - Fork 3.2k
Derive effective predicate for Values #743
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
54add8e
to
53e0e2a
Compare
@@ -15,12 +15,15 @@ | |||
|
|||
import com.google.common.collect.ImmutableBiMap; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do something about excessive filter:
- Filter[filterPredicate = ("field" IN ('F', 'O'))]
I saw this problem before with table scans and partition columns.
When partition values from one side of join were pushed to the other side, then there is always filter above table scan with col IN partition_values
.
I tried to tackle it in starburstdata/facebook-presto@7081f3e
but it didn't work unfortunately because inference cannot easily reduce
inheritedValue = 1 OR inheritedValue = 2
inheritedValue = tableScanValue
tableScanValue = 1 OR tableScanValue = 2
to simply true
(after simplification) and TupleDomain
doesn't have true contains
method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, there's not much we can do about it, and I haven't investigated it further. On the other hand, it's a filter over a ValuesNode, so it's inexpensive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could be meaningful though if filter contains 100s of partition values, but it's out of scope of this PR
53e0e2a
to
650ff6f
Compare
return TRUE_LITERAL; | ||
} | ||
|
||
ExpressionInterpreter interpreter = ExpressionInterpreter.expressionOptimizer(value, metadata, session, typeAnalyzer.getTypes(session, types, value)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe ValuesNode
should only contain Literal
as rows (I think it implicitly does)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventuallly, we should probably have a physical vs logical values node or something else to represent that. Currently, a values node inside a correlated subquery could have a reference to the outer scope.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, a values node inside a correlated subquery could have a reference to the outer scope.
That's a problem on it's own. See: #473
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It’s not a “problem” per se. It’s legitimate for a query to reference variables from the enclosing scope. After all, filters, projections, subqueries, etc, are nothing but capturing lamba expressions (if you squint a little)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After all, filters, projections, subqueries, etc, are nothing but capturing lamba expressions (if you squint a little)
Still, I don't think correlated symbols should appear out of nothing in any node that contains Expression
, but should rather be injected into subquery via some kind of special ProjectCorrelations
node. This way subquery support can be narrowed down in various places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be a “bind” operation, just like lambda(x) { ... } binds x.
The next generation of the IR that I’ve been prototyping for a while will likely unify both concepts. Anyway, topic for another time :)
for (int row = 0; row < node.getRows().size(); row++) { | ||
Expression value = node.getRows().get(row).get(column); | ||
|
||
if (!DeterminismEvaluator.isDeterministic(value)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you could still derive domains for other symbols
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, the predicate needs to be an “upper bound”. If we had a single row, we could do that, but otherwise, there’s no way to formulate such a predicate.
650ff6f
to
1a11e03
Compare
Use tables to store the points/polygons used in the tests. This is needed for an upcoming change to derivery predicates from inline tables, which causes the plans used in the tests to constant-fold all the way.
This allows Join inference to push the derived predicate to the other side when one side is based on a Values node. For a query like: SELECT 1 FROM orders JOIN (values 'O', 'F') t(x) ON orders.orderstatus = t.x; It changes the plan from: - Output[_col0] - Project[] - InnerJoin[("orderstatus" = "field")] - TableScan[tpch:orders:sf0.01] orderstatus := tpch:orderstatus :: [[F], [O], [P]] - Values ('O') ('F') to: - Output[_col0] - Project[] - InnerJoin[("orderstatus" = "field")] - ScanFilter[table = tpch:orders:sf0.01, filterPredicate = ("orderstatus" IN ('F', 'O'))] orderstatus := tpch:orderstatus :: [[F], [O]] - Filter[filterPredicate = ("field" IN ('F', 'O'))] - Values ('O') ('F')
1a11e03
to
4163de2
Compare
This allows Join inference to push the derived predicate
to the other side when one side is based on a Values node.
For a query like:
It changes the plan from:
to: