Remove Limit/TopN/Sort/DistinctLimit node if its source is a scalar #441
presto-main/src/main/java/io/prestosql/sql/planner/iterative/rule/PruneLimitOverScalar.java
presto-main/src/main/java/io/prestosql/sql/planner/iterative/rule/PruneTopNOverScalar.java
presto-main/src/main/java/io/prestosql/sql/planner/iterative/rule/PruneSortOverScalar.java
Force-pushed from f32461d to f11830c
@findepi I have updated it. Please let me know if any more changes are needed.
There is also DistinctLimitNode

public class PruneLimitOverScalar
        implements Rule<LimitNode>
{
    private static final Pattern<LimitNode> PATTERN = limit().matching(limitNode -> limitNode.getCount() != 0L);

@@ -64,6 +64,7 @@
import io.prestosql.sql.planner.iterative.rule.PruneJoinChildrenColumns;

Update the commit message, here and in the other commits.

/**
 * Remove TopN node when source is scalar
 */

Remove comments that do not add value to the code; the name PruneTopNOverScalar says the same thing.

p.limit(
        10,
        p.aggregation(
                (Consumer<PlanBuilder.AggregationBuilder>) aggregationBuilder -> {

See #437 (comment)

p.limit(
        10,
        p.aggregation(
                (Consumer<PlanBuilder.AggregationBuilder>) aggregationBuilder -> {

Same here

p.limit(
        0,
        p.aggregation(
                (Consumer<PlanBuilder.AggregationBuilder>) aggregationBuilder -> {

And here

import static io.prestosql.sql.planner.plan.Patterns.limit;

/**
 * Remove Limit node when source is scalar

"... when the subplan is guaranteed to produce fewer rows than the limit"

/**
 * Remove Limit node when source is scalar
 */
public class PruneLimitOverScalar

I'd call this "RemoveRedundantLimit"
Force-pushed from 53d2872 to 38f4a2a

public class RemoveRedundantLimit
        implements Rule<LimitNode>
{
    private static final Pattern<LimitNode> PATTERN = limit().with(count().matching(limitCount -> limitCount != 0));

What is wrong with limitCount == 0?

Since both the RemoveRedundantLimit and EvaluateZeroLimit rules are in the same optimizer, we don't want a LimitNode with a count of 0 to be removed when its source is scalar, because the EvaluateZeroLimit rule might then not fire.

Maybe we could merge these two rules together; they are very simple and they both remove a redundant limit.
Another option is to add a third function that replaces any plan for which isAtMost(node, context.getLookup(), 0) holds with Values. Then the order would not matter.
Another option is add third function which replaces any plan that isAtMost(node, context.getLookup(), 0) with Values. Then order would not bother.
I like that.
A few comments. Also, can you add some tests similar to those in presto-main/src/test/java/io/prestosql/sql/query?

public Result apply(TopNNode node, Captures captures, Context context)
{
    if (isAtMost(node.getSource(), context.getLookup(), node.getCount())) {
        return Result.ofPlanNode(node.getSource());

TopN cannot be removed blindly. The ordering matters. You can replace it with a SortNode in this case.
It's only safe to remove if the row count is guaranteed to be 1.
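
To make the ordering point concrete, here is a minimal Java sketch. It does not use any Presto classes: a "plan" is modeled simply as the list of rows it produces, and the names TopNRemovalSketch, topN, and sort are hypothetical. It only illustrates that TopN with a count at least as large as the input behaves like a plain sort, and that it is interchangeable with its source only when there is at most one row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Toy model (assumption): a plan is represented by the rows it produces.
final class TopNRemovalSketch
{
    private TopNRemovalSketch() {}

    // What TopN(count) computes: sort the rows, then keep the first `count`.
    static List<Integer> topN(List<Integer> rows, int count)
    {
        List<Integer> sorted = sort(rows);
        return new ArrayList<>(sorted.subList(0, Math.min(count, sorted.size())));
    }

    // What a plain Sort computes: the same rows in ascending order.
    static List<Integer> sort(List<Integer> rows)
    {
        List<Integer> sorted = new ArrayList<>(rows);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args)
    {
        List<Integer> rows = Arrays.asList(3, 1, 2);
        // Count exceeds the row count: TopN is equivalent to Sort ...
        System.out.println(topN(rows, 10).equals(sort(rows)));
        // ... but not to the unsorted source, so it cannot simply be dropped.
        System.out.println(topN(rows, 10).equals(rows));
        // With a single row, ordering is irrelevant and the source is enough.
        List<Integer> single = Arrays.asList(7);
        System.out.println(topN(single, 10).equals(single));
    }
}
```

In the real rule the row-count bound would come from the cardinality utilities discussed in this thread rather than from materialized rows.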

import static io.prestosql.sql.planner.optimizations.QueryCardinalityUtil.isScalar;
import static io.prestosql.sql.planner.plan.Patterns.sort;

public class PruneSortOverScalar

Rename to RemoveSingleRowSort. "Scalar" is not a proper classification for a subquery; it is a feature of the context in which the subquery is used, i.e., "in a place that expects a scalar value". (The "isScalar" method in QueryCardinalityUtil is misnamed.)

import static io.prestosql.sql.planner.plan.Patterns.DistinctLimit.limit;
import static io.prestosql.sql.planner.plan.Patterns.distinctLimit;

public class PruneDistinctLimitOverScalar

Rename to RemoveSingleRowDistinctLimit
Force-pushed from 5b1c546 to 3ec0abb

anyNot(
        LimitNode.class,
        anyTree(
                tableScan("orders")))));

Another approach would be to use something similar to io.prestosql.sql.planner.TestLogicalPlanner#assertPlanContainsNoApplyOrAnyJoin, where you would check that there is no Limit or TopN in the plan. The plan assertion is a bit simpler that way. Reasoning about an anyNot that wraps anyTree might not be trivial, and I am not sure the pattern is correct.

@@ -79,7 +79,7 @@ public void testUnsupportedSubqueriesWithCoercions()
     {
         // coercion from subquery symbol type to correlation type
         assertions.assertFails(
-                "select (select count(*) from (values 1) t(a) where t.a=t2.b limit 1) from (values 1.0) t2(b)",
+                "select (select count(*) from (values 1) t(a) where t.a=t2.b GROUP BY t.a limit 1) from (values 1.0) t2(b)",

Please uppercase SQL keywords, here and in the other tests below, as a separate commit before this change.

         // explicit LIMIT in subquery
         assertQueryFails(
-                "SELECT (SELECT count(*) FROM (VALUES (7,1)) t(orderkey, value) WHERE orderkey = corr_key LIMIT 1) FROM (values 7) t(corr_key)",
+                "SELECT (SELECT count(*) FROM (VALUES (7,1)) t(orderkey, value) WHERE orderkey = corr_key GROUP BY value LIMIT 1) FROM (values 7) t(corr_key)",

Nice, you have just extended support for correlated subqueries a bit ;)

{
    assertPlan(
            "SELECT count(*) FROM orders ORDER BY 1",
            output(

Same comments
presto-main/src/test/java/io/prestosql/sql/planner/TestLogicalPlanner.java

{
    assertPlan(
            "SELECT count(*) FROM orders ORDER BY 1 LIMIT 10",
            output(

Same comments

public class RemoveSingleRowDistinctLimit
        implements Rule<DistinctLimitNode>
{
    private static final Pattern<DistinctLimitNode> PATTERN = distinctLimit().with(limit().matching(limit -> limit != 0));

Could you extend this rule to support the regular MarkDistinctNode as well?

assertPlan(
        "SELECT distinct(c) FROM (SELECT count(*) as c FROM orders GROUP BY orderkey) LIMIT 10",
        output(
                node(DistinctLimitNode.class,

DistinctLimitNode was not pruned.
But the subplan here is not a scalar
Right, I overlooked GROUP BY.

public Result apply(DistinctLimitNode node, Captures captures, Context context)
{
    if (isScalar(node.getSource(), context.getLookup())) {
        return Result.ofPlanNode(node.getSource());

I think you are losing output symbols here. See hashSymbol in DistinctLimitNode. Also notice that DistinctLimitNode::getOutputSymbols returns distinctSymbols, which might be different from node.getSource().getOutputSymbols().
I wonder why the test didn't find that already, so please make sure that there is test coverage for that. Can you please run your test from TestLogicalPlanner with coverage or debugging to see if your rule was triggered?

Yes, but the hashSymbol in DistinctLimitNode will be added only if we set optimize_hash_generation to true, and IIRC it will be added in HashGenerationOptimizer, which will be invoked after this optimizer. So we can safely assume that hashSymbol will be empty.

Then please verify in the rule that hashSymbol is empty. Also verify that DistinctLimitNode::getOutputSymbols are the same as node.getSource().getOutputSymbols().
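
As a rough illustration of the two checks requested above, the following self-contained Java sketch models them on plain values. It is not the rule's actual code: DistinctLimitGuardSketch and safeToReplaceWithSource are hypothetical names, symbols are represented as strings, and the comparison is assumed to be order-sensitive.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for the guard; symbols are modeled as plain strings.
final class DistinctLimitGuardSketch
{
    private DistinctLimitGuardSketch() {}

    // The node may be replaced by its source only if no hash symbol has been
    // attached and the node's outputs are exactly the source's outputs.
    static boolean safeToReplaceWithSource(
            Optional<String> hashSymbol,
            List<String> distinctSymbols,
            List<String> sourceOutputSymbols)
    {
        return !hashSymbol.isPresent() && distinctSymbols.equals(sourceOutputSymbols);
    }

    public static void main(String[] args)
    {
        List<String> source = Arrays.asList("c");
        // Same outputs, no hash symbol: replacement keeps the plan's outputs intact.
        System.out.println(safeToReplaceWithSource(Optional.empty(), Arrays.asList("c"), source));
        // A hash symbol is present: the rule should not fire.
        System.out.println(safeToReplaceWithSource(Optional.of("hash"), Arrays.asList("c"), source));
        // The source produces an extra column: outputs would change.
        System.out.println(safeToReplaceWithSource(Optional.empty(), Arrays.asList("c"), Arrays.asList("c", "d")));
    }
}
```

Whether the real check should fail the rule silently or assert loudly is a separate design choice that this sketch does not address.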
Force-pushed from 6bd6771 to fd9e2a9

{
    assertFalse(
            searchFrom(plan("SELECT distinct(c) FROM (SELECT count(*) as c FROM orders) LIMIT 10", LogicalPlanner.Stage.OPTIMIZED).getRoot())
                    .where(isInstanceOfAny(DistinctLimitNode.class))

Also check that there is no MarkDistinctNode. You could also extract a method from this and reuse it in the assertion below, like:
assertFalse(planContainsDistinctNode("SELECT distinct(c) FROM (SELECT count(*) as c FROM orders) LIMIT 10"));
assertTrue(planContainsDistinctNode("SELECT distinct(c) FROM (SELECT count(*) as c FROM orders GROUP BY orderkey) LIMIT 10"));
Please do the same for TopN and Sort.
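
The shape of such a helper can be sketched without the planner. The Java example below is a toy: Node is a hypothetical stand-in for a plan node (a kind label plus children), and planContainsNode walks the tree depth-first. The real helper would plan the query and use the searchFrom utility shown in the snippet above instead.

```java
import java.util.Arrays;
import java.util.List;

final class PlanSearchSketch
{
    private PlanSearchSketch() {}

    // Hypothetical plan node: a kind label and its source nodes.
    static final class Node
    {
        final String kind;
        final List<Node> sources;

        Node(String kind, Node... sources)
        {
            this.kind = kind;
            this.sources = Arrays.asList(sources);
        }
    }

    // Depth-first search: true if any node in the tree has the given kind.
    static boolean planContainsNode(Node root, String kind)
    {
        if (root.kind.equals(kind)) {
            return true;
        }
        for (Node source : root.sources) {
            if (planContainsNode(source, kind)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args)
    {
        // Plan where the DistinctLimit has been pruned.
        Node pruned = new Node("Output", new Node("Aggregation", new Node("TableScan")));
        // Plan where the DistinctLimit had to stay.
        Node kept = new Node("Output",
                new Node("DistinctLimit", new Node("Aggregation", new Node("TableScan"))));
        System.out.println(planContainsNode(pruned, "DistinctLimit"));
        System.out.println(planContainsNode(kept, "DistinctLimit"));
    }
}
```

Extracting the search into one method keeps each test down to a single assertFalse or assertTrue per query, which is the reuse suggested in the comment.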

        searchFrom(plan("SELECT count(*) FROM orders ORDER BY 1", LogicalPlanner.Stage.OPTIMIZED).getRoot())
                .where(isInstanceOfAny(SortNode.class))
                .matches(),
        "Unexpected node for the above query");

"Unexpected node for the above query" -> format("Unexpected sort node for query: '%s'", query). Do the same for all occurrences below and above.
Force-pushed from fd9e2a9 to 3105607
Force-pushed from 582e861 to 354c49f
@Praveen2112, I think there are still some comments that need to be addressed.

        searchFrom(plan(query, LogicalPlanner.Stage.OPTIMIZED).getRoot())
                .where(isInstanceOfAny(SortNode.class))
                .matches(),
        format("Unexpected sor node for query: '%s'", query));

Typo: "sor"

{
    String query = "SELECT count(*) FROM orders LIMIT 10";
    assertFalse(
            searchFrom(plan(query, LogicalPlanner.Stage.OPTIMIZED).getRoot())

Use a static import for OPTIMIZED.

@@ -79,7 +79,7 @@ public void testUnsupportedSubqueriesWithCoercions()
     {
         // coercion FROM subquery symbol type to correlation type
         assertions.assertFails(
-                "SELECT (SELECT count(*) FROM (VALUES 1) t(a) WHERE t.a=t2.b LIMIT 1) FROM (VALUES 1.0) t2(b)",
+                "SELECT (SELECT count(*) from (VALUES 1) t(a) where t.a=t2.b GROUP BY t.a LIMIT 1) from (VALUES 1.0) t2(b)",

This should belong to the previous commit.

        limit(10,
                anyTree(
                        tableScan("orders")))));
}

Can you also test SELECT * FROM (VALUES 1,2,3,4,5,6) LIMIT 10?

        "SELECT (SELECT t.a FROM (VALUES 1, 2) t(a) WHERE t.a=t2.b LIMIT 2) FROM (VALUES 1) t2(b)",
        "VALUES 1");
// cannot enforce limit is less than cardinality of correlated subquery

// cannot enforce LIMIT on correlated subquery

@Override
public Result apply(TopNNode node, Captures captures, Context context)
{
    if (isScalar(node.getSource(), context.getLookup())) {

Shouldn't you use isAtMost here?

public Result apply(TopNNode node, Captures captures, Context context)
{
    if (isScalar(node.getSource(), context.getLookup())) {
        return Result.ofPlanNode(new SortNode(context.getIdAllocator().getNextId(), node.getSource(), node.getOrderingScheme()));

Yes, but inlining this here would save us one iterative optimizer loop. Also, I think you could handle the case with limit = 0 here.
However, as you pointed out, it could be a matter of taste. Up to you.

@Override
public Result apply(SortNode node, Captures captures, Context context)
{
    if (isScalar(node.getSource(), context.getLookup())) {

Also handle the case where cardinality is 0.

bump

@@ -94,6 +94,7 @@
import io.prestosql.sql.planner.iterative.rule.RemoveFullSample;

Also, a DistinctLimit with a limit higher than the cardinality of its source node can be rewritten to a DistinctNode.

Will merge that feature into RemoveSingleRowDistinctLimit.
Force-pushed from febdab8 to e4d4d1b

@@ -449,7 +451,7 @@ public void testCorrelatedSubqueries()
     {
         assertPlan(
                 "SELECT orderkey FROM orders WHERE 3 = (SELECT orderkey)",
-                LogicalPlanner.Stage.OPTIMIZED,
+                OPTIMIZED,

nit: This could be extracted as a separate commit.

@@ -42,7 +42,6 @@
import io.prestosql.sql.planner.iterative.rule.DetermineSemiJoinDistributionType;

Commit message
Prune unnecessary TopNNode
Replace TopN node
1. With a Sort node when the subplan is guaranteed to produce fewer rows than N
2. With its source node when the subplan produces a single row
3. With a Values node when N is 0
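
The three cases in this commit message can be read as a small decision function. The Java sketch below is only an illustration under stated assumptions: RedundantTopNSketch, Rewrite, and chooseRewrite are hypothetical names, the source's row bound is passed in as an OptionalLong (empty meaning "unknown"), the N = 0 check is assumed to come first, and "fewer rows than N" is treated as "at most N", matching the isAtMost check used elsewhere in this review.

```java
import java.util.OptionalLong;

final class RedundantTopNSketch
{
    private RedundantTopNSketch() {}

    // Possible outcomes for a TopN node (hypothetical labels).
    enum Rewrite { EMPTY_VALUES, SOURCE, SORT, KEEP_TOPN }

    // count: the N of the TopN node.
    // maxSourceRows: assumed upper bound on the source's row count, empty if unknown.
    static Rewrite chooseRewrite(long count, OptionalLong maxSourceRows)
    {
        if (count == 0) {
            // Case 3: TopN 0 never produces rows, so an empty Values is enough.
            return Rewrite.EMPTY_VALUES;
        }
        if (!maxSourceRows.isPresent()) {
            // Nothing is known about the source, so the TopN has to stay.
            return Rewrite.KEEP_TOPN;
        }
        if (maxSourceRows.getAsLong() <= 1) {
            // Case 2: a single row needs neither ordering nor limiting.
            return Rewrite.SOURCE;
        }
        if (maxSourceRows.getAsLong() <= count) {
            // Case 1: the limit is redundant, but the ordering still matters.
            return Rewrite.SORT;
        }
        return Rewrite.KEEP_TOPN;
    }

    public static void main(String[] args)
    {
        System.out.println(chooseRewrite(0, OptionalLong.of(6)));
        System.out.println(chooseRewrite(10, OptionalLong.of(1)));
        System.out.println(chooseRewrite(10, OptionalLong.of(6)));
        System.out.println(chooseRewrite(3, OptionalLong.of(6)));
        System.out.println(chooseRewrite(10, OptionalLong.empty()));
    }
}
```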

if (isScalar(node.getSource(), context.getLookup())) {
    return Result.ofPlanNode(node.getSource());
}
else if (isAtMost(node.getSource(), context.getLookup(), node.getCount())) {

nit: else is redundant

@@ -906,4 +906,35 @@ public void testRemoveSingleRowSort()
                         anyTree(
                                 tableScan("orders")))));
     }
+
+    public void testRemoveRedundantTopN()

testRedundantTopNRemoval?

        output(
                node(ValuesNode.class)));

query = "SELECT * FROM (VALUES 1,2,3,4,5,6) AS t1 ORDER BY 1 LIMIT 10";

Can you please extract each test case as a separate test method?

        node(AggregationNode.class,
                node(ValuesNode.class)));

tester().assertThat(new RemoveRedundantTopN())

Can you please extract each test case as a separate test method?

@Override
public Result apply(SortNode node, Captures captures, Context context)
{
    if (isScalar(node.getSource(), context.getLookup())) {

bump

                .matches(),
        format("Unexpected sort node for query: '%s'", query));

query = "SELECT orderkey, count(*) FROM orders GROUP BY orderkey ORDER BY 1";

Can you please extract each test case as a separate test method?

    return Result.ofPlanNode(node.getSource());
}
else if (isAtMost(node.getSource(), context.getLookup(), node.getLimit())) {
    return Result.ofPlanNode(new AggregationNode(node.getId(),

Why is replacing Distinct with Aggregation better? Shouldn't you use a regular DistinctNode here?

Using a MarkDistinct node requires an additional FilterNode and ProjectNode, so I used an AggregationNode with no aggregation functions.

Yes, that's ok. MarkDistinct serves a different purpose. @kokosing, there's no explicit DistinctNode; it's planned as a GROUP BY with no aggregation functions.
Sounds good.
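
The equivalence mentioned above (DISTINCT planned as a GROUP BY with no aggregation functions) can be illustrated outside the planner. This Java sketch uses plain collections rather than plan nodes; DistinctAsGroupBySketch and its method names are hypothetical, and rows are modeled as integers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class DistinctAsGroupBySketch
{
    private DistinctAsGroupBySketch() {}

    // SELECT DISTINCT x: drop duplicate rows.
    static List<Integer> distinct(List<Integer> rows)
    {
        return rows.stream().distinct().collect(Collectors.toList());
    }

    // SELECT x ... GROUP BY x with no aggregates: group rows by key and
    // emit only the keys. LinkedHashMap keeps first-seen key order.
    static List<Integer> groupByWithoutAggregates(List<Integer> rows)
    {
        Map<Integer, List<Integer>> groups = rows.stream()
                .collect(Collectors.groupingBy(row -> row, LinkedHashMap::new, Collectors.toList()));
        return new ArrayList<>(groups.keySet());
    }

    public static void main(String[] args)
    {
        List<Integer> rows = Arrays.asList(1, 2, 2, 3, 1);
        System.out.println(distinct(rows).equals(groupByWithoutAggregates(rows)));
    }
}
```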

                anyTree(
                        tableScan("orders")))));

query = "SELECT distinct(id) FROM (VALUES 1, 2, 3, 4, 5, 6) as t1 (id) LIMIT 10";

Can you please extract each test case as a separate test method?

We have added a test method for each optimizer rule we implemented. Should we write each query pattern as a separate method?
…n is know to single row or less rows than requested Cherry-pick of trinodb/trino#441 Co-authored-by: praveenkrishna <[email protected]>
In addition to the Cherry-pick for removing redundant Limit/TopN/Sort/DistinctLimit, there are a few more rules added to replace any input that is zero-TopN/DistinctLimit/Limit Cherry-pick of trinodb/trino#441 Co-authored-by: praveenkrishna <[email protected]>