ADBDEV-6886: Add support parameterized clauses in the subplan with volatile functions #1240
base: adb-7.2.0
silent-observer
left a comment
This seems like an incomplete solution. For example, `SELECT (SELECT a + random() + few.id FROM generate_series(1, 10) a LIMIT 1 OFFSET few.id) FROM few;` fails with `ERROR: illegal rescan of motion node`. This is because the `+ few.id` part is placed into the Function Scan node, so Materialize tries to rescan it when the parameter value changes. Is this patch only supposed to fix the narrow issue with parameters in LIMIT/OFFSET clauses?
This is a common problem when running volatile functions in a subplan. Apparently this query does not work anyway (after merge 9a1e48c).
Yes, I'm trying to fix Limit-Offset. In my case it is possible to pull up the parameterized Limit above the Motion. The underlying nodes need a different approach.
There are probably several plan nodes whose parameterization behaves like LIMIT-OFFSET (i.e. nodes for which the parameterization can be isolated, as the current patch does, without parameterizing the underlying nodes). Shouldn't we generalize to them as well? Or should this be done in a separate ticket? Like (without the error-throwing patches):
It looks like the parameterized Order By operator can also be raised above Motion. We can add Motion to Order By when creating its path (similar to the current patch), or when creating the table scan path (I assume). If we choose the first method, this can be done in a separate ticket.
First of all, I'd suggest researching whether similar cases exist for other nodes (it's probably not only Order By), and whether your logic can be applied in a more general manner instead of concentrating on this specific edge case. Draw some conclusions, then decide whether we should leave everything as it is and cover only the Limit case, or make the planning of such queries smarter.
This query still doesn't work. The parameter is passed inside the scan.
explain (verbose, costs off) SELECT (SELECT f(a) FROM generate_series(1,10) a
ORDER BY abs(a - limit_tbl.i) limit 1 )
FROM limit_tbl;
                                        QUERY PLAN
-------------------------------------------------------------------------------------------
 Gather Motion 3:1 (slice1; segments: 3)
   Output: ((SubPlan 1))
   ->  Seq Scan on public.limit_tbl
         Output: (SubPlan 1)
         SubPlan 1
           ->  Limit
                 Output: (f(a.a)), (abs((a.a - limit_tbl.i)))
                 ->  Result
                       Output: f(a.a), (abs((a.a - limit_tbl.i)))
                       ->  Sort
                             Output: (abs((a.a - limit_tbl.i))), a.a
                             Sort Key: (abs((a.a - limit_tbl.i)))
                             ->  Materialize
                                   Output: (abs((a.a - limit_tbl.i))), a.a
                                   ->  Broadcast Motion 1:3 (slice2; segments: 1)
                                         Output: (abs((a.a - limit_tbl.i))), a.a
                                         ->  Function Scan on pg_catalog.generate_series a
                                               Output: abs((a.a - limit_tbl.i)), a.a
                                               Function Call: generate_series(1, 10)
 Optimizer: Postgres-based planner
 Settings: optimizer = 'off'
(21 rows)
It works:
explain (verbose, costs off)
select (select f(a) from generate_series(1,10) a order by f(a) limit 1 offset limit_tbl.i) from limit_tbl;
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Gather Motion 3:1 (slice1; segments: 3)
   Output: ((SubPlan 1))
   ->  Seq Scan on public.limit_tbl
         Output: (SubPlan 1)
         SubPlan 1
           ->  Limit
                 Output: (f(a.a))
                 ->  Sort
                       Output: (f(a.a))
                       Sort Key: (f(a.a))
                       ->  Materialize
                             Output: (f(a.a))
                             ->  Broadcast Motion 1:3 (slice2; segments: 1)
                                   Output: (f(a.a))
                                   ->  Function Scan on pg_catalog.generate_series a
                                         Output: f(a.a)
                                         Function Call: generate_series(1, 10)
 Optimizer: Postgres-based planner
 Settings: optimizer = 'off'
select (select f(a) from generate_series(1,10) a order by f(a) limit 1 offset limit_tbl.i) from limit_tbl;
 f
---
 2
 3
 4
Should we do something about these cases:
  * For non-top slice, if this motion is QE singleton and subplan's locus
  * is CdbLocusType_SegmentGeneral, omit this motion.
  */
-shouldOmit |= context->sliceDepth > 0 &&
-    context->currentPlanFlow->flotype == FLOW_SINGLETON &&
+shouldOmit |= context->currentPlanFlow->flotype == FLOW_SINGLETON &&
     context->currentPlanFlow->segindex == 0 &&
-    motion->plan.lefttree->flow->locustype == CdbLocusType_SegmentGeneral;
+    (motion->plan.lefttree->flow->locustype == CdbLocusType_SegmentGeneral ||
+     motion->plan.lefttree->flow->locustype == CdbLocusType_SingleQE);
The comment says "non-top slice". Is there a case where we could mistakenly omit the motion when context->sliceDepth > 0 && motion->plan.lefttree->flow->locustype == CdbLocusType_SingleQE?
Yep, the condition is weakened. But is there a way to strictly identify our specific case with motion?
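To make the weakening concrete, here is a minimal, self-contained sketch that models only the term OR-ed into shouldOmit, before and after the patch. The enums, the MotionCtx struct, and both function names are hypothetical stand-ins invented for illustration; the real code reads these values from context and motion->plan.lefttree->flow.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the planner's flow and locus enums. */
typedef enum { FLOW_SINGLETON, FLOW_PARTITIONED } FlowType;
typedef enum { LOCUS_SEGMENT_GENERAL, LOCUS_SINGLE_QE, LOCUS_HASHED } LocusType;

/* Hypothetical flattened view of the values the condition inspects. */
typedef struct
{
    int       sliceDepth;  /* 0 means the top slice */
    FlowType  flotype;     /* flow type of the current plan */
    int       segindex;    /* segment index of the singleton flow */
    LocusType childLocus;  /* locus of the Motion's left subtree */
} MotionCtx;

/* Term before the patch: non-top slices only, SegmentGeneral only. */
static bool
should_omit_old(const MotionCtx *c)
{
    return c->sliceDepth > 0 &&
        c->flotype == FLOW_SINGLETON &&
        c->segindex == 0 &&
        c->childLocus == LOCUS_SEGMENT_GENERAL;
}

/* Term after the patch: any slice depth, SegmentGeneral or SingleQE. */
static bool
should_omit_new(const MotionCtx *c)
{
    return c->flotype == FLOW_SINGLETON &&
        c->segindex == 0 &&
        (c->childLocus == LOCUS_SEGMENT_GENERAL ||
         c->childLocus == LOCUS_SINGLE_QE);
}
```

In this model the new term accepts everything the old one did, plus motions in the top slice and motions over a SingleQE child, which is the extra set the question above is about.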
 One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
 Optimizer: Postgres query optimizer
 (9 rows)
 -> Materialize
- I'll ask again: is the tuplestore of the Material node refilled during SubPlan rescan?
- If so, is there a simple way to omit the Material node?
Added a check so that when we omit a Motion, the Materialize node above it is also omitted (in fix_outer_query_motions_mutator()).
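As a rough illustration of that idea, the sketch below strips a redundant Motion and, with it, a Material node sitting directly on top. It is not the actual fix_outer_query_motions_mutator() code: the PlanNode struct, the omitMotion flag, and strip_redundant() are hypothetical, and the tree is simplified to a single child per node.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node tags; the real planner uses its own NodeTag values. */
typedef enum { T_SCAN, T_MOTION, T_MATERIAL, T_RESULT } NodeTag;

/* Hypothetical single-child plan node, enough to show the rewrite. */
typedef struct PlanNode
{
    NodeTag          tag;
    bool             omitMotion; /* assumed flag: this Motion is redundant */
    struct PlanNode *child;
} PlanNode;

/* Return the subtree with redundant Motions (and their Materials) removed. */
static PlanNode *
strip_redundant(PlanNode *node)
{
    if (node == NULL)
        return NULL;

    /* A Material directly above a redundant Motion is dropped with it. */
    if (node->tag == T_MATERIAL && node->child != NULL &&
        node->child->tag == T_MOTION && node->child->omitMotion)
        return strip_redundant(node->child->child);

    /* A redundant Motion without a Material above is dropped on its own. */
    if (node->tag == T_MOTION && node->omitMotion)
        return strip_redundant(node->child);

    node->child = strip_redundant(node->child);
    return node;
}
```

With a Result over Material over a redundant Motion over a scan, the Result ends up pointing straight at the scan; a Material over a Motion that must stay is left untouched.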
-if (CdbPathLocus_IsGeneral(origpath->locus) ||
-    CdbPathLocus_IsOuterQuery(origpath->locus))
+if (CdbPathLocus_IsGeneral(origpath->locus) || CdbPathLocus_IsOuterQuery(origpath->locus) ||
+    ((CdbPathLocus_IsSegmentGeneral(origpath->locus) || CdbPathLocus_IsSingleQE(origpath->locus))
- What does the CdbPathLocus_IsSingleQE(origpath->locus) condition stand for?
- Should the difference (I mean that plans with and without the patch differ) for join plans like
  explain (costs off, verbose) SELECT (SELECT f(t_repl.i) from t_repl join t_strewn using(i) where t_repl.j < few.id) FROM few;
  be taken into account? I suggest simply testing join plans to find side effects of this condition. At first glance nothing drastic happens.
- We postpone changing the locus and adding Motion if we initially have a General or SingleQE locus and we also have a volatile function. Otherwise we may capture unnecessary cases, for example this one.
- It seems that the principle of adding Motion on top of Result does not always work. Although the query works without the patch, it doesn't fit into the "single data set" principle, so there should probably be a different plan here:
without patch:
postgres=# explain (costs off, verbose) SELECT (SELECT f(t_repl.i) from t_repl join t_strewn using(i) where t_repl.j < few.id) FROM few;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Gather Motion 3:1 (slice1; segments: 3)
   Output: ((SubPlan 1))
   ->  Seq Scan on public.few
         Output: (SubPlan 1)
         SubPlan 1
           ->  Hash Join
                 Output: f(t_repl.i)
                 Hash Cond: (t_repl.i = t_strewn.i)
                 ->  Result
                       Output: t_repl.i, t_repl.j
                       Filter: (t_repl.j < few.id)
                       ->  Materialize
                             Output: t_repl.i, t_repl.j
                             ->  Broadcast Motion 1:3 (slice2; segments: 1)
                                   Output: t_repl.i, t_repl.j
                                   ->  Seq Scan on public.t_repl
                                         Output: t_repl.i, t_repl.j
                 ->  Hash
                       Output: t_strewn.i
                       ->  Materialize
                             Output: t_strewn.i
                             ->  Broadcast Motion 3:3 (slice3; segments: 3)
                                   Output: t_strewn.i
                                   ->  Seq Scan on public.t_strewn
                                         Output: t_strewn.i
with (checkMotionWithParam is off):
 Gather Motion 3:1 (slice1; segments: 3)
   Output: ((SubPlan 1))
   ->  Seq Scan on public.few
         Output: (SubPlan 1)
         SubPlan 1
           ->  Result
                 Output: f(t_repl.i)
                 ->  Materialize
                       Output: t_repl.i
                       ->  Broadcast Motion 3:3 (slice2; segments: 3)
                             Output: t_repl.i
                             ->  Hash Join
                                   Output: t_repl.i
                                   Hash Cond: (t_repl.i = t_strewn.i)
                                   ->  Result
                                         Output: t_repl.i, t_repl.j
                                         Filter: (t_repl.j < few.id)
                                         ->  Seq Scan on public.t_repl
                                               Output: t_repl.i, t_repl.j
                                   ->  Hash
                                         Output: t_strewn.i
                                         ->  Seq Scan on public.t_strewn
                                               Output: t_strewn.i
Also, add comments to all newly added code. The logic is really unclear and should be described in detail.
Add support parameterized clauses in the subplan with volatile function (#1240)
Problem:
Queries with parameterized clauses (e.g. LIMIT, ORDER BY) and volatile
functions could return incorrect results. The cause was an invalid plan in
which Motion was added on top of a subplan with volatile functions. Since
parameters are not passed through Motion, the parameterized clause did not
work correctly. Motion was added late in subplan processing, in
fix_subplan_motion(), because the subplan had an Entry locus. Initially, after
the subplan is built, it has a General locus. However, due to the presence of a
volatile function in the subplan, the locus is changed to Entry. This is done to
ensure that the data set in the subplan is identical on all segments (for more
information, see d1f9b96). However, for a parameterized subplan, adding
Motion on top at that point was too late.
Changes:
First, after "current_rel" is generated and a Result node (with a volatile
function) is added on top of its path, the locus is changed to SingleQE.
This is necessary in order to compute the data set on one segment. The change
occurs only for the General locus (data is available on any segment) and only
if the root locus of the subplan is not Replicated.
Further, if the subquery is correlated (has parameterized operators), the
data set is distributed to all segments by adding a Motion.
To eliminate Motion(1:1), the fix_outer_query_motions_mutator() function has
been fixed.
A test case has also been added.
Ticket: ADBDEV-6886
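
The decision described under "Changes" can be sketched roughly as follows. This is only a model of the commit message, not the patch itself: the Locus enum, the Decision struct, and decide_subplan_locus() are hypothetical names, and it is an assumption here that the Motion is added only in the branch where the locus was switched to SingleQE.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the planner's locus kinds. */
typedef enum
{
    LOCUS_GENERAL,
    LOCUS_SINGLE_QE,
    LOCUS_REPLICATED,
    LOCUS_ENTRY,
    LOCUS_HASHED
} Locus;

/* Hypothetical result: the locus to use and whether to add a Motion. */
typedef struct
{
    Locus locus;
    bool  addMotion;
} Decision;

/*
 * pathLocus    - locus of the path under the Result node
 * rootLocus    - "root locus of the subplan" from the commit message
 * hasVolatile  - the Result's target list has a volatile function
 * isCorrelated - the subquery has parameterized operators
 */
static Decision
decide_subplan_locus(Locus pathLocus, Locus rootLocus,
                     bool hasVolatile, bool isCorrelated)
{
    Decision d = { pathLocus, false };

    if (hasVolatile && pathLocus == LOCUS_GENERAL &&
        rootLocus != LOCUS_REPLICATED)
    {
        /* Compute the data set on a single segment. */
        d.locus = LOCUS_SINGLE_QE;
        /* Correlated subquery: distribute the result to all segments. */
        d.addMotion = isCorrelated;
    }
    return d;
}
```

Under these assumptions, a correlated subplan with a volatile function over a General path yields SingleQE plus a Motion, while a Replicated root or a path without volatile functions is left unchanged.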