ADBDEV-6886: Add support parameterized clauses in the subplan with volatile functions #1240

mos65o2 · 2025-03-02T21:18:31Z

Add support parameterized clauses in the subplan with volatile function (#1240)

Problem:
Queries with parameterized clauses (e.g. LIMIT, ORDER BY) and volatile
functions could return incorrect results. The cause was an invalid plan in
which a Motion was added on top of a subplan with volatile functions. Since
parameters are not passed through a Motion, the parameterized clause did not
work correctly. The Motion was added late in subplan processing, in
fix_subplan_motion(), because the subplan had an Entry locus. Right after the
subplan is built it has a General locus, but the presence of a volatile
function changes the locus to Entry. This ensures that the data set in the
subplan is identical on all segments (see d1f9b96 for details). For a
parameterized subplan, however, adding the Motion on top at that point was too
late.

Changes:
First, after "current_rel" is generated and a Result node (with a volatile
function) is added on top of its path, the locus is changed to SingleQE so that
the data set is calculated on a single segment. The change applies only to the
General locus (data is available on any segment) and only if the root locus of
the subplan is not Replicated.
Further, if the subquery is correlated (has parameterized operators), the data
set is distributed to all segments by adding a Motion.
The fix_outer_query_motions_mutator function has been fixed to eliminate
Motion(1:1).
A test case has also been added.

Ticket: ADBDEV-6886
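
The rule described under "Changes" can be modelled as a small decision function. This is only a sketch under stated assumptions: the enum, struct, and function names below are hypothetical stand-ins and are not the planner's real types or the code in this patch. It only captures the stated conditions (General locus, volatile function present, subplan root locus not Replicated, correlated subquery).

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the planner's locus kinds. */
typedef enum
{
	SKETCH_GENERAL,
	SKETCH_SEGMENT_GENERAL,
	SKETCH_SINGLE_QE,
	SKETCH_REPLICATED,
	SKETCH_ENTRY
} SketchLocus;

/* Result of the decision: the new locus and whether a Motion is needed. */
typedef struct
{
	SketchLocus locus;
	bool		add_motion;
} SketchDecision;

/*
 * Models the description only: a General-locus path topped by a Result with
 * a volatile function is moved to SingleQE, unless the root locus of the
 * subplan is Replicated. If the subquery is also correlated, a Motion is
 * needed to distribute the data set to all segments.
 */
static SketchDecision
sketch_decide(SketchLocus path_locus, SketchLocus root_locus,
			  bool has_volatile, bool is_correlated)
{
	SketchDecision d = {path_locus, false};

	if (has_volatile &&
		path_locus == SKETCH_GENERAL &&
		root_locus != SKETCH_REPLICATED)
	{
		d.locus = SKETCH_SINGLE_QE;
		d.add_motion = is_correlated;
	}
	return d;
}
```

In this model an uncorrelated subquery keeps the SingleQE locus without an extra Motion, and any path that does not start with a General locus is left untouched.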

silent-observer

This seems like an incomplete solution. For example, SELECT (SELECT a + random() + few.id FROM generate_series(1, 10) a LIMIT 1 OFFSET few.id) FROM few; causes ERROR: illegal rescan of motion node. This is because the + few.id part is placed into the Function Scan node, so Materialize tries to rescan it when the parameter value changes. Is this patch only supposed to fix the narrow issue with parameters in LIMIT/OFFSET clauses?

src/backend/optimizer/util/pathnode.c

src/test/regress/sql/limit.sql

mos65o2 · 2025-03-03T12:59:54Z

This seems like an incomplete solution. For example, SELECT (SELECT a + random() + few.id FROM generate_series(1, 10) a LIMIT 1 OFFSET few.id) FROM few; causes ERROR: illegal rescan of motion node. This is because the + few.id part is placed into the Function Scan node, so Materialize tries to rescan it when the parameter value changes. Is this patch only supposed to fix the narrow issue with parameters in LIMIT/OFFSET clauses?

This is a common problem when running volatile functions in a subplan.
A Motion is added inside such a subplan so that there is one data set for all segments.
If anything below the Motion contains parameters, this causes problems.

As for the query you provided, it apparently does not work anyway (after merge 9a1e48c):
postgres=# SELECT (SELECT a + random() + few.id FROM generate_series(1, 10) a LIMIT 1 OFFSET few.id) FROM few;
ERROR: Passing parameters across motion is not supported. (cdbmutate.c:2051)

Is this patch only supposed to fix the narrow issue with parameters in LIMIT/OFFSET clauses?

Yes, I'm trying to fix Limit-Offset. In my case it is possible to pull up the parameterized Limit above the Motion. For the underlying nodes a different approach is needed.

src/test/regress/sql/limit.sql

src/test/regress/expected/limit.out

bimboterminator1 · 2025-03-12T16:11:39Z

Yes, I'm trying to fix Limit-Offset. In my case it is possible to pull up the parameterized Limit above the Motion. For the underlying nodes a different approach is needed.

There are probably several plan nodes whose parametrization behaves like LIMIT-OFFSET (i.e. nodes for which the parametrization can be isolated, as is done in the current patch, without parametrizing the underlying nodes). Shouldn't we generalize them as well? Or should this be done in a separate ticket? For example (without the error-throwing patches):

explain (verbose, costs off) SELECT (SELECT f(a) FROM generate_series(1,10) a 
        ORDER BY abs(a - limit_tbl.i) limit 1 )
FROM limit_tbl;
                                        QUERY PLAN                                         
-------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)
   Output: ((SubPlan 1))
   ->  Seq Scan on public.limit_tbl
         Output: (SubPlan 1)
         SubPlan 1
           ->  Materialize
                 Output: (f(a.a)), (abs((a.a - limit_tbl.i)))
                 ->  Broadcast Motion 1:3  (slice2)
                       Output: (f(a.a)), (abs((a.a - limit_tbl.i)))
                       ->  Limit
                             Output: (f(a.a)), (abs((a.a - limit_tbl.i)))
                             ->  Result
                                   Output: f(a.a), (abs((a.a - limit_tbl.i)))
                                   ->  Sort
                                         Output: (abs((a.a - limit_tbl.i))), a.a
                                         Sort Key: (abs((a.a - limit_tbl.i)))
                                         ->  Function Scan on pg_catalog.generate_series a
                                               Output: abs((a.a - limit_tbl.i)), a.a
                                               Function Call: generate_series(1, 10)
 Optimizer: Postgres-based planner
 Settings: optimizer = 'off'
(21 rows)

postgres=# SELECT (SELECT f(a) FROM generate_series(1,10) a 
        ORDER BY abs(a - limit_tbl.i) limit 1 )
FROM limit_tbl;
 f 
---
 1
 1
 1
(3 rows)

src/test/regress/sql/limit.sql

src/backend/optimizer/util/pathnode.c

mos65o2 · 2025-03-13T07:35:42Z

Shouldn't we generalize them as well? Or should this be done in a separate ticket?

It looks like the parameterized Order By operator can also be raised above Motion. We can add Motion to Order By when creating its path (similar to the current patch). Or when creating the table scan path (I assume). If we choose the first method, then this can be done in a separate ticket.

bimboterminator1 · 2025-03-13T07:44:09Z

It looks like the parameterized Order By operator can also be raised above Motion. We can add Motion to Order By when creating its path (similar to the current patch). Or when creating the table scan path (I assume). If we choose the first method, then this can be done in a separate ticket.

First of all, I'd suggest researching whether similar cases exist with other nodes (it's probably not only ORDER BY) and whether your logic can be applied in a more general manner, without concentrating on this specific edge case. Draw some conclusions, then decide whether we should leave everything as it is and cover only the LIMIT case, or make the planning of such queries smarter.

mos65o2 · 2025-04-16T09:44:17Z

There are probably several plan nodes whose parametrization behaves like LIMIT-OFFSET (i.e. nodes for which the parametrization can be isolated, as is done in the current patch, without parametrizing the underlying nodes). Shouldn't we generalize them as well? Or should this be done in a separate ticket?

This query still doesn't work: the parameter is passed inside the scan, below the Motion.

explain (verbose, costs off) SELECT (SELECT f(a) FROM generate_series(1,10) a 
        ORDER BY abs(a - limit_tbl.i) limit 1 )
FROM limit_tbl;
                                        QUERY PLAN                                         
-------------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)
   Output: ((SubPlan 1))
   ->  Seq Scan on public.limit_tbl
         Output: (SubPlan 1)
         SubPlan 1
           ->  Limit
                 Output: (f(a.a)), (abs((a.a - limit_tbl.i)))
                 ->  Result
                       Output: f(a.a), (abs((a.a - limit_tbl.i)))
                       ->  Sort
                             Output: (abs((a.a - limit_tbl.i))), a.a
                             Sort Key: (abs((a.a - limit_tbl.i)))
                             ->  Materialize
                                   Output: (abs((a.a - limit_tbl.i))), a.a
                                   ->  Broadcast Motion 1:3  (slice2; segments: 1)
                                         Output: (abs((a.a - limit_tbl.i))), a.a
                                         ->  Function Scan on pg_catalog.generate_series a
                                               Output: abs((a.a - limit_tbl.i)), a.a
                                               Function Call: generate_series(1, 10)
 Optimizer: Postgres-based planner
 Settings: optimizer = 'off'
(21 rows)

mos65o2 · 2025-04-16T09:46:40Z

It works:

explain (verbose, costs off)
select (select f(a) from generate_series(1,10) a order by f(a) limit 1 offset limit_tbl.i) from limit_tbl;
                                     QUERY PLAN                                      
-------------------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)
   Output: ((SubPlan 1))
   ->  Seq Scan on public.limit_tbl
         Output: (SubPlan 1)
         SubPlan 1
           ->  Limit
                 Output: (f(a.a))
                 ->  Sort
                       Output: (f(a.a))
                       Sort Key: (f(a.a))
                       ->  Materialize
                             Output: (f(a.a))
                             ->  Broadcast Motion 1:3  (slice2; segments: 1)
                                   Output: (f(a.a))
                                   ->  Function Scan on pg_catalog.generate_series a
                                         Output: f(a.a)
                                         Function Call: generate_series(1, 10)
 Optimizer: Postgres-based planner
 Settings: optimizer = 'off'

select (select f(a) from generate_series(1,10) a order by f(a) limit 1 offset limit_tbl.i) from limit_tbl;
 f 
---
 2
 3
 4
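
The difference between the two plans above comes down to whether any node below the Motion references the outer parameter (limit_tbl.i). A minimal sketch of that check on a single-child plan chain follows. The SketchNode type, the function name, and the two hand-built chains are hypothetical and only mirror the node order in the EXPLAIN output above; this is not the planner's actual validation code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified plan node: a chain with one child per node. */
typedef struct SketchNode
{
	const char *name;
	bool		is_motion;			/* node is a Motion */
	bool		uses_outer_param;	/* node references limit_tbl.i */
	const struct SketchNode *child;
} SketchNode;

/* Returns true if some node below a Motion references an outer parameter. */
static bool
sketch_param_below_motion(const SketchNode *node)
{
	bool		under_motion = false;

	for (; node != NULL; node = node->child)
	{
		if (under_motion && node->uses_outer_param)
			return true;
		if (node->is_motion)
			under_motion = true;
	}
	return false;
}

/* ORDER BY abs(a - limit_tbl.i): the scan computes the parameterized key. */
static const SketchNode bad_scan = {"Function Scan", false, true, NULL};
static const SketchNode bad_motion = {"Broadcast Motion", true, false, &bad_scan};
static const SketchNode bad_mat = {"Materialize", false, false, &bad_motion};
static const SketchNode bad_sort = {"Sort", false, true, &bad_mat};
static const SketchNode bad_result = {"Result", false, false, &bad_sort};
static const SketchNode bad_plan = {"Limit", false, false, &bad_result};

/* OFFSET limit_tbl.i: only the Limit on top references the parameter. */
static const SketchNode ok_scan = {"Function Scan", false, false, NULL};
static const SketchNode ok_motion = {"Broadcast Motion", true, false, &ok_scan};
static const SketchNode ok_mat = {"Materialize", false, false, &ok_motion};
static const SketchNode ok_sort = {"Sort", false, false, &ok_mat};
static const SketchNode ok_plan = {"Limit", false, true, &ok_sort};
```

Under these assumptions, sketch_param_below_motion(&bad_plan) is true and sketch_param_below_motion(&ok_plan) is false, which matches the observation that the OFFSET query works while the ORDER BY query does not.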

bimboterminator1 · 2025-04-18T05:46:11Z

Should we do something about these cases?
1.

explain (costs off, verbose)  SELECT (SELECT (f(a))* random() from generate_series(1, 10)a where a > random() limit 1 offset few.id) FROM few;
                              QUERY PLAN                               
-----------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)
   Output: ((SubPlan 1))
   ->  Seq Scan on public.few
         Output: (SubPlan 1)
         SubPlan 1
           ->  Limit
                 Output: (((f(a.a))::double precision * random()))
                 ->  Function Scan on pg_catalog.generate_series a
                       Output: ((f(a.a))::double precision * random())
                       Function Call: generate_series(1, 10)
                       Filter: ((a.a)::double precision > random())
 Optimizer: Postgres-based planner
 Settings: optimizer = 'off'
(13 rows)

2. SegmentGeneral

explain (costs off, verbose)  SELECT (SELECT (f(i)) from t_repl limit 1 offset few.id) FROM few;
                                 QUERY PLAN                                  
-----------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)
   Output: ((SubPlan 1))
   ->  Seq Scan on public.few
         Output: (SubPlan 1)
         SubPlan 1
           ->  Limit
                 Output: (f(t_repl.i))
                 ->  Result
                       Output: f(t_repl.i)
                       ->  Materialize
                             Output: t_repl.i
                             ->  Broadcast Motion 1:3  (slice2; segments: 1)
                                   Output: t_repl.i
                                   ->  Seq Scan on public.t_repl
                                         Output: t_repl.i
 Optimizer: Postgres-based planner
 Settings: optimizer = 'off'
(17 rows)

bimboterminator1 · 2025-04-18T05:37:21Z

src/backend/cdb/cdbllize.c

 		 * For non-top slice, if this motion is QE singleton and subplan's locus
 		 * is CdbLocusType_SegmentGeneral, omit this motion.
 		 */
-		shouldOmit |= context->sliceDepth > 0 &&
-					  context->currentPlanFlow->flotype == FLOW_SINGLETON &&
+		shouldOmit |= context->currentPlanFlow->flotype == FLOW_SINGLETON &&
 					  context->currentPlanFlow->segindex == 0 &&
-					  motion->plan.lefttree->flow->locustype == CdbLocusType_SegmentGeneral;
+					  (motion->plan.lefttree->flow->locustype == CdbLocusType_SegmentGeneral ||
+					   motion->plan.lefttree->flow->locustype == CdbLocusType_SingleQE);


The comment says "non-top slice". Is there a case where we could mistakenly omit the motion when context->sliceDepth > 0 && motion->plan.lefttree->flow->locustype == CdbLocusType_SingleQE?

Yep, the condition is weakened. But is there a way to strictly identify our specific case with motion?
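
To make the weakening concrete, the two boolean expressions from the hunk above can be compared side by side. This is a sketch only: the enums and function names are hypothetical stand-ins for the fields used in the diff (sliceDepth, flotype, segindex, and the locustype of the Motion's left subtree), and it models just the right-hand side of the shouldOmit |= assignment.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the flow and locus types used in the diff. */
typedef enum
{
	SKETCH_FLOW_SINGLETON,
	SKETCH_FLOW_OTHER
} SketchFlowType;

typedef enum
{
	SKETCH_LOCUS_SEGMENT_GENERAL,
	SKETCH_LOCUS_SINGLE_QE,
	SKETCH_LOCUS_OTHER
} SketchLocusType;

/* Condition before the patch: non-top slice and SegmentGeneral child only. */
static bool
sketch_omit_before(int slice_depth, SketchFlowType flotype, int segindex,
				   SketchLocusType child_locus)
{
	return slice_depth > 0 &&
		flotype == SKETCH_FLOW_SINGLETON &&
		segindex == 0 &&
		child_locus == SKETCH_LOCUS_SEGMENT_GENERAL;
}

/* Condition after the patch: any slice depth, SegmentGeneral or SingleQE. */
static bool
sketch_omit_after(int slice_depth, SketchFlowType flotype, int segindex,
				  SketchLocusType child_locus)
{
	(void) slice_depth;		/* no longer part of the condition */
	return flotype == SKETCH_FLOW_SINGLETON &&
		segindex == 0 &&
		(child_locus == SKETCH_LOCUS_SEGMENT_GENERAL ||
		 child_locus == SKETCH_LOCUS_SINGLE_QE);
}
```

In this model the new condition additionally fires in the top slice (slice depth 0) and for a SingleQE child at any depth, which is exactly the wider set of cases the review question is about.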

src/backend/optimizer/plan/planner.c

src/backend/optimizer/path/allpaths.c

bimboterminator1 · 2025-05-05T23:01:55Z

src/test/regress/expected/subselect.out

-                 One-Time Filter: ("*VALUES*".column1 = "*VALUES*".column1)
- Optimizer: Postgres query optimizer
-(9 rows)
+           ->  Materialize


I'll ask again: is the tuplestore of the Material node refilled during SubPlan rescan?

If so, is there a simple way to omit the Material node?

Added a check so that, when we omit the Motion, the Materialize node above it is also omitted (in fix_outer_query_motions_mutator()).

bimboterminator1 · 2025-05-05T23:20:17Z

src/backend/optimizer/path/allpaths.c

-		if (CdbPathLocus_IsGeneral(origpath->locus) ||
-			CdbPathLocus_IsOuterQuery(origpath->locus))
+		if (CdbPathLocus_IsGeneral(origpath->locus) || CdbPathLocus_IsOuterQuery(origpath->locus) ||
+		   ((CdbPathLocus_IsSegmentGeneral(origpath->locus) || CdbPathLocus_IsSingleQE(origpath->locus))


What does the CdbPathLocus_IsSingleQE(origpath->locus) condition stand for?

Should the difference (I mean, plans with and without the patch differ) for join plans like

explain (costs off, verbose) SELECT (SELECT f(t_repl.i) from t_repl join t_strewn using(i) where t_repl.j < few.id) FROM few;

be taken into account? I suggest simply testing join plans to find side effects of this condition. At first glance nothing drastic happens.

We postpone changing the locus and adding Motion if we initially have a General or SingleQE locus and we also have a volatile function. Otherwise we may capture unnecessary cases. For example this one.

It seems that the principle of adding Motion on top of Result does not always work. Although the query works without the patch, it doesn't fit into the "single data set" principle, so there should probably be a different plan here:

without patch:

postgres=# explain (costs off, verbose) SELECT (SELECT f(t_repl.i) from t_repl join t_strewn using(i) where t_repl.j < few.id) FROM few;
                                 QUERY PLAN                                  
-----------------------------------------------------------------------------
 Gather Motion 3:1  (slice1; segments: 3)
   Output: ((SubPlan 1))
   ->  Seq Scan on public.few
         Output: (SubPlan 1)
         SubPlan 1
           ->  Hash Join
                 Output: f(t_repl.i)
                 Hash Cond: (t_repl.i = t_strewn.i)
                 ->  Result
                       Output: t_repl.i, t_repl.j
                       Filter: (t_repl.j < few.id)
                       ->  Materialize
                             Output: t_repl.i, t_repl.j
                             ->  Broadcast Motion 1:3  (slice2; segments: 1)
                                   Output: t_repl.i, t_repl.j
                                   ->  Seq Scan on public.t_repl
                                         Output: t_repl.i, t_repl.j
                 ->  Hash
                       Output: t_strewn.i
                       ->  Materialize
                             Output: t_strewn.i
                             ->  Broadcast Motion 3:3  (slice3; segments: 3)
                                   Output: t_strewn.i
                                   ->  Seq Scan on public.t_strewn
                                         Output: t_strewn.i

with patch (checkMotionWithParam is off):

 Gather Motion 3:1  (slice1; segments: 3)
   Output: ((SubPlan 1))
   ->  Seq Scan on public.few
         Output: (SubPlan 1)
         SubPlan 1
           ->  Result
                 Output: f(t_repl.i)
                 ->  Materialize
                       Output: t_repl.i
                       ->  Broadcast Motion 3:3  (slice2; segments: 3)
                             Output: t_repl.i
                             ->  Hash Join
                                   Output: t_repl.i
                                   Hash Cond: (t_repl.i = t_strewn.i)
                                   ->  Result
                                         Output: t_repl.i, t_repl.j
                                         Filter: (t_repl.j < few.id)
                                         ->  Seq Scan on public.t_repl
                                               Output: t_repl.i, t_repl.j
                                   ->  Hash
                                         Output: t_strewn.i
                                         ->  Seq Scan on public.t_strewn
                                               Output: t_strewn.i

bimboterminator1 · 2025-05-05T23:21:45Z

src/backend/cdb/cdbllize.c

Yep, the condition is weakened. But is there a way to strictly identify our specific case with motion?

bimboterminator1 · 2025-05-05T23:23:25Z

Also, please add comments to all newly added code. The logic is really unclear and should be described in detail.

mos65o2 marked this pull request as ready for review March 3, 2025 07:52

silent-observer reviewed Mar 3, 2025

src/backend/optimizer/util/pathnode.c

src/test/regress/sql/limit.sql

silent-observer previously approved these changes Mar 4, 2025

RekGRpth reviewed Mar 6, 2025

src/test/regress/sql/limit.sql

mos65o2 dismissed silent-observer’s stale review via 177eda6 March 6, 2025 07:16

RekGRpth reviewed Mar 6, 2025

src/test/regress/expected/limit.out

bimboterminator1 reviewed Mar 12, 2025

src/test/regress/sql/limit.sql

src/backend/optimizer/util/pathnode.c

Add support for parameterized clauses in a subplan with a volatile function

2883755

mos65o2 force-pushed the ADBDEV-6886 branch from 1640fe0 to 2883755 Compare April 14, 2025 21:04

mos65o2 added 2 commits April 15, 2025 09:08

Fix output of bad testcases

f8a4d94

Fix adding Motion(1:1) and correct tests output

0586525

mos65o2 changed the title ~~ADBDEV-6886: Add support parameterized LIMIT in the sub plan with volatile functions~~ ADBDEV-6886: Add support parameterized clauses in the subplan with volatile functions Apr 16, 2025

bimboterminator1 reviewed Apr 18, 2025

mos65o2 added 3 commits April 20, 2025 14:54

Consider the case with SegmentGeneral and check tests

00bb1b4

Fix tests output

6ab3bad

Fix rpt.out

37994ac

bimboterminator1 reviewed Apr 23, 2025

src/backend/optimizer/plan/planner.c

src/backend/optimizer/path/allpaths.c

mos65o2 added 2 commits April 24, 2025 14:22

Add cdbpath_create_motion_to_outer_query function

cec1a6d

Make the condition for adding motion in bring_to_outer_query more strict

f30f637

bimboterminator1 reviewed May 5, 2025

mos65o2 added 2 commits June 2, 2025 15:44

Add comments

84cd382

Omit unnecessary Materialize and fix test output

2b22a31

mos65o2 added 2 commits June 2, 2025 20:21

Fix test outputs (remove unnecessary Materialize)

47b8dee

Merge branch 'adb-7.2.0' into ADBDEV-6886

1f09942
