Conversation

@lukapeschke
Contributor

@lukapeschke lukapeschke commented Oct 2, 2025

This is a tricky one. On main right now, when a sort step in a CTE uses a column that is not part of the final columns of the main pipeline, CId redirection does not work. As a result, the ORDER BY clause refers to a column of a table that might not be available in the final statement.

For example, the following PRQL

from albums
select { this.`title`, this.`artist_id` }
# This gets pushed to the main relation
sort { this.`artist_id` }
# `artist_id` gets rewritten here
derive { `artist_id` = as `double precision` this.`artist_id` }
filter (this.`artist_id` != null)
join side:left artists (this.`artist_id` == that.`artist_id`)
select {this.`artist_id`, this.`title`, this.`name`}

Results in the following SQL right now on main:

WITH table_1 AS (
  SELECT
    CAST(artist_id AS double precision) AS artist_id,
    title,
    artist_id AS _expr_0
  FROM
    albums
),
table_2 AS (
  SELECT
    artist_id,
    title,
    _expr_0
  FROM
    table_1
  WHERE
    artist_id IS NOT NULL
),
table_0 AS (
  SELECT
    artist_id,
    title,
    _expr_0
  FROM
    table_2
)
SELECT
  table_0.artist_id,
  table_0.title,
  artists.name
FROM
  table_0
  LEFT OUTER JOIN artists ON table_0.artist_id = artists.artist_id
ORDER BY
  -- WRONG, should be table_0._expr_0
  table_2._expr_0

This is because fold_column_sorts, which is called by fold_cte during the postprocess step, adds the columns required by sorts to the intermediate Select statements to ensure they are available in the main pipeline (in this case, it adds table_2._expr_0). However, since cid_redirects are not updated, a column from an earlier table is used in the final ORDER BY clause.
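To make the redirection problem concrete, here is a minimal toy model in Rust. It is not the PRQL compiler's code: `ColumnRef`, `Redirects`, and `order_by` are hypothetical names invented for this sketch, and the only assumption is that a redirect maps a column of an earlier relation to the column that exposes it in the relation currently in scope.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a column reference: (table alias, column name).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct ColumnRef {
    table: String,
    column: String,
}

impl ColumnRef {
    fn new(table: &str, column: &str) -> Self {
        ColumnRef {
            table: table.to_string(),
            column: column.to_string(),
        }
    }
}

/// Toy redirect table: maps a column as it was known in an earlier relation
/// to the column that exposes it in the relation currently in scope.
#[derive(Default)]
struct Redirects {
    map: HashMap<ColumnRef, ColumnRef>,
}

impl Redirects {
    /// Record that `from` is now reachable as `to`.
    fn redirect(&mut self, from: ColumnRef, to: ColumnRef) {
        self.map.insert(from, to);
    }

    /// Follow redirects until a column with no further redirect is reached.
    fn resolve(&self, col: &ColumnRef) -> ColumnRef {
        let mut current = col.clone();
        // Bound the walk so an accidental cycle cannot loop forever.
        for _ in 0..=self.map.len() {
            match self.map.get(&current) {
                Some(next) => current = next.clone(),
                None => break,
            }
        }
        current
    }
}

/// Render an ORDER BY clause after resolving every sort column.
fn order_by(sorts: &[ColumnRef], redirects: &Redirects) -> String {
    let cols: Vec<String> = sorts
        .iter()
        .map(|c| {
            let resolved = redirects.resolve(c);
            format!("{}.{}", resolved.table, resolved.column)
        })
        .collect();
    format!("ORDER BY {}", cols.join(", "))
}
```

With an empty `Redirects`, a sort on `table_2._expr_0` renders as `ORDER BY table_2._expr_0`, which mirrors the stale reference in the SQL above. Once a redirect from `table_2._expr_0` to `table_0._expr_0` is recorded, the same sort renders against `table_0`.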

I'm not quite happy with the approach though: I feel like this should be done earlier, probably in the anchor step. However, that would require the pushing of sort columns to also happen during that step, which is a bigger change.

@kgutwin @max-sixty It'd be great to have your input on this one, as you are much more familiar with this code than I am 🙂

@max-sixty
Member

thanks!!

I agree this could be done earlier, but also given the constraints this seems quite reasonable?

I'm generally up for working in an imperfect local minimum; hopefully later we can refactor.

to make it easier to review: can we reduce the size of the diff by using a couple of standard inline tests for the SQL compilation rather than the full integration tests? the full integration tests are not for individual features; otherwise we'd have 100K lines of them :)

@lukapeschke
Contributor Author

@max-sixty Thanks for the quick response! I had a second thought about it, and I think you're right: although the code is almost the same as in the anchoring step, it shouldn't change much in the future, so having it like this should be enough for now :)

I also updated the tests to be inline tests

let mut new_name = old_name;
if let Some(new) = &mut new_name {
    if used_new_names.contains(new) {
        *new = self.ctx.anchor.col_name.gen();
Member

@lukapeschke if you have the patience, can we adjust the test so we hit this line too? I don't think we hit it currently

either way let's merge, feel free to ping either way

@lukapeschke lukapeschke force-pushed the ensure-redirection-for-columns-required-by-sorts branch from 85407ff to 111f1c5 Compare October 8, 2025 08:16
@lukapeschke
Contributor Author

@max-sixty I've tweaked the tests and tried to hit the bug fixed by #2960, but I have not been able to hit the uncovered line, so I've simplified the code to remove the used_new_names guard in 111f1c5. Please have a second look and merge if it looks good to you.
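For readers following along, a guard of the shape quoted in the review excerpt can be sketched in isolation as below. This is a reconstruction under assumptions, not the code that was removed: `NameGen`, `next_name`, and `assign_names` are hypothetical, and the `_expr_` prefix is only borrowed from the generated SQL shown earlier in the thread.

```rust
use std::collections::HashSet;

/// Hypothetical name generator producing `<prefix>0`, `<prefix>1`, ...
struct NameGen {
    prefix: &'static str,
    next: usize,
}

impl NameGen {
    fn new(prefix: &'static str) -> Self {
        NameGen { prefix, next: 0 }
    }

    fn next_name(&mut self) -> String {
        let name = format!("{}{}", self.prefix, self.next);
        self.next += 1;
        name
    }
}

/// Assign output names, replacing any name that was already handed out
/// with a freshly generated one. Unnamed columns (`None`) are left alone.
fn assign_names(old_names: Vec<Option<String>>, generator: &mut NameGen) -> Vec<Option<String>> {
    let mut used_new_names: HashSet<String> = HashSet::new();
    old_names
        .into_iter()
        .map(|old_name| {
            let mut new_name = old_name;
            if let Some(new) = &mut new_name {
                // The guard: only triggers when two columns share a name.
                if used_new_names.contains(new.as_str()) {
                    *new = generator.next_name();
                }
                // This toy does not re-check the generated name for collisions.
                used_new_names.insert(new.clone());
            }
            new_name
        })
        .collect()
}
```

In this sketch the generator is consulted only when two columns carry the same name. If the inputs are already unique, the inner branch is never reached, which is consistent with the difficulty of covering that line reported above.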

@max-sixty max-sixty merged commit 6639b8c into PRQL:main Oct 8, 2025
36 checks passed
@max-sixty
Member

thank you @lukapeschke !

@lukapeschke lukapeschke deleted the ensure-redirection-for-columns-required-by-sorts branch October 9, 2025 09:46
lukapeschke added a commit to lukapeschke/prql that referenced this pull request Oct 9, 2025
… from the matching table_ref

It was done by converting the CTE's table id into an RIId for now, but those can differ in case
some CTEs get erased. This resulted in the wrong CId redirect being updated.

Fixing this by finding a relation instance whose table_ref's source matches the CTE's table_id instead.

Follow up of PRQL#5464

Signed-off-by: Luka Peschke <[email protected]>
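
The difference between the two lookups described in this follow-up commit can be illustrated with a small Rust sketch. All types and function names here (`TableId`, `InstanceId`, `RelationInstance`, `instance_by_cast`, `instance_by_source`) are hypothetical and only model the idea that table ids and relation instance ids drift apart once a CTE is erased.

```rust
/// Hypothetical id of a table (CTE) declaration.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct TableId(usize);

/// Hypothetical id of a relation instance (one use of a table).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct InstanceId(usize);

/// A relation instance remembers which table it was created from.
struct RelationInstance {
    id: InstanceId,
    source: TableId,
}

/// Fragile approach: assume both id spaces are numbered identically.
fn instance_by_cast(cte: TableId) -> InstanceId {
    InstanceId(cte.0)
}

/// More robust approach: search for the instance whose source is the CTE.
fn instance_by_source(instances: &[RelationInstance], cte: TableId) -> Option<InstanceId> {
    instances
        .iter()
        .find(|instance| instance.source == cte)
        .map(|instance| instance.id)
}
```

Suppose table 1 was erased, leaving instances `(InstanceId(0), TableId(0))` and `(InstanceId(1), TableId(2))`. Casting `TableId(2)` yields `InstanceId(2)`, which points at nothing, while the source-based lookup returns `InstanceId(1)`.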