Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Improve sqllogicteset speed by creating only a single large file rather than 2#20586

Merged
alamb merged 1 commit intoapache:mainfrom
Tim-53:perf/reuse-parquet-file-push-down-filter-regression
Mar 2, 2026
Merged

Improve sqllogicteset speed by creating only a single large file rather than 2#20586
alamb merged 1 commit intoapache:mainfrom
Tim-53:perf/reuse-parquet-file-push-down-filter-regression

Conversation

@Tim-53
Copy link
Contributor

@Tim-53 Tim-53 commented Feb 26, 2026

Draft as it builds on #20576

Which issue does this PR close?

Rationale for this change

Execution time of the test is dominated by the time writing the parquet files. By reusing the file we can gain around 30% improvement on the execution time here.

What changes are included in this PR?

Building on #20576 we reuse the needed parquet file for the test instead of recreating it.

Are these changes tested?

Ran the test with following results:

Baseline (2 files) Optimized (1 file)
Min 33.000s 22.653s
Max 37.662s 25.489s
Avg 34.427s 24.092s

One open question: does the correctness of this regression test rely on having two physically separate files? The race condition in #17197 was in the execution layer — both scans would still be independent DataSourceExec nodes with independent readers, so I believe the behavior is preserved. But if there's any concern, we could use system cp to copy the file and register two physical files while still only paying the generate_series cost once.

Are there any user-facing changes?

@github-actions github-actions bot added optimizer Optimizer rules sqllogictest SQL Logic Tests (.slt) labels Feb 26, 2026
@alamb alamb changed the title Perf/reuse parquet file push down filter regression Improve sqllogicteset speed by creating only a single large file rather than 2 Feb 27, 2026
@alamb
Copy link
Contributor

alamb commented Feb 27, 2026

Thank you 🙏

I left a note on

Asking the original authors if they could double check

}
}

// trigger ci test
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be removed? Also just in case it's helpful: git commit -m "ci" --allow-empty --no-verify

Copy link
Contributor

@alamb alamb Feb 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I think this is left over from #20566 -- when this PR gets rebased it should be removed)

Copy link
Contributor

@adriangb adriangb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the test needs two physically distinct files. As long as it's two different execution nodes that should be good enough!

@alamb alamb force-pushed the perf/reuse-parquet-file-push-down-filter-regression branch from acce9a4 to 6a36b9e Compare March 1, 2026 13:40
@github-actions github-actions bot removed the optimizer Optimizer rules label Mar 1, 2026
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took the liberty of rebasing this PR against main.

I think it looks good to me

@alamb alamb marked this pull request as ready for review March 1, 2026 13:41
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took the liberty of rebasing this PR against main.

I think it looks good to me

@alamb alamb added this pull request to the merge queue Mar 2, 2026
@alamb
Copy link
Contributor

alamb commented Mar 2, 2026

Thanks again @Tim-53

Merged via the queue into apache:main with commit 0af9ff5 Mar 2, 2026
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sqllogictest SQL Logic Tests (.slt)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants