Parquet: Add dedicated Select method that can be used to push selection vectors into the read#16174

Merged
Mytherin merged 1 commit into duckdb:main from Mytherin:parquetselect
Feb 11, 2025

Conversation

@Mytherin
Collaborator

This effectively restores a previous optimization where we would skip reading elements that had already been filtered out. For now we only enable this for strings, since that has by far the highest performance benefit: we can skip UTF-8 validation for any strings that we don't need to read.
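To illustrate the idea, here is a minimal sketch of pushing a selection vector into a string read. It is not the code from this PR: `SelectStrings`, `IsValidUtf8`, and the in-memory `raw` layout are hypothetical stand-ins, and the UTF-8 check is deliberately simplified. The point is only that rows missing from the selection never reach the validator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <string_view>
#include <vector>

// Simplified UTF-8 check (assumption: structure only). It verifies lead and
// continuation bytes but does not reject overlong encodings or surrogates.
inline bool IsValidUtf8(std::string_view s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const auto c = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;
        if (c < 0x80) {
            extra = 0;
        } else if ((c & 0xE0) == 0xC0) {
            extra = 1;
        } else if ((c & 0xF0) == 0xE0) {
            extra = 2;
        } else if ((c & 0xF8) == 0xF0) {
            extra = 3;
        } else {
            return false; // stray continuation byte or invalid lead byte
        }
        if (i + extra >= s.size()) {
            return false; // sequence is truncated
        }
        for (std::size_t k = 1; k <= extra; k++) {
            if ((static_cast<unsigned char>(s[i + k]) & 0xC0) != 0x80) {
                return false;
            }
        }
        i += extra + 1;
    }
    return true;
}

// Hypothetical "select" read path for one vector of strings.
//   raw:       the stored values, in row order
//   sel:       indices of the rows that survived earlier filters
//   validated: set to the number of strings that were UTF-8 validated
// Returns one slot per row; slots of unselected rows stay empty.
inline std::vector<std::string_view> SelectStrings(const std::vector<std::string> &raw,
                                                   const std::vector<uint32_t> &sel,
                                                   std::size_t &validated) {
    std::vector<std::string_view> result(raw.size());
    validated = 0;
    for (const uint32_t row : sel) {
        assert(row < raw.size());
        const std::string_view value(raw[row]);
        // Only selected rows pay for validation; filtered rows are skipped.
        if (!IsValidUtf8(value)) {
            throw std::runtime_error("invalid UTF-8 in selected row");
        }
        validated++;
        result[row] = value;
    }
    return result;
}
```

With three stored strings and a selection of two rows, only two strings are validated, and an invalid string in a filtered-out row is never inspected.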

For simple types like integers the benefit of this optimization is less clear-cut, as we effectively replace a memcpy with a branchy lookup. I haven't run any benchmarks on this yet, but I suspect that the usefulness of this optimization depends on selectivity, i.e. it might perform better when the selectivity is <10% (or some other to-be-determined threshold). I will leave that for a future PR.

Mytherin changed the title from "Add dedicated Select method that can be used to push selection vectors into the read" to "Parquet: Add dedicated Select method that can be used to push selection vectors into the read" Feb 11, 2025
Mytherin merged commit 4c77e9c into duckdb:main Feb 11, 2025
47 checks passed
Antonov548 added a commit to Antonov548/duckdb-r that referenced this pull request Feb 27, 2025
Parquet: Add dedicated Select method that can be used to push selection vectors into the read (duckdb/duckdb#16174)
[CI] Avoid Linux CLI jobs to fail-fast (duckdb/duckdb#16173)
krlmlr pushed a commit to duckdb/duckdb-r that referenced this pull request Mar 5, 2025
Parquet: Add dedicated Select method that can be used to push selection vectors into the read (duckdb/duckdb#16174)
[CI] Avoid Linux CLI jobs to fail-fast (duckdb/duckdb#16173)
Mytherin deleted the parquetselect branch April 2, 2025 09:25
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 15, 2025
Parquet: Add dedicated Select method that can be used to push selection vectors into the read (duckdb/duckdb#16174)
[CI] Avoid Linux CLI jobs to fail-fast (duckdb/duckdb#16173)
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 17, 2025
Parquet: Add dedicated Select method that can be used to push selection vectors into the read (duckdb/duckdb#16174)
[CI] Avoid Linux CLI jobs to fail-fast (duckdb/duckdb#16173)
krlmlr added a commit to duckdb/duckdb-r that referenced this pull request May 18, 2025
Parquet: Add dedicated Select method that can be used to push selection vectors into the read (duckdb/duckdb#16174)
[CI] Avoid Linux CLI jobs to fail-fast (duckdb/duckdb#16173)