@lukekim
Contributor

@lukekim lukekim commented Oct 14, 2025

This pull request adds robust validation and error handling to the GraphQL client and GitHub data connector, aiming to make failures easier to diagnose and prevent problematic queries or responses from causing issues. The changes include stricter checks on query inputs, pagination, JSON pointer formats, and response data, with improved logging and error messages for debugging.

Validation and Error Handling Improvements:

  • Added checks to ensure GraphQL queries are not empty or whitespace-only, with clear error messages and debug logging.
  • Improved validation for cursors and pagination, including limits on cursor length, detection of infinite loops, and a hard cap on pagination iterations to prevent runaway queries.
  • Enhanced JSON pointer validation, warning if the format is incorrect and ensuring pointers are not empty before extracting data.
  • Added checks for unnest depth and JSON object size to prevent excessive recursion and memory exhaustion.
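
The query and pagination checks above can be sketched in Rust as follows. This is a minimal illustration under stated assumptions, not the code in `crates/data_components/src/graphql/client.rs`: `paginate`, `Page`, `PaginationError`, and the limits `MAX_CURSOR_LEN` and `MAX_PAGES` are hypothetical names and values, and the `fetch` closure stands in for the real GraphQL request.

```rust
use std::collections::HashSet;

// Hypothetical limits; the PR description does not state the actual values.
pub const MAX_CURSOR_LEN: usize = 1024;
pub const MAX_PAGES: usize = 1000;

#[derive(Debug, PartialEq)]
pub enum PaginationError {
    EmptyQuery,
    CursorTooLong(usize),
    RepeatedCursor(String),
    TooManyPages(usize),
}

/// One page of results plus the cursor for the next page, if any (hypothetical type).
pub struct Page<T> {
    pub rows: Vec<T>,
    pub next_cursor: Option<String>,
}

/// Calls `fetch` until it reports no next cursor. Rejects empty queries,
/// oversized cursors, repeated cursors (an infinite loop), and runaway paging.
pub fn paginate<T, F>(query: &str, mut fetch: F) -> Result<Vec<T>, PaginationError>
where
    F: FnMut(&str, Option<&str>) -> Page<T>,
{
    if query.trim().is_empty() {
        return Err(PaginationError::EmptyQuery);
    }
    let mut seen: HashSet<String> = HashSet::new();
    let mut cursor: Option<String> = None;
    let mut out = Vec::new();
    for _ in 0..MAX_PAGES {
        let page = fetch(query, cursor.as_deref());
        out.extend(page.rows);
        match page.next_cursor {
            None => return Ok(out),
            Some(next) => {
                if next.len() > MAX_CURSOR_LEN {
                    return Err(PaginationError::CursorTooLong(next.len()));
                }
                // `insert` returns false when this cursor was already seen.
                if !seen.insert(next.clone()) {
                    return Err(PaginationError::RepeatedCursor(next));
                }
                cursor = Some(next);
            }
        }
    }
    // Every cursor was distinct, but the hard cap on iterations was reached.
    Err(PaginationError::TooManyPages(MAX_PAGES))
}
```

Tracking seen cursors in a `HashSet` catches a server that keeps returning the same `endCursor`, while the page cap bounds the work even when every cursor is distinct.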

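A sketch of the JSON pointer check, assuming pointers follow RFC 6901 (each reference token is prefixed by `/`, with `~1` encoding `/` and `~0` encoding `~`). `parse_json_pointer` and `PointerError` are hypothetical names. RFC 6901 treats the empty string as a pointer to the whole document; this sketch rejects it on purpose, mirroring the "not empty" check described above. Where the PR logs a warning for a malformed pointer, the sketch returns an error instead.

```rust
#[derive(Debug, PartialEq)]
pub enum PointerError {
    Empty,
    MissingLeadingSlash(String),
    BadEscape(String),
}

/// Decodes one reference token. Only "~0" and "~1" are valid escapes,
/// and decoding left to right handles "~01" correctly (it becomes "~1").
fn unescape_token(token: &str) -> Result<String, PointerError> {
    let mut out = String::with_capacity(token.len());
    let mut chars = token.chars();
    while let Some(c) = chars.next() {
        if c == '~' {
            match chars.next() {
                Some('0') => out.push('~'),
                Some('1') => out.push('/'),
                _ => return Err(PointerError::BadEscape(token.to_string())),
            }
        } else {
            out.push(c);
        }
    }
    Ok(out)
}

/// Validates a JSON pointer and returns its unescaped reference tokens.
pub fn parse_json_pointer(ptr: &str) -> Result<Vec<String>, PointerError> {
    if ptr.is_empty() {
        return Err(PointerError::Empty);
    }
    if !ptr.starts_with('/') {
        return Err(PointerError::MissingLeadingSlash(ptr.to_string()));
    }
    // Slicing at 1 is safe: the first byte is the ASCII '/'.
    ptr[1..].split('/').map(unescape_token).collect()
}
```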
Logging and Diagnostics:

  • Improved error messages for failed JSON parsing, record batch creation, and schema inference, including previews of problematic data and full response logs for easier debugging.
  • Updated GitHub rate limit error messages to provide actionable advice and reference documentation.
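
Previews of problematic data in error messages have to be truncated safely, because slicing a Rust `&str` at an arbitrary byte offset panics when the offset falls inside a multi-byte UTF-8 character. A minimal sketch of one way to do this; `preview` and its output format are hypothetical, not taken from the PR.

```rust
/// Returns at most `max_chars` characters of `body`. When truncated, appends a
/// marker with the total byte length. Truncation happens on a character
/// boundary, so multi-byte UTF-8 input cannot cause a panic.
pub fn preview(body: &str, max_chars: usize) -> String {
    // The byte index of the first character past the limit, if there is one.
    match body.char_indices().nth(max_chars) {
        None => body.to_string(),
        Some((byte_idx, _)) => {
            format!("{}... ({} bytes total)", &body[..byte_idx], body.len())
        }
    }
}
```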

These changes will make the data connector more resilient to malformed queries and unexpected API responses, and provide better context for troubleshooting when errors occur.

@lukekim lukekim self-assigned this Oct 14, 2025
@lukekim lukekim requested a review from a team as a code owner October 14, 2025 03:30
Copilot AI review requested due to automatic review settings October 14, 2025 03:30
@github-actions
Contributor

github-actions bot commented Oct 14, 2025

✅ Pull with Spice Passed

Passing checks:

  • ✅ Title meets minimum length requirement (10 characters)
  • ✅ Has at least one of the required labels: kind/refactor, kind/bug, kind/enhancement, kind/documentation, kind/optimization, kind/dependencies, kind/endgame, kind/task, kind/performance
  • ✅ No banned labels detected
  • ✅ Has at least one assignee: lukekim

Contributor

Copilot AI left a comment

Pull Request Overview

This PR improves the robustness and error handling of the GraphQL client and GitHub data connector by adding comprehensive input validation, better error diagnostics, and protection against infinite loops. The changes focus on making the system more resilient to malformed data and providing more actionable error messages to users.

  • Enhanced GraphQL client validation with checks for empty queries, malformed cursors, and excessive recursion depth
  • Improved GitHub data connector with installation access validation and better rate limit messaging
  • Added pagination loop protection and enhanced error handling throughout the system
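
The recursion-depth guard can be illustrated with a small flattening routine that turns nested objects into dotted column names and returns an error instead of recursing past a limit. The `Value` enum is a hypothetical stand-in for a parsed JSON value (the real connector may well use `serde_json::Value`), and `flatten` is an illustrative name; the JSON object size check mentioned in the description is not shown.

```rust
/// Hypothetical minimal JSON value: either a scalar or an object of fields.
#[derive(Debug)]
pub enum Value {
    Scalar(String),
    Object(Vec<(String, Value)>),
}

/// Flattens nested objects into (dotted_name, scalar) pairs, for example
/// {"a": {"b": 1}} becomes ("a.b", "1"). Fails once nesting exceeds `max_depth`.
pub fn flatten(
    value: &Value,
    prefix: &str,
    depth: usize,
    max_depth: usize,
    out: &mut Vec<(String, String)>,
) -> Result<(), String> {
    if depth > max_depth {
        return Err(format!("unnest depth {depth} exceeds limit {max_depth} at '{prefix}'"));
    }
    match value {
        Value::Scalar(s) => out.push((prefix.to_string(), s.clone())),
        Value::Object(fields) => {
            for (key, child) in fields {
                let name = if prefix.is_empty() { key.clone() } else { format!("{prefix}.{key}") };
                // Each level of nesting increases the depth by one.
                flatten(child, &name, depth + 1, max_depth, out)?;
            }
        }
    }
    Ok(())
}
```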

Reviewed Changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Summary per file:

  • crates/runtime/src/dataconnector/github/rate_limit.rs: Added rate limit usage warnings when usage exceeds 80%
  • crates/runtime/src/dataconnector/github/projects.rs: Added GraphQL error checking and fixed query field naming
  • crates/runtime/src/dataconnector/github/mod.rs: Added comprehensive GitHub App installation access validation
  • crates/data_components/src/graphql/mod.rs: Updated error types and improved error message formatting
  • crates/data_components/src/graphql/client.rs: Enhanced validation, pagination protection, and error handling
  • crates/data_components/src/github.rs: Updated rate limit error messages with actionable guidance
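
One way to implement the 80% usage warning, sketched under the assumption that the limit and remaining counts come from GitHub's `x-ratelimit-limit` and `x-ratelimit-remaining` response headers. The function name and message text are hypothetical; the comparison uses integer arithmetic so that usage of exactly 80% does not trigger the warning (it must exceed 80%).

```rust
/// Hypothetical threshold matching the "usage exceeds 80%" description.
const WARN_THRESHOLD_PERCENT: u64 = 80;

/// Returns a warning message when more than 80% of the rate limit has been used.
pub fn rate_limit_warning(limit: u64, remaining: u64) -> Option<String> {
    if limit == 0 {
        return None; // Avoid dividing by zero on a missing or bogus header.
    }
    let used = limit.saturating_sub(remaining);
    // used / limit > 80 / 100  is equivalent to  used * 100 > 80 * limit.
    if used * 100 > WARN_THRESHOLD_PERCENT * limit {
        let pct = used * 100 / limit;
        Some(format!(
            "GitHub API rate limit usage is {pct}% ({used}/{limit}); requests may be throttled soon"
        ))
    } else {
        None
    }
}
```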

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@lukekim lukekim added this to the v1.9.0 milestone Oct 14, 2025
@lukekim lukekim added the kind/bug Something isn't working label Oct 14, 2025
ewgenius
ewgenius previously approved these changes Oct 14, 2025
Copilot AI review requested due to automatic review settings October 14, 2025 05:53
@lukekim lukekim enabled auto-merge October 14, 2025 05:53
Contributor

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.


Copilot AI review requested due to automatic review settings October 14, 2025 19:26
Contributor

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.


lukekim and others added 4 commits October 17, 2025 23:59
* Optimize builds for speed

* Update .github/workflows/pr.yml

Co-authored-by: Copilot <[email protected]>

---------

Co-authored-by: Copilot <[email protected]>
Copilot AI review requested due to automatic review settings October 19, 2025 04:36
Contributor

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 15 comments.


Copilot AI review requested due to automatic review settings October 19, 2025 04:58
Contributor

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.


Copilot AI review requested due to automatic review settings October 19, 2025 23:47
Contributor

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

crates/data_components/src/graphql/mod.rs:87

  • The display message 'Server returned an error' for InvalidGraphQLQuery is misleading when this variant is used for client-side validation and parse failures (e.g., empty or syntactically invalid queries). Suggest changing to a neutral message like 'Invalid GraphQL query: {message}' so it accurately covers both parse-time and server-reported query errors.
    #[snafu(display("Server returned an error: {message}"))]
    InvalidGraphQLQuery {
        message: String,
        line: usize,
        column: usize,
        query: String,

Copilot AI review requested due to automatic review settings October 20, 2025 11:47
Contributor

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

Comments suppressed due to low confidence (1)

crates/data_components/src/graphql/mod.rs:1

  • The display message for InvalidGraphQLQuery was simplified and no longer includes line, column, or contextual query excerpt, reducing diagnostic value. Consider restoring a more informative format that includes line/column and a query snippet to aid quick debugging.
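
A sketch of a more informative format along the lines this comment suggests: the message includes the 1-based line and column and echoes the offending query line with a caret under the reported position. `format_query_error` is a hypothetical helper; the caret alignment assumes `column` counts characters and that the line contains no tabs or wide characters.

```rust
/// Formats a query error with its 1-based line/column and, when the line
/// exists, a snippet of that line with a caret under the reported column.
pub fn format_query_error(message: &str, query: &str, line: usize, column: usize) -> String {
    let mut out = format!("Invalid GraphQL query at line {line}, column {column}: {message}");
    // `line` is 1-based; skip the snippet when it is 0 or past the end of the query.
    if let Some(src) = line.checked_sub(1).and_then(|i| query.lines().nth(i)) {
        let caret_pad = " ".repeat(column.saturating_sub(1));
        out.push_str(&format!("\n  {src}\n  {caret_pad}^"));
    }
    out
}
```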
@lukekim lukekim added this pull request to the merge queue Oct 20, 2025
Merged via the queue into trunk with commit 7e4f962 Oct 20, 2025
81 checks passed
@lukekim lukekim deleted the lukim/github-data-connector branch October 20, 2025 17:27
lukekim added a commit that referenced this pull request Oct 20, 2025
…andling (#7547)

* Add better graphql validation

* WIP

* Formatting

* Fix

* Fix query

* Add validation for GitHub API

* More error handling improvements

* Revert "Merge branch 'trunk' into lukim/github-data-connector"

This reverts commit 8982710, reversing
changes made to 7b8f8bc.

* Fix issues

* Updares

* Consolidate

* Add Debug

* Bump golang.org/x/mod from 0.28.0 to 0.29.0 (#7530)

Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.28.0 to 0.29.0.
- [Commits](golang/mod@v0.28.0...v0.29.0)

---
updated-dependencies:
- dependency-name: golang.org/x/mod
  dependency-version: 0.29.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Luke Kim <[email protected]>

* Hive-style partitioning for DuckDB file mode (#7563)

* more advanced partition_by config

* add tests

* wip

* rename name

* add PartitionedBy to vector

* use new PartitionedBy in partition_by_expressions

* modify DataAccelerator::create_external_table

* use PartitionedBy more

* create hive style files

* discover hive style partitions

* discover hive partitions for duckdb

* remove unwraps

* fix clippy lints

* more clippy lints

* Update crates/spicepod/src/partitioning.rs

Co-authored-by: Copilot <[email protected]>

* fix spicepod tests

---------

Co-authored-by: Copilot <[email protected]>

* Vortex Data Accelerator (Dev grade) (#7566)

* Vortex Data Accelerator

* Fix imports

* Add back feature

* fix vortex

* fix build

* Remove memory

* Fix tests

* Add tests

* Update tests

* Fixes

* Update

* Fix

* Update tests

* Use StreamTable instead of ListingTable

* Update tests

* Use async buffered writes

* Update tests

* Works!

* Perf improvements

* Fix

* Add check for partition_by

* Fix memory leak

* fix lint issues

* fmt

* Improve benchmark tests

* fix lint

* Fix duplicate code.

* vendor vortex-datafusion

* fix

* finally clean lint

* Don't create dummy file, just specify the schema

* fix lint

* Property integrate vendored vortex-datafusion

* WIP

* Fix tests

* remove custom writing code

* fix lint

* Update crates/vortex-datafusion/src/persistent/opener.rs

Co-authored-by: Copilot <[email protected]>

---------

Co-authored-by: Phillip LeBlanc <[email protected]>
Co-authored-by: Copilot <[email protected]>

* Only load eval scorers when eval defined (#7549)

* Only load eval scorers when eval defined

* Reinstate eval verification in async workflow

* Bump octocrab from 0.45.0 to 0.47.0 (#7531)

Bumps [octocrab](https://github.com/XAMPPRocky/octocrab) from 0.45.0 to 0.47.0.
- [Release notes](https://github.com/XAMPPRocky/octocrab/releases)
- [Changelog](https://github.com/XAMPPRocky/octocrab/blob/main/CHANGELOG.md)
- [Commits](XAMPPRocky/octocrab@v0.45.0...v0.47.0)

---
updated-dependencies:
- dependency-name: octocrab
  dependency-version: 0.47.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump regex from 1.11.3 to 1.12.1 (#7532)

Bumps [regex](https://github.com/rust-lang/regex) from 1.11.3 to 1.12.1.
- [Release notes](https://github.com/rust-lang/regex/releases)
- [Changelog](https://github.com/rust-lang/regex/blob/master/CHANGELOG.md)
- [Commits](rust-lang/regex@1.11.3...1.12.1)

---
updated-dependencies:
- dependency-name: regex
  dependency-version: 1.12.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix custom file path for Vortex Data Accelerator (#7570)

* Only support append refresh

* Remove validation

* fix lint

* Fix the tests

* Add List type support to Vortex Data Accelerator (#7569)

* Vortex Data Accelerator

* Fix imports

* Add back feature

* fix vortex

* fix build

* Remove memory

* Fix tests

* Add tests

* Update tests

* Fixes

* Update

* Fix

* Update tests

* Use StreamTable instead of ListingTable

* Update tests

* Use async buffered writes

* Update tests

* Works!

* Perf improvements

* Fix

* Add check for partition_by

* Fix memory leak

* fix lint issues

* fmt

* Improve benchmark tests

* fix lint

* Fix duplicate code.

* vendor vortex-datafusion

* fix

* finally clean lint

* Don't create dummy file, just specify the schema

* fix lint

* Property integrate vendored vortex-datafusion

* WIP

* Fix tests

* remove custom writing code

* WIP

* Only support append mode for Vortex

* Add additional validation

* Add List to vortex supported types

* Fix linting issues

* Remove memory mode test, not supported anymore.

---------

Co-authored-by: Phillip LeBlanc <[email protected]>

* Bump parking_lot from 0.12.4 to 0.12.5 (#7534)

Bumps [parking_lot](https://github.com/Amanieu/parking_lot) from 0.12.4 to 0.12.5.
- [Release notes](https://github.com/Amanieu/parking_lot/releases)
- [Changelog](https://github.com/Amanieu/parking_lot/blob/master/CHANGELOG.md)
- [Commits](Amanieu/parking_lot@parking_lot-v0.12.4...parking_lot-v0.12.5)

---
updated-dependencies:
- dependency-name: parking_lot
  dependency-version: 0.12.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump tokio-postgres from 0.7.14 to 0.7.15 (#7533)

Bumps [tokio-postgres](https://github.com/rust-postgres/rust-postgres) from 0.7.14 to 0.7.15.
- [Release notes](https://github.com/rust-postgres/rust-postgres/releases)
- [Commits](rust-postgres/rust-postgres@tokio-postgres-v0.7.14...tokio-postgres-v0.7.15)

---
updated-dependencies:
- dependency-name: tokio-postgres
  dependency-version: 0.7.15
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Remove duplicate line from 1.8.1 release notes (#7580)

* Upgrade Go from v1.24.2 to v1.25.3 (#7582)

* check if index/bucket exists after ConflictException (#7577)

* Add `runtime-async` crate with managed Tokio runtime (#7575)

* Add `runtime-async` crate with managed Tokio runtime

* fix

* remove test

* fix

* fix lint

* Optimize GitHub Actions workflows (#7584)

* Optimize builds for speed

* Update .github/workflows/pr.yml

Co-authored-by: Copilot <[email protected]>

---------

Co-authored-by: Copilot <[email protected]>

* Add prepared statements

* Remove dupe

* Revert "Add prepared statements"

This reverts commit 5f8a36b.

* Update crates/runtime/src/dataconnector/github/projects.rs

Co-authored-by: Copilot <[email protected]>

* Fix copilot's complaints

* Fixes

* Filter out empty segments

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Kevin Zimmerman <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Phillip LeBlanc <[email protected]>
Co-authored-by: Jack Eadie <[email protected]>
Co-authored-by: Viktor Yershov <[email protected]>
github-merge-queue bot pushed a commit that referenced this pull request Oct 21, 2025
* Initial Pepper data accelerator
This renames Vortex to Pepper

* Fixes

* Update crates/pepper/README.md

* Update crates/runtime/src/component/dataset/acceleration.rs

Co-authored-by: Copilot <[email protected]>

* Update name

* Update crates/runtime/src/dataaccelerator/pepper.rs

Co-authored-by: Copilot <[email protected]>

* Update crates/pepper/src/lib.rs

Co-authored-by: Copilot <[email protected]>

* Improvements

* Update crates/runtime/src/dataaccelerator/pepper.rs

Co-authored-by: Copilot <[email protected]>

* Update crates/runtime/src/dataaccelerator/pepper.rs

Co-authored-by: Copilot <[email protected]>

* Update crates/runtime/src/dataaccelerator/pepper.rs

Co-authored-by: Copilot <[email protected]>

* Update crates/runtime/src/dataaccelerator/pepper.rs

Co-authored-by: Copilot <[email protected]>

* Update crates/runtime/src/dataaccelerator/pepper.rs

Co-authored-by: Copilot <[email protected]>

* Update crates/pepper/README.md

Co-authored-by: Copilot <[email protected]>

* Update crates/runtime/src/dataaccelerator/mod.rs

Co-authored-by: Copilot <[email protected]>

* Update crates/pepper/README.md

Co-authored-by: Copilot <[email protected]>

* Apply suggestions from code review

Co-authored-by: Phillip LeBlanc <[email protected]>

* fix score order for one test case (#7595)

* `ObjectMeta` filter pushdown for `ObjectStoreTextTable` (#7572)

* setup code for document table filtering

* pushdown ObjectMeta filters to ObjectStoreTextTable

* fix filter

* Prefetch ObjectMeta to improve execution plan statistics

* PR comment refactors

* unit tests

* bad merge

* clippy

* clppy

* Return `TableProvider` from `CandidateGeneration::search`.  (#7559)

* Remove 'SearchIndex::metadata_columns'

* add non-filterable metadata to FTS index

* integration tests

* clppy

* clppy

* clppy

* clppy

* clppy

* compiles

* clppy

* clppy

* working

* docs etc

* revert

* fix match projection; nan in scores

* clppy

* fmt

* snapshots

* clppy

* multi-thread some tokio tests

---------

Co-authored-by: Luke Kim <[email protected]>

* EmptyHashJoinExecPhysicalOptimization, and use in VectorScanTableProvider (#7587)

* EmptyHashJoinExecPhysicalOptimization, and use in VectorScanTableProvider

* 'datafusion-optimizer-rules' crate

* move more to datafusion-optimizer-rules

* clppy

* testing

* snapshots

* PR comments

* update snapshots

* Update official Docker builds to use release binaries (#7597)

* Update official Docker builds to use release binaries

* update endgame

* fix docker builds

* Fix cuda build

* New Generate Changelog workflow (#7562)

* New Generate Changelog workflow

* Set default versions for reference

* Improvements for changelog

* Add comments

* Update scripts/generate_changelog.py

Co-authored-by: Copilot <[email protected]>

* Improvements to make it more reliable

* Update scripts/generate_changelog.py

Co-authored-by: Copilot <[email protected]>

* remove old changelog generator

---------

Co-authored-by: Copilot <[email protected]>
Co-authored-by: Phillip LeBlanc <[email protected]>

* BytesProcessedExec to allow optimizer to do limit pushdown (#7539)

* fix limit pushdown for children of bytesprocessedexec

* accept: limits push down lower after bytesprocessed allows passthrough

---------

Co-authored-by: Luke Kim <[email protected]>
Co-authored-by: Phillip LeBlanc <[email protected]>

* GitHub Data Connector add Projects, improve rate-limiting and error handling (#7547)

* Add copilot-instructions to help improve Copilot reviews (#7606)

* Add copilot-instructions to help improve Copilot reviews.

* Updates

* Fixes

* Add support for DuckDB table-based partitioning (#7581)

* Support for partitioning based on table names

* Simplify infer_existing_partitions

* Better structure

* Add DuckDBPartitionedDataSink

* Update insert overwrite

* insert_append

* Update

* Update

* logic to delete old internal tables for full refresh

* Include partitioned_duckdb param

* Use statement for list_partitioned_tables

* Fix schema mismatch error

* lint

* Add tests for the DuckDBPartitionedDataSink

* Add test for TablesModePartitionedDuckDBAccelerator

* Update

* Primary key support

* on-conflict support

* Indexes support for append and full refresh

* Update crates/runtime/src/dataaccelerator/partitioned_duckdb/tables_mode/mod.rs

Co-authored-by: Phillip LeBlanc <[email protected]>

* Update

* Make PassThruExec public

* Update to the latest table-providers version

---------

Co-authored-by: Phillip LeBlanc <[email protected]>

* Add clarification

* Fix build

* Update deny.toml

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Phillip LeBlanc <[email protected]>
Co-authored-by: Jack Eadie <[email protected]>
Co-authored-by: Viktor Yershov <[email protected]>
Co-authored-by: Phillip LeBlanc <[email protected]>
Co-authored-by: David Stancu <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Kevin Zimmerman <[email protected]>
Co-authored-by: Sergei Grebnov <[email protected]>
krinart added a commit that referenced this pull request Oct 21, 2025
* Initial Pepper data accelerator
This renames Vortex to Pepper

* Fixes

* Update crates/pepper/README.md

* Update crates/runtime/src/component/dataset/acceleration.rs

Co-authored-by: Copilot <[email protected]>

* Update name

* Update crates/runtime/src/dataaccelerator/pepper.rs

Co-authored-by: Copilot <[email protected]>

* Update crates/pepper/src/lib.rs

Co-authored-by: Copilot <[email protected]>

* Improvements

* Update crates/runtime/src/dataaccelerator/pepper.rs

Co-authored-by: Copilot <[email protected]>

* Update crates/runtime/src/dataaccelerator/pepper.rs

Co-authored-by: Copilot <[email protected]>

* Update crates/runtime/src/dataaccelerator/pepper.rs

Co-authored-by: Copilot <[email protected]>

* Update crates/runtime/src/dataaccelerator/pepper.rs

Co-authored-by: Copilot <[email protected]>

* Update crates/runtime/src/dataaccelerator/pepper.rs

Co-authored-by: Copilot <[email protected]>

* Update crates/pepper/README.md

Co-authored-by: Copilot <[email protected]>

* Update crates/runtime/src/dataaccelerator/mod.rs

Co-authored-by: Copilot <[email protected]>

* Update crates/pepper/README.md

Co-authored-by: Copilot <[email protected]>

* Apply suggestions from code review

Co-authored-by: Phillip LeBlanc <[email protected]>

* fix score order for one test case (#7595)

* `ObjectMeta` filter pushdown for `ObjectStoreTextTable` (#7572)

* setup code for document table filtering

* pushdown ObjectMeta filters to ObjectStoreTextTable

* fix filter

* Prefetch ObjectMeta to improve execution plan statistics

* PR comment refactors

* unit tests

* bad merge

* clippy

* clppy

* Return `TableProvider` from `CandidateGeneration::search`.  (#7559)

* Remove 'SearchIndex::metadata_columns'

* add non-filterable metadata to FTS index

* integration tests

* clppy

* clppy

* clppy

* clppy

* clppy

* compiles

* clppy

* clppy

* working

* docs etc

* revert

* fix match projection; nan in scores

* clppy

* fmt

* snapshots

* clppy

* multi-thread some tokio tests

---------

Co-authored-by: Luke Kim <[email protected]>

* EmptyHashJoinExecPhysicalOptimization, and use in VectorScanTableProvider (#7587)

* EmptyHashJoinExecPhysicalOptimization, and use in VectorScanTableProvider

* 'datafusion-optimizer-rules' crate

* move more to datafusion-optimizer-rules

* clppy

* testing

* snapshots

* PR comments

* update snapshots

* Update official Docker builds to use release binaries (#7597)

* Update official Docker builds to use release binaries

* update endgame

* fix docker builds

* Fix cuda build

* New Generate Changelog workflow (#7562)

* New Generate Changelog workflow

* Set default versions for reference

* Improvements for changelog

* Add comments

* Update scripts/generate_changelog.py

Co-authored-by: Copilot <[email protected]>

* Improvements to make it more reliable

* Update scripts/generate_changelog.py

Co-authored-by: Copilot <[email protected]>

* remove old changelog generator

---------

Co-authored-by: Copilot <[email protected]>
Co-authored-by: Phillip LeBlanc <[email protected]>

* BytesProcessedExec to allow optimizer to do limit pushdown (#7539)

* fix limit pushdown for children of bytesprocessedexec

* accept: limits push down lower after bytesprocessed allows passthrough

---------

Co-authored-by: Luke Kim <[email protected]>
Co-authored-by: Phillip LeBlanc <[email protected]>

* GitHub Data Connector add Projects, improve rate-limiting and error handling (#7547)

* Add better graphql validation

* WIP

* Formatting

* Fix

* Fix query

* Add validation for GitHub API

* More error handling improvements

* Revert "Merge branch 'trunk' into lukim/github-data-connector"

This reverts commit 8982710, reversing
changes made to 7b8f8bc.

* Fix issues

* Updates

* Consolidate

* Add Debug

* Bump golang.org/x/mod from 0.28.0 to 0.29.0 (#7530)

Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.28.0 to 0.29.0.
- [Commits](golang/mod@v0.28.0...v0.29.0)

---
updated-dependencies:
- dependency-name: golang.org/x/mod
  dependency-version: 0.29.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Luke Kim <[email protected]>

* Hive-style partitioning for DuckDB file mode (#7563)

* more advanced partition_by config

* add tests

* wip

* rename name

* add PartitionedBy to vector

* use new PartitionedBy in partition_by_expressions

* modify DataAccelerator::create_external_table

* use PartitionedBy more

* create hive style files

* discover hive style partitions

* discover hive partitions for duckdb

* remove unwraps

* fix clippy lints

* more clippy lints

* Update crates/spicepod/src/partitioning.rs

Co-authored-by: Copilot <[email protected]>

* fix spicepod tests

---------

Co-authored-by: Copilot <[email protected]>

* Vortex Data Accelerator (Dev grade) (#7566)

* Vortex Data Accelerator

* Fix imports

* Add back feature

* fix vortex

* fix build

* Remove memory

* Fix tests

* Add tests

* Update tests

* Fixes

* Update

* Fix

* Update tests

* Use StreamTable instead of ListingTable

* Update tests

* Use async buffered writes

* Update tests

* Works!

* Perf improvements

* Fix

* Add check for partition_by

* Fix memory leak

* fix lint issues

* fmt

* Improve benchmark tests

* fix lint

* Fix duplicate code.

* vendor vortex-datafusion

* fix

* finally clean lint

* Don't create dummy file, just specify the schema

* fix lint

* Properly integrate vendored vortex-datafusion

* WIP

* Fix tests

* remove custom writing code

* fix lint

* Update crates/vortex-datafusion/src/persistent/opener.rs

Co-authored-by: Copilot <[email protected]>

---------

Co-authored-by: Phillip LeBlanc <[email protected]>
Co-authored-by: Copilot <[email protected]>

* Only load eval scorers when eval defined (#7549)

* Only load eval scorers when eval defined

* Reinstate eval verification in async workflow

* Bump octocrab from 0.45.0 to 0.47.0 (#7531)

Bumps [octocrab](https://github.com/XAMPPRocky/octocrab) from 0.45.0 to 0.47.0.
- [Release notes](https://github.com/XAMPPRocky/octocrab/releases)
- [Changelog](https://github.com/XAMPPRocky/octocrab/blob/main/CHANGELOG.md)
- [Commits](XAMPPRocky/octocrab@v0.45.0...v0.47.0)

---
updated-dependencies:
- dependency-name: octocrab
  dependency-version: 0.47.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump regex from 1.11.3 to 1.12.1 (#7532)

Bumps [regex](https://github.com/rust-lang/regex) from 1.11.3 to 1.12.1.
- [Release notes](https://github.com/rust-lang/regex/releases)
- [Changelog](https://github.com/rust-lang/regex/blob/master/CHANGELOG.md)
- [Commits](rust-lang/regex@1.11.3...1.12.1)

---
updated-dependencies:
- dependency-name: regex
  dependency-version: 1.12.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Fix custom file path for Vortex Data Accelerator (#7570)

* Only support append refresh

* Remove validation

* fix lint

* Fix the tests

* Add List type support to Vortex Data Accelerator (#7569)

* Vortex Data Accelerator

* Fix imports

* Add back feature

* fix vortex

* fix build

* Remove memory

* Fix tests

* Add tests

* Update tests

* Fixes

* Update

* Fix

* Update tests

* Use StreamTable instead of ListingTable

* Update tests

* Use async buffered writes

* Update tests

* Works!

* Perf improvements

* Fix

* Add check for partition_by

* Fix memory leak

* fix lint issues

* fmt

* Improve benchmark tests

* fix lint

* Fix duplicate code.

* vendor vortex-datafusion

* fix

* finally clean lint

* Don't create dummy file, just specify the schema

* fix lint

* Properly integrate vendored vortex-datafusion

* WIP

* Fix tests

* remove custom writing code

* WIP

* Only support append mode for Vortex

* Add additional validation

* Add List to vortex supported types

* Fix linting issues

* Remove memory mode test, not supported anymore.

---------

Co-authored-by: Phillip LeBlanc <[email protected]>

* Bump parking_lot from 0.12.4 to 0.12.5 (#7534)

Bumps [parking_lot](https://github.com/Amanieu/parking_lot) from 0.12.4 to 0.12.5.
- [Release notes](https://github.com/Amanieu/parking_lot/releases)
- [Changelog](https://github.com/Amanieu/parking_lot/blob/master/CHANGELOG.md)
- [Commits](Amanieu/parking_lot@parking_lot-v0.12.4...parking_lot-v0.12.5)

---
updated-dependencies:
- dependency-name: parking_lot
  dependency-version: 0.12.5
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Bump tokio-postgres from 0.7.14 to 0.7.15 (#7533)

Bumps [tokio-postgres](https://github.com/rust-postgres/rust-postgres) from 0.7.14 to 0.7.15.
- [Release notes](https://github.com/rust-postgres/rust-postgres/releases)
- [Commits](rust-postgres/rust-postgres@tokio-postgres-v0.7.14...tokio-postgres-v0.7.15)

---
updated-dependencies:
- dependency-name: tokio-postgres
  dependency-version: 0.7.15
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Remove duplicate line from 1.8.1 release notes (#7580)

* Upgrade Go from v1.24.2 to v1.25.3 (#7582)

* check if index/bucket exists after ConflictException (#7577)

* Add `runtime-async` crate with managed Tokio runtime (#7575)

* Add `runtime-async` crate with managed Tokio runtime

* fix

* remove test

* fix

* fix lint

* Optimize GitHub Actions workflows (#7584)

* Optimize builds for speed

* Update .github/workflows/pr.yml

Co-authored-by: Copilot <[email protected]>

---------

Co-authored-by: Copilot <[email protected]>

* Add prepared statements

* Remove dupe

* Revert "Add prepared statements"

This reverts commit 5f8a36b.

* Update crates/runtime/src/dataconnector/github/projects.rs

Co-authored-by: Copilot <[email protected]>

* Fix copilot's complaints

* Fixes

* Filter out empty segments

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Kevin Zimmerman <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Phillip LeBlanc <[email protected]>
Co-authored-by: Jack Eadie <[email protected]>
Co-authored-by: Viktor Yershov <[email protected]>

* Add copilot-instructions to help improve Copilot reviews (#7606)

* Add copilot-instructions to help improve Copilot reviews.

* Updates

* Fixes

* Add support for DuckDB table-based partitioning (#7581)

* Support for partitioning based on table names

* Simplify infer_existing_partitions

* Better structure

* Add DuckDBPartitionedDataSink

* Update insert overwrite

* insert_append

* Update

* Update

* logic to delete old internal tables for full refresh

* Include partitioned_duckdb param

* Use statement for list_partitioned_tables

* Fix schema mismatch error

* lint

* Add tests for the DuckDBPartitionedDataSink

* Add test for TablesModePartitionedDuckDBAccelerator

* Update

* Primary key support

* on-conflict support

* Indexes support for append and full refresh

* Update crates/runtime/src/dataaccelerator/partitioned_duckdb/tables_mode/mod.rs

Co-authored-by: Phillip LeBlanc <[email protected]>

* Update

* Make PassThruExec public

* Update to the latest table-providers version

---------

Co-authored-by: Phillip LeBlanc <[email protected]>

* Add clarification

* Fix build

* Update deny.toml

---------

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Phillip LeBlanc <[email protected]>
Co-authored-by: Jack Eadie <[email protected]>
Co-authored-by: Viktor Yershov <[email protected]>
Co-authored-by: Phillip LeBlanc <[email protected]>
Co-authored-by: David Stancu <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Kevin Zimmerman <[email protected]>
Co-authored-by: Sergei Grebnov <[email protected]>

Labels

area/data-connectors kind/bug Something isn't working

6 participants