GitHub Data Connector add Projects, improve rate-limiting and error handling #7547
Pull Request Overview
This PR improves the robustness and error handling of the GraphQL client and GitHub data connector by adding comprehensive input validation, better error diagnostics, and protection against infinite loops. The changes focus on making the system more resilient to malformed data and providing more actionable error messages to users.
- Enhanced GraphQL client validation with checks for empty queries, malformed cursors, and excessive recursion depth
- Improved GitHub data connector with installation access validation and better rate limit messaging
- Added pagination loop protection and enhanced error handling throughout the system
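
The pagination loop protection and empty-query check summarized above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `fetch_all`, `Page`, `PaginationError`, and the `max_pages` cap are hypothetical names, and the real client may detect loops differently.

```rust
use std::collections::HashSet;

/// Hypothetical error type for this sketch; not the crate's actual error enum.
#[derive(Debug, PartialEq)]
enum PaginationError {
    EmptyQuery,
    RepeatedCursor(String),
    TooManyPages(usize),
}

/// One page of results, as a hypothetical fetcher might return it.
struct Page {
    items: Vec<u32>,
    next_cursor: Option<String>,
}

/// Collects every page, returning an error instead of looping forever.
fn fetch_all<F>(query: &str, max_pages: usize, mut fetch: F) -> Result<Vec<u32>, PaginationError>
where
    F: FnMut(Option<&str>) -> Page,
{
    // Reject empty or whitespace-only queries before any request is made.
    if query.trim().is_empty() {
        return Err(PaginationError::EmptyQuery);
    }
    let mut seen: HashSet<String> = HashSet::new();
    let mut cursor: Option<String> = None;
    let mut out = Vec::new();
    // The page cap bounds the loop even if every cursor is unique.
    for _ in 0..max_pages {
        let page = fetch(cursor.as_deref());
        out.extend(page.items);
        match page.next_cursor {
            None => return Ok(out),
            Some(next) => {
                // A cursor that was already followed means the server is cycling.
                if !seen.insert(next.clone()) {
                    return Err(PaginationError::RepeatedCursor(next));
                }
                cursor = Some(next);
            }
        }
    }
    Err(PaginationError::TooManyPages(max_pages))
}
```

Tracking seen cursors catches a server that keeps returning the same cursor, while the page cap covers the case where cursors keep changing but never terminate.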
Reviewed Changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| crates/runtime/src/dataconnector/github/rate_limit.rs | Added rate limit usage warnings when usage exceeds 80% |
| crates/runtime/src/dataconnector/github/projects.rs | Added GraphQL error checking and fixed query field naming |
| crates/runtime/src/dataconnector/github/mod.rs | Added comprehensive GitHub App installation access validation |
| crates/data_components/src/graphql/mod.rs | Updated error types and improved error message formatting |
| crates/data_components/src/graphql/client.rs | Enhanced validation, pagination protection, and error handling |
| crates/data_components/src/github.rs | Updated rate limit error messages with actionable guidance |
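
The 80% usage warning listed for `rate_limit.rs` could look something like the sketch below. Only the 80% threshold comes from the summary above; the function name, signature, and message wording are assumptions made for illustration. `limit` and `remaining` stand for the values GitHub reports in its `x-ratelimit-limit` and `x-ratelimit-remaining` response headers.

```rust
/// Threshold from the PR summary: warn once usage exceeds 80%.
const WARN_PERCENT: u64 = 80;

/// Returns a warning message when more than 80% of the rate limit is used.
/// The name and message text are hypothetical, not taken from the PR.
fn rate_limit_warning(limit: u64, remaining: u64) -> Option<String> {
    if limit == 0 {
        // No meaningful limit reported, so there is nothing to warn about.
        return None;
    }
    let used = limit.saturating_sub(remaining);
    // Integer comparison avoids floating-point rounding exactly at the threshold.
    if used.saturating_mul(100) > limit.saturating_mul(WARN_PERCENT) {
        let percent = used.saturating_mul(100) / limit;
        Some(format!(
            "GitHub API rate limit usage at {}% ({}/{} requests used)",
            percent, used, limit
        ))
    } else {
        None
    }
}
```

With a limit of 5000, this stays silent at exactly 4000 requests used and starts warning from 4001.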
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.
* Optimize builds for speed * Update .github/workflows/pr.yml Co-authored-by: Copilot <[email protected]> --------- Co-authored-by: Copilot <[email protected]>
This reverts commit 5f8a36b.
Copilot reviewed 7 out of 7 changed files in this pull request and generated 15 comments.
Co-authored-by: Copilot <[email protected]>
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
Copilot reviewed 7 out of 7 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
crates/data_components/src/graphql/mod.rs:87
- The display message 'Server returned an error' for InvalidGraphQLQuery is misleading when this variant is used for client-side validation and parse failures (e.g., empty or syntactically invalid queries). Suggest changing to a neutral message like 'Invalid GraphQL query: {message}' so it accurately covers both parse-time and server-reported query errors.
#[snafu(display("Server returned an error: {message}"))]
InvalidGraphQLQuery {
message: String,
line: usize,
column: usize,
query: String,
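
To make the suggested neutral wording concrete, here is a minimal sketch that uses a plain struct and a hand-written `std::fmt::Display` impl in place of the snafu-derived variant quoted above. The struct is a simplified stand-in so the example has no external dependencies; in the real enum the change would presumably be limited to the `display(...)` string.

```rust
use std::fmt;

/// Simplified stand-in for the `InvalidGraphQLQuery` variant; the real type
/// is a snafu enum variant, not a standalone struct.
#[derive(Debug)]
struct InvalidGraphQLQuery {
    message: String,
    line: usize,
    column: usize,
    query: String,
}

impl fmt::Display for InvalidGraphQLQuery {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Neutral wording that fits both client-side validation failures
        // and errors reported by the server.
        write!(f, "Invalid GraphQL query: {}", self.message)
    }
}
```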
Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.
Comments suppressed due to low confidence (1)
crates/data_components/src/graphql/mod.rs:1
- The display message for InvalidGraphQLQuery was simplified and no longer includes line, column, or contextual query excerpt, reducing diagnostic value. Consider restoring a more informative format that includes line/column and a query snippet to aid quick debugging.
/*
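
One possible shape for the more informative format this comment asks for is sketched below. The function name and output layout are assumptions, not the format used in the repository. `line` and `column` are treated as 1-based, which is how GraphQL error locations are specified.

```rust
/// Renders a query error with its position and the offending line.
/// Hypothetical helper for illustration; `line` and `column` are 1-based.
fn describe_query_error(message: &str, line: usize, column: usize, query: &str) -> String {
    let mut out = format!(
        "Invalid GraphQL query: {} (line {}, column {})",
        message, line, column
    );
    // Only add the excerpt when the reported line actually exists in the query.
    if let Some(text) = line.checked_sub(1).and_then(|i| query.lines().nth(i)) {
        // The caret offset counts characters, so tabs and wide glyphs may misalign it.
        let caret_pad = " ".repeat(column.saturating_sub(1));
        out.push_str(&format!("\n  {}\n  {}^", text, caret_pad));
    }
    out
}
```

If the reported line is out of range (including line 0), the function falls back to the one-line message without an excerpt rather than failing.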
…andling (#7547) * Add better graphql validation * WIP * Formatting * Fix * Fix query * Add validation for GitHub API * More error handling improvements * Revert "Merge branch 'trunk' into lukim/github-data-connector" This reverts commit 8982710, reversing changes made to 7b8f8bc. * Fix issues * Updares * Consolidate * Add Debug * Bump golang.org/x/mod from 0.28.0 to 0.29.0 (#7530) Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.28.0 to 0.29.0. - [Commits](golang/mod@v0.28.0...v0.29.0) --- updated-dependencies: - dependency-name: golang.org/x/mod dependency-version: 0.29.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Luke Kim <[email protected]> * Hive-style partitioning for DuckDB file mode (#7563) * more advanced partition_by config * add tests * wip * rename name * add PartitionedBy to vector * use new PartitionedBy in partition_by_expressions * modify DataAccelerator::create_external_table * use PartitionedBy more * create hive style files * discover hive style partitions * discover hive partitions for duckdb * remove unwraps * fix clippy lints * more clippy lints * Update crates/spicepod/src/partitioning.rs Co-authored-by: Copilot <[email protected]> * fix spicepod tests --------- Co-authored-by: Copilot <[email protected]> * Vortex Data Accelerator (Dev grade) (#7566) * Vortex Data Accelerator * Fix imports * Add back feature * fix vortex * fix build * Remove memory * Fix tests * Add tests * Update tests * Fixes * Update * Fix * Update tests * Use StreamTable instead of ListingTable * Update tests * Use async buffered writes * Update tests * Works! * Perf improvements * Fix * Add check for partition_by * Fix memory leak * fix lint issues * fmt * Improve benchmark tests * fix lint * Fix duplicate code. 
* vendor vortex-datafusion * fix * finally clean lint * Don't create dummy file, just specify the schema * fix lint * Property integrate vendored vortex-datafusion * WIP * Fix tests * remove custom writing code * fix lint * Update crates/vortex-datafusion/src/persistent/opener.rs Co-authored-by: Copilot <[email protected]> --------- Co-authored-by: Phillip LeBlanc <[email protected]> Co-authored-by: Copilot <[email protected]> * Only load eval scorers when eval defined (#7549) * Only load eval scorers when eval defined * Reinstate eval verification in async workflow * Bump octocrab from 0.45.0 to 0.47.0 (#7531) Bumps [octocrab](https://github.com/XAMPPRocky/octocrab) from 0.45.0 to 0.47.0. - [Release notes](https://github.com/XAMPPRocky/octocrab/releases) - [Changelog](https://github.com/XAMPPRocky/octocrab/blob/main/CHANGELOG.md) - [Commits](XAMPPRocky/octocrab@v0.45.0...v0.47.0) --- updated-dependencies: - dependency-name: octocrab dependency-version: 0.47.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump regex from 1.11.3 to 1.12.1 (#7532) Bumps [regex](https://github.com/rust-lang/regex) from 1.11.3 to 1.12.1. - [Release notes](https://github.com/rust-lang/regex/releases) - [Changelog](https://github.com/rust-lang/regex/blob/master/CHANGELOG.md) - [Commits](rust-lang/regex@1.11.3...1.12.1) --- updated-dependencies: - dependency-name: regex dependency-version: 1.12.1 dependency-type: direct:production update-type: version-update:semver-minor ... 
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Fix custom file path for Vortex Data Accelerator (#7570) * Only support append refresh * Remove validation * fix lint * Fix the tests * Add List type support to Vortex Data Accelerator (#7569) * Vortex Data Accelerator * Fix imports * Add back feature * fix vortex * fix build * Remove memory * Fix tests * Add tests * Update tests * Fixes * Update * Fix * Update tests * Use StreamTable instead of ListingTable * Update tests * Use async buffered writes * Update tests * Works! * Perf improvements * Fix * Add check for partition_by * Fix memory leak * fix lint issues * fmt * Improve benchmark tests * fix lint * Fix duplicate code. * vendor vortex-datafusion * fix * finally clean lint * Don't create dummy file, just specify the schema * fix lint * Property integrate vendored vortex-datafusion * WIP * Fix tests * remove custom writing code * WIP * Only support append mode for Vortex * Add additional validation * Add List to vortex supported types * Fix linting issues * Remove memory mode test, not supported anymore. --------- Co-authored-by: Phillip LeBlanc <[email protected]> * Bump parking_lot from 0.12.4 to 0.12.5 (#7534) Bumps [parking_lot](https://github.com/Amanieu/parking_lot) from 0.12.4 to 0.12.5. - [Release notes](https://github.com/Amanieu/parking_lot/releases) - [Changelog](https://github.com/Amanieu/parking_lot/blob/master/CHANGELOG.md) - [Commits](Amanieu/parking_lot@parking_lot-v0.12.4...parking_lot-v0.12.5) --- updated-dependencies: - dependency-name: parking_lot dependency-version: 0.12.5 dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump tokio-postgres from 0.7.14 to 0.7.15 (#7533) Bumps [tokio-postgres](https://github.com/rust-postgres/rust-postgres) from 0.7.14 to 0.7.15. - [Release notes](https://github.com/rust-postgres/rust-postgres/releases) - [Commits](rust-postgres/rust-postgres@tokio-postgres-v0.7.14...tokio-postgres-v0.7.15) --- updated-dependencies: - dependency-name: tokio-postgres dependency-version: 0.7.15 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Remove duplicate line from 1.8.1 release notes (#7580) * Upgrade Go from v1.24.2 to v1.25.3 (#7582) * check if index/bucket exists after ConflictException (#7577) * Add `runtime-async` crate with managed Tokio runtime (#7575) * Add `runtime-async` crate with managed Tokio runtime * fix * remove test * fix * fix lint * Optimize GitHub Actions workflows (#7584) * Optimize builds for speed * Update .github/workflows/pr.yml Co-authored-by: Copilot <[email protected]> --------- Co-authored-by: Copilot <[email protected]> * Add prepared statements * Remove dupe * Revert "Add prepared statements" This reverts commit 5f8a36b. * Update crates/runtime/src/dataconnector/github/projects.rs Co-authored-by: Copilot <[email protected]> * Fix copilot's complaints * Fixes * Filter out empty segments --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Kevin Zimmerman <[email protected]> Co-authored-by: Copilot <[email protected]> Co-authored-by: Phillip LeBlanc <[email protected]> Co-authored-by: Jack Eadie <[email protected]> Co-authored-by: Viktor Yershov <[email protected]>
* Initial Pepper data accelerator This renames Vortex to Pepper * Fixes * Update crates/pepper/README.md * Update crates/runtime/src/component/dataset/acceleration.rs Co-authored-by: Copilot <[email protected]> * Update name * Update crates/runtime/src/dataaccelerator/pepper.rs Co-authored-by: Copilot <[email protected]> * Update crates/pepper/src/lib.rs Co-authored-by: Copilot <[email protected]> * Improvements * Update crates/runtime/src/dataaccelerator/pepper.rs Co-authored-by: Copilot <[email protected]> * Update crates/runtime/src/dataaccelerator/pepper.rs Co-authored-by: Copilot <[email protected]> * Update crates/runtime/src/dataaccelerator/pepper.rs Co-authored-by: Copilot <[email protected]> * Update crates/runtime/src/dataaccelerator/pepper.rs Co-authored-by: Copilot <[email protected]> * Update crates/runtime/src/dataaccelerator/pepper.rs Co-authored-by: Copilot <[email protected]> * Update crates/pepper/README.md Co-authored-by: Copilot <[email protected]> * Update crates/runtime/src/dataaccelerator/mod.rs Co-authored-by: Copilot <[email protected]> * Update crates/pepper/README.md Co-authored-by: Copilot <[email protected]> * Apply suggestions from code review Co-authored-by: Phillip LeBlanc <[email protected]> * fix score order for one test case (#7595) * `ObjectMeta` filter pushdown for `ObjectStoreTextTable` (#7572) * setup code for document table filtering * pushdown ObjectMeta filters to ObjectStoreTextTable * fix filter * Prefetch ObjectMeta to improve execution plan statistics * PR comment refactors * unit tests * bad merge * clippy * clppy * Return `TableProvider` from `CandidateGeneration::search`. 
(#7559) * Remove 'SearchIndex::metadata_columns' * add non-filterable metadata to FTS index * integration tests * clppy * clppy * clppy * clppy * clppy * compiles * clppy * clppy * working * docs etc * revert * fix match projection; nan in scores * clppy * fmt * snapshots * clppy * multi-thread some tokio tests --------- Co-authored-by: Luke Kim <[email protected]> * EmptyHashJoinExecPhysicalOptimization, and use in VectorScanTableProvider (#7587) * EmptyHashJoinExecPhysicalOptimization, and use in VectorScanTableProvider * 'datafusion-optimizer-rules' crate * move more to datafusion-optimizer-rules * clppy * testing * snapshots * PR comments * update snapshots * Update official Docker builds to use release binaries (#7597) * Update official Docker builds to use release binaries * update endgame * fix docker builds * Fix cuda build * New Generate Changelog workflow (#7562) * New Generate Changelog workflow * Set default versions for reference * Improvements for changelog * Add comments * Update scripts/generate_changelog.py Co-authored-by: Copilot <[email protected]> * Improvements to make it more reliable * Update scripts/generate_changelog.py Co-authored-by: Copilot <[email protected]> * remove old changelog generator --------- Co-authored-by: Copilot <[email protected]> Co-authored-by: Phillip LeBlanc <[email protected]> * BytesProcessedExec to allow optimizer to do limit pushdown (#7539) * fix limit pushdown for children of bytesprocessedexec * accept: limits push down lower after bytesprocessed allows passthrough --------- Co-authored-by: Luke Kim <[email protected]> Co-authored-by: Phillip LeBlanc <[email protected]> * GitHub Data Connector add Projects, improve rate-limiting and error handling (#7547) * Add better graphql validation * WIP * Formatting * Fix * Fix query * Add validation for GitHub API * More error handling improvements * Revert "Merge branch 'trunk' into lukim/github-data-connector" This reverts commit 8982710, reversing changes made to 
7b8f8bc. * Fix issues * Updares * Consolidate * Add Debug * Bump golang.org/x/mod from 0.28.0 to 0.29.0 (#7530) Bumps [golang.org/x/mod](https://github.com/golang/mod) from 0.28.0 to 0.29.0. - [Commits](golang/mod@v0.28.0...v0.29.0) --- updated-dependencies: - dependency-name: golang.org/x/mod dependency-version: 0.29.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Luke Kim <[email protected]> * Hive-style partitioning for DuckDB file mode (#7563) * more advanced partition_by config * add tests * wip * rename name * add PartitionedBy to vector * use new PartitionedBy in partition_by_expressions * modify DataAccelerator::create_external_table * use PartitionedBy more * create hive style files * discover hive style partitions * discover hive partitions for duckdb * remove unwraps * fix clippy lints * more clippy lints * Update crates/spicepod/src/partitioning.rs Co-authored-by: Copilot <[email protected]> * fix spicepod tests --------- Co-authored-by: Copilot <[email protected]> * Vortex Data Accelerator (Dev grade) (#7566) * Vortex Data Accelerator * Fix imports * Add back feature * fix vortex * fix build * Remove memory * Fix tests * Add tests * Update tests * Fixes * Update * Fix * Update tests * Use StreamTable instead of ListingTable * Update tests * Use async buffered writes * Update tests * Works! * Perf improvements * Fix * Add check for partition_by * Fix memory leak * fix lint issues * fmt * Improve benchmark tests * fix lint * Fix duplicate code. 
* vendor vortex-datafusion * fix * finally clean lint * Don't create dummy file, just specify the schema * fix lint * Property integrate vendored vortex-datafusion * WIP * Fix tests * remove custom writing code * fix lint * Update crates/vortex-datafusion/src/persistent/opener.rs Co-authored-by: Copilot <[email protected]> --------- Co-authored-by: Phillip LeBlanc <[email protected]> Co-authored-by: Copilot <[email protected]> * Only load eval scorers when eval defined (#7549) * Only load eval scorers when eval defined * Reinstate eval verification in async workflow * Bump octocrab from 0.45.0 to 0.47.0 (#7531) Bumps [octocrab](https://github.com/XAMPPRocky/octocrab) from 0.45.0 to 0.47.0. - [Release notes](https://github.com/XAMPPRocky/octocrab/releases) - [Changelog](https://github.com/XAMPPRocky/octocrab/blob/main/CHANGELOG.md) - [Commits](XAMPPRocky/octocrab@v0.45.0...v0.47.0) --- updated-dependencies: - dependency-name: octocrab dependency-version: 0.47.0 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump regex from 1.11.3 to 1.12.1 (#7532) Bumps [regex](https://github.com/rust-lang/regex) from 1.11.3 to 1.12.1. - [Release notes](https://github.com/rust-lang/regex/releases) - [Changelog](https://github.com/rust-lang/regex/blob/master/CHANGELOG.md) - [Commits](rust-lang/regex@1.11.3...1.12.1) --- updated-dependencies: - dependency-name: regex dependency-version: 1.12.1 dependency-type: direct:production update-type: version-update:semver-minor ... 
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Fix custom file path for Vortex Data Accelerator (#7570) * Only support append refresh * Remove validation * fix lint * Fix the tests * Add List type support to Vortex Data Accelerator (#7569) * Vortex Data Accelerator * Fix imports * Add back feature * fix vortex * fix build * Remove memory * Fix tests * Add tests * Update tests * Fixes * Update * Fix * Update tests * Use StreamTable instead of ListingTable * Update tests * Use async buffered writes * Update tests * Works! * Perf improvements * Fix * Add check for partition_by * Fix memory leak * fix lint issues * fmt * Improve benchmark tests * fix lint * Fix duplicate code. * vendor vortex-datafusion * fix * finally clean lint * Don't create dummy file, just specify the schema * fix lint * Property integrate vendored vortex-datafusion * WIP * Fix tests * remove custom writing code * WIP * Only support append mode for Vortex * Add additional validation * Add List to vortex supported types * Fix linting issues * Remove memory mode test, not supported anymore. --------- Co-authored-by: Phillip LeBlanc <[email protected]> * Bump parking_lot from 0.12.4 to 0.12.5 (#7534) Bumps [parking_lot](https://github.com/Amanieu/parking_lot) from 0.12.4 to 0.12.5. - [Release notes](https://github.com/Amanieu/parking_lot/releases) - [Changelog](https://github.com/Amanieu/parking_lot/blob/master/CHANGELOG.md) - [Commits](Amanieu/parking_lot@parking_lot-v0.12.4...parking_lot-v0.12.5) --- updated-dependencies: - dependency-name: parking_lot dependency-version: 0.12.5 dependency-type: direct:production update-type: version-update:semver-patch ... 
Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump tokio-postgres from 0.7.14 to 0.7.15 (#7533) Bumps [tokio-postgres](https://github.com/rust-postgres/rust-postgres) from 0.7.14 to 0.7.15. - [Release notes](https://github.com/rust-postgres/rust-postgres/releases) - [Commits](rust-postgres/rust-postgres@tokio-postgres-v0.7.14...tokio-postgres-v0.7.15) --- updated-dependencies: - dependency-name: tokio-postgres dependency-version: 0.7.15 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Remove duplicate line from 1.8.1 release notes (#7580) * Upgrade Go from v1.24.2 to v1.25.3 (#7582) * check if index/bucket exists after ConflictException (#7577) * Add `runtime-async` crate with managed Tokio runtime (#7575) * Add `runtime-async` crate with managed Tokio runtime * fix * remove test * fix * fix lint * Optimize GitHub Actions workflows (#7584) * Optimize builds for speed * Update .github/workflows/pr.yml Co-authored-by: Copilot <[email protected]> --------- Co-authored-by: Copilot <[email protected]> * Add prepared statements * Remove dupe * Revert "Add prepared statements" This reverts commit 5f8a36b. 
* Update crates/runtime/src/dataconnector/github/projects.rs Co-authored-by: Copilot <[email protected]> * Fix copilot's complaints * Fixes * Filter out empty segments --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Kevin Zimmerman <[email protected]> Co-authored-by: Copilot <[email protected]> Co-authored-by: Phillip LeBlanc <[email protected]> Co-authored-by: Jack Eadie <[email protected]> Co-authored-by: Viktor Yershov <[email protected]> * Add copilot-instructions to help improve Copilot reviews (#7606) * Add copilot-instructions to help improve Copilot reviews. * Updates * Fixes * Add support for DuckDB table-based partitioning (#7581) * Support for partitioning based on table names * Simplify infer_existing_partitions * Better structure * Add DuckDBPartitionedDataSink * Update insert overwrite * insert_append * Update * Update * logic to delete old internal tables for full refresh * Include partitioned_duckdb param * Use statement for list_partitioned_tables * Fix schema mismatch error * lint * Add tests for the DuckDBPartitionedDataSink * Add test for TablesModePartitionedDuckDBAccelerator * Update * Primary key support * on-conflict support * Indexes support for append and full refresh * Update crates/runtime/src/dataaccelerator/partitioned_duckdb/tables_mode/mod.rs Co-authored-by: Phillip LeBlanc <[email protected]> * Update * Make PassThruExec public * Update to the latest table-providers version --------- Co-authored-by: Phillip LeBlanc <[email protected]> * Add clarification * Fix build * Update deny.toml --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: Copilot <[email protected]> Co-authored-by: Phillip LeBlanc <[email protected]> Co-authored-by: Jack Eadie <[email protected]> Co-authored-by: Viktor Yershov <[email protected]> Co-authored-by: Phillip LeBlanc <[email protected]> Co-authored-by: David Stancu <[email 
protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Kevin Zimmerman <[email protected]> Co-authored-by: Sergei Grebnov <[email protected]>
* Update crates/runtime/src/dataconnector/github/projects.rs Co-authored-by: Copilot <[email protected]> * Fix copilot's complaints * Fixes * Filter out empty segments --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Kevin Zimmerman <[email protected]> Co-authored-by: Copilot <[email protected]> Co-authored-by: Phillip LeBlanc <[email protected]> Co-authored-by: Jack Eadie <[email protected]> Co-authored-by: Viktor Yershov <[email protected]> * Add copilot-instructions to help improve Copilot reviews (#7606) * Add copilot-instructions to help improve Copilot reviews. * Updates * Fixes * Add support for DuckDB table-based partitioning (#7581) * Support for partitioning based on table names * Simplify infer_existing_partitions * Better structure * Add DuckDBPartitionedDataSink * Update insert overwrite * insert_append * Update * Update * logic to delete old internal tables for full refresh * Include partitioned_duckdb param * Use statement for list_partitioned_tables * Fix schema mismatch error * lint * Add tests for the DuckDBPartitionedDataSink * Add test for TablesModePartitionedDuckDBAccelerator * Update * Primary key support * on-conflict support * Indexes support for append and full refresh * Update crates/runtime/src/dataaccelerator/partitioned_duckdb/tables_mode/mod.rs Co-authored-by: Phillip LeBlanc <[email protected]> * Update * Make PassThruExec public * Update to the latest table-providers version --------- Co-authored-by: Phillip LeBlanc <[email protected]> * Add clarification * Fix build * Update deny.toml --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: Copilot <[email protected]> Co-authored-by: Phillip LeBlanc <[email protected]> Co-authored-by: Jack Eadie <[email protected]> Co-authored-by: Viktor Yershov <[email protected]> Co-authored-by: Phillip LeBlanc <[email protected]> Co-authored-by: David Stancu <[email 
protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Kevin Zimmerman <[email protected]> Co-authored-by: Sergei Grebnov <[email protected]>
This pull request adds validation and error handling to the GraphQL client and the GitHub data connector, so that failures are easier to diagnose and malformed queries or unexpected responses are less likely to cause problems downstream. The changes include stricter checks on query inputs, pagination, JSON pointer formats, and response data, with improved logging and error messages for debugging.
Validation and Error Handling Improvements:
Logging and Diagnostics:
These changes will make the data connector more resilient to malformed queries and unexpected API responses, and provide better context for troubleshooting when errors occur.
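To illustrate the kinds of checks described above, here is a minimal sketch of input validation and pagination loop protection. It is not the code from this PR: `PaginationError`, `Page`, `validate_query`, `validate_json_pointer`, and `fetch_all` are hypothetical names, and the repeated-cursor and page-cap strategy is an assumption about one reasonable way to guard against a paginated API that never terminates.

```rust
use std::collections::HashSet;

/// Hypothetical error type; the real connector's error enum may differ.
#[derive(Debug, PartialEq)]
enum PaginationError {
    EmptyQuery,
    InvalidJsonPointer(String),
    RepeatedCursor(String),
    TooManyPages(usize),
}

/// One page of results plus the cursor for the next page, if any.
struct Page {
    items: Vec<String>,
    next_cursor: Option<String>,
}

/// Reject queries that are empty or whitespace-only before sending them.
fn validate_query(query: &str) -> Result<(), PaginationError> {
    if query.trim().is_empty() {
        Err(PaginationError::EmptyQuery)
    } else {
        Ok(())
    }
}

/// Per RFC 6901, a JSON pointer is either empty or starts with '/'.
fn validate_json_pointer(pointer: &str) -> Result<(), PaginationError> {
    if pointer.is_empty() || pointer.starts_with('/') {
        Ok(())
    } else {
        Err(PaginationError::InvalidJsonPointer(pointer.to_string()))
    }
}

/// Collect all pages, stopping with an error if a cursor repeats
/// or if more than `max_pages` pages are requested.
fn fetch_all<F>(
    query: &str,
    max_pages: usize,
    mut fetch_page: F,
) -> Result<Vec<String>, PaginationError>
where
    F: FnMut(Option<&str>) -> Page,
{
    validate_query(query)?;

    let mut seen_cursors: HashSet<String> = HashSet::new();
    let mut items = Vec::new();
    let mut cursor: Option<String> = None;

    for _ in 0..max_pages {
        let page = fetch_page(cursor.as_deref());
        items.extend(page.items);

        match page.next_cursor {
            // No further cursor: pagination finished normally.
            None => return Ok(items),
            Some(next) => {
                // A cursor seen before means the server is looping.
                if !seen_cursors.insert(next.clone()) {
                    return Err(PaginationError::RepeatedCursor(next));
                }
                cursor = Some(next);
            }
        }
    }

    // The page cap was reached while the server still offered a cursor.
    Err(PaginationError::TooManyPages(max_pages))
}
```

In this sketch the page fetcher is passed in as a closure, so the loop protection can be exercised without a network call. A real client would likely also attach the query and the failing cursor to the error message for easier debugging.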