fix(file-based): override primary_key in PermissionsFileBasedStream to avoid invalid parser-defined key #903

Merged
Ryan Waskewich (rwask) merged 1 commit into main from devin/1770749470-fix-permissions-stream-primary-key
Feb 11, 2026

Conversation

@rwask Ryan Waskewich (rwask) commented Feb 10, 2026

fix(file-based): override primary_key in PermissionsFileBasedStream to avoid invalid parser-defined key

Summary

When permissions transfer mode is enabled on file-based connectors (e.g. SharePoint Enterprise), the PermissionsFileBasedStream uses a completely different schema (with fields like id, file_path, publicly_accessible, allowed_identity_remote_ids) than the standard content schema. However, it inherited DefaultFileBasedStream.primary_key, which falls back to the parser-defined primary key — "document_key" for UnstructuredParser. Since document_key doesn't exist in the permissions schema, the destination rejects the catalog:

ConfigErrorException: A primary key column does not exist in the schema: document_key
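
To make the failure concrete, here is a minimal sketch of the kind of check that would reject this catalog. It is not the destination's actual code: the helper name `missing_primary_key_columns` and the JSON types in the schema are assumptions for illustration; only the field names come from the description above.

```python
from typing import Any, Dict, List, Optional


def missing_primary_key_columns(
    schema: Dict[str, Any], primary_key: Optional[List[str]]
) -> List[str]:
    """Return the primary key columns that are absent from the schema's properties."""
    properties = schema.get("properties", {})
    return [column for column in (primary_key or []) if column not in properties]


# Field names come from the PR description; the JSON types are placeholders.
permissions_schema = {
    "type": "object",
    "properties": {
        "id": {"type": "string"},
        "file_path": {"type": "string"},
        "publicly_accessible": {"type": "boolean"},
        "allowed_identity_remote_ids": {"type": "array"},
    },
}

# The parser-defined key belongs to the content schema, so it is reported missing.
print(missing_primary_key_columns(permissions_schema, ["document_key"]))
# A key taken from the permissions schema itself passes the check.
print(missing_primary_key_columns(permissions_schema, ["id"]))
```

A missing key or a `None` primary key yields an empty list here, which corresponds to a catalog that advertises no source-defined primary key.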

Changes:

  1. PermissionsFileBasedStream: Overrides primary_key to return only self.config.primary_key (user-configured PK or None), skipping the parser-defined fallback that returns a key belonging to the content schema.
  2. FileBasedStreamFacade: Delegates primary_key to the underlying legacy stream instead of reimplementing the fallback logic. This ensures permissions streams wrapped in the concurrent facade also get the correct behavior. For DefaultFileBasedStream, this is functionally equivalent to the previous logic.
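
The two changes can be sketched with simplified stand-ins. This is not the CDK source: `StubConfig`, `StubParser`, and the `*Sketch` classes are hypothetical names, and the real classes take more constructor arguments. Only the property logic mirrors what the description states.

```python
from typing import List, Optional, Union

# Simplified alias; the CDK's own PrimaryKeyType may be defined differently.
PrimaryKeyType = Optional[Union[str, List[str]]]


class StubConfig:
    """Hypothetical stand-in for the stream config; holds the user-configured PK."""

    def __init__(self, primary_key: PrimaryKeyType = None) -> None:
        self.primary_key = primary_key


class StubParser:
    """Hypothetical stand-in for UnstructuredParser's parser-defined key."""

    def get_parser_defined_primary_key(self) -> Optional[str]:
        return "document_key"


class DefaultStreamSketch:
    def __init__(self, config: StubConfig, parser: StubParser) -> None:
        self.config = config
        self.parser = parser

    @property
    def primary_key(self) -> PrimaryKeyType:
        # User-configured PK first, then the parser-defined fallback.
        return self.config.primary_key or self.parser.get_parser_defined_primary_key()


class PermissionsStreamSketch(DefaultStreamSketch):
    @property
    def primary_key(self) -> PrimaryKeyType:
        # Change 1: skip the parser fallback, which belongs to the content schema.
        return self.config.primary_key


class FacadeSketch:
    def __init__(self, legacy_stream: DefaultStreamSketch) -> None:
        self._legacy_stream = legacy_stream

    @property
    def primary_key(self) -> PrimaryKeyType:
        # Change 2: delegate instead of reimplementing the fallback.
        return self._legacy_stream.primary_key


content = FacadeSketch(DefaultStreamSketch(StubConfig(), StubParser()))
permissions = FacadeSketch(PermissionsStreamSketch(StubConfig(), StubParser()))
print(content.primary_key, permissions.primary_key)
```

Because the facade delegates, a wrapped content stream keeps the parser fallback while a wrapped permissions stream does not, without the facade needing to know which kind of stream it holds.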

Review & Testing Checklist for Human

  • Verify behavioral equivalence of the facade change for non-permissions streams: FileBasedStreamFacade.primary_key previously computed config.primary_key or parser.get_parser_defined_primary_key() inline; now it delegates to self._legacy_stream.primary_key. Confirm that DefaultFileBasedStream.primary_key has identical logic (it does at time of writing, lines 104-108), and that no other subclasses override primary_key in a way that would change behavior unexpectedly through the facade.
  • Verify None primary key doesn't break downstream: When no user-configured PK exists, PermissionsFileBasedStream.primary_key now returns None. Confirm this is handled correctly in catalog generation (sourceDefinedPrimaryKey becomes empty) and that destinations handle the absence of a source-defined PK gracefully (user must configure one for dedup).
  • Test plan: Ideally test with a file-based connector in permissions mode (e.g. SharePoint Enterprise with permissions enabled) to confirm (a) discovery no longer advertises document_key as PK, and (b) syncs succeed when an appropriate PK from the permissions schema is configured.
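
For the second checklist item, the sketch below shows one way a `None` primary key could end up as an empty source-defined primary key in the catalog. The helper `to_source_defined_primary_key` is hypothetical and is not the CDK's catalog-generation code; it only illustrates the expected outcome that should be verified against the real implementation.

```python
from typing import List, Optional, Union

PrimaryKey = Optional[Union[str, List[str], List[List[str]]]]


def to_source_defined_primary_key(primary_key: PrimaryKey) -> List[List[str]]:
    """Normalize a primary key into a list of field paths; None becomes an empty list."""
    if primary_key is None:
        return []
    if isinstance(primary_key, str):
        return [[primary_key]]
    # Each entry is either a top-level field name or an already nested path.
    return [[part] if isinstance(part, str) else list(part) for part in primary_key]


# No user-configured PK on a permissions stream: nothing is advertised.
print(to_source_defined_primary_key(None))
# A user-configured PK taken from the permissions schema.
print(to_source_defined_primary_key("id"))
```

With an empty result, deduplication would rely on the user configuring a key, as the checklist notes.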

Notes

Summary by CodeRabbit

  • Refactor
    • Simplified primary key resolution in file-based streams to use a single source of truth, eliminating complex fallback logic.
    • Exposed primary key configuration as a public property for improved accessibility and consistency across file-based stream implementations.

…o avoid invalid parser-defined key

Co-Authored-By: Ryan Waskewich <[email protected]>
@github-actions

👋 Greetings, Airbyte Team Member!

Here are some helpful tips and reminders for your convenience.

Testing This CDK Version

You can test this version of the CDK using the following:

# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1770749470-fix-permissions-stream-primary-key#egg=airbyte-python-cdk[dev]' --help

# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1770749470-fix-permissions-stream-primary-key

PR Slash Commands

Airbyte Maintainers can execute the following slash commands on your PR:

  • /autofix - Fixes most formatting and linting issues
  • /poetry-lock - Updates poetry.lock file
  • /test - Runs connector tests with the updated CDK
  • /prerelease - Triggers a prerelease publish with default arguments
  • /poe build - Regenerate git-committed build artifacts, such as the pydantic models which are generated from the manifest JSON schema in YAML.
  • /poe <command> - Runs any poe command in the CDK environment

@github-actions

PyTest Results (Fast)

Tests: 3 856 (±0) in 1 suite (±0) and 1 file (±0)
Results: 3 844 ✅ passed (±0), 12 💤 skipped (±0), 0 ❌ failed (±0)
Duration: 6m 46s ⏱️ (+6s)

Results for commit 315c4d8. ± Comparison against base commit 201bdb8.

@devin-ai-integration

Aldo Gonzalez (@aldogonzalez8) Maxime Carbonneau-Leclerc (@maxi297) — Requesting your review on this fix. You both authored the permissions stream code that's affected here.

Summary: PermissionsFileBasedStream inherits DefaultFileBasedStream.primary_key, which falls back to the parser-defined PK (document_key from UnstructuredParser). But the permissions schema doesn't contain document_key, causing destinations to reject the catalog with ConfigErrorException: A primary key column does not exist in the schema: document_key.

The fix overrides primary_key in PermissionsFileBasedStream to return only self.config.primary_key (no parser fallback), and updates FileBasedStreamFacade to delegate to the underlying stream's primary_key property.

cc Ryan Waskewich (@rwask) who reported this issue.

@github-actions

PyTest Results (Full)

Tests: 3 859 (±0) in 1 suite (±0) and 1 file (±0)
Results: 3 847 ✅ passed (±0), 12 💤 skipped (±0), 0 ❌ failed (±0)
Duration: 10m 54s ⏱️ (-1s)

Results for commit 315c4d8. ± Comparison against base commit 201bdb8.

APPROVED

@rwask Ryan Waskewich (rwask) marked this pull request as ready for review February 11, 2026 16:36
@coderabbitai

coderabbitai bot commented Feb 11, 2026

Walkthrough

Two file-based stream classes are updated to improve primary_key handling: one simplifies resolution logic to use the legacy stream's primary_key directly, eliminating fallback logic; the other exposes configuration's primary_key as a public property with proper type annotation.

Changes

Cohort: Primary Key Property Updates
File(s): airbyte_cdk/sources/file_based/stream/concurrent/adapters.py, airbyte_cdk/sources/file_based/stream/permissions_file_based_stream.py
Summary: Refactored primary_key resolution in FileBasedStreamFacade to return legacy stream's value directly instead of combining config and parser-defined keys. Added public primary_key property to PermissionsFileBasedStream with PrimaryKeyType annotation exposing config's primary_key value.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Resolution: Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title accurately describes the main change: adding a primary_key override in PermissionsFileBasedStream to prevent invalid parser-defined keys from being used.

No actionable comments were generated in the recent review. 🎉

@rwask Ryan Waskewich (rwask) merged commit 796bb34 into main Feb 11, 2026
31 of 32 checks passed
@rwask Ryan Waskewich (rwask) deleted the devin/1770749470-fix-permissions-stream-primary-key branch February 11, 2026 16:45