fix(file-based): override primary_key in PermissionsFileBasedStream to avoid invalid parser-defined key #903
…o avoid invalid parser-defined key Co-Authored-By: Ryan Waskewich <[email protected]>
👋 Greetings, Airbyte Team Member! Here are some helpful tips and reminders for your convenience.
Testing This CDK Version
You can test this version of the CDK using the following:
# Run the CLI from this branch:
uvx 'git+https://github.com/airbytehq/airbyte-python-cdk.git@devin/1770749470-fix-permissions-stream-primary-key#egg=airbyte-python-cdk[dev]' --help
# Update a connector to use the CDK from this branch ref:
cd airbyte-integrations/connectors/source-example
poe use-cdk-branch devin/1770749470-fix-permissions-stream-primary-key
Aldo Gonzalez (@aldogonzalez8), Maxime Carbonneau-Leclerc (@maxi297): requesting your review on this fix. You both authored the permissions stream code that's affected here.
Summary: The fix overrides `primary_key` in `PermissionsFileBasedStream` to avoid the invalid parser-defined key.
cc Ryan Waskewich (@rwask), who reported this issue.
Aldo Gonzalez (aldogonzalez8) left a comment
APPROVED
📝 Walkthrough
Two file-based stream classes are updated to improve `primary_key` handling: one simplifies resolution logic to use the legacy stream's `primary_key` directly, eliminating fallback logic; the other exposes the configuration's `primary_key` as a public property with a proper type annotation.
Estimated code review effort: 2 (Simple), ~8 minutes
Pre-merge checks: 2 passed, 1 failed (1 warning)
fix(file-based): override primary_key in PermissionsFileBasedStream to avoid invalid parser-defined key
Summary
When permissions transfer mode is enabled on file-based connectors (e.g. SharePoint Enterprise), the `PermissionsFileBasedStream` uses a completely different schema (with fields like `id`, `file_path`, `publicly_accessible`, `allowed_identity_remote_ids`) than the standard content schema. However, it inherited `DefaultFileBasedStream.primary_key`, which falls back to the parser-defined primary key (`"document_key"` for `UnstructuredParser`). Since `document_key` doesn't exist in the permissions schema, the destination rejects the catalog.
Changes:
- `PermissionsFileBasedStream`: Overrides `primary_key` to return only `self.config.primary_key` (user-configured PK or `None`), skipping the parser-defined fallback that returns a key belonging to the content schema.
- `FileBasedStreamFacade`: Delegates `primary_key` to the underlying legacy stream instead of reimplementing the fallback logic. This ensures permissions streams wrapped in the concurrent facade also get the correct behavior. For `DefaultFileBasedStream`, this is functionally equivalent to the previous logic.
Review & Testing Checklist for Human
- `FileBasedStreamFacade.primary_key` previously did `config.primary_key or parser.get_parser_defined_primary_key()` inline; now it delegates to `self._legacy_stream.primary_key`. Confirm that `DefaultFileBasedStream.primary_key` has identical logic (it does at time of writing, lines 104-108), and that no other subclasses override `primary_key` in a way that would change behavior unexpectedly through the facade.
- `None` primary key doesn't break downstream: When no user-configured PK exists, `PermissionsFileBasedStream.primary_key` now returns `None`. Confirm this is handled correctly in catalog generation (`sourceDefinedPrimaryKey` becomes empty) and that destinations handle the absence of a source-defined PK gracefully (the user must configure one for dedup).
- … `document_key` as PK, and (b) syncs succeed when an appropriate PK from the permissions schema is configured.
Notes
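The primary-key resolution described in the Changes list can be illustrated with a minimal, self-contained sketch. This is not the CDK's actual code: every class below (`StreamConfig`, `UnstructuredParserStub`, `DefaultStreamSketch`, `PermissionsStreamSketch`, `FacadeSketch`) is a hypothetical stand-in, and only the `config.primary_key or parser.get_parser_defined_primary_key()` fallback, the `"document_key"` value, and the `_legacy_stream` delegation are taken from the description above.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

PrimaryKey = Optional[Union[str, List[str]]]


@dataclass
class StreamConfig:
    """Hypothetical stand-in for the file-based stream config."""
    primary_key: PrimaryKey = None


class UnstructuredParserStub:
    """Hypothetical stub; the PR states UnstructuredParser defines "document_key"."""

    def get_parser_defined_primary_key(self) -> Optional[str]:
        return "document_key"


class DefaultStreamSketch:
    """Simplified model of the default stream: config PK, else parser PK."""

    def __init__(self, config: StreamConfig, parser: UnstructuredParserStub) -> None:
        self.config = config
        self._parser = parser

    @property
    def primary_key(self) -> PrimaryKey:
        return self.config.primary_key or self._parser.get_parser_defined_primary_key()


class PermissionsStreamSketch(DefaultStreamSketch):
    """Simplified model of the fix: skip the parser-defined fallback."""

    @property
    def primary_key(self) -> PrimaryKey:
        return self.config.primary_key


class FacadeSketch:
    """Simplified model of the facade: delegate to the wrapped legacy stream."""

    def __init__(self, legacy_stream: DefaultStreamSketch) -> None:
        self._legacy_stream = legacy_stream

    @property
    def primary_key(self) -> PrimaryKey:
        return self._legacy_stream.primary_key


parser = UnstructuredParserStub()
content = FacadeSketch(DefaultStreamSketch(StreamConfig(), parser))
permissions = FacadeSketch(PermissionsStreamSketch(StreamConfig(), parser))
configured = FacadeSketch(PermissionsStreamSketch(StreamConfig(primary_key="id"), parser))

print(content.primary_key)      # document_key
print(permissions.primary_key)  # None
print(configured.primary_key)   # id
```

Because the facade delegates rather than recomputing the fallback, the subclass override is the single place where the permissions behavior is decided, which is the property the first checklist item asks reviewers to confirm.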