Implements Token Federation for Python Driver #552
base: main
Conversation
Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase ( |
self.token_endpoint: Optional[str] = None
self.idp_endpoints = None
self.openid_config = None
Can we delegate this work to a util? We only need the hostname. This info doesn't need to be part of the class.
return get_headers

def _init_oidc_discovery(self):
This should be a util that returns the OIDC details
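One way such a util could look is sketched below. It is only an illustration, not the PR's code: the function name `discover_token_endpoint`, the injectable `fetch` argument, and the default discovery path are all assumptions made for this example.

```python
import json
from typing import Callable
from urllib.request import urlopen

# Assumption: the workspace serves its discovery document at this path.
DISCOVERY_PATH = "/oidc/.well-known/oauth-authorization-server"


def _http_get_json(url: str) -> dict:
    """Default fetcher: a plain GET that decodes a JSON body."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def discover_token_endpoint(
    hostname: str,
    fetch: Callable[[str], dict] = _http_get_json,
    discovery_path: str = DISCOVERY_PATH,
) -> str:
    """Return the token endpoint advertised for `hostname`.

    Hypothetical helper: it takes only the hostname and keeps no state,
    so the provider class does not need to store the discovery document.
    """
    host = hostname.strip().rstrip("/")
    if not host.startswith(("http://", "https://")):
        host = "https://" + host
    config = fetch(host + discovery_path)
    endpoint = config.get("token_endpoint")
    if not endpoint:
        raise ValueError(f"no token_endpoint in discovery document for {host}")
    return endpoint
```

Passing `fetch` in makes the helper testable without network access; the provider would call it once with the server hostname and keep only the returned URL.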
return parts[0], parts[1]

def _parse_jwt_claims(self, token: str) -> Dict[str, Any]:
Can we directly use the PyJWT library? Manual parsing can be error-prone.
logger = logging.getLogger(__name__)

TOKEN_EXCHANGE_PARAMS = {
These should not be global constants. Can we move them into the respective class?
"return_original_token_if_authenticated": "true",
}

TOKEN_REFRESH_BUFFER_SECONDS = 10
Same here: this should not be a global constant.
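A minimal sketch of what moving the two constants into their classes might look like. The class bodies here are stand-ins, not the PR's implementation; only the `return_original_token_if_authenticated` entry is taken from the diff, and the `exchange_params` helper is hypothetical.

```python
class Token:
    """Hypothetical stand-in for the PR's Token class."""

    # Only Token needs to know how early a refresh should happen.
    REFRESH_BUFFER_SECONDS = 10


class DatabricksTokenFederationProvider:
    """Hypothetical stand-in; only the placement of the constant matters."""

    # Only the entry visible in the diff is reproduced here.
    EXCHANGE_PARAMS = {
        "return_original_token_if_authenticated": "true",
    }

    def exchange_params(self, subject_token: str) -> dict:
        """Return a fresh dict so callers never mutate the class constant."""
        return {**self.EXCHANGE_PARAMS, "subject_token": subject_token}
```

Scoping the values this way keeps the module namespace clean and makes it obvious which class owns each setting.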
self.idp_endpoints = None
self.openid_config = None
self.last_exchanged_token: Optional[Token] = None
self.last_external_token: Optional[str] = None
I don't think we need all these variables. Can we achieve a flow similar to JDBC?
headers = {"Accept": "*/*", "Content-Type": "application/x-www-form-urlencoded"}

try:
# Make the token exchange request
The request sending can be separated out into a different function.
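As a rough illustration of that split, the sketch below builds the request in one function and sends it in another, using only the standard library. The function names are assumptions; the headers are the ones shown in the diff.

```python
import json
from typing import Dict
from urllib.parse import urlencode
from urllib.request import Request, urlopen

HEADERS = {"Accept": "*/*", "Content-Type": "application/x-www-form-urlencoded"}


def build_exchange_request(token_endpoint: str, params: Dict[str, str]) -> Request:
    """Build the form-encoded POST without sending it (hypothetical helper)."""
    body = urlencode(params).encode("utf-8")
    return Request(token_endpoint, data=body, headers=HEADERS, method="POST")


def send_request(request: Request, timeout: float = 10.0) -> dict:
    """Send a prepared request and decode the JSON response (hypothetical helper)."""
    with urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

With this shape the request construction can be unit tested on its own, and the sending function is the only place that touches the network.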
# Set expiry time from the response's expires_in field if available
# This is the standard OAuth approach
if "expires_in" in resp_data and resp_data["expires_in"]:
Suggestion: nested try blocks are hard to read. We can keep one parent try so that any failure inside it is caught there; the error logs will have the required debugging info anyway.
try { try { } }
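A sketch of the single-try shape, under the assumption that the HTTP call is wrapped in a callable named `send` (hypothetical) that returns the decoded JSON response:

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def exchange_token(send: Callable[[], dict]) -> Optional[str]:
    """Run every step of the exchange under one parent try block."""
    try:
        resp_data = send()
        # A missing key raises KeyError and is handled by the same except.
        return f'{resp_data["token_type"]} {resp_data["access_token"]}'
    except Exception:
        # logger.exception records the traceback, so the failing step stays visible.
        logger.exception("Token exchange failed")
        return None
```

Network errors and malformed responses end up in the same handler, and the logged traceback still shows which line failed.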
return get_headers


def create_token_federation_provider(
Suggestion: since we are using the Decorator pattern, I don't think this function is needed.
Just do DatabricksTokenFederationProvider(SimpleCredentialProvider) in the main place. The whole point of the Decorator pattern is to not need a specific create_token_fed function.
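To make the suggestion concrete, here is a minimal sketch of a wrapper applied directly at the call site. The class names, the `exchange` callable, and the two-method provider interface (`auth_type()` plus `__call__()` returning a header factory) are assumptions for illustration and may not match the connector's actual interface.

```python
from typing import Callable, Dict

HeaderFactory = Callable[[], Dict[str, str]]


class StaticTokenProvider:
    """Hypothetical inner provider that yields a fixed bearer token."""

    def __init__(self, token: str):
        self._token = token

    def auth_type(self) -> str:
        return "static-token"

    def __call__(self) -> HeaderFactory:
        return lambda: {"Authorization": f"Bearer {self._token}"}


class TokenFederationProvider:
    """Wraps any provider with the same interface and exchanges its token."""

    def __init__(self, inner, exchange: Callable[[str], str]):
        self._inner = inner
        self._exchange = exchange  # hypothetical: external token -> exchanged token

    def auth_type(self) -> str:
        return self._inner.auth_type()

    def __call__(self) -> HeaderFactory:
        inner_headers = self._inner()

        def get_headers() -> Dict[str, str]:
            headers = dict(inner_headers())
            scheme, _, token = headers["Authorization"].partition(" ")
            headers["Authorization"] = f"{scheme} {self._exchange(token)}"
            return headers

        return get_headers
```

The caller composes the two objects itself, for example `TokenFederationProvider(StaticTokenProvider("ext"), exchange=my_exchange)`, so no dedicated factory function is required.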
)

# If expires_in wasn't available, try to parse expiry from the token JWT
if token.expiry == datetime.now(tz=timezone.utc):
Why are we trying to find the expiry in two different ways? That is too much code bloat. I would directly use the access_token parsing, since that is the best way: if that fails, there is nothing else to check.
logger = logging.getLogger(__name__)


def decode_jwt(token):
Preferably use a library; manual parsing can be error-prone.
SystemExit: If any required environment variable is missing
"""
github_token = os.environ.get("OIDC_TOKEN")
if not github_token:
This feels like too much error handling. If a variable is not present, let it fail with a null pointer exception; I don't think code-breaking exceptions need to be handled.
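In Python the equivalent of letting it fail is indexing `os.environ` directly, which raises `KeyError` naming the missing variable. A minimal sketch, with `read_oidc_token` as a hypothetical helper name:

```python
import os


def read_oidc_token() -> str:
    """Read the token directly; a missing variable raises KeyError."""
    return os.environ["OIDC_TOKEN"]
```

The traceback already identifies the missing variable, so the explicit check and `SystemExit` path add little.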
TOKEN_REFRESH_BUFFER_SECONDS,
)
Please convert the code to pytest; we want to move away from unittest.
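For reference, a pytest-style version of an expiry test might look like the sketch below: plain functions named `test_*` with bare `assert` statements, and no `TestCase` class. The function under test, `is_expiring`, is a hypothetical stand-in, not the PR's code.

```python
from datetime import datetime, timedelta, timezone

BUFFER_SECONDS = 10  # mirrors TOKEN_REFRESH_BUFFER_SECONDS from the PR

NOW = datetime(2025, 1, 1, tzinfo=timezone.utc)


def is_expiring(expiry: datetime, now: datetime) -> bool:
    """Hypothetical function under test."""
    return now >= expiry - timedelta(seconds=BUFFER_SECONDS)


def test_fresh_token_is_not_expiring():
    assert not is_expiring(NOW + timedelta(minutes=5), NOW)


def test_token_inside_buffer_is_expiring():
    assert is_expiring(NOW + timedelta(seconds=5), NOW)
```

pytest collects these functions by name and reports the compared values when an assertion fails, so `self.assertEqual`-style helpers are not needed.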
What type of PR is this?
Description
This PR adds token federation support to the Databricks SQL Python connector, which allows using external identity provider tokens (like GitHub Actions OIDC tokens) with Databricks SQL.
Key Changes
Core Implementation
Code Architecture
- DatabricksTokenFederationProvider class to handle token federation
- Token class to manage token lifecycle and expiry
API & Configuration
- identity_federation_client_id parameter for token federation
Testing
Future Improvements
This PR enables Databricks SQL connector users to leverage external identity providers for authentication, particularly useful in CI/CD environments like GitHub Actions.
How is this tested?
Related Tickets & Documents
Notes for reviewers:
Token Federation Flow
1. Client Initialization
- auth_type="token-federation" and provides an external token
- access_token or a custom credentials_provider
- use_token_federation flag
2. Auth Provider Selection
- get_auth_provider() in auth.py detects token federation auth type
- DatabricksTokenFederationProvider wrapper around the credential source
- TOKEN_FEDERATION as an auth_type while maintaining backward compatibility
- (… DatabricksOAuthProvider, AccessTokenAuthProvider, etc.)
3. Token Evaluation
4. Token Exchange
5. Token Refresh
- (… TOKEN_REFRESH_BUFFER_SECONDS = 10):
6. Fallback Handling
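The exchange and refresh steps of this flow could be sketched roughly as follows. This is an assumption-laden illustration rather than the PR's implementation: `CachedExchange`, the `exchange` callable, and the injectable `clock` are hypothetical names; only the 10-second buffer comes from the description.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional, Tuple

REFRESH_BUFFER = timedelta(seconds=10)  # TOKEN_REFRESH_BUFFER_SECONDS in the PR


class CachedExchange:
    """Caches an exchanged token and re-exchanges it shortly before expiry."""

    def __init__(
        self,
        exchange: Callable[[str], Tuple[str, datetime]],
        clock: Callable[[], datetime] = lambda: datetime.now(tz=timezone.utc),
    ):
        self._exchange = exchange  # hypothetical: external token -> (token, expiry)
        self._clock = clock
        self._token: Optional[str] = None
        self._expiry: Optional[datetime] = None

    def get(self, external_token: str) -> Optional[str]:
        """Return the cached token, exchanging again once inside the buffer."""
        stale = self._expiry is None or self._clock() >= self._expiry - REFRESH_BUFFER
        if stale:
            self._token, self._expiry = self._exchange(external_token)
        return self._token
```

Injecting the clock keeps the refresh rule testable without sleeping.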
Future Provider Integration Plan
To properly integrate token federation with all auth providers in authenticators.py:
Decorator Pattern Implementation:
- AuthProvider with token federation capabilities
- DatabricksOAuthProvider, AccessTokenAuthProvider, etc.
Configuration Changes:
- use_token_federation boolean flag to connection parameters
- get_auth_provider() to apply token federation wrapper when flag is set
Provider Interface Enhancement:
- CredentialsProvider interface to expose necessary token information
- DatabricksOAuthProvider properly implements this interface for token access
Backward Compatibility:
- auth_type="token-federation" during transition
The core token exchange functionality works well for any token, but the current architecture limits token federation to being a separate auth type. The primary improvement needed is architectural: enabling token federation to work with other auth types (including OAuth-based ones) while maintaining backward compatibility.
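A small sketch of how the planned selection logic might read, assuming a `wrap` callable that applies the federation wrapper. The function name `select_provider` and its signature are hypothetical; the real `get_auth_provider()` takes different arguments.

```python
from typing import Any, Callable


def select_provider(
    base_provider: Any,
    wrap: Callable[[Any], Any],
    auth_type: str = "",
    use_token_federation: bool = False,
) -> Any:
    """Apply the federation wrapper when either opt-in is present."""
    # The legacy auth_type stays supported during the transition.
    legacy = auth_type == "token-federation"
    if use_token_federation or legacy:
        return wrap(base_provider)
    return base_provider
```

Under this shape, any base provider can opt in through the flag, while existing callers that pass `auth_type="token-federation"` keep working.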