Implements Token Federation for Python Driver #552

Open: wants to merge 30 commits into main

madhav-db (Contributor) commented May 7, 2025

What type of PR is this?

  • Refactor
  • Feature
  • Bug Fix
  • Other

Description

This PR adds token federation support to the Databricks SQL Python connector, which allows tokens from external identity providers (such as GitHub Actions OIDC tokens) to be used with Databricks SQL.

Key Changes

Core Implementation

  • Added token federation as a new auth type with supporting classes and methods
  • Implemented token exchange mechanism to convert external tokens to Databricks tokens

Code Architecture

  • Added DatabricksTokenFederationProvider class to handle token federation
  • Added Token class to manage token lifecycle and expiry
  • Implemented timezone-aware datetime handling so that expiry comparisons never mix naive and aware datetimes
  • Added IdP detection to support various identity providers (Azure AD, GitHub, Google, AWS)
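The Token class itself is not shown on this page. A minimal sketch of what a timezone-aware token holder could look like, assuming the expiry is stored as an aware UTC datetime and reusing the 10-second TOKEN_REFRESH_BUFFER_SECONDS value from the reviewer notes; the field and method names are hypothetical, not the PR's:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Refresh margin taken from the PR notes (TOKEN_REFRESH_BUFFER_SECONDS = 10).
REFRESH_BUFFER = timedelta(seconds=10)


@dataclass
class Token:
    """Hypothetical holder for an access token and its (UTC-aware) expiry."""

    access_token: str
    expiry: datetime
    token_type: str = "Bearer"

    def __post_init__(self) -> None:
        # Treat naive datetimes as UTC so that comparing them with aware
        # datetimes never raises TypeError.
        if self.expiry.tzinfo is None:
            self.expiry = self.expiry.replace(tzinfo=timezone.utc)

    def needs_refresh(self, now: Optional[datetime] = None) -> bool:
        """Return True once we are within REFRESH_BUFFER of the expiry."""
        if now is None:
            now = datetime.now(tz=timezone.utc)
        return now >= self.expiry - REFRESH_BUFFER

    def header_value(self) -> str:
        """Value suitable for an Authorization header."""
        return f"{self.token_type} {self.access_token}"
```

Normalizing in `__post_init__` keeps every later comparison aware-vs-aware, so callers do not need to remember which datetimes carry a timezone.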

API & Configuration

  • Added identity_federation_client_id parameter for token federation
  • Added proper OIDC discovery for finding token endpoints
  • Added fallback mechanisms for error handling

Testing

  • Added unit tests with mocking for token federation components
  • Added end-to-end test for GitHub OIDC tokens

Future Improvements

  • Token federation should be refactored as a feature that works with different auth types instead of being an auth type itself
  • OAuthProvider should be integrated with token federation to allow token exchange for OAuth-acquired tokens
  • Use a standardized approach for feature flags across the codebase

This PR enables Databricks SQL connector users to leverage external identity providers for authentication, particularly useful in CI/CD environments like GitHub Actions.

How is this tested?

  • Unit tests
  • E2E Tests
  • Manually (via CI/CD)
  • N/A

Related Tickets & Documents

Notes for reviewers:

Token Federation Flow

1. Client Initialization

  • User creates a SQL connection with auth_type="token-federation" and provides an external token
  • Can be initialized either with access_token or a custom credentials_provider
  • LIMITATION: Currently implemented as a standalone auth type, not a feature that can be combined with other auth types
  • TODO: Refactor to make token federation a feature that works with any auth type via a use_token_federation flag
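For the credentials_provider route, a custom provider only has to hand back a function that produces headers. The sketch below assumes the shape used elsewhere in this PR's diff context (calling the provider returns `get_headers`) plus an `auth_type()` method; the class name and the `"github-oidc"` label are hypothetical, and the `OIDC_TOKEN` variable name comes from the e2e test snippet:

```python
import os
from typing import Callable, Dict

HeaderFactory = Callable[[], Dict[str, str]]


class GitHubOIDCCredentialsProvider:
    """Hypothetical provider that supplies an external (GitHub OIDC) token.

    Calling the provider returns a header factory; the connector is assumed
    to call that factory whenever it needs request headers.
    """

    def __init__(self, env_var: str = "OIDC_TOKEN") -> None:
        self.env_var = env_var

    def auth_type(self) -> str:
        return "github-oidc"

    def __call__(self, *args, **kwargs) -> HeaderFactory:
        def get_headers() -> Dict[str, str]:
            # Read the token on every call so an updated value is picked up.
            # A missing variable raises KeyError rather than being hidden.
            token = os.environ[self.env_var]
            return {"Authorization": f"Bearer {token}"}

        return get_headers
```

Under this PR, such a provider would be passed as `credentials_provider` together with `auth_type="token-federation"`; the exact `connect()` keywords should be checked against the merged code.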

2. Auth Provider Selection

  • get_auth_provider() in auth.py detects token federation auth type
  • Creates a DatabricksTokenFederationProvider wrapper around the credential source
  • TODO: Remove TOKEN_FEDERATION as an auth_type while maintaining backward compatibility
  • TODO: Allow wrapping of existing providers (DatabricksOAuthProvider, AccessTokenAuthProvider, etc.)

3. Token Evaluation

  • When headers are requested, the federation provider:
    1. Gets external token from underlying provider
    2. Parses JWT claims to check token issuer
    3. Determines if token needs exchange based on issuer comparison
  • The token evaluation works with any valid JWT, regardless of how it was obtained
  • TODO: Design interfaces to wrap any auth provider with token federation capability
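A minimal sketch of the evaluation step, assuming "issuer comparison" means comparing the hostname in the JWT `iss` claim with the Databricks host. It decodes the payload with the standard library only and does not verify the signature (the server validates the token during exchange); with PyJWT, as the reviewers suggest, the decode would be `jwt.decode(token, options={"verify_signature": False})`. Function names are hypothetical:

```python
import base64
import json
from typing import Any, Dict
from urllib.parse import urlparse


def parse_jwt_claims(token: str) -> Dict[str, Any]:
    """Decode a JWT payload WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a JWT: expected 3 dot-separated parts")
    payload = parts[1]
    # base64url segments omit '=' padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def needs_exchange(token: str, databricks_host: str) -> bool:
    """True if the token's issuer host differs from the Databricks host."""
    issuer = parse_jwt_claims(token).get("iss", "")
    # urlparse only finds a hostname when a scheme is present, so fall back
    # to the raw string for bare hostnames.
    issuer_host = urlparse(issuer).hostname or issuer
    target_host = urlparse(databricks_host).hostname or databricks_host
    return issuer_host.lower() != target_host.lower()
```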

4. Token Exchange

  • If token is from a different issuer than the target Databricks host:
    1. Uses OIDC discovery to find token endpoint
    2. Exchanges external token for Databricks token via token exchange protocol
    3. Stores exchanged token and original external token for future reference
  • If token is from same issuer, uses original token without exchange
  • This process works correctly for any token regardless of source
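The exchange itself follows the OAuth 2.0 Token Exchange protocol (RFC 8693). A sketch of building the form-encoded request with the standard library; the grant and token-type URNs come from the RFC, the headers and `return_original_token_if_authenticated` come from this PR's diff context, while the `all-apis` scope and sending `identity_federation_client_id` as `client_id` are assumptions:

```python
from typing import Dict, Optional
from urllib.parse import urlencode
from urllib.request import Request

# Identifiers defined by RFC 8693.
GRANT_TYPE = "urn:ietf:params:oauth:grant-type:token-exchange"
SUBJECT_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:jwt"


def build_token_exchange_request(
    token_endpoint: str,
    subject_token: str,
    client_id: Optional[str] = None,
    scope: str = "all-apis",  # assumed scope; check the PR's TOKEN_EXCHANGE_PARAMS
) -> Request:
    """Build (but do not send) a POST that trades an external JWT for a Databricks token."""
    params: Dict[str, str] = {
        "grant_type": GRANT_TYPE,
        "subject_token_type": SUBJECT_TOKEN_TYPE,
        "subject_token": subject_token,
        "scope": scope,
        # Visible in the PR's TOKEN_EXCHANGE_PARAMS.
        "return_original_token_if_authenticated": "true",
    }
    if client_id:
        # Assumed mapping of identity_federation_client_id.
        params["client_id"] = client_id
    return Request(
        token_endpoint,
        data=urlencode(params).encode("ascii"),
        headers={"Accept": "*/*", "Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```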

5. Token Refresh

  • Shortly before the token expires (within TOKEN_REFRESH_BUFFER_SECONDS = 10 seconds of expiry):
    1. Requests fresh external token from underlying provider
    2. Exchanges this fresh token for a new Databricks token
    3. Updates stored tokens and headers
  • LIMITATION: Relies on underlying provider for fresh tokens
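The refresh steps can be sketched as a header factory that caches the exchanged token and re-runs the fetch-and-exchange cycle once the buffer is reached. The `source` and `exchange` callables are hypothetical stand-ins for the underlying provider and the token-exchange call; the clock is injectable only to make the behaviour testable:

```python
import threading
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, Optional, Tuple

REFRESH_BUFFER = timedelta(seconds=10)  # TOKEN_REFRESH_BUFFER_SECONDS in the PR

# Hypothetical shapes: source() returns a fresh external token;
# exchange(ext) returns (databricks_token, expiry).
ExternalTokenSource = Callable[[], str]
Exchanger = Callable[[str], Tuple[str, datetime]]


class RefreshingFederatedHeaders:
    """Caches an exchanged token and re-exchanges shortly before it expires."""

    def __init__(
        self,
        source: ExternalTokenSource,
        exchange: Exchanger,
        clock: Optional[Callable[[], datetime]] = None,
    ) -> None:
        self._source = source
        self._exchange = exchange
        self._clock = clock or (lambda: datetime.now(tz=timezone.utc))
        self._lock = threading.Lock()  # avoid concurrent refreshes
        self._token: Optional[str] = None
        self._expiry: Optional[datetime] = None

    def __call__(self) -> Dict[str, str]:
        with self._lock:
            if self._token is None or self._clock() >= self._expiry - REFRESH_BUFFER:
                # 1. fresh external token, 2. exchange it, 3. update the cache
                self._token, self._expiry = self._exchange(self._source())
            return {"Authorization": f"Bearer {self._token}"}
```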

6. Fallback Handling

  • If token exchange or refresh fails, falls back to original external token
  • Logs appropriate warnings/errors
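The fallback behaviour amounts to a guarded exchange. A minimal sketch with a hypothetical `exchange` callable; the broad `except` mirrors "if token exchange or refresh fails" as described above:

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)


def token_or_fallback(external_token: str, exchange: Callable[[str], str]) -> str:
    """Try to exchange external_token; on any failure, log and return it unchanged."""
    try:
        return exchange(external_token)
    except Exception as exc:  # broad on purpose: any exchange failure triggers the fallback
        logger.warning("Token exchange failed, falling back to external token: %s", exc)
        return external_token
```

One consequence of this design is that a failed exchange is not surfaced immediately: if the original token was issued by a different IdP, the server will presumably reject it, so the error appears later as an authentication failure.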

Future Provider Integration Plan

To properly integrate token federation with all auth providers in authenticators.py:

  1. Decorator Pattern Implementation:

    • Create a wrapper class that can decorate any AuthProvider with token federation capabilities
    • Allow wrapping of DatabricksOAuthProvider, AccessTokenAuthProvider, etc.
  2. Configuration Changes:

    • Add a use_token_federation boolean flag to connection parameters
    • Modify get_auth_provider() to apply token federation wrapper when flag is set
  3. Provider Interface Enhancement:

    • Update CredentialsProvider interface to expose necessary token information
    • Ensure DatabricksOAuthProvider properly implements this interface for token access
  4. Backward Compatibility:

    • Maintain support for existing auth_type="token-federation" during transition
    • Add deprecation warnings and migration guidance
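The decorator plan can be sketched as a wrapper around any provider that returns a header factory, selected by the proposed `use_token_federation` flag. All names here are hypothetical; reading the token back out of the `Authorization` header stands in for the richer provider interface described in step 3:

```python
from typing import Callable, Dict

HeaderFactory = Callable[[], Dict[str, str]]


class TokenFederationDecorator:
    """Hypothetical wrapper that adds token exchange to any header-producing provider."""

    def __init__(self, inner: Callable[..., HeaderFactory], exchange: Callable[[str], str]) -> None:
        self._inner = inner
        self._exchange = exchange

    def __call__(self, *args, **kwargs) -> HeaderFactory:
        inner_headers = self._inner(*args, **kwargs)

        def get_headers() -> Dict[str, str]:
            headers = dict(inner_headers())
            # "Bearer <token>" -> ("Bearer", " ", "<token>")
            scheme, _, token = headers.get("Authorization", "").partition(" ")
            if token:
                headers["Authorization"] = f"{scheme} {self._exchange(token)}"
            return headers

        return get_headers


def wrap_if_federated(base_provider, use_token_federation: bool, exchange):
    """Sketch of the proposed flag: decorate whatever provider was configured."""
    if use_token_federation:
        return TokenFederationDecorator(base_provider, exchange)
    return base_provider
```

Because the wrapper only depends on the provider's call shape, the same code would apply to OAuth, PAT, or custom providers, which is the point of the proposed refactor.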

The core token exchange functionality works well for any token, but the current architecture limits token federation to being a separate auth type. The primary improvement needed is architectural - enabling token federation to work with other auth types (including OAuth-based ones) while maintaining backward compatibility.

github-actions bot commented May 7, 2025

Thanks for your contribution! To satisfy the DCO policy in our contributing guide every commit message must include a sign-off message. One or more of your commits is missing this message. You can reword previous commit messages with an interactive rebase (git rebase -i main).

@madhav-db changed the title from "token federation (draft)" to "Implements Token Federation for Python Driver" on May 9, 2025
@madhav-db requested a review from gopalldb on May 9, 2025 08:16
Comment on lines +90 to +92
self.token_endpoint: Optional[str] = None
self.idp_endpoints = None
self.openid_config = None
Review comment (contributor):

Can we delegate this work to a util? We only need the hostname; this information doesn't need to be part of the class.


return get_headers

def _init_oidc_discovery(self):
Review comment (contributor):

This should be a util that returns the OIDC details
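A stateless util along the lines the reviewer suggests could take only the hostname and return the discovery document. The discovery path and the `/oidc/v1/token` fallback are assumptions about Databricks workspaces to verify against the PR and the JDBC driver; `fetch` is injectable so the function can be tested without network access:

```python
import json
from typing import Any, Callable, Dict
from urllib.request import urlopen

# Assumed discovery path for Databricks workspaces.
DISCOVERY_PATH = "/oidc/.well-known/oauth-authorization-server"


def _fetch_json(url: str) -> Dict[str, Any]:
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def get_oidc_config(
    hostname: str, fetch: Callable[[str], Dict[str, Any]] = _fetch_json
) -> Dict[str, Any]:
    """Return OIDC metadata for a workspace, given only its hostname."""
    # Accept "host", "host/", or "https://host/".
    host = hostname.split("://", 1)[-1].rstrip("/")
    try:
        return fetch(f"https://{host}{DISCOVERY_PATH}")
    except (OSError, ValueError):
        # Network/HTTP errors (URLError is an OSError) or bad JSON:
        # fall back to the conventional token endpoint (assumed).
        return {"token_endpoint": f"https://{host}/oidc/v1/token"}
```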


return parts[0], parts[1]

def _parse_jwt_claims(self, token: str) -> Dict[str, Any]:
Review comment (contributor):

Can we use the PyJWT library directly? Manual parsing can be error-prone.


logger = logging.getLogger(__name__)

TOKEN_EXCHANGE_PARAMS = {
Review comment (contributor):

These shouldn't be global constants; can we move them into the respective class?

"return_original_token_if_authenticated": "true",
}

TOKEN_REFRESH_BUFFER_SECONDS = 10
Review comment (contributor):

Same here: this shouldn't be a global constant.

self.idp_endpoints = None
self.openid_config = None
self.last_exchanged_token: Optional[Token] = None
self.last_external_token: Optional[str] = None
Review comment (contributor):

I don't think we need all these variables. Can we implement the flow the same way the JDBC driver does?

headers = {"Accept": "*/*", "Content-Type": "application/x-www-form-urlencoded"}

try:
# Make the token exchange request
Review comment (contributor):

Sending the request can be moved into a separate function.


# Set expiry time from the response's expires_in field if available
# This is the standard OAuth approach
if "expires_in" in resp_data and resp_data["expires_in"]:
Review comment (contributor):

Suggestion: nested try blocks (try { try { } }) are hard to read. We can keep one parent try and let it capture any failure inside; the error logs will still have the required debugging info.

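A sketch of the two suggestions above combined: sending is split into its own function, and a single try block covers the network call, JSON parsing, and the response check. The function name and the injectable `opener` (defaulting to `urllib.request.urlopen`) are hypothetical:

```python
import json
import logging
from typing import Any, Callable, Dict
from urllib.request import Request, urlopen

logger = logging.getLogger(__name__)


def send_token_exchange(request: Request, opener: Callable[..., Any] = urlopen) -> Dict[str, Any]:
    """Send a prepared token-exchange request and return the parsed JSON body.

    One try block: network/HTTP errors (OSError subclasses) and parsing or
    validation errors (ValueError) are logged once and re-raised as one type.
    """
    try:
        with opener(request, timeout=10) as resp:
            data = json.load(resp)
        if "access_token" not in data:
            raise ValueError("response has no access_token")
        return data
    except (OSError, ValueError) as exc:
        logger.error("Token exchange request failed: %s", exc)
        raise RuntimeError("token exchange failed") from exc
```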
return get_headers


def create_token_federation_provider(
Review comment (contributor):

Suggestion: since we are using the decorator pattern, I don't think this function is needed.
Just use DatabricksTokenFederationProvider(SimpleCredentialProvider) directly at the call site. The whole point of the decorator pattern is to avoid a dedicated create_token_federation_provider function.

)

# If expires_in wasn't available, try to parse expiry from the token JWT
if token.expiry == datetime.now(tz=timezone.utc):
Review comment (contributor):

Why are we determining the expiry in two different ways? It's too much code bloat. I'd parse it directly from the access token, since that is the most reliable source; if that fails, there is nothing else to check.

logger = logging.getLogger(__name__)


def decode_jwt(token):
Review comment (contributor):

Preferably use a library; manual parsing can be error-prone.

SystemExit: If any required environment variable is missing
"""
github_token = os.environ.get("OIDC_TOKEN")
if not github_token:
Review comment (contributor):

This feels like too much error handling. If a variable is missing, just let it fail with an exception; I don't think code-breaking exceptions need to be handled.

TOKEN_REFRESH_BUFFER_SECONDS,
)


Review comment (contributor):

Please convert the tests to pytest; we want to move away from unittest.

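In pytest style, tests are plain `test_*` functions with bare `assert` statements instead of `unittest.TestCase` subclasses and `self.assertEqual` calls. A small illustration; the `expires_soon` helper under test is hypothetical and only mirrors the 10-second refresh buffer from the PR notes:

```python
from datetime import datetime, timedelta, timezone


def expires_soon(expiry: datetime, now: datetime, buffer_seconds: int = 10) -> bool:
    """Hypothetical helper: True when now is within buffer_seconds of expiry."""
    return now >= expiry - timedelta(seconds=buffer_seconds)


NOW = datetime(2025, 5, 9, tzinfo=timezone.utc)


def test_token_far_from_expiry_is_not_refreshed():
    assert not expires_soon(NOW + timedelta(minutes=5), NOW)


def test_token_inside_buffer_is_refreshed():
    assert expires_soon(NOW + timedelta(seconds=5), NOW)


def test_expired_token_is_refreshed():
    assert expires_soon(NOW - timedelta(seconds=1), NOW)
```

Running `pytest` on the file collects the three functions automatically; pytest's assertion rewriting reports the compared values on failure, so no assert helper methods are needed.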