SkyflowFoundry/redshift_udf


Skyflow Redshift UDF Handler

A Lambda-based handler for tokenizing and detokenizing sensitive data in Amazon Redshift using Skyflow's vaults.

License: MIT

Introduction

The Skyflow Redshift UDF Handler is a serverless solution that enables secure tokenization and detokenization operations directly from Amazon Redshift. It leverages AWS Lambda to create User-Defined Functions (UDFs) that interact with Skyflow's data privacy vault to protect sensitive information while maintaining data utility.

Features

  • Seamless Redshift Integration: Invoke tokenization and detokenization directly from SQL queries
  • Role-Based Access Control: Map Redshift roles to appropriate Skyflow permissions
  • Caching Mechanism: Reduce API calls with JWT token caching
  • Flexible Configuration: Support multiple credential configurations for different roles
  • Error Handling: Comprehensive error handling and logging
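The caching feature can be pictured as a small time-based cache keyed by credential or role. The following is a minimal sketch under assumptions, not the repository's implementation: `TokenCache`, `fetch_token`, and the 3000-second lifetime are hypothetical names and values chosen for illustration.

```python
import time


class TokenCache:
    """Cache bearer tokens per key until an assumed time-to-live expires."""

    def __init__(self, fetch_token, ttl_seconds=3000, clock=time.monotonic):
        self._fetch_token = fetch_token  # hypothetical callable: key -> token string
        self._ttl = ttl_seconds          # assumed lifetime, not a documented Skyflow value
        self._clock = clock              # injectable so the cache can be tested offline
        self._entries = {}               # key -> (token, expiry time)

    def get(self, key):
        now = self._clock()
        cached = self._entries.get(key)
        if cached and cached[1] > now:
            return cached[0]             # still valid: no API call needed
        token = self._fetch_token(key)   # expired or missing: request a new token
        self._entries[key] = (token, now + self._ttl)
        return token
```

Keeping the cache at module level lets warm Lambda invocations reuse tokens; a cold start simply begins with an empty cache.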

How It Works

The Skyflow Redshift UDF Handler works as a bridge between Amazon Redshift and Skyflow's APIs:

sequenceDiagram
    %% Lambda invocation
    Redshift->>Lambda: Invoke UDF with data and context
    activate Lambda
    Lambda->>Lambda: Validate environment configuration
    Lambda->>Handler: Initialize handler with environment variables
    activate Handler
    
    %% Process arguments
    Lambda->>Lambda: Extract user role from Redshift context
    Lambda->>Handler: Process arguments based on operation type
    
    alt Tokenization Flow (PII → Token)
        %% Role mapping
        Handler->>Handler: Determine appropriate Skyflow role
        Handler->>Secret: Retrieve role mappings
        Secret-->>Handler: Return role configuration
        Handler->>Handler: Map Redshift role to Skyflow role
        
        %% Authentication
        Handler->>Handler: Get or generate bearer token
        Handler->>Secret: Retrieve credential mappings
        Secret-->>Handler: Return credential configuration
        Handler->>Secret: Retrieve Skyflow credentials
        Secret-->>Handler: Return API credentials
        Handler->>Handler: Generate JWT with credentials
        Handler->>Skyflow: Request access token with JWT
        Skyflow-->>Handler: Return access token
        
        %% Tokenization
        Handler->>Skyflow: Send data to vault for tokenization
        Skyflow-->>Handler: Return secure token
        Handler-->>Lambda: Return token to handler
        
    else Detokenization Flow (Token → PII)
        %% Role mapping
        Handler->>Handler: Determine appropriate Skyflow role
        Handler->>Secret: Retrieve role mappings
        Secret-->>Handler: Return role configuration
        Handler->>Handler: Map Redshift role to Skyflow role
        
        %% Authentication
        Handler->>Handler: Get or generate bearer token
        Handler->>Secret: Retrieve credential mappings
        Secret-->>Handler: Return credential configuration
        Handler->>Secret: Retrieve Skyflow credentials
        Secret-->>Handler: Return API credentials
        Handler->>Handler: Generate JWT with credentials
        Handler->>Skyflow: Request access token with JWT
        Skyflow-->>Handler: Return access token
        
        %% Detokenization
        Handler->>Skyflow: Send token for detokenization
        Skyflow-->>Handler: Return original data
        Handler-->>Lambda: Return data to handler
    end
    
    Lambda->>Redshift: Return results back to SQL query
    deactivate Handler
    deactivate Lambda
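
To make the flow above concrete, here is a minimal sketch of a Lambda entry point in Python. It assumes the standard Redshift Lambda UDF batch format (one row per entry under `arguments`, and a JSON reply carrying `success`, `results`, and `error_msg`). `detokenize_value` is a hypothetical stand-in for the role mapping, authentication, and Skyflow call, and `"DEFAULT"` is an assumed fallback redaction. This is not the repository's actual handler.

```python
import json


def detokenize_value(token, redaction, user):
    """Hypothetical placeholder for the Skyflow detokenize request."""
    return f"<detokenized:{token}:{redaction}>"


def lambda_handler(event, context):
    """Process one batched Redshift UDF invocation and return a JSON string."""
    try:
        user = event.get("user")  # Redshift user that ran the query
        results = []
        for row in event["arguments"]:
            token = row[0]
            # The one-argument UDF overload sends only the token.
            redaction = row[1] if len(row) > 1 else "DEFAULT"
            # SQL NULL arrives as None; pass it through unchanged.
            if token is None:
                results.append(None)
            else:
                results.append(detokenize_value(token, redaction, user))
        return json.dumps(
            {"success": True, "num_records": len(results), "results": results}
        )
    except Exception as exc:
        # Redshift surfaces error_msg to the SQL client when success is false.
        return json.dumps({"success": False, "error_msg": str(exc)})
```

Redshift expects exactly one result per input row, in the same order, which is why the loop appends a value for every row, including NULL inputs.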

Setup

Prerequisites

  • AWS Account with access to:
    • AWS Lambda
    • AWS Secrets Manager
    • Amazon Redshift
  • Skyflow Account with:
    • A configured vault
    • API credentials
    • Proper role setup

Skyflow Configuration

Before deploying the Lambda function, you need to set up your Skyflow environment:

1. Create a Skyflow Vault

  1. Log in to Skyflow Studio
  2. Create a new vault or use an existing one
  3. Note the Vault ID and Vault URL, which will be needed for environment variables

2. Configure Vault Schema

  1. Define tables and columns that will store sensitive data
  2. For each table/column combination used in tokenization:
    • Set appropriate data types
    • Configure tokenization settings (format preservation, etc.)
    • Define redaction policies

3. Configure Roles and Policies

  1. Follow the instructions at Create a Custom Role
  2. Create roles for different access patterns (e.g., admin, analyst)
  3. For each role:
    • Set appropriate vault policies (read, insert)
    • Configure column-level permissions
    • Set redaction policies
  4. Note the role IDs for your role mapping configuration

4. Create Service Accounts

  1. Follow the instructions at Create Service Account to create a new service account.
  2. Assign the appropriate role to the Service Account.
  3. Download the credentials (includes private key).
  4. Repeat these steps to create a service account for each additional use case.

5. Store Credentials in AWS Secrets Manager

  1. Create secrets in AWS Secrets Manager for:
    • Each service account's credentials
    • Role mappings
    • Credential mappings
  2. Format them according to the Secret Structure below

This Skyflow configuration ensures proper authentication, authorization, and data access controls for your tokenization operations.

Secret Structure

Role Mappings Secret

This secret maps Redshift roles to Skyflow role IDs.

{
  "defaultRoleID": "skyflow-default-role-id",
  "roleMappings": [
    {
      "redshiftRoles": ["admin", "superuser"],
      "skyflowRoleID": "skyflow-admin-role-id"
    },
    {
      "redshiftRoles": ["analyst"],
      "skyflowRoleID": "skyflow-analyst-role-id"
    }
  ]
}
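
A lookup over this structure might look like the sketch below. It assumes exact, case-sensitive role names and a fallback to `defaultRoleID` when nothing matches; `resolve_skyflow_role` is a hypothetical helper name, not a function from the repository.

```python
import json

# Sample secret content, copied from the structure shown above.
ROLE_MAPPINGS_JSON = """
{
  "defaultRoleID": "skyflow-default-role-id",
  "roleMappings": [
    {"redshiftRoles": ["admin", "superuser"], "skyflowRoleID": "skyflow-admin-role-id"},
    {"redshiftRoles": ["analyst"], "skyflowRoleID": "skyflow-analyst-role-id"}
  ]
}
"""


def resolve_skyflow_role(redshift_role, mappings):
    """Return the Skyflow role ID for a Redshift role, or the default (assumed)."""
    for entry in mappings.get("roleMappings", []):
        if redshift_role in entry.get("redshiftRoles", []):
            return entry["skyflowRoleID"]
    return mappings["defaultRoleID"]


mappings = json.loads(ROLE_MAPPINGS_JSON)
print(resolve_skyflow_role("analyst", mappings))
print(resolve_skyflow_role("unknown_role", mappings))
```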

Credentials Mapping Secret

This secret maps the credentials.json file generated for a specific service account to one or more Redshift roles. It also names a default secret, whose service account is used when the role of the Redshift user running the query matches none of the mapped roles. Note that this UDF does not use the ability of a single service account to assume one of several roles assigned to it, as demonstrated in the BigQuery UDF in this repository. Instead, it demonstrates dedicated service accounts for each set of Redshift roles.

{
  "defaultCredentialsSecret": "default-skyflow-credentials",
  "credentialMappings": [
    {
      "roles": ["redshift-role-name"],
      "credentialsSecret": "skyflow-service-account-credentials"
    },
    {
      "roles": ["analyst","marketing_user"],
      "credentialsSecret": "analyst-skyflow-credentials"
    }
  ]
}
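
The default fallback described above can be sketched as follows. `resolve_credentials_secret`, `load_credentials`, and the injected `fetch_secret` callable are hypothetical names used for illustration; in a Lambda function, `fetch_secret` would typically wrap the boto3 Secrets Manager `get_secret_value` call and return its `SecretString`.

```python
import json


def resolve_credentials_secret(redshift_role, mapping):
    """Pick the secret name for a Redshift role, falling back to the default."""
    for entry in mapping.get("credentialMappings", []):
        if redshift_role in entry.get("roles", []):
            return entry["credentialsSecret"]
    return mapping["defaultCredentialsSecret"]


def load_credentials(redshift_role, mapping, fetch_secret):
    """Fetch and parse the credentials.json stored under the resolved secret.

    fetch_secret is an assumed callable: secret name -> JSON string.
    """
    secret_name = resolve_credentials_secret(redshift_role, mapping)
    return json.loads(fetch_secret(secret_name))


# Offline usage with an in-memory stand-in for AWS Secrets Manager.
mapping = {
    "defaultCredentialsSecret": "default-skyflow-credentials",
    "credentialMappings": [
        {"roles": ["analyst", "marketing_user"],
         "credentialsSecret": "analyst-skyflow-credentials"},
    ],
}
fake_store = {
    "default-skyflow-credentials": '{"clientID": "default-client"}',
    "analyst-skyflow-credentials": '{"clientID": "analyst-client"}',
}
print(load_credentials("marketing_user", mapping, fake_store.__getitem__))
```

Passing the fetcher in as a parameter keeps the lookup logic testable without AWS access.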

Deployment

AWS Lambda Setup

  1. Create a new Lambda function using Python 3.9+

  2. Set the required environment variables (listed in step 6)

  3. Deploy the code to your Lambda function.

  4. Configure the Lambda execution role (found under Lambda Function -> Configuration -> Permissions) with the following permissions:

    • Access to AWS Secrets Manager: create a custom inline policy that allows reading the secrets described under Secret Structure, for example:
       {
         "Version": "2012-10-17",
         "Statement": [
           {
             "Effect": "Allow",
             "Action": "secretsmanager:GetSecretValue",
             "Resource": "arn:aws:secretsmanager:{REGION}:{ACCOUNT_ID}:secret:{SECRET_NAME}-*"
           }
         ]
       }
    • AmazonS3ReadOnlyAccess
    • AWSGlueConsoleFullAccess
    • AWSLambdaRole
    
    
  5. Add the following dependencies via layers:

    • Create a layer (Lambda Home Page -> Left Navigation Menu -> Layers -> Create layer) by uploading the zip file in the redshift-udf/layer folder.
    • Add the following public layers using these ARNs:
      • arn:aws:lambda:us-west-2:770693421928:layer:Klayers-p39-cryptography:19
      • arn:aws:lambda:us-west-2:770693421928:layer:Klayers-p39-boto3:23
  6. Configure the following environment variables in your Lambda function:

| Variable | Description |
| --- | --- |
| SKYFLOW_VAULT_ID | Your Skyflow vault ID |
| SKYFLOW_VAULT_URL | Your Skyflow vault URL |
| CREDENTIALS_MAPPING_SECRET | AWS Secret name for credential mappings |
| ROLE_MAPPINGS_SECRET | AWS Secret name for role mappings |
| SERVICE_ACCOUNT_EMAIL | Your valid email for JWT generation |
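
A startup check for these variables could look like the sketch below. `load_config` and `REQUIRED_VARS` are hypothetical names; the sketch only assumes that all five variables listed above are mandatory and that a missing one should fail the invocation early with a clear message.

```python
import os

# Variable names taken from the table above.
REQUIRED_VARS = (
    "SKYFLOW_VAULT_ID",
    "SKYFLOW_VAULT_URL",
    "CREDENTIALS_MAPPING_SECRET",
    "ROLE_MAPPINGS_SECRET",
    "SERVICE_ACCOUNT_EMAIL",
)


def load_config(env=os.environ):
    """Return the required settings, or raise listing every missing variable."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Reporting all missing names at once avoids fixing the Lambda configuration one variable at a time.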

Redshift Integration

Create UDFs in Redshift that invoke your Lambda function:

-- Detokenize function with both token and redaction parameters
CREATE EXTERNAL FUNCTION public.vault_detokenize(token character varying, redaction character varying)
 RETURNS character varying 
 VOLATILE
 LAMBDA 'your-lambda-function-name'
 IAM_ROLE 'arn:aws:iam::{ACCOUNT_ID}:role/{LAMBDA_EXECUTION_ROLE_NAME}';

-- Simplified detokenize function with just the token parameter
CREATE EXTERNAL FUNCTION public.vault_detokenize(token character varying)
 RETURNS character varying 
 VOLATILE
 LAMBDA 'your-lambda-function-name'
 IAM_ROLE 'arn:aws:iam::{ACCOUNT_ID}:role/{LAMBDA_EXECUTION_ROLE_NAME}';

Replace {ACCOUNT_ID} with your AWS account ID and {LAMBDA_EXECUTION_ROLE_NAME} with the IAM role name.

Redshift Permissions

To run a UDF, you must have permission to do so for each function. By default, permission to run new UDFs is granted to PUBLIC.

To restrict usage, revoke this permission from PUBLIC for the function. Then grant the privilege to specific individuals or groups.

revoke execute on function your_external_function_name(parameter list) from PUBLIC;

grant execute on function your_external_function_name(parameter list) to analyst_group;

Usage

In SQL Queries

-- Detokenize with default redaction
SELECT vault_detokenize('ab123c4d-ef56-7890-gh12-3ij4klm5no6p');

-- Detokenize with specific redaction
SELECT vault_detokenize('ab123c4d-ef56-7890-gh12-3ij4klm5no6p', 'REDACTED');

Security Considerations

  • Store sensitive credentials securely in AWS Secrets Manager
  • Follow the principle of least privilege when configuring IAM roles
  • Regularly rotate credentials and review permissions
  • Implement proper access controls for Redshift UDFs
  • Monitor and audit tokenization/detokenization operations

Troubleshooting

Common Issues

| Issue | Resolution |
| --- | --- |
| Missing environment variables | Ensure all required variables are set in Lambda configuration |
| Secrets access denied | Verify Lambda has proper IAM permissions to access Secrets Manager |
| Invalid JWT | Check private key format and credentials in secrets |
| API errors | Review Skyflow API responses and ensure correct vault configurations |
| Role mapping issues | Verify role mappings are correctly defined in the secrets |

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
