A Lambda-based handler for tokenizing and detokenizing sensitive data in Amazon Redshift using Skyflow's vaults.
- Introduction
- Features
- How It Works
- Setup
- Deployment
- Usage
- Security Considerations
- Troubleshooting
- Contributing
- License
The Skyflow Redshift UDF Handler is a serverless solution that enables secure tokenization and detokenization operations directly from Amazon Redshift. It leverages AWS Lambda to create User-Defined Functions (UDFs) that interact with Skyflow's data privacy vault to protect sensitive information while maintaining data utility.
- Seamless Redshift Integration: Invoke tokenization and detokenization directly from SQL queries
- Role-Based Access Control: Map Redshift roles to appropriate Skyflow permissions
- Caching Mechanism: Reduce API calls with JWT token caching
- Flexible Configuration: Support multiple credential configurations for different roles
- Error Handling: Comprehensive error handling and logging
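The caching feature above can be pictured with a small in-memory cache that reuses a bearer token until shortly before it expires. This is a minimal sketch, not the handler's actual code: `TokenCache`, the `fetch` callback, and the 60-second safety margin are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Tuple


class TokenCache:
    """Caches one bearer token per key (hypothetical: keyed by role or secret name)."""

    def __init__(self, clock: Callable[[], float] = time.time, skew_seconds: float = 60.0):
        self._clock = clock          # injectable clock, which keeps the cache testable
        self._skew = skew_seconds    # refresh slightly before the real expiry
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str, fetch: Callable[[], Tuple[str, float]]) -> str:
        """Return a cached token, calling fetch() only when no usable token exists.

        fetch must return (token, expires_at), with expires_at in epoch seconds.
        """
        entry = self._entries.get(key)
        if entry is not None and entry[1] - self._skew > self._clock():
            return entry[0]
        token, expires_at = fetch()
        self._entries[key] = (token, expires_at)
        return token
```

Because Lambda reuses warm execution environments, a module-level cache like this can survive across invocations, which is where the reduction in API calls would come from.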
The Skyflow Redshift UDF Handler works as a bridge between Amazon Redshift and Skyflow's APIs:
sequenceDiagram
%% Lambda invocation
Redshift->>Lambda: Invoke UDF with data and context
activate Lambda
Lambda->>Lambda: Validate environment configuration
Lambda->>Handler: Initialize handler with environment variables
activate Handler
%% Process arguments
Lambda->>Lambda: Extract user role from Redshift context
Lambda->>Handler: Process arguments based on operation type
alt Tokenization Flow (PII → Token)
%% Role mapping
Handler->>Handler: Determine appropriate Skyflow role
Handler->>Secret: Retrieve role mappings
Secret-->>Handler: Return role configuration
Handler->>Handler: Map Redshift role to Skyflow role
%% Authentication
Handler->>Handler: Get or generate bearer token
Handler->>Secret: Retrieve credential mappings
Secret-->>Handler: Return credential configuration
Handler->>Secret: Retrieve Skyflow credentials
Secret-->>Handler: Return API credentials
Handler->>Handler: Generate JWT with credentials
Handler->>Skyflow: Request access token with JWT
Skyflow-->>Handler: Return access token
%% Tokenization
Handler->>Skyflow: Send data to vault for tokenization
Skyflow-->>Handler: Return secure token
Handler-->>Lambda: Return token to handler
else Detokenization Flow (Token → PII)
%% Role mapping
Handler->>Handler: Determine appropriate Skyflow role
Handler->>Secret: Retrieve role mappings
Secret-->>Handler: Return role configuration
Handler->>Handler: Map Redshift role to Skyflow role
%% Authentication
Handler->>Handler: Get or generate bearer token
Handler->>Secret: Retrieve credential mappings
Secret-->>Handler: Return credential configuration
Handler->>Secret: Retrieve Skyflow credentials
Secret-->>Handler: Return API credentials
Handler->>Handler: Generate JWT with credentials
Handler->>Skyflow: Request access token with JWT
Skyflow-->>Handler: Return access token
%% Detokenization
Handler->>Skyflow: Send token for detokenization
Skyflow-->>Handler: Return original data
Handler-->>Lambda: Return data to handler
end
Lambda->>Redshift: Return results back to SQL query
deactivate Handler
deactivate Lambda
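The "Map Redshift role to Skyflow role" step in the diagram can be sketched as a simple lookup. This is an illustrative sketch that assumes the role mappings secret uses the `defaultRoleID` / `roleMappings` shape documented in this README; the function name `resolve_skyflow_role`, the exact-match comparison, and the fallback to `defaultRoleID` are assumptions, not the handler's confirmed behavior.

```python
from typing import Any, Dict


def resolve_skyflow_role(redshift_role: str, mapping: Dict[str, Any]) -> str:
    """Return the Skyflow role ID mapped to a Redshift role.

    Assumption: matching is exact and case-sensitive, and unmapped roles
    fall back to defaultRoleID.
    """
    for entry in mapping.get("roleMappings", []):
        if redshift_role in entry.get("redshiftRoles", []):
            return entry["skyflowRoleID"]
    return mapping["defaultRoleID"]


# Example mapping, using the placeholder IDs from this README.
ROLE_MAPPING = {
    "defaultRoleID": "skyflow-default-role-id",
    "roleMappings": [
        {"redshiftRoles": ["admin", "superuser"], "skyflowRoleID": "skyflow-admin-role-id"},
        {"redshiftRoles": ["analyst"], "skyflowRoleID": "skyflow-analyst-role-id"},
    ],
}

print(resolve_skyflow_role("analyst", ROLE_MAPPING))
print(resolve_skyflow_role("intern", ROLE_MAPPING))
```

The same first-match-else-default pattern would apply to the credential mappings secret, with `roles` and `credentialsSecret` in place of `redshiftRoles` and `skyflowRoleID`.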
- AWS Account with access to:
  - AWS Lambda
  - AWS Secrets Manager
  - Amazon Redshift
- Skyflow Account with:
  - A configured vault
  - API credentials
  - Proper role setup
Before deploying the Lambda function, you need to set up your Skyflow environment:
- Log in to the Skyflow Studio
- Create a new vault or use an existing one
- Note the Vault ID and Vault URL, which will be needed for environment variables
- Define tables and columns that will store sensitive data
- For each table/column combination used in tokenization:
  - Set appropriate data types
  - Configure tokenization settings (format preservation, etc.)
  - Define redaction policies
- Follow the instructions at Create a Custom Role
- Create roles for different access patterns (e.g., admin, analyst)
- For each role:
  - Set appropriate vault policies (read, insert)
  - Configure column-level permissions
  - Set redaction policies
- Note the role IDs for your role mapping configuration
- Follow the instructions at Create Service Account to create a new service account.
- Assign the appropriate role to the Service Account.
- Download the credentials (includes private key).
- Repeat steps to create a Service Account for any additional use cases if needed.
- Create secrets in AWS Secrets Manager for:
  - Each service account's credentials
  - Role mappings
  - Credential mappings
- Format them according to the Secret Structure below
This Skyflow configuration ensures proper authentication, authorization, and data access controls for your tokenization operations.
This secret maps Redshift roles to Skyflow roles.
{
"defaultRoleID": "skyflow-default-role-id",
"roleMappings": [
{
"redshiftRoles": ["admin", "superuser"],
"skyflowRoleID": "skyflow-admin-role-id"
},
{
"redshiftRoles": ["analyst"],
"skyflowRoleID": "skyflow-analyst-role-id"
}
]
}
This secret maps the credentials.json file generated for a specific service account to one or more Redshift roles. It also names a default secret, whose service account is used when the querying user's Redshift role matches none of the roles mapped here. Note that this UDF does not use a single service account that assumes one of many roles assigned to it, as demonstrated in the BigQuery UDF in this repository. Instead, it demonstrates dedicated service accounts for sets of Redshift roles.
{
"defaultCredentialsSecret": "default-skyflow-credentials",
"credentialMappings": [
{
"roles": ["redshift-role-name"],
"credentialsSecret": "skyflow-service-account-credentials"
},
{
"roles": ["analyst","marketing_user"],
"credentialsSecret": "analyst-skyflow-credentials"
}
]
}
- Create a new Lambda function using Python 3.9+.
- Set the required environment variables.
- Deploy the code to your Lambda function.
- Configure the Lambda execution role (found under Lambda Function -> Configuration -> Permissions) with the following permissions:
  - AmazonS3ReadOnlyAccess
  - AWSGlueConsoleFullAccess
  - AWSLambdaRole
  - Access to AWS Secrets Manager, granted through a custom inline policy
- Add the following dependencies via layers:
  - Create a layer (Lambda Home Page -> Left Navigation Menu -> Layers -> Create layer) by uploading the zip file in the redshift-udf/layer folder.
  - Add the following public layers using these ARNs:
    - arn:aws:lambda:us-west-2:770693421928:layer:Klayers-p39-cryptography:19
    - arn:aws:lambda:us-west-2:770693421928:layer:Klayers-p39-boto3:23
- Configure the following environment variables in your Lambda function:
| Variable | Description |
|---|---|
| SKYFLOW_VAULT_ID | Your Skyflow vault ID |
| SKYFLOW_VAULT_URL | Your Skyflow vault URL |
| CREDENTIALS_MAPPING_SECRET | AWS Secret name for credential mappings |
| ROLE_MAPPINGS_SECRET | AWS Secret name for role mappings |
| SERVICE_ACCOUNT_EMAIL | Your valid email address for JWT generation |
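A handler that depends on these variables would typically check them once at startup and fail with a clear message. The sketch below illustrates that check under stated assumptions: `load_config` and `REQUIRED_VARS` are hypothetical names, and treating blank values as missing is a design choice made here, not documented behavior of this project.

```python
import os
from typing import Dict, Mapping

# Variable names taken from the table above.
REQUIRED_VARS = (
    "SKYFLOW_VAULT_ID",
    "SKYFLOW_VAULT_URL",
    "CREDENTIALS_MAPPING_SECRET",
    "ROLE_MAPPINGS_SECRET",
    "SERVICE_ACCOUNT_EMAIL",
)


def load_config(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the required settings, or raise one error listing everything missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Reporting every missing variable in one error avoids the fix-one-redeploy-find-the-next cycle when a function is first configured.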
Create UDFs in Redshift that invoke your Lambda function:
-- Detokenize function with both token and redaction parameters
CREATE EXTERNAL FUNCTION public.vault_detokenize(token character varying, redaction character varying)
RETURNS character varying
VOLATILE
LAMBDA 'your-lambda-function-name'
IAM_ROLE 'arn:aws:iam::{ACCOUNT_ID}:role/{LAMBDA_EXECUTION_ROLE_NAME}';
-- Simplified detokenize function with just the token parameter
CREATE EXTERNAL FUNCTION public.vault_detokenize(token character varying)
RETURNS character varying
VOLATILE
LAMBDA 'your-lambda-function-name'
IAM_ROLE 'arn:aws:iam::{ACCOUNT_ID}:role/{LAMBDA_EXECUTION_ROLE_NAME}';
Replace {ACCOUNT_ID} with your AWS account ID and {LAMBDA_EXECUTION_ROLE_NAME} with the IAM role name.
To run a UDF, you must have permission to do so for each function. By default, permission to run new UDFs is granted to PUBLIC.
To restrict usage, revoke this permission from PUBLIC for the function. Then grant the privilege to specific individuals or groups.
revoke execute on function your_external_function_name(parameter list) from PUBLIC;
grant execute on function your_external_function_name(parameter list) to analyst_group;
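For context on what the Lambda function sees when one of these UDFs runs: Redshift sends rows in batches, with each row's UDF arguments as one inner list in the event's `arguments` field, and expects a JSON response containing `success`, `num_records`, and `results` (or `error_msg` on failure). The sketch below shows only that envelope. `handle_batch` and the `transform` callback are hypothetical names, and the callback stands in for the real Skyflow call.

```python
import json
from typing import Any, Callable, Dict, List


def handle_batch(event: Dict[str, Any], transform: Callable[[List[Any]], Any]) -> str:
    """Apply transform to each row and wrap the output in Redshift's response format.

    Each row is a list of the UDF's arguments, e.g. [token] or [token, redaction].
    Results must be returned in the same order as the input rows.
    """
    try:
        rows = event["arguments"]
        results = [transform(row) for row in rows]
        return json.dumps(
            {"success": True, "num_records": len(results), "results": results}
        )
    except Exception as exc:
        # Redshift surfaces error_msg to the user who ran the query.
        return json.dumps({"success": False, "error_msg": str(exc)})


# Placeholder transform: a real handler would call the vault here.
sample_event = {"num_records": 2, "arguments": [["tok-1", "REDACTED"], ["tok-2", "REDACTED"]]}
print(handle_batch(sample_event, lambda row: "value-for-" + row[0]))
```

Since a whole batch succeeds or fails together in this sketch, a production handler may prefer to batch the vault request as well, rather than making one call per row.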
-- Detokenize with default redaction
SELECT vault_detokenize('ab123c4d-ef56-7890-gh12-3ij4klm5no6p');
-- Detokenize with specific redaction
SELECT vault_detokenize('ab123c4d-ef56-7890-gh12-3ij4klm5no6p', 'REDACTED');
- Store sensitive credentials securely in AWS Secrets Manager
- Follow the principle of least privilege when configuring IAM roles
- Regularly rotate credentials and review permissions
- Implement proper access controls for Redshift UDFs
- Monitor and audit tokenization/detokenization operations
| Issue | Resolution |
|---|---|
| Missing environment variables | Ensure all required variables are set in Lambda configuration |
| Secrets access denied | Verify Lambda has proper IAM permissions to access Secrets Manager |
| Invalid JWT | Check private key format and credentials in secrets |
| API errors | Review Skyflow API responses and ensure correct vault configurations |
| Role mapping issues | Verify role mappings are correctly defined in the secrets |
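For the "Secrets access denied" case, the usual cause is an execution role without `secretsmanager:GetSecretValue` on the relevant secrets. The sketch below generates a least-privilege inline policy document for that purpose. The function name `secrets_read_policy`, the `Sid`, and the ARNs in the usage example (region, account ID, secret names) are placeholders to replace with your own values.

```python
import json
from typing import List


def secrets_read_policy(secret_arns: List[str]) -> str:
    """Build an IAM policy document that allows reading only the listed secrets."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadSkyflowSecrets",
                "Effect": "Allow",
                "Action": ["secretsmanager:GetSecretValue"],
                "Resource": secret_arns,
            }
        ],
    }
    return json.dumps(policy, indent=2)


# Placeholder ARNs: the trailing wildcard covers the random suffix
# that Secrets Manager appends to secret ARNs.
print(secrets_read_policy([
    "arn:aws:secretsmanager:us-west-2:123456789012:secret:role-mappings-*",
    "arn:aws:secretsmanager:us-west-2:123456789012:secret:default-skyflow-credentials-*",
]))
```

Listing each secret explicitly, rather than using `"Resource": "*"`, keeps the role in line with the least-privilege guidance in the security considerations.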
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.