diff --git a/src/_data/catalog/warehouse.yml b/src/_data/catalog/warehouse.yml index c40df849d4..dad11ce6d9 100644 --- a/src/_data/catalog/warehouse.yml +++ b/src/_data/catalog/warehouse.yml @@ -71,12 +71,12 @@ items: url: 'https://cdn.filepicker.io/api/file/Vk6iFlMvQeynbg30ZEtt' categories: - Warehouses -- display_name: Databricks Delta Lake - slug: databricks-delta-lake - name: catalog/warehouses/databricks-delta-lake +- display_name: Databricks + slug: databricks + name: catalog/warehouses/databricks description: '' - url: connections/storage/catalog/databricks-delta-lake - status: PRIVATE_BETA + url: connections/storage/catalog/databricks + status: PUBLIC endpoints: - us - eu diff --git a/src/_data/sidenav/main.yml b/src/_data/sidenav/main.yml index fb1f7383ff..55cc0c6192 100644 --- a/src/_data/sidenav/main.yml +++ b/src/_data/sidenav/main.yml @@ -306,8 +306,13 @@ sections: section: - path: /unify/profiles-sync/overview title: Profiles Sync Overview - - path: /unify/profiles-sync - title: Setup + - section_title: Profiles Sync Setup + slug: unify/profiles-sync/profiles-sync-setup + section: + - path: /unify/profiles-sync/profiles-sync-setup + title: Setup + - path: /unify/profiles-sync/profiles-sync-setup/databricks-profiles-sync + title: Databricks for Profiles Sync - path: /unify/profiles-sync/sample-queries title: Sample Queries - path: /unify/profiles-sync/tables diff --git a/src/connections/storage/catalog/databricks-delta-lake/databricks-delta-lake-aws.md b/src/connections/storage/catalog/databricks-delta-lake/databricks-delta-lake-aws.md deleted file mode 100644 index c2a170ffcc..0000000000 --- a/src/connections/storage/catalog/databricks-delta-lake/databricks-delta-lake-aws.md +++ /dev/null @@ -1,172 +0,0 @@ ---- -title: Databricks Delta Lake Destination (AWS Setup) -hidden: true ---- - -{% comment %} - -With the Databricks Delta Lake Destination, you can ingest event data from Segment into the bronze layer of your Databricks Delta Lake. - -This page will help you use the Databricks Delta Lake Destination to sync Segment events into your Databricks Delta Lake built on S3. - -> info "Databricks Delta Lake Destination in Public Beta" -> The Databricks Delta Lake Destination is in public beta, and Segment is actively working on this integration. [Contact Segment](https://segment.com/help/contact/){:target="_blank"} with any feedback or questions. - -## Overview - -Before getting started, use the overview below to get up to familiarize yourself with Segment's Databricks Delta Lake Destination. - -1. Segment writes directly to your Delta Lake in the cloud storage (S3) -- Segment manages the creation and evolution of Delta tables. -- Segment uses IAM role assumption to write Delta tables to AWS S3. -2. Segment supports both OAuth and personal access tokens (PAT) for API authentication. -3. Segment creates and updates the table's metadeta in Unity Catalog by running queries on a small, single node Databricks SQL warehouse in your environment. -4. If a table already exists and no new columns are introduced, Segment appends data to the table (no SQL required). -5. For new data types/columns, Segment reads the current schema for the table from the Unity Catalog and uses the SQL warehouse to update the schema accordingly. - -## Prerequisites - -Please note the following prerequisites for setup. - -1. The target Databricks workspace must be Unity Catalog enabled. Segment doesn't support the Hive metastore. 
Visit the Databricks guide [enabling the Unity Catalog](https://docs.databricks.com/en/data-governance/unity-catalog/enable-workspaces.html){:target="_blank"} for more information. -2. You'll need the following permissions for setup: -- **AWS**: The ability to create an S3 bucket and IAM role. -- **Databricks**: Admin access at the account and workspace level. - -## Authentication - -Segment supports both OAuth and personal access token (PAT) for authentication. Segment recommends using OAuth as it's easier to set up and manage. Throughout this guide, some instructions are marked as *OAuth only* or *PAT only*. You can skip any instructions that don't correspond with your authentication method. - -## Key terms - -As you set up Databricks, keep the following key terms in mind. -- **Databricks Workspace URL**: The base URL for your Databricks workspace. -- **Service Principal Application ID**: The ID tied to the service principal you'll create for Segment. -- **Service Principal Secret/Token**: The client secret or PAT you'll create for the service principal. -- **Target Unity Catalog**: The catalog where Segment lands your data. -- **Workspace Admin Token** (*PAT only*): The access token you'll generate for your Databricks workspace admin. - -## Setup for Databricks Delta Lake (S3) - -### Step 1: Find your Databricks Workspace URL - -You'll use the Databricks workspace URL, along with Segment, to access your workspace API. - -Check your browser's address bar when inside the workspace. The workspace URL will look something like: `https://.cloud.databricks.com`. Remove any characters after this portion and note the URL for later use. - -### Step 2: Create a service principal - -Segment uses the service principal to access your Databricks workspace and associated APIs. -1. Follow the Databricks guide for [adding a service principal to your account](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-service-principals-in-your-account){:target="_blank"}. This name can be anything, but Segment recommends something that identifies the purpose (for example, "Segment Storage Destinations"). Note the Application ID that Databricks generates for later use. Segment doesn't require Account admin or Marketplace admin roles. -2. (*OAuth only*) Follow the Databricks instructions to [generate an OAuth secret](https://docs.databricks.com/en/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal){:target="_blank"}. Note the secret generated by Databricks for later use. Once you navigate away from this page, the secret is no longer visible. If you lose or forget the secret, delete the existing secret and create a new one. - -### Step 3: Enable entitlements for the service principal on the workspace - -This step allows the Segment service principal to create and use a small SQL warehouse, which is used for creating and updating table schemas in the Unity Catalog. - -To enable entitlements for the service principal you just created, follow the Databricks [guide for managing workspace entitlements for a service principal](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-workspace-entitlements-for-a-service-principal){:target="_blank"}. Segment requires the `Allow cluster creation` and `Databricks SQL access` entitlements. 
- -### Step 4: Create an external location and storage credentials - -This step creates the storage location where Segment lands your Delta Lake and the associated credentials Segment uses to access the storage. -1. Follow the Databricks guide for [managing external locations and storage credentials](https://docs.databricks.com/en/data-governance/unity-catalog/manage-external-locations-and-credentials.html){:target="_blank"}. This guide assumes the target S3 bucket already exists. If not, follow the AWS guide for [creating a bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html){:target="_blank"}. -2. Once the external location and storage credentials are created in your Databricks workspace, update the permissions to allow access to the Segment service principal. - 1. In your workspace, navigate to **Data > External Data > Storage Credentials**. - 2. Click the name of the credentials created above to go to the Permissions tab. - 3. Click **Grant**, then select the Segment service principal from the drop-down. - 4. Select the **CREATE EXTERNAL TABLE**, **READ FILES**, and **WRITE FILES** checkboxes. - 5. Click **Grant**. - 6. Click **External Locations**. - 7. Click the name of the location created above and go to the Permissions tab. - 8. Click **Grant**, then select the Segment service principal from the drop-down. - 9. Select the **CREATE EXTERNAL TABLE**, **READ FILES**, and **WRITE FILES** checkboxes. - 10. Click **Grant**. -3. In AWS, supplement the Trust policy for the role created when setting up the storage credentials. - 1. Add: `arn:aws:iam::595280932656:role/segment-storage-destinations-production-access` to the Principal list. - 2. Convert the `sts:ExternalID` field to a list and add the Segment Workspace ID. You'll find the Segment workspace ID in the Segment app (**Settings > Workspace settings > ID**). - -The Trust policy should look like: - -``` -{ - "Version": "2012-10-17", - "Statement": [ - { - "Effect": "Allow", - "Principal": { - "AWS": [ -"arn:aws:iam::414351767826:role/unity-catalog-prod-UCMasterRole-14S5ZJVKOTYTL", -"arn:aws:iam:::role/", -"arn:aws:iam::595280932656:role/segment-storage-destinations-production-access" - ] - }, - "Action": "sts:AssumeRole", - "Condition": { - "StringEquals": { - "sts:ExternalId": [ -"", -"" - ] - } - } - } - ] -} - -``` - -### Step 5: Create a workspace admin access token (PAT only) - -Your Databricks workspace admin uses the workspace admin access token to generate a personal access token for the service principal. - -To create your token, follow the Databricks guide for [generating personal access tokens](https://docs.databricks.com/en/dev-tools/auth.html#databricks-personal-access-tokens-for-workspace-users){:target="_blank"} for workspace users. Note the generated token for later use. - -### Step 6: Enable personal access tokens for the workspace (PAT only) - -This step allows the creation and use of personal access tokens for the workspace admin and the service principal. -1. Follow the Databricks guide for [enabling personal access token authentication](https://docs.databricks.com/en/administration-guide/access-control/tokens.html#enable-or-disable-personal-access-token-authentication-for-the-workspace){:target="_blank"} for the workspace. -2. 
Follow the Databricks docs to [grant Can Use permission](https://docs.databricks.com/en/security/auth-authz/api-access-permissions.html#manage-token-permissions-using-the-admin-settings-page){:target="_blank"} to the Segment service principal created earlier. - -### Step 7: Generate a personal access token for the service principal (PAT only) - -Segment uses the personal access token to access the Databricks workspace API. The Databricks UI doesn't allow for the creation of service principal tokens. Tokens must be generated using either the Databricks workspace API (*recommended*) or the Databricks CLI. -Generating a token requires the following values: -- **Databricks Workspace URL**: The base URL to your Databricks workspace. -- **Workspace Admin Token**: The token generated for your Databricks admin user. -- **Service Principal Application ID**: The ID generated for the Segment service principal. -- **Lifetime Seconds**: The number of seconds before the token expires. Segment doesn't prescribe a specific token lifetime. Using the instructions below, you'll need to generate and update a new token in the Segment app before the existing token expires. Segment's general guidance is 90 days (7776000 seconds). -- **Comment**: A comment which describes the purpose of the token (for example, "Grants Segment access to this workspace until 12/21/2023"). -1. (*Recommended option*) To create the token with the API, execute the following command in a terminal or command line tool. Be sure to update the placeholders with the relevant details from above. For more information about the API check out the [Databricks API docs](https://docs.databricks.com/api/workspace/tokenmanagement/createobotoken){:target="_blank"}. -``` -curl --location -'/api/2.0/token-management/on-behalf-of/tokens' --header 'Content-Type: application/json' --header 'Authorization: Bearer ' --data '{"application_id": "", "lifetime_seconds": , "comment": ""}' -``` -The response from the API contains a `token_value` field. Note this value for later use. -2. (*Alternative option*) If you prefer to use the Databricks CLI, execute the following command in a terminal or command line tool. Be sure to update the placeholders with the relevant details from above. You will also need to [set up a profile](https://docs.databricks.com/en/dev-tools/cli/databricks-cli-ref.html#databricks-personal-access-token-authentication){:target="_blank"} for the CLI. For more info, check out the [Databricks CLI docs](https://docs.databricks.com/en/dev-tools/cli/databricks-cli-ref.html){:target="_blank"}. - - ``` - databricks token-management create-obo-token - --comment -p - ``` -The response from the CLI will contain a `token_value` field. Note this value for later use. - -### Step 8: Create a new catalog in Unity Catalog and grant Segment permissions - -This catalog is the target catalog where Segment lands your schemas/tables. -1. Follow the Databricks guide for [creating a catalog](https://docs.databricks.com/en/data-governance/unity-catalog/create-catalogs.html#create-a-catalog){:target="_blank"}. Be sure to select the storage location created earlier. You can use any valid catalog name (for example, "Segment"). Note this name for later use. -2. Select the catalog you've just created. - 1. Select the Permissions tab, then click **Grant** - 2. Select the Segment service principal from the dropdown, and check `ALL PRIVILEGES`. - 3. Click **Grant**. 
- -### Step 9: Setup the Databricks Delta Lake destination in Segment - -This step links a Segment events source to your Databricks workspace/catalog. -1. From the Segment app, navigate to **Connections > Catalog**, then click **Destinations**. -2. Search for and select the "Databricks Delta Lake" destination. -2. Click **Add Destination**, select a source, then click **Next**. -3. Enter the name for your destination, then click **Create destination**. -4. Enter connection settings for the destination. - - - {% endcomment %} \ No newline at end of file diff --git a/src/connections/storage/catalog/databricks-delta-lake/databricks-delta-lake-azure.md b/src/connections/storage/catalog/databricks-delta-lake/databricks-delta-lake-azure.md deleted file mode 100644 index 7f8dcc4442..0000000000 --- a/src/connections/storage/catalog/databricks-delta-lake/databricks-delta-lake-azure.md +++ /dev/null @@ -1,137 +0,0 @@ ---- -title: Databricks Delta Lake Destination (Azure Setup) -hidden: true ---- - -{% comment %} - -With the Databricks Delta Lake Destination, you can ingest event data from Segment into the bronze layer of your Databricks Delta Lake. - -This page will help you use the Databricks Delta Lake Destination to sync Segment events into your Databricks Delta Lake built on Azure (ADLS Gen 2). - - -> info "Databricks Delta Lake Destination in Public Beta" -> The Databricks Delta Lake Destination is in public beta, and Segment is actively working on this integration. [Contact Segment](https://segment.com/help/contact/){:target="_blank"} with any feedback or questions. - -## Overview - -Before getting started, use the overview below to get up to familiarize yourself with Segment's Databricks Delta Lake Destination. - -1. Segment writes directly to your Delta Lake in the cloud storage (Azure) -- Segment manages the creation and evolution of Delta tables. -- Segment uses a cross-tenant service principal to write Delta tables to ADLS Gen2. -2. Segment supports both OAuth and personal access tokens (PAT) for API authentication. -3. Segment creates and updates the table's metadeta in Unity Catalog by running queries on a small, single node Databricks SQL warehouse in your environment. -4. If a table already exists and no new columns are introduced, Segment appends data to the table (no SQL required). -5. For new data types/columns, Segment reads the current schema for the table from the Unity Catalog and uses the SQL warehouse to update the schema accordingly. - -## Prerequisites - -Please note the following pre-requisites for setup. - -1. Your Databricks workspace must be Unity Catalog enabled. Segment doesn't support the Hive metastore. Visit the Databricks guide for [enabling Unity Catalog](https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/enable-workspaces){:target="_blank"} for more info. -2. You'll need the following permissions for setup: -- **Azure**: Ability to create service principals, as well as create and manage the destination storage container and its associated role assignments. -- **Databricks**: Admin access to the account and workspace level. - -## Key terms - -As you set up Databricks, keep the following key terms in mind. - -- **Databricks Workspace URL**: The base URL for your Databricks workspace. -- **Target Unity Catalog**: The catalog where Segment lands your data. 
- -## Set up Databricks Delta Lake (Azure) - -### Step 1: Find your Databricks Workspace URL - -You'll use the Databricks workspace URL, along with Segment, to access your workspace API. - -Check your browser's address bar when in your workspace. The workspace URL will look something like: `https://.azuredatabricks.net`. Remove any characters after this portion and note this value for later use. - -### Step 2: Add the Segment Storage Destinations service principal to your Entra ID (Active Directory) - -Segment uses the service principal to access your Databricks workspace APIs as well as your ADLS Gen2 storage container. You can use either Azure PowerShell or the Azure CLI. - -1. **Recommended**: Azure PowerShell - 1. Log in to the Azure console with a user allowed to add new service principals. - 2. Open a Cloud Shell (first button to the right of the top search bar). - 3. Once loaded, enter the following command in the shell: - - ``` - New-AzADServicePrincipal -applicationId fffa5b05-1da5-4599-8360-cc2684bcdefb - ``` - -2. **(Alternative option)** Azure CLI - 1. Log into the Azure CLI using the [az login command](https://learn.microsoft.com/en-us/cli/azure/authenticate-azure-cli){:target="_blank"}. - 2. Once authenticated, run the following command: - - ``` - az ad sp create --id fffa5b05-1da5-4599-8360-cc2684bcdefb - ``` - -### Step 3: Update or create an ADLS Gen2 storage container - -The ADLS Gen2 storage container is where Segment lands your Delta Lake files. - -1. In the Azure console, navigate to **Storage accounts** and locate or create a new storage account to use for your Segment data. -2. Select the account, then select **Containers**. -3. Select or create a target container. -4. On the container view, select **Access Control (IAM)**, then navigate to the Role assignments tab. -5. Click **+ Add**, then select **Add role assignment**. -6. Search for and select "Storage Blob Data Contributor", then click next. -7. For "Assign access to" select **User, group, or service principal**. -8. Click **+ Select members**, then search for and select "Segment Storage Destinations". -9. Click **Review + assign**. - -### Step 4: Add the Segment Storage Destinations service principal to the account/workspace - -This step allows Segment to access your workspace. -1. Follow the Databricks guide for [adding a service principal](https://learn.microsoft.com/en-us/azure/databricks/administration-guide/users-groups/service-principals#add-service-principals-to-your-account-using-the-account-console){:target="_blank"} using the account console. -- Segment recommends using "Segment Storage Destinations" for the name, though any identifier is allowed. -- For the **UUID** enter `fffa5b05-1da5-4599-8360-cc2684bcdefb`. -- Segment doesn't require Account admin access. -2. Follow the Databricks guide for [adding a service principal to the workspace](https://learn.microsoft.com/en-us/azure/databricks/administration-guide/users-groups/service-principals#assign-a-service-principal-to-a-workspace-using-the-account-console){:target="_blank"} -- Use the service principal created at the account level above. -- Segment doesn't require Workspace admin access. - -### Step 5: Enable entitlements for the service principal on the workspace - -This step allows the Segment service principal to create a small SQL warehouse for creating and updating table schemas in the Unity Catalog. 
- -To enable entitlements, follow the [managing workspace entitlements](https://learn.microsoft.com/en-us/azure/databricks/administration-guide/users-groups/service-principals#--manage-workspace-entitlements-for-a-service-principal){:target="_blank"} instructions for a service principal. Segment requires `Allow cluster creation` and `Databricks SQL access` entitlements. - -### Step 6: Create an external location and storage credentials - -This step creates the storage location where Segment lands your Delta Lake and the associated credentials Segment uses to access the storage. -1. Follow the Databricks guide for [managing external locations and storage credentials](https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/manage-external-locations-and-credentials){:target="_blank"}. -- Use the storage container you updated in step 3. -- For storage credentials, you can use a service principal or managed identity. -2. Once you create the external location and storage credentials in your Databricks workspace, update the permissions to allow access to the Segment service principal.

-In your workspace, navigate to **Data > External Data > Storage Credentials**. Click the name of the credentials created above and go to the Permissions tab. Click **Grant**, then select the Segment service principal from the drop down. Select the following checkboxes: -- `CREATE EXTERNAL TABLE` -- `READ FILES` -- `WRITE FILES` -3. Click **Grant**. - -### Step 7: Create a new catalog in Unity Catalog and grant Segment permissions - -This catalog is the target catalog where Segment lands your schemas/tables. - -1. Follow the Databricks guide for [creating a catalog](https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/create-catalogs){:target="_blank"}. -- Select the storage location you created earlier. The catalog name can be any valid catalog name (for example, "Segment"). Note this name for later use. -2. Select the newly-created catalog. - 1. Click the Permissions tab, then click **Grant**. - 2. Select the Segment service principal from the dropdown. - 3. Check `ALL PRIVILEGES`, then click **Grant**. - -### Step 8: Setup the Databricks Delta Lake destination in Segment - -This step links a Segment source to your Databricks workspace/catalog. -1. From the Segment app, navigate to **Connections > Catalog**, then click **Destinations**. -2. Search for and select the "Databricks Delta Lake" destination. -2. Click **Add Destination**, select a source, then click **Next**. -3. Enter the name for your destination, then click **Create destination**. -4. Enter the connection settings using the values noted above (leave the Service Principal fields blank). - -{% endcomment %} \ No newline at end of file diff --git a/src/connections/storage/catalog/databricks-delta-lake/index.md b/src/connections/storage/catalog/databricks-delta-lake/index.md deleted file mode 100644 index a564dd12fd..0000000000 --- a/src/connections/storage/catalog/databricks-delta-lake/index.md +++ /dev/null @@ -1,7 +0,0 @@ ---- -title: Databricks Delta Lake -redirect_from: - - '/connections/warehouses/catalog/databricks-delta-lake/' ---- - -Setup docs coming soon! \ No newline at end of file diff --git a/src/connections/storage/catalog/databricks/index.md b/src/connections/storage/catalog/databricks/index.md new file mode 100644 index 0000000000..c447425b0e --- /dev/null +++ b/src/connections/storage/catalog/databricks/index.md @@ -0,0 +1,92 @@ +--- +title: Databricks Destination +public: true + +--- +{% include content/warehouse-ip.html %} + +With the Databricks Destination, you can ingest event data directly from Segment into your Databricks Lakehouse. + +This page will help you get started with syncing Segment events into your Databricks Lakehouse. + +> success "" +> Segment has certified the destination for Databricks on AWS and Azure. + + +## Getting started + +Before getting started with the Databricks Destination, note the following prerequisites. + +- The target Databricks workspace must be Unity Catalog enabled. Segment doesn't support the Hive metastore. Visit the Databricks guide [enabling the Unity Catalog](https://docs.databricks.com/en/data-governance/unity-catalog/enable-workspaces.html){:target="_blank"} for more information. +- Segment creates [managed tables](https://docs.databricks.com/en/data-governance/unity-catalog/create-tables.html#managed-tables){:target="_blank"} in the Unity catalog. The service account needs access to create schemas on the catalog and can delete, drop, or vacuum tables. 
+- Segment supports only [OAuth (M2M)](https://docs.databricks.com/en/dev-tools/auth/oauth-m2m.html){:target="_blank"} for authentication.
+
+> success ""
+> Segment recommends that you enable Warehouse Selective Sync. This feature lets you choose which collections and properties are sent to the warehouse. Syncing only relevant, required data reduces sync duration and compute costs compared to syncing everything. Learn more about [Warehouse Selective Sync](/docs/connections/storage/warehouses/warehouse-syncs/#warehouse-selective-sync).
+
+### Warehouse size
+
+A [SQL warehouse is required](https://docs.databricks.com/en/compute/sql-warehouse/warehouse-behavior.html#sizing-a-serverless-sql-warehouse){:target="_blank"} for compute. Segment recommends a warehouse with the following characteristics:
+  - **Size**: small
+  - **Type**: Serverless, or Pro if serverless isn't available
+  - **Clusters**: Minimum of 2, maximum of 6
+
+> success ""
+> Segment recommends manually starting your SQL warehouse before setting up your Databricks destination. If the SQL warehouse isn't running, Segment attempts to start the SQL warehouse to validate the connection and may experience a timeout when you click the **Test Connection** button during setup.
+
+## Set up Databricks in Segment
+
+Use the following steps to set up Databricks in Segment:
+
+1. Navigate to **Connections > Catalog**.
+2. Select the **Destinations** tab.
+3. Under Connection Type, select **Storage**, then click the **Databricks storage** tile.
+4. (Optional) Select one or more sources to connect to the destination.
+5. Follow the steps below to [connect your Databricks warehouse](#connect-your-databricks-warehouse).
+
+## Connect your Databricks warehouse
+
+Use the five steps below to connect to your Databricks warehouse.
+
+> warning ""
+> Segment needs read and write permissions on your warehouse to write to your database.
+
+### Step 1: Name your destination
+
+Add a name to help you identify this warehouse in Segment. You can change this name at any time by navigating to the destination settings (**Connections > Destinations > Settings**) page.
+
+### Step 2: Enter the Databricks compute resources URL
+
+Segment uses the Databricks workspace URL to access your workspace API.
+
+Check your browser's address bar when inside the workspace. The workspace URL should resemble `https://<workspace-name>.cloud.databricks.com`. Remove any characters after this portion and note the URL for later use.
+
+### Step 3: Enter a Unity catalog name
+
+This catalog is the target catalog where Segment lands your schemas and tables.
+1. Follow the [Databricks guide for creating a catalog](https://docs.databricks.com/en/data-governance/unity-catalog/create-catalogs.html#create-a-catalog){:target="_blank"}. Be sure to select the storage location created earlier. You can use any valid catalog name (for example, "Segment"). Note this name for later use.
+2. Select the catalog you've just created.
+   1. Select the Permissions tab, then click **Grant**.
+   2. Select the Segment service principal from the dropdown, and check `ALL PRIVILEGES`.
+   3. Click **Grant**.
+
+### Step 4: Add the SQL warehouse details from your Databricks warehouse
+
+Next, add SQL warehouse details about your compute resource (the example below shows one way to look up these values):
+- **HTTP Path**: The connection details for your SQL warehouse.
+- **Port**: The port number of your SQL warehouse.
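+You can copy both values from the warehouse's **Connection details** tab in Databricks. As a rough sketch (the workspace URL, warehouse ID, and API token below are placeholders, and the exact response shape may differ), the same details are also returned by the Databricks SQL Warehouses API under `odbc_params`:
+
+```
+# Sketch only: look up a SQL warehouse's connection details through the REST API.
+# Replace the placeholders with your workspace URL, your warehouse ID, and a valid API token.
+curl --request GET \
+  --header "Authorization: Bearer <api-token>" \
+  "https://<workspace-name>.cloud.databricks.com/api/2.0/sql/warehouses/<warehouse-id>"
+
+# The response includes an "odbc_params" object, for example:
+#   "odbc_params": { "hostname": "...", "path": "/sql/1.0/warehouses/<warehouse-id>", "port": 443 }
+# Use "path" as the HTTP Path and "port" as the Port in Segment.
+```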
+
+
+### Step 5: Add the service principal client ID and OAuth secret
+
+> warning ""
+> Be sure to note the principal ID and the OAuth secret Databricks generates, as you'll need to enter them in this step.
+
+Segment uses the service principal to access your Databricks workspace and associated APIs.
+1. Follow the [Databricks guide for adding a service principal to your account](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-service-principals-in-your-account){:target="_blank"}. This name can be anything, but Segment recommends something that identifies the purpose (for example, "Segment Storage Destinations"). Note the application ID that Databricks generates, as you'll enter it in this step. Segment doesn't require Account admin or Marketplace admin roles.
+2. Follow the [Databricks instructions to generate an OAuth secret](https://docs.databricks.com/en/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal){:target="_blank"}. Note the secret that Databricks generates, as you'll enter it in this step. Once you navigate away from the page, the secret is no longer visible. If you lose or forget the secret, delete the existing secret and create a new one.
+
+
+Once connected, you'll see a confirmation screen with next steps and more information on using your warehouse.
+
diff --git a/src/unify/profiles-sync/profiles-sync-setup/databricks-profiles-sync.md b/src/unify/profiles-sync/profiles-sync-setup/databricks-profiles-sync.md
new file mode 100644
index 0000000000..d5313b3af3
--- /dev/null
+++ b/src/unify/profiles-sync/profiles-sync-setup/databricks-profiles-sync.md
@@ -0,0 +1,107 @@
+---
+title: Databricks for Profiles Sync
+plan: unify
+---
+
+With Databricks for Profiles Sync, you can use [Profiles Sync](/docs/unify/profiles-sync/overview/) to sync Segment profiles into your Databricks Lakehouse.
+
+
+## Getting started
+
+Before getting started with Databricks for Profiles Sync, note the following prerequisites for setup.
+
+- The target Databricks workspace must be Unity Catalog enabled. Segment doesn't support the Hive metastore. Visit the Databricks guide [enabling the Unity Catalog](https://docs.databricks.com/en/data-governance/unity-catalog/enable-workspaces.html){:target="_blank"} for more information.
+- Segment creates [managed tables](https://docs.databricks.com/en/data-governance/unity-catalog/create-tables.html#managed-tables){:target="_blank"} in the Unity Catalog.
+- Segment supports only [OAuth (M2M)](https://docs.databricks.com/en/dev-tools/auth/oauth-m2m.html){:target="_blank"} for authentication.
+
+### Warehouse size and performance
+
+A SQL warehouse is required for compute. Segment recommends a warehouse with the following characteristics:
+  - **Size**: small
+  - **Type**: Serverless, or Pro if serverless isn't available
+  - **Clusters**: Minimum of 2, maximum of 6
+
+
+> success ""
+> To improve the query performance of the Delta Lake, Segment recommends creating compaction jobs per table with `OPTIMIZE`, following the [Databricks recommendations](https://docs.databricks.com/en/delta/optimize.html#){:target="_blank"}.
+
+> info ""
+> Segment recommends manually starting your SQL warehouse before setting up your Databricks destination. If the SQL warehouse isn't running, Segment attempts to start the SQL warehouse to validate the connection and may experience a timeout when you click the **Test Connection** button during setup.
+
+
+## Set up Databricks for Profiles Sync
+
+1. From your Segment app, navigate to **Unify > Profiles Sync**.
+2. Click **Add Warehouse**.
+3. Select **Databricks** as your warehouse type.
+4. Follow the steps below to [connect your warehouse](#connect-your-databricks-warehouse).
+
+
+## Connect your Databricks warehouse
+
+Use the five steps below to connect to your Databricks warehouse.
+
+> warning ""
+> To configure your warehouse, you'll need read and write permissions.
+
+### Step 1: Name your schema
+
+Pick a name to help you identify this space in the warehouse, or use the default name provided. You can't change this name once the warehouse is connected.
+
+### Step 2: Enter the Databricks compute resources URL
+
+Segment uses the Databricks workspace URL to access your workspace API.
+
+Check your browser's address bar when inside the workspace. The workspace URL should resemble `https://<workspace-name>.cloud.databricks.com`. Remove any characters after this portion and note the URL for later use.
+
+### Step 3: Enter a Unity catalog name
+
+This catalog is the target catalog where Segment lands your schemas and tables.
+1. Follow the [Databricks guide for creating a catalog](https://docs.databricks.com/en/data-governance/unity-catalog/create-catalogs.html#create-a-catalog){:target="_blank"}. Be sure to select the storage location created earlier. You can use any valid catalog name (for example, "Segment"). Note this name for later use.
+2. Select the catalog you've just created.
+   1. Select the Permissions tab, then click **Grant**.
+   2. Select the Segment service principal from the dropdown, and check `ALL PRIVILEGES`.
+   3. Click **Grant**.
+
+### Step 4: Add the SQL warehouse details from your Databricks warehouse
+
+Next, add SQL warehouse details about your compute resource.
+- **HTTP Path**: The connection details for your SQL warehouse.
+- **Port**: The port number of your SQL warehouse.
+
+
+### Step 5: Add the service principal client ID and client secret
+
+Segment uses the service principal to access your Databricks workspace and associated APIs.
+
+**Service principal client ID**: Follow the [Databricks guide for adding a service principal to your account](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-service-principals-in-your-account){:target="_blank"}. This name can be anything, but Segment recommends something that identifies the purpose (for example, "Segment Profiles Sync"). Segment doesn't require `Account admin` or `Marketplace admin` roles.
+
+The service principal needs the following setup (see the example after this step for one way to apply the catalog-level grants):
+ - [Catalog-level privileges](https://docs.databricks.com/en/data-governance/unity-catalog/manage-privileges/privileges.html#general-unity-catalog-privilege-types){:target="_blank"}, which include:
+   - USE CATALOG
+   - USE SCHEMA
+   - MODIFY
+   - SELECT
+   - CREATE SCHEMA
+   - CREATE TABLE
+ - Databricks [SQL access entitlement](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-workspace-entitlements-for-a-service-principal){:target="_blank"} at the workspace level.
+ - [CAN USE permission](https://docs.databricks.com/en/security/auth-authz/access-control/sql-endpoint-acl.html#sql-warehouse-permissions){:target="_blank"} on the SQL warehouse used for the sync.
+
+
+**Client secret**: Follow the [Databricks instructions to generate an OAuth secret](https://docs.databricks.com/en/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal){:target="_blank"}.
+
+
+Once you've configured your warehouse, test the connection and click **Next**.
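+As a sketch of how the catalog-level grants in Step 5 might be applied (the workspace URL, API token, catalog name, and application ID are placeholders, not values Segment provides), you can call the Unity Catalog permissions API for the target catalog:
+
+```
+# Sketch only: grant the catalog-level privileges listed above to the Segment service principal.
+# Replace the placeholders with your workspace URL, a valid API token, your catalog name,
+# and the service principal's application ID.
+curl --request PATCH \
+  --header "Authorization: Bearer <api-token>" \
+  --header "Content-Type: application/json" \
+  "https://<workspace-name>.cloud.databricks.com/api/2.1/unity-catalog/permissions/catalog/<catalog-name>" \
+  --data '{
+    "changes": [
+      {
+        "principal": "<service-principal-application-id>",
+        "add": ["USE_CATALOG", "USE_SCHEMA", "MODIFY", "SELECT", "CREATE_SCHEMA", "CREATE_TABLE"]
+      }
+    ]
+  }'
+```
+
+You can make the same grants from the catalog's Permissions tab in the Databricks UI or with SQL `GRANT` statements; the privileges themselves are what matter, not the mechanism you use to assign them.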
+
+## Set up selective sync
+
+With selective sync, you can choose exactly which tables you want synced to the Databricks warehouse. By default, Segment also syncs materialized view tables.
+
+Select the tables to sync, then click **Next**. Segment creates the warehouse and connects Databricks to your Profiles Sync space.
+
+You can view the sync status and the tables you're syncing from the Profiles Sync overview page.
+
+
+Learn more about [using selective sync](/docs/unify/profiles-sync/#using-selective-sync) with Profiles Sync.
+
+
diff --git a/src/unify/profiles-sync/index.md b/src/unify/profiles-sync/profiles-sync-setup/index.md
similarity index 97%
rename from src/unify/profiles-sync/index.md
rename to src/unify/profiles-sync/profiles-sync-setup/index.md
index 3b9289ce77..f0a2a52f04 100644
--- a/src/unify/profiles-sync/index.md
+++ b/src/unify/profiles-sync/profiles-sync-setup/index.md
@@ -1,6 +1,8 @@
 ---
 title: Profiles Sync Setup
 plan: unify
+redirect_from:
+  - '/unify/profiles-sync/'
 ---
 
 On this page, you’ll learn how to set up Profiles Sync, enable historical backfill, and adjust settings for warehouses that you’ve connected to Profiles Sync.
@@ -17,7 +19,7 @@ Before you begin, prepare for setup with these tips:
 - To connect your warehouse to Segment, you must have read and write permissions with the warehouse Destination you choose.
 - During Step 2, you’ll copy credentials between Segment and your warehouse Destination. To streamline setup, open your Segment workspace in one browser tab and open another with your warehouse account.
 - Make sure to copy any IP addresses Segment asks you to allowlist in your warehouse Destination.
- 
+ 
 ### Step 1: Create a warehouse
 
 You’ll first choose the Destination warehouse to which Segment will sync profiles. Profiles Sync supports the Snowflake, Redshift, BigQuery, Azure, and Postgres warehouse Destinations. Your initial setup will depend on the warehouse you choose.
@@ -31,6 +33,7 @@ The following table shows the supported Profiles Sync warehouse Destinations and
 | [BigQuery](/docs/connections/storage/catalog/bigquery/) | 1. Create a project and enable BigQuery. <br> 2. Create a service account for Segment. |
 | [Azure](/docs/connections/storage/catalog/azuresqldw/) | 1. Sign up for an Azure subscription. <br> 2. Provision a dedicated SQL pool. |
 | [Postgres](/docs/connections/storage/catalog/postgres/) | 1. Follow the steps in the [Postgres getting started](/docs/connections/storage/catalog/postgres/) section. |
+| [Databricks](/docs/unify/profiles-sync/profiles-sync-setup/databricks-profiles-sync/) | 1. Follow the steps in the [Databricks for Profiles Sync](/docs/unify/profiles-sync/profiles-sync-setup/databricks-profiles-sync/) guide. |
 
 Once you’ve finished the required steps for your chosen warehouse, you’re ready to connect your warehouse to Segment. Because you’ll next enter credentials from the warehouse you just created, **leave the warehouse tab open to streamline setup.**