From a176d7434ca0c3492d31837a63bb1b50d7009d33 Mon Sep 17 00:00:00 2001
From: rchinn-segment
Date: Fri, 12 Jan 2024 17:12:49 -0800
Subject: [PATCH 01/34] Profiles sync first pass

---
 .../databricks-profiles-sync.md | 106 ++++++++++++++++++
 1 file changed, 106 insertions(+)
 create mode 100644 src/connections/storage/catalog/databricks-delta-lake/databricks-profiles-sync.md

diff --git a/src/connections/storage/catalog/databricks-delta-lake/databricks-profiles-sync.md b/src/connections/storage/catalog/databricks-delta-lake/databricks-profiles-sync.md
new file mode 100644
index 0000000000..10633d6e55
--- /dev/null
+++ b/src/connections/storage/catalog/databricks-delta-lake/databricks-profiles-sync.md
@@ -0,0 +1,106 @@
---
title: Databricks Profiles Sync
plan: unify
---

With Databricks Profiles Sync, you can use Profiles Sync to sync Segment profiles into your Databricks Lakehouse.


## Getting started

Before starting with the Databricks Profiles Sync destination, note the following prerequisites for setup.

- The target Databricks workspace must be Unity Catalog enabled. Segment doesn't support the Hive metastore. Visit the Databricks guide [enabling the Unity Catalog](https://docs.databricks.com/en/data-governance/unity-catalog/enable-workspaces.html){:target="_blank"} for more information.
Segment creates [managed tables](https://docs.databricks.com/en/data-governance/unity-catalog/create-tables.html#managed-tables){:target="_blank"} in the Unity catalog.

- Segment uses the service principal to access your Databricks workspace and associated APIs.
  - Use the Databricks guide for [adding a service principal to your account](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-service-principals-in-your-account){:target="_blank"}. This name can be anything, but Segment recommends something that identifies the purpose (for example, "Segment Profiles Sync"). Note the Application ID that Databricks generates for later use. Segment doesn't require `Account admin` or `Marketplace admin` roles.

- The service principal needs the following setup:
  - OAuth secret token generated. Follow the [Databricks guide for generating an OAuth secret](https://docs.databricks.com/en/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal){:target="_blank"}. Note the secret generated by Databricks for later use. Once you navigate away from the page, the secret is no longer visible. If you lose or forget the secret, you can delete teh existing secret and create a new one.
  - [Catalog level privileges](https://docs.databricks.com/en/data-governance/unity-catalog/manage-privileges/privileges.html#general-unity-catalog-privilege-types){:target="_blank"}, which include:
    - USE CATALOG
    - USE SCHEMA
    - MODIFY
    - SELECT
    - CREATE SCHEMA
    - CREATE TABLE
  - Databricks SQL access [entitlement](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-workspace-entitlements-for-a-service-principal){:target="_blank"} at the workspace level.
  - CAN USE [permissions](https://docs.databricks.com/en/security/auth-authz/access-control/sql-endpoint-acl.html#sql-warehouse-permissions){:target="_blank"} on the SQL warehouse that will be used for the sync.

- Segment supports only OAuth [(M2M)](https://docs.databricks.com/en/dev-tools/auth/oauth-m2m.html){:target="_blank"} for authentication.
- A SQL warehouse is required for compute. Segment recommends the following size:
  - **Size**: small
  - **Type**: Serverless, otherwise Pro
  - **Clusters**: Minimum of 2 - Maximum of 6

- To improve the query performance of the Delta Lake, Segment recommends to create compaction jobs per table using OPTIMIZE, following [Databricks recommendations](https://docs.databricks.com/en/delta/optimize.html#){:target="_blank"} (see the example below this list).

- If the SQL warehouse isn't running, Segment attempts to start the SQL warehouse to validate the connection when you hit the **Test Connection** button during setup. For a better experience, Segment recommends manually starting the warehouse in advance.
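For illustration, a scheduled compaction job might run a statement like the following for each synced table. This is a minimal sketch, not Segment-provided tooling: the catalog, schema, table, and column names (`segment`, `profiles`, `user_traits`, `user_id`) are placeholders to replace with your own.

```sql
-- Compact small files in one Profiles Sync table (placeholder names).
OPTIMIZE segment.profiles.user_traits;

-- Optionally co-locate data by a frequently filtered column.
-- ZORDER BY is optional; pick columns that match your query patterns.
OPTIMIZE segment.profiles.user_traits ZORDER BY (user_id);
```

Run on a schedule (for example, as a Databricks job), this keeps file sizes healthy as syncs append data.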
## Set up Databricks for Profiles Sync

1. From your Segment app, navigate to **Unify > Profiles Sync**.
2. Click **Add Warehouse**.
3. Select **Databricks** as your warehouse type.
4. Use the following steps to [connect your warehouse](#connect-your-databricks-warehouse).


## Connect your Databricks warehouse

Use the following five steps to connect your Databricks warehouse.

> warning ""
> To configure your warehouse, you'll need read and write permissions.

### Step 1: Name your destination

Add a name to help you identify this warehouse in Segment. You can change this name at any time by navigating to (???

### Step 2: Enter the Databricks compute resources URL

You'll use the Databricks workspace URL, along with Segment, to access your workspace API.

Check your browser's address bar when inside the workspace. The workspace URL will look something like: `https://<workspace-deployment-name>.cloud.databricks.com`. Remove any characters after this portion and note the URL for later use.

### Step 3: Enter a Unity catalog name.

This catalog is the target catalog where Segment lands your schemasand tablestables.
1. Follow the Databricks guide for [creating a catalog](https://docs.databricks.com/en/data-governance/unity-catalog/create-catalogs.html#create-a-catalog){:target="_blank"}. Be sure to select the storage location created earlier. You can use any valid catalog name (for example, "Segment"). Note this name for later use.
2. Select the catalog you've just created.
   1. Select the Permissions tab, then click **Grant**
   2. Select the Segment service principal from the dropdown, and check `ALL PRIVILEGES`.
   3. Click **Grant**.
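The UI steps above also have a SQL equivalent. The following is a hedged sketch rather than part of Segment's setup flow: `segment` is a placeholder catalog name, and `<application-id>` stands for your service principal's application ID.

```sql
-- Create the target catalog (use your own name and storage location).
CREATE CATALOG IF NOT EXISTS segment;

-- Equivalent of checking ALL PRIVILEGES in the Grant dialog.
-- Service principals are addressed by application ID in backticks.
GRANT ALL PRIVILEGES ON CATALOG segment TO `<application-id>`;
```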
### Step 4: Add the SQL warehouse details from your Databricks warehouse.

Next, add SQL warehouse details about your compute resource.
- **HTTP Path**: Get connection detials for a SQL warehouse
- **Port**: The port number of your SQL warehouse.


### Step 5: Add the service principal client ID and client secret.

Segment uses the service principal to access your Databricks workspace and associated APIs.
1. Follow the Databricks guide for [adding a service principal to your account](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-service-principals-in-your-account){:target="_blank"}. This name can be anything, but Segment recommends something that identifies the purpose (for example, "Segment Profiles Sync"). Note the Application ID that Databricks generates for later use. Segment doesn't require Account admin or Marketplace admin roles.
2. (*OAuth only*) Follow the Databricks instructions to [generate an OAuth secret](https://docs.databricks.com/en/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal){:target="_blank"}. Note the secret generated by Databricks for later use. Once you navigate away from this page, the secret is no longer visible. If you lose or forget the secret, delete the existing secret and create a new one.


Once you've configured your warehouse, test the connection and click **Next**.

## Set up selective sync

With selective sync, you can choose exactly which tables you want synced to the Databricks warehouse. Segment syncs materialized view tables as well by default.

Select tables to sync, then click **Next**. Segment creates the warehouse and connects Databricks to your Profiles Sync space.

You can view sync status and the tables you're syncing from the Profiles Sync overview page.


Learn more about [using Selective Sync](/docs/unify/profiles-sync/using-selective-sync) with Profiles Sync.


From 6edf32970f62afa84830b7eb1a4242714c3178d8 Mon Sep 17 00:00:00 2001
From: rchinn-segment
Date: Tue, 16 Jan 2024 17:34:00 -0800
Subject: [PATCH 02/34] Profiles Sync edits

---
 .../databricks-profiles-sync.md | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/src/connections/storage/catalog/databricks-delta-lake/databricks-profiles-sync.md b/src/connections/storage/catalog/databricks-delta-lake/databricks-profiles-sync.md
index 10633d6e55..2b2b1e68fc 100644
--- a/src/connections/storage/catalog/databricks-delta-lake/databricks-profiles-sync.md
+++ b/src/connections/storage/catalog/databricks-delta-lake/databricks-profiles-sync.md
@@ -19,7 +19,7 @@ Segment creates [managed tables](https://docs.databricks.com/en/data-governance/
 - Use the Databricks guide for [adding a service principal to your account](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-service-principals-in-your-account){:target="_blank"}. This name can be anything, but Segment recommends something that identifies the purpose (for example, "Segment Profiles Sync"). Note the Application ID that Databricks generates for later use. Segment doesn't require `Account admin` or `Marketplace admin` roles.

 - The service principal needs the following setup:
  - OAuth secret token generated. Follow the [Databricks guide for generating an OAuth secret](https://docs.databricks.com/en/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal){:target="_blank"}. Note the secret generated by Databricks for later use. Once you navigate away from the page, the secret is no longer visible. If you lose or forget the secret, you can delete the existing secret and create a new one.
  - [Catalog level privileges](https://docs.databricks.com/en/data-governance/unity-catalog/manage-privileges/privileges.html#general-unity-catalog-privilege-types){:target="_blank"}, which include:
    - USE CATALOG
    - USE SCHEMA
    - MODIFY
    - SELECT
    - CREATE SCHEMA
@@ -58,16 +58,15 @@ Use the following five steps to connect your Databricks warehouse.

 ### Step 1: Name your destination

-Add a name to help you identify this warehouse in Segment. You can change this name at any time by navigating to (???
+Add a name to help you identify this warehouse in Segment. You can change this name at any time by navigating to the destination settings (**Connections > Destinations > Settings**) page.

### Step 2: Enter the Databricks compute resources URL

You'll use the Databricks workspace URL, along with Segment, to access your workspace API.

Check your browser's address bar when inside the workspace. The workspace URL will look something like: `https://<workspace-deployment-name>.cloud.databricks.com`. Remove any characters after this portion and note the URL for later use.

### Step 3: Enter a Unity catalog name

This catalog is the target catalog where Segment lands your schemasand tablestables.
1. Follow the Databricks guide for [creating a catalog](https://docs.databricks.com/en/data-governance/unity-catalog/create-catalogs.html#create-a-catalog){:target="_blank"}. Be sure to select the storage location created earlier. You can use any valid catalog name (for example, "Segment"). Note this name for later use.
2. Select the catalog you've just created.
   1. Select the Permissions tab, then click **Grant**
   2. Select the Segment service principal from the dropdown, and check `ALL PRIVILEGES`.
   3. Click **Grant**.

### Step 4: Add the SQL warehouse details from your Databricks warehouse

Next, add SQL warehouse details about your compute resource.
- **HTTP Path**: The connection details for your SQL warehouse
- **Port**: The port number of your SQL warehouse.


### Step 5: Add the service principal client ID and client secret

Segment uses the service principal to access your Databricks workspace and associated APIs.
1. Follow the Databricks guide for [adding a service principal to your account](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-service-principals-in-your-account){:target="_blank"}. This name can be anything, but Segment recommends something that identifies the purpose (for example, "Segment Profiles Sync"). Note the Application ID that Databricks generates for later use. Segment doesn't require Account admin or Marketplace admin roles.

From 8046d15c532ebe87f821f7e0dd42bb134169a49b Mon Sep 17 00:00:00 2001
From: rchinn-segment
Date: Thu, 18 Jan 2024 16:46:38 -0800
Subject: [PATCH 03/34] Remove outdated aws doc

---
 .../databricks-delta-lake-aws.md | 172 ------------------
 1 file changed, 172 deletions(-)
 delete mode 100644 src/connections/storage/catalog/databricks-delta-lake/databricks-delta-lake-aws.md

diff --git a/src/connections/storage/catalog/databricks-delta-lake/databricks-delta-lake-aws.md b/src/connections/storage/catalog/databricks-delta-lake/databricks-delta-lake-aws.md
deleted file mode 100644
index c2a170ffcc..0000000000
--- a/src/connections/storage/catalog/databricks-delta-lake/databricks-delta-lake-aws.md
+++ /dev/null
@@ -1,172 +0,0 @@
---
title: Databricks Delta Lake Destination (AWS Setup)
hidden: true
---

{% comment %}

With the Databricks Delta Lake Destination, you can ingest event data from Segment into the bronze layer of your Databricks Delta Lake.

This page will help you use the Databricks Delta Lake Destination to sync Segment events into your Databricks Delta Lake built on S3.
> info "Databricks Delta Lake Destination in Public Beta"
> The Databricks Delta Lake Destination is in public beta, and Segment is actively working on this integration. [Contact Segment](https://segment.com/help/contact/){:target="_blank"} with any feedback or questions.

## Overview

Before getting started, use the overview below to familiarize yourself with Segment's Databricks Delta Lake Destination.

1. Segment writes directly to your Delta Lake in the cloud storage (S3).
- Segment manages the creation and evolution of Delta tables.
- Segment uses IAM role assumption to write Delta tables to AWS S3.
2. Segment supports both OAuth and personal access tokens (PAT) for API authentication.
3. Segment creates and updates the table's metadata in Unity Catalog by running queries on a small, single node Databricks SQL warehouse in your environment.
4. If a table already exists and no new columns are introduced, Segment appends data to the table (no SQL required).
5. For new data types/columns, Segment reads the current schema for the table from the Unity Catalog and uses the SQL warehouse to update the schema accordingly.

## Prerequisites

Please note the following prerequisites for setup.

1. The target Databricks workspace must be Unity Catalog enabled. Segment doesn't support the Hive metastore. Visit the Databricks guide [enabling the Unity Catalog](https://docs.databricks.com/en/data-governance/unity-catalog/enable-workspaces.html){:target="_blank"} for more information.
2. You'll need the following permissions for setup:
- **AWS**: The ability to create an S3 bucket and IAM role.
- **Databricks**: Admin access at the account and workspace level.

## Authentication

Segment supports both OAuth and personal access token (PAT) for authentication. Segment recommends using OAuth as it's easier to set up and manage. Throughout this guide, some instructions are marked as *OAuth only* or *PAT only*. You can skip any instructions that don't correspond with your authentication method.

## Key terms

As you set up Databricks, keep the following key terms in mind.
- **Databricks Workspace URL**: The base URL for your Databricks workspace.
- **Service Principal Application ID**: The ID tied to the service principal you'll create for Segment.
- **Service Principal Secret/Token**: The client secret or PAT you'll create for the service principal.
- **Target Unity Catalog**: The catalog where Segment lands your data.
- **Workspace Admin Token** (*PAT only*): The access token you'll generate for your Databricks workspace admin.

## Setup for Databricks Delta Lake (S3)

### Step 1: Find your Databricks Workspace URL

You'll use the Databricks workspace URL, along with Segment, to access your workspace API.

Check your browser's address bar when inside the workspace. The workspace URL will look something like: `https://<workspace-deployment-name>.cloud.databricks.com`. Remove any characters after this portion and note the URL for later use.

### Step 2: Create a service principal

Segment uses the service principal to access your Databricks workspace and associated APIs.
1. Follow the Databricks guide for [adding a service principal to your account](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-service-principals-in-your-account){:target="_blank"}. This name can be anything, but Segment recommends something that identifies the purpose (for example, "Segment Storage Destinations").
Note the Application ID that Databricks generates for later use. Segment doesn't require Account admin or Marketplace admin roles. -2. (*OAuth only*) Follow the Databricks instructions to [generate an OAuth secret](https://docs.databricks.com/en/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal){:target="_blank"}. Note the secret generated by Databricks for later use. Once you navigate away from this page, the secret is no longer visible. If you lose or forget the secret, delete the existing secret and create a new one. - -### Step 3: Enable entitlements for the service principal on the workspace - -This step allows the Segment service principal to create and use a small SQL warehouse, which is used for creating and updating table schemas in the Unity Catalog. - -To enable entitlements for the service principal you just created, follow the Databricks [guide for managing workspace entitlements for a service principal](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-workspace-entitlements-for-a-service-principal){:target="_blank"}. Segment requires the `Allow cluster creation` and `Databricks SQL access` entitlements. - -### Step 4: Create an external location and storage credentials - -This step creates the storage location where Segment lands your Delta Lake and the associated credentials Segment uses to access the storage. -1. Follow the Databricks guide for [managing external locations and storage credentials](https://docs.databricks.com/en/data-governance/unity-catalog/manage-external-locations-and-credentials.html){:target="_blank"}. This guide assumes the target S3 bucket already exists. If not, follow the AWS guide for [creating a bucket](https://docs.aws.amazon.com/AmazonS3/latest/userguide/create-bucket-overview.html){:target="_blank"}. -2. Once the external location and storage credentials are created in your Databricks workspace, update the permissions to allow access to the Segment service principal. - 1. In your workspace, navigate to **Data > External Data > Storage Credentials**. - 2. Click the name of the credentials created above to go to the Permissions tab. - 3. Click **Grant**, then select the Segment service principal from the drop-down. - 4. Select the **CREATE EXTERNAL TABLE**, **READ FILES**, and **WRITE FILES** checkboxes. - 5. Click **Grant**. - 6. Click **External Locations**. - 7. Click the name of the location created above and go to the Permissions tab. - 8. Click **Grant**, then select the Segment service principal from the drop-down. - 9. Select the **CREATE EXTERNAL TABLE**, **READ FILES**, and **WRITE FILES** checkboxes. - 10. Click **Grant**. -3. In AWS, supplement the Trust policy for the role created when setting up the storage credentials. - 1. Add: `arn:aws:iam::595280932656:role/segment-storage-destinations-production-access` to the Principal list. - 2. Convert the `sts:ExternalID` field to a list and add the Segment Workspace ID. You'll find the Segment workspace ID in the Segment app (**Settings > Workspace settings > ID**). 
The Trust policy should look like:

```
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": [
          "arn:aws:iam::414351767826:role/unity-catalog-prod-UCMasterRole-14S5ZJVKOTYTL",
          "arn:aws:iam::<YOUR_AWS_ACCOUNT_ID>:role/<YOUR_STORAGE_CREDENTIALS_ROLE>",
          "arn:aws:iam::595280932656:role/segment-storage-destinations-production-access"
        ]
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": [
            "<DATABRICKS_ACCOUNT_ID>",
            "<SEGMENT_WORKSPACE_ID>"
          ]
        }
      }
    }
  ]
}
```

### Step 5: Create a workspace admin access token (PAT only)

Your Databricks workspace admin uses the workspace admin access token to generate a personal access token for the service principal.

To create your token, follow the Databricks guide for [generating personal access tokens](https://docs.databricks.com/en/dev-tools/auth.html#databricks-personal-access-tokens-for-workspace-users){:target="_blank"} for workspace users. Note the generated token for later use.

### Step 6: Enable personal access tokens for the workspace (PAT only)

This step allows the creation and use of personal access tokens for the workspace admin and the service principal.
1. Follow the Databricks guide for [enabling personal access token authentication](https://docs.databricks.com/en/administration-guide/access-control/tokens.html#enable-or-disable-personal-access-token-authentication-for-the-workspace){:target="_blank"} for the workspace.
2. Follow the Databricks docs to [grant Can Use permission](https://docs.databricks.com/en/security/auth-authz/api-access-permissions.html#manage-token-permissions-using-the-admin-settings-page){:target="_blank"} to the Segment service principal created earlier.

### Step 7: Generate a personal access token for the service principal (PAT only)

Segment uses the personal access token to access the Databricks workspace API. The Databricks UI doesn't allow for the creation of service principal tokens. Tokens must be generated using either the Databricks workspace API (*recommended*) or the Databricks CLI.

Generating a token requires the following values:
- **Databricks Workspace URL**: The base URL to your Databricks workspace.
- **Workspace Admin Token**: The token generated for your Databricks admin user.
- **Service Principal Application ID**: The ID generated for the Segment service principal.
- **Lifetime Seconds**: The number of seconds before the token expires. Segment doesn't prescribe a specific token lifetime. Using the instructions below, you'll need to generate and update a new token in the Segment app before the existing token expires. Segment's general guidance is 90 days (7776000 seconds).
- **Comment**: A comment which describes the purpose of the token (for example, "Grants Segment access to this workspace until 12/21/2023").
1. (*Recommended option*) To create the token with the API, execute the following command in a terminal or command line tool. Be sure to update the placeholders with the relevant details from above. For more information about the API check out the [Databricks API docs](https://docs.databricks.com/api/workspace/tokenmanagement/createobotoken){:target="_blank"}.
```
curl --location --request POST '<DATABRICKS_WORKSPACE_URL>/api/2.0/token-management/on-behalf-of/tokens' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <WORKSPACE_ADMIN_TOKEN>' \
--data '{"application_id": "<SERVICE_PRINCIPAL_APPLICATION_ID>", "lifetime_seconds": <LIFETIME_SECONDS>, "comment": "<COMMENT>"}'
```
The response from the API contains a `token_value` field. Note this value for later use.
2. (*Alternative option*) If you prefer to use the Databricks CLI, execute the following command in a terminal or command line tool. Be sure to update the placeholders with the relevant details from above. You will also need to [set up a profile](https://docs.databricks.com/en/dev-tools/cli/databricks-cli-ref.html#databricks-personal-access-token-authentication){:target="_blank"} for the CLI. For more info, check out the [Databricks CLI docs](https://docs.databricks.com/en/dev-tools/cli/databricks-cli-ref.html){:target="_blank"}.

   ```
   databricks token-management create-obo-token <SERVICE_PRINCIPAL_APPLICATION_ID> <LIFETIME_SECONDS> --comment "<COMMENT>" -p <CLI_PROFILE_NAME>
   ```
The response from the CLI will contain a `token_value` field. Note this value for later use.

### Step 8: Create a new catalog in Unity Catalog and grant Segment permissions

This catalog is the target catalog where Segment lands your schemas/tables.
1. Follow the Databricks guide for [creating a catalog](https://docs.databricks.com/en/data-governance/unity-catalog/create-catalogs.html#create-a-catalog){:target="_blank"}. Be sure to select the storage location created earlier. You can use any valid catalog name (for example, "Segment"). Note this name for later use.
2. Select the catalog you've just created.
   1. Select the Permissions tab, then click **Grant**
   2. Select the Segment service principal from the dropdown, and check `ALL PRIVILEGES`.
   3. Click **Grant**.

### Step 9: Set up the Databricks Delta Lake destination in Segment

This step links a Segment events source to your Databricks workspace/catalog.
1. From the Segment app, navigate to **Connections > Catalog**, then click **Destinations**.
2. Search for and select the "Databricks Delta Lake" destination.
3. Click **Add Destination**, select a source, then click **Next**.
4. Enter the name for your destination, then click **Create destination**.
5. Enter connection settings for the destination.



{% endcomment %}

From 8ba3b77f6e0371f902a5fa77583e287dcb717ce7 Mon Sep 17 00:00:00 2001
From: rchinn-segment
Date: Thu, 18 Jan 2024 16:46:47 -0800
Subject: [PATCH 04/34] Remove outdated azure doc

---
 .../databricks-delta-lake-azure.md | 137 ------------------
 1 file changed, 137 deletions(-)
 delete mode 100644 src/connections/storage/catalog/databricks-delta-lake/databricks-delta-lake-azure.md

diff --git a/src/connections/storage/catalog/databricks-delta-lake/databricks-delta-lake-azure.md b/src/connections/storage/catalog/databricks-delta-lake/databricks-delta-lake-azure.md
deleted file mode 100644
index 7f8dcc4442..0000000000
--- a/src/connections/storage/catalog/databricks-delta-lake/databricks-delta-lake-azure.md
+++ /dev/null
@@ -1,137 +0,0 @@
---
title: Databricks Delta Lake Destination (Azure Setup)
hidden: true
---

{% comment %}

With the Databricks Delta Lake Destination, you can ingest event data from Segment into the bronze layer of your Databricks Delta Lake.

This page will help you use the Databricks Delta Lake Destination to sync Segment events into your Databricks Delta Lake built on Azure (ADLS Gen 2).


> info "Databricks Delta Lake Destination in Public Beta"
> The Databricks Delta Lake Destination is in public beta, and Segment is actively working on this integration. [Contact Segment](https://segment.com/help/contact/){:target="_blank"} with any feedback or questions.

## Overview

Before getting started, use the overview below to familiarize yourself with Segment's Databricks Delta Lake Destination.
1. Segment writes directly to your Delta Lake in the cloud storage (Azure).
- Segment manages the creation and evolution of Delta tables.
- Segment uses a cross-tenant service principal to write Delta tables to ADLS Gen2.
2. Segment supports both OAuth and personal access tokens (PAT) for API authentication.
3. Segment creates and updates the table's metadata in Unity Catalog by running queries on a small, single node Databricks SQL warehouse in your environment.
4. If a table already exists and no new columns are introduced, Segment appends data to the table (no SQL required).
5. For new data types/columns, Segment reads the current schema for the table from the Unity Catalog and uses the SQL warehouse to update the schema accordingly.

## Prerequisites

Please note the following prerequisites for setup.

1. Your Databricks workspace must be Unity Catalog enabled. Segment doesn't support the Hive metastore. Visit the Databricks guide for [enabling Unity Catalog](https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/enable-workspaces){:target="_blank"} for more info.
2. You'll need the following permissions for setup:
- **Azure**: Ability to create service principals, as well as create and manage the destination storage container and its associated role assignments.
- **Databricks**: Admin access at the account and workspace level.

## Key terms

As you set up Databricks, keep the following key terms in mind.

- **Databricks Workspace URL**: The base URL for your Databricks workspace.
- **Target Unity Catalog**: The catalog where Segment lands your data.

## Set up Databricks Delta Lake (Azure)

### Step 1: Find your Databricks Workspace URL

You'll use the Databricks workspace URL, along with Segment, to access your workspace API.

Check your browser's address bar when in your workspace. The workspace URL will look something like: `https://<workspace-deployment-name>.azuredatabricks.net`. Remove any characters after this portion and note this value for later use.

### Step 2: Add the Segment Storage Destinations service principal to your Entra ID (Active Directory)

Segment uses the service principal to access your Databricks workspace APIs as well as your ADLS Gen2 storage container. You can use either Azure PowerShell or the Azure CLI.

1. **Recommended**: Azure PowerShell
   1. Log in to the Azure console with a user allowed to add new service principals.
   2. Open a Cloud Shell (first button to the right of the top search bar).
   3. Once loaded, enter the following command in the shell:

      ```
      New-AzADServicePrincipal -applicationId fffa5b05-1da5-4599-8360-cc2684bcdefb
      ```

2. **(Alternative option)** Azure CLI
   1. Log into the Azure CLI using the [az login command](https://learn.microsoft.com/en-us/cli/azure/authenticate-azure-cli){:target="_blank"}.
   2. Once authenticated, run the following command:

      ```
      az ad sp create --id fffa5b05-1da5-4599-8360-cc2684bcdefb
      ```

### Step 3: Update or create an ADLS Gen2 storage container

The ADLS Gen2 storage container is where Segment lands your Delta Lake files.

1. In the Azure console, navigate to **Storage accounts** and locate or create a new storage account to use for your Segment data.
2. Select the account, then select **Containers**.
3. Select or create a target container.
4. On the container view, select **Access Control (IAM)**, then navigate to the Role assignments tab.
5. Click **+ Add**, then select **Add role assignment**.
6. Search for and select "Storage Blob Data Contributor", then click **Next**.
7. For "Assign access to" select **User, group, or service principal**.
8. Click **+ Select members**, then search for and select "Segment Storage Destinations".
9. Click **Review + assign**.

### Step 4: Add the Segment Storage Destinations service principal to the account/workspace

This step allows Segment to access your workspace.
1. Follow the Databricks guide for [adding a service principal](https://learn.microsoft.com/en-us/azure/databricks/administration-guide/users-groups/service-principals#add-service-principals-to-your-account-using-the-account-console){:target="_blank"} using the account console.
- Segment recommends using "Segment Storage Destinations" for the name, though any identifier is allowed.
- For the **UUID** enter `fffa5b05-1da5-4599-8360-cc2684bcdefb`.
- Segment doesn't require Account admin access.
2. Follow the Databricks guide for [adding a service principal to the workspace](https://learn.microsoft.com/en-us/azure/databricks/administration-guide/users-groups/service-principals#assign-a-service-principal-to-a-workspace-using-the-account-console){:target="_blank"}
- Use the service principal created at the account level above.
- Segment doesn't require Workspace admin access.

### Step 5: Enable entitlements for the service principal on the workspace

This step allows the Segment service principal to create a small SQL warehouse for creating and updating table schemas in the Unity Catalog.

To enable entitlements, follow the [managing workspace entitlements](https://learn.microsoft.com/en-us/azure/databricks/administration-guide/users-groups/service-principals#--manage-workspace-entitlements-for-a-service-principal){:target="_blank"} instructions for a service principal. Segment requires `Allow cluster creation` and `Databricks SQL access` entitlements.

### Step 6: Create an external location and storage credentials

This step creates the storage location where Segment lands your Delta Lake and the associated credentials Segment uses to access the storage.
1. Follow the Databricks guide for [managing external locations and storage credentials](https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/manage-external-locations-and-credentials){:target="_blank"}.
- Use the storage container you updated in step 3.
- For storage credentials, you can use a service principal or managed identity.
2. Once you create the external location and storage credentials in your Databricks workspace, update the permissions to allow access to the Segment service principal.

In your workspace, navigate to **Data > External Data > Storage Credentials**. Click the name of the credentials created above and go to the Permissions tab. Click **Grant**, then select the Segment service principal from the drop down. Select the following checkboxes:
- `CREATE EXTERNAL TABLE`
- `READ FILES`
- `WRITE FILES`
3. Click **Grant**.

### Step 7: Create a new catalog in Unity Catalog and grant Segment permissions

This catalog is the target catalog where Segment lands your schemas/tables.

1. Follow the Databricks guide for [creating a catalog](https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/create-catalogs){:target="_blank"}.
- Select the storage location you created earlier. The catalog name can be any valid catalog name (for example, "Segment"). Note this name for later use.
2. Select the newly-created catalog.
   1. Click the Permissions tab, then click **Grant**.
   2. Select the Segment service principal from the dropdown.
   3. Check `ALL PRIVILEGES`, then click **Grant**.

### Step 8: Set up the Databricks Delta Lake destination in Segment

This step links a Segment source to your Databricks workspace/catalog.
1. From the Segment app, navigate to **Connections > Catalog**, then click **Destinations**.
2. Search for and select the "Databricks Delta Lake" destination.
3. Click **Add Destination**, select a source, then click **Next**.
4. Enter the name for your destination, then click **Create destination**.
5. Enter the connection settings using the values noted above (leave the Service Principal fields blank).

{% endcomment %}

From b673bcaa123d47772da06f1d64a41151dc253f90 Mon Sep 17 00:00:00 2001
From: rchinn-segment
Date: Thu, 18 Jan 2024 16:47:26 -0800
Subject: [PATCH 05/34] Add Databricks Destination content

---
 .../catalog/databricks-delta-lake/index.md | 81 ++++++++++++++++++-
 1 file changed, 77 insertions(+), 4 deletions(-)

diff --git a/src/connections/storage/catalog/databricks-delta-lake/index.md b/src/connections/storage/catalog/databricks-delta-lake/index.md
index a564dd12fd..21e95e1ff6 100644
--- a/src/connections/storage/catalog/databricks-delta-lake/index.md
+++ b/src/connections/storage/catalog/databricks-delta-lake/index.md
@@ -1,7 +1,80 @@
---
title: Databricks Delta Lake Destination
public: true

---


With the Databricks Delta Lake Destination, you can ingest event data from Segment into the bronze layer of your Databricks Delta Lake.

This page will help you get started with syncing Segment events into your Databricks Delta Lake Destination.


## Getting started

Before getting started with the Databricks Destination, note the following prerequisites.

- The target Databricks workspace must be Unity Catalog enabled. Segment doesn't support the Hive metastore. Visit the Databricks guide [enabling the Unity Catalog](https://docs.databricks.com/en/data-governance/unity-catalog/enable-workspaces.html){:target="_blank"} for more information.
- Segment creates [managed tables](https://docs.databricks.com/en/data-governance/unity-catalog/create-tables.html#managed-tables){:target="_blank"} in the Unity catalog. The service account needs access to create schemas on the catalog and can delete, drop, or vacuum tables (see the sketch after this list).
- Segment supports only OAuth [(M2M)](https://docs.databricks.com/en/dev-tools/auth/oauth-m2m.html){:target="_blank"} for authentication.
- A SQL warehouse is required for compute. Segment recommends the following size:
  - **Size**: small
  - **Type**: Serverless, otherwise Pro
  - **Clusters**: Minimum of 2 - Maximum of 6

> success ""
> Segment recommends manually starting your SQL warehouse in advance. If the SQL warehouse isn't running, Segment attempts to start the SQL warehouse to validate the connection when you hit the **Test Connection** button during setup.
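As a hypothetical illustration of the access described in the managed-tables bullet above, the statements below show what schema creation and table maintenance look like in Databricks SQL. The `segment` catalog, `my_source` schema, and table names are placeholders, not values Segment requires:

```sql
-- Create a per-source schema inside the target catalog.
CREATE SCHEMA IF NOT EXISTS segment.my_source;

-- Table maintenance the service account must be allowed to run:
VACUUM segment.my_source.pages;                     -- remove stale data files
DROP TABLE IF EXISTS segment.my_source.pages_tmp;   -- drop scratch tables
```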
## Set up Databricks in Segment

Use the following steps to set up Databricks in Segment:

1. Navigate to **Connections > Catalog**.
2. Select the **Destinations** tab.
3. Under Connection Type, select **Storage**, and click the **Databricks storage** tile.
4. (Optional) Select one or more sources to connect to the destination.
5. Follow the steps below to [connect your Databricks warehouse](#connect-your-databricks-warehouse).

## Connect your Databricks warehouse

Use the five steps below to connect your Databricks warehouse.

> warning ""
> You'll need read and write warehouse permissions for Segment to write to your database.

### Step 1: Name your destination

Add a name to help you identify this warehouse in Segment. You can change this name at any time by navigating to the destination settings (**Connections > Destinations > Settings**) page.

### Step 2: Enter the Databricks compute resources URL

You'll use the Databricks workspace URL, along with Segment, to access your workspace API.

Check your browser's address bar when inside the workspace. The workspace URL should resemble: `https://<workspace-deployment-name>.cloud.databricks.com`. Remove any characters after this portion and note the URL for later use.

### Step 3: Enter a Unity catalog name

This catalog is the target catalog where Segment lands your schemas and tables.
1. Follow the Databricks guide for [creating a catalog](https://docs.databricks.com/en/data-governance/unity-catalog/create-catalogs.html#create-a-catalog){:target="_blank"}. Be sure to select the storage location created earlier. You can use any valid catalog name (for example, "Segment"). Note this name for later use.
2. Select the catalog you've just created.
   1. Select the Permissions tab, then click **Grant**
   2. Select the Segment service principal from the dropdown, and check `ALL PRIVILEGES`.
   3. Click **Grant**.

### Step 4: Add the SQL warehouse details from your Databricks warehouse

Next, add SQL warehouse details about your compute resource.
- **HTTP Path**: The connection details for your SQL warehouse.
- **Port**: The port number of your SQL warehouse.


### Step 5: Add the service principal client ID and client secret

Segment uses the service principal to access your Databricks workspace and associated APIs.
1. Follow the Databricks guide for [adding a service principal to your account](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-service-principals-in-your-account){:target="_blank"}. This name can be anything, but Segment recommends something that identifies the purpose (for example, "Segment Storage Destinations"). Note the Application ID that Databricks generates for later use. Segment doesn't require Account admin or Marketplace admin roles.
2.
(*OAuth only*) Follow the Databricks instructions to [generate an OAuth secret](https://docs.databricks.com/en/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal){:target="_blank"}. Note the secret generated by Databricks for later use. Once you navigate away from this page, the secret is no longer visible. If you lose or forget the secret, delete the existing secret and create a new one. + + +Once connected, you'll see a confirmation screen with next steps and more info on using your warehouse. + From 3e0b89ae3fe5c121b18d6547ad4d967b0ed8c35b Mon Sep 17 00:00:00 2001 From: rchinn-segment Date: Thu, 18 Jan 2024 16:53:21 -0800 Subject: [PATCH 06/34] Update warehouse yml file --- src/_data/catalog/warehouse.yml | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/src/_data/catalog/warehouse.yml b/src/_data/catalog/warehouse.yml index 321b83eed1..b506a09434 100644 --- a/src/_data/catalog/warehouse.yml +++ b/src/_data/catalog/warehouse.yml @@ -72,7 +72,24 @@ items: name: catalog/warehouses/databricks-delta-lake description: '' url: connections/storage/catalog/databricks-delta-lake - status: PRIVATE_BETA + status: PUBLIC + endpoints: + - us + regions: + - us + - eu + logo: + url: 'https://images.ctfassets.net/h6ufgtwb6nv1/4vYEAgYz6nGLC9F64Jxeb6/fcf3ecdae5386f806ae72eb39ce07094/db.svg?w=256&q=75' + mark: + url: '' + categories: + - Warehouses +- display_name: Databricks Profiles Sync + slug: databricks-profiles-sync + name: catalog/warehouses/databricks-delta-lake/databricks-profiles-sync + description: '' + url: connections/storage/catalog/databricks-delta-lake/databricks-profiles-sync + status: PUBLIC endpoints: - us regions: From e5535c386dc332918dc82f946ea4a7e106e42d87 Mon Sep 17 00:00:00 2001 From: rchinn-segment Date: Fri, 19 Jan 2024 09:28:22 -0800 Subject: [PATCH 07/34] Add link to Profiles Sync doc --- src/unify/profiles-sync/index.md | 1 + 1 file changed, 1 insertion(+) diff --git a/src/unify/profiles-sync/index.md b/src/unify/profiles-sync/index.md index 3b9289ce77..184bc4539e 100644 --- a/src/unify/profiles-sync/index.md +++ b/src/unify/profiles-sync/index.md @@ -31,6 +31,7 @@ The following table shows the supported Profiles Sync warehouse Destinations and | [BigQuery](/docs/connections/storage/catalog/bigquery/) | 1. Create a project and enable BigQuery.
2. Create a service account for Segment. | | [Azure](/docs/connections/storage/catalog/azuresqldw/) | 1. Sign up for an Azure subscription.
2. Provision a dedicated SQL pool. | | [Postgres](/docs/connections/storage/catalog/postgres/) | 1. Follow the steps in the [Postgres getting started](/docs/connections/storage/catalog/postgres/) section. | +| [Databricks Profiles Sync](/docs/connections/storage/catalog/databricks-delta-lake/databricks-profiles-sync/) | 1. Follow the steps in the [Databricks Profiles Sync](/docs/connections/storage/catalog/databricks-delta-lake/databricks-profiles-sync/) section. | Once you’ve finished the required steps for your chosen warehouse, you’re ready to connect your warehouse to Segment. Because you’ll next enter credentials from the warehouse you just created, **leave the warehouse tab open to streamline setup.** From a5387e05683ea14a75cd19ce2c086a2c1d5025c1 Mon Sep 17 00:00:00 2001 From: rchinn-segment Date: Fri, 19 Jan 2024 09:31:55 -0800 Subject: [PATCH 08/34] Profiles Sync format fixes --- .../databricks-profiles-sync.md | 40 ++++++++++--------- 1 file changed, 22 insertions(+), 18 deletions(-) diff --git a/src/connections/storage/catalog/databricks-delta-lake/databricks-profiles-sync.md b/src/connections/storage/catalog/databricks-delta-lake/databricks-profiles-sync.md index 2b2b1e68fc..9f38e0536c 100644 --- a/src/connections/storage/catalog/databricks-delta-lake/databricks-profiles-sync.md +++ b/src/connections/storage/catalog/databricks-delta-lake/databricks-profiles-sync.md @@ -3,22 +3,23 @@ title: Databricks Profiles Sync plan: unify --- -With Databricks Profiles Sync, you can use Profiles Sync to sync Segment profiles into your Databricks Lakehouse. +With Databricks Profiles Sync, you can use [Profiles Sync](/docs/unify/profiles-sync/overview/) to sync Segment profiles into your Databricks Lakehouse. + - ## Getting started -Before starting with the Databricks Profiles Sync destination, note the following prerequisites for setup. +Before getting started with Databricks Profiles Sync, note the following prerequisites for setup. - The target Databricks workspace must be Unity Catalog enabled. Segment doesn't support the Hive metastore. Visit the Databricks guide [enabling the Unity Catalog](https://docs.databricks.com/en/data-governance/unity-catalog/enable-workspaces.html){:target="_blank"} for more information. -Segment creates [managed tables](https://docs.databricks.com/en/data-governance/unity-catalog/create-tables.html#managed-tables){:target="_blank"} in the Unity catalog. +- Segment creates [managed tables](https://docs.databricks.com/en/data-governance/unity-catalog/create-tables.html#managed-tables){:target="_blank"} in the Unity catalog. +- Segment supports only OAuth [(M2M)](https://docs.databricks.com/en/dev-tools/auth/oauth-m2m.html){:target="_blank"} for authentication. -- Segment uses the service principal to access your Databricks workspace and associated APIs. +#### Service principal requirements and setup + +Segment uses the service principal to access your Databricks workspace and associated APIs. - Use the Databricks guide for [adding a service principal to your account](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-service-principals-in-your-account){:target="_blank"}. This name can be anything, but Segment recommends something that identifies the purpose (for example, "Segment Profiles Sync"). Note the Application ID that Databricks generates for later use. Segment doesn't require `Account admin` or `Marketplace admin` roles. 
The service principal needs the following setup:
 - OAuth secret token generated. Follow the [Databricks guide for generating an OAuth secret](https://docs.databricks.com/en/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal){:target="_blank"}. Note the secret generated by Databricks for later use. Once you navigate away from the page, the secret is no longer visible. If you lose or forget the secret, you can delete the existing secret and create a new one.
 - [Catalog level privileges](https://docs.databricks.com/en/data-governance/unity-catalog/manage-privileges/privileges.html#general-unity-catalog-privilege-types){:target="_blank"}, which include:
   - USE CATALOG
   - USE SCHEMA
   - MODIFY
   - SELECT
   - CREATE SCHEMA
   - CREATE TABLE
 - Databricks [SQL access entitlement](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-workspace-entitlements-for-a-service-principal){:target="_blank"} at the workspace level.
 - [CAN USE permissions](https://docs.databricks.com/en/security/auth-authz/access-control/sql-endpoint-acl.html#sql-warehouse-permissions){:target="_blank"} on the SQL warehouse that will be used for the sync.


#### Warehouse size and performance

A SQL warehouse is required for compute. Segment recommends the following size:
  - **Size**: small
  - **Type**: Serverless, otherwise Pro
  - **Clusters**: Minimum of 2 - Maximum of 6

- To improve the query performance of the Delta Lake, Segment recommends creating compaction jobs per table using OPTIMIZE, following [Databricks recommendations](https://docs.databricks.com/en/delta/optimize.html#){:target="_blank"}.

- If the SQL warehouse isn't running, Segment attempts to start the SQL warehouse to validate the connection when you hit the **Test Connection** button during setup. For a better experience, Segment recommends manually starting the warehouse in advance.

## Set up Databricks for Profiles Sync

1. From your Segment app, navigate to **Unify > Profiles Sync**.
2. Click **Add Warehouse**.
3. Select **Databricks** as your warehouse type.
4. Use the following steps to [connect your warehouse](#connect-your-databricks-warehouse).

## Connect your Databricks warehouse

Use the five steps below to connect your Databricks warehouse.

> warning ""
> To configure your warehouse, you'll need read and write permissions.

### Step 1: Name your destination

Add a name to help you identify your warehouse in Segment. You can change this name at any time by navigating to the destination settings (**Connections > Destinations > Settings**) page.

### Step 2: Enter the Databricks compute resources URL

You'll use the Databricks workspace URL, along with Segment, to access your workspace API.

Check your browser's address bar when inside the workspace. The workspace URL should resemble: `https://<workspace-deployment-name>.cloud.databricks.com`.
Remove any characters after this portion and note the URL for later use.

### Step 3: Enter a Unity catalog name

This catalog is the target catalog where Segment lands your schemas and tables.
1. Follow the Databricks guide for [creating a catalog](https://docs.databricks.com/en/data-governance/unity-catalog/create-catalogs.html#create-a-catalog){:target="_blank"}. Be sure to select the storage location created earlier. You can use any valid catalog name (for example, "Segment"). Note this name for later use.
2. Select the catalog you've just created.
   1. Select the Permissions tab, then click **Grant**.
   2. Select the Segment service principal from the dropdown, and check `ALL PRIVILEGES`.
   3. Click **Grant**.
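If you manage permissions in SQL rather than the UI, the grant above can also be expressed as a statement. This is a hedged sketch, not part of Segment's documented flow: `segment` is a placeholder catalog name and `<application-id>` stands for your service principal's application ID. Granting only the catalog-level privileges from the prerequisites list is a narrower alternative to checking `ALL PRIVILEGES`:

```sql
-- Grant just the privileges Segment lists in the prerequisites above.
-- Service principals are addressed by application ID in backticks.
GRANT USE CATALOG, USE SCHEMA, MODIFY, SELECT, CREATE SCHEMA, CREATE TABLE
ON CATALOG segment TO `<application-id>`;
```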
### Step 4: Add the SQL warehouse details from your Databricks warehouse

Next, add SQL warehouse details about your compute resource.
- **HTTP Path**: The connection details for your SQL warehouse.
- **Port**: The port number of your SQL warehouse.


### Step 5: Add the service principal client ID and client secret

Segment uses the service principal to access your Databricks workspace and associated APIs.
1. Follow the Databricks guide for [adding a service principal to your account](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-service-principals-in-your-account){:target="_blank"}. This name can be anything, but Segment recommends something that identifies the purpose (for example, "Segment Profiles Sync"). Note the Application ID that Databricks generates for later use. Segment doesn't require Account admin or Marketplace admin roles.
2. (*OAuth only*) Follow the Databricks instructions to [generate an OAuth secret](https://docs.databricks.com/en/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal){:target="_blank"}. Note the secret generated by Databricks for later use. Once you navigate away from this page, the secret is no longer visible. If you lose or forget the secret, delete the existing secret and create a new one.


Once you've configured your warehouse, test the connection and click **Next**.

## Set up selective sync

With selective sync, you can choose exactly which tables you want synced to the Databricks warehouse. Segment syncs materialized view tables as well by default.

Select tables to sync, then click **Next**. Segment creates the warehouse and connects Databricks to your Profiles Sync space.

You can view sync status and the tables you're syncing from the Profiles Sync overview page.


Learn more about [using selective sync](/docs/unify/profiles-sync/#using-selective-sync) with Profiles Sync.

From 9f3e0441545cb2727226500e066e3bc393d9eb15 Mon Sep 17 00:00:00 2001
From: rchinn-segment
Date: Fri, 19 Jan 2024 09:34:51 -0800
Subject: [PATCH 09/34] Add warehouse size section

---
 .../databricks-profiles-sync.md                    | 4 ++--
 .../storage/catalog/databricks-delta-lake/index.md | 3 +++
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/src/connections/storage/catalog/databricks-delta-lake/databricks-profiles-sync.md b/src/connections/storage/catalog/databricks-delta-lake/databricks-profiles-sync.md
index 9f38e0536c..6a16ee87cc 100644
--- a/src/connections/storage/catalog/databricks-delta-lake/databricks-profiles-sync.md
+++ b/src/connections/storage/catalog/databricks-delta-lake/databricks-profiles-sync.md
@@ -33,7 +33,7 @@ The service principal needs the following setup:

#### Warehouse size and performance

A SQL warehouse is required for compute. Segment recommends the following size:
  - **Size**: small
  - **Type**: Serverless, otherwise Pro
  - **Clusters**: Minimum of 2 - Maximum of 6

- To improve the query performance of the Delta Lake, Segment recommends creating compaction jobs per table using OPTIMIZE, following [Databricks recommendations](https://docs.databricks.com/en/delta/optimize.html#){:target="_blank"}.

- Segment recommends manually starting your SQL warehouse in advance. If the SQL warehouse isn't running, Segment attempts to start the SQL warehouse to validate the connection when you hit the **Test Connection** button during setup.

diff --git a/src/connections/storage/catalog/databricks-delta-lake/index.md b/src/connections/storage/catalog/databricks-delta-lake/index.md
index 21e95e1ff6..38cc156966 100644
--- a/src/connections/storage/catalog/databricks-delta-lake/index.md
+++ b/src/connections/storage/catalog/databricks-delta-lake/index.md
@@ -17,6 +17,9 @@ Before getting started with the Databricks Destination, note the following prerequisites.

- The target Databricks workspace must be Unity Catalog enabled. Segment doesn't support the Hive metastore. Visit the Databricks guide [enabling the Unity Catalog](https://docs.databricks.com/en/data-governance/unity-catalog/enable-workspaces.html){:target="_blank"} for more information.
- Segment creates [managed tables](https://docs.databricks.com/en/data-governance/unity-catalog/create-tables.html#managed-tables){:target="_blank"} in the Unity catalog. The service account needs access to create schemas on the catalog and can delete, drop, or vacuum tables.
- Segment supports only OAuth [(M2M)](https://docs.databricks.com/en/dev-tools/auth/oauth-m2m.html){:target="_blank"} for authentication.

#### Warehouse size

- A SQL warehouse is required for compute. Segment recommends the following size:
  - **Size**: small
  - **Type**: Serverless, otherwise Pro
  - **Clusters**: Minimum of 2 - Maximum of 6

From 616d58463519432c6d64970453a9fb2335e14e17 Mon Sep 17 00:00:00 2001
From: rchinn-segment
Date: Fri, 19 Jan 2024 09:35:13 -0800
Subject: [PATCH 10/34] [netlify-build]

---
 src/connections/storage/catalog/databricks-delta-lake/index.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/connections/storage/catalog/databricks-delta-lake/index.md b/src/connections/storage/catalog/databricks-delta-lake/index.md
index 38cc156966..744e52e3ee 100644
--- a/src/connections/storage/catalog/databricks-delta-lake/index.md
+++ b/src/connections/storage/catalog/databricks-delta-lake/index.md
@@ -9,7 +9,7 @@ With the Databricks Delta Lake Destination, you can ingest event data from Segment

This page will help you get started with syncing Segment events into your Databricks Delta Lake Destination.

-
+
## Getting started

Before getting started with the Databricks Destination, note the following prerequisites.

From b6bd2d07f2509c3450b8d13141e62071a3ff94cc Mon Sep 17 00:00:00 2001
From: rchinn-segment
Date: Fri, 19 Jan 2024 09:43:45 -0800
Subject: [PATCH 11/34] Update links in Databricks

---
 .../storage/catalog/databricks-delta-lake/index.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/connections/storage/catalog/databricks-delta-lake/index.md b/src/connections/storage/catalog/databricks-delta-lake/index.md
index 744e52e3ee..533baf8025 100644
--- a/src/connections/storage/catalog/databricks-delta-lake/index.md
+++ b/src/connections/storage/catalog/databricks-delta-lake/index.md
@@ -16,11 +16,11 @@ Before getting started with the Databricks Destination, note the following prerequisites.

- The target Databricks workspace must be Unity Catalog enabled. Segment doesn't support the Hive metastore. Visit the Databricks guide [enabling the Unity Catalog](https://docs.databricks.com/en/data-governance/unity-catalog/enable-workspaces.html){:target="_blank"} for more information.
- Segment creates [managed tables](https://docs.databricks.com/en/data-governance/unity-catalog/create-tables.html#managed-tables){:target="_blank"} in the Unity catalog. The service account needs access to create schemas on the catalog and can delete, drop, or vacuum tables.
- Segment supports only [OAuth (M2M)](https://docs.databricks.com/en/dev-tools/auth/oauth-m2m.html){:target="_blank"} for authentication.

#### Warehouse size

A SQL warehouse is required for compute. Segment recommends the following size:
  - **Size**: small
  - **Type**: Serverless, otherwise Pro
  - **Clusters**: Minimum of 2 - Maximum of 6

### Step 3: Enter a Unity catalog name

This catalog is the target catalog where Segment lands your schemas and tables.
1. Follow the [Databricks guide for creating a catalog](https://docs.databricks.com/en/data-governance/unity-catalog/create-catalogs.html#create-a-catalog){:target="_blank"}. Be sure to select the storage location created earlier. You can use any valid catalog name (for example, "Segment"). Note this name for later use.
2. Select the catalog you've just created.
   1. Select the Permissions tab, then click **Grant**
   2. Select the Segment service principal from the dropdown, and check `ALL PRIVILEGES`.
   3. Click **Grant**.

### Step 4: Add the SQL warehouse details from your Databricks warehouse

Next, add SQL warehouse details about your compute resource.
- **HTTP Path**: The connection details for your SQL warehouse.
- **Port**: The port number of your SQL warehouse.


### Step 5: Add the service principal client ID and client secret

Segment uses the service principal to access your Databricks workspace and associated APIs.
1. Follow the [Databricks guide for adding a service principal to your account](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-service-principals-in-your-account){:target="_blank"}. This name can be anything, but Segment recommends something that identifies the purpose (for example, "Segment Storage Destinations"). Note the Application ID that Databricks generates for later use. Segment doesn't require Account admin or Marketplace admin roles.
2.
(*OAuth only*) Follow the [Databricks instructions to generate an OAuth secret](https://docs.databricks.com/en/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal){:target="_blank"}. Note the secret generated by Databricks for later use. Once you navigate away from this page, the secret is no longer visible. If you lose or forget the secret, delete the existing secret and create a new one. Once connected, you'll see a confirmation screen with next steps and more info on using your warehouse. From 29ec0a99b4eb28f7ffd0fd27da0723addf5230dc Mon Sep 17 00:00:00 2001 From: rchinn-segment Date: Fri, 19 Jan 2024 09:45:24 -0800 Subject: [PATCH 12/34] Fix PS links --- .../databricks-profiles-sync.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/src/connections/storage/catalog/databricks-delta-lake/databricks-profiles-sync.md b/src/connections/storage/catalog/databricks-delta-lake/databricks-profiles-sync.md index 6a16ee87cc..4971a3bba6 100644 --- a/src/connections/storage/catalog/databricks-delta-lake/databricks-profiles-sync.md +++ b/src/connections/storage/catalog/databricks-delta-lake/databricks-profiles-sync.md @@ -12,12 +12,12 @@ Before getting started with Databricks Profiles Sync, note the following prerequ - The target Databricks workspace must be Unity Catalog enabled. Segment doesn't support the Hive metastore. Visit the Databricks guide [enabling the Unity Catalog](https://docs.databricks.com/en/data-governance/unity-catalog/enable-workspaces.html){:target="_blank"} for more information. - Segment creates [managed tables](https://docs.databricks.com/en/data-governance/unity-catalog/create-tables.html#managed-tables){:target="_blank"} in the Unity catalog. -- Segment supports only OAuth [(M2M)](https://docs.databricks.com/en/dev-tools/auth/oauth-m2m.html){:target="_blank"} for authentication. +- Segment supports only [OAuth (M2M)](https://docs.databricks.com/en/dev-tools/auth/oauth-m2m.html){:target="_blank"} for authentication. #### Service principal requirements and setup Segment uses the service principal to access your Databricks workspace and associated APIs. - Use the [Databricks guide for adding a service principal to your account](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-service-principals-in-your-account){:target="_blank"}. This name can be anything, but Segment recommends something that identifies the purpose (for example, "Segment Profiles Sync"). Note the Application ID that Databricks generates for later use. Segment doesn't require `Account admin` or `Marketplace admin` roles. The service principal needs the following setup: - OAuth secret token generated. Follow the [Databricks guide for generating an OAuth secret](https://docs.databricks.com/en/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal){:target="_blank"}. Note the secret generated by Databricks for later use.
Once you navigate away from the page, the secret is no longer visible. If you lose or forget the secret, you can delete the existing secret and create a new one. @@ -28,8 +28,8 @@ The service principal needs the following setup: - SELECT - CREATE SCHEMA - CREATE TABLE - - Databricks SQL access [entitlement](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-workspace-entitlements-for-a-service-principal){:target="_blank"} at the workspace level. - - CAN USE [permissions](https://docs.databricks.com/en/security/auth-authz/access-control/sql-endpoint-acl.html#sql-warehouse-permissions){:target="_blank"} on the SQL warehouse that will be used for the sync. + - Databricks [SQL access entitlement](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-workspace-entitlements-for-a-service-principal){:target="_blank"} at the workspace level. + - [CAN USE permissions](https://docs.databricks.com/en/security/auth-authz/access-control/sql-endpoint-acl.html#sql-warehouse-permissions){:target="_blank"} on the SQL warehouse that will be used for the sync. @@ -73,7 +73,7 @@ Check your browser's address bar when inside the workspace. The workspace URL sh ### Step 3: Enter a Unity catalog name This catalog is the target catalog where Segment lands your schemas and tables. -1. Follow the Databricks guide for [creating a catalog](https://docs.databricks.com/en/data-governance/unity-catalog/create-catalogs.html#create-a-catalog){:target="_blank"}. Be sure to select the storage location created earlier. You can use any valid catalog name (for example, "Segment"). Note this name for later use. +1. Follow the [Databricks guide for creating a catalog](https://docs.databricks.com/en/data-governance/unity-catalog/create-catalogs.html#create-a-catalog){:target="_blank"}. Be sure to select the storage location created earlier. You can use any valid catalog name (for example, "Segment"). Note this name for later use. 2. Select the catalog you've just created. 1. Select the Permissions tab, then click **Grant**. 2. Select the Segment service principal from the dropdown, and check `ALL PRIVILEGES`. 3. Click **Grant**. @@ -89,8 +89,8 @@ Next, add SQL warehouse details about your compute resource. ### Step 5: Add the principal service client ID and client secret Segment uses the service principal to access your Databricks workspace and associated APIs. -1. Follow the Databricks guide for [adding a service principal to your account](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-service-principals-in-your-account){:target="_blank"}. This name can be anything, but Segment recommends something that identifies the purpose (for example, "Segment Profiles Sync"). Note the Application ID that Databricks generates for later use. Segment doesn't require Account admin or Marketplace admin roles. -2. (*OAuth only*) Follow the Databricks instructions to [generate an OAuth secret](https://docs.databricks.com/en/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal){:target="_blank"}. Note the secret generated by Databricks for later use. Once you navigate away from this page, the secret is no longer visible. If you lose or forget the secret, delete the existing secret and create a new one. +1.
Follow the [Databricks guide for adding a service principal to your account](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-service-principals-in-your-account){:target="_blank"}. This name can be anything, but Segment recommends something that identifies the purpose (for example, "Segment Profiles Sync"). Note the Application ID that Databricks generates for later use. Segment doesn't require Account admin or Marketplace admin roles. +2. (*OAuth only*) Follow the [Databricks instructions to generate an OAuth secret](https://docs.databricks.com/en/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal){:target="_blank"}. Note the secret generated by Databricks for later use. Once you navigate away from this page, the secret is no longer visible. If you lose or forget the secret, delete the existing secret and create a new one. Once you've configured your warehouse, test the connection and click **Next**. From 5ae6d2f490a5e1c649eb74dfd1e853795cf7808b Mon Sep 17 00:00:00 2001 From: rchinn-segment Date: Fri, 19 Jan 2024 09:52:05 -0800 Subject: [PATCH 13/34] More Profiles Sync formatting --- .../databricks-delta-lake/databricks-profiles-sync.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/src/connections/storage/catalog/databricks-delta-lake/databricks-profiles-sync.md b/src/connections/storage/catalog/databricks-delta-lake/databricks-profiles-sync.md index 4971a3bba6..a25428dbf7 100644 --- a/src/connections/storage/catalog/databricks-delta-lake/databricks-profiles-sync.md +++ b/src/connections/storage/catalog/databricks-delta-lake/databricks-profiles-sync.md @@ -40,9 +40,10 @@ A SQL warehouse is required for compute. Segment recommends the following size: - **Type** Serverless otherwise Pro - **Clusters**: Minimum of 2 - Maximum of 6 -- To improve the query performance of the Delta Lake, Segment recommends creating compact jobs per table using OPTIMIZE following [Databricks recommendations](https://docs.databricks.com/en/delta/optimize.html#){:target="_blank"}. +To improve the query performance of the Delta Lake, Segment recommends creating compact jobs per table using OPTIMIZE following [Databricks recommendations](https://docs.databricks.com/en/delta/optimize.html#){:target="_blank"}. -- Segment recommends manually starting your SQL warehouse in advance. If the SQL warehouse isn't running, Segment attempts to start the SQL warehouse to validate the connection when you hit the **Test Connection** button during setup. +> success "" +> Segment recommends manually starting your SQL warehouse in advance. If the SQL warehouse isn't running, Segment attempts to start the SQL warehouse to validate the connection when you hit the **Test Connection** button during setup. 
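For teams that want to script this warm-up instead of clicking through the Databricks UI, a sketch like the one below can start the warehouse through the SQL Warehouses REST API before you click **Test Connection**. This is an illustration only: the workspace URL, warehouse ID, and token are placeholders, and you should confirm the endpoints against the current Databricks REST API reference.

```python
# Sketch: start a Databricks SQL warehouse ahead of Test Connection.
# DATABRICKS_HOST, WAREHOUSE_ID, and TOKEN are placeholders to replace.
import time

import requests

DATABRICKS_HOST = "https://dbc-example.cloud.databricks.com"
WAREHOUSE_ID = "1234567890abcdef"
TOKEN = "<api-token>"

headers = {"Authorization": f"Bearer {TOKEN}"}

# Request a start; Databricks treats this as a no-op if already running.
requests.post(
    f"{DATABRICKS_HOST}/api/2.0/sql/warehouses/{WAREHOUSE_ID}/start",
    headers=headers,
    timeout=30,
).raise_for_status()

# Poll until the warehouse reports RUNNING so the connection test won't time out.
while True:
    state = requests.get(
        f"{DATABRICKS_HOST}/api/2.0/sql/warehouses/{WAREHOUSE_ID}",
        headers=headers,
        timeout=30,
    ).json().get("state")
    if state == "RUNNING":
        break
    time.sleep(10)
```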
## Set up Databricks for Profiles Sync From 28bacfdb460122483f84bcd442f9dca8ca11046f Mon Sep 17 00:00:00 2001 From: rchinn-segment Date: Fri, 19 Jan 2024 10:09:14 -0800 Subject: [PATCH 14/34] [netlify-build] --- .../catalog/databricks-delta-lake/databricks-profiles-sync.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/connections/storage/catalog/databricks-delta-lake/databricks-profiles-sync.md b/src/connections/storage/catalog/databricks-delta-lake/databricks-profiles-sync.md index a25428dbf7..7b93b3e1df 100644 --- a/src/connections/storage/catalog/databricks-delta-lake/databricks-profiles-sync.md +++ b/src/connections/storage/catalog/databricks-delta-lake/databricks-profiles-sync.md @@ -45,7 +45,7 @@ To improve the query performance of the Delta Lake, Segment recommends creating > success "" > Segment recommends manually starting your SQL warehouse in advance. If the SQL warehouse isn't running, Segment attempts to start the SQL warehouse to validate the connection when you hit the **Test Connection** button during setup. - + ## Set up Databricks for Profiles Sync 1. From your Segment app, navigate to **Unify > Profiles Sync**. From 2774a65a67599933dcc123f13a23e99095f7c5e5 Mon Sep 17 00:00:00 2001 From: rchinn-segment Date: Mon, 22 Jan 2024 12:22:42 -0800 Subject: [PATCH 15/34] Move Profiles Sync setup doc --- src/unify/profiles-sync/{ => profiles-sync-setup}/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) rename src/unify/profiles-sync/{ => profiles-sync-setup}/index.md (97%) diff --git a/src/unify/profiles-sync/index.md b/src/unify/profiles-sync/profiles-sync-setup/index.md similarity index 97% rename from src/unify/profiles-sync/index.md rename to src/unify/profiles-sync/profiles-sync-setup/index.md index 184bc4539e..83e2dea7fe 100644 --- a/src/unify/profiles-sync/index.md +++ b/src/unify/profiles-sync/profiles-sync-setup/index.md @@ -31,7 +31,7 @@ The following table shows the supported Profiles Sync warehouse Destinations and | [BigQuery](/docs/connections/storage/catalog/bigquery/) | 1. Create a project and enable BigQuery.
2. Create a service account for Segment. | | [Azure](/docs/connections/storage/catalog/azuresqldw/) | 1. Sign up for an Azure subscription.
2. Provision a dedicated SQL pool. | | [Postgres](/docs/connections/storage/catalog/postgres/) | 1. Follow the steps in the [Postgres getting started](/docs/connections/storage/catalog/postgres/) section. | -| [Databricks Profiles Sync](/docs/connections/storage/catalog/databricks-delta-lake/databricks-profiles-sync/) | 1. Follow the steps in the [Databricks Profiles Sync](/docs/connections/storage/catalog/databricks-delta-lake/databricks-profiles-sync/) section. | +| [Databricks](/docs/unify/profiles-sync/profiles-sync-setup/databricks-profiles-sync/) | 1. Follow the steps in the [Databricks for Profiles Sync](/docs/unify/profiles-sync/profiles-sync-setup/databricks-profiles-sync/) guide. | Once you’ve finished the required steps for your chosen warehouse, you’re ready to connect your warehouse to Segment. Because you’ll next enter credentials from the warehouse you just created, **leave the warehouse tab open to streamline setup.** From 947f9ec5907712103af2545300d968671191b4fa Mon Sep 17 00:00:00 2001 From: rchinn-segment Date: Mon, 22 Jan 2024 12:23:32 -0800 Subject: [PATCH 16/34] Move Databricks for Profiles Sync --- .../profiles-sync-setup}/databricks-profiles-sync.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) rename src/{connections/storage/catalog/databricks-delta-lake => unify/profiles-sync/profiles-sync-setup}/databricks-profiles-sync.md (94%) diff --git a/src/connections/storage/catalog/databricks-delta-lake/databricks-profiles-sync.md b/src/unify/profiles-sync/profiles-sync-setup/databricks-profiles-sync.md similarity index 94% rename from src/connections/storage/catalog/databricks-delta-lake/databricks-profiles-sync.md rename to src/unify/profiles-sync/profiles-sync-setup/databricks-profiles-sync.md index 7b93b3e1df..e379af1de3 100644 --- a/src/connections/storage/catalog/databricks-delta-lake/databricks-profiles-sync.md +++ b/src/unify/profiles-sync/profiles-sync-setup/databricks-profiles-sync.md @@ -1,9 +1,9 @@ --- -title: Databricks Profiles Sync +title: Databricks for Profiles Sync plan: unify --- -With Databricks Profiles Sync, you can use [Profiles Sync](/docs/unify/profiles-sync/overview/) to sync Segment profiles into your Databricks Lakehouse. +With Databricks for Profiles Sync, you can use [Profiles Sync](/docs/unify/profiles-sync/overview/) to sync Segment profiles into your Databricks Lakehouse. ## Getting started @@ -61,9 +61,9 @@ Use the five steps below to connect your Databricks warehouse. > warning "" > To configure your warehouse, you'll need read and write permissions. -### Step 1: Name your destination +### Step 1: Name your schema -Add a name to help you identify your warehouse in Segment. You can change this name at any time by navigating to the destination settings (**Connections > Destinations > Settings**) page. +Pick a name to help you identify this space in the warehouse, or use the default name provided. You can't change this name once the warehouse is connected. 
### Step 2: Enter the Databricks compute resources URL From 1a03743da005b7ace7d136ccfd844763252c77b4 Mon Sep 17 00:00:00 2001 From: rchinn-segment Date: Mon, 22 Jan 2024 12:28:12 -0800 Subject: [PATCH 17/34] Add PM feedback --- .../{databricks-delta-lake => databricks}/index.md | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) rename src/connections/storage/catalog/{databricks-delta-lake => databricks}/index.md (93%) diff --git a/src/connections/storage/catalog/databricks-delta-lake/index.md b/src/connections/storage/catalog/databricks/index.md similarity index 93% rename from src/connections/storage/catalog/databricks-delta-lake/index.md rename to src/connections/storage/catalog/databricks/index.md index 533baf8025..0e465c7876 100644 --- a/src/connections/storage/catalog/databricks-delta-lake/index.md +++ b/src/connections/storage/catalog/databricks/index.md @@ -1,13 +1,16 @@ --- -title: Databricks Delta Lake Destination +title: Databricks Destination public: true --- +{% include content/warehouse-ip.html %} +With the Databricks Destination, you can ingest event data directly from Segment into your Databricks Lakehouse. -With the Databricks Delta Lake Destination, you can ingest event data from Segment into the bronze layer of your Databricks Delta Lake. +This page will help you get started with syncing Segment events into your Databricks Destination. -This page will help you get started with syncing Segment events into your Databricks Delta Lake Destination. +> success "" +> Segment has certified the destination for Databricks on AWS and Databricks on Azure. ## Getting started From 9bf2b8bf0a54e264bd60d6af8b0d6bde196c5109 Mon Sep 17 00:00:00 2001 From: rchinn-segment Date: Mon, 22 Jan 2024 12:28:32 -0800 Subject: [PATCH 18/34] Update nav and catalog --- src/_data/catalog/warehouse.yml | 25 ++++--------------------- src/_data/sidenav/main.yml | 9 +++++++-- 2 files changed, 11 insertions(+), 23 deletions(-) diff --git a/src/_data/catalog/warehouse.yml b/src/_data/catalog/warehouse.yml index b506a09434..2a7f71eae1 100644 --- a/src/_data/catalog/warehouse.yml +++ b/src/_data/catalog/warehouse.yml @@ -67,28 +67,11 @@ items: url: 'https://cdn.filepicker.io/api/file/Vk6iFlMvQeynbg30ZEtt' categories: - Warehouses -- display_name: Databricks Delta Lake - slug: databricks-delta-lake - name: catalog/warehouses/databricks-delta-lake +- display_name: Databricks Destination + slug: databricks-destination + name: catalog/warehouses/databricks description: '' - url: connections/storage/catalog/databricks-delta-lake - status: PUBLIC - endpoints: - - us - regions: - - us - - eu - logo: - url: 'https://images.ctfassets.net/h6ufgtwb6nv1/4vYEAgYz6nGLC9F64Jxeb6/fcf3ecdae5386f806ae72eb39ce07094/db.svg?w=256&q=75' - mark: - url: '' - categories: - - Warehouses -- display_name: Databricks Profiles Sync - slug: databricks-profiles-sync - name: catalog/warehouses/databricks-delta-lake/databricks-profiles-sync - description: '' - url: connections/storage/catalog/databricks-delta-lake/databricks-profiles-sync + url: connections/storage/catalog/databricks status: PUBLIC endpoints: - us diff --git a/src/_data/sidenav/main.yml b/src/_data/sidenav/main.yml index ece0eef513..ac3e30e7f4 100644 --- a/src/_data/sidenav/main.yml +++ b/src/_data/sidenav/main.yml @@ -316,8 +316,13 @@ sections: section: - path: /unify/profiles-sync/overview title: Profiles Sync Overview - - path: /unify/profiles-sync - title: Setup + - section_title: Profiles Sync Setup + slug: unify/profiles-sync/profiles-sync-setup + section: + - 
path: /unify/profiles-sync/profiles-sync-setup + title: Setup + - path: /unify/profiles-sync/profiles-sync-setup/databricks-profiles-sync + title: Databricks for Profiles Sync - path: /unify/profiles-sync/sample-queries title: Sample Queries - path: /unify/profiles-sync/tables From 4f3d859bfc4bc60c9a28b7a7e2bf67411006e2a7 Mon Sep 17 00:00:00 2001 From: rchinn-segment Date: Mon, 22 Jan 2024 12:29:07 -0800 Subject: [PATCH 19/34] [netlify-build] --- src/unify/profiles-sync/profiles-sync-setup/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/unify/profiles-sync/profiles-sync-setup/index.md b/src/unify/profiles-sync/profiles-sync-setup/index.md index 83e2dea7fe..5643d45ff8 100644 --- a/src/unify/profiles-sync/profiles-sync-setup/index.md +++ b/src/unify/profiles-sync/profiles-sync-setup/index.md @@ -17,7 +17,7 @@ Before you begin, prepare for setup with these tips: - To connect your warehouse to Segment, you must have read and write permissions with the warehouse Destination you choose. - During Step 2, you’ll copy credentials between Segment and your warehouse Destination. To streamline setup, open your Segment workspace in one browser tab and open another with your warehouse account. - Make sure to copy any IP addresses Segment asks you to allowlist in your warehouse Destination. - + ### Step 1: Create a warehouse You’ll first choose the Destination warehouse to which Segment will sync profiles. Profiles Sync supports the Snowflake, Redshift, BigQuery, Azure, and Postgres warehouse Destinations. Your initial setup will depend on the warehouse you choose. From 2eca2b9641ac3a487c09c87d68494fcf155ac71c Mon Sep 17 00:00:00 2001 From: rchinn-segment Date: Tue, 23 Jan 2024 08:50:13 -0800 Subject: [PATCH 20/34] Add redirect --- src/unify/profiles-sync/profiles-sync-setup/index.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/unify/profiles-sync/profiles-sync-setup/index.md b/src/unify/profiles-sync/profiles-sync-setup/index.md index 5643d45ff8..f0a2a52f04 100644 --- a/src/unify/profiles-sync/profiles-sync-setup/index.md +++ b/src/unify/profiles-sync/profiles-sync-setup/index.md @@ -1,6 +1,8 @@ --- title: Profiles Sync Setup plan: unify +redirect_from: + - '/unify/profiles-sync/' --- On this page, you’ll learn how to set up Profiles Sync, enable historical backfill, and adjust settings for warehouses that you’ve connected to Profiles Sync. From 78d2e21a65c17bc72638373797b690c126678ac4 Mon Sep 17 00:00:00 2001 From: rchinn-segment Date: Wed, 24 Jan 2024 10:55:10 -0800 Subject: [PATCH 21/34] EM feedback --- .../databricks-profiles-sync.md | 14 ++++++++++---- 1 file changed, 10 insertions(+), 4 deletions(-) diff --git a/src/unify/profiles-sync/profiles-sync-setup/databricks-profiles-sync.md b/src/unify/profiles-sync/profiles-sync-setup/databricks-profiles-sync.md index e379af1de3..6c863ee8d4 100644 --- a/src/unify/profiles-sync/profiles-sync-setup/databricks-profiles-sync.md +++ b/src/unify/profiles-sync/profiles-sync-setup/databricks-profiles-sync.md @@ -40,10 +40,12 @@ A SQL warehouse is required for compute. Segment recommends the following size: - **Type** Serverless otherwise Pro - **Clusters**: Minimum of 2 - Maximum of 6 -To improve the query performance of the Delta Lake, Segment recommends creating compact jobs per table using OPTIMIZE following [Databricks recommendations](https://docs.databricks.com/en/delta/optimize.html#){:target="_blank"}. > success "" -> Segment recommends manually starting your SQL warehouse in advance. 
If the SQL warehouse isn't running, Segment attempts to start the SQL warehouse to validate the connection when you hit the **Test Connection** button during setup. +> To improve the query performance of the Delta Lake, Segment recommends creating compact jobs per table using OPTIMIZE following [Databricks recommendations](https://docs.databricks.com/en/delta/optimize.html#){:target="_blank"}.
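As one way to act on that recommendation, the sketch below runs `OPTIMIZE` against each synced table with the `databricks-sql-connector` Python package. The hostname, HTTP path, token, and table names are placeholder values, and the table list is an assumption: substitute the tables Profiles Sync actually lands in your catalog, and schedule the script with the job runner of your choice.

```python
# Sketch: compact Segment's Delta tables with OPTIMIZE on a schedule.
# Requires: pip install databricks-sql-connector. All connection values
# and table names below are placeholders.
from databricks import sql

connection = sql.connect(
    server_hostname="dbc-example.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/<warehouse-id>",
    access_token="<api-token>",
)

# Replace with the tables Profiles Sync lands in your catalog.
tables = ["segment.profiles.external_id_mapping", "segment.profiles.identifies"]

cursor = connection.cursor()
for table in tables:
    # OPTIMIZE rewrites many small files into fewer large ones, which
    # speeds up reads against the Delta Lake.
    cursor.execute(f"OPTIMIZE {table}")
cursor.close()
connection.close()
```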
+ +> info "" +> Segment recommends manually starting your SQL warehouse in advance. If the SQL warehouse isn't running, Segment attempts to start the SQL warehouse to validate the connection and may experience a timeout when you hit the **Test Connection** button during setup. ## Set up Databricks for Profiles Sync @@ -89,9 +91,13 @@ Next, add SQL warehouse details about your compute resource. ### Step 5: Add the principal service client ID and client secret +> warning "" +> Be sure to note the principal ID and the client secret Databricks generates in this step for later use. + Segment uses the service principal to access your Databricks workspace and associated APIs. -1. Follow the [Databricks guide for adding a service principal to your account](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-service-principals-in-your-account){:target="_blank"}. This name can be anything, but Segment recommends something that identifies the purpose (for example, "Segment Profiles Sync"). Note the Application ID that Databricks generates for later use. Segment doesn't require Account admin or Marketplace admin roles. -2. (*OAuth only*) Follow the [Databricks instructions to generate an OAuth secret](https://docs.databricks.com/en/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal){:target="_blank"}. Note the secret generated by Databricks for later use. Once you navigate away from this page, the secret is no longer visible. If you lose or forget the secret, delete the existing secret and create a new one. +1. Follow the [Databricks guide for adding a service principal to your account](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-service-principals-in-your-account){:target="_blank"}. This name can be anything, but Segment recommends something that identifies the purpose (for example, "Segment Profiles Sync"). Note the principal application ID that Databricks generates for later use. Segment doesn't require Account admin or Marketplace admin roles. +2. Follow the [Databricks instructions to generate an OAuth secret](https://docs.databricks.com/en/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal){:target="_blank"}. Note the secret generated by Databricks for later use. Once you navigate away from this page, the secret is no longer visible. If you lose or forget the secret, delete the existing secret and create a new one. + Once you've configured your warehouse, test the connection and click **Next**. From b2b4b5c869e27fab50c39c9fe97310b93b9da797 Mon Sep 17 00:00:00 2001 From: rchinn-segment Date: Wed, 24 Jan 2024 13:11:13 -0800 Subject: [PATCH 22/34] More feedback from Profiles Sync EM --- .../databricks-profiles-sync.md | 39 +++++++------------ 1 file changed, 15 insertions(+), 24 deletions(-) diff --git a/src/unify/profiles-sync/profiles-sync-setup/databricks-profiles-sync.md b/src/unify/profiles-sync/profiles-sync-setup/databricks-profiles-sync.md index 6c863ee8d4..589b3c1933 100644 --- a/src/unify/profiles-sync/profiles-sync-setup/databricks-profiles-sync.md +++ b/src/unify/profiles-sync/profiles-sync-setup/databricks-profiles-sync.md @@ -14,25 +14,6 @@ Before getting started with Databricks Profiles Sync, note the following prerequ - Segment creates [managed tables](https://docs.databricks.com/en/data-governance/unity-catalog/create-tables.html#managed-tables){:target="_blank"} in the Unity catalog. 
- Segment supports only [OAuth (M2M)](https://docs.databricks.com/en/dev-tools/auth/oauth-m2m.html){:target="_blank"} for authentication. -#### Service principal requirements and setup - -Segment uses the service principal to access your Databricks workspace and associated APIs. - - Use the [Databricks guide for adding a service principal to your account](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-service-principals-in-your-account){:target="_blank"}. This name can be anything, but Segment recommends something that identifies the purpose (for example, "Segment Profiles Sync"). Note the Application ID that Databricks generates for later use. Segment doesn't require `Account admin` or `Marketplace admin` roles. - -The service principal needs the following setup: - - OAuth secret token generated. Follow the [Databricks guide for generating an OAuth secret](https://docs.databricks.com/en/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal){:target="_blank"}. Note the secret generated by Databricks for later use. Once you navigate away from the page, the secret is no longer visible. If you lose or forget the secret, you can delete the existing secret and create a new one. - - [Catalog level privileges](https://docs.databricks.com/en/data-governance/unity-catalog/manage-privileges/privileges.html#general-unity-catalog-privilege-types){:target="_blank"} which include: - - USE CATALOG - - USE SCHEMA - - MODIFY - - SELECT - - CREATE SCHEMA - - CREATE TABLE - - Databricks [SQL access entitlement](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-workspace-entitlements-for-a-service-principal){:target="_blank"} at the workspace level. - - [CAN USE permissions](https://docs.databricks.com/en/security/auth-authz/access-control/sql-endpoint-acl.html#sql-warehouse-permissions){:target="_blank"} on the SQL warehouse that will be used for the sync. - - - #### Warehouse size and performance A SQL warehouse is required for compute. Segment recommends the following size: @@ -42,7 +23,7 @@ A SQL warehouse is required for compute. Segment recommends the following size: > success "" -> To improve the query performance of the Delta Lake, Segment recommends creating compact jobs per table using OPTIMIZE following [Databricks recommendations](https://docs.databricks.com/en/delta/optimize.html#){:target="_blank"}.
+> To improve the query performance of the Delta Lake, Segment recommends creating compact jobs per table using OPTIMIZE following [Databricks recommendations](https://docs.databricks.com/en/delta/optimize.html#){:target="_blank"}. > info "" > Segment recommends manually starting your SQL warehouse in advance. If the SQL warehouse isn't running, Segment attempts to start the SQL warehouse to validate the connection and may experience a timeout when you hit the **Test Connection** button during setup. @@ -89,15 +70,25 @@ Next, add SQL warehouse details about your compute resource. - **Port**: The port number of your SQL warehouse. -### Step 5: Add the principal service client ID and client secret +### Step 5: Add the service principal client ID and client secret > warning "" -> Be sure to note the principal ID and the client secret Databricks generates in this step for later use. +> Be sure to note the principal ID and the client secret Databricks generates, as you'll need to enter them in this step. Segment uses the service principal to access your Databricks workspace and associated APIs. -1. Follow the [Databricks guide for adding a service principal to your account](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-service-principals-in-your-account){:target="_blank"}. This name can be anything, but Segment recommends something that identifies the purpose (for example, "Segment Profiles Sync"). Note the principal application ID that Databricks generates for later use. Segment doesn't require Account admin or Marketplace admin roles. -2. Follow the [Databricks instructions to generate an OAuth secret](https://docs.databricks.com/en/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal){:target="_blank"}. Note the secret generated by Databricks for later use. Once you navigate away from this page, the secret is no longer visible. If you lose or forget the secret, delete the existing secret and create a new one. +1. Follow the [Databricks guide for adding a service principal to your account](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-service-principals-in-your-account){:target="_blank"}. This name can be anything, but Segment recommends something that identifies the purpose (for example, "Segment Profiles Sync"). Note the principal application ID that Databricks generates to use in this step. Segment doesn't require `Account admin` or `Marketplace admin` roles. +2. Follow the [Databricks instructions to generate an OAuth secret](https://docs.databricks.com/en/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal){:target="_blank"}. Note the secret generated by Databricks to use in this step. Once you navigate away from this page, the secret is no longer visible. If you lose or forget the secret, delete the existing secret and create a new one. +The service principal needs the following setup: + - [Catalog level priveleges](https://docs.databricks.com/en/data-governance/unity-catalog/manage-privileges/privileges.html#general-unity-catalog-privilege-types){:target="_blank"} which include: + - USE CATALOG + - USE SCHEMA + - MODIFY + - SELECT + - CREATE SCHEMA + - CREATE TABLE + - Databricks [SQL access entitlement](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-workspace-entitlements-for-a-service-principal){:target="_blank"} at the workspace level. 
+ - [CAN USE permissions](https://docs.databricks.com/en/security/auth-authz/access-control/sql-endpoint-acl.html#sql-warehouse-permissions){:target="_blank"} on the SQL warehouse that will be used for the sync. Once you've configured your warehouse, test the connection and click **Next**. From bb654441d8227aaa4f2a4f0cea88468de8b882fa Mon Sep 17 00:00:00 2001 From: rchinn-segment Date: Wed, 24 Jan 2024 13:11:51 -0800 Subject: [PATCH 23/34] Add PS feedback to Databricks Destination --- src/connections/storage/catalog/databricks/index.md | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/src/connections/storage/catalog/databricks/index.md b/src/connections/storage/catalog/databricks/index.md index 0e465c7876..397099c292 100644 --- a/src/connections/storage/catalog/databricks/index.md +++ b/src/connections/storage/catalog/databricks/index.md @@ -29,8 +29,8 @@ A SQL warehouse is required for compute. Segment recommends the following size: - **Clusters**: Minimum of 2 - Maximum of 6 > success "" -> Segment recommends manually starting your SQL warehouse in advance. If the SQL warehouse isn't running, Segment attempts to start the SQL warehouse to validate the connection when you hit the **Test Connection** button during setup. - +> Segment recommends manually starting your SQL warehouse in advance. If the SQL warehouse isn't running, Segment attempts to start the SQL warehouse to validate the connection and may experience a timeout when you hit the **Test Connection** button during setup. + ## Set up Databricks in Segment Use the following steps to set up Databricks in Segment: @@ -75,11 +75,14 @@ Next, add SQL warehouse details about your compute resource. - **Port**: The port number of your SQL warehouse. -### Step 5: Add the Principal service client ID and Client secret +### Step 5: Add the service principal client ID and client secret + +> warning "" +> Be sure to note the principal ID and the client secret Databricks generates, as you'll need to enter them in this step. Segment uses the service principal to access your Databricks workspace and associated APIs. -1. Follow the [Databricks guide for adding a service principal to your account](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-service-principals-in-your-account){:target="_blank"}. This name can be anything, but Segment recommends something that identifies the purpose (for example, "Segment Storage Destinations"). Note the Application ID that Databricks generates for later use. Segment doesn't require Account admin or Marketplace admin roles. -2. (*OAuth only*) Follow the [Databricks instructions to generate an OAuth secret](https://docs.databricks.com/en/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal){:target="_blank"}. Note the secret generated by Databricks for later use. Once you navigate away from this page, the secret is no longer visible. If you lose or forget the secret, delete the existing secret and create a new one. +1. Follow the [Databricks guide for adding a service principal to your account](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-service-principals-in-your-account){:target="_blank"}. This name can be anything, but Segment recommends something that identifies the purpose (for example, "Segment Storage Destinations"). Note the principal application ID that Databricks generates to use in this step. 
Segment doesn't require Account admin or Marketplace admin roles. +2. Follow the [Databricks instructions to generate an OAuth secret](https://docs.databricks.com/en/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal){:target="_blank"}. Note the secret generated by Databricks to use in this step. Once you navigate away from this page, the secret is no longer visible. If you lose or forget the secret, delete the existing secret and create a new one. Once connected, you'll see a confirmation screen with next steps and more info on using your warehouse. From 68b28fa7051bed1defdf49303b4c7510d4df7644 Mon Sep 17 00:00:00 2001 From: rchinn-segment Date: Wed, 24 Jan 2024 14:29:09 -0800 Subject: [PATCH 24/34] More PM feedback --- src/connections/storage/catalog/databricks/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/connections/storage/catalog/databricks/index.md b/src/connections/storage/catalog/databricks/index.md index 397099c292..a1522567cc 100644 --- a/src/connections/storage/catalog/databricks/index.md +++ b/src/connections/storage/catalog/databricks/index.md @@ -7,7 +7,7 @@ public: true With the Databricks Destination, you can ingest event data directly from Segment into your Databricks Lakehouse. -This page will help you get started with syncing Segment events into your Databricks Destination. +This page will help you get started with syncing Segment events into your Databricks Lakehouse. > success "" > Segment has certified the destination for Databricks on AWS and Databricks on Azure. From 1c316dacc2c3d6c4df5e0c619175a98e497d9700 Mon Sep 17 00:00:00 2001 From: rchinn-segment Date: Thu, 25 Jan 2024 08:43:11 -0800 Subject: [PATCH 25/34] [netlify-build] --- .../profiles-sync-setup/databricks-profiles-sync.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/unify/profiles-sync/profiles-sync-setup/databricks-profiles-sync.md b/src/unify/profiles-sync/profiles-sync-setup/databricks-profiles-sync.md index 589b3c1933..88e01eeffa 100644 --- a/src/unify/profiles-sync/profiles-sync-setup/databricks-profiles-sync.md +++ b/src/unify/profiles-sync/profiles-sync-setup/databricks-profiles-sync.md @@ -4,7 +4,7 @@ plan: unify --- With Databricks for Profiles Sync, you can use [Profiles Sync](/docs/unify/profiles-sync/overview/) to sync Segment profiles into your Databricks Lakehouse. - + ## Getting started From 517c193756ed218b55fb6e72298a99f2eeca4798 Mon Sep 17 00:00:00 2001 From: rchinn-segment <93161299+rchinn-segment@users.noreply.github.com> Date: Thu, 25 Jan 2024 17:31:35 -0800 Subject: [PATCH 26/34] Apply suggestions from code review Co-authored-by: forstisabella <92472883+forstisabella@users.noreply.github.com> --- src/connections/storage/catalog/databricks/index.md | 10 +++++----- .../profiles-sync-setup/databricks-profiles-sync.md | 8 ++++---- 2 files changed, 9 insertions(+), 9 deletions(-) diff --git a/src/connections/storage/catalog/databricks/index.md b/src/connections/storage/catalog/databricks/index.md index a1522567cc..23a05a50ff 100644 --- a/src/connections/storage/catalog/databricks/index.md +++ b/src/connections/storage/catalog/databricks/index.md @@ -21,15 +21,15 @@ Before getting started with the Databricks Destination, note the following prere - Segment creates [managed tables](https://docs.databricks.com/en/data-governance/unity-catalog/create-tables.html#managed-tables){:target="_blank"} in the Unity catalog. 
The service account needs access to create schemas on the catalog and can delete, drop, or vacuum tables. - Segment supports only [OAuth (M2M)](https://docs.databricks.com/en/dev-tools/auth/oauth-m2m.html){:target="_blank"} for authentication. -#### Warehouse size +### Warehouse size -A SQL warehouse is required for compute. Segment recommends the following size: +A SQL warehouse is required for compute. Segment recommends a warehouse with the following characteristics: - **Size**: small - **Type** Serverless otherwise Pro - **Clusters**: Minimum of 2 - Maximum of 6 > success "" -> Segment recommends manually starting your SQL warehouse in advance. If the SQL warehouse isn't running, Segment attempts to start the SQL warehouse to validate the connection and may experience a timeout when you hit the **Test Connection** button during setup. +> Segment recommends manually starting your SQL warehouse before setting up your Databricks destination. If the SQL warehouse isn't running, Segment attempts to start the SQL warehouse to validate the connection and may experience a timeout when you hit the **Test Connection** button during setup. ## Set up Databricks in Segment @@ -43,7 +43,7 @@ Use the following steps to set up Databricks in Segment: ## Connect your Databricks warehouse -Use the five steps below to connect your Databricks warehouse. +Use the five steps below to connect to your Databricks warehouse. > warning "" > You'll need read and write warehouse permissions for Segment to write to your database. @@ -64,7 +64,7 @@ Check your browser's address bar when inside the workspace. The workspace URL sh This catalog is the target catalog where Segment lands your schemas and tables. 1. Follow the [Databricks guide for creating a catalog](https://docs.databricks.com/en/data-governance/unity-catalog/create-catalogs.html#create-a-catalog){:target="_blank"}. Be sure to select the storage location created earlier. You can use any valid catalog name (for example, "Segment"). Note this name for later use. 2. Select the catalog you've just created. - 1. Select the Permissions tab, then click **Grant** + 1. Select the Permissions tab, then click **Grant**. 2. Select the Segment service principal from the dropdown, and check `ALL PRIVILEGES`. 3. Click **Grant**. diff --git a/src/unify/profiles-sync/profiles-sync-setup/databricks-profiles-sync.md b/src/unify/profiles-sync/profiles-sync-setup/databricks-profiles-sync.md index 88e01eeffa..673f86e0ef 100644 --- a/src/unify/profiles-sync/profiles-sync-setup/databricks-profiles-sync.md +++ b/src/unify/profiles-sync/profiles-sync-setup/databricks-profiles-sync.md @@ -14,9 +14,9 @@ Before getting started with Databricks Profiles Sync, note the following prerequ - Segment creates [managed tables](https://docs.databricks.com/en/data-governance/unity-catalog/create-tables.html#managed-tables){:target="_blank"} in the Unity catalog. - Segment supports only [OAuth (M2M)](https://docs.databricks.com/en/dev-tools/auth/oauth-m2m.html){:target="_blank"} for authentication. -#### Warehouse size and performance +### Warehouse size and performance -A SQL warehouse is required for compute. Segment recommends the following size: +A SQL warehouse is required for compute. Segment recommends a warehouse with the the following characteristics: - **Size**: small - **Type** Serverless otherwise Pro - **Clusters**: Minimum of 2 - Maximum of 6 @@ -26,7 +26,7 @@ A SQL warehouse is required for compute. 
Segment recommends the following size: +A SQL warehouse is required for compute. Segment recommends a warehouse with the following characteristics: - **Size**: small - **Type** Serverless otherwise Pro - **Clusters**: Minimum of 2 - Maximum of 6 @@ -26,7 +26,7 @@ A SQL warehouse is required for compute. Segment recommends the following size: > To improve the query performance of the Delta Lake, Segment recommends creating compact jobs per table using OPTIMIZE following [Databricks recommendations](https://docs.databricks.com/en/delta/optimize.html#){:target="_blank"}. > info "" -> Segment recommends manually starting your SQL warehouse in advance. If the SQL warehouse isn't running, Segment attempts to start the SQL warehouse to validate the connection and may experience a timeout when you hit the **Test Connection** button during setup. +> Segment recommends manually starting your SQL warehouse before setting up your Databricks destination. If the SQL warehouse isn't running, Segment attempts to start the SQL warehouse to validate the connection and may experience a timeout when you hit the **Test Connection** button during setup. @@ -39,7 +39,7 @@ A SQL warehouse is required for compute. Segment recommends the following size: ## Connect your Databricks warehouse -Use the five steps below to connect your Databricks warehouse. +Use the five steps below to connect to your Databricks warehouse. > warning "" > To configure your warehouse, you'll need read and write permissions. From 7638bfbbfa05e973d51eec36a3822204fc9d7bdc Mon Sep 17 00:00:00 2001 From: rchinn-segment Date: Fri, 26 Jan 2024 09:13:30 -0800 Subject: [PATCH 27/34] re-organize step 5 --- .../profiles-sync-setup/databricks-profiles-sync.md | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/src/unify/profiles-sync/profiles-sync-setup/databricks-profiles-sync.md b/src/unify/profiles-sync/profiles-sync-setup/databricks-profiles-sync.md index 673f86e0ef..9f578136c6 100644 --- a/src/unify/profiles-sync/profiles-sync-setup/databricks-profiles-sync.md +++ b/src/unify/profiles-sync/profiles-sync-setup/databricks-profiles-sync.md @@ -72,12 +72,11 @@ Next, add SQL warehouse details about your compute resource. ### Step 5: Add the service principal client ID and client secret -> warning "" -> Be sure to note the principal ID and the client secret Databricks generates, as you'll need to enter them in this step. - Segment uses the service principal to access your Databricks workspace and associated APIs. -1. Follow the [Databricks guide for adding a service principal to your account](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-service-principals-in-your-account){:target="_blank"}. This name can be anything, but Segment recommends something that identifies the purpose (for example, "Segment Profiles Sync"). Note the principal application ID that Databricks generates to use in this step. Segment doesn't require `Account admin` or `Marketplace admin` roles. -2. Follow the [Databricks instructions to generate an OAuth secret](https://docs.databricks.com/en/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal){:target="_blank"}. Note the secret generated by Databricks to use in this step. Once you navigate away from this page, the secret is no longer visible. If you lose or forget the secret, delete the existing secret and create a new one. + +#### Service principal client ID + +Follow the [Databricks guide for adding a service principal to your account](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-service-principals-in-your-account){:target="_blank"}.
This name can be anything, but Segment recommends something that identifies the purpose (for example, "Segment Profiles Sync"). Segment doesn't require `Account admin` or `Marketplace admin` roles. The service principal needs the following setup: - [Catalog level privileges](https://docs.databricks.com/en/data-governance/unity-catalog/manage-privileges/privileges.html#general-unity-catalog-privilege-types){:target="_blank"} which include: - USE CATALOG - USE SCHEMA - MODIFY - SELECT - CREATE SCHEMA - CREATE TABLE - Databricks [SQL access entitlement](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-workspace-entitlements-for-a-service-principal){:target="_blank"} at the workspace level.
- [CAN USE permissions](https://docs.databricks.com/en/security/auth-authz/access-control/sql-endpoint-acl.html#sql-warehouse-permissions){:target="_blank"} on the SQL warehouse that will be used for the sync. -#### Client secret -Follow the [Databricks instructions to generate an OAuth secret](https://docs.databricks.com/en/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal){:target="_blank"}. +**Client secret**: Follow the [Databricks instructions to generate an OAuth secret](https://docs.databricks.com/en/dev-tools/authentication-oauth.html#step-2-create-an-oauth-secret-for-a-service-principal){:target="_blank"}. Once you've configured your warehouse, test the connection and click **Next**. From 283346e21f23617d01840ca2027e2068172db886 Mon Sep 17 00:00:00 2001 From: rchinn-segment Date: Mon, 5 Feb 2024 13:45:16 -0800 Subject: [PATCH 29/34] Update catalog tile --- src/_data/catalog/warehouse.yml | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/_data/catalog/warehouse.yml b/src/_data/catalog/warehouse.yml index 2a7f71eae1..02998bd50a 100644 --- a/src/_data/catalog/warehouse.yml +++ b/src/_data/catalog/warehouse.yml @@ -67,8 +67,8 @@ items: url: 'https://cdn.filepicker.io/api/file/Vk6iFlMvQeynbg30ZEtt' categories: - Warehouses -- display_name: Databricks Destination - slug: databricks-destination +- display_name: Databricks + slug: databricks name: catalog/warehouses/databricks description: '' url: connections/storage/catalog/databricks From e6d989de8fece3792f0de44bdae9526434e1ee0b Mon Sep 17 00:00:00 2001 From: rchinn-segment Date: Mon, 5 Feb 2024 13:59:58 -0800 Subject: [PATCH 30/34] [netlify-build] --- .../profiles-sync-setup/databricks-profiles-sync.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/unify/profiles-sync/profiles-sync-setup/databricks-profiles-sync.md b/src/unify/profiles-sync/profiles-sync-setup/databricks-profiles-sync.md index 406134a73b..d5313b3af3 100644 --- a/src/unify/profiles-sync/profiles-sync-setup/databricks-profiles-sync.md +++ b/src/unify/profiles-sync/profiles-sync-setup/databricks-profiles-sync.md @@ -71,7 +71,7 @@ Next, add SQL warehouse details about your compute resource. ### Step 5: Add the service principal client ID and client secret - + Segment uses the service principal to access your Databricks workspace and associated APIs. **Service principal client ID**: Follow the [Databricks guide for adding a service principal to your account](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-service-principals-in-your-account){:target="_blank"}. This name can be anything, but Segment recommends something that identifies the purpose (for example, "Segment Profiles Sync"). Segment doesn't require `Account admin` or `Marketplace admin` roles. 
From 2dfc592e0d6b71f5b7d75e76237a46be30b04a0b Mon Sep 17 00:00:00 2001 From: rchinn-segment Date: Tue, 6 Feb 2024 16:59:57 -0800 Subject: [PATCH 31/34] Small updates --- src/connections/storage/catalog/databricks/index.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/src/connections/storage/catalog/databricks/index.md b/src/connections/storage/catalog/databricks/index.md index 23a05a50ff..89ec2d5e3a 100644 --- a/src/connections/storage/catalog/databricks/index.md +++ b/src/connections/storage/catalog/databricks/index.md @@ -10,7 +10,7 @@ With the Databricks Destination, you can ingest event data directly from Segment This page will help you get started with syncing Segment events into your Databricks Lakehouse. > success "" -> Segment has certified the destination for Databricks on AWS and Databricks on Azure. +> Segment has certified the destination for Databricks on AWS, Azure, and GCP. ## Getting started @@ -21,6 +21,9 @@ Before getting started with the Databricks Destination, note the following prere - Segment creates [managed tables](https://docs.databricks.com/en/data-governance/unity-catalog/create-tables.html#managed-tables){:target="_blank"} in the Unity catalog. The service account needs access to create schemas on the catalog and can delete, drop, or vacuum tables. - Segment supports only [OAuth (M2M)](https://docs.databricks.com/en/dev-tools/auth/oauth-m2m.html){:target="_blank"} for authentication. +> success "" +> Segment recommends that you enable Warehouse Selective Sync. This feature enables customization of collections and properties sent to the warehouse. By syncing only relevant and required data, it reduces sync duration and compute costs, optimizing efficiency compared to syncing everything. Learn more about [Warehouse Selective Sync](/docs/connections/storage/warehouses/warehouse-syncs/#warehouse-selective-sync). + ### Warehouse size A SQL warehouse is required for compute. Segment recommends a warehouse with the following characteristics: From b77f3a78e0de446a18dce628f81f64c8eed740c9 Mon Sep 17 00:00:00 2001 From: rchinn-segment Date: Fri, 9 Feb 2024 15:41:56 -0800 Subject: [PATCH 32/34] Remove GCP --- src/connections/storage/catalog/databricks/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/connections/storage/catalog/databricks/index.md b/src/connections/storage/catalog/databricks/index.md index 89ec2d5e3a..0032f76ca4 100644 --- a/src/connections/storage/catalog/databricks/index.md +++ b/src/connections/storage/catalog/databricks/index.md @@ -10,7 +10,7 @@ With the Databricks Destination, you can ingest event data directly from Segment This page will help you get started with syncing Segment events into your Databricks Lakehouse. > success "" -> Segment has certified the destination for Databricks on AWS, Azure, and GCP. +> Segment has certified the destination for Databricks on AWS and Azure. 
## Getting started From ae58cbe1100fdaa058a249f85ace16589b98ae96 Mon Sep 17 00:00:00 2001 From: rchinn-segment Date: Tue, 13 Feb 2024 12:12:32 -0800 Subject: [PATCH 33/34] Add suggestions from code review --- src/connections/storage/catalog/databricks/index.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/src/connections/storage/catalog/databricks/index.md b/src/connections/storage/catalog/databricks/index.md index 0032f76ca4..836ba34181 100644 --- a/src/connections/storage/catalog/databricks/index.md +++ b/src/connections/storage/catalog/databricks/index.md @@ -78,10 +78,10 @@ Next, add SQL warehouse details about your compute resource. - **Port**: The port number of your SQL warehouse. -### Step 5: Add the service principal client ID and client secret +### Step 5: Add the service principal client ID and OAuth secret > warning "" -> Be sure to note the principal ID and the client secret Databricks generates, as you'll need to enter them in this step. +> Be sure to note the principal ID and the OAuth secret Databricks generates, as you'll need to enter them in this step. Segment uses the service principal to access your Databricks workspace and associated APIs. 1. Follow the [Databricks guide for adding a service principal to your account](https://docs.databricks.com/en/administration-guide/users-groups/service-principals.html#manage-service-principals-in-your-account){:target="_blank"}. This name can be anything, but Segment recommends something that identifies the purpose (for example, "Segment Storage Destinations"). Note the principal application ID that Databricks generates to use in this step. Segment doesn't require Account admin or Marketplace admin roles. From 4ea06d148e113acf0878c513a9a3786c9c6e9c7f Mon Sep 17 00:00:00 2001 From: rchinn-segment Date: Tue, 13 Feb 2024 12:27:02 -0800 Subject: [PATCH 34/34] Add additional link --- src/connections/storage/catalog/databricks/index.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/connections/storage/catalog/databricks/index.md b/src/connections/storage/catalog/databricks/index.md index 836ba34181..c447425b0e 100644 --- a/src/connections/storage/catalog/databricks/index.md +++ b/src/connections/storage/catalog/databricks/index.md @@ -26,7 +26,7 @@ Before getting started with the Databricks Destination, note the following prere ### Warehouse size -A SQL warehouse is required for compute. Segment recommends a warehouse with the following characteristics: +A [SQL warehouse is required](https://docs.databricks.com/en/compute/sql-warehouse/warehouse-behavior.html#sizing-a-serverless-sql-warehouse){:target="_blank"} for compute. Segment recommends a warehouse with the following characteristics: - **Size**: small - **Type** Serverless otherwise Pro - **Clusters**: Minimum of 2 - Maximum of 6
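For reference, the warehouse profile recommended above could be provisioned programmatically along these lines. This is a hedged sketch rather than Segment's procedure: the field names and values follow the public Databricks SQL Warehouses API as best understood, and the host and token are placeholders to verify against the current API reference.

```python
# Sketch: provision a SQL warehouse matching the recommended profile
# (Small, serverless where available, 2-6 clusters). Field names and
# values are assumptions to verify against the SQL Warehouses API docs.
import requests

DATABRICKS_HOST = "https://dbc-example.cloud.databricks.com"
TOKEN = "<api-token>"

payload = {
    "name": "segment-sync",              # illustrative warehouse name
    "cluster_size": "Small",
    "warehouse_type": "PRO",             # Pro type, used when serverless is unavailable
    "enable_serverless_compute": True,   # prefer serverless, per the guidance above
    "min_num_clusters": 2,
    "max_num_clusters": 6,
}

resp = requests.post(
    f"{DATABRICKS_HOST}/api/2.0/sql/warehouses",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=payload,
    timeout=30,
)
resp.raise_for_status()
print("Created warehouse:", resp.json().get("id"))
```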