
Manage Data Access for Analytics

©2023 Databricks Inc. — All rights reserved 1


Module Objectives
By the end of this course, you will be able to:

1. Describe Unity Catalog key concepts and how it integrates with the
Databricks platform
2. Access Unity Catalog through clusters and SQL warehouses
3. Create and govern data assets in Unity Catalog
4. Adopt Databricks recommendations into your organization’s Unity
Catalog-based solutions



Module Agenda
Manage Data Access for Analytics with Unity Catalog
Introduction to Unity Catalog
DE 6.1 - Introduction to Unity Catalog
DE 6.2 - Overview of Data Governance
DE 6.3 - Unity Catalog Key Concepts
DE 6.4 - Unity Catalog Architecture
DE 6.5 - Unity Catalog Identities
DE 6.6 - Managing Principals in Unity Catalog
DE 6.7 - Managing Catalog Metastores

Compute Resources in Unity Catalog
DE 6.8 - Compute Resources
DE 6.9 - Creating Compute Resources

Data Access Control in Unity Catalog
DE 6.10 - Data Access Control in Databricks
DE 6.11 - Security Model
DE 6.12 - External Storage
DE 6.13 - Creating and Governing Data
DE 6.14 - Create and Share Tables
DE 6.15 - Create External Tables

Unity Catalog Best Practices
DE 6.16 - Best Practices
DE 6.17 - Data Segregation
DE 6.18 - Identity Management
DE 6.19 - External Storage
DE 6.20 - Upgrade a Table to Unity Catalog
DE 6.21 - Create Views and Limiting Table Access


Introduction to Unity Catalog


Overview of Data Governance


80% of organizations seeking to scale digital
business will fail because they do not take a modern
approach to data and analytics governance

Source: Gartner



Data Governance
Four key functional areas

• Data Access Control: Control who has access to which data
• Data Access Audit: Capture and record all access to data
• Data Lineage: Capture upstream sources and downstream consumers
• Data Discovery: Ability to search for and discover authorized assets


Governance for data, analytics and AI is complex

• Data Lake: Permissions on files. No row and column level permissions. Inflexible when policies change.
• Metadata: Permissions on tables and views. Can be out of sync with data.
• Data Warehouse: Permissions on tables, columns, rows. Different governance model.
• ML and AI: Permissions on ML models, dashboards, features, … Yet another governance model.

Personas shown: data analyst, data engineer, data scientist


Databricks Unity Catalog
Unified governance for data, analytics and AI

[Diagram: Unity Catalog sits between the data analyst, data engineer, and data scientist personas and the data lake, metadata, data warehouse, and ML and AI assets.]


Unity Catalog
Overview

1. Unified governance across clouds: Fine-grained governance for data lakes across clouds - based on open standard ANSI SQL.
2. Unified data and AI assets: Centrally share, audit, secure and manage all data types with one simple interface.
3. Unified existing catalogs: Works in concert with existing data, storage, and catalogs - no hard migration required.

Unity Catalog
Key Capabilities

● Centralized metadata and user management
● Centralized data access controls
● Data access auditing
● Data lineage
● Data search and discovery
● Secure data sharing with Delta Sharing

[Diagram: one Unity Catalog serves multiple Databricks workspaces. Access is managed with GRANT … ON … TO … and REVOKE … ON … FROM … over catalogs, databases (schemas), tables, views, storage credentials, and external locations.]


Unity Catalog
Key Concepts



Metastore
Unity Catalog metastore elements

[Diagram: metastore element hierarchy. The labels Control Plane and Cloud Storage also appear.]

Metastore
• Storage Credential
• External Location
• Catalog
  • Schema (Database)
    • Table
    • View
    • Function
• Share
• Recipient


Metastore
Accessing legacy Hive metastore

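[Diagram: alongside its own catalogs (Catalog 1, Catalog 2), the metastore exposes the workspace's legacy Hive metastore as a catalog named hive_metastore. Each catalog contains schemas (databases), which hold tables, views, and functions.]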


Catalog
Top-level container for data objects



Catalog
Three-level namespace

Traditional SQL two-level namespace:
SELECT * FROM schema.table

Unity Catalog three-level namespace:
SELECT * FROM catalog.schema.table
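
To make the difference between the two namespaces concrete, here is a minimal Python sketch that turns a two-level name into a three-level one. The helper `qualify` and the default catalog name `main` are illustrative assumptions, not part of any Databricks API, and the sketch ignores quoted identifiers that contain dots.

```python
def qualify(name: str, default_catalog: str = "main") -> str:
    """Return a three-level catalog.schema.table name.

    A two-level schema.table name is prefixed with default_catalog
    (a hypothetical default); a three-level name is returned unchanged.
    """
    parts = name.split(".")
    if len(parts) == 3:
        return name
    if len(parts) == 2:
        return f"{default_catalog}.{name}"
    raise ValueError(f"expected schema.table or catalog.schema.table, got {name!r}")


print(f"SELECT * FROM {qualify('sales.orders')}")
```

Legacy tables stay addressable the same way, through the hive_metastore catalog.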



Data Objects
Schema (database), tables, views, functions

[Diagram: the metastore hierarchy, with schemas (databases) containing tables, views, and functions. A table is either a managed table or an external table.]


External Storage
Storage credentials and external locations



Delta Sharing
Shares and recipients



Unity Catalog
Architecture



Architecture

Before Unity Catalog: each workspace (Workspace 1, Workspace 2) has its own user/group management, metastore, access controls, and compute resources.

With Unity Catalog: user/group management, the metastore, and access controls are centralized in Unity Catalog. Each workspace (Workspace 1, Workspace 2) keeps only its own compute resources.


Query Lifecycle
Unity Catalog Security Model

1. Send query
2. Check namespace, metadata and grants
3. Assume IAM role or service principal
4. Return short-lived token and signed URL
5. Request data from URL with short-lived token
6. Return data
7. Enforce policies
8. Send result

Components shown: principal, compute, audit log, cloud storage


Compute Resources and Unity Catalog


Compute Resources for Unity Catalog
Cluster Access Mode

Modes supporting UC
• Single user: Multiple language support, not shareable
• Shared: Shareable, Python and SQL, legacy table ACLs

Modes not supporting UC
• No isolation shared: Multiple language support
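
As a quick reference, the mode list above can be captured in a small lookup. This is only a sketch of the slide's content; the dictionary `ACCESS_MODES` and the function `supports_unity_catalog` are illustrative names, not a Databricks API.

```python
# Access modes as listed on the slide; keys are lowercased mode names.
ACCESS_MODES = {
    "single user": {"unity_catalog": True, "languages": "all"},
    "shared": {"unity_catalog": True, "languages": "Python and SQL"},
    "no isolation shared": {"unity_catalog": False, "languages": "all"},
}


def supports_unity_catalog(mode: str) -> bool:
    """Return True if the named cluster access mode supports Unity Catalog."""
    return ACCESS_MODES[mode.strip().lower()]["unity_catalog"]


for mode in ACCESS_MODES:
    print(mode, supports_unity_catalog(mode))
```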



Cluster Access Mode
Feature matrix

Columns: Access mode, Supported languages, Shareable, Legacy table ACL, Credential passthrough, RDD API, DBFS Fuse mounts, Init scripts and libraries, Dynamic views, Machine learning

No Isolation Shared: All; ⬤ ⬤ ⬤ ⬤ ⬤
Single user: All; ⬤ ⬤ ⬤ ⬤ ⬤
Shared: SQL, Python; ⬤ ⬤ ⬤


Roles and Identities in Unity Catalog


Unity Catalog
Roles

Roles shown: Cloud Admin, Identity Admin, Account Admin, Metastore Admin, Data Owner, Workspace Admin

Cloud Admin
• Manage underlying cloud resources
• Storage accounts/buckets
• IAM role/service principals/managed identities

Identity Admin
• Manage users and groups in the identity provider (IdP)
• Provision into account (with account admin)


Unity Catalog
Roles

Account Admin
• Create or delete metastores, assign metastores to workspaces
• Manage users and groups, integrate with IdP
• Full access to all data objects

Metastore Admin
• Create or drop, grant privileges on, and change ownership of catalogs and other data objects

Data Owner - owns data objects they created
• Create nested objects, grant privileges on, and change ownership of owned objects


Unity Catalog
Roles

Workspace Admin
• Manages permissions on workspace assets
• Restricts access to cluster creation
• Adds or removes users
• Elevates users' permissions
• Grants privileges to others
• Changes job ownership


Unity Catalog
Identities
• User
• Account Administrator
• Service Principal
• Service Principal with administrative privileges

Example user ([email protected]): First name, Last name, Password, Admin role
Example service principal (terraform): App ID (GUID/UUID), Name, Admin role


Unity Catalog
Identities
• Groups

allusers

analysts developers

[email protected] terraform



Unity Catalog
Identity Federation

[Diagram: an account identity ([email protected]) defined at the account level appears as a workspace identity in both Workspace 1 and Workspace 2.]


Data Access Control in Unity Catalog


Security model

Principals: Account admin, Metastore admin, Data owner, User, Service principal, Group

Privileges: CREATE, USAGE, SELECT, MODIFY, CREATE TABLE, READ FILES, WRITE FILES, EXECUTE

Securables: Catalog, Schema, Table, View, Function, Storage credential, External location, Share, Recipient




Privilege Recap
Tables

• Querying tables (SELECT)
• Modifying tables (MODIFY)
  • Data (INSERT, DELETE)
  • Metadata (ALTER)
• Traversing containers (USAGE)

Privileges required along the hierarchy:
Metastore
✓ USAGE on the Catalog
✓ USAGE on the Schema
✓ SELECT/MODIFY on the Table
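
The rule above (USAGE on every container, plus the table-level privilege) can be sketched as a toy check in Python. This is a simplified model for illustration only: the `Grants` class and `can_access` function are hypothetical, and the sketch ignores ownership, admin roles, and group membership.

```python
from dataclasses import dataclass, field


@dataclass
class Grants:
    """Privileges one principal holds, keyed by the securable's full name."""
    held: dict = field(default_factory=dict)

    def grant(self, securable: str, privilege: str) -> None:
        self.held.setdefault(securable, set()).add(privilege)

    def has(self, securable: str, privilege: str) -> bool:
        return privilege in self.held.get(securable, set())


def can_access(grants: Grants, table: str, privilege: str = "SELECT") -> bool:
    """Check USAGE on the catalog and schema, then the privilege on the table."""
    catalog, schema, _ = table.split(".")
    return (
        grants.has(catalog, "USAGE")
        and grants.has(f"{catalog}.{schema}", "USAGE")
        and grants.has(table, privilege)
    )


g = Grants()
g.grant("main", "USAGE")
g.grant("main.sales", "USAGE")
g.grant("main.sales.orders", "SELECT")
print(can_access(g, "main.sales.orders"))            # SELECT is granted
print(can_access(g, "main.sales.orders", "MODIFY"))  # MODIFY is not
```

Holding SELECT on the table alone is not enough in this model: removing either USAGE grant makes the check fail.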



Privilege Recap
Views

• Abstract complex queries
  • Aggregations
  • Transformations
  • Joins
  • Filters
• Enhanced table access control
• Querying views (SELECT)
• Traversing containers (USAGE)

Privileges required along the hierarchy:
Metastore
✓ USAGE on the Catalog
✓ USAGE on the Schema
✓ SELECT on the View (which reads from the underlying Table)


Privilege Recap
Functions

• Provide custom code via user-defined functions
• Using functions (EXECUTE)
• Traversing containers (USAGE)

Privileges required along the hierarchy:
Metastore
✓ USAGE on the Catalog
✓ USAGE on the Schema
✓ EXECUTE on the Function


Dynamic Views

• Limit access to columns: Omit column values from output
• Limit access to rows: Omit rows from output
• Data Masking: Obscure data (for example, ●●●●●●@databricks.com)

Can be conditional on a specific user/service principal or group membership through Databricks-provided functions
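
The masking behaviour described above can be illustrated with a small Python model. It only mimics the idea: in Databricks the condition would live in the view's SQL and use Databricks-provided functions such as `current_user()` or `is_account_group_member()`. The function `mask_email`, the group name `analysts`, and the sample address are hypothetical.

```python
def mask_email(email: str, viewer_groups: set, allowed_group: str = "analysts") -> str:
    """Return the email unchanged for members of allowed_group, masked otherwise.

    Masking replaces each character before the @ with a dot and keeps the
    domain, as in the slide's example.
    """
    if allowed_group in viewer_groups:
        return email
    local, _, domain = email.partition("@")
    return "●" * len(local) + "@" + domain


print(mask_email("jane@databricks.com", {"analysts"}))
print(mask_email("jane@databricks.com", {"developers"}))
```

Row and column restrictions follow the same pattern: the same membership test decides whether a row is returned or a column value is replaced.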



Creating New Objects

• Creating new objects (CREATE)
• Traversing containers (USAGE)

Privileges required along the hierarchy:
Metastore
✓ USAGE on the Catalog
✓ USAGE and ✓ CREATE on the Schema
New table, view or function


Deleting Objects

DROP objects
Metastore

Catalog

Schema

Table, view or function



Unity Catalog External Storage


Storage Credentials and External Locations

Storage Credential
Enables Unity Catalog to connect to an external cloud storage
Examples include:
• IAM role for AWS S3
• Service principal for Azure Storage

External Location
Cloud storage path + storage credential
• Self-contained object for accessing specific locations in cloud storage
• Fine-grained control over external storage


Storage Credentials and External Locations
Access Control

Storage Credential
• CREATE TABLE: Create an External Table directly using this Storage Credential
• READ FILES: Read files directly using this Storage Credential
• WRITE FILES: Write files directly using this Storage Credential

External Location
• CREATE TABLE: Create an External Table from files governed by this External Location
• READ FILES: Read files governed by this External Location
• WRITE FILES: Write files governed by this External Location
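
As a rough illustration of how these privileges are granted, the Python sketch below builds the Databricks SQL statements that define an external location and grant privileges on it. The helper `external_location_sql` and every name passed to it (location, URL, credential, group) are hypothetical. Privilege names follow the slides and may differ in current Databricks releases.

```python
def external_location_sql(name, url, credential, principal, privileges=("READ FILES",)):
    """Build SQL that creates an external location and grants privileges on it.

    Returns a list of statement strings; nothing is executed here.
    """
    statements = [
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS `{name}` "
        f"URL '{url}' WITH (STORAGE CREDENTIAL `{credential}`)"
    ]
    for privilege in privileges:
        statements.append(
            f"GRANT {privilege} ON EXTERNAL LOCATION `{name}` TO `{principal}`"
        )
    return statements


# All names below are placeholders for illustration.
for stmt in external_location_sql(
    "shared_tables", "s3://example-bucket/shared/tables", "example_credential",
    "analysts", privileges=("READ FILES", "WRITE FILES"),
):
    print(stmt + ";")
```

Granting on the external location rather than on the storage credential keeps access scoped to one path, which matches the fine-grained control described above.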



Managed Tables

[Diagram: Metastore, Catalog, Schema, Managed table. The managed table's data is kept in metastore storage.]


External Tables

[Diagram: a schema holds both a managed table and an external table. The managed table's data is kept in metastore storage. The external table's data is kept in external storage, reached through an external location and its storage credential.]


Unity Catalog Patterns and Best Practices


UC Patterns & Best Practices
1 metastore per region

[Diagram: Region A and Region B each have a single metastore, shared by that region's dev, staging, and prod workspaces.]


UC Patterns & Best Practices
Share data with Delta Sharing

[Diagram: Region A and Region B each have a single metastore, shared by that region's dev, staging, and prod workspaces.]

Share tables from Region A with Region B

UC Patterns & Best Practices
Data Segregation

• Use catalogs (not metastores) to segregate data
• Apply permissions appropriately

For example, grant to group B:
• USAGE on catalog B
• USAGE on all applicable schemas in catalog B
• SELECT/MODIFY on applicable tables

[Diagram: one metastore containing Catalog A and Catalog B, Schema A and Schema B, and Table 1, Table 2, and Table 3. Check marks show USAGE at the catalog level, USAGE at the schema level, and SELECT at the table level.]
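
The group B example can be sketched as a Python helper that emits the corresponding GRANT statements. The function `segregation_grants` and the object names (`catalog_b`, `schema_b`, `table_3`, `group_b`) are illustrative assumptions. USAGE is the privilege name used on these slides and may be named differently in current Databricks releases.

```python
def segregation_grants(group, catalog, schemas, tables, table_privilege="SELECT"):
    """Build GRANT statements giving a group access to tables in one catalog.

    schemas is a list of schema names; tables is a list of (schema, table)
    pairs. Returns statement strings; nothing is executed here.
    """
    statements = [f"GRANT USAGE ON CATALOG `{catalog}` TO `{group}`"]
    for schema in schemas:
        statements.append(
            f"GRANT USAGE ON SCHEMA `{catalog}`.`{schema}` TO `{group}`"
        )
    for schema, table in tables:
        statements.append(
            f"GRANT {table_privilege} ON TABLE `{catalog}`.`{schema}`.`{table}` TO `{group}`"
        )
    return statements


for stmt in segregation_grants(
    "group_b", "catalog_b", ["schema_b"], [("schema_b", "table_3")]
):
    print(stmt + ";")
```

Because group B receives no grants on catalog A, its members cannot traverse into it.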



UC Patterns & Best Practices
Data Segregation

Catalogs within a metastore can be scoped by:
• Environment scope: dev, staging, prod
• Business unit + environment scope: bu1_dev, bu1_staging, bu1_prod
• Sandboxes: team1_sandbox, team2_sandbox


UC Patterns & Best Practices
Identity Management

Account-level Identities
• Manage all identities at the account-level
• Enable UC for workspaces to enable identity federation

Groups
• Use groups rather than users to assign access and ownership to securable objects

Service Principals
• Use service principals to run production jobs


UC Patterns & Best Practices
Storage Credentials and External Locations

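[Diagram: one storage credential covers the storage root (/), with one external location per directory.]

• External location 1: users/user1/
• External location 2: users/user2/
• External location 3: shared/tables/
• External location 4: shared/tmp/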
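
To show how this layout scopes access, here is a toy Python lookup that finds which external location governs a given path. It is a sketch of the idea only, using simple prefix matching. The dictionary `LOCATIONS`, the function `governing_location`, and the location names are hypothetical, and this is not how Unity Catalog resolves paths internally.

```python
# Directory prefix governed by each external location, as laid out on the slide.
LOCATIONS = {
    "external_location_1": "users/user1/",
    "external_location_2": "users/user2/",
    "external_location_3": "shared/tables/",
    "external_location_4": "shared/tmp/",
}


def governing_location(path: str):
    """Return the name of the external location whose prefix contains path."""
    for name, prefix in LOCATIONS.items():
        if path.startswith(prefix):
            return name
    return None  # No external location governs this path.


print(governing_location("users/user2/report.csv"))
print(governing_location("users/user3/report.csv"))
```

A path outside every prefix resolves to no location, so only the storage credential itself would reach it. That is the reason to grant on narrow external locations instead of on the credential.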



Unity Catalog Key Capabilities


Centralized metadata and user management
Unity Catalog Architecture



Centralized Access Controls
Centrally grant and manage access permissions across workloads

Using ANSI SQL DCL

GRANT <privilege> ON <securable_type> <securable_name> TO `<principal>`

GRANT SELECT ON iot.events TO engineers

Using UI
• Choose permission level
• Sync groups from your identity provider

‘Table’ = collection of files in S3/ADLS

Three level namespace
Seamless access to your existing metastores

[Diagram: Unity Catalog exposes hive_metastore (legacy) alongside Catalog 1 and Catalog 2. hive_metastore contains the default database with the customers table. The other catalogs contain Database 1 and Database 2, which hold managed tables, external tables, and views.]

SELECT * FROM main.student.example; -- <catalog>.<database>.<table>

SELECT * FROM hive_metastore.default.customers;

Managed Data Sources & External Locations
Simplify data access management across clouds

[Diagram: a user queries through a cluster or SQL warehouse. Unity Catalog applies access control, writes to the audit log, and uses external locations and credentials to reach cloud storage (S3, ADLS, GCS).]

• Managed data sources: managed tables in a managed container / bucket
• External locations: external tables in an external container / bucket, and files in a container / bucket

Automated lineage for all workloads
End-to-end visibility into how data flows and is consumed in your organization

● Auto-capture runtime data lineage on a Databricks cluster or SQL warehouse
● Track lineage down to the table and column level
● Leverage common permission model from Unity Catalog
● Lineage across tables, dashboards, workflows, notebooks

Lineage flow - How it works

1. Workspace cluster / SQL warehouse (ETL / job, ad-hoc, DLT)
● Code (any language) is submitted to a cluster or SQL warehouse, or DLT executes a data flow

2. Lineage service
● Lineage service analyzes logs emitted from the cluster, and pulls metadata from DLT
● Assembles column and table level lineage

3. Table and column lineage
● Explore lineage in UI: presented to the end user graphically in Databricks
● Lineage can be exported via API and imported into other tools
● External catalogs: Alation, Microsoft Purview, Collibra (labeled FY23Q4 on the slide)

Built-in search and discovery
Accelerate time to value with low latency data discovery

● UI to search for data assets stored in Unity Catalog
● Unified UI across DSML + DBSQL
● Leverage common permission model from Unity Catalog

An open standard for secure sharing of data assets
Unity Catalog - Architecture

[Diagram: components shown are the audit log, account-level user management, the metastore, lineage explorer, data explorer, access control, the ACL store, storage credentials, the Databricks workspace and its user, and containers / buckets in cloud storage (S3, ADLS, GCS).]

* Unity Catalog will support any data format (table or raw files)