DE Module 6: Manage Data Access for Analytics
1. Describe Unity Catalog key concepts and how it integrates with the
Databricks platform
2. Access Unity Catalog through clusters and SQL warehouses
3. Create and govern data assets in Unity Catalog
4. Adopt Databricks recommendations into your organization’s Unity
Catalog-based solutions
DE 6.14 - Create and Share Tables
DE 6.15 - Create External Tables
DE 6.20 - Upgrade a Table to Unity Catalog
DE 6.21 - Create Views and Limiting Table Access
Source: Gartner
• Control who has access to which data
• Capture and record all access to data
• Capture upstream sources and downstream consumers
• Ability to search for and discover authorized assets
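[Figure: without Unity Catalog, the data lake, the data warehouse, and ML and AI each keep their own metadata, and permissions on ML models, dashboards, features, … add yet another governance model for the data engineer, data analyst, and data scientist. With Unity Catalog, a single layer sits between these users and the data lake, data warehouse, and ML and AI.]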
1. Unified governance across clouds
Fine-grained governance for data lakes across clouds, based on open standard ANSI SQL.
2. Unified data and AI assets
Centrally share, audit, secure and manage all data types with one simple interface.
3. Unified existing catalogs
Works in concert with existing data, storage, and catalogs, with no hard migration required.
©2023 Databricks Inc. — All rights reserved
Unity Catalog
Key Capabilities
[Figure: Unity Catalog object model. The metastore sits in the control plane and is attached to workspaces. A metastore contains storage credentials, external locations, catalogs, shares, and recipients. A catalog contains schemas (databases). A schema contains tables (managed or external), views, and functions. Storage credentials and external locations refer to cloud storage.]
[Figure: query lifecycle across Compute, Metastore, Audit Log, and Cloud Storage. Labeled steps: 1 Send query; 2 Check namespace, metadata and grants; 4 Return short-lived token and signed URL; 6 Return data; 7 Enforce policies.]
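[Table: cluster access modes. Columns: Access mode, Supported languages, Legacy table ACL, Credential passthrough, Shareable, RDD API, DBFS Fuse mounts, Init scripts and libraries, Dynamic views, Machine learning. Rows: No Isolation Shared (languages: All; five features marked ⬤) and Shared (languages: SQL, Python; three features marked ⬤).]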
Roles: Account Admin, Metastore Admin, Data Owner, Workspace Admin
Metastore Admin
• Create or drop, grant privileges on, and change ownership of catalogs and other data objects
[Figure: account identities. Users (identified by email address) and terraform appear as account identities, alongside the groups allusers, analysts, and developers.]
Principals: User, Service principal, Group
Privileges: CREATE, USAGE, SELECT, MODIFY, CREATE TABLE, READ FILES, WRITE FILES, EXECUTE
Securables: Catalog, Schema, Table, View, Function, Storage credential, External location, Share, Recipient
✓ USAGE Schema
✓ SELECT/MODIFY Table
✓ USAGE Schema
✓ EXECUTE Function
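The checkmarks above can be read as a rule: holding a privilege on a table is not enough by itself, because the principal also needs USAGE on the containers that hold it. The following is a minimal Python sketch of that rule, assuming a toy in-memory grant store. The `Grants` class, the `analysts` principal, and the `main.sales.orders` name are hypothetical illustrations and are not a Databricks API.

```python
class Grants:
    """Toy grant store mapping (principal, securable) to a set of privileges."""

    def __init__(self):
        self._grants = {}

    def grant(self, privilege, securable, principal):
        self._grants.setdefault((principal, securable), set()).add(privilege)

    def has(self, privilege, securable, principal):
        return privilege in self._grants.get((principal, securable), set())


def can_access_table(grants, principal, full_name, privilege="SELECT"):
    """Require USAGE on the catalog and schema, plus `privilege` on the table."""
    catalog, schema, _table = full_name.split(".")
    return (
        grants.has("USAGE", catalog, principal)
        and grants.has("USAGE", f"{catalog}.{schema}", principal)
        and grants.has(privilege, full_name, principal)
    )


# Hypothetical grants for a group named "analysts".
grants = Grants()
grants.grant("USAGE", "main", "analysts")
grants.grant("USAGE", "main.sales", "analysts")
grants.grant("SELECT", "main.sales.orders", "analysts")

print(can_access_table(grants, "analysts", "main.sales.orders"))            # True
print(can_access_table(grants, "analysts", "main.sales.orders", "MODIFY"))  # False
```

In this sketch, removing either USAGE grant makes the first check fail even though SELECT on the table is still present.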
• Omit column values from output
• Omit rows from output
• Obscure data, for example ●●●●●●@databricks.com
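To make the "obscure data" case concrete, here is a small Python sketch of the masking logic only. It assumes the decision about who may see the real value has already been made and is passed in as a boolean. The function name `mask_email`, the `may_see_pii` flag, and the sample address are hypothetical. In Unity Catalog this logic would live in a view definition rather than in Python.

```python
def mask_email(email: str, may_see_pii: bool, mask_char: str = "●") -> str:
    """Return the email unchanged for privileged readers, else hide the local part."""
    if may_see_pii:
        return email
    local, sep, domain = email.partition("@")
    if not sep:
        # Value is not email-shaped: mask all of it.
        return mask_char * len(email)
    return mask_char * len(local) + sep + domain


# Hypothetical row, used only to illustrate the output shape.
rows = [{"name": "Ada", "email": "ada123@databricks.com"}]
masked = [{**row, "email": mask_email(row["email"], may_see_pii=False)} for row in rows]
print(masked[0]["email"])  # ●●●●●●@databricks.com
```

Omitting a column value or a whole row follows the same pattern: the condition on the reader decides whether the real value, a placeholder, or nothing is returned.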
✓ USAGE Catalog
✓ USAGE and ✓ CREATE Schema
DROP objects
[Figure: Metastore > Catalog > Schema]
Storage Credential
• CREATE TABLE: Create an External Table directly using this Storage Credential
• READ FILES: Read files directly using this Storage Credential
• WRITE FILES: Write files directly using this Storage Credential
External Location
• CREATE TABLE: Create an External Table from files governed by this External Location
• READ FILES: Read files governed by this External Location
• WRITE FILES: Write files governed by this External Location
[Figure: a managed table under Metastore > Catalog > Schema keeps its data in metastore storage.]
[Figure: metastores shown by region (Region A).]
[Figure: ✓ SELECT shown against Table 1, Table 2, and Table 3.]
[Figure: example catalogs bu1_dev and Sandboxes (team1_sandbox, team2_sandbox), shown with developers and terraform.]
[Figure: one storage credential covering the storage root (/), with external locations defined on its directories:]
users/
  user1/    External location 1
  user2/    External location 2
tables/     External location 3
shared/
  tmp/      External location 4
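As an illustration of the layout above, the sketch below finds which external location governs a given file path by matching URL prefixes. It is a toy model under stated assumptions: the bucket name `example-bucket`, the `location_N` keys, and the directory-to-location mapping (as read from the figure) are hypothetical, and the locations are assumed not to overlap.

```python
from typing import Optional

# Hypothetical bucket; directory names follow the layout in the figure.
EXTERNAL_LOCATIONS = {
    "location_1": "s3://example-bucket/users/user1/",
    "location_2": "s3://example-bucket/users/user2/",
    "location_3": "s3://example-bucket/tables/",
    "location_4": "s3://example-bucket/shared/tmp/",
}


def governing_location(path: str) -> Optional[str]:
    """Return the name of the external location whose URL is a prefix of `path`."""
    for name, url in EXTERNAL_LOCATIONS.items():
        if path.startswith(url):
            return name
    return None  # no external location covers this path


print(governing_location("s3://example-bucket/users/user1/notes.csv"))  # location_1
print(governing_location("s3://example-bucket/shared/other/file.csv"))  # None
```

A path with no governing location, such as the second call, would have no READ FILES or WRITE FILES grant to rely on in this model.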
• Choose permission level
• ‘Table’ = collection of files in S3/ADLS
• Sync groups from your identity provider
Three level namespace
Seamless access to your existing metastores
[Figure: Unity Catalog exposes hive_metastore (legacy) next to Catalog 1 and Catalog 2. Databases sit one level below the catalogs: default (database), Database 1, Database 2.]
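The three-level namespace addresses every table as catalog.schema.table, with the legacy metastore appearing as the catalog hive_metastore. The Python sketch below shows how shorter names could be expanded to the full form. The `resolve` helper, the `TableName` type, the default catalog and schema values, and the table name `orders` are assumptions made for illustration, not Databricks behavior or API.

```python
from typing import NamedTuple


class TableName(NamedTuple):
    catalog: str
    schema: str
    table: str


def resolve(name: str, current_catalog: str = "hive_metastore",
            current_schema: str = "default") -> TableName:
    """Expand a one-, two-, or three-part name to catalog.schema.table."""
    parts = name.split(".")
    if len(parts) == 3:
        return TableName(*parts)
    if len(parts) == 2:
        # schema.table: fill in the current catalog (assumed default).
        return TableName(current_catalog, *parts)
    if len(parts) == 1:
        # bare table: fill in both the current catalog and schema.
        return TableName(current_catalog, current_schema, parts[0])
    raise ValueError(f"too many name parts: {name!r}")


print(resolve("catalog_1.database_1.orders"))
print(resolve("default.orders"))
```

Under these assumed defaults, an existing two-part reference such as `default.orders` keeps pointing at the legacy metastore, which is the sense in which access to existing metastores stays seamless.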
Managed Data Sources & External Locations
Simplify data access management across clouds
[Figure: a user on a cluster or SQL warehouse reaches cloud storage (S3, ADLS, GCS) through Unity Catalog, which provides access control, the audit log, and external locations & credentials. Managed data sources: managed tables in a managed container / bucket. External locations: external tables in an external container / bucket, and files in a container / bucket.]
Automated lineage for all workloads
End-to-end visibility into how data flows and is consumed in your organization
Lineage flow - How it works
[Figure: ETL / job, ad-hoc, and DLT workloads run on a workspace cluster / SQL warehouse, which feeds the lineage service. The service produces table and column lineage that can be explored in the UI or passed to external catalogs (Alation, Microsoft Purview, Collibra; labeled FY23Q4).]
● Code (any language) is submitted to a cluster or SQL warehouse, or DLT* executes a data flow
● Lineage service analyzes logs emitted from the cluster, and pulls metadata from DLT
● Assembles column and table level lineage
● Presented to the end user graphically in Databricks
● Lineage can be exported via API and imported into other tools
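The "assembles table level lineage" step can be pictured as building a graph from individual read and write observations. Below is a minimal Python sketch of that idea, assuming lineage arrives as (source table, target table) pairs. The event list, the table names, and the `all_upstream` helper are hypothetical and do not reflect the lineage service's actual data model or API.

```python
from collections import defaultdict

# Hypothetical lineage events: (source table, target table).
events = [
    ("raw.orders", "silver.orders"),
    ("raw.customers", "silver.customers"),
    ("silver.orders", "gold.revenue"),
    ("silver.customers", "gold.revenue"),
]

# Map each target table to the set of tables it was directly built from.
upstream = defaultdict(set)
for source, target in events:
    upstream[target].add(source)


def all_upstream(table):
    """Collect every direct and indirect source of `table`."""
    seen, stack = set(), [table]
    while stack:
        for parent in upstream.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


print(sorted(all_upstream("gold.revenue")))
# ['raw.customers', 'raw.orders', 'silver.customers', 'silver.orders']
```

Reversing the direction of the map gives downstream consumers in the same way, which is the other half of the lineage question.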
Built-in search and discovery
Accelerate time to value with low latency data discovery
An open standard for secure sharing of data assets
Unity Catalog - Architecture
[Figure: a user in a Databricks workspace accesses a container / bucket* in cloud storage (S3, ADLS, GCS).]
* Unity Catalog will support any data format (table or raw files)
©2021 Databricks Inc. — All rights reserved