Unity Catalog in Databricks - Detailed Guide
Introduction
Unity Catalog is Databricks' unified governance solution that simplifies access control, audit logging, data discovery, and
lineage tracking across multiple workspaces and cloud platforms. It provides a consistent security and governance
model at scale for data and AI assets.
Object Model & Hierarchy
Unity Catalog organizes assets in the following hierarchy:
- Metastore: The top-level container for metadata and access policies.
- Catalog: Groups schemas; represents business units or environments.
- Schema (Database): Contains tables, views, volumes, functions, models.
- Objects: Data tables (managed/external), views, ML models, volumes (non-tabular), functions.
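Privileges can be granted at any level of this hierarchy and are inherited by child objects, so a grant on a catalog also
applies to the schemas and objects inside it.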
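Objects in this hierarchy are addressed with a three-part name of the form catalog.schema.object. The following is a minimal sketch of composing such a name in Python; the ObjectName type and the catalog, schema, and table names are hypothetical and used only for illustration.

```python
from dataclasses import dataclass


def quote_identifier(name: str) -> str:
    """Wrap an identifier in backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


@dataclass(frozen=True)
class ObjectName:
    """A catalog.schema.object reference (hypothetical helper, not a Databricks API)."""

    catalog: str
    schema: str
    name: str

    def full_name(self) -> str:
        # Join the three levels with dots, quoting each part separately.
        parts = (self.catalog, self.schema, self.name)
        return ".".join(quote_identifier(part) for part in parts)


# Hypothetical example names.
orders = ObjectName(catalog="sales_prod", schema="finance", name="orders")
print(orders.full_name())
```

Quoting each part separately keeps names that contain hyphens or spaces valid when the result is placed into a SQL statement.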
Governance Models
Unity Catalog supports both centralized and distributed governance models:
- Centralized: Admins control all access and configurations.
- Distributed: Domain teams own individual catalogs and manage access to the data inside them.
Best Practice: Assign admin roles and ownership to groups rather than to individual users, so governance scales as teams change.
Data Access Control
Access policies are defined using ANSI SQL GRANT and REVOKE statements at the catalog, schema, table, and view
level.
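As a rough illustration of the GRANT and REVOKE statements described above, the sketch below assembles them as strings in Python. The helper functions, the group name data-analysts, and the object names are assumptions made for the example; in a Databricks notebook each statement would typically be passed to spark.sql.

```python
# Securable levels named in the text above.
SECURABLE_TYPES = ("CATALOG", "SCHEMA", "TABLE", "VIEW")


def _check_type(securable_type: str) -> str:
    kind = securable_type.upper()
    if kind not in SECURABLE_TYPES:
        raise ValueError(f"unsupported securable type: {securable_type}")
    return kind


def grant_statement(privilege: str, securable_type: str, securable_name: str, principal: str) -> str:
    """Build a GRANT statement; the principal is backtick-quoted."""
    kind = _check_type(securable_type)
    return f"GRANT {privilege} ON {kind} {securable_name} TO `{principal}`"


def revoke_statement(privilege: str, securable_type: str, securable_name: str, principal: str) -> str:
    """Build the matching REVOKE statement."""
    kind = _check_type(securable_type)
    return f"REVOKE {privilege} ON {kind} {securable_name} FROM `{principal}`"


# Hypothetical names: group `data-analysts`, table sales_prod.finance.orders.
# Reading a table requires usage rights on its catalog and schema as well.
read_access = [
    grant_statement("USE CATALOG", "CATALOG", "sales_prod", "data-analysts"),
    grant_statement("USE SCHEMA", "SCHEMA", "sales_prod.finance", "data-analysts"),
    grant_statement("SELECT", "TABLE", "sales_prod.finance.orders", "data-analysts"),
]
for statement in read_access:
    print(statement)  # in a Databricks notebook: spark.sql(statement)
```

Granting to a group rather than to a single user keeps the number of statements small as membership changes.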
You can bind specific catalogs to workspaces for environment isolation (e.g., development, production).
Storage Separation and Hierarchy
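Managed storage locations can be configured at three levels. For managed tables and volumes, Unity Catalog uses the most
specific location that is defined, checked in this order: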
1. Schema-level
2. Catalog-level
3. Metastore-level
Data separation can be achieved using dedicated cloud buckets (e.g., s3://myorg-hr-prod).
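The precedence among the three storage levels can be modeled in a few lines. This is only a sketch of the ordering: the real resolution happens inside Unity Catalog, and the function name and the example paths other than s3://myorg-hr-prod are hypothetical.

```python
from typing import Optional


def resolve_managed_location(
    schema_location: Optional[str],
    catalog_location: Optional[str],
    metastore_location: Optional[str],
) -> str:
    """Return the most specific configured location: schema, then catalog, then metastore."""
    for location in (schema_location, catalog_location, metastore_location):
        if location:
            return location
    raise ValueError("no managed storage location is configured at any level")


# The schema has no location of its own, so the catalog-level bucket is used.
resolved = resolve_managed_location(
    schema_location=None,
    catalog_location="s3://myorg-hr-prod",
    metastore_location="s3://myorg-metastore-root",  # hypothetical bucket
)
print(resolved)
```

Setting a location at the catalog level, as in this example, is one way to keep a business unit's managed data in its own bucket.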
External Locations & Credentials
An external location pairs a cloud storage path with a storage credential that authorizes access to that path. External
locations are then used to define external tables and volumes.
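To make the relationship concrete, the sketch below builds a CREATE EXTERNAL LOCATION statement and an external table definition that points at a path under it. The location, credential, table, and column names are hypothetical, and the storage credential is assumed to exist already.

```python
def _check_url(url: str) -> str:
    # Keep the sketch simple: refuse URLs that would break the SQL literal.
    if "'" in url:
        raise ValueError("URL must not contain a single quote")
    return url


def create_external_location(name: str, url: str, credential: str) -> str:
    """Build a statement that registers a storage path with a credential."""
    return (
        f"CREATE EXTERNAL LOCATION IF NOT EXISTS {name} "
        f"URL '{_check_url(url)}' WITH (STORAGE CREDENTIAL {credential})"
    )


def create_external_table(full_name: str, columns: str, url: str) -> str:
    """Build an external table definition whose data lives at the given path."""
    return f"CREATE TABLE IF NOT EXISTS {full_name} ({columns}) LOCATION '{_check_url(url)}'"


# Hypothetical names throughout; only the bucket comes from the text above.
location_sql = create_external_location("hr_landing", "s3://myorg-hr-prod/landing", "hr_credential")
table_sql = create_external_table(
    "hr_prod.raw.employees", "id INT, name STRING", "s3://myorg-hr-prod/landing/employees"
)
print(location_sql)
print(table_sql)  # in a Databricks notebook: spark.sql(table_sql)
```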
Best Practices:
- Use volumes for governed access to non-tabular files, and tables for SQL-based access to tabular data.
- Avoid direct access to raw paths.
- Register tables instead of ad hoc file path access.
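One way to follow these practices is to address files through a volume path rather than a raw cloud URL. Volume paths take the form /Volumes/catalog/schema/volume/path; the helper and the names in the sketch below are hypothetical.

```python
import posixpath


def volume_path(catalog: str, schema: str, volume: str, *parts: str) -> str:
    """Build a /Volumes/... path for a file stored in a Unity Catalog volume."""
    # Strip slashes so a leading "/" in a part cannot reset the joined path.
    cleaned = [part.strip("/") for part in parts if part.strip("/")]
    return posixpath.join("/Volumes", catalog, schema, volume, *cleaned)


# Hypothetical catalog, schema, volume, and file names.
report = volume_path("hr_prod", "raw", "landing", "2024", "report.pdf")
print(report)
```

Because the path goes through the volume, access to the file is checked against Unity Catalog permissions instead of depending on direct bucket access.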
Lineage and Auditing
Unity Catalog automatically captures lineage as queries run, recording how tables and views are derived from one another
and which notebooks read or write them. Audit logs record user activity, data access, and permission changes. Both are
critical for compliance and operational observability.
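As a hedged example of working with audit data, the sketch below builds a query for recent permission changes. It assumes that system tables are enabled and that audit events are exposed in a table named system.access.audit with the columns shown; the action name updatePermissions is likewise an assumption, so verify all of these against your workspace before relying on the query.

```python
def recent_permission_changes_query(days: int = 7) -> str:
    """Build a query for permission-change events from the last `days` days."""
    if days <= 0:
        raise ValueError("days must be positive")
    return (
        "SELECT event_time, user_identity.email AS user_email, action_name, request_params\n"
        "FROM system.access.audit\n"  # assumed audit system table
        "WHERE service_name = 'unityCatalog'\n"
        "  AND action_name = 'updatePermissions'\n"  # assumed action name
        f"  AND event_date >= date_sub(current_date(), {int(days)})\n"
        "ORDER BY event_time DESC"
    )


query = recent_permission_changes_query(days=7)
print(query)  # in a Databricks notebook: spark.sql(query)
```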