50 Essential Topics to Master Delta Live Tables (DLT)
1. What are Delta Live Tables (DLT)?
2. Benefits of DLT vs traditional pipelines
3. Batch vs Streaming pipelines in DLT
4. Architecture of Delta Live Tables
5. Difference between DLT and Delta Lake
6. How DLT handles schema evolution
7. What is a declarative pipeline?
8. DLT vs Auto Loader vs Structured Streaming
9. Delta Lake features used in DLT (e.g., time travel, schema enforcement)
10. DLT supported formats (CSV, JSON, Avro, Parquet, etc.)
11. LIVE keyword and its purpose
12. Tables vs Views in DLT
13. Materialized views vs streaming tables
14. How DLT handles CDC (Change Data Capture)
15. Medallion Architecture in DLT (Bronze/Silver/Gold)
16. Incremental processing in DLT
17. Table expectations (expect_or_drop, expect_or_fail)
18. Auto Loader integration in DLT
19. DAG generation and lineage tracking
20. Time travel in DLT tables
21. Creating a DLT pipeline in Databricks UI
22. Creating a DLT pipeline with Python code
23. YAML configuration for DLT pipelines
24. SQL vs Python syntax in DLT
25. Using notebooks in DLT pipelines
26. Managing dependencies between tables
27. Pipeline modes: Continuous vs Triggered
28. Using comments and metadata
29. Using @dlt.table and @dlt.view decorators (see the Python sketch after this list)
30. Logging in DLT pipelines
31. Enabling CDC using apply_changes (sketched after this list)
32. Using streaming and trigger_once
33. Handling Slowly Changing Dimensions (SCD) in DLT
34. Dynamic schema inference and enforcement
35. Data Quality with DLT expectations
36. Version control and pipeline history
37. Integration with Unity Catalog
38. Performance tuning and optimization
39. Using DLT for real-time dashboards
40. Monitoring pipeline health & metrics
41. Data lineage and audit trails
42. Access control for DLT pipelines
43. Managing data retention and GDPR compliance
44. Role of Unity Catalog in governance
45. Parameterization and secrets in DLT
46. DLT cost management and pricing
47. Best practices for organizing DLT projects
48. Debugging and troubleshooting DLT errors
49. Alerts and pipeline failure notifications
50. Deploying DLT in production environments
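For topics 11, 17, 18, and 29 above, here is a minimal sketch of a Python DLT pipeline (bronze to silver). The landing path, table names, and columns (order_id, amount) are illustrative assumptions, not a definitive implementation.

```python
# Minimal DLT sketch: Auto Loader ingestion, expectations, and table dependencies.
# The path, table names, and columns are illustrative assumptions.
import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw orders ingested incrementally with Auto Loader")
def orders_bronze():
    return (
        spark.readStream.format("cloudFiles")        # Auto Loader source
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/landing/orders/")       # hypothetical landing path
    )

@dlt.table(comment="Cleaned orders with data quality expectations")
@dlt.expect_or_drop("valid_order_id", "order_id IS NOT NULL")  # drop failing rows
@dlt.expect("positive_amount", "amount > 0")                   # record violations only
def orders_silver():
    # dlt.read_stream declares the dependency so DLT can build the DAG;
    # the SQL equivalent would reference FROM LIVE.orders_bronze
    return (
        dlt.read_stream("orders_bronze")
        .withColumn("ingested_at", F.current_timestamp())
    )
```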
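For topics 14, 31, and 33, the sketch below applies a CDC feed with dlt.apply_changes stored as SCD Type 2; the source table, key, and sequencing column are assumptions.

```python
# Sketch of CDC handling with apply_changes, stored as SCD Type 2 (history kept).
# Source table, keys, and sequencing column are illustrative assumptions.
import dlt
from pyspark.sql.functions import expr

dlt.create_streaming_table("customers_silver")

dlt.apply_changes(
    target="customers_silver",                 # streaming table created above
    source="customers_cdc_bronze",             # hypothetical upstream CDC feed
    keys=["customer_id"],                      # merge key
    sequence_by="event_timestamp",             # ordering of change events
    apply_as_deletes=expr("operation = 'DELETE'"),
    except_column_list=["operation"],          # drop the CDC metadata column
    stored_as_scd_type=2,                      # 1 = overwrite, 2 = keep history
)
```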
50 Essential Topics to Master Databricks Workflows
1. What are Databricks Workflows?
2. Use cases of Workflows (ETL, ML pipelines, Alerts)
3. Differences between Workflows and Delta Live Tables
4. Types of tasks (Notebook, JAR, Python, SQL, dbt)
5. Workflows UI overview
6. Creating your first workflow job
7. Difference between manual and scheduled jobs
8. Jobs vs Tasks in Workflows
9. Cluster types in workflows (job vs all-purpose)
10. Passing parameters to tasks
11. Multi-task workflows
12. Task dependencies and sequencing
13. Reusing code between tasks
14. Task retries and timeout settings
15. Running a notebook task
16. Running a Python script task
17. Running a dbt task
18. Running a SQL task
19. Setting and using task parameters
20. Using dbutils.jobs.taskValues to pass values (sketched after this list)
21. Cron expressions for scheduling
22. Triggering workflows on file arrival (Auto Loader integration)
23. Event-driven workflows (e.g., Unity Catalog, webhooks)
24. Triggering jobs from REST API
25. Triggering jobs from external tools (Airflow, Azure Data Factory)
26. If/Else branching in workflows (via task outputs)
27. Looping and dynamic task generation (workarounds)
28. Job run timeout and concurrency settings
29. Notifications and alerting (email, webhook, Slack)
30. Custom retry logic using notebook code
31. Monitoring job runs via UI
32. Debugging failed jobs (logs, traceback)
33. Logging best practices in notebooks
34. Viewing job run history and statuses
35. Exporting logs from workflow runs
36. Workflows API (create, run, list, cancel; see the run-now sketch after this list)
37. CI/CD integration with Workflows
38. Git integration for notebooks in Workflows
39. Using Workflows with MLflow models
40. Triggering jobs from external systems (Jenkins, GitHub Actions)
41. Job permissions (run, edit, view)
42. Unity Catalog support in Workflows
43. Secrets management with Databricks Secrets
44. Personal vs Shared jobs
45. Running jobs as service principals
46. Cluster reuse for cost optimization
47. Modular design with reusable notebooks
48. Using environment variables
49. Tagging workflows for cost tracking
50. Best practices for production-grade workflows
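For topics 19 and 20, a short sketch of passing a value between tasks with task values; the task key, value key, and computed value are illustrative.

```python
# Upstream notebook task (task key "extract" in the job): publish a value.
row_count = 42  # illustrative result computed by this task
dbutils.jobs.taskValues.set(key="row_count", value=row_count)

# Downstream notebook task: read the value published by the "extract" task.
# debugValue is returned when the notebook runs outside a job.
upstream_count = dbutils.jobs.taskValues.get(
    taskKey="extract", key="row_count", default=0, debugValue=0
)
print(f"Rows extracted upstream: {upstream_count}")
```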
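For topics 24 and 36, a hedged sketch of triggering an existing job through the Jobs REST API run-now endpoint; the workspace URL, token environment variable, job_id, and parameter names are placeholders.

```python
# Sketch: trigger an existing job via the Jobs API 2.1 run-now endpoint.
# Workspace URL, token env var, job_id, and parameters are placeholders.
import os
import requests

host = "https://<your-workspace>.cloud.databricks.com"   # placeholder workspace URL
token = os.environ["DATABRICKS_TOKEN"]                    # assumed PAT in an env var

resp = requests.post(
    f"{host}/api/2.1/jobs/run-now",
    headers={"Authorization": f"Bearer {token}"},
    json={"job_id": 123, "notebook_params": {"run_date": "2024-01-01"}},
)
resp.raise_for_status()
print("Triggered run_id:", resp.json()["run_id"])
```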