Module 1:
Data Factory Fundamentals
What is it and why use it?
Resource Components
Common Activities
Execution Dependencies
Azure Data Factory –
What is it?
Why use it?
A Quick History Lesson
[Timeline figure: v1, v2, Data Flows (Alpha). Months shown: July, Sept, March, Sept. Years shown: 2016, 2017, 2018, 2020.]
What is Azure Data Factory (ADF)?
https://azure.microsoft.com/en-gb/services/data-factory
Copy Transform
Resource Components
Data Factory Components
1 Linked Services – What to interact with and how?
SQLDBLinkedService
ConnectionString: Server=MyServer;Database=myDataBase
UserName: “MrPaulAndrew”
Password: ***************
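A Linked Service is stored as a JSON definition. The sketch below builds one for the SQLDBLinkedService example as a Python dict. The `AzureSqlDatabase` type and the Key Vault reference for the password are assumptions (the slide does not say which database type or secret store is used), and `KeyVaultLinkedService` and `SqlPassword` are hypothetical names.

```python
import json


def build_sql_linked_service(name, server, database, user_name):
    """Return a linked service definition as a dict.

    Assumption: the target is an Azure SQL Database and the password is
    held in Azure Key Vault instead of being written into the definition.
    """
    return {
        "name": name,
        "properties": {
            "type": "AzureSqlDatabase",
            "typeProperties": {
                "connectionString": (
                    f"Server={server};Database={database};User ID={user_name}"
                ),
                "password": {
                    "type": "AzureKeyVaultSecret",
                    "store": {
                        "referenceName": "KeyVaultLinkedService",  # hypothetical
                        "type": "LinkedServiceReference",
                    },
                    "secretName": "SqlPassword",  # hypothetical
                },
            },
        },
    }


linked_service = build_sql_linked_service(
    "SQLDBLinkedService", "MyServer", "myDataBase", "MrPaulAndrew"
)
print(json.dumps(linked_service, indent=2))
```

Keeping the secret out of the definition means the JSON can be source controlled without exposing the password.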
Data Factory Components
1 Linked Services
2 Datasets – Where is my data? What format? What file path/table do I need?
[dbo].[SalesOrders]
/RAW/Orders/2018/01/01/SalesOrders.csv
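The two dataset examples above can be sketched as definitions that point back at a Linked Service. This is a minimal sketch, not the only valid shape: the dataset names, the `BlobLinkedService` name, the `datalake` container and the choice of Blob Storage as the file store are all hypothetical, since the slide only gives the table name and file path.

```python
import posixpath


def linked_service_ref(name):
    """Reference to an existing linked service by name."""
    return {"referenceName": name, "type": "LinkedServiceReference"}


def sql_table_dataset(name, linked_service, qualified_table):
    """Dataset for a table written as '[schema].[table]'."""
    schema, table = (part.strip("[]") for part in qualified_table.split("."))
    return {
        "name": name,
        "properties": {
            "type": "AzureSqlTable",
            "linkedServiceName": linked_service_ref(linked_service),
            "typeProperties": {"schema": schema, "table": table},
        },
    }


def csv_dataset(name, linked_service, container, file_path):
    """Dataset for a delimited text file; assumes Blob Storage as the store."""
    folder, file_name = posixpath.split(file_path)
    return {
        "name": name,
        "properties": {
            "type": "DelimitedText",
            "linkedServiceName": linked_service_ref(linked_service),
            "typeProperties": {
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "container": container,
                    "folderPath": folder.strip("/"),
                    "fileName": file_name,
                },
                "columnDelimiter": ",",
                "firstRowAsHeader": True,
            },
        },
    }


sales_table = sql_table_dataset(
    "SalesOrdersTable", "SQLDBLinkedService", "[dbo].[SalesOrders]"
)
sales_csv = csv_dataset(
    "SalesOrdersCsv",       # hypothetical dataset name
    "BlobLinkedService",    # hypothetical linked service
    "datalake",             # hypothetical container
    "/RAW/Orders/2018/01/01/SalesOrders.csv",
)
```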
Data Factory Components
1 Linked Services
2 Datasets
3 Activities – What do we want to happen when we invoke a Linked Service? With what conditions?
Databricks Notebook Activity
notebookPath: /Playground/Playing
baseParameters: Testing
libraries[jar]: dbfs:/lib1.jar
linkedServiceName: BricksOfData01
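The Databricks Notebook Activity settings above can be sketched as an activity definition. `baseParameters` is a key/value map, and the slide only shows the value "Testing", so the key `mode` and the activity name `RunPlaygroundNotebook` are hypothetical.

```python
def databricks_notebook_activity(name, linked_service, notebook_path,
                                 base_parameters, jars):
    """Return a Databricks Notebook activity definition as a dict."""
    return {
        "name": name,
        "type": "DatabricksNotebook",
        "linkedServiceName": {
            "referenceName": linked_service,
            "type": "LinkedServiceReference",
        },
        "typeProperties": {
            "notebookPath": notebook_path,
            "baseParameters": base_parameters,
            # Each library entry names its kind (here a jar) and its location.
            "libraries": [{"jar": jar} for jar in jars],
        },
    }


notebook_activity = databricks_notebook_activity(
    "RunPlaygroundNotebook",      # hypothetical activity name
    "BricksOfData01",
    "/Playground/Playing",
    {"mode": "Testing"},          # hypothetical parameter key
    ["dbfs:/lib1.jar"],
)
```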
Data Factory Components
Extract Transform
1 Linked Services
2 Datasets
3 Activities
4 Pipelines – Logical groups of work that can be executed.
Execute Pipeline Activity
Data Factory Components
1 Linked Services
2 Datasets
3 Activities
4 Pipelines – Logical groups of work that can be executed.
Extract
Transform & Load
Data Factory Components
Extract Transform
1 Linked Services
2 Datasets
3 Activities
4 Pipelines
5 Triggers – Telling our pipelines when to run.
Manually
Programmatically
Schedule
Tumbling Windows
File Event
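Of the trigger types listed, Tumbling Windows is the least self-explanatory: it fires for fixed-size, contiguous, non-overlapping time intervals. The sketch below only illustrates how such windows are cut from a time range. It is not how Data Factory implements the trigger, and the dates and six-hour size are arbitrary.

```python
from datetime import datetime, timedelta


def tumbling_windows(start, end, size):
    """Yield (window_start, window_end) pairs that tile [start, end).

    Windows are contiguous and never overlap; a trailing partial
    window is not emitted.
    """
    window_start = start
    while window_start + size <= end:
        yield window_start, window_start + size
        window_start += size


windows = list(
    tumbling_windows(
        datetime(2018, 1, 1), datetime(2018, 1, 2), timedelta(hours=6)
    )
)
for window_start, window_end in windows:
    print(window_start.isoformat(), "to", window_end.isoformat())
```

Each window would drive one pipeline run, with the window start and end passed in so the run processes only that slice of data.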
Data Factory Components
1 Linked Services
2 Datasets
3 Activities
4 Pipelines
5 Triggers
Data Factory Control Flow Components
1 Linked Services
2 Datasets
3 Activities
4 Pipelines
5 Triggers
Common Activities
Paul’s Favourites
Copy
Dataset (Source)
Copy Data
Dataset (Sink)
Auto Scaling
Transactional Restarts
Handle Zip Compression
Attribute Mapping and Schema Drift
Handle Failed Rows
Add Custom Attributes
Parse Excel & JSON Files
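A Copy activity ties a source dataset to a sink dataset. A minimal sketch of the definition follows; the activity and dataset names are hypothetical, and the source and sink types assume a CSV file being copied into an Azure SQL table.

```python
def dataset_ref(name):
    """Reference to an existing dataset by name."""
    return {"referenceName": name, "type": "DatasetReference"}


def copy_activity(name, source_dataset, sink_dataset, source_type, sink_type):
    """Return a Copy activity definition as a dict."""
    return {
        "name": name,
        "type": "Copy",
        "inputs": [dataset_ref(source_dataset)],
        "outputs": [dataset_ref(sink_dataset)],
        "typeProperties": {
            "source": {"type": source_type},
            "sink": {"type": sink_type},
        },
    }


copy_orders = copy_activity(
    "CopySalesOrders",        # hypothetical activity name
    "SalesOrdersCsv",         # hypothetical source dataset
    "SalesOrdersTable",       # hypothetical sink dataset
    "DelimitedTextSource",
    "AzureSqlSink",
)
```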
Lookup
Get a value to support other control flow activities
Returns a Single Value or Many Values [array] from a Dataset
https://docs.microsoft.com/en-us/azure/data-factory/control-flow-lookup-activity
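The single value versus many values choice changes the shape of the Lookup output. The sketch below mimics that shape in Python so the difference is easy to see. The row data and the column name `TableName` are hypothetical, and the function is an illustration, not Data Factory code.

```python
def lookup_output(rows, first_row_only=True):
    """Mimic the output shape of a Lookup activity.

    Assumes rows is a non-empty list of dicts.
    """
    if first_row_only:
        # Single value: only the first row is returned.
        return {"firstRow": rows[0]}
    # Many values: every row is returned as an array, with a count.
    return {"count": len(rows), "value": rows}


rows = [{"TableName": "SalesOrders"}, {"TableName": "Customers"}]

single = lookup_output(rows, first_row_only=True)
many = lookup_output(rows, first_row_only=False)

# In pipeline expressions these would be read as, for example:
#   @activity('Lookup').output.firstRow.TableName
#   @activity('Lookup').output.value
print(single["firstRow"]["TableName"])
print([row["TableName"] for row in many["value"]])
```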
ForEach
Scaling Out Control Flow Activities
Many Values [array]: [0] [1] [2] [3] [i]
IsSequential: true
ForEach
Lookup [array]: [0] [1] [2] [3] [4] [5] [6] [i]
Inner activities: Copy Data, Do Stuff
Reference the current item with @item().
Batch Count Default: 20
Batch Count Max: 50
https://docs.microsoft.com/en-us/azure/data-factory/control-flow-for-each-activity
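The ForEach settings can be pictured as a loop that either runs items one at a time or fans them out up to the batch count. The sketch below is a simplified model using a thread pool, not Data Factory's own scheduler. The function and parameter names are hypothetical, and `item` plays the role of `@item()`.

```python
from concurrent.futures import ThreadPoolExecutor

DEFAULT_BATCH_COUNT = 20
MAX_BATCH_COUNT = 50


def for_each(items, inner_activity, is_sequential=False,
             batch_count=DEFAULT_BATCH_COUNT):
    """Run inner_activity once per item and return results in item order."""
    if is_sequential:
        # IsSequential: true, so one item at a time.
        return [inner_activity(item) for item in items]
    if not 1 <= batch_count <= MAX_BATCH_COUNT:
        raise ValueError(f"batch_count must be between 1 and {MAX_BATCH_COUNT}")
    # At most batch_count items are in flight at once.
    with ThreadPoolExecutor(max_workers=batch_count) as pool:
        return list(pool.map(inner_activity, items))


def copy_data(item):
    """Stand-in for an inner activity such as Copy Data."""
    return f"copied {item}"


tables = ["SalesOrders", "Customers", "Products"]
sequential_results = for_each(tables, copy_data, is_sequential=True)
parallel_results = for_each(tables, copy_data, batch_count=2)
```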
Switch
@case
Default: Raise Error
Dev: Run Notebook on Small Cluster
Test: Run Notebook on Medium Cluster
Prod: Run Notebook on Big Cluster
https://mrpaulandrew.com/2020/01/22/using-the-azure-data-factory-switch-activity/
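The Switch example above amounts to a lookup on a case value with a default branch. A minimal Python sketch of that control flow follows. It assumes the case value is an environment name, and the function names are hypothetical.

```python
def run_notebook(cluster_size):
    """Return a branch that 'runs' a notebook on the given cluster size."""
    return lambda: f"Run Notebook on {cluster_size} Cluster"


def raise_error():
    """Default branch: no case matched."""
    raise ValueError("Raise Error: no matching case")


CASES = {
    "Dev": run_notebook("Small"),
    "Test": run_notebook("Medium"),
    "Prod": run_notebook("Big"),
}


def switch(on, cases=CASES, default=raise_error):
    """Run the branch whose case matches `on`, else the default branch."""
    return cases.get(on, default)()


print(switch("Dev"))
print(switch("Prod"))
```

Using a default branch that raises an error means an unexpected value stops the pipeline instead of silently running on the wrong cluster.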
Execute Pipeline
Call Child Pipeline
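Calling a child pipeline can be sketched as a parent pipeline whose only activity is an Execute Pipeline activity. The pipeline names, activity name and parameter below are hypothetical.

```python
def execute_pipeline_activity(name, child_pipeline, parameters=None,
                              wait_on_completion=True):
    """Return an Execute Pipeline activity definition as a dict."""
    return {
        "name": name,
        "type": "ExecutePipeline",
        "typeProperties": {
            "pipeline": {
                "referenceName": child_pipeline,
                "type": "PipelineReference",
            },
            "parameters": parameters or {},
            # When True the parent waits for the child to finish.
            "waitOnCompletion": wait_on_completion,
        },
    }


parent_pipeline = {
    "name": "Parent",  # hypothetical pipeline name
    "properties": {
        "activities": [
            execute_pipeline_activity(
                "CallChildPipeline",          # hypothetical activity name
                "Child",                      # hypothetical child pipeline
                {"RunDate": "2018-01-01"},    # hypothetical parameter
            )
        ]
    },
}
```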
Custom
Extend Data Factory with Custom Code
Reference Objects
Datasets: []
Linked Services: []
Custom
Linked Services: Azure Batch (???), Azure Blob Storage
https://mrpaulandrew.com/2018/11/12/creating-an-azure-data-factory-v2-custom-activity/
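Inside the custom code, the referenced objects are made available as JSON files (`activity.json`, `linkedServices.json`, `datasets.json`) in the working folder of the running task. The sketch below reads whichever of those files are present. Custom activities are often written in C#, so treat this Python version as an illustration only. The temporary folder stands in for the real working folder.

```python
import json
import tempfile
from pathlib import Path

REFERENCE_FILES = ("activity.json", "linkedServices.json", "datasets.json")


def load_reference_objects(working_dir="."):
    """Load the reference files found in working_dir.

    Returns a dict keyed by file name; missing files map to None.
    """
    loaded = {}
    for file_name in REFERENCE_FILES:
        path = Path(working_dir) / file_name
        if path.exists():
            loaded[file_name] = json.loads(path.read_text(encoding="utf-8"))
        else:
            loaded[file_name] = None
    return loaded


# Demo: a temporary folder stands in for the task's working folder.
with tempfile.TemporaryDirectory() as demo_dir:
    (Path(demo_dir) / "linkedServices.json").write_text("[]", encoding="utf-8")
    demo = load_reference_objects(demo_dir)

print(demo)
```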
Azure Function
Web
Web Hook
Extend Data Factory with REST Calls
Methods: GET, POST, PUT, etc...
Headers
Body
Do Stuff ???
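The Web activity's settings map onto an ordinary HTTP request: a method, a URL, headers and a body. The sketch below builds (but does not send) such a request with the Python standard library. The URL and payload are placeholders.

```python
import json
import urllib.request


def build_request(url, method="GET", headers=None, body=None):
    """Build an HTTP request object; the body is sent as JSON if given."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(
        url, data=data, headers=headers or {}, method=method
    )


request = build_request(
    "https://example.com/api/do-stuff",          # placeholder URL
    method="POST",
    headers={"Content-Type": "application/json"},
    body={"message": "Do Stuff"},                # placeholder payload
)

# urllib.request.urlopen(request) would send it; omitted here so the
# sketch runs without network access.
print(request.get_method(), request.full_url)
```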
Execution Dependencies
Execution Dependency Options
Get Values
Success
Fail
Complete
Skip
Execution On Failure
Get Values
Error Handler
Execution On Failure or On Success
Get Values
Do Stuff
Error Handler
Execution On ???
Get Values
Do Stuff
Run Stored Procedure
AND AND
Error Handler
Execution On Failure or On Success
Get Values
Do Stuff
Run Stored Procedure
OR OR OR
Error Handler Error Handler Error Handler
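The AND versus OR slides can be checked with a small model of dependency evaluation. Dependencies on different activities are combined with AND, so a single Error Handler wired to the failure of all three activities will not run when only the first one fails and the rest are skipped. This is a simplified sketch, not Data Factory's actual engine. The condition names follow the `dependsOn` settings (Succeeded, Failed, Completed, Skipped).

```python
def condition_met(condition, outcome):
    """Completed is satisfied by either Succeeded or Failed."""
    if condition == "Completed":
        return outcome in ("Succeeded", "Failed")
    return condition == outcome


def should_run(depends_on, outcomes):
    """AND across dependencies, OR across conditions on one dependency."""
    return all(
        any(condition_met(c, outcomes[dep["activity"]])
            for c in dep["dependencyConditions"])
        for dep in depends_on
    )


def on_failure(activity):
    """Dependency on the failure of a single activity."""
    return {"activity": activity, "dependencyConditions": ["Failed"]}


# Get Values fails, so the activities after it never start.
outcomes = {
    "Get Values": "Failed",
    "Do Stuff": "Skipped",
    "Run Stored Procedure": "Skipped",
}

# One shared handler: all three failures are required (AND).
shared_handler = [on_failure(name) for name in outcomes]

# One handler per activity: any single failure is caught (OR overall).
separate_handlers = {name: [on_failure(name)] for name in outcomes}

print(should_run(shared_handler, outcomes))
print({name: should_run(deps, outcomes)
       for name, deps in separate_handlers.items()})
```

In this model the shared handler never fires for a failure in Get Values, while the dedicated handler does.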