Transforming data using Mapping Data Flow
Mapping Data Flows allow data engineers to develop data transformation logic
without writing code. They are executed as activities within Azure Data Factory
pipelines on scaled-out Apache Spark clusters, and they provide a visual
environment for building a wide range of data transformations. Mapping Data
Flows also let you monitor the execution of those transformations, so you can
view how they are progressing and understand any errors that occur.
The following transformation types enable you to modify data.
Schema Modifier
These transformations modify a sink destination by creating new columns based
on the action of the transformation.
Aggregate
Define different types of aggregations such as SUM, MIN, MAX, and COUNT
grouped by existing or computed columns.
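Because mapping data flows run on Spark, an Aggregate step corresponds to a grouped aggregation. A minimal PySpark sketch, using hypothetical Category and Amount columns:

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()
sales = spark.createDataFrame(
    [("Bikes", 100.0), ("Bikes", 250.0), ("Helmets", 40.0)],
    ["Category", "Amount"])

# SUM, MIN, MAX, and COUNT grouped by an existing column.
totals = sales.groupBy("Category").agg(
    F.sum("Amount").alias("TotalAmount"),
    F.min("Amount").alias("MinAmount"),
    F.max("Amount").alias("MaxAmount"),
    F.count("*").alias("RowCount"))
totals.show()
```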
Derived column
Generate new columns or modify existing fields using the data flow expression
language.
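In Spark terms, a derived column is simply a new or overwritten column expression. An illustrative PySpark equivalent (the column names are invented):

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()
people = spark.createDataFrame([("ada", "lovelace")], ["first", "last"])

# Generate a new column and modify an existing one.
derived = (people
    .withColumn("fullName", F.concat_ws(" ", "first", "last"))
    .withColumn("first", F.initcap("first")))
derived.show()
```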
Pivot
An aggregation where the distinct row values of one or more grouping columns
are transformed into individual columns.
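This matches Spark's pivot operation. A small sketch, assuming hypothetical Year, Quarter, and Amount columns:

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()
sales = spark.createDataFrame(
    [("2023", "Q1", 10.0), ("2023", "Q2", 20.0), ("2024", "Q1", 15.0)],
    ["Year", "Quarter", "Amount"])

# The distinct values of Quarter become individual columns.
pivoted = sales.groupBy("Year").pivot("Quarter").agg(F.sum("Amount"))
pivoted.show()
```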
Surrogate key
Add an incrementing non-business arbitrary key value.
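One way to picture the surrogate key is a row number assigned over the stream. A PySpark sketch under that assumption (the ordering column is arbitrary):

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()
dim = spark.createDataFrame([("Bikes",), ("Helmets",)], ["Category"])

# A sequential, non-business key starting at 1. The unpartitioned
# window is fine for a sketch but pulls all rows to one partition.
w = Window.orderBy("Category")
keyed = dim.withColumn("CategoryKey", F.row_number().over(w))
keyed.show()
```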
Select
Use the Select transformation to rename, drop, or reorder columns.
This transformation doesn't alter row data, but chooses which columns are
propagated downstream.
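A rough PySpark equivalent of renaming, dropping, and reordering, with made-up column names:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a", True)], ["id", "name", "internal_flag"])

# Rename, drop, and reorder columns; the row data itself is unchanged.
selected = (df
    .withColumnRenamed("name", "customerName")
    .drop("internal_flag")
    .select("customerName", "id"))
selected.show()
```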
Unpivot
Pivot columns into row values.
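In Spark this can be expressed with the stack generator. A sketch assuming two hypothetical quarter columns:

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()
wide = spark.createDataFrame([("2024", 10.0, 20.0)], ["Year", "Q1", "Q2"])

# Turn the Q1/Q2 columns back into (Quarter, Amount) rows.
tall = wide.select(
    "Year",
    F.expr("stack(2, 'Q1', Q1, 'Q2', Q2) as (Quarter, Amount)"))
tall.show()
```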
Window
Define window-based aggregations of columns in your data streams.
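For example, a running total per group is a window-based aggregation. A PySpark sketch with hypothetical columns:

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()
sales = spark.createDataFrame(
    [("Bikes", 1, 100.0), ("Bikes", 2, 250.0), ("Helmets", 1, 40.0)],
    ["Category", "Month", "Amount"])

# Running total per category, ordered by month.
w = Window.partitionBy("Category").orderBy("Month")
windowed = sales.withColumn("RunningTotal", F.sum("Amount").over(w))
windowed.show()
```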
Rank
Use the Rank transformation to generate an ordered ranking based on sort
conditions specified by the user.
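Conceptually this is a rank over a window ordered by the sort condition. An illustrative PySpark sketch:

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()
scores = spark.createDataFrame(
    [("a", 90), ("b", 75), ("c", 90)], ["student", "score"])

# Ordered ranking based on a sort condition (highest score first).
ranked = scores.withColumn(
    "rank", F.dense_rank().over(Window.orderBy(F.desc("score"))))
ranked.show()
```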
External Call
The External Call transformation enables data engineers to call out to external
REST endpoints row by row in order to add custom or third-party results into
your data flow streams.
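The row-by-row pattern can be pictured as a per-row HTTP call. A heavily simplified PySpark sketch against a hypothetical endpoint (in a data flow this is configured visually, not coded):

```python
import requests
from pyspark.sql import SparkSession
import pyspark.sql.functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.getOrCreate()
accounts = spark.createDataFrame([("A1",), ("A2",)], ["account_id"])

# Hypothetical REST endpoint called once per row to enrich the stream.
@F.udf(StringType())
def fetch_score(account_id):
    resp = requests.get(f"https://api.example.com/score/{account_id}")
    return resp.text

enriched = accounts.withColumn("score", fetch_score("account_id"))
```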
Row Modifier
These types of transformations impact how the rows are presented in the
destination.
Alter row
Set insert, delete, update, and upsert policies on rows. You can add one-to-many
conditions as expressions.
These conditions should be specified in order of priority, as each row will be
marked with the policy corresponding to the first-matching expression.
Each of those conditions can result in a row (or rows) being inserted, updated,
deleted, or upserted. Alter Row can produce both DDL and DML actions against
your database.
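Spark has no alter-row concept of its own, but the first-matching-policy behavior can be sketched by tagging each row with a policy column (the statuses and policies below are invented):

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()
changes = spark.createDataFrame(
    [(1, "active"), (2, "deleted"), (3, "new")], ["id", "status"])

# Conceptual sketch: mark each row with the first matching policy,
# as Alter Row does before the sink applies the row-level action.
marked = changes.withColumn(
    "policy",
    F.when(F.col("status") == "deleted", "delete")
     .when(F.col("status") == "new", "insert")
     .otherwise("upsert"))
marked.show()
```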
Filter
Filter rows based on a condition.
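The Spark equivalent is a plain row filter. A tiny sketch with a hypothetical total column:

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()
orders = spark.createDataFrame([(1, 15.0), (2, 250.0)], ["id", "total"])

# Keep only the rows matching the condition.
large_orders = orders.filter(F.col("total") > 100)
large_orders.show()
```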
Cast
Use the Cast transformation to easily modify the data types of individual
columns in a data flow. The Cast transformation also provides an easy way to
check for casting errors.
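A PySpark sketch of the same idea; note that, with Spark's default settings, values that fail to cast become null, which is one way casting errors surface:

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()
raw = spark.createDataFrame([("42", "2024-01-15"), ("oops", "bad")],
                            ["quantity", "order_date"])

# Change column types; unparseable values become null.
typed = (raw
    .withColumn("quantity", F.col("quantity").cast("int"))
    .withColumn("order_date", F.to_date("order_date", "yyyy-MM-dd")))
typed.show()
```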
Sort
Sort incoming rows on the current data stream.
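The equivalent Spark operation is an ordering over the stream, for example:

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()
orders = spark.createDataFrame([(1, 15.0), (2, 250.0)], ["id", "total"])

# Sort the incoming rows, largest total first.
sorted_orders = orders.orderBy(F.desc("total"))
sorted_orders.show()
```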
These transformations generate new data streams or merge multiple streams
into one.
Assert
The assert transformation enables you to build custom rules inside your mapping
data flows for data quality and data validation.
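A conceptual sketch of a data-quality rule, expressed as a flag column in PySpark (the rule itself is invented):

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()
rows = spark.createDataFrame([(1, 10.0), (2, -5.0)], ["id", "amount"])

# Flag rows that fail the validation rule.
checked = rows.withColumn("assert_amount_positive", F.col("amount") >= 0)
checked.show()
```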
Conditional split
Route rows of data to different streams based on matching conditions.
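In Spark terms, each output stream is the input filtered by its condition. A minimal sketch with a hypothetical threshold:

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()
orders = spark.createDataFrame([(1, 15.0), (2, 250.0)], ["id", "total"])

# Route rows to different streams based on matching conditions.
small = orders.filter(F.col("total") <= 100)
large = orders.filter(F.col("total") > 100)
```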
Exists
Check whether your data exists in another source or stream.
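This maps naturally onto Spark's semi and anti joins. A sketch with hypothetical customer and order streams:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
customers = spark.createDataFrame([(1,), (2,)], ["customer_id"])
orders = spark.createDataFrame([(1,)], ["customer_id"])

# Keep customers that do (or do not) exist in the other stream.
with_orders = customers.join(orders, "customer_id", "left_semi")
without_orders = customers.join(orders, "customer_id", "left_anti")
```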
Join
Combine data from two sources or streams.
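A minimal PySpark sketch of joining two streams on a shared key (the names are invented):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
orders = spark.createDataFrame([(1, 101), (2, 102)],
                               ["order_id", "customer_id"])
customers = spark.createDataFrame([(101, "Ada")], ["customer_id", "name"])

# Combine the two streams on a matching key (inner join here).
joined = orders.join(customers, "customer_id", "inner")
joined.show()
```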
Lookup
Enables you to reference data from another source.
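A lookup keeps every input row and appends matching reference data, which behaves like a left outer join. A sketch under that assumption:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
orders = spark.createDataFrame([(1, "US"), (2, "FR")],
                               ["order_id", "country"])
rates = spark.createDataFrame([("US", 1.0)], ["country", "tax_rate"])

# Every order is kept; tax_rate is null where no match is found.
looked_up = orders.join(rates, "country", "left_outer")
looked_up.show()
```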
Union
Combine multiple data streams vertically.
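In PySpark this corresponds to stacking streams with matching schemas, for example:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
jan = spark.createDataFrame([(1, 10.0)], ["id", "total"])
feb = spark.createDataFrame([(2, 20.0)], ["id", "total"])

# Stack the streams vertically, matching columns by name.
combined = jan.unionByName(feb)
combined.show()
```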
Flatten
Take array values inside hierarchical structures such as JSON and unroll them
into individual rows.
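In Spark terms this is an explode over the array column. A small sketch with an invented schema:

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()
nested = spark.createDataFrame(
    [("order-1", ["bike", "helmet"])], ["order_id", "items"])

# Unroll each array element into its own row.
flat = nested.select("order_id", F.explode("items").alias("item"))
flat.show()
```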
Stringify
Turn complex data types into strings. This can be useful when you need to
store or send column data as a single string entity that may originate as a
structure, map, or array type.
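A PySpark sketch of the same idea, serializing a map column to a JSON string (the schema is invented):

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, {"city": "Oslo", "zip": "0150"})],
                           ["id", "address"])

# Serialize the structured column into a single JSON string.
stringified = df.withColumn("address_json", F.to_json("address"))
stringified.show(truncate=False)
```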
Parse
Use the Parse transformation to parse text columns in your data that are
strings in document form. The currently supported types of embedded documents
that can be parsed are JSON, XML, and delimited text.
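A PySpark sketch of parsing an embedded JSON document into typed columns (the schema is invented):

```python
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.getOrCreate()
raw = spark.createDataFrame([('{"city": "Oslo", "zip": "0150"}',)],
                            ["address_json"])

# Parse the embedded JSON document using a declared schema.
parsed = raw.withColumn(
    "address", F.from_json("address_json", "city STRING, zip STRING"))
parsed.select("address.city", "address.zip").show()
```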