SSIS supports numerous transformations that allow you to combine data originating from multiple sources, cleanse the data, and give it the shape your data destination expects. You can then import the data into one or more destinations.
Each transformation is described below, along with examples of when it would be used.
Aggregate
Description: Calculates aggregations such as SUM, COUNT, AVG, MIN and MAX based on the values of a given numeric column. This transformation produces additional output records.
Example: Adding aggregated information to your output. This can be useful for adding totals and sub-totals to your output.
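Although the Aggregate transformation is configured in the designer, its output is logically equivalent to a T-SQL GROUP BY query. A minimal sketch, assuming a hypothetical sales_fact table:

-- Totals and counts per customer, as the Aggregate transformation would produce
SELECT customer_id,
       SUM(order_amount) AS total_amount,
       COUNT(*) AS order_count,
       AVG(order_amount) AS avg_amount
FROM sales_fact
GROUP BY customer_id;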
Audit
Description: Includes auditing information, such as the name of the computer where the package runs, the package version ID, the task name, etc., in the data flow.
Example: Creating advanced logs which indicate where and when the package was executed, how long it took to run, and the outcome of execution.
Character Map
Description: Performs minor manipulations on string columns: converting all letters to uppercase or lowercase, reversing bytes, etc.
Example: Applying string manipulations prior to loading data into the data warehouse.
Conditional Split
Description: Accepts an input and determines which destination to pipe the data into based on the result of a condition.
Example: Cleansing the data to extract specific rows from the source. If a specific column does not conform to the predefined format (perhaps it has leading spaces or zeros, or nulls), move such records to the error file.
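Conditional Split conditions are written in the SSIS expression language, but the routing logic is comparable to the following T-SQL sketch; the customer_source table and the phone-format test are hypothetical:

-- Rows that conform to the expected format go to the main output...
SELECT * FROM customer_source
WHERE phone LIKE '[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]';

-- ...and everything else would be routed to the error output.
SELECT * FROM customer_source
WHERE phone NOT LIKE '[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]' OR phone IS NULL;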
Copy Column
Description: Makes a copy of one or more columns which will be further transformed by subsequent tasks in the package.
Example: Extracting columns that need to be cleansed of leading/trailing spaces, applying the Character Map transformation to uppercase all data, and then loading it into the table.
Data Conversion
Description: Converts input columns from one data type to another.
Example: Converting columns extracted from the data source to the proper data type expected by the data warehouse. Having such transformation options allows us the freedom of moving data directly from its source into the destination without having an intermediary staging database.
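The T-SQL equivalents of this transformation are CAST and CONVERT. A minimal sketch, assuming a hypothetical staging_orders source whose columns arrive as strings:

SELECT CAST(order_id AS INT) AS order_id,
       CONVERT(DATETIME, order_date, 112) AS order_date,  -- style 112 = yyyymmdd
       CAST(order_amount AS DECIMAL(10, 2)) AS order_amount
FROM staging_orders;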
Data Mining Query
Description: Queries a data mining model. Includes a query builder to assist you with development of Data Mining eXpressions (DMX) prediction queries.
Example: Evaluating the input data set against a data mining model developed with Analysis Services.
Derived Column
Description: Calculates a new column value based on one or more existing columns, variables, or built-in functions.
Example: Removing leading and trailing spaces from a column; adding a title of courtesy (Mr., Mrs., Dr., etc.) to the name.
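Inside the designer these derivations are written in the SSIS expression language; in T-SQL the same logic looks roughly like the sketch below. The person table and its columns are hypothetical:

SELECT LTRIM(RTRIM(last_name)) AS last_name,            -- strip leading/trailing spaces
       title + ' ' + first_name + ' ' + last_name AS full_name  -- prepend the title of courtesy
FROM person;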
Export Column
Description: Exports the contents of large columns (TEXT, NTEXT, IMAGE data types) into files.
Example: Saving large strings or images into files while moving the rest of the columns into a transactional database or data warehouse.
Fuzzy Grouping
Description: Finds close or exact matches between multiple rows in the data source. Adds columns to the output, including the values and similarity scores.
Example: Cleansing data by translating various versions of the same value to a common identifier. For example, "Dr", "Dr.", "doctor" and "M.D." should all be considered equivalent.
Fuzzy Lookup
Description: Compares values in the input data source rows to values in the lookup table. Finds the exact matches as well as those values that are similar.
Example: Cleansing data by translating various versions of the same value to a common identifier. For example, "Dr", "Dr.", "doctor" and "M.D." should all be considered equivalent.
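SSIS implements its own token-based similarity algorithm, which has no direct T-SQL equivalent. As a very rough analog only, T-SQL's SOUNDEX-based DIFFERENCE function can flag phonetically similar values; the input_titles and reference_titles tables here are hypothetical:

SELECT i.title, r.canonical_title,
       DIFFERENCE(i.title, r.canonical_title) AS similarity  -- 0 (no match) to 4 (strong match)
FROM input_titles i
CROSS JOIN reference_titles r
WHERE DIFFERENCE(i.title, r.canonical_title) >= 3;

This is only an approximation; Fuzzy Lookup's scoring is considerably more sophisticated than SOUNDEX comparison.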
Import Column
Description: Imports the contents of a file and appends it to the output. Can be used to append TEXT, NTEXT and IMAGE data columns to the input obtained from a separate data source.
Example: This transformation could be useful for web content developers. For example, suppose you offer college courses online. Normalized course meta-data, such as course_id, name and description, is stored in a typical relational table. Unstructured course meta-data, on the other hand, is stored in XML files. You can use the Import Column transformation to add XML meta-data to a text column in your course table.
Lookup
Description: Joins the input data set to the reference table, view or row set created by a SQL statement to look up corresponding values. If some rows in the input data do not have corresponding rows in the lookup table, you must redirect such rows to a different output.
Example: Obtaining additional data columns. For example, the majority of employee demographic information might be available in a flat file, but other data, such as the department where each employee works, their employment start date and job grade, might be available from a table in a relational database.
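The equivalent set-based operation is a join against the reference table, with unmatched rows corresponding to the redirected output. A minimal sketch using hypothetical employee_file and employee_detail tables:

SELECT e.employee_id, e.first_name, e.last_name,
       d.department_name, d.start_date, d.job_grade
FROM employee_file e
LEFT OUTER JOIN employee_detail d ON d.employee_id = e.employee_id;
-- Rows where d.employee_id IS NULL are the ones Lookup would redirect.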
Merge
Description: Merges two sorted inputs into a single output based on the values of the key columns in each data set. Merged columns must have either identical or compatible data types. For example, you can merge VARCHAR(30) and VARCHAR(50) columns; you cannot merge INT and DATETIME columns.
Example: Combining the columns from multiple data sources into a single row set prior to populating a dimension table in a data warehouse. Using the Merge transformation saves the step of having a temporary staging area. With prior versions of SQL Server you had to populate the staging area first if your data warehouse had multiple transactional data sources.
Merge Join
Description: Joins two sorted inputs using an INNER JOIN, LEFT OUTER JOIN or FULL OUTER JOIN algorithm. You can specify the columns used for joining the inputs.
Example: Combining the columns from multiple data sources into a single row set prior to populating a dimension table in a data warehouse. Using the Merge Join transformation saves the step of having a temporary staging area. With prior versions of SQL Server you had to populate the staging area first if your data warehouse had multiple transactional data sources. Note that the Merge and Merge Join transformations can only combine two data sets at a time; however, you could use multiple Merge Join transformations to include additional data sets.
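Merge Join performs the same logical operation as a T-SQL join over two row sets. A sketch with hypothetical orders and customers inputs; in SSIS both inputs must arrive pre-sorted on the join key, whereas in T-SQL the engine handles ordering itself:

SELECT o.order_id, o.order_amount, c.customer_name
FROM orders o
INNER JOIN customers c ON c.customer_id = o.customer_id;  -- LEFT or FULL OUTER JOIN work the same way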
Multicast
Description: Similar to the Conditional Split transformation, but the entire data set is piped to multiple destinations. Maintains logical copies of the source data.
Example: Populating the relational warehouse as well as the source file with the output of a Derived Column transformation.
OLE DB Command
Description: Runs a SQL command for each input data row. Normally your SQL statement will include a parameter (denoted by the ? mark), for example:
UPDATE employee_source SET has_been_loaded = 1 WHERE employee_id = ?
Example: This transformation is commonly used together with Lookup or Slowly Changing Dimension processing to update changed attribute values (Type 1 changes in an SCD) in the database or data warehouse.
Percentage Sampling
Description: Loads only a subset of your data, defined as a percentage of all rows in the data source. Note that rows are chosen randomly.
Example: Limiting the data set during development phases of your project. Your data sources might contain billions of rows. Processing cubes against the entire data set can be prohibitively lengthy. If you're simply trying to ensure that your warehouse functions properly and data values on transactional reports match the values obtained from your Analysis Services cubes, you might wish to only load a subset of data into your cubes.
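If the data is already staged in SQL Server, T-SQL's TABLESAMPLE offers comparable behavior, though it samples by data page rather than by row, so the result is approximate:

SELECT *
FROM sales_fact TABLESAMPLE (10 PERCENT);  -- roughly 10% of rows, chosen page by page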
Pivot
Description: Pivots the normalized data set by a certain column to create a more easily readable output. Similar to the PIVOT command in Transact-SQL. You can think of this transformation as converting rows into columns.
Example: Creating a row set that displays the table data in a more user-friendly format. The data set could be consumed by a web service or could be distributed to users through email.
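The Transact-SQL analog mentioned in the description is the PIVOT operator. A minimal sketch, assuming a hypothetical account_balance table with customer, account_type and balance columns:

SELECT customer, [Checking], [Savings]
FROM (SELECT customer, account_type, balance FROM account_balance) AS src
PIVOT (SUM(balance) FOR account_type IN ([Checking], [Savings])) AS p;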
Row Count
Description: Counts the number of transformed rows and stores the count in a variable.
Example: Determining the total size of your data set. You could also execute a different set of tasks based on the number of rows you have transformed. For example, if you increase the number of rows in your fact table by 5% you could perform no maintenance, whereas if you increase the size of the table by 50% you might wish to rebuild the clustered index.
Row Sampling
Description: Loads only a subset of your data, defined as a number of rows. Note that rows are chosen randomly.
Example: Limiting the data set during development phases of your project. Your data warehouse might contain billions of rows. Processing cubes against the entire data set can be prohibitively lengthy. If you're simply trying to ensure that your warehouse functions properly and data values on transactional reports match the values obtained from your Analysis Services cubes, you might wish to only load a subset of data into your cubes.
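A comparable T-SQL pattern selects a fixed number of rows in random order; ordering by NEWID() is a common, if expensive, way to randomize (sales_fact is hypothetical):

SELECT TOP (1000) *
FROM sales_fact
ORDER BY NEWID();  -- random order, so an effectively random 1,000-row sample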
Script Component
Description: Every data flow consists of three main components: source, destination and transformation. The Script Component allows you to write transformations for otherwise unsupported source and destination file formats. It also allows you to perform transformations not directly available through the built-in transformation algorithms.
Example: Custom transformations can call functions in managed assemblies, including the .NET Framework. This type of transformation can be used when the data source (or destination) file format cannot be managed by typical connection managers. For example, some log files might not have tabular data structures. At times you might also need to parse strings one character at a time to import only the needed data elements. Much like the Script Task, the Script Component transformation must be written using Visual Basic .NET.
Slowly Changing Dimension
Description: Maintains historical values of dimension members when new members are introduced.
Example: Useful for maintaining dimension tables in a data warehouse when keeping historical dimension member values is necessary.
Sort
Description: Sorts the input by column values. You can sort the input by multiple columns in either ascending or descending order. The transformation also allows you to specify the precedence of columns used for sorting. This transformation could also discard the rows with duplicate sort values.
Example: Ordering the data prior to loading it into a data warehouse. This could be useful if you're ordering your dimension by member name values as opposed to sorting by member keys. You can also use the Sort transformation prior to feeding the data as the input to the Merge Join or Merge transformation.
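The T-SQL equivalent is ORDER BY; adding DISTINCT roughly mimics the option of discarding rows with duplicate sort values (it deduplicates on all selected columns, not just the sort keys). The dim_customer table is hypothetical:

SELECT DISTINCT member_name, member_key
FROM dim_customer
ORDER BY member_name ASC;  -- first sort column takes precedence; add more columns as needed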
Term Extraction
Description: Extracts terms (nouns and noun phrases) from the input text into the transformation output column.
Example: Processing large text data and extracting main concepts. For example, you could extract the primary terms used in this section of SQLServerPedia by feeding the Term Extraction transformation the text column containing the entire section.
Term Lookup
Description: Extracts terms from an input column with the TEXT data type and matches them with the same or similar terms found in the lookup table. Each term found in the lookup table is scanned for in the input column; if the term is found, the transformation returns the value as well as the number of times it occurs in the row. You can configure this transformation to perform a case-sensitive search.
Example: Analyzing large textual data for specific terms. For example, suppose you accept email feedback for the latest version of your software. You might not have time to read through every single email message that comes to the generic inbox. Instead you could use this task to look for specific terms of interest.
Union All
Description: Combines multiple inputs into a single output. Rows are output in the order in which they're added to the transformation. You can ignore some columns from each output, but each output column must be mapped to at least one input column. Unlike the Merge and Merge Join transformations, Union All can accept more than two inputs. Note that the inputs must match in the following respects: 1. data type, 2. number of columns, 3. order of columns, 4. length.
Example: Importing data from multiple disparate data sources into a single destination. For example, you could extract data from a mail system, a text file, an Excel spreadsheet and an Access database and populate a SQL Server table.
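The behavior mirrors T-SQL's UNION ALL, including the prerequisite that the inputs line up by data type, column count, column order and length. A sketch over hypothetical staged imports:

SELECT customer_id, customer_name FROM excel_import
UNION ALL
SELECT customer_id, customer_name FROM access_import
UNION ALL
SELECT customer_id, customer_name FROM text_file_import;  -- more than two inputs are fine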
Unpivot
Description: The opposite of the Pivot transformation, Unpivot converts columns into rows. It normalizes an input data set that has many duplicate values in multiple columns by creating multiple rows that have the same value in a single column. For example, if your input has a customer name and separate columns for checking and savings accounts, Unpivot can transform it into a row set that has customer, account and account balance columns.
Example: Massaging a semi-structured input data file and converting it into a normalized input prior to loading data into a warehouse.
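The T-SQL UNPIVOT operator expresses the same normalization. Using the checking/savings example from the description (the account_balance_wide table is hypothetical):

SELECT customer, account, balance
FROM (SELECT customer, checking, savings FROM account_balance_wide) AS src
UNPIVOT (balance FOR account IN (checking, savings)) AS u;  -- one row per customer per account type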