Slowly Changing Dimensions (SCD) - Types | Data Warehouse
Slowly Changing Dimensions: Slowly changing dimensions are dimensions in which the data changes
slowly, rather than on a regular, time-based schedule.
For example, you may have a customer dimension in a retail domain. Let's say the customer is in India and
shops every month. Creating the sales report for this customer is easy. Now assume that the
customer is transferred to the United States and shops there. How do you record such a change in your
customer dimension?
You could sum or average the sales made by the customer. In this case you won't get an exact comparison of
the sales made in each country. Because the customer's salary increased after the transfer, he or she might shop
more in the United States than in India. If you sum the total sales, the sales made by the customer
might look stronger than they really are. Alternatively, you can create a second customer record and treat the
transferred customer as a new customer. However, this creates problems too.
Handling these issues involves SCD management methodologies, which are referred to as Type 1 to Type 3. The
different types of slowly changing dimensions are explained in detail below.
SCD Type 1: SCD type 1 methodology is used when there is no need to store historical data in the dimension
table. This method overwrites the old data in the dimension table with the new data. It is used to correct data
errors in the dimension.
As an example, I have the customer table with the data below.
surrogate_key  customer_id  customer_name  Location
---------------------------------------------------
1              1            Marspton       Illinois
Here the customer name is misspelt. It should be Marston instead of Marspton. If you use the Type 1 method, it
simply overwrites the data. The data in the updated table will be:
surrogate_key  customer_id  customer_name  Location
---------------------------------------------------
1              1            Marston        Illinois
The advantages of Type 1 are ease of maintenance and lower storage use. The disadvantage is that no
historical data is kept in the data warehouse.
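A minimal Python sketch of the Type 1 overwrite, assuming the dimension is held as a list of dictionaries. The `apply_type1` helper and the key values are hypothetical and only illustrate the idea:

```python
# Hypothetical in-memory stand-in for the customer dimension table.
customer_dim = [
    {"surrogate_key": 1, "customer_id": 1,
     "customer_name": "Marspton", "location": "Illinois"},
]


def apply_type1(dim, customer_id, **new_values):
    """Overwrite attributes of the matching row in place (no history is kept)."""
    for row in dim:
        if row["customer_id"] == customer_id:
            row.update(new_values)
    return dim


apply_type1(customer_dim, 1, customer_name="Marston")
print(customer_dim[0]["customer_name"])  # Marston
```

After the call the table still has one row, and the misspelt value can no longer be recovered.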
SCD Type 3: In the Type 3 method, only the current status and the previous status of the row are maintained in the
table. To track these changes, two separate columns are created in the table. The customer dimension table in the
Type 3 method will look like this:
surrogate_key  customer_id  customer_name  Current_Location  previous_location
------------------------------------------------------------------------------
1              1            Marston        Illinois          NULL
Let's say the customer moves from Illinois to Seattle. The updated table will look like this:
surrogate_key  customer_id  customer_name  Current_Location  previous_location
------------------------------------------------------------------------------
1              1            Marston        Seattle           Illinois
If the customer then moves from Seattle to NewYork, the updated table will be:
surrogate_key  customer_id  customer_name  Current_Location  previous_location
------------------------------------------------------------------------------
1              1            Marston        NewYork           Seattle
The Type 3 method keeps only limited history; how much depends on the number of columns you create.
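The column shift can be sketched in Python as follows. This is an illustration under the assumption that one row is a dictionary with a single "previous" column; `apply_type3` is a hypothetical helper name:

```python
def apply_type3(row, new_location):
    """Move the current value into previous_location, then overwrite it."""
    if new_location != row["current_location"]:
        row["previous_location"] = row["current_location"]
        row["current_location"] = new_location
    return row


row = {"surrogate_key": 1, "customer_id": 1, "customer_name": "Marston",
       "current_location": "Illinois", "previous_location": None}
apply_type3(row, "Seattle")
apply_type3(row, "NewYork")
print(row["current_location"], row["previous_location"])  # NewYork Seattle
```

After the second move, Illinois is gone: with one previous column only the most recent change survives.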
SCD Type 2: SCD Type 2 stores the entire history of the data in the dimension table. With Type 2 we can store
unlimited history in the dimension table. In Type 2, you can store the data in three different ways. They are:
Versioning
Flagging
Effective Date
SCD Type 2 Versioning: In the versioning method, a sequence number is used to represent the change. The latest
sequence number always represents the current row and the previous sequence numbers represent the past
data.
As an example, let's use the same customer who changes location. Initially the customer is in the
Illinois location and the data in the dimension table will look like this:
surrogate_key  customer_id  customer_name  Location  Version
------------------------------------------------------------
1              1            Marston        Illinois  1
The customer moves from Illinois to Seattle and the version number is incremented. The dimension table will
look like this:
surrogate_key  customer_id  customer_name  Location  Version
------------------------------------------------------------
1              1            Marston        Illinois  1
2              1            Marston        Seattle   2
Now again if the customer is moved to another location, a new record will be inserted into the dimension table
with the next version number.
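A minimal Python sketch of the versioning method, assuming the dimension is a list of dictionaries. The `apply_type2_version` helper and its simplistic surrogate key generation are assumptions made for illustration:

```python
def apply_type2_version(dim, customer_id, customer_name, location):
    """Append a new row with the next version number if the location changed."""
    history = [r for r in dim if r["customer_id"] == customer_id]
    current = max(history, key=lambda r: r["version"], default=None)
    if current is not None and current["location"] == location:
        return dim  # unchanged: nothing to insert
    dim.append({
        "surrogate_key": len(dim) + 1,  # simplistic key generator (assumption)
        "customer_id": customer_id,
        "customer_name": customer_name,
        "location": location,
        "version": 1 if current is None else current["version"] + 1,
    })
    return dim


dim = []
apply_type2_version(dim, 1, "Marston", "Illinois")
apply_type2_version(dim, 1, "Marston", "Seattle")
print([(r["location"], r["version"]) for r in dim])
# [('Illinois', 1), ('Seattle', 2)]
```

The row with the highest version for a customer is always the current one; older rows are never modified.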
SCD Type 2 Flagging: In flagging method, a flag column is created in the dimension table. The current record
will have the flag value as 1 and the previous records will have the flag as 0.
Initially, the customer dimension will look like this:
surrogate_key  customer_id  customer_name  Location  flag
---------------------------------------------------------
1              1            Marston        Illinois  1
When the customer moves to a new location, the old record is updated with the flag value 0 and the
latest record gets the flag value 1.
surrogate_key  customer_id  customer_name  Location  flag
---------------------------------------------------------
1              1            Marston        Illinois  0
2              1            Marston        Seattle   1
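The flagging method can be sketched in the same style. Again the dimension is assumed to be a list of dictionaries, and `apply_type2_flag` is a hypothetical helper:

```python
def apply_type2_flag(dim, customer_id, customer_name, location):
    """Retire the current row (flag 0) and append the new one (flag 1)."""
    current = next((r for r in dim
                    if r["customer_id"] == customer_id and r["flag"] == 1), None)
    if current is not None:
        if current["location"] == location:
            return dim  # unchanged: keep the current row as it is
        current["flag"] = 0
    dim.append({"surrogate_key": len(dim) + 1,  # simplistic key (assumption)
                "customer_id": customer_id, "customer_name": customer_name,
                "location": location, "flag": 1})
    return dim


dim = []
apply_type2_flag(dim, 1, "Marston", "Illinois")
apply_type2_flag(dim, 1, "Marston", "Seattle")
print([(r["location"], r["flag"]) for r in dim])
# [('Illinois', 0), ('Seattle', 1)]
```

At most one row per customer carries flag 1, so the current row can be found without comparing version numbers.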
SCD Type 2 Effective Date: In Effective Date method, the period of the change is tracked using the start_date
and end_date columns in the dimension table.
surrogate_key  customer_id  customer_name  Location  Start_date   End_date
--------------------------------------------------------------------------
1              1            Marston        Illinois  01-Mar-2010  20-Feb-2011
2              1            Marston        Seattle   21-Feb-2011  NULL
The NULL in the End_Date indicates the current version of the data and the remaining records indicate the past
data.
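A short Python sketch of the effective date method. It assumes, as in the table above, that the old row ends on the day before the new row starts; the `apply_type2_dates` helper and the list-of-dictionaries layout are illustrative assumptions:

```python
from datetime import date, timedelta


def apply_type2_dates(dim, customer_id, customer_name, location, change_date):
    """Close the current row and open a new one starting on change_date."""
    current = next((r for r in dim
                    if r["customer_id"] == customer_id and r["end_date"] is None),
                   None)
    if current is not None:
        if current["location"] == location:
            return dim  # unchanged
        # Assumption: the old row ends the day before the new row begins.
        current["end_date"] = change_date - timedelta(days=1)
    dim.append({"surrogate_key": len(dim) + 1,  # simplistic key (assumption)
                "customer_id": customer_id, "customer_name": customer_name,
                "location": location,
                "start_date": change_date, "end_date": None})
    return dim


dim = []
apply_type2_dates(dim, 1, "Marston", "Illinois", date(2010, 3, 1))
apply_type2_dates(dim, 1, "Marston", "Seattle", date(2011, 2, 21))
print(dim[0]["end_date"], dim[1]["end_date"])  # 2011-02-20 None
```

With both dates stored, a query can pick the row that was valid on any given day, which the version and flag methods cannot do on their own.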
Design/Implement/Create SCD Type 2 Effective Date Mapping in Informatica
Q) How to create or implement slowly changing dimension (SCD) Type 2 Effective Date mapping in Informatica?
SCD Type 2 will store the entire history in the dimension table. In SCD Type 2 Effective Date, the dimension table
will have Start_Date (Begin_Date) and End_Date as the fields. If the End_Date is NULL, then it indicates the
current row. Know more about SCDs at Slowly Changing Dimensions Concepts.
We will see how to implement the SCD Type 2 Effective Date in Informatica. As an example, consider the
customer dimension. The source and target table structures are shown below:
--Source Table
Create Table Customers
(
  Customer_Id Number Primary Key,
  Location    Varchar2(30)
);

--Target Dimension Table
Create Table Customers_Dim
(
  Cust_Key    Number Primary Key,
  Customer_Id Number,
  Location    Varchar2(30),
  Begin_Date  Date,
  End_Date    Date
);
The basic steps involved in creating an SCD Type 2 Effective Date mapping are:
1. Identify the new records and insert them into the dimension table with Begin_Date as the current date
   (SYSDATE) and End_Date as NULL.
2. Identify the changed records and insert them into the dimension table with Begin_Date as the current
   date (SYSDATE) and End_Date as NULL.
3. Identify the changed records and update the existing records in the dimension table with End_Date as the
   current date.
We will divide the steps to implement the SCD type 2 Effective Date mapping into four parts.
SCD Type 2 Effective Date implementation - Part 1
Here we will see the basic setup and mapping flow required for SCD Type 2 Effective Date. The steps involved
are:
1. Create the source and dimension tables in the database.
2. Open the mapping designer tool, go to the source analyzer and either create or import the source definition.
3. Go to the Warehouse designer or Target designer and import the target definition.
4. Go to the mapping designer tab and create a new mapping.
5. Drag the source into the mapping.
6. Go to the toolbar, Transformation and then Create.
7. Select the lookup transformation, enter a name and click on Create. You will get a window as shown in
   the image below.
8. Select the customer dimension table and click on OK.
9. Edit the lookup transformation, go to the ports tab and remove unnecessary ports. Keep only the
   Cust_Key, Customer_Id and Location ports in the lookup transformation. Create a new port (IN_Customer_Id) in
   the lookup transformation. This new port needs to be connected to the Customer_Id port of the source qualifier
   transformation.
10. Go to the conditions tab of the lookup transformation and enter the condition as Customer_Id =
    IN_Customer_Id.
11. Go to the properties tab of the LKP transformation and enter the query below in Lookup SQL Override.
    Alternatively, you can generate the SQL query by connecting to the database in the Lookup SQL Override
    expression editor and then adding the WHERE clause.
    SELECT
      Customers_Dim.Cust_Key as Cust_Key,
      Customers_Dim.Location as Location,
      Customers_Dim.Customer_Id as Customer_Id
    FROM
      Customers_Dim
    WHERE
      Customers_Dim.End_Date IS NULL
12. Click on OK in the lookup transformation. Connect the Customer_Id port of the source qualifier transformation
    to the IN_Customer_Id port of the LKP transformation.
13. Create an expression transformation with input/output ports Cust_Key, LKP_Location, Src_Location
    and output ports New_Flag, Changed_Flag. Enter the expressions below for the output ports.
    New_Flag = IIF(ISNULL(Cust_Key), 1, 0)
    Changed_Flag = IIF(NOT ISNULL(Cust_Key) AND
                       LKP_Location != Src_Location, 1, 0)
The part of the mapping flow is shown below.
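The New_Flag and Changed_Flag expressions from the last step can be mirrored in plain Python to see which records each one selects. This is a sketch, not Informatica code: `classify` is a hypothetical function, and a lookup miss is modelled as `None`:

```python
def classify(lookup_row, src_location):
    """Return (new_flag, changed_flag) for one source record.

    lookup_row is the current dimension row found by the lookup,
    or None when the customer is not in the dimension yet.
    """
    cust_key = None if lookup_row is None else lookup_row["cust_key"]
    new_flag = 1 if cust_key is None else 0
    changed_flag = 1 if (cust_key is not None
                         and lookup_row["location"] != src_location) else 0
    return new_flag, changed_flag


print(classify(None, "Illinois"))                                    # (1, 0)
print(classify({"cust_key": 1, "location": "Illinois"}, "Illinois")) # (0, 0)
print(classify({"cust_key": 1, "location": "Illinois"}, "Seattle"))  # (0, 1)
```

The two flags are never both 1, so each source record ends up as new, as changed, or as neither.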
SCD Type 2 Effective Date implementation - Part 2
In this part, we will identify the new records and insert them into the target with Begin_Date as the current date.
The steps involved are:
1. Create a filter transformation to identify the new records to insert into the dimension table. Drag the
   ports of the expression transformation (New_Flag) and the source qualifier transformation (Customer_Id,
   Location) into the filter transformation.
2. Go to the properties tab of the filter transformation and enter the filter condition as New_Flag=1.
3. Create an update strategy transformation and connect the ports of the filter transformation (Customer_Id,
   Location) to it. Go to the properties tab and enter the update strategy expression as DD_INSERT.
4. Drag the target definition into the mapping and connect the appropriate ports of the update strategy
   transformation to the target definition.
5. Create a sequence generator and an expression transformation. Name this expression transformation
   "Expr_Date".
6. Drag and connect the NextVal port of the sequence generator to the expression transformation. In the
   expression transformation, create a new output port (Begin_Date with date/time data type) and assign the
   value SYSDATE to it.
7. Connect the ports of the expression transformation (NextVal, Begin_Date) to the target definition ports
   (Cust_Key, Begin_Date). The part of the mapping flow is shown in the image below.
SCD Type 2 Effective Date implementation - Part 3
In this part, we will identify the changed records and insert them into the target with Begin_Date as the current
date. The steps involved are:
1. Create a filter transformation and name it FIL_Changed. This is used to find the
   changed records. Drag the ports from the expression transformation (Changed_Flag), the source qualifier
   transformation (Customer_Id, Location) and the LKP transformation (Cust_Key) into the filter transformation.
2. Go to the filter transformation properties and enter the filter condition as Changed_Flag=1.
3. Create an update strategy transformation and drag the ports of the filter transformation (Customer_Id,
   Location) into it. Go to the properties tab and enter the update strategy
   expression as DD_INSERT.
4. Drag the target definition into the mapping and connect the appropriate ports of the update strategy
   transformation to the target definition.
5. Connect the NextVal and Begin_Date ports of the expression transformation (Expr_Date created in Part 2)
   to the Cust_Key and Begin_Date ports of the target definition respectively. The part of the mapping diagram
   is shown below.
SCD Type 2 Effective Date implementation - Part 4
In this part, we will update the changed records in the dimension table with End_Date as the current date.
The steps involved are:
1. Create an expression transformation and drag the Cust_Key port of the filter transformation (FIL_Changed
   created in Part 3) into it.
2. Go to the ports tab of the expression transformation and create a new output port (End_Date with date/time
   data type). Assign the value SYSDATE to this port.
3. Create an update strategy transformation and drag the ports of the expression transformation into it.
   Go to the properties tab and enter the update strategy expression as DD_UPDATE.
4. Drag the target definition into the mapping and connect the appropriate ports of the update strategy to it.
   The complete mapping image is shown below.
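To check the logic of Parts 1 to 4 outside Informatica, the whole load can be sketched in Python with the standard `sqlite3` module. This is only an approximation under stated assumptions: SQLite stands in for Oracle, dates are ISO strings passed in as `today` instead of SYSDATE, the auto-assigned integer key stands in for the sequence generator, and `make_db` and `load_effective_date` are hypothetical names:

```python
import sqlite3


def make_db():
    """Create in-memory stand-ins for the Customers and Customers_Dim tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, location TEXT);
        CREATE TABLE customers_dim (
            cust_key    INTEGER PRIMARY KEY,  -- auto-assigned, like the sequence
            customer_id INTEGER,
            location    TEXT,
            begin_date  TEXT,
            end_date    TEXT
        );
    """)
    return conn


def load_effective_date(conn, today):
    """Run one load: end-date changed rows, insert new and changed rows."""
    # Part 1: look up the current dimension row (End_Date IS NULL) per customer.
    rows = conn.execute("""
        SELECT s.customer_id, s.location, d.cust_key, d.location
        FROM customers s
        LEFT JOIN customers_dim d
          ON d.customer_id = s.customer_id AND d.end_date IS NULL
    """).fetchall()
    for customer_id, src_location, cust_key, lkp_location in rows:
        is_new = cust_key is None
        is_changed = not is_new and lkp_location != src_location
        if is_changed:  # Part 4: close the existing row
            conn.execute(
                "UPDATE customers_dim SET end_date = ? WHERE cust_key = ?",
                (today, cust_key))
        if is_new or is_changed:  # Parts 2 and 3: insert the current row
            conn.execute(
                "INSERT INTO customers_dim (customer_id, location, begin_date)"
                " VALUES (?, ?, ?)", (customer_id, src_location, today))
    conn.commit()


conn = make_db()
conn.execute("INSERT INTO customers VALUES (1, 'Illinois')")
load_effective_date(conn, "2010-03-01")
conn.execute("UPDATE customers SET location = 'Seattle' WHERE customer_id = 1")
load_effective_date(conn, "2011-02-21")
for row in conn.execute("SELECT * FROM customers_dim ORDER BY cust_key"):
    print(row)
```

As in the mapping, the closed row's End_Date and the new row's Begin_Date are both the load date. Running the load again without source changes inserts nothing.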
Design/Implement/Create SCD Type 2 Version Mapping in Informatica
Q) How to create or implement slowly changing dimension (SCD) Type 2 versioning mapping in Informatica?
SCD Type 2 will store the entire history in the dimension table. Know more about SCDs at Slowly Changing
Dimensions DW Concepts.
We will see how to implement the SCD Type 2 version in Informatica. As an example, consider the customer
dimension. The source and target table structures are shown below:
--Source Table
Create Table Customers
(
  Customer_Id Number Primary Key,
  Location    Varchar2(30)
);

--Target Dimension Table
Create Table Customers_Dim
(
  Cust_Key    Number Primary Key,
  Customer_Id Number,
  Location    Varchar2(30),
  Version     Number
);
The basic steps involved in creating an SCD Type 2 version mapping are:
1. Identify the new records and insert them into the dimension table with version number one.
2. Identify the changed records and insert them into the dimension table with the version
   number incremented.
Let's divide the steps to implement the SCD Type 2 version mapping into three parts.
SCD Type 2 version implementation - Part 1
Here we will see the basic setup and mapping flow required for SCD Type 2 version. The steps involved are:
1. Create the source and dimension tables in the database.
2. Open the mapping designer tool, go to the source analyzer and either create or import the source definition.
3. Go to the Warehouse designer or Target designer and import the target definition.
4. Go to the mapping designer tab and create a new mapping.
5. Drag the source into the mapping.
6. Go to the toolbar, Transformation and then Create.
7. Select the lookup transformation, enter a name and click on Create. You will get a window as shown in
   the image below.
8. Select the customer dimension table and click on OK.
9. Edit the lookup transformation, go to the ports tab and remove unnecessary ports. Keep only the
   Cust_Key, Customer_Id, Location and Version ports in the lookup transformation. Create a new port
   (IN_Customer_Id) in the lookup transformation. This new port needs to be connected to the Customer_Id port of
   the source qualifier transformation.
10. Go to the conditions tab of the lookup transformation and enter the condition as Customer_Id =
    IN_Customer_Id.
11. Go to the properties tab of the LKP transformation and enter the query below in Lookup SQL Override.
    Alternatively, you can generate the SQL query by connecting to the database in the Lookup SQL Override
    expression editor and then adding the ORDER BY clause.
    SELECT
      Customers_Dim.Cust_Key as Cust_Key,
      Customers_Dim.Location as Location,
      Customers_Dim.Version as Version,
      Customers_Dim.Customer_Id as Customer_Id
    FROM
      Customers_Dim
    ORDER BY
      Customers_Dim.Customer_Id, Customers_Dim.Version--
    You have to use an ORDER BY clause in the above query. The trailing -- comments out the ORDER BY clause
    that Informatica would otherwise append to the lookup query. If you sort the Version column in ascending
    order, then you have to specify "Use Last Value" in the "Lookup policy on multiple match" property. If you
    sort the Version column in descending order, then you have to specify the "Lookup policy on multiple match"
    option as "Use First Value".
12. Click on OK in the lookup transformation. Connect the Customer_Id port of the source qualifier transformation
    to the IN_Customer_Id port of the LKP transformation.
13. Create an expression transformation with input/output ports Cust_Key, LKP_Location, Src_Location
    and output ports New_Flag, Changed_Flag. Enter the expressions below for the output ports.
    New_Flag = IIF(ISNULL(Cust_Key), 1, 0)
    Changed_Flag = IIF(NOT ISNULL(Cust_Key) AND
                       LKP_Location != Src_Location, 1, 0)
The part of the mapping flow is shown below.
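Why the sort order and the "Lookup policy on multiple match" setting must agree can be illustrated with a small Python sketch. It only mimics the effect of "ascending order plus Use Last Value" on a list of rows and does not reflect how Informatica builds its lookup cache; `build_lookup_cache` is a hypothetical helper:

```python
def build_lookup_cache(dim_rows):
    """Keep, per customer_id, the last row in (customer_id, version) order."""
    cache = {}
    ordered = sorted(dim_rows, key=lambda r: (r["customer_id"], r["version"]))
    for row in ordered:
        # Later rows (higher versions) overwrite earlier ones: "use last value".
        cache[row["customer_id"]] = row
    return cache


rows = [
    {"cust_key": 2, "customer_id": 1, "location": "Seattle", "version": 2},
    {"cust_key": 1, "customer_id": 1, "location": "Illinois", "version": 1},
]
print(build_lookup_cache(rows)[1]["location"])  # Seattle
```

Without the sort, the row returned for a customer with several versions would depend on the order in which the rows happened to arrive.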
SCD Type 2 version implementation - Part 2
In this part, we will identify the new records and insert them into the target with version value 1. The steps
involved are:
1. Create a filter transformation to identify the new records to insert into the dimension table. Drag the
   ports of the expression transformation (New_Flag) and the source qualifier transformation (Customer_Id,
   Location) into the filter transformation.
2. Go to the properties tab of the filter transformation and enter the filter condition as New_Flag=1.
3. Create an update strategy transformation and connect the ports of the filter transformation (Customer_Id,
   Location) to it. Go to the properties tab and enter the update strategy expression as DD_INSERT.
4. Drag the target definition into the mapping and connect the appropriate ports of the update strategy
   transformation to the target definition.
5. Create a sequence generator and an expression transformation. Name this expression transformation
   "Expr_Ver".
6. Drag and connect the NextVal port of the sequence generator to the expression transformation. In the
   expression transformation, create a new output port (Version) and assign the value 1 to it.
7. Connect the ports of the expression transformation (NextVal, Version) to the target definition ports
   (Cust_Key, Version). The part of the mapping flow is shown in the image below.
SCD Type 2 Version implementation - Part 3
In this part, we will identify the changed records and insert them into the target by incrementing the version
number. The steps involved are:
1. Create a filter transformation. This is used to find the changed records. Drag the ports from the
   expression transformation (Changed_Flag), the source qualifier transformation (Customer_Id, Location) and
   the LKP transformation (Version) into the filter transformation.
2. Go to the filter transformation properties and enter the filter condition as Changed_Flag=1.
3. Create an expression transformation and drag the ports of the filter transformation, except the Changed_Flag
   port, into the expression transformation.
4. Go to the ports tab of the expression transformation, create a new output port (O_Version) and assign
   the expression Version+1 to it.
5. Create an update strategy transformation and drag the ports of the expression transformation
   (Customer_Id, Location, O_Version) into it. Go to the properties tab and enter the
   update strategy expression as DD_INSERT.
6. Drag the target definition into the mapping and connect the appropriate ports of the update strategy
   transformation to the target definition.
7. Connect the NextVal port of the expression transformation (Expr_Ver created in Part 2) to the Cust_Key
   port of the target definition. The complete mapping diagram is shown in the image below:
You can implement the SCD Type 2 version mapping in your own way. Remember that the SCD Type 2 version
mapping is rarely used in practice.
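The version mapping as a whole can be approximated in Python with the standard `sqlite3` module. The assumptions are the same as for any such sketch: SQLite stands in for Oracle, the auto-assigned integer key stands in for the sequence generator, and `make_version_db` and `load_versions` are hypothetical names:

```python
import sqlite3


def make_version_db():
    """Create in-memory stand-ins for the Customers and Customers_Dim tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, location TEXT);
        CREATE TABLE customers_dim (
            cust_key INTEGER PRIMARY KEY, customer_id INTEGER,
            location TEXT, version INTEGER);
    """)
    return conn


def load_versions(conn):
    """Insert version 1 for new customers and version + 1 for changed ones."""
    # Part 1: pair each source row with its latest dimension row, if any.
    rows = conn.execute("""
        SELECT s.customer_id, s.location, d.location, d.version
        FROM customers s
        LEFT JOIN customers_dim d ON d.customer_id = s.customer_id
        WHERE d.version IS NULL
           OR d.version = (SELECT MAX(m.version) FROM customers_dim m
                           WHERE m.customer_id = s.customer_id)
    """).fetchall()
    inserted = 0
    for customer_id, src_location, lkp_location, version in rows:
        if version is None:                   # Part 2: new record
            new_version = 1
        elif lkp_location != src_location:    # Part 3: changed record
            new_version = version + 1
        else:
            continue                          # unchanged record
        conn.execute(
            "INSERT INTO customers_dim (customer_id, location, version)"
            " VALUES (?, ?, ?)", (customer_id, src_location, new_version))
        inserted += 1
    conn.commit()
    return inserted


conn = make_version_db()
conn.execute("INSERT INTO customers VALUES (1, 'Illinois')")
load_versions(conn)
conn.execute("UPDATE customers SET location = 'Seattle' WHERE customer_id = 1")
load_versions(conn)
print(conn.execute(
    "SELECT customer_id, location, version FROM customers_dim"
    " ORDER BY cust_key").fetchall())
# [(1, 'Illinois', 1), (1, 'Seattle', 2)]
```

Existing rows are never updated in this variant; every change only adds a row.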
Design/Implement/Create SCD Type 2 Flag Mapping in Informatica
Q) How to create or implement slowly changing dimension (SCD) Type 2 Flagging mapping in Informatica?
SCD Type 2 will store the entire history in the dimension table. Know more about SCDs at Slowly Changing
Dimensions Concepts.
We will see how to implement the SCD Type 2 Flag in Informatica. As an example, consider the customer
dimension. The source and target table structures are shown below:
--Source Table
Create Table Customers
(
  Customer_Id Number Primary Key,
  Location    Varchar2(30)
);

--Target Dimension Table
Create Table Customers_Dim
(
  Cust_Key    Number Primary Key,
  Customer_Id Number,
  Location    Varchar2(30),
  Flag        Number
);
The basic steps involved in creating an SCD Type 2 Flagging mapping are:
1. Identify the new records and insert them into the dimension table with the flag column value as one.
2. Identify the changed records and insert them into the dimension table with the flag value as one.
3. Identify the changed records and update the existing records in the dimension table with the flag value as zero.
We will divide the steps to implement the SCD type 2 flagging mapping into four parts.
SCD Type 2 Flag implementation - Part 1
Here we will see the basic setup and mapping flow required for SCD Type 2 Flagging. The steps involved are:
1. Create the source and dimension tables in the database.
2. Open the mapping designer tool, go to the source analyzer and either create or import the source definition.
3. Go to the Warehouse designer or Target designer and import the target definition.
4. Go to the mapping designer tab and create a new mapping.
5. Drag the source into the mapping.
6. Go to the toolbar, Transformation and then Create.
7. Select the lookup transformation, enter a name and click on Create. You will get a window as shown in
   the image below.
8. Select the customer dimension table and click on OK.
9. Edit the lookup transformation, go to the ports tab and remove unnecessary ports. Keep only the
   Cust_Key, Customer_Id and Location ports in the lookup transformation. Create a new port (IN_Customer_Id) in
   the lookup transformation. This new port needs to be connected to the Customer_Id port of the source qualifier
   transformation.
10. Go to the conditions tab of the lookup transformation and enter the condition as Customer_Id =
    IN_Customer_Id.
11. Go to the properties tab of the LKP transformation and enter the query below in Lookup SQL Override.
    Alternatively, you can generate the SQL query by connecting to the database in the Lookup SQL Override
    expression editor and then adding the WHERE clause.
    SELECT
      Customers_Dim.Cust_Key as Cust_Key,
      Customers_Dim.Location as Location,
      Customers_Dim.Customer_Id as Customer_Id
    FROM
      Customers_Dim
    WHERE
      Customers_Dim.Flag = 1
12. Click on OK in the lookup transformation. Connect the Customer_Id port of the source qualifier transformation
    to the IN_Customer_Id port of the LKP transformation.
13. Create an expression transformation with input/output ports Cust_Key, LKP_Location, Src_Location
    and output ports New_Flag, Changed_Flag. Enter the expressions below for the output ports.
    New_Flag = IIF(ISNULL(Cust_Key), 1, 0)
    Changed_Flag = IIF(NOT ISNULL(Cust_Key) AND
                       LKP_Location != Src_Location, 1, 0)
SCD Type 2 Flag implementation - Part 2
In this part, we will identify the new records and insert them into the target with flag value 1. The steps
involved are:
1. Create a filter transformation to identify the new records to insert into the dimension table. Drag the
   ports of the expression transformation (New_Flag) and the source qualifier transformation (Customer_Id,
   Location) into the filter transformation.
2. Go to the properties tab of the filter transformation and enter the filter condition as New_Flag=1.
3. Create an update strategy transformation and connect the ports of the filter transformation (Customer_Id,
   Location) to it. Go to the properties tab and enter the update strategy expression as DD_INSERT.
4. Drag the target definition into the mapping and connect the appropriate ports of the update strategy
   transformation to the target definition.
5. Create a sequence generator and an expression transformation. Name this expression transformation
   "Expr_Flag".
6. Drag and connect the NextVal port of the sequence generator to the expression transformation. In the
   expression transformation, create a new output port (Flag) and assign the value 1 to it.
7. Connect the ports of the expression transformation (NextVal, Flag) to the target definition ports
   (Cust_Key, Flag). The part of the mapping flow is shown in the image below.
SCD Type 2 Flag implementation - Part 3
In this part, we will identify the changed records and insert them into the target with flag value 1. The steps
involved are:
1. Create a filter transformation and name it FIL_Changed. This is used to find the
   changed records. Drag the ports from the expression transformation (Changed_Flag), the source qualifier
   transformation (Customer_Id, Location) and the LKP transformation (Cust_Key) into the filter transformation.
2. Go to the filter transformation properties and enter the filter condition as Changed_Flag=1.
3. Create an update strategy transformation and drag the ports of the filter transformation (Customer_Id,
   Location) into it. Go to the properties tab and enter the update strategy
   expression as DD_INSERT.
4. Drag the target definition into the mapping and connect the appropriate ports of the update strategy
   transformation to the target definition.
5. Connect the NextVal and Flag ports of the expression transformation (Expr_Flag created in Part 2) to the
   Cust_Key and Flag ports of the target definition respectively. The part of the mapping diagram is shown below.
SCD Type 2 Flag implementation - Part 4
In this part, we will update the changed records in the dimension table with flag value 0.
The steps involved are:
1. Create an expression transformation and drag the Cust_Key port of the filter transformation (FIL_Changed
   created in Part 3) into it.
2. Go to the ports tab of the expression transformation and create a new output port (Flag). Assign the value 0
   to this Flag port.
3. Create an update strategy transformation and drag the ports of the expression transformation into it.
   Go to the properties tab and enter the update strategy expression as DD_UPDATE.
4. Drag the target definition into the mapping and connect the appropriate ports of the update strategy to it.
   The complete mapping image is shown below.
I have added one more filter transformation to handle existing customers with no changes
(Changed_Flag = 0).
It uses the filter condition Changed_Flag = 1, which means only the changed records are updated.
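The flag mapping as a whole (Parts 1 to 4, including the rule that unchanged records are skipped) can be sketched in Python with the standard `sqlite3` module. SQLite stands in for Oracle, the auto-assigned integer key stands in for the sequence generator, and `make_flag_db` and `load_flags` are hypothetical names:

```python
import sqlite3


def make_flag_db():
    """Create in-memory stand-ins for the Customers and Customers_Dim tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, location TEXT);
        CREATE TABLE customers_dim (
            cust_key INTEGER PRIMARY KEY, customer_id INTEGER,
            location TEXT, flag INTEGER);
    """)
    return conn


def load_flags(conn):
    """Run one load: flag changed rows 0, insert new and changed rows as 1."""
    # Part 1: look up the current dimension row (Flag = 1) per customer.
    rows = conn.execute("""
        SELECT s.customer_id, s.location, d.cust_key, d.location
        FROM customers s
        LEFT JOIN customers_dim d
          ON d.customer_id = s.customer_id AND d.flag = 1
    """).fetchall()
    for customer_id, src_location, cust_key, lkp_location in rows:
        if cust_key is not None and lkp_location == src_location:
            continue  # existing customer with no change: nothing to do
        if cust_key is not None:  # Part 4: retire the old row
            conn.execute(
                "UPDATE customers_dim SET flag = 0 WHERE cust_key = ?",
                (cust_key,))
        # Parts 2 and 3: insert the new current row
        conn.execute(
            "INSERT INTO customers_dim (customer_id, location, flag)"
            " VALUES (?, ?, 1)", (customer_id, src_location))
    conn.commit()


conn = make_flag_db()
conn.execute("INSERT INTO customers VALUES (1, 'Illinois')")
load_flags(conn)
conn.execute("UPDATE customers SET location = 'Seattle' WHERE customer_id = 1")
load_flags(conn)
print(conn.execute(
    "SELECT customer_id, location, flag FROM customers_dim"
    " ORDER BY cust_key").fetchall())
# [(1, 'Illinois', 0), (1, 'Seattle', 1)]
```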