Dimensional Data Modeling
Lecture 3 – Dimensional Modeling Considerations
Monster Dimensions
A monster dimension is a dimension table that is very large (millions of rows), changes frequently, and contains a large number of attributes.
Example: Customer. Assume there is a small set of attributes that change often (here Income, Education, and Marital Status).
Figure: Customer_Dim (Name, Address, etc., Income, Education, Marital Status, etc.) related to Sales_Fact
Monster Dimensions
Strategy: split the attributes that change frequently into their own table.
Continuous variables such as salary need to be ‘banded’ into ranges (20000-40000, 40000-60000, etc.).
Each attribute in the new table needs to have a small number of possible values.
The new table needs one row for each combination of attribute values. Assume 2 attributes with 5 values each. How many rows would be in the table?
Monster Dimensions
Answer: 25 (5*5)
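The banding and row-count reasoning can be sketched in Python. This is a minimal illustration, assuming hypothetical income band boundaries and five hypothetical education levels; the helpers `band_label`, `band_income`, and `build_demo_dim` are illustrative names, not part of any library. It builds one Demo_Dim row per combination of values with `itertools.product`, so 5 income bands times 5 education levels yields 25 rows.

```python
from itertools import product

# Hypothetical income bands (lower bound inclusive, upper bound exclusive).
INCOME_BANDS = [(0, 20000), (20000, 40000), (40000, 60000),
                (60000, 80000), (80000, None)]

# Hypothetical education levels; any 5 discrete values would do.
EDUCATION_LEVELS = ["Primary", "High School", "Some College", "Bachelor", "Graduate"]


def band_label(low, high):
    """Render a band as text, e.g. '20000-40000' or '80000+'."""
    return f"{low}+" if high is None else f"{low}-{high}"


def band_income(income):
    """Map a continuous income value to its band label."""
    for low, high in INCOME_BANDS:
        if income >= low and (high is None or income < high):
            return band_label(low, high)
    raise ValueError(f"income out of range: {income}")


def build_demo_dim():
    """Create one Demo_Dim row per (income band, education) combination."""
    labels = [band_label(lo, hi) for lo, hi in INCOME_BANDS]
    rows = []
    for demo_key, (income_range, education) in enumerate(
            product(labels, EDUCATION_LEVELS), start=1):
        rows.append({"Demo_key": demo_key,
                     "Income_Range": income_range,
                     "Education": education})
    return rows


demo_dim = build_demo_dim()
print(len(demo_dim))         # 25 rows = 5 bands * 5 education levels
print(band_income(52000))    # 40000-60000
```

Because the table is the full cross product, it can be generated once up front and never grows as customers change.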
The ‘base’ Customer dimension table contains the key to the new table (Demo_key).
The keys of both the base dimension and the new table are placed as foreign keys on the fact table.
Customer_Dim: Cust_key, Name, Address, Demo_key
Demo_Dim: Demo_key, Income_Range, Education, Marital Status
Sales_Fact: Cust_key, Demo_key
What happens when customer income changes?
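One way to reason about an income change is the small in-memory sketch below. It assumes a tiny hypothetical Demo_Dim, hypothetical customer data, and hypothetical helper names (`demo_key_for`, `record_sale`); it also assumes the Demo_key on Customer_Dim is simply overwritten to point at the customer's current profile. Under those assumptions, the existing Demo_Dim row for the new band is reused, Customer_Dim gains no new row, and new Sales_Fact rows carry the new Demo_key while earlier fact rows keep the old one, so history is preserved in the fact table.

```python
# Hypothetical Demo_Dim: Demo_key -> (Income_Range, Education).
demo_dim = {
    1: ("20000-40000", "Bachelor"),
    2: ("40000-60000", "Bachelor"),
}

# Hypothetical base customer dimension; Demo_key points at the current profile.
customer_dim = {101: {"Name": "A. Smith", "Address": "1 Main St", "Demo_key": 1}}

sales_fact = []  # rows of (Cust_key, Demo_key, amount)


def demo_key_for(income_range, education):
    """Look up the existing Demo_Dim row; no new rows are ever created."""
    for key, profile in demo_dim.items():
        if profile == (income_range, education):
            return key
    raise KeyError((income_range, education))


def record_sale(cust_key, amount):
    """A fact row stores both foreign keys as of the time of the sale."""
    demo_key = customer_dim[cust_key]["Demo_key"]
    sales_fact.append((cust_key, demo_key, amount))


record_sale(101, 50.0)  # sale while in the 20000-40000 band

# Income changes: repoint the customer's Demo_key in place (an assumption of
# this sketch); Customer_Dim does not grow and Demo_Dim is unchanged.
customer_dim[101]["Demo_key"] = demo_key_for("40000-60000", "Bachelor")

record_sale(101, 75.0)  # sale after the income change
print(sales_fact)       # [(101, 1, 50.0), (101, 2, 75.0)]
```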
Degenerate Dimensions
Consider an order. What is the grain of information we would want in a fact table?
What information is left at the order level, other than the information we place in the fact table (date, customer, product, quantity, etc.)?
Degenerate Dimensions
The answer: probably only the order number.
A degenerate dimension is a dimension that has no attributes other than its key value.
Strategy: make the order number an attribute of the fact table. It looks like a dimension key but does not join to any dimension table; it is just an attribute.
This allows analysis at the order level (e.g., GROUP BY order number).
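A minimal sketch of order-level analysis with a degenerate dimension, using Python's built-in `sqlite3` module with an in-memory database. The table layout, column names, and rows are hypothetical, and the fact grain is assumed to be one row per order line. `Order_number` sits on the fact row with no Order_Dim table behind it, yet grouping by it rolls order lines up to whole orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical order-line fact table: Order_number is a degenerate dimension,
# stored on the fact row with no dimension table to join to.
conn.execute("""
    CREATE TABLE Sales_Fact (
        Date_key     INTEGER,
        Cust_key     INTEGER,
        Product_key  INTEGER,
        Order_number TEXT,
        Quantity     INTEGER,
        Amount       REAL
    )
""")
conn.executemany(
    "INSERT INTO Sales_Fact VALUES (?, ?, ?, ?, ?, ?)",
    [
        (20240105, 1, 10, "ORD-1001", 2, 20.0),
        (20240105, 1, 11, "ORD-1001", 1, 15.0),
        (20240106, 2, 10, "ORD-1002", 5, 50.0),
    ],
)

# Order-level analysis: group the line-level facts by the degenerate key.
order_totals = conn.execute("""
    SELECT Order_number, COUNT(*) AS lines, SUM(Amount) AS total
    FROM Sales_Fact
    GROUP BY Order_number
    ORDER BY Order_number
""").fetchall()
print(order_totals)  # [('ORD-1001', 2, 35.0), ('ORD-1002', 1, 50.0)]
```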
What are some other degenerate dimensions?
Different types of facts
There are three types: additive, semi-additive, and non-additive.
Additive: the values can be ‘added’ across all dimensions (e.g., sales revenue).
Semi-additive: some facts are not ‘perfectly’ additive because they represent a snapshot at a point in time (account balances, inventory balances). They can be added across some dimensions (e.g., across accounts) but not across time, so they cannot be treated the same as perfectly additive facts.
Non-additive: some facts are textual and cannot be added at all. Basically, they can only be counted.
Example of a non-additive fact?
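The three fact types can be contrasted in a short Python sketch. All data below is hypothetical, and the choice to average monthly totals for the semi-additive balance is one common option (taking the period-end value is another), not a rule from the slides. Revenue sums over everything; balances sum across accounts within a month but not across months; a textual fact is only counted.

```python
from collections import defaultdict

# Hypothetical month-end balance snapshots: (account, month, balance).
balances = [
    ("ACC-1", "2024-01", 100.0), ("ACC-2", "2024-01", 300.0),
    ("ACC-1", "2024-02", 150.0), ("ACC-2", "2024-02", 250.0),
]
# Hypothetical sales facts: (product, month, revenue, textual comment).
sales = [
    ("P1", "2024-01", 20.0, "gift"), ("P2", "2024-01", 30.0, "gift"),
    ("P1", "2024-02", 25.0, "repeat buyer"),
]

# Additive: revenue can be summed across every dimension (product, month).
total_revenue = sum(rev for _, _, rev, _ in sales)

# Semi-additive: balances can be summed across accounts within one month...
balance_by_month = defaultdict(float)
for _, month, bal in balances:
    balance_by_month[month] += bal
# ...but summing across months is meaningless, so average over time instead.
avg_month_total = sum(balance_by_month.values()) / len(balance_by_month)

# Non-additive (textual): the only sensible aggregate is a count.
gift_count = sum(1 for _, _, _, comment in sales if comment == "gift")

print(total_revenue)           # 75.0
print(dict(balance_by_month))  # {'2024-01': 400.0, '2024-02': 400.0}
print(avg_month_total)         # 400.0
print(gift_count)              # 2
```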
Families of facts
When designing a data warehouse, you need to think about the business process to be supported.
It's important to realize that this translates into a set of related facts: a value chain. Example:
Inquiry, Order, Shipment, Invoice, Return, Credit
Transaction and Snapshot Facts
When the operations of an organization are examined, it's important to realize that most organizations want to look at their information on both a transaction basis and a snapshot basis.
Transaction basis: look at individual transactions (inventory movements, sales, ATM transactions, etc.). This allows analysis of behavior patterns (time-of-day analysis, market basket analysis, etc.).
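Keeping facts at transaction grain is what makes time-of-day analysis possible. The sketch below is a minimal illustration, assuming hypothetical ATM withdrawal rows of (account, timestamp, amount); it counts transactions per hour of the day with `collections.Counter`, something a month-level snapshot could not answer.

```python
from collections import Counter
from datetime import datetime

# Hypothetical transaction-grain ATM facts: one row per withdrawal.
atm_transactions = [
    ("ACC-1", datetime(2024, 1, 5, 8, 15), 40.0),
    ("ACC-2", datetime(2024, 1, 5, 8, 45), 60.0),
    ("ACC-1", datetime(2024, 1, 5, 17, 30), 20.0),
    ("ACC-3", datetime(2024, 1, 6, 8, 5), 100.0),
]

# Time-of-day analysis: count transactions per hour of the day.
by_hour = Counter(ts.hour for _, ts, _ in atm_transactions)
print(sorted(by_hour.items()))  # [(8, 3), (17, 1)]
```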
Transaction and Snapshot Facts
Snapshot facts are typically created in addition to a transaction fact table.
Snapshots are typically created at the end of specific reporting periods (month-end, etc.).
Rolling snapshot: update the snapshot continuously during the period, then ‘publish’ it at period end. The advantage is that the processing work is spread out.
Examples: monthly sales; a bank's month-end account balances and monthly transaction counts, etc.
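A periodic month-end snapshot can be derived from the transaction facts. This is a minimal sketch under stated assumptions: hypothetical opening balances, hypothetical transactions with signed amounts (deposits positive, withdrawals negative), and a hypothetical helper `month_end_snapshot`. It emits one row per account per month, including months with no activity, holding the month-end balance and that month's transaction count.

```python
from datetime import date

# Hypothetical opening balances at the start of the first month.
opening_balance = {"ACC-1": 500.0, "ACC-2": 1000.0}

# Hypothetical transaction facts: (account, date, signed amount).
transactions = [
    ("ACC-1", date(2024, 1, 3), -100.0),
    ("ACC-1", date(2024, 1, 20), 250.0),
    ("ACC-2", date(2024, 1, 15), -200.0),
    ("ACC-1", date(2024, 2, 2), -50.0),
]


def month_end_snapshot(opening, txns, months):
    """Return one (account, month, month_end_balance, txn_count) row per
    account per month, even for months with no activity.
    `months` must be listed in chronological order."""
    balance = dict(opening)
    rows = []
    for month in months:
        for acct in sorted(balance):
            in_month = [amt for a, d, amt in txns
                        if a == acct and d.strftime("%Y-%m") == month]
            balance[acct] += sum(in_month)
            rows.append((acct, month, balance[acct], len(in_month)))
    return rows


snapshot = month_end_snapshot(opening_balance, transactions, ["2024-01", "2024-02"])
for row in snapshot:
    print(row)
```

A rolling snapshot would maintain the same rows incrementally as each transaction arrives instead of rebuilding them at period end.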
Snapshot Example
ATM Activity Snapshot: a snapshot of ATM usage, by account, by month.
Columns: (Foreign_Keys), ..., Transaction count, Account balance, Revenue_earned, Average_daily_balance, ...
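Some snapshot columns, such as Average_daily_balance, are computed from the transactions in the period. The sketch below assumes one common definition (the mean of end-of-day balances over every day of the month), a hypothetical helper `average_daily_balance`, and hypothetical data; how Revenue_earned would be derived is not stated in the slides, so it is left out.

```python
import calendar
from datetime import date


def average_daily_balance(start_balance, txns, year, month):
    """Mean of end-of-day balances across every day of the month.

    start_balance: balance at the start of day 1.
    txns: list of (date, signed amount) for one account in this month.
    """
    days = calendar.monthrange(year, month)[1]  # number of days in the month
    balance = start_balance
    total = 0.0
    for day in range(1, days + 1):
        current = date(year, month, day)
        balance += sum(amt for d, amt in txns if d == current)
        total += balance
    return total / days


# Hypothetical April activity for one account: a single 300 withdrawal.
april_txns = [(date(2024, 4, 16), -300.0)]
row = {
    "Transaction_count": len(april_txns),
    "Account_balance": 1000.0 + sum(amt for _, amt in april_txns),
    "Average_daily_balance": average_daily_balance(1000.0, april_txns, 2024, 4),
}
print(row)
# {'Transaction_count': 1, 'Account_balance': 700.0, 'Average_daily_balance': 850.0}
```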
Factless Facts
A type of fact table with no real measure (additive or otherwise).
Factless facts typically record events, such as class attendance.
Typically, a dummy column with a value of 1 is added for counting purposes.
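The dummy-column technique can be sketched with Python's built-in `sqlite3` module and an in-memory database. The `Attendance_Fact` table, its key columns, and the rows are hypothetical; each row records one student attending one class session, and `Attendance_count` defaults to 1 so that summing it counts attendance events.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical factless fact table: one row per student attending a class
# session. Attendance_count is the dummy column, always 1.
conn.execute("""
    CREATE TABLE Attendance_Fact (
        Date_key         INTEGER,
        Student_key      INTEGER,
        Class_key        INTEGER,
        Attendance_count INTEGER DEFAULT 1
    )
""")
conn.executemany(
    "INSERT INTO Attendance_Fact (Date_key, Student_key, Class_key) VALUES (?, ?, ?)",
    [(20240108, 1, 501), (20240108, 2, 501), (20240110, 1, 501), (20240110, 3, 502)],
)

# Summing the dummy column counts attendance events per class.
per_class = conn.execute("""
    SELECT Class_key, SUM(Attendance_count)
    FROM Attendance_Fact
    GROUP BY Class_key
    ORDER BY Class_key
""").fetchall()
print(per_class)  # [(501, 3), (502, 1)]
```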