Data Warehouse Design
Sekalema Hamza
Intro
The star schema and the snowflake schema are ways to
organize data marts or entire data warehouses using
relational databases.
Both of them use dimension tables to describe data
aggregated in a fact table.
Star Schema
The most obvious characteristic of the star schema is
that dimension tables are not normalized.
In the model above, the pink fact_sales table stores
aggregated data created from our operational
database(s).
The light blue tables are dimension tables. We use these
five dimensions because we need to create reports
using them as parameters.
Snowflake Schema
…
This snowflake schema stores exactly the same data as
the star schema.
The fact table has the same dimensions as it does in the
star schema example. The most important difference is
that the dimension tables in the snowflake schema are
normalized.
The process of normalizing dimension tables is called
snowflaking.
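A minimal sketch of snowflaking, using Python's built-in sqlite3 module.
The store dimension and its columns (dim_store_star, dim_store, dim_city,
dim_country) and the sample rows are hypothetical and are not taken from
the model in these slides; they only illustrate how one denormalized
dimension table is split into normalized tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star-style dimension (hypothetical): city and country are repeated
-- on every store row.
CREATE TABLE dim_store_star (
    store_id   INTEGER PRIMARY KEY,
    store_name TEXT,
    city       TEXT,
    country    TEXT
);
INSERT INTO dim_store_star VALUES
    (1, 'Store A', 'Kampala', 'Uganda'),
    (2, 'Store B', 'Kampala', 'Uganda'),
    (3, 'Store C', 'Mbale',   'Uganda');

-- Snowflaked version: each level of the hierarchy gets its own table.
CREATE TABLE dim_country (
    country_id INTEGER PRIMARY KEY,
    country    TEXT UNIQUE
);
CREATE TABLE dim_city (
    city_id    INTEGER PRIMARY KEY,
    city       TEXT,
    country_id INTEGER REFERENCES dim_country(country_id),
    UNIQUE (city, country_id)
);
CREATE TABLE dim_store (
    store_id   INTEGER PRIMARY KEY,
    store_name TEXT,
    city_id    INTEGER REFERENCES dim_city(city_id)
);

-- Populate the normalized tables from the denormalized one.
INSERT INTO dim_country (country)
    SELECT DISTINCT country FROM dim_store_star;
INSERT INTO dim_city (city, country_id)
    SELECT DISTINCT s.city, c.country_id
    FROM dim_store_star s
    JOIN dim_country c ON c.country = s.country;
INSERT INTO dim_store (store_id, store_name, city_id)
    SELECT s.store_id, s.store_name, ci.city_id
    FROM dim_store_star s
    JOIN dim_country c ON c.country = s.country
    JOIN dim_city ci ON ci.city = s.city AND ci.country_id = c.country_id;
""")

# Each distinct country and city is now stored once.
country_rows = conn.execute("SELECT COUNT(*) FROM dim_country").fetchone()[0]
city_rows = conn.execute("SELECT COUNT(*) FROM dim_city").fetchone()[0]
store_rows = conn.execute("SELECT COUNT(*) FROM dim_store").fetchone()[0]
print(country_rows, city_rows, store_rows)
```

The store rows keep only a city_id, so the repeated city and country
text is stored once, at the cost of extra joins when reading.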
Snowflake vs. Star Schemas: Which Should You Use?
Consider using the snowflake schema:
In large data warehouses. Because the warehouse is the
central data store for the company, normalizing the
dimension tables could save a lot of space.
With large dimension tables, that is, when dimension
tables require a significant amount of storage space. In
most cases, however, the fact tables will be the ones
that take up most of the space. They will probably also
grow much faster than the dimension tables.
…
Consider using the star schema:
In data marts. Data marts are subsets of data taken out
of the central data warehouse. They are usually created
for individual departments and often do not contain all
the historical data. In this setting, saving storage space
is not a priority.
When simple analysis is needed. The star schema
simplifies analysis. This is not just about query
efficiency but also about making future work easier for
business users.
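To illustrate why the star schema simplifies analysis, the sketch below
answers the same question (total sales per category) against both
layouts using Python's sqlite3 module. All table names, columns and
rows here are hypothetical examples, not part of the models shown in
these slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical fact table shared by both layouts.
CREATE TABLE fact_sales (product_id INTEGER, amount REAL);
INSERT INTO fact_sales VALUES (1, 10.0), (2, 5.0), (3, 7.5);

-- Star layout: the category name lives in the product dimension.
CREATE TABLE dim_product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT
);
INSERT INTO dim_product VALUES
    (1, 'Pen', 'Stationery'),
    (2, 'Pencil', 'Stationery'),
    (3, 'Mug', 'Kitchen');

-- Snowflake layout: the category is normalized into its own table.
CREATE TABLE dim_category (
    category_id INTEGER PRIMARY KEY,
    category    TEXT
);
CREATE TABLE dim_product_sf (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id  INTEGER REFERENCES dim_category(category_id)
);
INSERT INTO dim_category VALUES (1, 'Stationery'), (2, 'Kitchen');
INSERT INTO dim_product_sf VALUES
    (1, 'Pen', 1), (2, 'Pencil', 1), (3, 'Mug', 2);
""")

# Star schema: one join from the fact table to the dimension.
star_query = """
SELECT p.category, SUM(f.amount)
FROM fact_sales f
JOIN dim_product p ON p.product_id = f.product_id
GROUP BY p.category
ORDER BY p.category
"""

# Snowflake schema: a second join is needed to reach the category name.
snowflake_query = """
SELECT c.category, SUM(f.amount)
FROM fact_sales f
JOIN dim_product_sf p ON p.product_id = f.product_id
JOIN dim_category c ON c.category_id = p.category_id
GROUP BY c.category
ORDER BY c.category
"""

star_result = conn.execute(star_query).fetchall()
snowflake_result = conn.execute(snowflake_query).fetchall()
print(star_result)
print(star_result == snowflake_result)
```

Both queries return the same totals; the star version simply needs one
join fewer, and that gap widens with every extra level of normalization.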
Example: IUIU Data Warehouse
Student Exam Performance:
Programme, Lecturer, Course Unit, Student
iuiu_warehouse

EXAM FACT TABLE
  Stud_reg_no (FK)
  Course_code (FK)
  Prog_code (FK)
  lecturererID (FK)
  Mark_scored
  Grade
  GradePT
  Sem
  Acadyear

Dim_student
  Stud_reg_no (PK)
  Age
  Session
  Clearance_period
  Marital_status

Dim_courseunit
  Course_code (PK)
  Credit_units
  Lecture_hrs
  Practical_hrs

Dim_lecturer
  lecturererID (PK)
  Max_education
  Percent_attendance
  Total_sem_load
  Experience

Dim_programme
  Prog_code (PK)
  Duration
  Fees_amount
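The IUIU example can be sketched as a star schema in SQLite using
Python's sqlite3 module. The table and column names follow the
diagram, with several assumptions: the fact table is named Fact_exam
here, all column types are guesses, Percent_attendance and
Total_sem_load are placed in Dim_lecturer based on their position in
the diagram, and every inserted row is made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
-- Dimension tables (column types are assumptions).
CREATE TABLE Dim_student (
    Stud_reg_no      TEXT PRIMARY KEY,
    Age              INTEGER,
    Session          TEXT,
    Clearance_period TEXT,
    Marital_status   TEXT
);
CREATE TABLE Dim_courseunit (
    Course_code   TEXT PRIMARY KEY,
    Credit_units  INTEGER,
    Lecture_hrs   INTEGER,
    Practical_hrs INTEGER
);
CREATE TABLE Dim_lecturer (
    lecturererID       TEXT PRIMARY KEY,
    Max_education      TEXT,
    Percent_attendance REAL,     -- placement in this table is assumed
    Total_sem_load     INTEGER,  -- placement in this table is assumed
    Experience         INTEGER
);
CREATE TABLE Dim_programme (
    Prog_code   TEXT PRIMARY KEY,
    Duration    INTEGER,
    Fees_amount REAL
);

-- Fact table: one row per student exam result (name is an assumption).
CREATE TABLE Fact_exam (
    Stud_reg_no  TEXT REFERENCES Dim_student(Stud_reg_no),
    Course_code  TEXT REFERENCES Dim_courseunit(Course_code),
    Prog_code    TEXT REFERENCES Dim_programme(Prog_code),
    lecturererID TEXT REFERENCES Dim_lecturer(lecturererID),
    Mark_scored  REAL,
    Grade        TEXT,
    GradePT      REAL,
    Sem          INTEGER,
    Acadyear     TEXT
);
""")

# Hypothetical sample rows, for illustration only.
conn.executemany("INSERT INTO Dim_student VALUES (?, ?, ?, ?, ?)", [
    ("S001", 21, "Day", "Sem 1", "Single"),
    ("S002", 24, "Evening", "Sem 1", "Married"),
    ("S003", 22, "Day", "Sem 1", "Single"),
])
conn.execute("INSERT INTO Dim_courseunit VALUES ('CSC101', 4, 45, 30)")
conn.execute("INSERT INTO Dim_lecturer VALUES ('L01', 'MSc', 95.0, 12, 5)")
conn.executemany("INSERT INTO Dim_programme VALUES (?, ?, ?)", [
    ("BIT", 3, 1000000.0),
    ("BCS", 3, 1200000.0),
])
conn.executemany("INSERT INTO Fact_exam VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)", [
    ("S001", "CSC101", "BIT", "L01", 72.0, "B+", 4.0, 1, "2023/2024"),
    ("S002", "CSC101", "BCS", "L01", 58.0, "C", 2.5, 1, "2023/2024"),
    ("S003", "CSC101", "BIT", "L01", 64.0, "B", 3.5, 1, "2023/2024"),
])

# Report: average mark per programme, using the programme dimension
# as the report parameter.
avg_by_programme = conn.execute("""
    SELECT p.Prog_code, AVG(f.Mark_scored)
    FROM Fact_exam f
    JOIN Dim_programme p ON p.Prog_code = f.Prog_code
    GROUP BY p.Prog_code
    ORDER BY p.Prog_code
""").fetchall()
print(avg_by_programme)
```

Swapping Dim_programme for Dim_lecturer, Dim_courseunit or Dim_student
in the join gives the same kind of report along another dimension.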