22-01-10
Seminar
Muhammad Tariq
In the Beginning, life was
simple…
But …
Our Information Needs…
Kept growing. (The Spider web)
A decision support database that is
maintained separately from the organization’s
operational databases.
A data warehouse is a
subject-oriented,
integrated,
time-varying,
non-volatile
collection of data that is used primarily in
organizational decision making
“A collection of integrated, subject-oriented
databases designed to supply the
information required for decision-making.”
-- W. Inmon (1992)
Data Warehouse is designed around
“subjects” rather than processes
A company may have
Retail Sales System
Outlet Sales System
Catalog Sales System
DW will have a Sales Subject Area
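As a rough illustration of subject orientation, the sketch below merges rows from the three sales systems into one Sales subject area. The row layout, the field names (`sku`, `amount`, `channel`) and the helper `build_sales_subject_area` are assumptions made for this example, not part of the seminar material.

```python
# Hypothetical extracts from the three sales systems named on the slide.
retail_sales = [{"sku": "A1", "amount": 120.0}]
outlet_sales = [{"sku": "A1", "amount": 80.0}]
catalog_sales = [{"sku": "B2", "amount": 45.5}]


def build_sales_subject_area(sources):
    """Merge per-system rows into one list, tagging each row with its channel."""
    rows = []
    for channel, records in sources.items():
        for record in records:
            rows.append({"channel": channel, **record})
    return rows


sales = build_sales_subject_area(
    {"retail": retail_sales, "outlet": outlet_sales, "catalog": catalog_sales}
)
# One subject area can now answer a cross-system question.
total = sum(row["amount"] for row in sales)
```

Keeping the originating channel as a column preserves the per-system view while letting analysts query sales as one subject.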
Heterogeneous Source Systems
Little or no control
Need to Integrate source data
For Example: Product codes could be different
in different systems
Arrive at common code in DW
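One possible way to arrive at a common code is a lookup table keyed by source system and local code. The sketch below assumes invented system names, local codes and a warehouse code format (`PRD0001`); a real mapping would come from the source system owners.

```python
# Hypothetical mapping: (source system, local product code) -> warehouse code.
CODE_MAP = {
    ("retail", "P-001"): "PRD0001",
    ("catalog", "1"): "PRD0001",
    ("outlet", "WIDGET"): "PRD0001",
}


def to_warehouse_code(system, local_code):
    """Return the common warehouse code, or fail loudly for unmapped codes."""
    try:
        return CODE_MAP[(system, local_code)]
    except KeyError:
        raise KeyError(
            f"no warehouse code mapped for {system}:{local_code}"
        ) from None
```

Failing on an unmapped code, rather than passing it through, keeps unintegrated codes out of the warehouse.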
Most business analysis
has a time component
Trend Analysis (historical
data is required)
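A minimal sketch of why trend analysis needs history: the dated rows below (made-up values) are grouped by month and compared period to period. The function names `monthly_totals` and `month_over_month` are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Made-up historical sales rows: (sale date, amount).
history = [
    (date(2021, 11, 3), 100.0),
    (date(2021, 11, 20), 50.0),
    (date(2021, 12, 5), 200.0),
    (date(2022, 1, 9), 260.0),
]


def monthly_totals(rows):
    """Sum amounts per (year, month), returned in chronological order."""
    totals = defaultdict(float)
    for day, amount in rows:
        totals[(day.year, day.month)] += amount
    return dict(sorted(totals.items()))


def month_over_month(totals):
    """Change between consecutive periods present in the data."""
    values = list(totals.values())
    return [later - earlier for earlier, later in zip(values, values[1:])]


totals = monthly_totals(history)
changes = month_over_month(totals)
```

With only current data there would be a single period and nothing to compare, which is why the warehouse keeps the time component.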
Used For …
Making consolidated reports
Finding relationships and correlations
Data mining
Examples
Banks identifying credit risks
Insurance companies searching for fraud
Medical research
How Do Data Warehouses Differ
From Operational Systems?
Goals
Structure
Size
Performance optimization
Technologies used
Operational v/s Information System

Feature                     Operational                           Information
Characteristics             Operational processing                Informational processing
Orientation                 Transaction                           Analysis
User                        Clerk, DBA, database professional     Knowledge workers
Function                    Day-to-day operation                  Decision support
Data                        Current                               Historical
View                        Detailed, flat relational             Summarized, multidimensional
DB design                   Application oriented                  Subject oriented
Unit of work                Short, simple transaction             Complex query
Access                      Read/write                            Mostly read
Focus                       Data in                               Information out
Number of records accessed  Tens                                  Millions
Number of users             Thousands                             Hundreds
DB size                     100 MB to GB                          100 GB to TB
Priority                    High performance, high availability   High flexibility, end-user autonomy
Metric                      Transaction throughput                Query throughput
ER modeling: a logical design technique that seeks to
eliminate data redundancy
Illuminates the microscopic relationships
among data elements
Perfect for OLTP systems
Responsible for success of transaction
processing in Relational Databases
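To make "eliminating data redundancy" concrete, the sketch below splits flat order rows, where the customer's city repeats on every order, into a customer table and an order table. The data and the helper `normalize` are invented for illustration.

```python
# Made-up flat rows: the customer's city is repeated on every order.
flat_orders = [
    {"order_id": 1, "customer_id": 10, "customer_city": "Lahore", "amount": 50.0},
    {"order_id": 2, "customer_id": 10, "customer_city": "Lahore", "amount": 75.0},
    {"order_id": 3, "customer_id": 11, "customer_city": "Karachi", "amount": 20.0},
]


def normalize(rows):
    """Store each customer attribute once; orders keep only the customer key."""
    customers = {}
    orders = []
    for row in rows:
        customers[row["customer_id"]] = {"city": row["customer_city"]}
        orders.append(
            {
                "order_id": row["order_id"],
                "customer_id": row["customer_id"],
                "amount": row["amount"],
            }
        )
    return customers, orders


customers, orders = normalize(flat_orders)
```

An update to a customer's city now touches one row, which suits short OLTP transactions; the cost is that reading an order together with its city requires a join.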
Why are ER models NOT suitable for DW?
End user cannot understand or remember
an ER Model
Many DWs have failed because of overly
complex ER designs
Not optimized for complex, ad-hoc queries
Data retrieval becomes difficult due to
normalization
Browsing becomes difficult
Facts are stored in FACT Tables
Dimensions are stored in DIMENSION
tables
Dimension tables contain textual
descriptors of the business
Fact and dimension tables form a Star
Schema
“BIG” fact table in center surrounded by
“SMALL” dimension tables
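A small star schema can be sketched with Python's built-in `sqlite3` module. The table and column names (`fact_sales`, `dim_product`, `dim_date`) and the sample rows are assumptions chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Small dimension tables hold the textual descriptors.
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
    -- The central fact table holds numeric measures plus dimension keys.
    CREATE TABLE fact_sales (
        product_key INTEGER REFERENCES dim_product(product_key),
        date_key INTEGER REFERENCES dim_date(date_key),
        sale_units INTEGER,
        sale_amount REAL);
    INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'), (2, 'Manual', 'Books');
    INSERT INTO dim_date VALUES (202112, 2021, 12), (202201, 2022, 1);
    INSERT INTO fact_sales VALUES
        (1, 202112, 3, 30.0), (1, 202201, 5, 50.0), (2, 202201, 1, 12.5);
    """
)

# A typical star query: join the fact table to its dimensions, then aggregate.
query = """
SELECT p.category, d.year, SUM(f.sale_amount)
FROM fact_sales AS f
JOIN dim_product AS p ON p.product_key = f.product_key
JOIN dim_date AS d ON d.date_key = f.date_key
GROUP BY p.category, d.year
ORDER BY p.category, d.year
"""
rows = conn.execute(query).fetchall()
```

Every dimension is one join away from the fact table, which is what makes the layout easy to browse and query ad hoc.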
Measures or facts
Facts are “numeric” & “additive”
For example: Sale Amount, Sale Units
Factors or dimensions
Star Schemas
Snowflake & Starflake Schemas
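"Additive" can be read as: a fact may be summed along any dimension and still reach the same grand total. The sketch below checks this for made-up fact rows; the dimension names `store` and `month` and the helper `roll_up` are hypothetical.

```python
# Made-up fact rows with two dimensions (store, month) and two measures.
facts = [
    {"store": "North", "month": "Jan", "sale_units": 3, "sale_amount": 30.0},
    {"store": "North", "month": "Feb", "sale_units": 5, "sale_amount": 50.0},
    {"store": "South", "month": "Jan", "sale_units": 1, "sale_amount": 12.5},
]


def roll_up(rows, dimension, measure):
    """Sum a measure for each distinct value of one dimension."""
    totals = {}
    for row in rows:
        key = row[dimension]
        totals[key] = totals.get(key, 0) + row[measure]
    return totals


by_store = roll_up(facts, "store", "sale_amount")
by_month = roll_up(facts, "month", "sale_amount")
```

Both roll-ups add up to the same overall amount, which is the property that lets a fact table be summarized along whichever dimension a query asks for.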
Data Extraction
Data Cleaning
Data Transformation
Convert from legacy/host format to warehouse
format
Load
Sort, summarize, consolidate, compute views,
check integrity, build indexes, partition
Consumes 70-80% of project time
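The steps above can be sketched as a tiny pipeline. This is only an illustration under stated assumptions: the raw rows, the field names (`cust`, `amt`), the target table `sales` and every function name are invented, and the load step shows just the index-building part of what the slide lists.

```python
import sqlite3

# Made-up rows as they might arrive from a legacy source.
raw_rows = [
    {"cust": " alice ", "amt": "10.50"},
    {"cust": "BOB", "amt": "n/a"},  # dirty amount, rejected during cleaning
    {"cust": "Carol", "amt": "7"},
]


def extract():
    """Extraction: pull the rows out of the source system."""
    return list(raw_rows)


def clean(rows):
    """Cleaning: trim whitespace and drop rows whose amount is not numeric."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amt"])
        except ValueError:
            continue
        cleaned.append({"cust": row["cust"].strip(), "amt": amount})
    return cleaned


def transform(rows):
    """Transformation: convert to the warehouse format (title case, cents)."""
    return [(row["cust"].title(), round(row["amt"] * 100)) for row in rows]


def load(conn, rows):
    """Load: insert the rows, then build an index for later queries."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.execute("CREATE INDEX IF NOT EXISTS idx_sales_customer ON sales (customer)")
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(clean(extract())))
loaded = conn.execute(
    "SELECT customer, amount_cents FROM sales ORDER BY customer"
).fetchall()
```

Keeping each stage as its own function makes it possible to test cleaning rules without touching the source or the target.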
Heterogeneous Source Systems
Little or no control over source systems
Source systems scattered
Different currencies, measurement units
Ensuring data quality
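Differing currencies and measurement units are usually reconciled by converting everything to one standard on the way in. The sketch below assumes US dollars and kilograms as the warehouse standard; the exchange rates are placeholders, not real rates, and `standardize` is a hypothetical helper.

```python
# Placeholder exchange rates (NOT real market rates), for illustration only.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.25, "PKR": 0.005}
# Standard conversion factors to kilograms.
KG_PER_UNIT = {"kg": 1.0, "g": 0.001, "lb": 0.45359237}


def standardize(record):
    """Convert a source record to the warehouse currency and weight unit."""
    return {
        "amount_usd": record["amount"] * RATES_TO_USD[record["currency"]],
        "weight_kg": record["weight"] * KG_PER_UNIT[record["unit"]],
    }


row = standardize({"amount": 100, "currency": "EUR", "weight": 500, "unit": "g"})
```

An unknown currency or unit raises `KeyError` here, which surfaces a data quality problem instead of loading an unconverted value.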
Commercial tools:
Warehouse Builder (Oracle)
MS Data Transformation Services
DataStage
SAS ETL Server
Typical functions
Define source, query (run SQL), define
transformation, define target, verify
transformation, schedule run, audit report
Tools
Query & reporting
OLAP
Data mining, visualization, segmentation,
clustering
New developments: text mining, web mining &
personalization
Mining multimedia data
Commercial tools
Crystal Reports, Impromptu, WebFocus
Increasingly common mode of delivery:
Web-enabled
Useful URLs
Ralph Kimball’s home page
http://www.rkimball.com
Larry Greenfield’s Data Warehouse Information
Center
http://pwp.starnetinc.com/larryg/
Data Warehousing Institute
http://www.dw-institute.com/
OLAP Council
http://www.olapcouncil.com/
Thank you