Data Warehousing
Lecture-3
Introduction and Background
What is a Data Warehouse?
It is a blend of many technologies, the basic concept being:
Take all data from different operational systems.
If necessary, add relevant data from industry.
Transform all data and bring it into a uniform format.
Integrate all data as a single entity.
Store data in a format supporting easy access for decision support.
Create performance enhancing indices.
Run ad-hoc queries.
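The steps listed above can be sketched in a few lines of Python. This is only a minimal illustration, not a real warehouse design. It assumes two hypothetical operational sources (a billing system and a CRM export) that use different date and amount formats. All table names, field names, and values are made up, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3
from datetime import datetime

# Hypothetical extracts from two operational systems (made-up data).
billing_rows = [
    {"cust": "C001", "date": "2024-01-15", "amount_cents": 12500},
    {"cust": "C002", "date": "2024-02-03", "amount_cents": 4000},
]
crm_rows = [
    {"customer_id": "c001", "when": "17/01/2024", "value": "80.50"},
]


def transform_billing(row):
    """Bring a billing record into the uniform (customer, ISO date, amount, source) format."""
    return (row["cust"].upper(), row["date"], row["amount_cents"] / 100.0, "billing")


def transform_crm(row):
    """Bring a CRM record into the same uniform format."""
    iso_date = datetime.strptime(row["when"], "%d/%m/%Y").strftime("%Y-%m-%d")
    return (row["customer_id"].upper(), iso_date, float(row["value"]), "crm")


def build_warehouse(conn):
    """Integrate all sources into a single table and index it for querying."""
    conn.execute(
        "CREATE TABLE sales_fact "
        "(customer_id TEXT, sale_date TEXT, amount REAL, source TEXT)"
    )
    rows = [transform_billing(r) for r in billing_rows]
    rows += [transform_crm(r) for r in crm_rows]
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)", rows)
    # A performance-enhancing index on the column used for lookups.
    conn.execute("CREATE INDEX idx_sales_customer ON sales_fact (customer_id)")
    conn.commit()


def total_per_customer(conn):
    """An ad-hoc query: total amount per customer across all sources."""
    cur = conn.execute(
        "SELECT customer_id, SUM(amount) FROM sales_fact "
        "GROUP BY customer_id ORDER BY customer_id"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
build_warehouse(conn)
totals = total_per_customer(conn)
```

Here `totals` holds one row per customer, combining records that arrived from both source systems in different formats.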
How is it Different?
Fundamentally different
[Figure: cycle of steps. Business user needs info -> User requests IT people -> IT people do system analysis and design -> IT people create reports -> IT people send reports to business user -> Business user may get answers -> Answers result in more questions.]
Data warehouse
A data warehouse is a subject-oriented, integrated, non-volatile, time-variant collection of data in support of management decisions.
DWH-Ahsan Abdullah
How is it Different?
Different patterns of hardware utilization
[Figure: hardware utilization, on a scale from 0% to 100%, for Operational vs. DWH systems.]
Bus Service vs. Train
How is it Different?
Combines operational and historical data.
Data is not entered directly into a DWH; OLTP or ERP systems are the source systems.
OLTP systems don’t keep history; for example, you can’t get a balance statement more than a year old.
A DWH keeps historical data, even of bygone customers. Why?
In the context of a bank, we want to know why the customer left. What were the events that led to his/her leaving?
How much history?
Depends on:
Industry.
Cost of storing historical data.
Economic value of historical data.
How much history?
Industries and history
Telecom: calls are far more numerous than bank transactions: 18 months.
Retailers are interested in analyzing yearly seasonal patterns: 65 weeks.
Insurance companies want to do actuarial analysis, using historical data to predict risk: 7 years.
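The retention periods quoted above can be turned into a cutoff date for deciding which records still fall inside the history window. The sketch below is only illustrative. The industry keys and function names are hypothetical, and it approximates a month as 30 days and a year as 365 days.

```python
from datetime import date, timedelta

# Retention periods from the text, expressed in days.
# Assumption for this sketch: 1 month = 30 days, 1 year = 365 days.
RETENTION_DAYS = {
    "telecom": 18 * 30,     # 18 months
    "retail": 65 * 7,       # 65 weeks
    "insurance": 7 * 365,   # 7 years
}


def retention_cutoff(industry, today):
    """Return the oldest date still kept for the given industry."""
    return today - timedelta(days=RETENTION_DAYS[industry])


def within_retention(record_date, industry, today):
    """True if a record dated record_date is inside the history window."""
    return record_date >= retention_cutoff(industry, today)


today = date(2024, 12, 31)
retail_window_days = (today - retention_cutoff("retail", today)).days
```

With a 65-week window a retailer always has the same weeks of the previous year available for comparison, which is what seasonal analysis needs.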
How is it Different?
Usually (but not always) periodic or batch updates rather than real-time.
A DWH is for strategic decision making based on historical data. It won’t hurt if transactions from the last hour or day are absent.
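A periodic batch update of the kind described above might look like the following sketch. It assumes a hypothetical source whose transactions carry increasing integer ids, and it uses a "watermark" (the last id already loaded) to pick up only new rows on each run. The table and function names are made up for illustration.

```python
import sqlite3


def load_batch(conn, source_rows, last_loaded_id):
    """Copy rows newer than the watermark into the warehouse in one batch.

    Returns the new watermark (highest transaction id loaded so far).
    """
    new_rows = [r for r in source_rows if r[0] > last_loaded_id]
    conn.executemany("INSERT INTO txn_history VALUES (?, ?)", new_rows)
    conn.commit()
    return max((r[0] for r in new_rows), default=last_loaded_id)


def row_count(conn):
    return conn.execute("SELECT COUNT(*) FROM txn_history").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txn_history (txn_id INTEGER PRIMARY KEY, amount REAL)")

# Hypothetical operational data: (txn_id, amount).
source = [(1, 10.0), (2, 25.0)]
watermark = load_batch(conn, source, last_loaded_id=0)   # first batch run

source.append((3, 7.5))            # arrives after the batch has run
count_before = row_count(conn)     # the warehouse does not see it yet

watermark = load_batch(conn, source, watermark)          # next batch run
count_after = row_count(conn)
```

Between the two runs the newest transaction is simply missing from the warehouse, which is acceptable for analysis based on historical data.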
How is it Different?
Rate of update depends on:
volume of data,
nature of business,
cost of keeping historical data,
benefit of keeping historical data.