
DWDM - Unit - I


UNIT - III

This unit covers

1. What is a Data Warehouse
2. Differences between OLAP and OLTP
3. Multidimensional Data Model
4. Data Warehouse Architecture

1. What is a Data Warehouse: A data warehouse is a
subject-oriented, integrated, time-variant, and
nonvolatile collection of data in support of
management’s decision-making process. [William H. (Bill) Inmon]

 A data warehouse is a copy of transaction data
specifically structured for query and analysis. [Ralph Kimball]
 Bill Inmon is widely regarded as the father of data warehousing and is
commonly credited with coining the term "data warehouse" around 1990;
Ralph Kimball is the other leading pioneer of the field.
 Data warehousing is the process of constructing and
using data warehouses.
IoT: Internet of Things
Big data analytics: deals with the 3 V's of big data (e.g., Facebook)
Data mining: extracting meaningful data
Data warehouse: collection of data marts / OLAP
Data mart: subset of a data warehouse (DWH)
DBMS: collection of related information / OLTP
Information: processed data
Data: raw material / facts / images

Fig: Hierarchy of Data warehousing and Data Mining


 The term "data warehouse" refers to a special type of
database that acts as the central repository for company
data.
 A data warehouse is a relational or multidimensional
database that is designed for query and analysis.
 It can be thought of as a database archive that is
segregated from the operational databases and used
primarily for reporting and data mining purposes.
 A data warehouse enables knowledge workers to make
faster and better decisions.
 A data warehouse can be defined as a collection of data
marts.
 Data warehousing is a collection of decision support
technologies aimed at enabling the knowledge worker
to make better decisions.

 Requirements of a Data Warehouse system
1. Efficient cube computation
2. Better access methods
3. Efficient query processing

[Figure: Data in files, Tape, Mainframe, Distributed DBMS, DW with Server]

 The characteristics of a data warehouse are

1. Subject Oriented: The data gives information about a particular
subject instead of about a company's ongoing operations. A data
warehouse can be used to analyze a particular subject area.
 Ex: "Sales" can be a particular subject. Other subjects include customer,
product, weather data, and the stock market.

2. Integrated: The data is gathered into the data warehouse from a
variety of sources and merged into a coherent whole. Data
cleaning and data integration techniques are applied to ensure
consistency in encoding structures, naming conventions, attribute
measures, etc.
 Ex: A warehouse is constructed by integrating multiple, heterogeneous data
sources such as relational databases, flat files, and on-line transaction
records.
Fig: Subject Oriented
Transactional storage (process oriented): Entry, Sales Rep, Quantity Sold, Date,
Customer Name, Product Description, Unit Price, Mail Address
Data warehouse storage (subject oriented): Sales, Customers, Products


Encoding
  Transactional storage: Appl. A - M, F; Appl. B - 1, 0; Appl. C - X, Y
  Data warehouse storage: M, F
Unit of attributes
  Transactional storage: Appl. A - pipeline cm; Appl. B - pipeline inches; Appl. C - pipeline m
  Data warehouse storage: pipeline cm
Physical attributes
  Transactional storage: Appl. A - balance dec(13,2); Appl. B - balance char 9(9)V99; Appl. C - balance float
  Data warehouse storage: balance dec(13,2)
Naming conventions
  Transactional storage: Appl. A - bal-on-hand; Appl. B - current_balance; Appl. C - balance
  Data warehouse storage: balance
Data consistency
  Transactional storage: Appl. A - date (Julian); Appl. B - date (yymmdd); Appl. C - date (absolute)
  Data warehouse storage: date (Julian)

Fig: Integrated
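
The integration step in the figure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the function name `integrate`, and the mapping of codes 1/0 and X/Y to M/F are hypothetical, chosen to mirror the encoding, unit, and naming rows of the figure.

```python
# Hypothetical per-application rules modeled on the "Integrated" figure.
# Which code means "M" and which means "F" is an assumption for illustration.
GENDER_MAPS = {
    "A": {"M": "M", "F": "F"},
    "B": {"1": "M", "0": "F"},
    "C": {"X": "M", "Y": "F"},
}
TO_CM = {"A": 1.0, "B": 2.54, "C": 100.0}  # cm, inches, metres -> cm
BALANCE_FIELD = {"A": "bal-on-hand", "B": "current_balance", "C": "balance"}


def integrate(app, record):
    """Convert one source record into the single warehouse format."""
    return {
        "gender": GENDER_MAPS[app][str(record["gender"])],
        "pipeline_cm": round(record["pipeline"] * TO_CM[app], 2),
        "balance": round(float(record[BALANCE_FIELD[app]]), 2),
    }


rows = [
    integrate("A", {"gender": "F", "pipeline": 30.0, "bal-on-hand": "120.50"}),
    integrate("B", {"gender": 1, "pipeline": 10.0, "current_balance": "99"}),
    integrate("C", {"gender": "Y", "pipeline": 0.5, "balance": 42.0}),
]
```

After integration every row uses the same encoding (M/F), the same unit (cm), and the same attribute name (balance), regardless of the source application.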
3. Time-variant: All data in the data warehouse is
identified with a particular time period. Historical data is
kept in a data warehouse.
 Ex: An operational database holds current-value data, while data warehouse
data provides information from a historical perspective (e.g., the
past 5-10 years).

4. Non-volatile: Data is stable in a data warehouse. More
data is added but data is never removed. Once data is in
the data warehouse, it will not change.
 Ex: Only two operations are performed: the initial loading of data and the
access of data. No update of data is allowed.
Fig: Time Variant
Transactional storage holds current data; data warehouse storage holds historical data.
Example chart: Sales (in lakhs) by region (East, West, North) for January, February and March, 1st quarter of year 97.


Fig: Non-Volatile
Transactional storage (volatile): record-by-record data manipulation (insert, change, delete, access).
Data warehouse storage (non-volatile): mass load and access of data.


Need of Data Warehousing

 Data warehousing is capable of storing and consolidating
past information.
 It provides support for sophisticated multidimensional queries.
 A data warehouse uses an update-driven approach rather than a
query-driven approach.
 To understand current business trends and support forecasting
decisions.
 To generate reports and analysis.
 Knowledge discovery and decision support.
 To improve the performance of analysis on data such as stock market data.
DBMS, OLAP and Data Mining

Task
  DBMS (OLTP): Extraction of detailed and summary data
  OLAP (DW): Summaries, trends and forecasts
  Data Mining: Knowledge discovery of hidden patterns and insights
Type of Result
  DBMS (OLTP): Information
  OLAP (DW): Analysis
  Data Mining: Insight and prediction
Method
  DBMS (OLTP): Deduction (ask the question, verify with data)
  OLAP (DW): Multidimensional data modeling, aggregation, statistics
  Data Mining: Induction (build the model, apply it to new data, get the result)
Example question
  DBMS (OLTP): Who purchased mutual funds in the last 3 years?
  OLAP (DW): What is the average income of mutual fund buyers by region by year?
  Data Mining: Who will buy a mutual fund in the next 6 months and why?
Benefits of a Data Warehouse

1. A Data Warehouse Delivers Enhanced Business Intelligence.

2. A Data Warehouse Saves Time.

3. A Data Warehouse Enhances Data Quality and Consistency.

4. A Data Warehouse Provides Historical Intelligence.

5. A Data Warehouse Generates a High ROI (Return on Investment).

 DW Tools

1. Informatica - PowerCenter
2. IBM - WebSphere DataStage, Cognos Data Manager
3. SAP - Business Objects Data Integrator
4. Microsoft - SQL Server Integration Services
5. Oracle - Data Integrator, Warehouse Builder
6. SAS - Data Integration Studio
7. Ab Initio

Data Warehousing Tools

Tool Category: Products
ETL Tools: Informatica, Ab Initio, IBM InfoSphere DataStage, Oracle Warehouse Builder, Business Objects Data Integrator, etc.
OLAP Server: Oracle Express Server, Oracle Essbase, IBM Cognos, SAP NetWeaver OLAP, Microsoft Analysis Services
OLAP Tools: Oracle Express Suite, Oracle Essbase, Cognos PowerPlay, Business Objects, MicroStrategy
Data Warehouse: Oracle, Informix, Teradata, DB2
Data Mining & Analytics: SAS Enterprise Miner, IBM Intelligent Miner

APPLICATION AREAS FOR DATA WAREHOUSING

1. Financial Data Analysis
2. Data mining in the retail industry
3. Telecommunication Industry
4. Biological Data Analysis
5. Scientific Applications
6. Intrusion Detection
7. Sales and Marketing
8. Health Care and Insurance
9. e-Commerce

2. Differences between OLTP (Online Transaction Processing) and OLAP (Online Analytical Processing)

1. Orientation: OLTP is customer oriented; OLAP is market oriented.
2. Users: OLTP users are clerks and IT professionals; OLAP users are knowledge workers.
3. Function: OLTP supports day-to-day operations; OLAP supports decision making.
4. DB design: OLTP is application-oriented; OLAP is subject-oriented.
5. Data: OLTP data is current, up-to-date, detailed, flat relational, and isolated; OLAP data is historical, summarized, multidimensional, integrated, and consolidated.
6. Usage: OLTP usage is repetitive; OLAP usage is ad hoc.
7. Access: OLTP access is read/write with index/hash on the primary key; OLAP access requires lots of scans.
8. Unit of work: OLTP uses short, simple transactions (uses 3NF); OLAP uses complex queries (uses 2NF).
9. No. of records accessed: tens in OLTP; millions in OLAP.
10. No. of users: thousands in OLTP; hundreds in OLAP.
11. DB size: 100 MB to GB in OLTP; 100 GB to TB in OLAP.
12. Processing time: low in OLTP; high in OLAP.
13. Metric: transaction throughput in OLTP; query throughput and response time in OLAP.

3 - Multi-Dimensional Data Model
 The dimensional model was developed for
implementing data warehouses and data marts.
 The multidimensional data model (MDDM) provides both a mechanism to store
data and a way to perform business analysis.
 Data warehouses and OLAP tools are based on a
multidimensional data model.
 The multidimensional data model is typically used for the
design of corporate data warehouses and departmental
data marts.
 The multidimensional data model is represented through
data cubes.
 The core of the multidimensional model is the data cube, which
consists of a large set of facts (or measures) and a number of
dimensions.
 The two primary components of dimensional models are
dimensions and facts.
 A data cube consists of dimensions and measures, and the
multidimensional view of data is the foundation of OLAP.
 A data cube consists of a lattice of cuboids, each corresponding to a
different degree of summarization of the given multidimensional
data.
 Dimension Table: It consists of tuples of attributes of the dimension and
has a simple primary key. Dimensions are the textual attributes by which data
is analyzed.
 Fact Table: A fact table contains keys to each of the related dimension
tables. Facts are the numeric values used to analyze the business.
Types of Facts: These are
1. Additive Facts
2. Semi-Additive Facts
3. Non-Additive Facts

1. Additive Facts: Additive facts are facts that can be
summed up through all of the dimensions in the fact table.

Ex: Number of products sold on day 1 = 500
Number of products sold on day 2 = 200
-----------
700
-----------
2. Semi-Additive Facts: These are the facts that can be
summed up for some of the dimensions in the fact table but
not others.
Ex: Balance of company's a/c 1 on day 1 = 5000
Balance of company's a/c 1 on day 2 = 3000
_________
8000
_________
The total of 8000 is not a meaningful balance: a balance can be summed
across accounts but not across time.
3. Non-Additive Facts: These are the facts that cannot be
summed up for any of the dimensions present in the fact
table.
Ex: Profit margin for day 1 = 30%
Profit margin for day 2 = 80%
The two percentages cannot be added to give a meaningful margin.
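
The three kinds of facts can be illustrated with a small Python sketch. The figures for units sold, the a/c 1 balances, and the 30% and 80% margins follow the examples above; the second account (a/c 2) and the revenue and cost figures are assumptions added only so that each case can be computed.

```python
# Additive: units sold can be summed over any dimension, including time.
units_sold = {1: 500, 2: 200}  # day -> units
total_units = sum(units_sold.values())

# Semi-additive: balances can be summed across accounts on one day,
# but not across days. "a/c 2" is a hypothetical second account.
balances = {
    (1, "a/c 1"): 5000, (1, "a/c 2"): 2000,
    (2, "a/c 1"): 3000, (2, "a/c 2"): 2500,
}


def total_balance_on(day):
    """Valid aggregation: sum over the account dimension for a single day."""
    return sum(v for (d, _), v in balances.items() if d == day)


def closing_balance(account):
    """Over time, take the latest balance instead of summing."""
    last_day = max(d for (d, a) in balances if a == account)
    return balances[(last_day, account)]


# Non-additive: a margin is a ratio, so it is recomputed from its additive
# parts (revenue and cost, hypothetical figures) rather than summed.
sales = {1: (1000.0, 700.0), 2: (500.0, 100.0)}  # day -> (revenue, cost)


def margin(revenue, cost):
    return (revenue - cost) / revenue


daily_margins = {day: margin(r, c) for day, (r, c) in sales.items()}
overall_margin = margin(
    sum(r for r, _ in sales.values()),
    sum(c for _, c in sales.values()),
)
```

Here the daily margins are 30% and 80%, yet the margin over both days is neither their sum nor their simple average; it has to be derived from total revenue and total cost.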
Types of Dimensions
1. Slowly Changing Dimensions (SCD): Dimensions whose values change
slowly over time rather than on a regular, time-based schedule.
Type 1: Overwrite old value
Type 2: Add new row
Type 3: Add new column

Original record:
Id  Year  Name   City
1   2000  James  New York

After a change of city, handled by adding a new row (Type 2):
Id  Year  Name   City
1   2000  James  New York
1   2004  James  California

The same kind of change tracked with start and end dates:
Id  Start Date        End Date            Name   City
1   1st January 2013  31st December 2013  James  New York
1   1st January 2014  31st December 2014  James  California
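
A minimal sketch of the three SCD types in Python is given here. The row layout and function names are hypothetical; in particular, marking the current row with an empty `end_date` and naming the Type 3 column `previous_city` are assumptions made for illustration, not a standard.

```python
from copy import deepcopy


def scd_type1(rows, cust_id, new_city):
    """Type 1: overwrite the old value; history is lost."""
    out = deepcopy(rows)
    for row in out:
        if row["id"] == cust_id:
            row["city"] = new_city
    return out


def scd_type2(rows, cust_id, new_city, change_date):
    """Type 2: close the current row and add a new row; history is kept."""
    out = deepcopy(rows)
    current = next(
        r for r in out if r["id"] == cust_id and r["end_date"] is None
    )
    current["end_date"] = change_date
    out.append(
        {**current, "city": new_city, "start_date": change_date, "end_date": None}
    )
    return out


def scd_type3(rows, cust_id, new_city):
    """Type 3: add a new column holding the previous value."""
    out = deepcopy(rows)
    for row in out:
        if row["id"] == cust_id:
            row["previous_city"] = row["city"]
            row["city"] = new_city
    return out


base = [{"id": 1, "name": "James", "city": "New York",
         "start_date": "2013-01-01", "end_date": None}]

t1 = scd_type1(base, 1, "California")
t2 = scd_type2(base, 1, "California", "2014-01-01")
t3 = scd_type3(base, 1, "California")
```

Type 1 leaves one row with no trace of New York, Type 2 leaves two rows for the same Id, and Type 3 leaves one row that remembers only the immediately preceding city.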


2. Conformed Dimensions: A dimension that has
exactly the same meaning and content when
referred to from different fact tables.
Ex: Products, Time, Location for Sales fact

3. Degenerate Dimensions: A dimension that does not
require a dimension table of its own; its value is kept
in the fact table.
Ex: From Location to City

 A data warehouse is based on a multidimensional data model, which
views data in the form of a data cube.
 A multidimensional model has two types of views:
i. Logical view: Easy understanding for the user, e.g. to formulate
queries or to understand result presentation.
ii. Physical view: Storage in computer memory and access methods
(sparse vs. dense).
 Dimensions are the entities and attributes by which data is analyzed.
Dimensional modeling elements include hierarchies:
Ex: Mandal->District->State->India->World
Ex: Day->Week->Month->Quarter->Half year->Year
 Facts are measures.
 Ex: Cost, Revenue, Quantity, No. of Units
Ex: Sales volume as a function of product, month,
and region can be represented through a data cube.

Dimensions: Product, Location, Time

Hierarchical summarization paths
  Product: Industry -> Category -> Product
  Location: Region -> Country -> City -> Office
  Time: Year -> Quarter -> Month or Week -> Day
The multidimensional data model consists of
1. Tables to Data Cubes
2. Stars, Snowflakes and Fact Constellations
3. Schemas with DMQL
4. Measures
5. Concept Hierarchies
6. OLAP Operations
7. Starnet Query Model
1. Tables to Data Cubes
 A data warehouse is based on a multidimensional data
model, which views data in the form of a data cube.
 Data that is grouped or combined into
multidimensional matrices forms data cubes.
 A data cube, such as sales, allows data to be modeled and
viewed in multiple dimensions.
– Dimension tables, such as item (item_name, brand, type) or
time (day, week, month, quarter, year)
– Fact table, which contains measures (such as dollars_sold) and keys to
each of the related dimension tables
Fig: Sales Product shown in Table and Cube


 A natural way to picture the multidimensional data model is as a cube.
 The cube in the figure associates sales numbers (units sold)
with the dimensions product type, market and time, with the
unit variables organized as cells in an array.
 As the number of dimensions increases, the number of cells
in the cube increases exponentially.
 Dimensions are hierarchical in nature, i.e. the time dimension
may contain hierarchies for years, quarters, months, weeks
and days.
 Lattice of cuboids: The n-D cuboid that holds the lowest level of
summarization is called the base cuboid.
 The topmost 0-D cuboid, which holds the highest level of
summarization, is called the apex cuboid.
 The lattice of cuboids forms a data cube.

0-D (apex) cuboid: all
1-D cuboids: time; item; location; supplier
2-D cuboids: (time, item); (time, location); (time, supplier); (item, location); (item, supplier); (location, supplier)
3-D cuboids: (time, item, location); (time, item, supplier); (time, location, supplier); (item, location, supplier)
4-D (base) cuboid: (time, item, location, supplier)

Fig: Lattice Cuboid
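
The lattice in the figure can be enumerated with a short Python sketch. It assumes dimensions without concept hierarchies, so that every subset of the n dimensions is one cuboid and there are 2^n cuboids in total; the function name `cuboids` is hypothetical.

```python
from itertools import combinations

dimensions = ["time", "item", "location", "supplier"]


def cuboids(dims):
    """Group every subset of the dimensions by its size (0-D up to n-D)."""
    return {k: list(combinations(dims, k)) for k in range(len(dims) + 1)}


lattice = cuboids(dimensions)
total_cuboids = sum(len(group) for group in lattice.values())

for k, group in lattice.items():
    print(f"{k}-D cuboids ({len(group)}):", group)
```

The empty tuple at level 0 is the apex cuboid ("all"), and the single tuple at level 4 is the base cuboid.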


2. Stars, Snowflakes, and Fact Constellations
 The most popular data model for a data warehouse is a
multidimensional model.
 Data warehouses are modeled with dimension and measure
tables.
 There are three types of schemas:
1. Star schema
2. Snowflake schema
3. Fact constellations
1. Star schema: It is also known as Star Join Schema.
 It is the simplest style of data warehouse schema.
 It is called a star schema because the entity relationship diagram
of this schema resembles a star, with points radiating from a
central table.
 A star query is a join between a fact table and a number of
dimension tables.
 Each dimension table is joined to the fact table using a primary
key to foreign key join, but dimension tables are not joined to each
other.
 A typical fact table contains keys and measures.
 In a star schema, a dimension table does not have any parent
table.
 Each dimension table has a primary key that corresponds exactly
to one of the components of the composite key in the fact table.
 A star schema is a subject-oriented model.
Sales Fact Table
  Keys: time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales

Dimension tables
  time: time_key, day, day_of_the_week, month, quarter, year
  item: item_key, item_name, brand, type, supplier_type
  branch: branch_key, branch_name, branch_type
  location: location_key, street, city, state_or_province, country

Fig: Star Schema
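
As a rough sketch, the star schema above can be built and queried with Python's built-in `sqlite3` module. The table names `time_dim`, `item_dim` and `location_dim`, the reduced column lists, and the sample rows are assumptions for illustration; the branch dimension is left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, day INTEGER,
                       month TEXT, quarter TEXT, year INTEGER);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT,
                       brand TEXT, type TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT,
                           country TEXT);
-- The fact table holds one foreign key per dimension plus the measures.
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item_dim(item_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    units_sold INTEGER,
    dollars_sold REAL);

INSERT INTO time_dim VALUES (1, 5, 'Jan', 'Q1', 1997), (2, 10, 'Apr', 'Q2', 1997);
INSERT INTO item_dim VALUES (1, 'TV', 'Acme', 'home entertainment'),
                            (2, 'Laptop', 'Acme', 'computer');
INSERT INTO location_dim VALUES (1, 'Toronto', 'Canada'), (2, 'Frankfurt', 'Germany');
INSERT INTO sales_fact VALUES (1, 1, 1, 2, 800.0), (1, 2, 2, 1, 1200.0),
                              (2, 1, 1, 3, 1200.0), (2, 2, 1, 1, 1100.0);
""")

# A star query: the fact table joined to its dimension tables, then grouped.
star_query = """
SELECT t.quarter, l.country, SUM(f.dollars_sold) AS dollars_sold
FROM sales_fact AS f
JOIN time_dim AS t ON f.time_key = t.time_key
JOIN location_dim AS l ON f.location_key = l.location_key
GROUP BY t.quarter, l.country
ORDER BY t.quarter, l.country
"""
result = conn.execute(star_query).fetchall()
```

Each dimension table is joined only to the fact table, never to another dimension table, which is what keeps the number of joins small.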


Characteristics of Star Schema:
i. Simple structure: The schema is easy to understand.
ii. High query effectiveness: There is a small number of tables to join.
iii. Relatively long time to load data into dimension tables: because of
de-normalization and redundant data, the size of the tables
can be large.
iv. The most commonly used schema in data warehouse
implementations: It is widely supported by a large number
of business intelligence tools.

Advantages of the Star Schema Model
1. Provides highly optimized performance for typical star queries.
2. Provides a direct and intuitive mapping between the business
entities being analyzed by end users and the schema design.

2. Snowflake schema: A refinement of the star schema
where some dimensional hierarchy is normalized into
third normal form and forms a set of smaller
dimension tables.
 It is a combination of star schemas.
 It keeps the same fact table structure as the star schema.
 A dimension has multiple levels with multiple
hierarchies.
 From each hierarchy of levels, any one level can be
attached to the fact table.
 Mostly the lowest level of the hierarchy is attached to the fact table.
 The snowflake structure can reduce the effectiveness of
browsing, since more joins are needed to execute a
query.
 The snowflake schema architecture is a more complex
variation of the star schema used in a data warehouse,
because the tables that describe the dimensions are
normalized.
 The snowflake schema is represented by a centralized fact
table that is connected to multiple dimensions.
 Snowflaking affects only the dimension
tables and not the fact tables.
Benefits of Snowflake schema
1. It is easier to implement a snowflake schema when a
multidimensional model is added to typically normalized
tables.
2. A snowflake schema can reflect the same data to the
database.
Sales Fact Table
  Keys: time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales

Dimension tables
  time: time_key, day, day_of_the_week, month, quarter, year
  item: item_key, item_name, brand, type, supplier_key
  supplier: supplier_key, supplier_type
  branch: branch_key, branch_name, branch_type
  location: location_key, street, city_key
  city: city_key, city, state_or_province, country

Fig: Snowflake Schema


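Star vs. Snowflake

1. Redundancy: The star schema has redundant data; the snowflake schema has no redundancy and saves storage space.
2. Query complexity: The star schema has lower query complexity and is easy to understand; the snowflake schema has more complex queries and is less easy to understand.
3. Foreign keys: The star schema has fewer foreign keys and shorter execution time; the snowflake schema has more foreign keys and longer query execution time.
4. Query analysis: The star schema needs one simple query; the snowflake schema needs many simple queries.
5. Joins: The star schema needs fewer joins; the snowflake schema needs more joins.
6. Dimension levels: The star schema contains single-level dimensions; the snowflake schema contains multi-level dimensions.
7. When to use: The star schema suits dimension tables with fewer rows; the snowflake schema suits big dimension tables.
8. Normalization: In the star schema both dimension and fact tables are de-normalized; in the snowflake schema dimension tables are normalized and fact tables are de-normalized.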
3. Fact constellations:
 A fact constellation is a set of fact tables that share some dimension tables.
 The fact constellation architecture contains multiple fact
tables that share many dimension tables.
 Because multiple fact tables share dimension tables, the schema can be
viewed as a collection of stars and is therefore called a galaxy schema or
fact constellation.
 A fact constellation is a combination of more than one fact
table and many dimension tables.

Sales Fact Table
  Keys: time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales

Shipping Fact Table
  Keys: time_key, item_key, shipper_key, from_location, to_location
  Measures: dollars_cost, units_shipped

Dimension tables
  time: time_key, day, day_of_the_week, month, quarter, year
  item: item_key, item_name, brand, type, supplier_type
  branch: branch_key, branch_name, branch_type
  location: location_key, street, city, province_or_state, country
  shipper: shipper_key, shipper_name, location_key, shipper_type

Fig: Fact Constellation Schema


3. Schemas with DMQL
 DMQL is a Data Mining Query Language for relational databases.
 Data mining query languages can be designed to support ad hoc
and interactive data mining.
 Two language primitives:
i) Cube definition
Syntax: define cube <cube_name> [<dimension_list>]: <measure_list>
Ex: define cube sales_star [time, item, branch, location]:
dollars_sold = sum(sales_in_dollars), units_sold = count(*)
ii) Dimension definition
Syntax: define dimension <dimension_name> as
(<attribute_or_subdimension_list>)
Ex: define dimension time as (time_key, day, day_of_week, month,
quarter, year)
The three schemas can be defined in DMQL as follows.

1. Defining Star Schema in DMQL: Syntax
 define cube sales_star [time, item, branch, location]: dollars_sold =
sum(sales_in_dollars), avg_sales = avg(sales_in_dollars),
units_sold = count(*)
2. Defining Snowflake Schema in DMQL: Syntax
 define cube sales_snowflake [time, item, branch, location]: dollars_sold =
sum(sales_in_dollars), avg_sales = avg(sales_in_dollars),
units_sold = count(*)
3. Defining Fact Constellation in DMQL: Syntax
 define cube sales_factconstellation [time, item, branch, location]:
dollars_sold = sum(sales_in_dollars), avg_sales =
avg(sales_in_dollars), units_sold = count(*)
4. Measures
 Measures of a data cube fall into three categories:

1. Distributive: if the result derived by applying the function to n
aggregate values is the same as that derived by applying the
function on all the data without partitioning.
 E.g., count(), sum(), min(), max()
2. Algebraic: if it can be computed by an algebraic function with M
arguments (where M is a bounded integer), each of which is
obtained by applying a distributive aggregate function.
 E.g., avg(), min_N(), standard_deviation()
3. Holistic: if there is no constant bound on the storage size needed to
describe a subaggregate.
 E.g., median(), mode(), rank()
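
The three categories can be checked on a small example in Python. The partitioned data set below is hypothetical; the point is only to show which measures can be combined from per-partition results.

```python
from statistics import median

# Hypothetical data, split into three partitions (e.g., three data marts).
partitions = [[4, 8, 15], [16, 23], [42]]
all_values = [v for part in partitions for v in part]

# Distributive: the sum of the partial sums equals the sum over all data.
sum_of_partial_sums = sum(sum(part) for part in partitions)
direct_sum = sum(all_values)

# Algebraic: avg() is derived from two distributive results, sum and count.
partial = [(sum(part), len(part)) for part in partitions]
avg_from_partials = sum(s for s, _ in partial) / sum(n for _, n in partial)
direct_avg = sum(all_values) / len(all_values)

# Holistic: the median of the partition medians is generally not the
# median of all the data, so the full data set is needed.
median_of_medians = median(median(part) for part in partitions)
direct_median = median(all_values)
```

For this data the sum and the average agree with the directly computed values, while the median of medians (19.5) differs from the true median (15.5).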
5. Concept Hierarchies
 Concept hierarchies organize the values of attributes or
dimensions into gradual levels of abstraction and are
useful in mining at multiple levels of abstraction.
 A concept hierarchy defines a sequence of mappings
from a set of low-level concepts to higher-level, more
general concepts.
 Ex: Mandal->District->State->India->World
Day->Week->Month->Quarter->Half year->Year
all: all
region: Europe, ..., North_America
country: Germany, ..., Spain; Canada, ..., Mexico
city: Frankfurt, ...; Vancouver, ..., Toronto
office: L. Chan, ..., M. Wind

Fig: A Concept Hierarchy: Dimension (location)
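
The location hierarchy in the figure can be written as a sequence of mappings, sketched here in Python. Only the cities, countries and regions named in the figure are included; the sales amounts and the function names `generalize` and `roll_up` are hypothetical.

```python
# Each mapping takes a concept one level up the hierarchy.
CITY_TO_COUNTRY = {"Frankfurt": "Germany", "Vancouver": "Canada", "Toronto": "Canada"}
COUNTRY_TO_REGION = {
    "Germany": "Europe", "Spain": "Europe",
    "Canada": "North_America", "Mexico": "North_America",
}


def generalize(city, level):
    """Map a city to its ancestor at the requested level of abstraction."""
    if level == "all":
        return "all"
    if level == "city":
        return city
    country = CITY_TO_COUNTRY[city]
    if level == "country":
        return country
    if level == "region":
        return COUNTRY_TO_REGION[country]
    raise ValueError(f"unknown level: {level}")


def roll_up(sales_by_city, level):
    """Aggregate city-level sales at a higher level of the hierarchy."""
    totals = {}
    for city, amount in sales_by_city.items():
        key = generalize(city, level)
        totals[key] = totals.get(key, 0) + amount
    return totals


sales_by_city = {"Frankfurt": 100, "Vancouver": 250, "Toronto": 150}
by_country = roll_up(sales_by_city, "country")
by_region = roll_up(sales_by_city, "region")
grand_total = roll_up(sales_by_city, "all")
```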


6. OLAP Operations

 On-Line Analytical Processing (OLAP) can be performed
in data warehouses/marts using the multidimensional data
model.
 An Online Analytical Processing (OLAP) server is based on the
multidimensional data model. It allows managers and
analysts to gain insight into information through fast,
consistent, interactive access.
 One of the most compelling front-end applications for
OLAP is a PC spreadsheet program.
 OLAP operations can be implemented efficiently using the
data cube structure.
 Typical OLAP operations include roll-up, drill-(down,
across, through), slice-and-dice, and pivot (rotate).
Fig: Typical OLAP Operations
OLAP Operations are
1. Roll-up: It is also called drill-up. It summarizes data by
climbing up a hierarchy or by dimension reduction. Roll-up
takes the current aggregation level of fact values and does
a further aggregation on one or more of the dimensions.
 Equivalent to doing a GROUP BY on this dimension
using the attribute hierarchy.
 Decreases the number of dimensions and removes row
headers.
 Ex: SELECT [attribute list], SUM([attribute names])
FROM [table list]
WHERE [condition list]
GROUP BY [grouping list];
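
The template above can be filled in with a concrete roll-up, sketched here with Python's built-in `sqlite3` module. The `sales` table, its columns and its rows are assumptions for illustration. The first query is the detailed (city) level, the second rolls up by climbing the location hierarchy from city to country, and the third rolls up by dimension reduction, dropping location altogether.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (city TEXT, country TEXT, quarter TEXT, dollars_sold REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Toronto", "Canada", "Q1", 400.0),
        ("Vancouver", "Canada", "Q1", 600.0),
        ("Frankfurt", "Germany", "Q1", 500.0),
        ("Toronto", "Canada", "Q2", 300.0),
    ],
)

# Detailed level: one row per (city, quarter).
by_city = conn.execute(
    "SELECT city, quarter, SUM(dollars_sold) FROM sales "
    "GROUP BY city, quarter ORDER BY city, quarter"
).fetchall()

# Roll-up by climbing the hierarchy: city -> country.
by_country = conn.execute(
    "SELECT country, quarter, SUM(dollars_sold) FROM sales "
    "GROUP BY country, quarter ORDER BY country, quarter"
).fetchall()

# Roll-up by dimension reduction: the location dimension is removed.
by_quarter = conn.execute(
    "SELECT quarter, SUM(dollars_sold) FROM sales "
    "GROUP BY quarter ORDER BY quarter"
).fetchall()
```

Reading the three results in the opposite order, from `by_quarter` back to `by_city`, corresponds to drilling down.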
2. Drill-down: It is also called roll-down. It is the reverse of roll-up:
it moves from a higher-level summary to a lower-level summary or detailed
data, or introduces new dimensions.
 Opposite of roll-up.
 Presents data at a lower level of a dimension hierarchy, thereby
viewing data at a more specialized level within a dimension.
 Increases the number of dimensions and adds new headers.

3. Slice: It is defined in terms of project and select.
 Performs a selection on one dimension of the given cube, resulting
in a sub-cube.
 Reduces the dimensionality of the cube.
 Sets one or more dimensions to specific values and keeps a subset
of dimensions for the selected values.

4. Dice: Defines a sub-cube by performing a selection on two
or more dimensions.
 Refers to a range select condition on one dimension, or to a
select condition on more than one dimension.
 Reduces the number of member values of one or more
dimensions.

5. Pivot: It is also called rotate. It reorients the cube for
visualization, e.g. turning a 3D cube into a series of 2D planes.
 Rotates the data axes to view the data from different
perspectives.
 Groups data with different dimensions.

6. Drill across: Involves (goes across) more than one fact table.
 Accesses more than one fact table that is linked by
common dimensions.
 Combines cubes that share one or more dimensions.

7. Drill through: Goes through the bottom level of the cube to its
back-end relational tables (using SQL).
 Drills down from the bottom level of a data cube to its
back-end relational tables.

8. Cross-tab: Spreadsheet-style row/column aggregates.

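
Slice, dice and pivot can be sketched on a tiny cube held in a Python dictionary. The cube layout (keys are `(quarter, item, city)` tuples, values are units sold), the sample data and the function names are assumptions for illustration, not the storage format of any OLAP server.

```python
# A hypothetical 3-D cube: (quarter, item, city) -> units sold.
cube = {
    ("Q1", "TV", "Toronto"): 10,
    ("Q1", "TV", "Vancouver"): 5,
    ("Q1", "Phone", "Toronto"): 7,
    ("Q2", "TV", "Toronto"): 12,
    ("Q2", "Phone", "Vancouver"): 3,
}


def slice_cube(cube, dim_index, value):
    """Slice: fix one dimension to a value; that dimension disappears."""
    return {
        key[:dim_index] + key[dim_index + 1:]: cell
        for key, cell in cube.items()
        if key[dim_index] == value
    }


def dice(cube, allowed):
    """Dice: select on several dimensions; dimensionality is unchanged."""
    return {
        key: cell
        for key, cell in cube.items()
        if all(key[i] in values for i, values in allowed.items())
    }


def pivot(table):
    """Pivot: swap the two axes of a 2-D view."""
    return {(col, row): cell for (row, col), cell in table.items()}


q1 = slice_cube(cube, 0, "Q1")                                  # 2-D view
tv_toronto = dice(cube, {0: {"Q1", "Q2"}, 1: {"TV"}, 2: {"Toronto"}})
q1_rotated = pivot(q1)                                          # (city, item)
```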
7. Starnet Query Model
 The querying of multidimensional databases can be
based on a starnet model.
 A starnet model consists of radial lines emanating
from a central point, where each line represents a
concept hierarchy for a dimension. Each abstraction
level in the hierarchy is called a footprint.
 These represent the granularities available for use by
OLAP operations such as drill-down and roll-up.
Fig: Modeling business queries: a Starnet Model.
 Consider a starnet query model for the AllElectronics data
warehouse.
 The starnet consists of four radial lines, representing concept
hierarchies for the dimensions location, customer, item,
and time, respectively.
 Each line consists of footprints representing abstraction
levels of the dimension.
 A concept hierarchy may involve a single attribute or
several attributes.

Ex: The time line has four footprints: “day,” “month,”
“quarter,” and “year.”
Fig: A Star-Net Query Model (each circle is called a footprint)
Radial lines and their footprints:
  Customer Orders: CONTRACTS, ORDER
  Shipping Method: AIR-EXPRESS, TRUCK
  Time: ANNUALLY, QTRLY, DAILY
  Product: PRODUCT LINE, PRODUCT ITEM, PRODUCT GROUP
  Location: CITY, COUNTRY, REGION
  Organization: SALES PERSON, DISTRICT, DIVISION
  Customer
  Promotion


4 - Data Warehouse Architecture
 Data warehouse architecture is a description of the elements and
services of the warehouse, with details showing how the
components fit together and how the system will grow over
time.
 DW architecture is a structure that comprises a top tier (front-end
tools), a middle tier (OLAP server), and a bottom tier (DW server).
 It is an integrated set of products that enable the extraction and
transformation of operational data to be loaded into a database for
end-user analysis and reporting.

 It consists of
1. Steps for the Design and Construction of DW
2. Three-Tier Data Warehouse Architecture
3. Data Warehouse Back-End Tools and Utilities
4. Metadata Repository
5. Types of OLAP Servers
1. Steps for the Design and Construction of Data
Warehouses: The design of a data warehouse follows a business
analysis framework.

 There are four views regarding the design of a data warehouse:
1. Top-down view: allows selection of the relevant
information necessary for the data warehouse.
2. Data source view: exposes the information being
captured, stored, and managed by operational systems.
3. Data warehouse view: consists of fact tables and
dimension tables.
4. Business query view: sees the perspectives of data in
the warehouse from the view of the end user.

 Typical Data Warehouse Design Process:
1. Choose a business process to model, e.g., orders,
invoices, etc.
2. Choose the grain (atomic level of data) of the business
process.
3. Choose the dimensions that will apply to each fact table
record.
4. Choose the measures that will populate each fact table
record.

 Implementing a Warehouse
 Monitoring: Sending data from sources
 Integrating: Loading, cleansing, ...
 Processing: Query processing, indexing, ...
 Managing: Metadata, design, ...
2. Three-Tier Data Warehouse Architecture

 The bottom tier is a warehouse database server that is
almost always a relational database system.
 Back-end tools and utilities are used to feed data into the
bottom tier from operational databases or other external
sources.
 The middle tier is an OLAP server that is typically
implemented using either a relational OLAP (ROLAP)
model or a multidimensional OLAP (MOLAP) model.
 The top tier is a front-end client layer, which contains
query and reporting tools, analysis tools, and/or data
mining tools.
Fig: Three-Tier Data Warehouse Architecture
 There are three data warehouse models: the enterprise
warehouse, the data mart, and the virtual warehouse.

1. Enterprise Warehouse: Collects all of the information
about subjects spanning the entire organization.

2. Data Mart: A subset of corporate-wide data that is of
value to a specific group of users.

3. Virtual Warehouse: A set of views over operational
databases.

Fig: Data Warehouse Development: A Recommended Approach
Define a high-level corporate data model; through model refinement, build data marts
and the enterprise data warehouse; these lead to distributed data marts and a
multi-tier data warehouse.

3. Data Warehouse Back-End Tools and Utilities
1. Data Extraction: Get data from multiple, heterogeneous,
and external sources
2. Data Cleaning: Detect errors in the data and rectify them
when possible
3. Data Transformation: Convert data from legacy or host
format to warehouse format
4. Load Data : Sort, summarize, consolidate, compute
views, check integrity, and build indices and partitions
5. Refresh the Data: Propagate the updates from the data
sources to the warehouse
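
The five back-end steps can be sketched as a tiny pipeline in Python. Everything here is a simplified assumption for illustration: the two sources named `pos` and `web`, their date formats, the rule that empty or negative amounts are errors, and the in-memory dictionary standing in for the warehouse.

```python
from datetime import datetime


def extract(sources):
    """Extraction: gather rows from several sources, tagging their origin."""
    return [{**row, "source": name} for name, rows in sources.items() for row in rows]


def clean(rows):
    """Cleaning: drop rows whose amount is missing, empty or negative."""
    return [
        r for r in rows
        if r.get("amount") not in (None, "") and float(r["amount"]) >= 0
    ]


def transform(rows):
    """Transformation: convert source formats into the warehouse format."""
    out = []
    for r in rows:
        fmt = "%y%m%d" if r["source"] == "pos" else "%Y-%m-%d"
        sale_date = datetime.strptime(r["date"], fmt).date().isoformat()
        out.append({"sale_date": sale_date, "amount": float(r["amount"])})
    return out


def load(warehouse, rows):
    """Load: sort, append, and rebuild a summary (daily totals)."""
    warehouse["facts"].extend(sorted(rows, key=lambda r: r["sale_date"]))
    totals = {}
    for r in warehouse["facts"]:
        totals[r["sale_date"]] = totals.get(r["sale_date"], 0.0) + r["amount"]
    warehouse["daily_totals"] = totals
    return warehouse


sources = {
    "pos": [{"date": "970105", "amount": "100.0"}, {"date": "970106", "amount": ""}],
    "web": [{"date": "1997-01-05", "amount": 50}, {"date": "1997-01-06", "amount": -5}],
}
warehouse = load({"facts": [], "daily_totals": {}}, transform(clean(extract(sources))))
totals_after_initial_load = dict(warehouse["daily_totals"])

# Refresh: propagate new source rows through the same steps.
updates = {"pos": [], "web": [{"date": "1997-01-06", "amount": 20}]}
warehouse = load(warehouse, transform(clean(extract(updates))))
```

Existing facts are never modified by the refresh; new rows are only appended, in line with the non-volatile character of the warehouse.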

4. Metadata Repository: Metadata (data about data) is the
data defining warehouse objects. It includes:
 A description of the structure of the data warehouse: schema, views,
dimensions, hierarchies, derived data definitions, and data mart
locations and contents.
 Operational metadata: data lineage, currency of data, and monitoring
information.
 The algorithms used for summarization and the mapping from the
operational environment to the data warehouse.

5. OLAP Server Architectures: The types of OLAP servers are:
1. ROLAP: Relational Online Analytical Processing
2. MOLAP: Multidimensional Online Analytical Processing
3. HOLAP: Hybrid Online Analytical Processing
(1) Relational OLAP (ROLAP)
 Special schema designs: star, snowflake
 Special indexes: bitmap, multi-table join
 Proven technology based on the relational model and DBMS; it tends to
outperform specialized MDDBs, especially on large data sets.
 Uses a relational or extended-relational DBMS to store and manage
warehouse data, together with OLAP middleware.

 Examples are
 Telecommunication startup: call data records (CDRs)
 E-commerce site
 Credit card company
 Products are
 IBM DB2, Oracle, Sybase IQ, RedBrick, Informix
 Relational and specialized relational DBMS
 OLAP middleware to support missing pieces

(2) Multidimensional OLAP (MOLAP)
Array-based storage structures
Direct access to array data structures
Fast indexing to pre-computed summarized data
Facts stored in multi-dimensional arrays
Dimensions used to index array

Examples are
Budgeting in a financial department and
Sales analysis.

Products are Pilot, Arbor Essbase, Gentia.

(3) Hybrid OLAP (HOLAP)
Storing detailed data in RDBMS
Storing aggregated data in MDBMS
User access via MOLAP tools.
Flexibility, e.g., low level: relational, high-level: array

• Examples are
 Sales department of a multi-national company
 Banks and Financial Service Providers

• Products / Tools are
Oracle 8i, 10g and 11i
Oracle Express Server
Oracle Relational Access Manager
Oracle Express Clients