DWDM - Unit - I

Uploaded by

Saidulu Inamanamelluri


UNIT - III

This unit covers:

1. What is a Data Warehouse
2. Differences between OLTP and OLAP
3. Multidimensional Data Model
4. Data Warehouse Architecture
1. What is Data Warehouse: A data warehouse is a
subject-oriented, integrated, time-variant, and
nonvolatile collection of data in support of
management’s decision-making process. [W. H. (Bill)
Inmon]

 A data warehouse is a copy of transaction data
specifically structured for query and analysis. [Ralph
Kimball]
 Bill Inmon is widely regarded as the father of data
warehousing and is credited with popularizing the term
"data warehouse" around 1990; Ralph Kimball is the
other leading pioneer of the field.
 Data warehousing is the process of constructing and
using data warehouses.
Level               Meaning
IoT                 Internet of Things
Big Data Analytics  Deals with the 3 V’s of big data (e.g. Facebook)
Data Mining         Extracting meaningful data
Data Warehouse      Collection of data marts / OLAP
Data Mart           Subset of a DWH
DBMS                Collection of related information / OLTP
Information         Processed data
Data                Raw material / facts / images

Fig: Hierarchy of Data Warehousing and Data Mining

 The term "data warehouse" refers to a special type of
database that acts as the central repository for company
data.
 A data warehouse is a relational or multidimensional
database that is designed for query and analysis.
 It can be thought of as a database archive that is
segregated from the operational databases and used
primarily for reporting and data mining purposes.
 A data warehouse enables knowledge workers to make
faster and better decisions.
 A data warehouse can be defined as a collection of data
marts.
 Data warehousing is a collection of decision support
technologies aimed at enabling the knowledge worker
to make better decisions.
 Requirements of a data warehouse system:
Efficient cube computation
Better access methods
Efficient query processing
[Figure: diagram with the labels Data in files, Tape, Mainframe, Distributed DBMS, DW with Server]
 The characteristics of a data warehouse are:

1. Subject Oriented: The data gives information about a particular
subject rather than about a company's ongoing operations. A data
warehouse can be used to analyze a particular subject area.
 Ex: "Sales" can be a particular subject. Other subjects include
customer, product, weather data, and the stock market.

2. Integrated: Data is gathered into the data warehouse from a
variety of sources and merged into a coherent whole. Data
cleaning and data integration techniques are applied to ensure
consistency in encoding structures, naming conventions, attribute
measures, etc.
 Ex: A warehouse is constructed by integrating multiple, heterogeneous
data sources such as relational databases, flat files, and on-line
transaction records.
Transactional storage (process oriented): an Entry record holding Sales Rep,
Quantity Sold, Date, Customer Name, Product Description, Unit Price, Mail Address.
Data warehouse storage (subject oriented): data organized by subject, e.g.
Sales, Customers, Products.

Fig: Subject Oriented

Integration: transactional storage (Appl. A, B, C) versus data warehouse storage

Aspect               Appl. A            Appl. B               Appl. C          Data warehouse
Encoding             M, F               1, 0                  X, Y             M, F
Unit of attributes   pipeline cm        pipeline inches       pipeline m       pipeline cm
Physical attributes  balance dec(13,2)  balance char 9(9)V99  balance float    balance dec(13,2)
Naming conventions   bal-on-hand        current_balance       balance          balance
Data consistency     date (Julian)      date (yymmdd)         date (absolute)  date (Julian)

Fig: Integrated
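
The integration step shown in the figure can be sketched in a few lines of Python. This is only an illustration: the function name `integrate`, the sample records, and the choice that `1` and `X` mean male while `0` and `Y` mean female are assumptions, since the figure does not say which source code maps to which value.

```python
# Assumed mapping of each application's gender codes to the warehouse codes.
GENDER_MAPS = {
    "A": {"M": "M", "F": "F"},
    "B": {"1": "M", "0": "F"},   # assumption: 1 = male, 0 = female
    "C": {"X": "M", "Y": "F"},   # assumption: X = male, Y = female
}
# Conversion factors to centimetres: A stores cm, B inches, C metres.
LENGTH_TO_CM = {"A": 1.0, "B": 2.54, "C": 100.0}
# Each application uses a different name for the balance attribute.
BALANCE_FIELD = {"A": "bal-on-hand", "B": "current_balance", "C": "balance"}


def integrate(app, record):
    """Convert one source record to the warehouse conventions."""
    return {
        "gender": GENDER_MAPS[app][str(record["gender"])],
        "pipeline_cm": round(record["pipeline"] * LENGTH_TO_CM[app], 2),
        "balance": round(float(record[BALANCE_FIELD[app]]), 2),
    }


# Hypothetical source records, one per application.
rows = [
    integrate("A", {"gender": "F", "pipeline": 30.0, "bal-on-hand": 120.5}),
    integrate("B", {"gender": 1, "pipeline": 10.0, "current_balance": "99.10"}),
    integrate("C", {"gender": "Y", "pipeline": 0.5, "balance": 7.0}),
]
```

After integration every row uses the same encoding, unit, attribute name, and numeric type, whichever application it came from.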
3. Time-variant: All data in the data warehouse is
identified with a particular time period. Historical data is
kept in a data warehouse.
 Ex: An operational database holds current-value data, whereas
data warehouse data provides information from a historical
perspective (e.g., the past 5-10 years).

4. Non-volatile: Data is stable in a data warehouse. More
data is added but data is never removed. Once data is in
the data warehouse, it will not change.
 Ex: Only two operations are needed: the initial loading of data
and access of data. Updates of data are not allowed.
Transactional storage: current data.
Data warehouse storage: historical data, e.g. a chart of Sales (in lakhs) by
region (East, West, North) for January, February, and March of Year 97, 1st Qtr.

Fig: Time Variant

Transactional storage (volatile): record-by-record data manipulation (insert, change, delete, access).
Data warehouse storage (non-volatile): mass load and access of data.

Fig: Non-Volatile

Need for Data Warehousing

 Data warehousing is capable of storing and consolidating
past information.
 It provides support for sophisticated multidimensional queries.
 A data warehouse uses an update-driven approach rather than a
query-driven approach.
 To understand current business trends and make forecasting
decisions.
 To generate reports and analysis.
 Knowledge discovery and decision support.
 To improve the performance of stock market data analysis.
DBMS, OLAP and Data Mining

Task
 DBMS (OLTP): Extraction of detailed and summary data
 OLAP (DW): Summaries, trends and forecasts
 Data Mining: Knowledge discovery of hidden patterns and insights

Type of Result
 DBMS (OLTP): Information
 OLAP (DW): Analysis
 Data Mining: Insight and prediction

Method
 DBMS (OLTP): Deduction (ask the question, verify with data)
 OLAP (DW): Multidimensional data modeling, aggregation, statistics
 Data Mining: Induction (build the model, apply it to new data, get
the result)

Example question
 DBMS (OLTP): Who purchased mutual funds in the last 3 years?
 OLAP (DW): What is the average income of mutual fund buyers by
region by year?
 Data Mining: Who will buy a mutual fund in the next 6 months and why?
Benefits of a Data Warehouse

1. A data warehouse delivers enhanced business intelligence.
2. A data warehouse saves time.
3. A data warehouse enhances data quality and consistency.
4. A data warehouse provides historical intelligence.
5. A data warehouse generates a high ROI (Return on Investment).
 DW Tools

1. Informatica - PowerCenter
2. IBM - WebSphere DataStage, Cognos Data Manager
3. SAP - Business Objects Data Integrator
4. Microsoft - SQL Server Integration Services
5. Oracle - Data Integrator, Warehouse Builder
6. SAS - Data Integration Studio
7. Ab Initio
Data Warehousing Tools

Tool Category: Products
ETL Tools: Informatica, Ab Initio, IBM InfoSphere DataStage, Oracle
Warehouse Builder, Business Objects Data Integrator, etc.
OLAP Server: Oracle Express Server, Oracle Essbase, IBM Cognos, SAP
NetWeaver OLAP, Microsoft Analysis Services
OLAP Tools: Oracle Express Suite, Oracle Essbase, Cognos PowerPlay,
Business Objects, MicroStrategy
Data Warehouse: Oracle, Informix, Teradata, DB2
Data Mining & Analytics: SAS Enterprise Miner, IBM Intelligent Miner

APPLICATION AREAS FOR DATA WAREHOUSING

1. Financial Data Analysis
2. Data mining in the retail industry
3. Telecommunication Industry
4. Biological Data Analysis
5. Scientific Applications
6. Intrusion Detection
7. Sales and Marketing
8. Health Care and Insurance
9. e-Commerce

2. OLTP (Online Transaction Processing) vs. OLAP (Online Analytical Processing)

Feature               OLTP                                      OLAP
1. Orientation        Customer oriented                         Market oriented
2. Users              Clerk, IT professional                    Knowledge worker
3. Function           Day-to-day operations                     Decision support
4. DB design          Application-oriented                      Subject-oriented
5. Data               Current, up-to-date, detailed,            Historical, summarized, multidimensional,
                      flat relational, isolated                 integrated, consolidated
6. Usage              Repetitive                                Ad hoc
7. Access             Read/write, index/hash on primary key     Lots of scans
8. Unit of work       Short, simple transaction (uses 3NF)      Complex query
9. Records accessed   Tens                                      Millions (uses 2NF)
10. Number of users   Thousands                                 Hundreds
11. DB size           100 MB to GB                              100 GB to TB
12. Processing time   Low                                       High
13. Metric            Transaction throughput                    Query throughput, response time
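
The difference in access pattern can be illustrated with Python's built-in `sqlite3` module. This is a toy sketch, not a benchmark: the `sales` table, its columns, and the sample rows are all made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "sale_id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (region, year, amount) VALUES (?, ?, ?)",
    [("East", 2023, 100.0), ("East", 2024, 150.0),
     ("West", 2023, 80.0), ("West", 2024, 120.0)],
)

# OLTP style: a short read/write transaction that touches one record,
# located through the primary key.
conn.execute("UPDATE sales SET amount = amount + 10 WHERE sale_id = ?", (1,))
one_row = conn.execute(
    "SELECT amount FROM sales WHERE sale_id = ?", (1,)
).fetchone()

# OLAP style: a read-only query that scans many records and summarizes them.
summary = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
```

The first query returns a single current value; the second consolidates every row into one total per region.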
3 - Multi-Dimensional Data Model
 The dimensional model was developed for
implementing data warehouses and data marts.
 The MDDM (multidimensional data model) provides both a
mechanism to store data and a way to perform business analysis.
 Data warehouses and OLAP tools are based on a
multidimensional data model.
 The multidimensional data model is typically used for the
design of corporate data warehouses and departmental
data marts.
 The multidimensional data model is represented through
data cubes.
 The core of the multidimensional model is the data cube, which
consists of a large set of facts (or measures) and a number of
dimensions.
 The two primary components of a dimensional model are
dimensions and facts.
 A data cube consists of dimensions and measures, and the
multidimensional view of data is the foundation of OLAP.
 A data cube consists of a lattice of cuboids, each corresponding to a
different degree of summarization of the given multidimensional
data.
 Dimension Table: It consists of tuples of the attributes of a
dimension and has a simple primary key. Dimensions are textual
attributes used to analyze data.
 Fact Table: A fact table contains keys to each of the related dimension
tables. Facts are numeric values used to analyze the business.
Types of Facts:
1. Additive Facts
2. Semi-Additive Facts
3. Non-Additive Facts

1. Additive Facts: Additive facts are facts that can be
summed up through all of the dimensions in the fact table.

Ex: Number of products sold on day 1 = 500
    Number of products sold on day 2 = 200
    ------------------------------------------
    Total                            = 700
2. Semi-Additive Facts: These are facts that can be
summed up for some of the dimensions in the fact table but
not others.
Ex: Balance of company's a/c 1 on day 1 = 5000
    Balance of company's a/c 1 on day 2 = 3000
    ------------------------------------------
    Sum across days                     = 8000
The sum of 8000 across the time dimension is not a meaningful balance;
balances can be summed across accounts but not across days.
3. Non-Additive Facts: These are facts that cannot be
summed up for any of the dimensions present in the fact
table.
Ex: Profit margin for day 1 = 30%
    Profit margin for day 2 = 80%
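
The three kinds of fact can be checked with a small Python sketch. The two rows reuse the numbers from the examples above; the `revenue` and `profit` columns are assumptions, chosen so that the daily margins come out at 30% and 80%.

```python
# Hypothetical daily rows: (day, units_sold, balance, revenue, profit)
rows = [
    (1, 500, 5000, 1000.0, 300.0),   # margin 300 / 1000 = 30%
    (2, 200, 3000, 500.0, 400.0),    # margin 400 / 500  = 80%
]

# Additive: units sold can be summed across days.
total_units = sum(r[1] for r in rows)

# Semi-additive: summing balances across days gives a number with no
# business meaning; the closing balance of the last day is what matters.
summed_balance = sum(r[2] for r in rows)
closing_balance = max(rows, key=lambda r: r[0])[2]

# Non-additive: adding the daily margins gives 110%, which is meaningless.
# The margin has to be recomputed from its additive components.
naive_margin = sum(r[4] / r[3] for r in rows)
overall_margin = sum(r[4] for r in rows) / sum(r[3] for r in rows)
```

A common design choice that follows from this is to store the additive components (here profit and revenue) in the fact table and derive ratios at query time.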
Types of Dimensions
1. Slowly Changing Dimensions (SCD) – Dimensions that change
slowly over time rather than on a regular, time-based schedule.
Type 1: Overwrite old value
Type 2: Add new row
Type 3: Add new column

Original record:

Id  Year  Name   City
1   2000  James  New York

After the customer moves, with a new row added (Type 2):

Id  Year  Name   City
1   2000  James  New York
1   2004  James  California

New rows with effective dates:

Id  Start Date        End Date            Name   City
1   1st January 2013  31st December 2013  James  New York
1   1st January 2014  31st December 2014  James  California
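
The three SCD types listed above can be sketched as small Python functions over a list of row dictionaries. The function names and the `previous_city` column are illustrative assumptions, not a standard API.

```python
import copy


def scd_type1(rows, cust_id, new_city):
    """Type 1: overwrite the old value; history is lost."""
    out = copy.deepcopy(rows)
    for row in out:
        if row["id"] == cust_id:
            row["city"] = new_city
    return out


def scd_type2(rows, cust_id, new_city, year):
    """Type 2: keep the old row and add a new row for the new value."""
    out = copy.deepcopy(rows)
    current = [r for r in out if r["id"] == cust_id][-1]
    out.append({**current, "year": year, "city": new_city})
    return out


def scd_type3(rows, cust_id, new_city):
    """Type 3: add a column that holds the previous value."""
    out = copy.deepcopy(rows)
    for row in out:
        if row["id"] == cust_id:
            row["previous_city"] = row["city"]
            row["city"] = new_city
    return out


customers = [{"id": 1, "year": 2000, "name": "James", "city": "New York"}]
type1 = scd_type1(customers, 1, "California")
type2 = scd_type2(customers, 1, "California", 2004)
type3 = scd_type3(customers, 1, "California")
```

Type 2 is the only variant here that preserves the full history, at the cost of several rows per customer.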

2. Conformed Dimensions – A dimension that has
exactly the same meaning and content when it is
referred to from different fact tables.
Ex: Products, Time, Location for the Sales fact

3. Degenerate Dimensions – A dimension that does not
require a dimension table of its own and is kept in the
fact table, e.g. an order or invoice number.
Ex: From Location to City
 A data warehouse is based on a multidimensional data model, which
views data in the form of a data cube.
 A multidimensional model has two types of views:
i. Logical view: Easy for the user to understand, e.g. to formulate
queries or to understand result presentation.
ii. Physical view: Storage in computer memory and access methods
(sparse vs. dense).
 Dimensions are the entities and attributes by which data is
analyzed. They are organized into hierarchies:
Ex: Mandal->District->State->India->World
Ex: Day->Week->Month->Quarter->Half year->Year
 Facts are measures.
 Ex: Cost, Revenue, Quantity, No. of Units
Ex: Sales volume as a function of product, month,
and region can be represented through a data cube.

Dimensions: Product, Location, Time

Hierarchical summarization paths:
Product: Product->Category->Industry
Location: Office->City->Country->Region
Time: Day->Month->Quarter->Year (with Week as an alternative level above Day)
The multidimensional data model is covered under these topics:
1. Tables to Data Cubes
2. Stars, Snowflakes and Fact Constellations
3. Schemas with DMQL
4. Measures
5. Concept Hierarchies
6. OLAP Operations
7. Starnet Query Model
1. Tables to Data Cubes
 A data warehouse is based on a multidimensional data
model, which views data in the form of a data cube.
 Data cubes are multidimensional matrices in which data is
grouped or combined.
 A data cube, such as sales, allows data to be modeled and
viewed in multiple dimensions.
– Dimension tables, such as item (item_name, brand, type) or
time (day, week, month, quarter, year)
– Fact table, which contains measures (such as dollars_sold) and keys to
each of the related dimension tables

Fig: Sales Product shown in Table and Cube

 One way to look at the multidimensional data model is to view it as a cube.
 The cube on the right associates sales numbers (units sold)
with the dimensions product type, market, and time, with the
unit variables organized as cells in an array.
 As the number of dimensions in a cube increases, the number of
cube cells increases exponentially.
 Dimensions are hierarchical in nature, i.e. the time dimension
may contain hierarchies for years, quarters, months, weeks,
and days.
 Lattice Cuboid: An n-D base cube is called a base cuboid.
 The top most 0-D cuboid, which holds the highest-level of
summarization, is called the apex cuboid.
 The lattice of cuboids forms a data cube.

0-D (apex) cuboid: all
1-D cuboids: time; item; location; supplier
2-D cuboids: time,item; time,location; time,supplier; item,location;
item,supplier; location,supplier
3-D cuboids: time,item,location; time,item,supplier;
time,location,supplier; item,location,supplier
4-D (base) cuboid: time, item, location, supplier

Fig: Lattice of Cuboids
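
The lattice can be enumerated directly: every subset of the dimensions is one cuboid, so n dimensions give 2^n cuboids. The sketch below uses `itertools.combinations`; the fact rows and the `units_sold` measure are made-up sample data.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("time", "item", "location", "supplier")


def cuboids(dimensions):
    """Yield every cuboid (subset of dimensions), from the apex to the base."""
    for k in range(len(dimensions) + 1):
        yield from combinations(dimensions, k)


def aggregate(facts, group_by):
    """Sum units_sold over all dimensions that are not listed in group_by."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[d] for d in group_by)
        totals[key] += fact["units_sold"]
    return dict(totals)


# Hypothetical fact rows at the base (4-D) level.
facts = [
    {"time": "Q1", "item": "TV", "location": "Delhi", "supplier": "S1", "units_sold": 10},
    {"time": "Q1", "item": "PC", "location": "Delhi", "supplier": "S2", "units_sold": 5},
    {"time": "Q2", "item": "TV", "location": "Pune", "supplier": "S1", "units_sold": 7},
]

lattice = list(cuboids(DIMENSIONS))
apex = aggregate(facts, ())            # 0-D cuboid: one grand total
by_time = aggregate(facts, ("time",))  # one of the 1-D cuboids
```

With four dimensions the lattice has 1 + 4 + 6 + 4 + 1 = 16 cuboids, matching the figure.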

2. Stars, Snowflakes, and Fact Constellations
 The most popular data model for a data warehouse is a
multidimensional model.
 Data warehouses are modeled with dimension tables and
measure (fact) tables.
 There are three types of schemas:

1. Star schema
2. Snowflake schema
3. Fact constellations
1. Star schema: It is also known as the Star Join Schema.
 It is the simplest style of data warehouse schema.
 It is called a star schema because the entity relationship diagram
of this schema resembles a star, with points radiating from a
central table.
 A star query is a join between a fact table and a number of
dimension tables.
 Each dimension table is joined to the fact table using a primary
key to foreign key join, but dimension tables are not joined to each
other.
 A typical fact table contains keys and measures.
 In a star schema, a dimension table does not have any parent
table.
 Each dimension table has a primary key that corresponds exactly
to one of the components of the composite key in the fact table.
 The star schema is a subject-oriented model.
Sales Fact Table: time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city, state_or_province, country

Fig: Star Schema
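
A cut-down version of the star schema in the figure can be built and queried with Python's `sqlite3` module. To keep the sketch short, the branch dimension and several attributes are left out, the time table is named `time_dim`, and all rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE item (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);
-- The fact table holds one foreign key per dimension plus the measures.
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item(item_key),
    location_key INTEGER REFERENCES location(location_key),
    units_sold INTEGER,
    dollars_sold REAL
);
INSERT INTO time_dim VALUES (1, 'Q1', 2024), (2, 'Q2', 2024);
INSERT INTO item VALUES (1, 'TV', 'Acme'), (2, 'PC', 'Acme');
INSERT INTO location VALUES (1, 'Hyderabad', 'India'), (2, 'Toronto', 'Canada');
INSERT INTO sales_fact VALUES
    (1, 1, 1, 10, 5000.0), (1, 2, 1, 4, 3200.0), (2, 1, 2, 6, 3000.0);
""")

# A star query: the fact table is joined to each dimension it needs,
# and the dimension tables are never joined to each other.
star_query = """
SELECT t.quarter, l.country, SUM(f.dollars_sold)
FROM sales_fact AS f
JOIN time_dim AS t ON f.time_key = t.time_key
JOIN location AS l ON f.location_key = l.location_key
GROUP BY t.quarter, l.country
ORDER BY t.quarter, l.country
"""
result = conn.execute(star_query).fetchall()
```

Each dimension costs exactly one join, which is why star queries stay simple as dimensions are added.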

Characteristics of Star Schema:
i. Simple structure: The schema is easy to understand.
ii. High query effectiveness: Only a small number of tables need to be joined.
iii. Relatively long time to load data into dimension tables: Because of
de-normalization and redundant data, the tables can become large.
iv. Most commonly used in data warehouse
implementations: Widely supported by a large number
of business intelligence tools.

Advantages of the Star Schema Model

1. Provides highly optimized performance for typical star queries.
2. Provides a direct and intuitive mapping between the business
entities being analyzed by end users and the schema design.
2. Snowflake schema: A refinement of the star schema
in which some dimensional hierarchies are normalized into
third normal form, forming a set of smaller
dimension tables.

 It is an extension of the star schema.
 It keeps the same fact table structure as the star schema.
 A dimension can have multiple levels with multiple
hierarchies.
 From each hierarchy of levels, any one level can be
attached to the fact table.
 Usually the lowest level of the hierarchy is attached to the fact table.
 The snowflake structure can reduce the effectiveness of
browsing, since more joins are needed to execute a
query.
 The snowflake schema architecture is a more complex
variation of the star schema used in a data warehouse,
because the tables that describe the dimensions are
normalized.
 The snowflake schema is represented by a centralized fact
table that is connected to multiple dimensions.
 Snowflaking affects only the dimension
tables and not the fact tables.
Benefits of Snowflake Schema
1. It is easier to implement a snowflake schema when a
multidimensional model is added to typically normalized
tables.
2. A snowflake schema can reflect the same data as the
database.
Sales Fact Table: time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_key
supplier: supplier_key, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city_key
city: city_key, city, state_or_province, country

Fig: Snowflake Schema
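
The extra join that snowflaking introduces can be seen in a plain-Python sketch of the location -> city branch of the figure. The dictionaries stand in for tables, and every key and value in them is invented for the example.

```python
# Normalized (snowflaked) location dimension: location rows point to city rows.
city = {
    10: {"city": "Hyderabad", "state_or_province": "Telangana", "country": "India"},
    20: {"city": "Toronto", "state_or_province": "Ontario", "country": "Canada"},
}
location = {
    1: {"street": "MG Road", "city_key": 10},
    2: {"street": "King St", "city_key": 20},
    3: {"street": "Bay St", "city_key": 20},
}
sales_fact = [
    {"location_key": 1, "dollars_sold": 500.0},
    {"location_key": 2, "dollars_sold": 200.0},
    {"location_key": 3, "dollars_sold": 300.0},
]


def country_of(location_key):
    """Two lookups (fact -> location -> city) where a star schema needs one."""
    return city[location[location_key]["city_key"]]["country"]


def sales_by_country(facts):
    """Total dollars_sold per country, resolved through the city table."""
    totals = {}
    for fact in facts:
        country = country_of(fact["location_key"])
        totals[country] = totals.get(country, 0.0) + fact["dollars_sold"]
    return totals


totals = sales_by_country(sales_fact)
```

The country for Toronto is stored once in `city` instead of once per location row, which is the storage saving; the price is the second lookup on every query.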

Star vs. Snowflake

Star Schema                                             Snowflake Schema
1. Has redundant data                                   1. No redundancy, saves storage space
2. Lower query complexity and easy to understand        2. More complex queries and less easy to understand
3. Fewer foreign keys and shorter execution time        3. More foreign keys and longer query execution time
4. One simple query analysis                            4. Many simple query analyses
5. Fewer joins                                          5. More joins
6. Contains single dimensions                           6. Contains multiple dimension levels
7. Used when the dimension table contains fewer rows    7. Used when the dimension table is big
8. Both dimension and fact tables are denormalized      8. Dimension tables are normalized and fact tables
                                                           are denormalized
3. Fact constellations:
 A fact constellation is a set of fact tables that share some dimension tables.
 The fact constellation architecture contains multiple fact
tables that share many dimension tables.
 Because multiple fact tables share dimension tables, it can be viewed as a
collection of stars and is therefore called a galaxy schema or fact
constellation.
 A fact constellation is a combination of more than one fact
table and many dimension tables.
Sales Fact Table: time_key, item_key, branch_key, location_key
  Measures: units_sold, dollars_sold, avg_sales
Shipping Fact Table: time_key, item_key, shipper_key, from_location, to_location
  Measures: dollars_cost, units_shipped
time: time_key, day, day_of_the_week, month, quarter, year
item: item_key, item_name, brand, type, supplier_type
branch: branch_key, branch_name, branch_type
location: location_key, street, city, province_or_state, country
shipper: shipper_key, shipper_name, location_key, shipper_type

Fig: Fact Constellation Schema

3. Schemas with DMQL
 DMQL is a Data Mining Query Language for relational databases.
 Data mining query languages can be designed to support ad hoc
and interactive data mining.

 Two language primitives:

i) Cube definition
Syntax: define cube <cube_name> [<dimension_list>]: <measure_list>
Ex: define cube sales_star [time, item, branch, location]:
dollars_sold = sum(sales_in_dollars), units_sold = count(*)

ii) Dimension definition

Syntax: define dimension <dimension_name> as
(<attribute_or_subdimension_list>)
Ex: define dimension time as (time_key, day, day_of_week, month,
quarter, year)
Each of the three schemas can be defined in DMQL. The cube statement has
the same form in each case; the schemas differ mainly in their dimension
definitions.

1. Defining a Star Schema in DMQL:
 define cube sales_star [time, item, branch, location]: dollars_sold =
sum(sales_in_dollars), avg_sales = avg(sales_in_dollars),
units_sold = count(*)
2. Defining a Snowflake Schema in DMQL:
 define cube sales_snowflake [time, item, branch, location]: dollars_sold =
sum(sales_in_dollars), avg_sales = avg(sales_in_dollars),
units_sold = count(*)
3. Defining a Fact Constellation in DMQL:
 define cube sales_factconstellation [time, item, branch, location]:
dollars_sold = sum(sales_in_dollars), avg_sales =
avg(sales_in_dollars), units_sold = count(*)
4. Measures
 Measures of a data cube fall into three categories:

1. Distributive: the result derived by applying the function to n
aggregate values is the same as that derived by applying the
function to all the data without partitioning.
 E.g., count(), sum(), min(), max()
2. Algebraic: the measure can be computed by an algebraic function with M
arguments (where M is a bounded integer), each of which is
obtained by applying a distributive aggregate function.
 E.g., avg(), min_N(), standard_deviation()
3. Holistic: there is no constant bound on the storage size needed to
describe a subaggregate.
 E.g., median(), mode(), rank()
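
The three categories behave differently when data is split into partitions, which a short Python sketch can show. The partition values are arbitrary sample numbers.

```python
import statistics

# The same data set, stored in three partitions.
partitions = [[4, 8, 15], [16, 23], [42]]
all_values = [v for part in partitions for v in part]

# Distributive: the sum of the partition sums equals the sum of all values.
total = sum(sum(part) for part in partitions)

# Algebraic: avg() is computed from a bounded number of distributive
# values per partition, here the pair (sum, count).
pairs = [(sum(part), len(part)) for part in partitions]
average = sum(s for s, _ in pairs) / sum(n for _, n in pairs)

# Holistic: the median of the partition medians is in general not the
# median of all values, so no fixed-size summary per partition is enough.
median_of_medians = statistics.median(statistics.median(p) for p in partitions)
true_median = statistics.median(all_values)
```

Here the partition medians are 8, 19.5, and 42, whose median is 19.5, while the median of the full data set is 15.5.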
5. Concept Hierarchies
 Concept hierarchies organize the values of attributes or
dimensions into gradual levels of abstraction and are
useful in mining at multiple levels of abstraction.
 A concept hierarchy defines a sequence of mappings
from a set of low-level concepts to higher-level, more
general concepts.
 Ex: Mandal->District->State->India->World
Day->Week->Month->Quarter->Half year->Year
Level    Values
all      all
region   Europe ... North_America
country  Germany ... Spain, Canada ... Mexico
city     Frankfurt ..., Vancouver ... Toronto
office   L. Chan ... M. Wind

Fig: A Concept Hierarchy: Dimension (location)
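
A concept hierarchy can be stored as a chain of child-to-parent mappings. The sketch below covers only the cities and countries named in the figure; the function names and the sales numbers are assumptions made for the example.

```python
# Child -> parent mappings for the location dimension (partial).
CITY_TO_COUNTRY = {"Frankfurt": "Germany", "Vancouver": "Canada", "Toronto": "Canada"}
COUNTRY_TO_REGION = {
    "Germany": "Europe", "Spain": "Europe",
    "Canada": "North_America", "Mexico": "North_America",
}


def generalize(city, level):
    """Map a city to its ancestor at the requested level of the hierarchy."""
    if level == "city":
        return city
    country = CITY_TO_COUNTRY[city]
    if level == "country":
        return country
    if level == "region":
        return COUNTRY_TO_REGION[country]
    if level == "all":
        return "all"
    raise ValueError(f"unknown level: {level}")


def roll_up(sales_by_city, level):
    """Aggregate city-level sales to a higher level of abstraction."""
    totals = {}
    for city, amount in sales_by_city.items():
        key = generalize(city, level)
        totals[key] = totals.get(key, 0) + amount
    return totals


sales = {"Frankfurt": 40, "Vancouver": 25, "Toronto": 35}  # hypothetical
by_country = roll_up(sales, "country")
by_region = roll_up(sales, "region")
```

Each call climbs one or more levels of the hierarchy, which is exactly what the roll-up operation does on a cube.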

6. OLAP Operations

 On-Line Analytical Processing (OLAP) can be performed
in data warehouses/marts using the multidimensional data
model.
 An OLAP server is based on the
multidimensional data model. It allows managers and
analysts to gain insight into information through fast,
consistent, interactive access.
 One of the most compelling front-end applications for
OLAP is a PC spreadsheet program.
 OLAP operations can be implemented efficiently using the
data cube structure.
 Typical OLAP operations include roll-up, drill (down,
across, through), slice-and-dice, and pivot (rotate).
Fig. Typical OLAP Operations
The OLAP operations are:
1. Roll-up: It is also called drill-up. It summarizes data by
climbing up a hierarchy or by dimension reduction. Roll-up
takes the current aggregation level of fact values and does
a further aggregation on one or more of the dimensions.
 Equivalent to doing a GROUP BY on this dimension
using the attribute hierarchy.
 Decreases the number of dimensions and removes row
headers.

 Ex: SELECT [attribute list], SUM([attribute names])
FROM [table list]
WHERE [condition list]
GROUP BY [grouping list];
2. Drill-down: It is also called roll-down. It is the reverse of roll-up:
it moves from a higher-level summary to a lower-level summary or detailed
data, or introduces new dimensions.
 Opposite of roll-up.
 Presents data at a lower level of a dimension hierarchy, thereby
viewing data at a more specialized level within a dimension.
 Increases the number of dimensions and adds new headers.

3. Slice: Defined as a project-and-select operation.
 Performs a selection on one dimension of the given cube, resulting
in a sub-cube.
 Reduces the dimensionality of the cube.
 Sets one or more dimensions to specific values and keeps a subset
of dimensions for the selected values.
4. Dice: Defines a sub-cube by performing a selection on two
or more dimensions.
 Refers to a range select condition on one dimension, or to a
select condition on more than one dimension.
 Reduces the number of member values of one or more
dimensions.

5. Pivot: It is also called rotate. It reorients the cube for
visualization, e.g. turning a 3-D cube into a series of 2-D planes.
 Rotates the data axes to view the data from different
perspectives.
 Groups data with different dimensions.
6. Drill-across: Involves (goes across) more than one fact table.
 Accesses more than one fact table linked by
common dimensions.
 Combines cubes that share one or more dimensions.

7. Drill-through: Goes through the bottom level of the cube to its
back-end relational tables (using SQL).
 Drills down from the bottom level of a data cube to its
back-end relational tables.

8. Cross-tab: Spreadsheet-style row/column aggregates.
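
Roll-up, slice, dice, and pivot can be sketched on a tiny cube held in a Python dictionary. The cube contents, the dimension order in `DIMS`, and the function names are illustrative assumptions; `roll_up` here shows the dimension-reduction form of roll-up rather than climbing a hierarchy.

```python
from collections import defaultdict

# Hypothetical cube cells: (quarter, city, item) -> units_sold
DIMS = ("quarter", "city", "item")
cube = {
    ("Q1", "Toronto", "TV"): 10,
    ("Q1", "Vancouver", "TV"): 4,
    ("Q1", "Toronto", "PC"): 6,
    ("Q2", "Toronto", "TV"): 8,
    ("Q2", "Vancouver", "PC"): 3,
}


def roll_up(cells, drop):
    """Aggregate away one dimension (roll-up by dimension reduction)."""
    i = DIMS.index(drop)
    out = defaultdict(int)
    for key, value in cells.items():
        out[key[:i] + key[i + 1:]] += value
    return dict(out)


def slice_(cells, dim, value):
    """Fix one dimension to a single value; the result has one dimension fewer."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cells.items() if k[i] == value}


def dice(cells, allowed):
    """Keep the cells whose coordinates fall in the allowed sets."""
    return {
        k: v for k, v in cells.items()
        if all(k[DIMS.index(d)] in values for d, values in allowed.items())
    }


def pivot(cells_2d):
    """Swap the two axes of a 2-D view."""
    return {(b, a): v for (a, b), v in cells_2d.items()}


by_quarter_item = roll_up(cube, "city")
q1 = slice_(cube, "quarter", "Q1")
q1_toronto = dice(cube, {"quarter": {"Q1"}, "city": {"Toronto"}})
by_item_quarter = pivot(by_quarter_item)
```

Drill-down is the inverse of `roll_up` and needs the finer-grained cells to still be available, which is why the base cuboid is kept.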
7. Starnet Query Model
 The querying of multidimensional databases can be
based on a starnet model.
 A starnet model consists of radial lines emanating
from a central point, where each line represents a
concept hierarchy for a dimension. Each abstraction
level in the hierarchy is called a footprint.
 These represent the granularities available for use by
OLAP operations such as drill-down and roll-up.
Fig: Modeling business queries: a Starnet Model.
 Starnet query model for the AllElectronics data
warehouse.
 The starnet consists of four radial lines, representing concept
hierarchies for the dimensions location, customer, item,
and time, respectively.
 Each line consists of footprints representing abstraction
levels of the dimension.
 A concept hierarchy may involve a single attribute or
several attributes.

Ex: The time line has four footprints: “day,” “month,”
“quarter,” and “year.”
Radial lines and their footprints:
Customer Orders: CONTRACTS, ORDER
Shipping Method: AIR-EXPRESS, TRUCK
Time: ANNUALLY, QTRLY, DAILY
Product: PRODUCT LINE, PRODUCT ITEM, PRODUCT GROUP
Location: CITY, COUNTRY, REGION
Organization: SALES PERSON, DISTRICT, DIVISION
Customer
Promotion

Each circle is called a footprint.

Fig: A Star-Net Query Model

4 - Data Warehouse Architecture
 Data warehouse architecture is a description of the elements and
services of the warehouse, with details showing how the
components fit together and how the system will grow over
time.
 DW architecture is a structure that comprises a top tier (front-end
tools), a middle tier (OLAP server), and a bottom tier (DW server).
 It is an integrated set of products that enable the extraction and
transformation of operational data to be loaded into a database for
end-user analysis and reporting.

 It consists of
1. Steps for the Design and Construction of DW
2. Three-Tier Data Warehouse Architecture
3. Data Warehouse Back-End Tools and Utilities
4. Metadata Repository
5. Types of OLAP Servers
1. Steps for the Design and Construction of Data
Warehouses: The design of a data warehouse follows a business
analysis framework.

 Four views regarding the design of a data warehouse:

1. Top-down view: allows selection of the relevant
information necessary for the data warehouse.
2. Data source view: exposes the information being
captured, stored, and managed by operational systems.
3. Data warehouse view: consists of fact tables and
dimension tables.
4. Business query view: sees the perspectives of data in
the warehouse from the view of the end user.
 Typical data warehouse design process:
1. Choose a business process to model, e.g., orders,
invoices, etc.
2. Choose the grain (atomic level of data) of the business
process.
3. Choose the dimensions that will apply to each fact table
record.
4. Choose the measures that will populate each fact table
record.

 Implementing a warehouse
 Monitoring: Sending data from sources
 Integrating: Loading, cleansing, ...
 Processing: Query processing, indexing, ...
 Managing: Metadata, design, ...
2. Three-Tier Data Warehouse Architecture

 The bottom tier is a warehouse database server that is
almost always a relational database system.
 Back-end tools and utilities are used to feed data into the
bottom tier from operational databases or other external
sources.
 The middle tier is an OLAP server that is typically
implemented using either a relational OLAP (ROLAP)
model or a multidimensional OLAP (MOLAP) model.
 The top tier is a front-end client layer, which contains
query and reporting tools, analysis tools, and/or data
mining tools.
Fig: Three-Tier Data Warehouse Architecture
 There are three data warehouse models: the enterprise
warehouse, the data mart, and the virtual warehouse.

1. Enterprise Warehouse: Collects all of the information
about subjects spanning the entire organization.

2. Data Mart: A subset of corporate-wide data that is of
value to a specific group of users.

3. Virtual Warehouse: A set of views over operational
databases.
Stages shown in the figure: define a high-level corporate data model; through
model refinement, build data marts and an enterprise data warehouse; these
lead to distributed data marts and a multi-tier data warehouse.

Fig: Data Warehouse Development: A Recommended Approach
3. Data Warehouse Back-End Tools and Utilities
1. Data Extraction: Get data from multiple, heterogeneous,
and external sources
2. Data Cleaning: Detect errors in the data and rectify them
when possible
3. Data Transformation: Convert data from legacy or host
format to warehouse format
4. Load Data : Sort, summarize, consolidate, compute
views, check integrity, and build indices and partitions
5. Refresh the Data: Propagate the updates from the data
sources to the warehouse
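
The five back-end steps can be lined up as a minimal in-memory pipeline. This is only a sketch of the flow: the function names, the `region` and `amount` fields, and the sample records are all assumptions, and a real ETL tool does far more at each step.

```python
def extract(sources):
    """Extraction: gather raw records from several (here in-memory) sources."""
    return [record for source in sources for record in source]


def clean(records):
    """Cleaning: drop records whose amount is missing or not numeric."""
    cleaned = []
    for record in records:
        try:
            cleaned.append({**record, "amount": float(record["amount"])})
        except (KeyError, TypeError, ValueError):
            continue
    return cleaned


def transform(records):
    """Transformation: convert to the warehouse format (upper-case region codes)."""
    return [
        {"region": r["region"].strip().upper(), "amount": r["amount"]}
        for r in records
    ]


def load(warehouse, records):
    """Load: sort, summarize by region, and store the totals."""
    for record in sorted(records, key=lambda r: r["region"]):
        region = record["region"]
        warehouse[region] = warehouse.get(region, 0.0) + record["amount"]
    return warehouse


def refresh(warehouse, new_sources):
    """Refresh: propagate new source data through the same steps."""
    return load(warehouse, transform(clean(extract(new_sources))))


sources = [
    [{"region": " east", "amount": "10.5"}, {"region": "west", "amount": None}],
    [{"region": "East", "amount": 4}],
]
warehouse = refresh({}, sources)
warehouse = refresh(warehouse, [[{"region": "west", "amount": "1"}]])
```

The record with a missing amount is rejected during cleaning, and the two spellings of the east region are merged during transformation.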

4. Metadata Repository: Metadata (data about data) is the
data defining warehouse objects. It includes:
 A description of the structure of the data warehouse: schema, views,
dimensions, hierarchies, derived data definitions, data mart
locations and contents.
 Operational metadata: data lineage, currency of data, monitoring
information.
 The algorithms used for summarization and the mapping from the
operational environment to the data warehouse.

5. OLAP Server Architectures: The types of OLAP server are:

1. ROLAP: Relational Online Analytical Processing
2. MOLAP: Multidimensional Online Analytical Processing
3. HOLAP: Hybrid Online Analytical Processing
(1) Relational OLAP (ROLAP)
 Special schema designs: star, snowflake
 Special indexes: bitmap, multi-table join
 Proven technology based on the relational model and DBMS; tends to
outperform specialized MDDBs, especially on large data sets.
 Uses a relational or extended-relational DBMS to store and manage
warehouse data, together with OLAP middleware.

 Examples are
 Telecommunication startup: call data records (CDRs)
 E-commerce site
 Credit card company
 Products are
 IBM DB2, Oracle, Sybase IQ, RedBrick, Informix
 Relational and specialized relational DBMS
 OLAP middleware to support missing pieces
(2) Multidimensional OLAP (MOLAP)
Array-based storage structures
Direct access to array data structures
Fast indexing to pre-computed summarized data
Facts stored in multi-dimensional arrays
Dimensions used to index array

Examples are
Budgeting in a financial department and
Sales analysis.

Products are Pilot, Arbor Essbase, Gentia.
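
The array-based storage idea can be sketched with nested Python lists: the dimension members are turned into array positions and the facts live in the cells. The member lists and the sample facts are invented, and the dense list-of-lists stands in for whatever array structure a real MOLAP server uses.

```python
# Dimension members map to array positions (hypothetical members).
QUARTERS = ["Q1", "Q2"]
ITEMS = ["TV", "PC", "Phone"]
q_index = {q: i for i, q in enumerate(QUARTERS)}
i_index = {item: i for i, item in enumerate(ITEMS)}

# Facts stored in a dense 2-D array: cells[quarter][item] = units_sold.
cells = [[0] * len(ITEMS) for _ in QUARTERS]


def store(quarter, item, units):
    """Add a fact to the cell addressed by its dimension members."""
    cells[q_index[quarter]][i_index[item]] += units


def lookup(quarter, item):
    """Direct array access: the dimensions index the array, no join or scan."""
    return cells[q_index[quarter]][i_index[item]]


for fact in [("Q1", "TV", 10), ("Q1", "PC", 6), ("Q2", "TV", 8), ("Q1", "TV", 2)]:
    store(*fact)

# Pre-computed summaries: one total per quarter and one per item.
quarter_totals = [sum(row) for row in cells]
item_totals = [sum(column) for column in zip(*cells)]
```

Note that the cells for Phone are allocated even though no Phone was sold; this is the sparse-versus-dense trade-off of array storage.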

(3) Hybrid OLAP (HOLAP)
Storing detailed data in an RDBMS
Storing aggregated data in an MDBMS
User access via MOLAP tools
Flexibility, e.g., low level: relational, high level: array

• Examples are
 Sales department of a multi-national company
 Banks and financial service providers

• Products / Tools are

ORACLE 8i, 10g and 11g
ORACLE Express Server
ORACLE Relational Access Manager
ORACLE Express Clients
Bi Units F
53 pages
Data Warehousing and OLAP
No ratings yet
Data Warehousing and OLAP
47 pages
Unit 1
No ratings yet
Unit 1
22 pages
Data Warehousing and Mining Guide
No ratings yet
Data Warehousing and Mining Guide
46 pages
Data Warehouse
No ratings yet
Data Warehouse
4 pages
Higher Education Data Template
No ratings yet
Higher Education Data Template
3 pages
DMDW1
No ratings yet
DMDW1
13 pages
DBMS II Seven 7
No ratings yet
DBMS II Seven 7
13 pages
9 DMW Olap PPT 11.2
No ratings yet
9 DMW Olap PPT 11.2
12 pages
Advanced Database Presentation
No ratings yet
Advanced Database Presentation
11 pages
Data Warehousing AND Data Mining
No ratings yet
Data Warehousing AND Data Mining
51 pages
Bookkeeping (Second Part)
100% (3)
Bookkeeping (Second Part)
38 pages
Data Warehouse Components
No ratings yet
Data Warehouse Components
26 pages
Data Warehouse OLAP OLTP
No ratings yet
Data Warehouse OLAP OLTP
12 pages
R16 4-2 DataMining Notes UNIT-I
No ratings yet
R16 4-2 DataMining Notes UNIT-I
31 pages
HY Syllabus Class 12 - 2024-25
No ratings yet
HY Syllabus Class 12 - 2024-25
4 pages
Advance Concept in Data Bases Unit-5 by Arun Pratap Singh
100% (1)
Advance Concept in Data Bases Unit-5 by Arun Pratap Singh
82 pages
Offer Letter 2024-10-15
No ratings yet
Offer Letter 2024-10-15
10 pages
Unit 2 Data Warehousing and OLAP
No ratings yet
Unit 2 Data Warehousing and OLAP
72 pages
Data Warehousing Guide for IT Students
No ratings yet
Data Warehousing Guide for IT Students
77 pages
Industrial IoT Insights & Use Cases
100% (1)
Industrial IoT Insights & Use Cases
6 pages
DM Part 2
No ratings yet
DM Part 2
24 pages
Data Warehousing and On-Line Analytical Processing
No ratings yet
Data Warehousing and On-Line Analytical Processing
40 pages
Entrepenureship Data Template
No ratings yet
Entrepenureship Data Template
3 pages
Labour Welfare Scheme
No ratings yet
Labour Welfare Scheme
20 pages
GMP Dolphin-G Series
0% (1)
GMP Dolphin-G Series
1 page
IIOT Slideshare
No ratings yet
IIOT Slideshare
12 pages
Iot - Mid-Ii - Bit Bank
No ratings yet
Iot - Mid-Ii - Bit Bank
21 pages
IoT Applications & Examples Guide
No ratings yet
IoT Applications & Examples Guide
23 pages
Higher Education Proofs 2020-2021
No ratings yet
Higher Education Proofs 2020-2021
32 pages
DWDM - Unit - VIII
No ratings yet
DWDM - Unit - VIII
32 pages
Cse Nirf 2024
No ratings yet
Cse Nirf 2024
57 pages
Reference Manual
No ratings yet
Reference Manual
77 pages
Data Warehouse Fundamentals
No ratings yet
Data Warehouse Fundamentals
30 pages
Rotary Valve Fast Cycle Pressure Swing Adsorption Paper
No ratings yet
Rotary Valve Fast Cycle Pressure Swing Adsorption Paper
14 pages
Project Report For ME
No ratings yet
Project Report For ME
49 pages
RA100Z - Manual - I56-0508 - Indicador Visual
No ratings yet
RA100Z - Manual - I56-0508 - Indicador Visual
2 pages
SDS - Barrier 90 - Comp. B - Marine - Protective - English (Uk) - Australia - 2524 - 30.10.2012
No ratings yet
SDS - Barrier 90 - Comp. B - Marine - Protective - English (Uk) - Australia - 2524 - 30.10.2012
7 pages
Presented By: Nosheen Mehfooz M.Awais Anum Aziz M.Shayan S. Hammad S. Rameez Khalid
No ratings yet
Presented By: Nosheen Mehfooz M.Awais Anum Aziz M.Shayan S. Hammad S. Rameez Khalid
19 pages
LED LIGHTING Research Report Abstract
0% (1)
LED LIGHTING Research Report Abstract
14 pages
Corporate Banking Analysis Guide
No ratings yet
Corporate Banking Analysis Guide
38 pages
Industril Iot Unit Test 1
No ratings yet
Industril Iot Unit Test 1
1 page
Industrial IoT Laboratory
No ratings yet
Industrial IoT Laboratory
1 page
DeltaX - Product Analyst - Job Description - Campus Hiring 2025
No ratings yet
DeltaX - Product Analyst - Job Description - Campus Hiring 2025
3 pages
Privacy Information For Installation Features Windows 7 Privacy Statement For Installation Features
No ratings yet
Privacy Information For Installation Features Windows 7 Privacy Statement For Installation Features
13 pages
Volume 3 ENG
0% (1)
Volume 3 ENG
475 pages
Flexitallic Flexpro Brochure 11-30-2017
No ratings yet
Flexitallic Flexpro Brochure 11-30-2017
8 pages
En Brochure Digital Use
No ratings yet
En Brochure Digital Use
4 pages
US NAVY Aeromedical Reference and Waiver Guide-2014
No ratings yet
US NAVY Aeromedical Reference and Waiver Guide-2014
317 pages
Property Dispute: No Forgery Found
No ratings yet
Property Dispute: No Forgery Found
1 page
Blume Expando T
No ratings yet
Blume Expando T
24 pages
Basic Conducting Online Lesson Plan 3 31
No ratings yet
Basic Conducting Online Lesson Plan 3 31
1 page
Victoria Adaugo Onyekwere - 8109678605 - 20250102202313
No ratings yet
Victoria Adaugo Onyekwere - 8109678605 - 20250102202313
43 pages
Visualizing Association Rules: Introduction To The R-Extension Package Arulesviz
No ratings yet
Visualizing Association Rules: Introduction To The R-Extension Package Arulesviz
24 pages
TN206
No ratings yet
TN206
37 pages
Review Of: Generated On 2022-12-20
No ratings yet
Review Of: Generated On 2022-12-20
21 pages
Acquiring Skills in Basketball Through Observational Learning
No ratings yet
Acquiring Skills in Basketball Through Observational Learning
19 pages
Legal Procedures for Suits
No ratings yet
Legal Procedures for Suits
13 pages
D1-211 - 2020 Failure Analysis of 400 KV Insulator
No ratings yet
D1-211 - 2020 Failure Analysis of 400 KV Insulator
12 pages
Affidavit for Name Discrepancy Correction
No ratings yet
Affidavit for Name Discrepancy Correction
5 pages
CaseStudy Ch8 (3) Eng
No ratings yet
CaseStudy Ch8 (3) Eng
2 pages