Data Warehouse Implementation
Madava Viranjan
You will learn
• Cube operations
• Materialization
• Bitmap Indexing
• Join Index
• ROLAP, MOLAP, HOLAP Server Architectures
Data Warehouse Implementation
• A data warehouse contains huge volumes of data, yet OLAP servers should be able to answer OLAP queries in seconds.
• Pre-computing all or part of a data cube can greatly improve query performance.
• This is a challenging task, as it requires substantial computation time and storage space.
Data Warehouse Implementation contd.
• What is the purpose of GROUP BY in SQL?
• Queries
– Compute the sum of sales, grouped by city and item
– Compute the sum of sales, grouped by city
• What is the total number of cuboids, or group-bys?
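The two queries above can be sketched in plain Python to show what a group-by does. The sales rows, the function name sum_group_by, and the choice of three dimensions (city, item, year) are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical sales rows: (city, item, year, amount)
SALES = [
    ("Colombo", "Computer", 2018, 100),
    ("Colombo", "Phone", 2018, 50),
    ("Gampaha", "Computer", 2018, 70),
    ("Colombo", "Computer", 2019, 30),
]

def sum_group_by(rows, key_positions):
    """Sum the amount (last field) for each distinct combination of key fields."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in key_positions)
        totals[key] += row[-1]
    return dict(totals)

by_city_item = sum_group_by(SALES, (0, 1))  # GROUP BY city, item
by_city = sum_group_by(SALES, (0,))         # GROUP BY city

# With n dimensions and no concept hierarchies, every subset of the
# dimensions is one group-by, so there are 2**n cuboids.
n_dimensions = 3  # city, item, year (assumed)
total_cuboids = 2 ** n_dimensions
```

Each group-by collapses the rows that share the same key values into one summary row, and each distinct choice of key columns corresponds to one cuboid.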
The compute cube Operator
• This operator computes aggregations over all subsets of the dimensions
specified in the operation
– compute cube sales_cube
• Proposed and studied by Gray et al.
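A minimal sketch of what such an operator computes: one aggregation per subset of the dimensions. This is a naive illustration, not the method of Gray et al.; the sample rows and the function name compute_cube are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("city", "item", "year")

# Hypothetical fact rows
SALES = [
    {"city": "Colombo", "item": "Computer", "year": 2018, "sales": 100},
    {"city": "Colombo", "item": "Phone", "year": 2018, "sales": 50},
    {"city": "Gampaha", "item": "Computer", "year": 2018, "sales": 70},
    {"city": "Colombo", "item": "Computer", "year": 2019, "sales": 30},
]

def compute_cube(rows, dimensions, measure):
    """Return {subset of dimensions: {group key: summed measure}}."""
    cube = {}
    for k in range(len(dimensions) + 1):
        for dims in combinations(dimensions, k):
            cuboid = defaultdict(int)
            for row in rows:
                cuboid[tuple(row[d] for d in dims)] += row[measure]
            cube[dims] = dict(cuboid)
    return cube

sales_cube = compute_cube(SALES, DIMENSIONS, "sales")
```

The empty subset () is the apex cuboid (the grand total), and the full subset (city, item, year) is the base cuboid.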
The Curse of Dimensionality
• Fast query response may require pre-computing most, if not all, cuboids.
• Storage is the main issue: the number of cuboids grows exponentially with the number of dimensions.
• When dimensions have concept hierarchies, things get worse.
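The growth can be made concrete with the standard counting formula: if dimension i has L_i hierarchy levels (not counting the virtual top level "all"), the total number of cuboids is the product of (L_i + 1) over all dimensions. The function name below is an assumption for illustration.

```python
from math import prod

def total_cuboids(levels_per_dimension):
    """levels_per_dimension[i] is the number of hierarchy levels of
    dimension i, excluding the virtual top level 'all'."""
    return prod(levels + 1 for levels in levels_per_dimension)

# Three dimensions without hierarchies: one level each, so 2**3 cuboids.
no_hierarchies = total_cuboids([1, 1, 1])

# Ten dimensions with four levels each (five including 'all'): 5**10 cuboids.
ten_dimensions = total_cuboids([4] * 10)
```

Ten dimensions with five levels each already give close to ten million cuboids, which is why materializing everything is rarely practical.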
Cube Materialization
No Materialization
• Do not precompute any of the non-base cuboids
Full Materialization
• Precompute all cuboids
Partial Materialization
• Selectively compute a proper subset of all possible cuboids
Full Cube
• Base cell: a cell in the base cuboid
– {Colombo, Computer, 2018, 15000}
• Aggregate cell: a cell in a non-base cuboid
– {Colombo, *, 2018, 150000}
• A full cube computes all the cells of all the cuboids
Full Cube Contd.
• Multidimensional array-based cube construction is used.
• Gives the best query-processing performance, since every cell is precomputed.
• Many cells may be of little or no interest in query processing.
Iceberg Cube
• Partially materialized cube
• A minimum support threshold defines which cells are included.
compute cube sales_iceberg as
select month, city, customer_group, count(*)
from salesInfo
cube by month, city, customer_group
having count(*) >= min_sup
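The effect of the having clause can be sketched in Python, assuming it means "keep a cell only when its count(*) is at least min_sup". This naive version counts every cell and then filters; practical iceberg algorithms prune while computing instead. The sample rows and the function name are hypothetical.

```python
from collections import Counter
from itertools import combinations

DIMENSIONS = ("month", "city", "customer_group")

# Hypothetical salesInfo rows: (month, city, customer_group)
SALES_INFO = [
    ("Jan", "Colombo", "Business"),
    ("Jan", "Colombo", "Business"),
    ("Jan", "Gampaha", "Home"),
    ("Feb", "Colombo", "Business"),
]

def iceberg_cube(rows, n_dims, min_sup):
    """Count every cell of every cuboid, then keep cells with count >= min_sup.
    '*' marks a dimension that has been aggregated away."""
    counts = Counter()
    for k in range(n_dims + 1):
        for kept in combinations(range(n_dims), k):
            for row in rows:
                cell = tuple(row[i] if i in kept else "*" for i in range(n_dims))
                counts[cell] += 1
    return {cell: c for cell, c in counts.items() if c >= min_sup}

cells = iceberg_cube(SALES_INFO, len(DIMENSIONS), min_sup=3)
```

With min_sup set to 3, only five of the cells survive in this toy data set; the base cells all fall below the threshold and are not stored.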
Bitmap Indexing
• Each record is identified by its record ID (RID).
• A distinct bit vector is defined for each value in the attribute's domain.
• If a row holds a given value for the attribute, the bit representing that value is set to 1 for that row; the bits for all other values are 0.
Bitmap Indexing
Base table
RID   Item        City
R1    Computer    Colombo
R2    Phone       Colombo
R3    Computer    Gampaha
R4    Home Ent.   Colombo
R5    Computer    Colombo
R6    Phone       Gampaha
Item bitmap index table (H = Home Ent., C = Computer, P = Phone)
RID   H   C   P   S
R1    0   1   0   0
R2    0   0   1   0
R3    0   1   0   0
R4    1   0   0   0
R5    0   1   0   0
R6    0   0   1   0
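The item bitmap index can be derived from the base table with a few lines of Python. This is a sketch for illustration: the function name is hypothetical, and it only creates bit vectors for values that actually occur in the rows, so a domain value that no row holds would need to be added separately as an all-zero vector.

```python
# Base table rows from the slide: (RID, item, city)
BASE_TABLE = [
    ("R1", "Computer", "Colombo"),
    ("R2", "Phone", "Colombo"),
    ("R3", "Computer", "Gampaha"),
    ("R4", "Home Ent.", "Colombo"),
    ("R5", "Computer", "Colombo"),
    ("R6", "Phone", "Gampaha"),
]

def build_bitmap_index(rows, column):
    """Return {value: list of bits}; bit i is 1 when row i holds that value."""
    values = sorted({row[column] for row in rows})
    return {
        value: [1 if row[column] == value else 0 for row in rows]
        for value in values
    }

item_index = build_bitmap_index(BASE_TABLE, 1)
city_index = build_bitmap_index(BASE_TABLE, 2)
```

Reading item_index column by column reproduces the H, C and P columns of the item bitmap index table.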
Bitmap Indexing
• Comparison, join, and aggregation operations are reduced to bit arithmetic.
• Storage space is saved, because a character string can be represented by a single bit.
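As a sketch of the bit arithmetic, the selection "item is Computer and city is Colombo" becomes a single bitwise AND when each bit vector is packed into an integer. The packing scheme (bit i stands for row i) and the helper name are assumptions made for this example.

```python
# Item and city columns of the base table, in row order R1..R6
ITEMS = ["Computer", "Phone", "Computer", "Home Ent.", "Computer", "Phone"]
CITIES = ["Colombo", "Colombo", "Gampaha", "Colombo", "Colombo", "Gampaha"]

def bitmap(values, target):
    """Pack one bit per row into an int; bit i is set when row i equals target."""
    bits = 0
    for i, value in enumerate(values):
        if value == target:
            bits |= 1 << i
    return bits

computer = bitmap(ITEMS, "Computer")
colombo = bitmap(CITIES, "Colombo")

# Selection on two attributes is one bitwise AND.
both = computer & colombo

# Recover the matching record IDs from the set bits.
matching_rids = [f"R{i + 1}" for i in range(len(ITEMS)) if (both >> i) & 1]

# count(*) is the number of set bits.
match_count = bin(both).count("1")
```

No row of the base table is read to answer the query; only the two bit vectors are combined.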
Join Indexing
• A join index registers the joinable rows of two relations in a relational database.
• Star schemas benefit significantly, because the fact table links to each dimension table through a foreign key.
Join Indexing
Join index table for location and sales
Location      SalesKey
……            ……
Main Street   T57
Main Street   T238
Main Street   T884
……            ……
[Figure: star schema, fact and dimensions]
Join Indexing
Join index table for item and sales
Item      SalesKey
……        ……
Sony-TV   T57
Sony-TV   T459
……        ……
[Figure: star schema, fact and dimensions]
Join Indexing
Join index table for location and item to sales
Item      Location      SalesKey
……        ……            ……
Sony-TV   Main Street   T57
……        ……            ……
[Figure: star schema, fact and dimensions]
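The three join index tables can be sketched as precomputed lookups from dimension values to sales keys. The surrogate keys (L1, I1, ...), the extra values "Lake Road" and "Phone", and the function name are hypothetical; only the Main Street and Sony-TV pairings follow the slides.

```python
from collections import defaultdict

# Hypothetical dimension tables: surrogate key -> value
LOCATION = {"L1": "Main Street", "L2": "Lake Road"}
ITEM = {"I1": "Sony-TV", "I2": "Phone"}

# Hypothetical fact rows referencing the dimensions by foreign key
SALES = [
    {"sales_key": "T57", "location_key": "L1", "item_key": "I1"},
    {"sales_key": "T238", "location_key": "L1", "item_key": "I2"},
    {"sales_key": "T459", "location_key": "L2", "item_key": "I1"},
    {"sales_key": "T884", "location_key": "L1", "item_key": "I2"},
]

def build_join_index(facts, foreign_key, dimension):
    """Map each dimension value to the sales keys of the fact rows it joins with."""
    index = defaultdict(list)
    for fact in facts:
        index[dimension[fact[foreign_key]]].append(fact["sales_key"])
    return dict(index)

location_sales = build_join_index(SALES, "location_key", LOCATION)
item_sales = build_join_index(SALES, "item_key", ITEM)

# Composite join index: (item, location) -> sales keys
item_location_sales = defaultdict(list)
for fact in SALES:
    key = (ITEM[fact["item_key"]], LOCATION[fact["location_key"]])
    item_location_sales[key].append(fact["sales_key"])
```

Because the joinable pairs are stored ahead of time, a query for Sony-TV sales on Main Street reads one index entry instead of joining the tables.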
OLAP Server Architectures
• Business users want to see data in a multidimensional way, regardless of how it is stored.
• The physical implementation needs to consider data storage issues.
• Three types of OLAP server
– Relational OLAP (ROLAP) Servers
– Multidimensional OLAP (MOLAP) Servers
– Hybrid OLAP (HOLAP) Servers
ROLAP Servers
• Intermediate servers that sit between relational back-end servers and client front-end tools
• Use a relational or extended-relational DBMS to store and manage the warehouse data, and OLAP middleware for the rest
• Greater scalability
• Decision support systems mostly use ROLAP servers
MOLAP Servers
• Support multidimensional data views
• Array-based multidimensional storage engines
• Faster computation
HOLAP Servers
• Combines ROLAP and MOLAP architectures
• Large volumes of detailed data are stored in a relational database, while aggregations are kept in a separate MOLAP store
• Greater scalability from ROLAP and faster computation from MOLAP
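The HOLAP split can be sketched as a toy in-memory store: summary queries are answered from precomputed aggregates (the MOLAP side), while detail queries scan the stored rows (the ROLAP side). The class, its method names, and the sample rows are all hypothetical and stand in for real server components.

```python
# Hypothetical detail rows: (city, item, sales)
DETAIL_ROWS = [
    ("Colombo", "Computer", 100),
    ("Colombo", "Phone", 50),
    ("Gampaha", "Computer", 70),
]

class HolapStore:
    """Toy model: detail rows play the relational store,
    a precomputed dict plays the multidimensional aggregate store."""

    def __init__(self, rows):
        self.rows = list(rows)
        # Precompute sales totals per city once, up front.
        self.total_by_city = {}
        for city, _item, amount in self.rows:
            self.total_by_city[city] = self.total_by_city.get(city, 0) + amount

    def total_for_city(self, city):
        # Summary query: answered from the precomputed aggregates.
        return self.total_by_city.get(city, 0)

    def detail_for(self, city, item):
        # Detail query: falls through to the stored rows.
        return [r for r in self.rows if r[0] == city and r[1] == item]

store = HolapStore(DETAIL_ROWS)
colombo_total = store.total_for_city("Colombo")
phone_rows = store.detail_for("Colombo", "Phone")
```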