CS685: Data Mining

Data Warehousing

Arnab Bhattacharya
[email protected]

Computer Science and Engineering,
Indian Institute of Technology, Kanpur
http://web.cse.iitk.ac.in/~cs685/

1st semester, 2018-19


Mon, Thu 1030-1145 at RM101

Arnab Bhattacharya ([email protected]) CS685: Warehousing 2018-19 1/7


Data Warehousing

A data warehouse is a data storage system, usually separate from the original database
It has four important features
1. Subject-oriented: It is modeled around subjects, e.g., sales, customers, etc.
2. Integrated: It organizes information from multiple sources into a single store
3. Time-variant: It stores information across different time points
4. Non-volatile: It stores data permanently and requires only two operations, construction and access
A data warehouse is a semantically consistent data store that serves as a physical implementation of a decision support model
Data warehousing is the process of constructing and using data warehouses
Data Warehouse Model

A data warehouse is modeled as a multidimensional data model or data cube
Dimensions of a data cube are the attributes important for the analysis
Each dimension has a corresponding dimension table that stores metadata about the dimension
Numeric values about the subject of the data warehouse are facts
The fact table stores these facts

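[Figure: a three-dimensional data cube with axes labeled E, A, P; C1, C2, C3, C4; and S1, S2, S3. Visible cells hold numeric facts such as 90, 75, 95, 80, 45, 60 and 60.]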
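The split between dimension tables and a fact table can be illustrated with a minimal sketch. All table contents, keys and the helper name `sales_by_category` are hypothetical and are not part of the lecture; the sketch only shows that facts carry keys into dimension tables, whose metadata can then drive an aggregation.

```python
from collections import defaultdict

# Hypothetical dimension tables: key -> metadata about the dimension value.
ITEM_DIM = {
    1: {"name": "pen", "category": "stationery"},
    2: {"name": "ink", "category": "stationery"},
    3: {"name": "mug", "category": "kitchen"},
}
LOC_DIM = {
    10: {"city": "Delhi"},
    11: {"city": "Kanpur"},
}

# Hypothetical fact table: (item key, location key, sales measure).
FACT_TABLE = [
    (1, 10, 10),
    (2, 10, 3),
    (3, 11, 6),
    (1, 11, 5),
]


def sales_by_category(facts, item_dim):
    """Sum the sales measure per item category via the item dimension table."""
    totals = defaultdict(int)
    for item_id, _loc_id, sales in facts:
        totals[item_dim[item_id]["category"]] += sales
    return dict(totals)


category_totals = sales_by_category(FACT_TABLE, ITEM_DIM)
```

The fact table stays narrow (keys plus numeric measures), while descriptive attributes such as the category live only in the dimension table.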

Cuboids

Any subset of the dimensions of a data cube defines a cuboid
It is essentially the result of a “group by” operator on those dimensions
All cuboids together form a lattice of cuboids
Base cuboid: no summarization, at level n-D (all n dimensions)
Apex cuboid: full summarization, at level 0-D
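A small sketch of a cuboid as the result of a group by, assuming a hypothetical fact table with dimensions item, time and loc and a single sales measure that is summed. The data and the function name `cuboid` are illustrative only.

```python
from collections import defaultdict

DIMENSIONS = ("item", "time", "loc")

# Hypothetical fact table: (item, time, loc, sales).
FACTS = [
    ("pen", "Q1", "Delhi", 10),
    ("pen", "Q1", "Kanpur", 5),
    ("pen", "Q2", "Delhi", 7),
    ("ink", "Q1", "Delhi", 3),
    ("ink", "Q2", "Kanpur", 4),
]


def cuboid(facts, group_by):
    """Sum the sales measure grouped by the dimensions named in group_by."""
    idx = [DIMENSIONS.index(d) for d in group_by]
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]
    return dict(totals)


base = cuboid(FACTS, DIMENSIONS)    # base cuboid, level 3-D: no summarization
by_item = cuboid(FACTS, ("item",))  # a 1-D cuboid
apex = cuboid(FACTS, ())            # apex cuboid, level 0-D: one grand total
```

Here `by_item` holds one total per item, and `apex` holds a single cell keyed by the empty tuple.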

Cube Operations

The compute cube operator computes aggregations over all subsets of the specified dimensions
For example, specifying the dimensions as item, time and loc, the cuboids computed are (item, time, loc), (item, time), (time, loc), (loc, item), (item), (time), (loc) and ()
Total of 2^n cuboids for n dimensions
() denotes the empty group by, i.e., no dimension is grouped and all the data is aggregated together
Cuboids can be pre-computed and materialized
No materialization: No non-base cuboid is precomputed
Full materialization: The full cube is precomputed
Partial materialization: Some subcubes are precomputed based on usage and storage
Iceberg cube: Only those subcubes whose size (number of tuples) is above a threshold are computed
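The compute cube operator and the iceberg idea can be sketched as follows. This is a naive illustration under stated assumptions, not an efficient cube algorithm: the fact table is hypothetical, every cuboid is computed by a separate scan, and the iceberg filter keeps cells whose tuple count is at least `min_count`, which is one possible reading of "above a threshold". The names `compute_cube` and `iceberg` are made up for this example.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("item", "time", "loc")

# Hypothetical fact table: (item, time, loc, sales).
FACTS = [
    ("pen", "Q1", "Delhi", 10),
    ("pen", "Q1", "Kanpur", 5),
    ("pen", "Q2", "Delhi", 7),
    ("ink", "Q1", "Delhi", 3),
    ("ink", "Q2", "Kanpur", 4),
]


def compute_cube(facts, dimensions):
    """Full materialization: one cuboid per subset of the dimensions.

    Each cell maps a group-by key to (tuple count, sum of sales).
    """
    cube = {}
    for k in range(len(dimensions), -1, -1):
        for group_by in combinations(dimensions, k):
            idx = [dimensions.index(d) for d in group_by]
            cells = defaultdict(lambda: [0, 0])
            for row in facts:
                key = tuple(row[i] for i in idx)
                cells[key][0] += 1        # number of tuples in the cell
                cells[key][1] += row[-1]  # summed measure
            cube[group_by] = {key: tuple(v) for key, v in cells.items()}
    return cube


def iceberg(cube, min_count):
    """Keep only cells backed by at least min_count tuples (assumed threshold rule)."""
    return {
        group_by: {k: v for k, v in cells.items() if v[0] >= min_count}
        for group_by, cells in cube.items()
    }


cube = compute_cube(FACTS, DIMENSIONS)
frequent = iceberg(cube, 3)
```

With three dimensions, `cube` has 2^3 = 8 entries, from the base cuboid `("item", "time", "loc")` down to the apex `()`. A real iceberg cube algorithm would avoid computing the discarded cells in the first place; filtering after full materialization is done here only for brevity.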

OLAP Operations

OLAP stands for online analytical processing
OLTP stands for online transaction processing
Different operations
Roll up (drill up): Summarize by going up a level
Drill down (roll down): Go down a level to more detailed data
Slice: Project operation; fixes a value on only one dimension
Dice: Select operation; on two or more dimensions
Pivot (rotate): Rotate the axes for better or alternate visualization
Drill across: Summarize across different fact tables
Drill through: Access underlying relational data through base cuboids
How is OLAP related to data mining?
It essentially facilitates data analysis by efficiently providing summaries, projections, etc.
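Some of the operations listed above can be sketched on a tiny cube stored as a dictionary of cells. Everything here is an assumption made for illustration: the data, the month-to-quarter hierarchy `QUARTER`, and the function names. Drill down is omitted because it needs the finer-grained data that a roll up discards.

```python
from collections import defaultdict

DIMENSIONS = ("item", "month", "city")

# Hypothetical base cuboid: (item, month, city) -> sales.
CELLS = {
    ("pen", "Jan", "Delhi"): 10,
    ("pen", "Feb", "Delhi"): 5,
    ("pen", "Apr", "Kanpur"): 7,
    ("ink", "Jan", "Delhi"): 3,
    ("ink", "May", "Kanpur"): 4,
}

# Hypothetical concept hierarchy for the time dimension: month -> quarter.
QUARTER = {"Jan": "Q1", "Feb": "Q1", "Apr": "Q2", "May": "Q2"}


def roll_up(cells, dim, hierarchy):
    """Replace each value of dim by its parent in the hierarchy and re-aggregate."""
    i = DIMENSIONS.index(dim)
    out = defaultdict(int)
    for key, value in cells.items():
        out[key[:i] + (hierarchy[key[i]],) + key[i + 1:]] += value
    return dict(out)


def slice_cube(cells, dim, value):
    """Fix one dimension to a single value and drop that dimension from the keys."""
    i = DIMENSIONS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cells.items() if k[i] == value}


def dice(cells, allowed):
    """Keep cells whose values lie in the allowed sets on two or more dimensions."""
    checks = [(DIMENSIONS.index(d), vals) for d, vals in allowed.items()]
    return {k: v for k, v in cells.items() if all(k[i] in vals for i, vals in checks)}


def pivot(cells, order):
    """Reorder the axes of every key; the stored values are unchanged."""
    idx = [DIMENSIONS.index(d) for d in order]
    return {tuple(k[i] for i in idx): v for k, v in cells.items()}


by_quarter = roll_up(CELLS, "month", QUARTER)
delhi_only = slice_cube(CELLS, "city", "Delhi")
pens_in_delhi = dice(CELLS, {"item": {"pen"}, "city": {"Delhi"}})
rotated = pivot(CELLS, ("city", "item", "month"))
```

In `by_quarter`, the January and February pen sales in Delhi merge into a single Q1 cell, which is the summarization that roll up performs.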

OLAP implementation

Different server models to implement OLAP operations
Relational OLAP (ROLAP): Uses a relational database backend
Multidimensional OLAP (MOLAP): Uses multidimensional arrays
Hybrid OLAP (HOLAP): Hybrid system that tries to exploit the scalability of ROLAP at lower levels and the efficiency of MOLAP at higher levels
For data mining, there are OLAM systems
OLAM stands for online analytical mining
It integrates data mining operations directly into OLAP systems
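The ROLAP and MOLAP server models can be contrasted with a toy sketch. It assumes a hypothetical two-dimensional sales data set, uses an in-memory SQLite database from the Python standard library as the relational backend, and uses a nested list as the multidimensional array. Real OLAP servers are far more elaborate; this only shows where the data lives and how one aggregate is answered in each model.

```python
import sqlite3

# Hypothetical facts: (item, quarter, sales amount).
FACTS = [("pen", "Q1", 10), ("pen", "Q2", 7), ("ink", "Q1", 3), ("ink", "Q2", 4)]

# ROLAP style: facts are rows in a relational table, aggregated with GROUP BY.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (item TEXT, quarter TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", FACTS)
rolap_by_item = dict(
    conn.execute("SELECT item, SUM(amount) FROM sales GROUP BY item").fetchall()
)
conn.close()

# MOLAP style: facts sit in a dense array addressed by dimension positions.
ITEMS = ["pen", "ink"]
QUARTERS = ["Q1", "Q2"]
array = [[0] * len(QUARTERS) for _ in ITEMS]
for item, quarter, amount in FACTS:
    array[ITEMS.index(item)][QUARTERS.index(quarter)] = amount

# Aggregating over the quarter axis is a sum along one row of the array.
molap_by_item = {item: sum(array[i]) for i, item in enumerate(ITEMS)}
```

Both models return the same totals per item. The array gives direct positional access but reserves space for every combination of dimension values, while the table stores only the rows that exist.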
