ON-LINE ANALYTICAL PROCESSING
-
Analyzing Data Resources
ADITI PAUL
MCS/08/20
REGISTRATION NO – 003834 OF 2008
POST GRADUATE DEPARTMENT OF COMPUTER
SCIENCE
ST. XAVIER’S COLLEGE (AUTONOMOUS)
WHAT IS OLAP?
Basic idea:
• Quickly answer multi-dimensional analytical queries.
• Convert data into the information that decision makers need.
• It is a continuous, iterative and preferably interactive process.
WHO USES OLAP?
It is used in an organization to carry out different ORGANIZATIONAL FUNCTIONS in:
• Finance departments
• Sales analysis and forecasting
• Marketing departments
» Cardinal Goal
“Provide managers with the information they need to make effective decisions”
Understanding
Online Analytical Processing - OLAP
3 part description
Part 1 – Online
Part 2 – Analytical
Part 3 – Processing
PART 1 – ONLINE
FLASH BACK
Data Stored in a Database
TYPE 1
Operational Data
• Data that “works”.
• Frequent Updates and Queries.
• Normalized for efficient search and updates.
• Fragmented, with local relevance.
• Point queries.
Examples of Operational Data
• Account Details of a Customer in a Bank
• Student Details in a College/School Database
• Employee Records
• Etc.
Example Queries on Operational Data
• What is the salary of Mr. Chatterjee? (point query)
• What is the address and phone number of the person in charge of the hardware department?
• How many students have received a “distinction” credential in the latest exam?
Operational data pertains to what we call
“ONLINE TRANSACTION PROCESSING”
As the name suggests, this sort of data is used for day-to-day ‘operations’ like data entry and retrieval.
For example: an ATM is a commercial online transaction system.
Types of Data in a Database
Type 2
Historical Data
Data that “tells”.
Very Infrequent updates.
Integrated data set with global relevance.
Analytical queries that require huge amounts of
aggregation.
Performance issues mainly in query response time.
Examples of Historical Data
• The last 10 transactions on a customer’s bank account
• Record of sales of a product over the last 15 years in a company’s database
• The profits earned by a company, stored month-wise over a whole fiscal year
Example Queries on Historical Data
• How is the percentage of marks scored by students changing over the years in the college?
• Is there a correlation between the geographical
location of a company unit and excellent
employee appraisals?
• How is the employee attrition changing over
the years across the company?
Historical data pertains to what we call
“Online Analytical Processing”,
where answering a query does not depend on looking at just one part of a tuple.
For example, to find out how employee attrition is changing, we have to compute an aggregate attrition figure and then map it against time. Thus these queries require “analyzing” certain facts and then producing a correct output.
The requirement that these queries be ONLINE means that they need to be answered within an
“ONLINE INTERACTIVE RESPONSE TIME”,
as users are willing to wait only a few seconds.
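The differences between OLTP and OLAP are thus:
• Data: OLTP works on operational data; OLAP works on historical data.
• Updates: frequent in OLTP; very infrequent in OLAP.
• Scope: OLTP data is fragmented with local relevance; OLAP data is integrated with global relevance.
• Queries: OLTP answers point queries; OLAP answers analytical queries that require large amounts of aggregation.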
PART 2 - ANALYTICAL
Analysis of the Data
In order to “analyze” this historical data, it needs to be stored in a formatted and organized manner.
This is accomplished by a Data Warehouse.
• A data warehouse is an infrastructure to manage historical data from various sources.
• It is designed to support OLAP queries that make heavy use of aggregation.
• It is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision-making process.
[Figure: data warehousing schematic diagram]
Dimensions of Data Warehouse Modeling
Measures –
• Key performance indicators that we want to evaluate.
• Typically numerical, including volume, sales and cost.
• A rule of thumb: if a number makes (business) sense when aggregated, then it is a measure.
• Affects what should be stored in the data warehouse.
• Example: aggregate daily volume to month, quarter and year.
Dimensions –
• Categories of data analysis.
• Typical dimensions include product, time and region.
• A rule of thumb: when a report is requested “by” something, that something is usually a dimension.
• Example: in a sales report, view sales by month and by region, so the two dimensions needed are time and region.
Dimensions and measures are physically represented by a STAR SCHEMA.
• The data model followed while handling historical data to populate a data warehouse is a
“MULTIDIMENSIONAL DATA MODEL.”
• One way to look at a multidimensional data model is to view it as a CUBE.
CUBE
• It is a data structure that allows fast analysis of
data.
• It can also be defined as the capability of
manipulating and analyzing data from multiple
perspectives.
BASIC STRUCTURE OF A CUBE
The response time of a multidimensional query still depends on how many cells have to be added on the fly.
• The n-D base cube is called the BASE CUBOID. The topmost 0-D cuboid, which holds the highest level of summarization, is called the APEX CUBOID. The lattice of cuboids forms a data CUBE.
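The lattice can be made concrete by listing every subset of the dimensions. The following is a minimal sketch, assuming the three dimensions Market, Product and Time used later in the slides; the function name `cuboid_lattice` is illustrative only.

```python
from itertools import combinations


def cuboid_lattice(dimensions):
    """Return every cuboid (subset of dimensions), from the apex to the base."""
    lattice = []
    for k in range(len(dimensions) + 1):
        # All cuboids that group by exactly k of the dimensions.
        for group in combinations(dimensions, k):
            lattice.append(group)
    return lattice


dims = ("Market", "Product", "Time")
lattice = cuboid_lattice(dims)
apex, base = lattice[0], lattice[-1]

for cuboid in lattice:
    label = ", ".join(cuboid) if cuboid else "(apex: everything summed)"
    print(f"{len(cuboid)}-D cuboid: {label}")
```

With n dimensions (and no hierarchies within a dimension) the lattice holds 2^n cuboids, which is one reason a fully pre-computed cube grows quickly.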
PART 3 - PROCESSING
PROCESSING DATA TO INFORMATION
Now that we have the required data in the requisite form, how do we get the desired output for a query that requires analyzing the data?
This is accomplished by
• OLAP Operations
• OLAP Functions
• SQL Extensions for OLAP
OLAP OPERATIONS
• Dimension Tables
  • Market (Market_ID, City, Region)
  • Product (Product_ID, Name, Category, Price)
  • Time (Time_ID, Week, Month, Quarter)
• Fact Table
  • Sales (Market_ID, Product_ID, Time_ID, Amount)
OLAP OPERATIONS
1. Aggregation – doing the ‘total’ of a measure
over one or more dimensions.
Market
MARKET_ID  CITY     REGION
M1         KOLKATA  EAST
M2         KOLKATA  EAST
M3         DELHI    NORTH
Time
TIME_ID  WEEK  MONTH  QUARTER
T1       1     JAN    1
T2       1     JUNE   2
Product
PRODUCT_ID  NAME    CATEGORY  PRICE
P1          FROOTI  DRINKS    20
P2          LIMCA   DRINKS    30
Sales
MARKET_ID  PRODUCT_ID  TIME_ID  AMOUNT
M1         P1          T1       250
M1         P1          T2       350
M2         P2          T1       400
M2         P2          T2       200
M3         P1          T1       100
M3         P2          T1       50
QUERY: Find the total sales (over time) of each product in each market.
SELECT Market_ID, Product_ID, SUM(Amount)
FROM Sales
GROUP BY Market_ID, Product_ID;
Result
MARKET_ID  PRODUCT_ID  SUM(AMOUNT)
M1         P1          600
M2         P2          600
M3         P1          100
M3         P2          50
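The aggregation query can be tried out directly. This is a small sketch that loads the six Sales rows from the slide into an in-memory SQLite database; the choice of SQLite and the Python wrapper are assumptions made only to keep the example runnable.

```python
import sqlite3

# Sales fact table from the slide, loaded into an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Sales (Market_ID TEXT, Product_ID TEXT, Time_ID TEXT, Amount INTEGER)"
)
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [
        ("M1", "P1", "T1", 250),
        ("M1", "P1", "T2", 350),
        ("M2", "P2", "T1", 400),
        ("M2", "P2", "T2", 200),
        ("M3", "P1", "T1", 100),
        ("M3", "P2", "T1", 50),
    ],
)

# Aggregate the Amount measure over the Time dimension:
# one total per (market, product) pair.
totals = conn.execute(
    """
    SELECT Market_ID, Product_ID, SUM(Amount)
    FROM Sales
    GROUP BY Market_ID, Product_ID
    ORDER BY Market_ID, Product_ID
    """
).fetchall()

for row in totals:
    print(row)
```

The Time dimension disappears from the result because it is not named in GROUP BY; every remaining (market, product) pair gets one summed row.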
OLAP OPERATIONS
2. ROLL UP
Grouping on one dimension in which we go from a lower level of aggregation to a higher one.
Example:
“ROLL UP sales on MARKET from CITY to REGION”
First, the total sales of each product in each city are computed.
Then each city total is mapped to the region the city belongs to, and the totals are summed to give the sales in that region.
SELECT S.Product_ID, M.City, SUM(S.Amount) AS Amount
INTO City_Sales
FROM Sales S, Market M
WHERE M.Market_ID = S.Market_ID
GROUP BY S.Product_ID, M.City
SELECT T.Product_ID, M.Region, SUM(T.Amount)
FROM City_Sales T,
     (SELECT DISTINCT City, Region FROM Market) M
WHERE T.City = M.City
GROUP BY T.Product_ID, M.Region
(Market lists KOLKATA under two Market_IDs, so the city-to-region mapping is taken with DISTINCT; otherwise each Kolkata total would be counted twice.)
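As a rough illustration, the same roll-up can be run against the sample Market and Sales rows. The sketch below is not the two-step City_Sales approach of the slide: it joins Sales to Market once and groups at whichever level is requested. The helper name `sales_by` is hypothetical and SQLite is assumed only for convenience.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Market (Market_ID TEXT, City TEXT, Region TEXT);
    CREATE TABLE Sales (Market_ID TEXT, Product_ID TEXT, Time_ID TEXT, Amount INTEGER);
    INSERT INTO Market VALUES
        ('M1', 'KOLKATA', 'EAST'), ('M2', 'KOLKATA', 'EAST'), ('M3', 'DELHI', 'NORTH');
    INSERT INTO Sales VALUES
        ('M1', 'P1', 'T1', 250), ('M1', 'P1', 'T2', 350),
        ('M2', 'P2', 'T1', 400), ('M2', 'P2', 'T2', 200),
        ('M3', 'P1', 'T1', 100), ('M3', 'P2', 'T1', 50);
    """
)


def sales_by(level):
    """Total sales per product at one level of the Market hierarchy."""
    if level not in ("City", "Region"):
        raise ValueError("level must be 'City' or 'Region'")
    # Market_ID is unique in Market, so this join never duplicates a sale.
    query = f"""
        SELECT S.Product_ID, M.{level}, SUM(S.Amount)
        FROM Sales S JOIN Market M ON M.Market_ID = S.Market_ID
        GROUP BY S.Product_ID, M.{level}
        ORDER BY S.Product_ID, M.{level}
    """
    return conn.execute(query).fetchall()


city_sales = sales_by("City")      # lower level of aggregation
region_sales = sales_by("Region")  # rolled up to the higher level
print(city_sales)
print(region_sales)
```

In this tiny data set each region contains a single city, so the city and region totals coincide; with more cities per region the region rows would be sums of several city rows.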
OLAP OPERATIONS
3. DRILL DOWN
• Finer-grained view of aggregated data, i.e. going from a higher to a lower level of aggregation.
• Converse of roll-up.
• E.g. disaggregate country sales by region/city.
OLAP OPERATIONS
4. PIVOTING
Select a different dimension (orientation) for analysis.
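A pivot can be sketched as re-tabulating the same measure with a different pair of dimensions on the rows and columns. The example below is a plain-Python illustration over the sample Sales rows; the function `pivot` and the index constants are hypothetical names introduced here.

```python
from collections import defaultdict

# (Market_ID, Product_ID, Time_ID, Amount) rows from the Sales table in the slides.
SALES = [
    ("M1", "P1", "T1", 250),
    ("M1", "P1", "T2", 350),
    ("M2", "P2", "T1", 400),
    ("M2", "P2", "T2", 200),
    ("M3", "P1", "T1", 100),
    ("M3", "P2", "T1", 50),
]
MARKET, PRODUCT, TIME, AMOUNT = range(4)


def pivot(rows, row_dim, col_dim):
    """Cross-tabulate total Amount with row_dim down the side and col_dim across."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[row_dim]][row[col_dim]] += row[AMOUNT]
    return {r: dict(cols) for r, cols in table.items()}


# Same data, two orientations: product by market, then product by time.
by_product_market = pivot(SALES, PRODUCT, MARKET)
by_product_time = pivot(SALES, PRODUCT, TIME)
print(by_product_market)
print(by_product_time)
```

Nothing is recomputed from a different source; only the choice of which dimensions form the axes changes.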
OLAP OPERATIONS
5. SLICE and DICE
Slicing: selection on one or more dimensions.
Example: “Choosing sales only in week 12”
Slicing the data cube in the Time dimension:
SELECT S.*
FROM Sales S, Time T
WHERE T.Time_ID = S.Time_ID
AND T.Week = 12
OLAP OPERATIONS
Dicing: a range selection in a hypercube. Partition or group on one or more dimensions.
Example:
“Total sales for each product in each quarter”
Dicing sales in the Time dimension:
SELECT S.Product_ID, T.Quarter, SUM(S.Amount)
FROM Sales S, Time T
WHERE T.Time_ID = S.Time_ID
GROUP BY T.Quarter, S.Product_ID
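Both operations can be tried on the sample data. In this sketch the Time dimension table is named Time_Dim (an assumed name, chosen to avoid clashing with date/time keywords), and the slice selects quarter 1 rather than week 12 because the sample Time rows contain only week 1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Time_Dim (Time_ID TEXT, Week INTEGER, Month TEXT, Quarter INTEGER);
    CREATE TABLE Sales (Market_ID TEXT, Product_ID TEXT, Time_ID TEXT, Amount INTEGER);
    INSERT INTO Time_Dim VALUES ('T1', 1, 'JAN', 1), ('T2', 1, 'JUNE', 2);
    INSERT INTO Sales VALUES
        ('M1', 'P1', 'T1', 250), ('M1', 'P1', 'T2', 350),
        ('M2', 'P2', 'T1', 400), ('M2', 'P2', 'T2', 200),
        ('M3', 'P1', 'T1', 100), ('M3', 'P2', 'T1', 50);
    """
)

# Slice: fix one value on the Time dimension and keep the matching facts.
slice_q1 = conn.execute(
    """
    SELECT S.Market_ID, S.Product_ID, S.Time_ID, S.Amount
    FROM Sales S JOIN Time_Dim T ON T.Time_ID = S.Time_ID
    WHERE T.Quarter = 1
    ORDER BY S.Market_ID, S.Product_ID
    """
).fetchall()

# Dice, in the sense used on the slide: total sales per product per quarter.
dice = conn.execute(
    """
    SELECT S.Product_ID, T.Quarter, SUM(S.Amount)
    FROM Sales S JOIN Time_Dim T ON T.Time_ID = S.Time_ID
    GROUP BY S.Product_ID, T.Quarter
    ORDER BY S.Product_ID, T.Quarter
    """
).fetchall()

print(slice_q1)
print(dice)
```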
SQL EXTENSIONS FOR OLAP
1. ROLL UP
ID  FNAME      LNAME      MARKS  SEM
1   ANAL       ACHARYA    300    1
2   ROMIT      BEED       325    1
3   JAYATI     GHOSH.D    390    2
4   KAUSHIK    GOSWAMI    350    2
5   SILADITYA  MUKHERJEE  275    3
6   SONALI     SEN        310    3
SELECT SEM, SUM(MARKS),
RANK() OVER (ORDER BY SUM(MARKS) DESC) AS "Rank"
FROM TEACHERS
GROUP BY ROLLUP(SEM)
ORDER BY SEM
SEM      SUM(MARKS)  RANK
1        625         3
2        740         2
3        585         4
(total)  1950        1
ROLL UP thus adds subtotal rows to the aggregate rows. Here the only subtotal is the grand total, which is returned with a NULL SEM and therefore takes rank 1; where it sorts under ORDER BY SEM depends on the DBMS.
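SQLite has no ROLLUP operator, so the sketch below emulates GROUP BY ROLLUP(SEM) with a UNION ALL of the per-semester totals and the grand total, then ranks all of the rows. It assumes SQLite 3.25 or newer for RANK(); the aliases `total` and `marks_rank` are introduced here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TEACHERS (ID INTEGER, FNAME TEXT, LNAME TEXT, MARKS INTEGER, SEM INTEGER)"
)
conn.executemany(
    "INSERT INTO TEACHERS VALUES (?, ?, ?, ?, ?)",
    [
        (1, "ANAL", "ACHARYA", 300, 1),
        (2, "ROMIT", "BEED", 325, 1),
        (3, "JAYATI", "GHOSH.D", 390, 2),
        (4, "KAUSHIK", "GOSWAMI", 350, 2),
        (5, "SILADITYA", "MUKHERJEE", 275, 3),
        (6, "SONALI", "SEN", 310, 3),
    ],
)

# Emulated ROLLUP(SEM): one row per semester plus one grand-total row (SEM is NULL),
# with every row, including the grand total, ranked by its sum of marks.
rollup = conn.execute(
    """
    SELECT SEM, total, RANK() OVER (ORDER BY total DESC) AS marks_rank
    FROM (
        SELECT SEM, SUM(MARKS) AS total FROM TEACHERS GROUP BY SEM
        UNION ALL
        SELECT NULL, SUM(MARKS) FROM TEACHERS
    )
    ORDER BY marks_rank
    """
).fetchall()

for sem, total, marks_rank in rollup:
    print("ALL" if sem is None else sem, total, marks_rank)
```

Because the grand-total row takes part in the ranking, it occupies rank 1 and the three semesters follow at ranks 2 to 4.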
SQL EXTENSIONS
2. CUBE
ID  FNAME      LNAME      MARKS  SEM
1   ANAL       ACHARYA    300    1
2   ROMIT      BEED       325    1
3   JAYATI     GHOSH.D    390    2
4   KAUSHIK    GOSWAMI    350    2
5   SILADITYA  MUKHERJEE  275    3
6   SONALI     SEN        310    3
SELECT SEM, SUM(MARKS)
FROM TEACHERS
GROUP BY CUBE(SEM)
SEM      SUM(MARKS)
1        625
2        740
3        585
(total)  1950
The CUBE operator provides subtotals of aggregate values in the result set.
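With a single grouping column, CUBE adds only the grand total. Its effect is easier to see with two columns, where it produces a subtotal for every combination of them. The following plain-Python sketch computes what GROUP BY CUBE(Market_ID, Product_ID) would return for the sample Sales rows; the function name `cube` is hypothetical and None stands in for the NULL that marks a summed-out column.

```python
from collections import defaultdict
from itertools import combinations

SALES = [
    {"Market_ID": "M1", "Product_ID": "P1", "Amount": 250},
    {"Market_ID": "M1", "Product_ID": "P1", "Amount": 350},
    {"Market_ID": "M2", "Product_ID": "P2", "Amount": 400},
    {"Market_ID": "M2", "Product_ID": "P2", "Amount": 200},
    {"Market_ID": "M3", "Product_ID": "P1", "Amount": 100},
    {"Market_ID": "M3", "Product_ID": "P2", "Amount": 50},
]


def cube(rows, dims, measure):
    """Sum `measure` for every subset of `dims`; None marks a summed-out column."""
    result = {}
    for k in range(len(dims), -1, -1):
        for group in combinations(dims, k):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] if d in group else None for d in dims)
                totals[key] += row[measure]
            result.update(totals)
    return result


sales_cube = cube(SALES, ("Market_ID", "Product_ID"), "Amount")
for key, total in sales_cube.items():
    print(key, total)
```

The result holds four (market, product) totals, three per-market subtotals, two per-product subtotals and one grand total.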
SQL EXTENSIONS
3. GROUPING SETS let us compute groups on several different sets of grouping columns in the same query.
The following query returns a count for each (year, quarter) pair plus a subtotal row for each year, but no subtotal rows for the individual quarters.
SQL EXTENSIONS
SELECT YEAR AS YEAR, QUARTER AS QUARTER, COUNT(*) AS ORDERS
FROM SALES
GROUP BY GROUPING SETS ((YEAR, QUARTER), (YEAR))
ORDER BY YEAR, QUARTER
OLAP FUNCTIONS
1. RANK FUNCTION – lets us compile a list of values from our data set in ranked order.
Example: The SQL query that follows finds the male and female employees from Kolkata and ranks them in descending order according to salary.
SELECT emp_lname, salary, sex,
RANK() OVER (ORDER BY salary DESC) "Rank"
FROM employee
WHERE city IN ('KOL')
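To see how RANK() treats ties, the query can be run on a few rows. The employee rows below are invented purely for illustration, the alias `salary_rank` replaces the quoted "Rank", and SQLite 3.25 or newer is assumed for window-function support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_lname TEXT, salary INTEGER, sex TEXT, city TEXT)")
# Made-up sample rows (not from the slides).
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        ("Sen", 50000, "F", "KOL"),
        ("Ghosh", 65000, "M", "KOL"),
        ("Roy", 50000, "M", "KOL"),
        ("Das", 40000, "F", "KOL"),
        ("Rao", 90000, "M", "DEL"),  # filtered out: not in Kolkata
    ],
)

ranked = conn.execute(
    """
    SELECT emp_lname, salary, sex,
           RANK() OVER (ORDER BY salary DESC) AS salary_rank
    FROM employee
    WHERE city IN ('KOL')
    ORDER BY salary_rank, emp_lname
    """
).fetchall()

for row in ranked:
    print(row)
```

The two employees on 50000 share rank 2 and the next rank is 4, not 3: RANK() leaves a gap after ties. The WHERE filter is applied before ranking, so the Delhi employee does not affect the ranks.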
OLAP FUNCTIONS
2. REPORTING FUNCTIONS: Reporting functions let us compare non-aggregate values to aggregate values.
Example: The following query returns a result set that shows a list of the products that sold more than the average number of sales. The result set is partitioned by year.
SELECT *
FROM (SELECT year(order_date) AS Year, prod_id,
             SUM(quantity) AS Q,
             AVG(SUM(quantity)) OVER (PARTITION BY Year) AS Average
      FROM sales_order JOIN sales_order_items
      GROUP BY year(order_date), prod_id
      ORDER BY Year) AS derived_table
WHERE Q > Average
For the year 2000, the average number of orders was 1787. Four products (700, 601, 600 and 400) sold more than that amount. In 2001, the average number of orders was 1048 and three products exceeded that amount.
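The comparison of each product's total against its year's average can be sketched on a small scale. The example below uses a single invented table `order_items` with made-up quantities instead of the sales_order / sales_order_items join, and computes the per-product totals in an inner query before applying AVG(...) OVER (PARTITION BY ...); SQLite 3.25 or newer is assumed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_items (order_year INTEGER, prod_id INTEGER, quantity INTEGER)")
# Made-up sample rows (not the figures quoted in the slides).
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?)",
    [
        (2000, 400, 30), (2000, 400, 20), (2000, 600, 10), (2000, 700, 60),
        (2001, 400, 5), (2001, 600, 25),
    ],
)

above_average = conn.execute(
    """
    SELECT order_year, prod_id, q, average
    FROM (
        -- Attach the year's average product total to every product row.
        SELECT order_year, prod_id, q,
               AVG(q) OVER (PARTITION BY order_year) AS average
        FROM (
            -- Total quantity sold per product per year.
            SELECT order_year, prod_id, SUM(quantity) AS q
            FROM order_items
            GROUP BY order_year, prod_id
        )
    )
    WHERE q > average
    ORDER BY order_year, prod_id
    """
).fetchall()

for row in above_average:
    print(row)
```

The window function keeps one row per product while exposing the year-level average beside it, which is what allows the outer WHERE to compare the two.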
OLAP FUNCTIONS
WINDOW FUNCTIONS
Window functions let us analyze our data by computing aggregate values over windows surrounding each row. The result set returns a summary value representing a set of rows.
The query returns a result set that partitions the data by department and then provides a cumulative summary of employees’ salaries, starting with the employee who has been at the company the longest. The result set includes only those employees who reside in West Bengal, BBSR, Maharashtra or Arunachal. The column Sum_Salary provides the cumulative total of employees’ salaries.
SELECT dept_id, emp_lname, start_date, salary,
SUM(salary) OVER (PARTITION BY dept_id
ORDER BY start_date
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
AS "Sum_Salary"
FROM employee
WHERE state IN ('WB', 'BBSR', 'MH', 'AR') AND dept_id IN ('100', '200')
ORDER BY dept_id, start_date;
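The running total can be checked on a handful of rows. The employee rows below are invented for illustration, start dates are stored as ISO text so they sort chronologically, and SQLite 3.25 or newer is assumed for the window frame syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (dept_id TEXT, emp_lname TEXT, start_date TEXT, salary INTEGER, state TEXT)"
)
# Made-up sample rows (not from the slides).
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?, ?)",
    [
        ("100", "Sen", "2001-03-01", 40000, "WB"),
        ("100", "Das", "2002-08-11", 20000, "TN"),  # excluded by the state filter
        ("100", "Roy", "2003-07-15", 30000, "MH"),
        ("200", "Bose", "2002-05-20", 50000, "WB"),
        ("200", "Nair", "2004-09-01", 25000, "AR"),
    ],
)

running = conn.execute(
    """
    SELECT dept_id, emp_lname, start_date, salary,
           SUM(salary) OVER (PARTITION BY dept_id
                             ORDER BY start_date
                             RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
               AS Sum_Salary
    FROM employee
    WHERE state IN ('WB', 'BBSR', 'MH', 'AR') AND dept_id IN ('100', '200')
    ORDER BY dept_id, start_date
    """
).fetchall()

for row in running:
    print(row)
```

The total restarts for each department because of PARTITION BY, and the filtered-out employee contributes nothing because WHERE is applied before the window is evaluated.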
On Line Analytical Processing
Thus Online Analytical Processing as a whole can be understood as a method that takes in raw data, processes it through various functions and operations, and produces information in response to multidimensional queries in real time.
SERVER ARCHITECTURES
MOLAP: Multidimensional OLAP
The database is stored in a special, usually proprietary, structure that is optimized for multidimensional analysis.
+: very fast query response time, because the data is mostly pre-calculated
-: practical limit on the size, because of the time taken to calculate the database and the space required to hold the pre-calculated values
SERVER ARCHITECTURES
ROLAP – Relational OLAP
The database is a standard relational database and the database model is a multidimensional model, often referred to as a star or snowflake model or schema.
+: more scalable solution
-: query performance is largely governed by the complexity of the SQL and the number and size of the tables being joined in the query
SERVER ARCHITECTURES
HOLAP – Hybrid OLAP
A hybrid of ROLAP and MOLAP.
It can be thought of as a virtual database in which the higher levels of the database are implemented as MOLAP and the lower levels as ROLAP.
SERVER ARCHITECTURES
DOLAP – Desktop OLAP
The previous terms refer to server-based OLAP technologies.
DOLAP enables users to quickly pull together small cubes that run on their desktops or laptops.
COMMERCIAL OLAP SYSTEMS
IBM DB2 DATA WAREHOUSING
• ENTERPRISE EDITION
• STANDARD EDITION
• BASE EDITION
ORACLE 9i ENTERPRISE EDITION
MICROSOFT SQL SERVER 2005 BUSINESS INTELLIGENCE WORKBENCH PLATFORM
OLAP Challenges and Future Scope
• Analytical complexity
• Business questions can rarely be answered by a single query.
• Complex queries are hard to understand, write and execute efficiently.
• Need for good business analysts.
• Data cubes can be HUGE.
• But they can also be sparse.
• Aggregates can be computed in advance, computed on demand, or some combination of the two.
• OLAP forms the underlying structure of DDAS – Distributed Data Analysis and Dissemination System.
• From Online Analytical Processing to Online Analytical Mining (OLAP to OLAM).
THANK YOU !