DWM 2

Uploaded by

bhimapasare45

Samarth

Unit 2.Data Warehouse Modeling and Online Analytical Processing-I

Dimensional Modeling-
-The concept of dimensional modeling was developed by Ralph Kimball; a dimensional model consists of
“fact” and “dimension” tables.
-It is a logical design technique used for data warehouses.
-Every dimensional model is composed of at least one table with a multipart key, called the fact
table, and a set of smaller tables called dimension tables.
Elements of Dimensional Data Model

Fact

It is a collection of associated data items, consisting of measures and context data. It typically
represents business items or business transactions.

Dimensions

It is a collection of data that describes one business dimension. In simple terms, dimensions give
the who, what, and where of a fact. In the Sales business process, for the fact quarterly sales
number, the dimensions would be

● Who – Customer Names
● Where – Location
● What – Product Name

In other words, a dimension is a window through which to view information in the facts.

Fact Table
Fact tables are used to store the facts or measures of the business.
A fact table is the primary table in dimensional modeling.
A fact table contains
1. Measurements/facts
2. Foreign keys to the dimension tables

Dimension Table
-Dimension tables establish the context of the facts. Dimension tables store fields that
describe the facts.

● A dimension table contains the dimensions of a fact.
● Dimension tables are joined to the fact table via a foreign key.
● Dimension tables are denormalized tables.
● A dimension can also contain one or more hierarchical relationships.
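
The relationship between a fact table and its dimension tables can be sketched in a few lines of Python. This is only an illustrative sketch: the table names, keys, and values (customer_dim, product_dim, sales_fact, and the sample rows) are made up for the example and do not come from the notes.

```python
# Dimension tables: descriptive context, looked up by key (all values are made up).
customer_dim = {
    1: {"customer_name": "Asha", "city": "Pune"},
    2: {"customer_name": "Ravi", "city": "Mumbai"},
}
product_dim = {
    10: {"product_name": "Laptop", "category": "Computers"},
    11: {"product_name": "Phone", "category": "Mobiles"},
}

# Fact table: each row holds foreign keys to the dimensions plus a numeric measure.
sales_fact = [
    {"customer_key": 1, "product_key": 10, "sales_amount": 55000},
    {"customer_key": 2, "product_key": 11, "sales_amount": 20000},
    {"customer_key": 1, "product_key": 11, "sales_amount": 18000},
]


def describe(fact_row):
    """Resolve the foreign keys of one fact row into its who/where/what context."""
    customer = customer_dim[fact_row["customer_key"]]
    product = product_dim[fact_row["product_key"]]
    return {
        "who": customer["customer_name"],
        "where": customer["city"],
        "what": product["product_name"],
        "sales_amount": fact_row["sales_amount"],
    }


described = [describe(row) for row in sales_fact]
for row in described:
    print(row)
```

The fact rows stay small (keys and measures only), while the descriptive text lives once in each dimension table.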

Comparison between Database & Data Warehouse


Database | Data Warehouse
It supports OLTP (Online Transaction Processing). | It supports OLAP (Online Analytical Processing).
An organized collection of data is called a database. It makes it easier to access, retrieve, and manipulate information. | A big, centralized repository of data that is specially created for reporting and data analysis is known as a data warehouse.
It is designed for the purpose of storing the data. | It is designed for the purpose of analysing the data.
Designing is done using ER modelling methods. | Designing is done using dimensional modelling methods.
A database contains detailed data. | Data warehouses keep highly summarized data.
Databases are typically smaller in size. | When compared to databases, data warehouses are larger.
Data present in it is frequently updated to maintain accuracy and consistency within the database. | Data present in data warehouses is usually static and historical. Therefore, this already-existing data can be utilised for effective data analysis.
Application developers and operational employees frequently use databases. | Business analysts and executives frequently use data warehouses.
A few examples of databases are MySQL, Oracle, etc. | A few examples of data warehouses are Google BigQuery, IBM Db2, etc.

Data Cube:-
-When data is grouped or combined in multidimensional matrices, the result is called a data cube.
The data cube method has a few alternative names or variants, such as "multidimensional
databases," "materialized views," and "OLAP (On-Line Analytical Processing)."

-The general idea of this approach is to materialize certain expensive computations that are
frequently queried.

-For example, a relation with the schema sales(part, supplier, customer, sale-price) can be
materialized into a set of eight views, as shown in the figure, where psc indicates a view
consisting of aggregate function values (such as total-sales) computed by grouping on the three
attributes part, supplier, and customer, p indicates a view composed of the corresponding
aggregate function values calculated by grouping on part alone, etc.
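
The eight views correspond to the eight subsets of {part, supplier, customer}. A minimal sketch of that materialization is given below; the sample rows and the helper name materialize are assumptions made for illustration, and the empty grouping is labelled "none" here.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical rows of sales(part, supplier, customer, sale_price).
sales = [
    ("bolt", "acme", "c1", 10),
    ("bolt", "acme", "c2", 20),
    ("nut", "acme", "c1", 5),
    ("nut", "zenith", "c2", 15),
]
ATTRS = ("part", "supplier", "customer")


def materialize(rows, group_attrs):
    """Total sale_price grouped by the given subset of attributes."""
    positions = [ATTRS.index(attr) for attr in group_attrs]
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[pos] for pos in positions)
        totals[key] += row[3]
    return dict(totals)


# One view per subset of the three attributes: 2**3 = 8 views.
views = {}
for size in range(len(ATTRS), -1, -1):
    for group in combinations(ATTRS, size):
        name = "".join(attr[0] for attr in group) or "none"
        views[name] = materialize(sales, group)

for name, view in views.items():
    print(name, view)
```

Once the views are stored, a query such as "total sales per part" is answered by reading view p instead of rescanning the sales relation.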

-The multidimensional model views data in the form of a data cube. OLAP tools are based on the
multidimensional data model. Data cubes usually model n-dimensional data.

-A data cube enables data to be modeled and viewed in multiple dimensions. A multidimensional
data model is organized around a central theme, such as sales or transactions. A fact table
represents this theme. Facts are numerical measures. Thus, the fact table contains measures
(such as Rs_sold) and keys to each of the related dimension tables.

-Dimensions are the perspectives or entities with respect to which an organization keeps records,
and they define a data cube. Facts are generally quantities, which are used for analyzing the
relationships between dimensions.

Example: In the 2-D representation, we look at the All Electronics sales data for items sold
per quarter in the city of Vancouver. The measure displayed is dollars sold (in thousands).

3-Dimensional Cuboids
Suppose we would like to view the sales data with a third dimension. For example, suppose
we would like to view the data according to time and item, as well as location, for the cities
Chicago, New York, Toronto, and Vancouver. The measure displayed is dollars sold (in
thousands). These 3-D data are shown in the table. The 3-D data of the table are represented
as a series of 2-D tables.

Conceptually, we may represent the same data in the form of a 3-D data cube, as shown in the figure.

Let us suppose that we would like to view our sales data with an additional fourth dimension,
such as a supplier.

In data warehousing, the data cubes are n-dimensional. The cuboid which holds the lowest level
of summarization is called a base cuboid.

For example, the 4-D cuboid in the figure is the base cuboid for the given time, item, location,
and supplier dimensions.

The figure shows a 4-D data cube representation of sales data, according to the dimensions time,
item, location, and supplier. The measure displayed is dollars sold (in thousands).

Consider the data of a shop for items sold per quarter in the city of Delhi. The data is shown in
the table. In this 2D representation, the sales for Delhi are shown for the time dimension
(organized in quarters) and the item dimension (classified according to the types of items
sold). The fact or measure displayed is rupee_sold (in thousands).

Now suppose we want to view the sales data with a third dimension. For example, suppose the data
is considered according to time and item, as well as location, for the cities Chennai, Kolkata,
Mumbai, and Delhi. These 3D data are shown in the table. The 3D data of the table are
represented as a series of 2D tables.
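
The idea of showing 3D data as a series of 2D tables can be sketched as follows. The rupee_sold figures are invented placeholders (the original table is not reproduced here), and the function name as_2d_tables is an assumption for this example.

```python
# Hypothetical rupee_sold values (in thousands); keys are (location, quarter, item).
cube = {
    ("Delhi", "Q1", "Mobile"): 605, ("Delhi", "Q1", "Modem"): 825,
    ("Delhi", "Q2", "Mobile"): 680, ("Delhi", "Q2", "Modem"): 952,
    ("Mumbai", "Q1", "Mobile"): 818, ("Mumbai", "Q1", "Modem"): 746,
    ("Mumbai", "Q2", "Mobile"): 894, ("Mumbai", "Q2", "Modem"): 769,
}


def as_2d_tables(cells):
    """Split a 3-D cube into one 2-D (quarter x item) table per location."""
    tables = {}
    for (location, quarter, item), value in cells.items():
        tables.setdefault(location, {}).setdefault(quarter, {})[item] = value
    return tables


tables = as_2d_tables(cube)
for location, table in tables.items():
    print(location)
    for quarter, row in table.items():
        print("  ", quarter, row)
```

Each location gets its own time-by-item table, which is exactly the "series of 2D tables" view of the same cube.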

Conceptually, the same data may also be represented in the form of a 3D data cube, as
shown in the figure.

Schema:-
-A schema is a logical description of the entire database. It includes the name and description of
records of all record types, including all associated data items and aggregates. Much like a
database, a data warehouse also needs to maintain a schema. A database uses the relational
model, while a data warehouse uses the Star, Snowflake, or Fact Constellation schema.

Types of Schema
1. Star Schema
2. Snowflake Schema
3. Fact Constellation Schema

Star Schema

-The star schema is the most popular schema design for a data warehouse.
-In a star schema, there is one fact table in the middle and a number of associated dimension
tables around it. This structure resembles a star, and hence it is known as a star schema.
-The primary key which is present in each dimension is related to a foreign key which is present
in the fact table.
-The size of the fact tables is large as compared to the dimension tables.

Figure – General structure of Star Schema


The following diagram shows the sales data of a company with respect to the four dimensions,
namely time, item, branch, and location.

-There is a fact table at the center. It contains the keys to each of four dimensions.

-The fact table also contains the attributes, namely dollars sold and units sold.
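
A minimal sketch of such a star schema, using Python's built-in sqlite3 module, is shown below. The four dimensions and the two measures follow the description above; the table names, the non-key columns, and all sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE time_dim     (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE item_dim     (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
CREATE TABLE branch_dim   (branch_key INTEGER PRIMARY KEY, branch_name TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, city TEXT, country TEXT);

-- Fact table: one foreign key per dimension plus the two measures.
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    branch_key   INTEGER REFERENCES branch_dim(branch_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    dollars_sold REAL,
    units_sold   INTEGER
);

INSERT INTO time_dim VALUES (1, 'Q1', 2024), (2, 'Q2', 2024);
INSERT INTO item_dim VALUES (1, 'Mobile', 'BrandA'), (2, 'Modem', 'BrandB');
INSERT INTO branch_dim VALUES (1, 'Main');
INSERT INTO location_dim VALUES (1, 'Toronto', 'Canada'), (2, 'Chicago', 'USA');
INSERT INTO sales_fact VALUES
    (1, 1, 1, 1, 100.0, 2),
    (1, 2, 1, 2, 50.0, 1),
    (2, 1, 1, 1, 150.0, 3);
""")

# A typical star query: join the fact table to its dimensions, then aggregate.
rows = conn.execute("""
    SELECT t.quarter, l.city, SUM(f.dollars_sold), SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN time_dim AS t ON f.time_key = t.time_key
    JOIN location_dim AS l ON f.location_key = l.location_key
    GROUP BY t.quarter, l.city
    ORDER BY t.quarter, l.city
""").fetchall()

for row in rows:
    print(row)
```

Every dimension is reached from the fact table with a single join, which is why star queries tend to be simple to write.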

Advantages of Star Schema-

1. Simplest DW schema
2. Easy to understand
3. Most suitable for query processing
4. It is a fully denormalized schema

Disadvantages of Star Schema-

1. Data redundancy: A star schema can result in data redundancy, as the same data
may be stored in multiple places in the schema. This can lead to data
inconsistencies and difficulties in maintaining data accuracy.
2. Increased costs: Adding redundant data increases computing and storage costs.
This can be especially troubling when handling large datasets.

Snowflake Schema

-The snowflake schema acts like an extended version of a star schema.

-In a snowflake schema, the fact table is still located at the center of the
schema, surrounded by the dimension tables. However, some dimension tables are further
broken down into multiple related tables.

Figure – General structure of Snowflake Schema

-Unlike in the star schema, some dimension tables in a snowflake schema are normalized. For
example, the item dimension table of the star schema is normalized and split into two dimension
tables, namely the item and supplier tables.

Now the item dimension table contains the attributes item_key, item_name, type, brand, and
supplier_key.

The supplier key is linked to the supplier dimension table. The supplier dimension table
contains the attributes supplier_key and supplier_type.
Due to normalization in the snowflake schema, redundancy is reduced; therefore, the schema
becomes easier to maintain and saves storage space.
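
The split of the item dimension into item and supplier tables can be sketched with sqlite3 as follows. The attribute names come from the description above; the table names, the reduced fact table, and the sample rows are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized sub-dimension: supplier details are stored once per supplier.
CREATE TABLE supplier_dim (supplier_key INTEGER PRIMARY KEY, supplier_type TEXT);

-- The item dimension keeps only a key that points at the supplier table.
CREATE TABLE item_dim (
    item_key     INTEGER PRIMARY KEY,
    item_name    TEXT,
    type         TEXT,
    brand        TEXT,
    supplier_key INTEGER REFERENCES supplier_dim(supplier_key)
);

-- Fact table reduced to one dimension key and one measure for brevity.
CREATE TABLE sales_fact (
    item_key   INTEGER REFERENCES item_dim(item_key),
    units_sold INTEGER
);

INSERT INTO supplier_dim VALUES (1, 'wholesale'), (2, 'retail');
INSERT INTO item_dim VALUES
    (1, 'Mobile', 'phone', 'BrandA', 1),
    (2, 'Modem', 'network', 'BrandB', 1),
    (3, 'Router', 'network', 'BrandC', 2);
INSERT INTO sales_fact VALUES (1, 5), (2, 3), (3, 4);
""")

# Reaching supplier_type now takes two joins: fact -> item -> supplier.
rows = conn.execute("""
    SELECT s.supplier_type, SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN item_dim AS i ON f.item_key = i.item_key
    JOIN supplier_dim AS s ON i.supplier_key = s.supplier_key
    GROUP BY s.supplier_type
    ORDER BY s.supplier_type
""").fetchall()

print(rows)
```

The extra join is the price paid for storing each supplier_type value only once.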

Advantages of Snowflake Schema


1. It provides structured data, which reduces data integrity problems.
2. It uses less disk space because the data is highly structured.

Disadvantages of Snowflake Schema

1. The primary disadvantage of the snowflake schema is the additional maintenance effort
required due to the increasing number of lookup tables.

2. Queries are more complex and hence more difficult to understand.

3. More tables mean more joins, and hence longer query execution time.

Difference Between Star Schema and Snowflake Schema

Parameters | Star Schema | Snowflake Schema
Definition and Meaning | A star schema contains both dimension tables and fact tables. | A snowflake schema contains all three: dimension tables, fact tables, and sub-dimension tables.
Type of Model | It is a top-down model type. | It is a bottom-up model type.
Space Occupied | It makes use of more allotted space. | It makes use of less allotted space.
Time Taken for Queries | With the Star Schema, the execution of queries takes less time. | With the Snowflake Schema, the execution of queries takes more time.
Use of Normalization | The Star Schema does not make use of normalization. | The Snowflake Schema makes use of both denormalization and normalization.
Complexity of Design | The design of a Star Schema is very simple. | The design of a Snowflake Schema is very complex.
Complexity of Understanding | It is very easy to understand a Star Schema. | It is comparatively more difficult to understand a Snowflake Schema.
Total Number of Foreign Keys | The total number of foreign keys is lower in the case of a Star Schema. | The total number of foreign keys is higher in the case of a Snowflake Schema.
Data Redundancy | Data redundancy is comparatively higher in Star Schema. | Data redundancy is comparatively lower in Snowflake Schema.

Fact Constellation Schema

A fact constellation has multiple fact tables. It is also known as a galaxy schema.
A fact constellation means two or more fact tables sharing one or more dimensions.

Figure – General structure of Fact Constellation

The following diagram shows two fact tables, namely sales and shipping.

The sales fact table is the same as that in the star schema.

The shipping fact table has five dimension keys, namely item_key, time_key, shipper_key,
from_location, and to_location.
The shipping fact table also contains two measures, namely dollars cost and units shipped.
It is also possible to share dimension tables between fact tables. For example, the time, item, and
location dimension tables are shared between the sales and shipping fact tables.
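
Sharing a dimension between two fact tables can be sketched as below. Only the shared item dimension is modelled; the measure columns units_sold and units_shipped, the table names, and the sample rows are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One dimension table shared by both fact tables.
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT);

CREATE TABLE sales_fact (
    item_key   INTEGER REFERENCES item_dim(item_key),
    units_sold INTEGER
);
CREATE TABLE shipping_fact (
    item_key      INTEGER REFERENCES item_dim(item_key),
    units_shipped INTEGER
);

INSERT INTO item_dim VALUES (1, 'Mobile'), (2, 'Modem');
INSERT INTO sales_fact VALUES (1, 5), (1, 2), (2, 3);
INSERT INTO shipping_fact VALUES (1, 6), (2, 1), (2, 1);
""")

# Aggregate each fact table separately per item, so rows are not multiplied
# the way they would be if both fact tables were joined to each other directly.
rows = conn.execute("""
    SELECT i.item_name,
           (SELECT SUM(s.units_sold) FROM sales_fact AS s
             WHERE s.item_key = i.item_key),
           (SELECT SUM(sh.units_shipped) FROM shipping_fact AS sh
             WHERE sh.item_key = i.item_key)
    FROM item_dim AS i
    ORDER BY i.item_name
""").fetchall()

print(rows)
```

Because both fact tables reference the same item_key values, their measures can be compared side by side per item.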

Advantage: Provides a flexible schema.

Disadvantage: It is much more complex and hence, hard to implement and maintain.

Difference Between Fact Constellation Schema and Snowflake Schema

OLAP:-
-An Online Analytical Processing (OLAP) server is based on the multidimensional data model.
OLAP is a category of software technology that enables analysts, managers, and
executives to gain insight into information through fast, consistent, interactive access to a wide
variety of possible views of data that has been transformed from raw information to reflect the
real dimensionality of the enterprise as understood by its users.

-Online Analytical Processing (OLAP) refers to a set of software tools used for data analysis in
order to make business decisions.
-It provides users with easy and efficient access to the various views of information.
-Complex queries can also be processed using OLAP.
-It is a powerful technology for data discovery.
-It performs multidimensional analysis of business data.
-It provides fast access to shared multidimensional information.

Difference Between OLAP & OLTP

Types of OLAP:-
OLAP can be divided into the following types:
1. MOLAP
2. ROLAP
3. HOLAP

1. MOLAP

-MOLAP stands for Multidimensional Online Analytical Processing, an application based on
multidimensional DBMSs.
-It is the classical form of OLAP and stores the data in optimized multidimensional array
storage.

Advantages

Excellent performance: A MOLAP cube is built for fast information retrieval and is optimal for
slicing and dicing operations.

Can perform complex calculations: All calculations are pre-generated when the cube is
created. Hence, complex calculations are not only possible, but they also return quickly.

It performs fast query operations due to optimized storage, multidimensional indexing, and caching.

Disadvantages

Limited in the amount of information it can handle: Because all calculations are performed when
the cube is built, it is not possible to include a large amount of data in the cube itself.

Requires additional investment: Cube technology is generally proprietary and often does not already
exist in the organization. Therefore, adopting MOLAP technology is likely to require additional
investments in human and capital resources.

-MOLAP comes with data redundancy.

-The processing step (building the cube) can be lengthy, especially on large data volumes.

2. ROLAP

-ROLAP stands for Relational Online Analytical Processing, an application based on relational
DBMSs.
-It works with relational databases.
-ROLAP depends on a specialized schema design.
-It has the ability to drill down to the lowest level of detail in the database.

Advantages

Can handle large amounts of information: The data size limitation of ROLAP technology
is that of the underlying RDBMS. So, ROLAP itself does not restrict the amount of data.

Can leverage RDBMS functionality: The RDBMS already comes with a lot of features, so ROLAP
technologies, which work on top of the RDBMS, can make use of these functionalities.
Data can be stored efficiently.

Disadvantages

Performance can be slow: Because each ROLAP report is a SQL query (or multiple SQL queries) against
the relational database, the query time can be long if the underlying data size is large.

Limited by SQL functionalities: ROLAP technology relies on generating SQL statements to
query the relational database, and SQL statements do not suit all needs.
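
How a ROLAP-style report turns into a SQL statement can be sketched as follows. This is not the mechanism of any particular product: the sales table, the function rolap_report, and the allowed dimension list are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (city TEXT, quarter TEXT, units_sold INTEGER);
INSERT INTO sales VALUES
    ('Toronto', 'Q1', 5), ('Toronto', 'Q2', 7), ('Chicago', 'Q1', 3);
""")

# Column names cannot be bound as SQL parameters, so they are checked
# against a fixed whitelist before being placed into the statement.
ALLOWED_DIMENSIONS = {"city", "quarter"}


def rolap_report(connection, dimensions):
    """Generate and run a GROUP BY query for the requested dimensions."""
    if not dimensions or not set(dimensions) <= ALLOWED_DIMENSIONS:
        raise ValueError("unknown or missing dimension")
    columns = ", ".join(dimensions)
    sql = (
        f"SELECT {columns}, SUM(units_sold) FROM sales "
        f"GROUP BY {columns} ORDER BY {columns}"
    )
    return connection.execute(sql).fetchall()


by_city = rolap_report(conn, ["city"])
by_quarter = rolap_report(conn, ["quarter"])
print(by_city)
print(by_quarter)
```

Each report rescans the sales table at query time, which illustrates why response time grows with the size of the underlying data.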

3. HOLAP

-HOLAP stands for Hybrid Online Analytical Processing, an application using both relational and
multidimensional techniques.
-It uses relational tables to hold the larger quantities of detailed data and multidimensional
storage for the aggregations.

Advantages of HOLAP
1. HOLAP provides the benefits of both MOLAP and ROLAP.
2. It provides fast access at all levels of aggregation.
3. HOLAP balances the disk space requirement, as it stores only the aggregate information on
the OLAP server while the detail records remain in the relational database. So no duplicate copy
of the detail records is maintained.

Disadvantages of HOLAP

The HOLAP architecture is very complicated because it supports both MOLAP and ROLAP servers.

Advantages of OLAP

-It enables managers to solve business problems.

-It provides controlled access to strategic information for more effective decision making.

-It enables the organization to respond more quickly to market demands.

-It enables users to analyze multidimensional data interactively from multiple perspectives.

-It does not require a large data warehouse.

Need of OLAP

-It supports multidimensional data.

-It provides fast, steady, and proficient access to the various views of the information.

-Complex queries can be processed.

-It is easy to analyze information by processing complex queries on multidimensional views of
data.

OLAP Guidelines

Dr. E.F. Codd, the father of the relational model, created a list of rules for OLAP
systems. Users should prioritize these rules according to their business requirements.
These rules are as follows:

1. Multidimensional conceptual view: The OLAP tool should provide an appropriate multidimensional
business model that suits the business problems and requirements.

2. Transparency: The OLAP tool should provide transparency to the input data for the users.

3. Accessibility: The OLAP tool should access only the data required for the analysis at hand.

4. Consistent reporting performance: The size of the database should not affect performance in
any way.

5. Client/server architecture: The OLAP tool should use the client/server architecture to ensure
better performance and flexibility.

6. Generic dimensionality: Every data dimension should be equivalent in both its structure and its
operational capabilities.

7. Dynamic sparse matrix handling: The OLAP tool should be able to manage sparse matrices
while maintaining the level of performance.

8. Multi-user support: The OLAP tool should allow several users to work concurrently.

9. Unrestricted cross-dimensional operations: The OLAP tool should be able to perform operations
across the dimensions of the cube.

10. Intuitive data manipulation: Manipulation inherent in the consolidation path, such as
reorientation (pivoting), drill-down, and roll-up, should be accomplished naturally and
precisely via point-and-click and drag-and-drop actions on the cells of the analytical model.
This avoids the use of menus or multiple trips to a user interface.

11. Flexible reporting: The tool should be able to present rows and columns in a manner
suitable for analysis.

12. Unlimited dimensions and aggregation levels: The number needed depends on the kind of
business; the tool should allow multiple dimensions and hierarchies to be defined.

OLAP Operations:-

1. Roll-up
2. Drill-down
3. Slice
4. Dice
5. Pivot (rotate)

1. Roll-up

-Roll-up performs aggregation on a data cube in either of the following ways −

-By climbing up a concept hierarchy for a dimension
-By dimension reduction

The following diagram illustrates how roll-up works.

-Roll-up is performed by climbing up a concept hierarchy for the dimension location. Initially the
concept hierarchy was "street < city < province < country".

-On rolling up, the data is aggregated by ascending the location hierarchy from the level of city
to the level of country.

-The data is grouped into countries rather than cities.

-When roll-up is performed by dimension reduction, one or more dimensions are removed from the
data cube.
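
Both forms of roll-up can be sketched on a small cube stored as a Python dictionary. The cell values, the city-to-country mapping, and the function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical cells: (city, quarter, item) -> units sold.
cube = {
    ("Toronto", "Q1", "Mobile"): 605,
    ("Toronto", "Q1", "Modem"): 825,
    ("Vancouver", "Q1", "Mobile"): 400,
    ("Chicago", "Q1", "Mobile"): 440,
    ("New York", "Q1", "Modem"): 1560,
}

# One step of the location hierarchy: city < country.
CITY_TO_COUNTRY = {
    "Toronto": "Canada", "Vancouver": "Canada",
    "Chicago": "USA", "New York": "USA",
}


def roll_up_location(cells):
    """Climb the hierarchy: aggregate city-level cells to country level."""
    result = defaultdict(int)
    for (city, quarter, item), value in cells.items():
        result[(CITY_TO_COUNTRY[city], quarter, item)] += value
    return dict(result)


def roll_up_drop_item(cells):
    """Dimension reduction: remove the item dimension by summing over it."""
    result = defaultdict(int)
    for (city, quarter, _item), value in cells.items():
        result[(city, quarter)] += value
    return dict(result)


by_country = roll_up_location(cube)
without_item = roll_up_drop_item(cube)
print(by_country)
print(without_item)
```

In the first result the cube keeps three dimensions but at a coarser level; in the second it has one dimension fewer.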

2. Drill-down

-Drill-down is the reverse operation of roll-up. It is performed in either of the following
ways −

-By stepping down a concept hierarchy for a dimension
-By introducing a new dimension

The following diagram illustrates how drill-down works −

-Drill-down is performed by stepping down a concept hierarchy for the dimension time.

-Initially the concept hierarchy was "day < month < quarter < year."

-On drilling down, the time dimension is descended from the level of quarter to the level of
month.

-When drill-down is performed by introducing a new dimension, one or more dimensions are added to
the data cube.

-It navigates the data from less detailed data to highly detailed data.
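
Drilling down from quarter to month can be sketched as below. The sketch assumes the detailed month-level data is available (the quarter view is derived from it); the values and function names are made up for the example.

```python
from collections import defaultdict

# Hypothetical base data at month level: (month, item) -> units sold.
monthly = {
    ("Jan", "Mobile"): 10,
    ("Feb", "Mobile"): 20,
    ("Mar", "Mobile"): 30,
    ("Apr", "Mobile"): 40,
}

# One step of the time hierarchy: month < quarter.
MONTH_TO_QUARTER = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1", "Apr": "Q2"}


def quarter_view(cells):
    """The summarized view a user starts from: totals per quarter."""
    result = defaultdict(int)
    for (month, item), value in cells.items():
        result[(MONTH_TO_QUARTER[month], item)] += value
    return dict(result)


def drill_down(cells, quarter):
    """Step down the hierarchy: show the month-level cells behind one quarter."""
    return {
        key: value
        for key, value in cells.items()
        if MONTH_TO_QUARTER[key[0]] == quarter
    }


summary = quarter_view(monthly)
q1_detail = drill_down(monthly, "Q1")
print(summary)
print(q1_detail)
```

The Q1 total in the summary equals the sum of the three month-level cells returned by the drill-down.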

3. Slice

The slice operation performs a selection on one particular dimension of a given cube and provides
a new sub-cube. Consider the following diagram that shows how slice works.

-Here slice is performed for the dimension "time" using the criterion time = "Q1".
-It forms a new sub-cube containing only the cells that satisfy this criterion on the selected
dimension.
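
A minimal sketch of the slice time = "Q1" on a dictionary-based cube is shown below; the cell values and the function name slice_time are assumptions for illustration.

```python
# Hypothetical cells: (location, quarter, item) -> units sold.
cube = {
    ("Toronto", "Q1", "Mobile"): 605,
    ("Toronto", "Q2", "Mobile"): 680,
    ("Vancouver", "Q1", "Modem"): 825,
    ("Vancouver", "Q2", "Modem"): 952,
}


def slice_time(cells, quarter):
    """Fix the time dimension to one value; the result keeps (location, item)."""
    return {
        (location, item): value
        for (location, time, item), value in cells.items()
        if time == quarter
    }


q1_slice = slice_time(cube, "Q1")
print(q1_slice)
```

Since time is fixed to a single value, it no longer needs to appear as a dimension of the resulting sub-cube.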

4. Dice

-Dice performs a selection on two or more dimensions of a given cube and provides a new sub-cube.
Consider the following diagram that shows the dice operation.

-The dice operation on the cube, based on the following selection criteria, involves three
dimensions.

(location = "Toronto" or "Vancouver")
(time = "Q1" or "Q2")
(item = "Mobile" or "Modem")
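
The three selection criteria above can be applied to a dictionary-based cube as sketched here. The cell values, including the extra cells that are filtered out, are invented for the example.

```python
# Hypothetical cells: (location, quarter, item) -> units sold.
cube = {
    ("Toronto", "Q1", "Mobile"): 605,
    ("Toronto", "Q3", "Mobile"): 812,
    ("Vancouver", "Q2", "Modem"): 952,
    ("Chicago", "Q1", "Mobile"): 440,
    ("Vancouver", "Q1", "Security"): 100,
}


def dice(cells, locations, quarters, items):
    """Keep only cells whose value on every listed dimension is allowed."""
    return {
        key: value
        for key, value in cells.items()
        if key[0] in locations and key[1] in quarters and key[2] in items
    }


sub_cube = dice(
    cube,
    locations={"Toronto", "Vancouver"},
    quarters={"Q1", "Q2"},
    items={"Mobile", "Modem"},
)
print(sub_cube)
```

Unlike slice, the result still has all three dimensions; each one is simply restricted to a smaller set of values.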

5.Pivot

The pivot operation is also known as rotation. It rotates the data axes in view in order to provide
an alternative presentation of data. Consider the following diagram that shows the pivot
operation.
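
Rotating a 2-D view so that rows and columns swap places can be sketched as follows; the table contents and the function name pivot are assumptions for illustration.

```python
# Hypothetical 2-D view: rows are locations, columns are items.
table = {
    "Toronto": {"Mobile": 605, "Modem": 825},
    "Vancouver": {"Mobile": 400, "Modem": 512},
}


def pivot(view):
    """Rotate the axes: rows become columns and columns become rows."""
    rotated = {}
    for row_key, row in view.items():
        for column_key, value in row.items():
            rotated.setdefault(column_key, {})[row_key] = value
    return rotated


rotated = pivot(table)
print(rotated)
```

The values are unchanged; only the presentation differs, and pivoting twice returns the original view.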
