Thanks to visit codestin.com
Credit goes to www.scribd.com

0% found this document useful (0 votes)
5 views10 pages

OLAP Operations in The Multidimensional Data Model

dmdw

Uploaded by

Subhadra
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
5 views10 pages

OLAP Operations in The Multidimensional Data Model

dmdw

Uploaded by

Subhadra
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 10

OLAP Operations in the Multidimensional Data Model

In the multidimensional model, the records are organized into various dimensions, and each dimension
includes multiple levels of abstraction described by concept hierarchies. This organization support
users with the flexibility to view data from various perspectives. A number of OLAP data cube
operation exist to demonstrate these different views, allowing interactive queries and search of the
record at hand. Hence, OLAP supports a user-friendly environment for interactive data analysis.

Consider the OLAP operations which are to be performed on multidimensional data. The figure shows
data cubes for sales of a shop. The cube contains the dimensions, location, and time and item, where
the location is aggregated with regard to city values, time is aggregated with respect to quarters,
and an item is aggregated with respect to item types.

Roll-Up
The roll-up operation (also known as drill-up or aggregation operation) performs aggregation on
a data cube, by climbing down concept hierarchies, i.e., dimension reduction. Roll-up is like zooming-
out on the data cubes. Figure shows the result of roll-up operations performed on the dimension
location. The hierarchy for the location is defined as the Order Street, city, province, or state, country.
The roll-up operation aggregates the data by ascending the location hierarchy from the level of the
city to the level of the country.

When a roll-up is performed by dimensions reduction, one or more dimensions are removed from the
cube. For example, consider a sales data cube having two dimensions, location and time. Roll-up may
be performed by removing, the time dimensions, appearing in an aggregation of the total sales by
location, relatively than by location and by timen

Example

Consider the following cubes illustrating temperature of certain days recorded weekly:

Temperature 64 65 68 69 70 71 72 75 80 81 83 85

Week1 1 0 1 0 1 0 0 0 0 0 1 0

Week2 0 0 0 1 0 0 1 2 0 1 0 0

Consider that we want to set up levels (hot (80-85), mild (70-75), cool (64-69)) in temperature from
the above cubes.

To do this, we have to group column and add up the value according to the concept hierarchies. This
operation is known as a roll-up.

By doing this, we contain the following cube:

Temperature cool mild hot

Week1 2 1 1

Week2 2 1 1

The roll-up operation groups the information by levels of temperature.

The following diagram illustrates how roll-up works.


Drill-Down
The drill-down operation (also called roll-down) is the reverse operation of roll-up. Drill-down is
like zooming-in on the data cube. It navigates from less detailed record to more detailed data. Drill-
down can be performed by either stepping down a concept hierarchy for a dimension or adding
additional dimensions.

Figure shows a drill-down operation performed on the dimension time by stepping down a concept
hierarchy which is defined as day, month, quarter, and year. Drill-down appears by descending the
time hierarchy from the level of the quarter to a more detailed level of the month.

Because a drill-down adds more details to the given data, it can also be performed by adding a new
dimension to a cube. For example, a drill-down on the central cubes of the figure can occur by
introducing an additional dimension, such as a customer group.

Example

Drill-down adds more details to the given data

Temperature cool mild hot

Day 1 0 0 0

Day 2 0 0 0

Day 3 0 0 1

Day 4 0 1 0
Day 5 1 0 0

Day 6 0 0 0

Day 7 1 0 0

Day 8 0 0 0

Day 9 1 0 0

Day 10 0 1 0

Day 11 0 1 0

Day 12 0 1 0

Day 13 0 0 1

Day 14 0 0 0

The following diagram illustrates how Drill-down works.

Slice
A slice is a subset of the cubes corresponding to a single value for one or more members of the
dimension. For example, a slice operation is executed when the customer wants a selection on one
dimension of a three-dimensional cube resulting in a two-dimensional site. So, the Slice operations
perform a selection on one dimension of the given cube, thus resulting in a subcube.

For example, if we make the selection, temperature=cool we will obtain the following cube:
Temperature cool

Day 1 0

Day 2 0

Day 3 0

Day 4 0

Day 5 1

Day 6 1

Day 7 1

Day 8 1

Day 9 1

Day 11 0

Day 12 0

Day 13 0

Day 14 0

The following diagram illustrates how Slice works.


Here Slice is functioning for the dimensions "time" using the criterion time = "Q1".

It will form a new sub-cubes by selecting one or more dimensions.

Dice
The dice operation describes a subcube by operating a selection on two or more dimension.

For example, Implement the selection (time = day 3 OR time = day 4) AND (temperature = cool OR
temperature = hot) to the original cubes we get the following subcube (still two-dimensional)

Temperature cool hot

Day 3 0 1

Day 4 0 0

Consider the following diagram, which shows the dice operations.


The dice operation on the cubes based on the following selection criteria involves three dimensions.

o (location = "Toronto" or "Vancouver")


o (time = "Q1" or "Q2")
o (item =" Mobile" or "Modem")

Data Mining - Issues


Data mining is not an easy task, as the algorithms used can get very complex and data is not always
available at one place. It needs to be integrated from various heterogeneous data sources. These
factors also create some issues. Here in this tutorial, we will discuss the major issues regarding −

 Mining Methodology and User Interaction


 Performance Issues
 Diverse Data Types Issues
The following diagram describes the major issues.
Mining Methodology and User Interaction Issues

It refers to the following kinds of issues −


Mining different kinds of knowledge in databases − Different users may be interested in
different kinds of knowledge. Therefore it is necessary for data mining to cover a broad range of
knowledge discovery task.
Interactive mining of knowledge at multiple levels of abstraction − The data mining process
needs to be interactive because it allows users to focus the search for patterns, providing and refining
data mining requests based on the returned results.
Incorporation of background knowledge − To guide discovery process and to express the
discovered patterns, the background knowledge can be used. Background knowledge may be used to
express the discovered patterns not only in concise terms but at multiple levels of abstraction
Data mining query languages and ad hoc data mining − Data Mining Query language that
allows the user to describe ad hoc mining tasks, should be integrated with a data warehouse
query language and optimized for efficient and flexible data mining.
Presentation and visualization of data mining results − Once the patterns are discovered it
needs to be expressed in high level languages, and visual representations. These representations
should be easily understandable.
Handling noisy or incomplete data − The data cleaning methods are required to handle the noise
and incomplete objects while mining the data regularities. If the data cleaning methods are not there
then the accuracy of the discovered patterns will be poor.
Pattern evaluation − The patterns discovered should be interesting because either they
represent common knowledge or lack novelty.

Performance Issues

There can be performance-related issues such as follows −


Efficiency and scalability of data mining algorithms − In order to effectively extract the
information from huge amount of data in databases, data mining algorithm must be efficient and
scalable.
Parallel, distributed, and incremental mining algorithms − The factors such as huge size
of databases, wide distribution of data, and complexity of data mining methods motivate the
development of parallel and distributed data mining algorithms. These algorithms divide the data
into partitions which is further processed in a parallel fashion. Then the results from the
partitions is merged. The incremental algorithms, update databases without mining the data
again from scratch.

Diverse Data Types Issues

Handling of relational and complex types of data − The database may contain complex
data objects, multimedia data objects, spatial data, temporal data etc. It is not possible for one
system to mine all these kind of data.

Mining information from heterogeneous databases and global information systems −


The data is available at different data sources on LAN or WAN. These data source may be
structured, semi structured or unstructured. Therefore mining the knowledge from them adds
challenges to data mining.

Difference between Supervised and Unsupervised Learning


Supervised and Unsupervised learning are the two techniques of machine learning. But both the
techniques are used in different scenarios and with different datasets. Below the explanation of both
learning methods along with their difference table is given.

Supervised Machine Learning:


Supervised learning is a machine learning method in which models are trained using labeled data. In
supervised learning, models need to find the mapping function to map the input variable (X) with the
output variable (Y).

Supervised learning needs supervision to train the model, which is similar to as a student learns things
in the presence of a teacher. Supervised learning can be used for two types of
problems: Classification and Regression.

Learn more Supervised Machine Learning

Pause

Unmute

Current Time 0:08

Duration 18:10
Loaded: 4.40%

Fullscreen

Example: Suppose we have an image of different types of fruits. The task of our supervised learning
model is to identify the fruits and classify them accordingly. So to identify the image in supervised
learning, we will give the input data as well as output for that, which means we will train the model by
the shape, size, color, and taste of each fruit. Once the training is completed, we will test the model by
giving the new set of fruit. The model will identify the fruit and predict the output using a suitable
algorithm.
Unsupervised Machine Learning:
Unsupervised learning is another machine learning method in which patterns inferred from the
unlabeled input data. The goal of unsupervised learning is to find the structure and patterns from the
input data. Unsupervised learning does not need any supervision. Instead, it finds patterns from the
data by its own.

Learn more Unsupervised Machine Learning

Unsupervised learning can be used for two types of problems: Clustering and Association.

Example: To understand the unsupervised learning, we will use the example given above. So unlike
supervised learning, here we will not provide any supervision to the model. We will just provide the
input dataset to the model and allow the model to find the patterns from the data. With the help of a
suitable algorithm, the model will train itself and divide the fruits into different groups according to the
most similar features between them.

The main differences between Supervised and Unsupervised learning are given below:

Supervised Learning Unsupervised Learning

Supervised learning algorithms are Unsupervised learning algorithms are


trained using labeled data. trained using unlabeled data.

Supervised learning model takes direct Unsupervised learning model does not
feedback to check if it is predicting take any feedback.
correct output or not.

Supervised learning model predicts the Unsupervised learning model finds the
output. hidden patterns in data.

In supervised learning, input data is In unsupervised learning, only input


provided to the model along with the data is provided to the model.
output.

The goal of supervised learning is to The goal of unsupervised learning is to


train the model so that it can predict the find the hidden patterns and useful
output when it is given new data. insights from the unknown dataset.

Supervised learning needs supervision Unsupervised learning does not need


to train the model. any supervision to train the model.

Supervised learning can be categorized Unsupervised Learning can be


in Classification and Regression probl classified
ems. in Clustering and Associations probl
ems.

Supervised learning can be used for Unsupervised learning can be used for
those cases where we know the input as those cases where we have only input
well as corresponding outputs. data and no corresponding output
data.

Supervised learning model produces an Unsupervised learning model may


accurate result. give less accurate result as compared
to supervised learning.

Supervised learning is not close to true Unsupervised learning is more close to


Artificial intelligence as in this, we first the true Artificial Intelligence as it
train the model for each data, and then learns similarly as a child learns daily
only it can predict the correct output. routine things by his experiences.

It includes various algorithms such as It includes various algorithms such as


Linear Regression, Logistic Regression, Clustering, KNN, and Apriori alg
Support Vector Machine, Multi-class
Classification, Decision tree, Bayesian
Logic, etc.

You might also like