Data Warehousing & Data Mining (CS/IT-6)

UNIT 5
Data Visualization and Overall Perspective

Part-1 : Aggregation, Historical Information
Part-2 : Query Facility, OLAP Function and Tools, OLAP Servers, ROLAP, MOLAP, HOLAP
Part-3 : Data Mining Interface, Security, Backup and Recovery
Part-4 : Tuning Data Warehouse and Testing Data Warehouse
Part-5 : Warehousing Applications and Recent Trends : Types of Warehousing Applications
Part-6 : Web Mining, Spatial Mining and Temporal Mining

Diagrammatically illustrate and discuss the architecture of MOLAP.
[AKTU 2016-17, Marks 10]
OR
Explain how query performance can be improved by cascading the operations.
[AKTU 2015-16, Marks 1…]
Types of OLAP servers are :
1. Relational OLAP : ROLAP servers are placed between the relational back-end server and the client front-end tools. To store and manage warehouse data, ROLAP uses a relational or extended-relational DBMS.
ROLAP architecture : ROLAP includes the following components :
1. Database server
2. ROLAP server
3. Front-end tool

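[Figure : ROLAP architecture. The front-end tool sends an info request to the ROLAP server (metadata, request processing), which sends SQL to the database server; the result set flows back through the ROLAP server to the front-end tool.]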
Advantages of ROLAP :
1. It can handle large amounts of data.
2. It can leverage functionalities inherent in the relational database.
3. ROLAP servers can be easily used with existing RDBMS.
4. Data can be stored efficiently, since zero facts need not be stored.
5. ROLAP tools do not use pre-calculated data cubes.
6. The DSS server of MicroStrategy adopts the ROLAP approach.
Disadvantages of ROLAP :
1. Performance can be slow.
2. It is limited by SQL functionalities.
3. It is hard to maintain aggregate tables.
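The ROLAP idea can be made concrete with a minimal sketch, assuming Python's built-in `sqlite3` as the relational back end. The hypothetical `rolap_query` function plays the role of the ROLAP server: it turns a multidimensional request ("total sales by these levels") into SQL at query time, so nothing is pre-aggregated. The star-schema tables (`sales_fact`, `store_dim`) and their data are invented for illustration.

```python
import sqlite3

# Hypothetical star schema kept in ordinary relational tables (ROLAP storage).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store_dim (store_id INTEGER PRIMARY KEY, city TEXT, region TEXT);
    CREATE TABLE sales_fact (store_id INTEGER, quarter TEXT, amount REAL);
    INSERT INTO store_dim VALUES
        (1, 'Lucknow', 'North'), (2, 'Kanpur', 'North'), (3, 'Chennai', 'South');
    INSERT INTO sales_fact VALUES
        (1, 'Q1', 100), (1, 'Q2', 150), (2, 'Q1', 80), (3, 'Q1', 200), (3, 'Q2', 50);
""")

# Levels the front end may ask for; the whitelist keeps the generated SQL safe.
ALLOWED_LEVELS = {"city", "region", "quarter"}


def rolap_query(levels):
    """Hypothetical ROLAP server: turn 'total amount by <levels>' into SQL.

    Aggregation is done by the RDBMS at query time; no cube is stored.
    """
    if not levels or not set(levels) <= ALLOWED_LEVELS:
        raise ValueError(f"unsupported levels: {levels}")
    cols = ", ".join(levels)
    sql = (f"SELECT {cols}, SUM(f.amount) "
           "FROM sales_fact AS f JOIN store_dim AS s ON f.store_id = s.store_id "
           f"GROUP BY {cols} ORDER BY {cols}")
    return conn.execute(sql).fetchall()


print(rolap_query(["region"]))             # roll-up to region level
print(rolap_query(["region", "quarter"]))  # finer granularity
```

The whitelist exists because column names are spliced into the SQL text; a real ROLAP engine would generate its SQL from the metadata layer instead.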

2. Multidimensional OLAP :
i. MOLAP stores data in optimized multidimensional array storage, rather than in a relational database.
ii. With multidimensional data stores, the storage utilization may be low if the data set is sparse.
iii. Therefore, many MOLAP servers use two levels of data storage representation to handle dense and sparse data sets.
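The storage-utilization problem can be illustrated with a small sketch using only the standard library: the same three-dimensional cube is held as a dense array (one cell for every member combination, addressed by arithmetic on member positions) and as a sparse dictionary of non-empty cells. The dimension members and sales figures are hypothetical, and real MOLAP servers use their own, far more elaborate, storage formats.

```python
from itertools import product

# Hypothetical dimension members of a small sales cube.
products = ["pen", "book", "bag"]
cities = ["Lucknow", "Kanpur", "Agra", "Noida"]
quarters = ["Q1", "Q2", "Q3", "Q4"]

# Only a few (product, city, quarter) combinations actually have sales.
facts = {("pen", "Lucknow", "Q1"): 40, ("book", "Agra", "Q3"): 15,
         ("bag", "Noida", "Q4"): 7}

# Dense MOLAP-style array: every combination gets a cell (0 = no data),
# laid out in row-major order of (product, city, quarter).
dense = [facts.get(key, 0) for key in product(products, cities, quarters)]


def dense_offset(p, c, q):
    """Row-major position of a cell: direct array indexing, no search."""
    i, j, k = products.index(p), cities.index(c), quarters.index(q)
    return (i * len(cities) + j) * len(quarters) + k


# Sparse representation: store only the non-empty cells.
sparse = dict(facts)

utilization = sum(1 for v in dense if v) / len(dense)
print(len(dense), len(sparse), utilization)
print(dense[dense_offset("book", "Agra", "Q3")])
```

Only 3 of the 48 dense cells hold data here, which is the situation in which a MOLAP server would keep dense sub-cubes as arrays and switch to a sparse or compressed form for the rest.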
MOLAP architecture : MOLAP includes the following components :
1. Database server
2. MOLAP server
3. Front-end tool

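Fig. 5.4.2. MOLAP architecture : the front-end tool sends an info request to the MOLAP server (metadata, request processing), which sends SQL to the database server; the result set flows back through the MOLAP server to the front-end tool.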
Advantages of MOLAP :
1. It is optimal for slice and dice operations.
2. Performance is better than ROLAP when data is dense.
3. It can perform complex calculations.
4. MOLAP allows the fastest indexing to the pre-computed summarized data.
5. It helps users connected to a network who need to analyze larger, less-defined data.
6. It is easier to use, therefore MOLAP is suitable for inexperienced users.
Disadvantages of MOLAP :
1. It is difficult to change dimensions without re-aggregation.
2. MOLAP can handle only a limited amount of data.
3. Some MOLAP methodologies introduce data redundancy.
4. It requires additional investment.
3. Hybrid OLAP :
i. Hybrid OLAP is a combination of both ROLAP and MOLAP.
ii. It offers the higher scalability of ROLAP and the faster computation of MOLAP.
iii. HOLAP servers allow storing large data volumes of detailed information.
iv. The aggregations are stored separately in a MOLAP store.
Advantages of HOLAP :
1. HOLAP provides the advantages of both MOLAP and ROLAP.
2. It provides fast access at all levels of aggregation.
Disadvantages of HOLAP :
HOLAP architecture is very complex because it supports both MOLAP and ROLAP servers.
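A hedged sketch of the HOLAP split: detailed rows stay in a relational table (`sqlite3`), while region-level aggregates are pre-computed into an in-memory store that stands in for the MOLAP side. The hypothetical `holap_query` router answers summary requests from the aggregates and sends drill-down requests to SQL. Table names, data and the routing rule are assumptions for illustration.

```python
import sqlite3
from collections import defaultdict

# Relational store for detailed data (the ROLAP part of HOLAP).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, city TEXT, amount INTEGER)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    ("North", "Lucknow", 100), ("North", "Kanpur", 80),
    ("South", "Chennai", 200), ("South", "Madurai", 50),
])

# Separate store for aggregations (stand-in for the MOLAP part),
# computed once when the cube is loaded.
region_totals = defaultdict(int)
for region, amount in conn.execute("SELECT region, amount FROM sales"):
    region_totals[region] += amount


def holap_query(region, detail=False):
    """Hypothetical router: summaries come from the pre-computed aggregates,
    drill-downs to city level go to the relational detail table."""
    if not detail:
        return region_totals.get(region, 0)
    return conn.execute(
        "SELECT city, amount FROM sales WHERE region = ? ORDER BY city",
        (region,)).fetchall()


print(holap_query("North"))               # answered from aggregate store
print(holap_query("North", detail=True))  # answered by SQL on detail rows
```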
Steps for efficient processing of OLAP queries :
To speed up query processing in data cubes, cuboids are materialized and OLAP index structures are constructed. Given the materialized cuboids, an OLAP query is processed as follows :
1. Determining which operations should be performed on the available cuboids :
a. This involves transforming the operations specified in the query into the corresponding SQL and/or OLAP operators.
b. These operations include roll-up, drill-down, projection, selection, etc.
c. For example, slicing and dicing operations on a data cube can be transformed into selection and/or projection operations on materialized cuboids.
2. Determining on which materialized cuboid(s) the relevant operations should be applied : All of the materialized cuboids that may be useful for answering the query are identified, this set is pruned using the relationships among the cuboids, the cost of using each remaining materialized cuboid is estimated, and the cuboid with the least cost is selected.
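Step 2 can be sketched under simplifying assumptions: each materialized cuboid is described by the set of dimensions it keeps and an estimated row count, which is used as its cost. A cuboid that has already aggregated away a dimension the query needs cannot answer it, so it is pruned; the cheapest remaining cuboid is chosen. The cuboid names and sizes are hypothetical, and the check ignores concept hierarchies (a city-level cuboid could also answer a province-level query by rolling up).

```python
# Hypothetical materialized cuboids: name -> (dimensions kept, estimated rows).
materialized = {
    "base": ({"item", "city", "year", "month"}, 1_000_000),
    "item_city_yr": ({"item", "city", "year"}, 120_000),
    "item_prov_yr": ({"item", "province", "year"}, 8_000),
    "city_yr": ({"city", "year"}, 2_000),
}


def choose_cuboid(query_dims):
    """Return the cheapest materialized cuboid that can answer the query."""
    # Prune cuboids that lack a dimension the query groups by or filters on.
    candidates = {name: rows for name, (dims, rows) in materialized.items()
                  if query_dims <= dims}
    if not candidates:
        raise LookupError("no materialized cuboid can answer this query")
    # Estimated cost = number of rows to scan; pick the least.
    return min(candidates, key=candidates.get)


print(choose_cuboid({"item", "year"}))
print(choose_cuboid({"city", "year"}))
```

Once a cuboid is chosen, the operations from step 1 (selection and projection for a slice or dice, aggregation for a roll-up) are applied to it instead of to the base data.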
Que 5.5. Define and describe the basic similarities and differences among ROLAP, MOLAP and HOLAP.
OR
Compare MOLAP vs HOLAP.
OR
Write a short note on ROLAP vs MOLAP.

Similarities between ROLAP, MOLAP and HOLAP : These three OLAP servers are used to implement data warehouses, and they are related to the model used to represent data.
Differences between ROLAP, MOLAP and HOLAP :

S.No. | Basis | ROLAP | MOLAP | HOLAP
1. | Storage location for detail data | Relational database | Multidimensional database | Relational database
2. | Storage location for summary aggregations | Relational database | Multidimensional database | Multidimensional database
3. | Storage space requirement | Large | Medium | Small
4. | Query-response time | Slow | Fast | Medium
5. | Processing time | Slow | Fast | Fast
6. | Latency | Low | High | Medium

Que 5.6. Give E.F. Codd's 12 guidelines for OLAP.

Dr. E.F. Codd, the father of the relational model, created a list of rules to deal with OLAP systems.
1. Multidimensional conceptual view : The OLAP tool should provide an appropriate multidimensional business model that suits the business problems and requirements.
2. Transparency : The OLAP tool should provide transparency to the input data for the users.
3. Accessibility : The OLAP tool should access only the data required for the analysis needed.
4. Consistent reporting performance : The size of the database should not affect the performance in any way.
5. Client/server architecture : The OLAP tool should use the client-server architecture.
6. Generic dimensionality : Data entered should be equivalent to the structure and operation requirements.
7. Dynamic sparse matrix handling : The OLAP tool should be able to manage sparse matrices and so maintain the level of performance.
8. Multi-user support : The OLAP tool should allow several users working concurrently to work together.
9. Unrestricted cross-dimensional operations : The OLAP tool should be able to perform operations across the dimensions of the cube.
10. Intuitive data manipulation : Operations such as drill-down and roll-up should be possible through direct action on the cells of the model, without menus or multiple trips through the user interface.
11. Flexible reporting : It is the ability of the tool to present the rows and columns in a manner suitable to be analyzed.
12. Unlimited dimensions and aggregation levels : This depends on the kind of business, where multiple dimensions and defining hierarchies can be made.

CONCEPT OUTLINE

* Data Mining Interface (DMI) is a web-based, interactive, dynamic report building module.

Describe data mining interface in detail. [AKTU 2014-15]

1. A data mining interface provides the medium that allows users to communicate with data mining systems.
2. It is difficult for novice users to use a data mining query language directly, so a graphical user interface (GUI) can be used to communicate with the data mining system.
3. A data mining interface may consist of the following functional components :
i. Data collection and data mining query composition : It allows the user to specify task-relevant data sets and to compose data mining queries.
ii. Presentation of discovered patterns : It allows the display of discovered patterns in various forms like tables, graphs, charts, etc.
iii. Hierarchy specification and manipulation : It allows concept hierarchies to be specified, either manually or automatically.
iv. Manipulation of data mining primitives : It allows the dynamic adjustment of data mining operations like the selection, display and modification of concept hierarchies.
v. Interactive multilevel mining : It allows roll-up or drill-down operations on discovered patterns.
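Hierarchy manipulation and interactive multilevel mining can be pictured with a minimal sketch: pattern counts mined at the city level are rolled up to the state level through a user-specified concept hierarchy, and a drill-down returns the city-level counts behind one state. The hierarchy, the counts and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical user-specified concept hierarchy: city -> state.
city_to_state = {"Lucknow": "UP", "Kanpur": "UP", "Pune": "MH", "Mumbai": "MH"}

# Hypothetical counts of one discovered pattern (e.g. "bread => butter")
# at the lowest (city) level of the hierarchy.
pattern_by_city = Counter({"Lucknow": 120, "Kanpur": 45, "Pune": 60, "Mumbai": 200})


def roll_up(counts, hierarchy):
    """Climb one level of the concept hierarchy by summing child counts."""
    rolled = Counter()
    for member, count in counts.items():
        rolled[hierarchy[member]] += count
    return rolled


def drill_down(counts, hierarchy, parent):
    """Return the child-level counts that make up one parent member."""
    return {m: c for m, c in counts.items() if hierarchy[m] == parent}


by_state = roll_up(pattern_by_city, city_to_state)
print(dict(by_state))
print(drill_down(pattern_by_city, city_to_state, "UP"))
```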
The design of a data mining interface should also consider the different classes of users. Users of a data mining system can be classified into two categories : business analysts and business executives.
Que 5.8. Write a short note on backup and recovery. [AKTU 2013-14, Marks 05]
OR
Explain different backup and recovery models in data warehousing. [AKTU 2014-15, Marks 10]

1. Backup and recovery refers to the process of backing up data in case of a loss and setting up systems that allow the data to be recovered after the loss.
2. A data warehouse is a complex system and it contains a huge volume of data.
3. Therefore, it is important to back up all the data so that it is available for recovery in future as per requirement.
4. Some of the backup terminologies are :
a. Complete backup : It backs up the entire database at the same time.
b. Partial backup : Different parts of the database are backed up in a round-robin fashion on a day-to-day basis.
c. Cold backup : Cold backup is taken while the database is completely shut down.
d. Hot backup : Hot backup is taken when the database engine is up and running.
e. Online backup : It is quite similar to hot backup.
5. There are different backup and recovery models :
a. Full recovery model : It provides the most flexibility for recovering the database to an earlier point in time.
b. Bulk-logged recovery model : Bulk-logged recovery provides higher performance and lower log space consumption for certain large-scale operations.
c. Simple recovery model : Simple recovery provides the highest performance and lowest log space consumption, but with significant exposure to data loss in the event of a system failure.
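The round-robin idea behind partial backups can be sketched as a simple schedule: the parts of the database (here, hypothetical tablespace names) are spread over the days of the week so that each day backs up a share and the whole database is covered once per cycle. This illustrates the scheduling idea only; it does not reflect how any particular backup tool plans its work.

```python
from itertools import cycle

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def round_robin_schedule(parts, days=DAYS):
    """Assign each database part to a day in round-robin order."""
    schedule = {day: [] for day in days}
    for part, day in zip(parts, cycle(days)):
        schedule[day].append(part)
    return schedule


# Hypothetical tablespaces of a data warehouse.
tablespaces = [f"ts_sales_{y}" for y in range(2015, 2024)] + ["ts_dims", "ts_stage"]
plan = round_robin_schedule(tablespaces)
for day, parts in plan.items():
    print(day, parts)
```

Every part is backed up exactly once per week, and the daily workload differs by at most one part, which is the point of a partial backup on a large warehouse.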
How are data backup and data recovery managed in a data warehouse? [AKTU 2017-18, Marks 10]

1. Managing the recovery of a large data warehouse is a difficult task, and traditional OLTP backup and recovery strategies may not meet the needs of a data warehouse.
2. We should plan a backup strategy as part of our system design and consider what to back up and how frequently to back up.
3. The most important variables in our backup design are the amount of resources we have to perform a backup or recovery and the recovery time objective.
a. NOLOGGING operations must be taken into account when planning a backup and recovery strategy. Traditional recovery, restoring a backup and applying the changes from the archive log, does not apply to NOLOGGING operations.
b. Never make a backup while a NOLOGGING operation is taking place.
c. Plan for one of the following strategies, or a combination of them :
i. The ETL strategy : Recover a backup that does not contain non-recoverable transactions and replay the ETL that has taken place between the backup and the failure.
ii. The incremental backup strategy : Perform a backup immediately after a non-recoverable transaction has taken place.
The following strategies and best practices for backup and recovery can help us to implement our warehouse's backup and recovery :
1. Use ARCHIVELOG mode.
2. Use RMAN (Recovery Manager).
3. Use read-only tablespaces.
4. Plan for NOLOGGING operations.
5. Not all tablespaces are equally important.
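Why NOLOGGING operations need special planning can be shown with a toy model, assuming the database is just a table-to-row-count mapping: traditional recovery restores the backup and replays the archive log, but a NOLOGGING load never reached the log, so its rows are missing afterwards. The ETL strategy instead re-runs the ETL jobs executed since the backup. The log format, the `nologging` flag and the job names are simplifications, not how Oracle redo or archive logs actually work.

```python
# Toy model: a "database" is a dict of table -> row count.
backup = {"sales": 1000}           # state captured by the last backup

# Hypothetical ETL jobs run after the backup, in order.
# Jobs marked nologging are not written to the archive log.
etl_jobs = [
    {"name": "daily_load_1", "table": "sales", "rows": 50, "nologging": False},
    {"name": "bulk_load_2", "table": "sales", "rows": 500, "nologging": True},
    {"name": "daily_load_3", "table": "sales", "rows": 20, "nologging": False},
]
archive_log = [job for job in etl_jobs if not job["nologging"]]


def apply(db, job):
    """Apply one job's effect (rows added) to the toy database."""
    db[job["table"]] = db.get(job["table"], 0) + job["rows"]


def restore_and_replay(backup, log):
    """Traditional recovery: restore the backup, then replay logged changes."""
    db = dict(backup)
    for job in log:
        apply(db, job)
    return db


def etl_strategy(backup, jobs):
    """ETL strategy: restore the backup, then re-run every ETL job since it."""
    db = dict(backup)
    for job in jobs:
        apply(db, job)
    return db


print(restore_and_replay(backup, archive_log))  # bulk_load_2 is lost
print(etl_strategy(backup, etl_jobs))           # matches state at failure
```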

PART-4
Tuning Data Warehouse and Testing Data Warehouse.

1. Tuning in data warehouses is the process of selecting adequate optimization techniques in order to make queries and updates run faster.
2. A data warehouse is usually accessed by complex queries for key business operations.
3. Therefore, it is more difficult to tune a data warehouse system. Tuning of the data warehouse can be done to improve its performance.
4. Difficulties in data warehouse tuning are :
a. A data warehouse is dynamic; it never remains constant.
b. It is very difficult to predict what query the user is going to pose in the future.
c. Business requirements change with time.
d. Users and their profiles keep changing.
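As one concrete example of "selecting an adequate optimization technique", the sketch below uses SQLite (bundled with Python) to compare the plan of a selective warehouse-style query before and after adding an index. The table, the index name and the data are hypothetical, and the exact wording of `EXPLAIN QUERY PLAN` output differs between SQLite versions, so the checks only look at whether the index is mentioned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(f"R{i % 50}", f"P{i % 7}", float(i)) for i in range(10_000)],
)

QUERY = "SELECT SUM(amount) FROM sales WHERE region = ?"


def plan(sql, params):
    """Return the plan description strings SQLite chooses for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the description


before = plan(QUERY, ("R7",))
conn.execute("CREATE INDEX idx_sales_region ON sales(region)")  # tuning step
after = plan(QUERY, ("R7",))
print(before)
print(after)
print(conn.execute(QUERY, ("R7",)).fetchone()[0])
```

Because users' queries change over time (difficulty b above), such plans have to be re-checked as the workload evolves rather than tuned once.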

Testing is very important for data warehouse systems to make them work correctly and efficiently. There are three basic levels of testing performed on a data warehouse :
i. Unit testing : In unit testing, each component is tested separately. Each module, i.e., procedure, program, SQL script or Unix shell script, is tested. This test is performed by the developer.
ii. Integration testing : In integration testing, the various modules of the application are brought together and then tested against a number of inputs. It is performed to test whether the various components work well after integration.
iii. System testing : In system testing, the whole data warehouse application is tested together. The purpose of system testing is to check whether the entire system works correctly together or not. System testing is performed by the testing team.
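For unit testing, each component is checked on its own. A minimal sketch, assuming a hypothetical transformation step `clean_customer` that trims names, derives a 3-letter city code and rejects rows without an id; the test feeds it known inputs and asserts on the outputs, the way a developer would test a single procedure or SQL script before integration.

```python
def clean_customer(row):
    """Hypothetical ETL transformation for one customer record."""
    if not row.get("id"):
        raise ValueError("customer id is mandatory")
    return {
        "id": int(row["id"]),
        "name": row["name"].strip().title(),
        "city": row["city"].strip().upper()[:3],  # 3-letter city code
    }


def test_clean_customer():
    out = clean_customer({"id": "42", "name": "  asha verma ", "city": "lucknow"})
    assert out == {"id": 42, "name": "Asha Verma", "city": "LUC"}
    # A row without an id must be rejected, not loaded.
    try:
        clean_customer({"id": "", "name": "x", "city": "y"})
    except ValueError:
        pass
    else:
        raise AssertionError("rows without an id must be rejected")


test_clean_customer()
print("unit test passed")
```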
Challenges of data warehouse testing are :
1. Data selection from multiple sources, and the analysis that follows, pose a great challenge.
2. Volume and complexity of the data.
3. Redundant data in a data warehouse.
4. Inconsistent and inaccurate reports.
ETL testing is performed in five stages :
1. Identifying data sources and requirements
2. Data acquisition
3. Implementing business logic and dimensional modeling
4. Building and populating data
5. Building reports
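At the "building and populating data" stage, a common ETL test is source-to-target reconciliation. A minimal sketch with hypothetical rows and a hypothetical `load` step that silently drops negative amounts: the checks compare row counts, a measure total and business keys, and each mismatch points at rows the load lost.

```python
def load(source_rows):
    """Hypothetical load step: drops rows with a negative amount."""
    return [r for r in source_rows if r["amount"] >= 0]


def reconcile(source, target, key="order_id", measure="amount"):
    """Compare source and target data and report the results of each check."""
    src_keys = {r[key] for r in source}
    tgt_keys = {r[key] for r in target}
    return {
        "row_count_match": len(source) == len(target),
        "sum_match": sum(r[measure] for r in source) == sum(r[measure] for r in target),
        "missing_in_target": sorted(src_keys - tgt_keys),
    }


source = [{"order_id": 1, "amount": 100}, {"order_id": 2, "amount": -5},
          {"order_id": 3, "amount": 40}]
report = reconcile(source, load(source))
print(report)
```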

CONCEPT OUTLINE
* Applications of data warehouse are :
i. Airline
ii. Banking
iii. Healthcare
iv. Public sector

Que 5.12. What are the applications of data warehousing?
[AKTU 2016-17, Marks 05]

Answer
Applications of data warehousing are :
1. Airline : In the airline system, it is used for operational purposes like crew assignment, analysis of route profitability, frequent flyer program promotions, etc.
2. Banking : It is widely used in the banking sector to manage the resources available on desk effectively. A few banks also use it for market research and for performance analysis of products and operations.
3. Healthcare : The healthcare sector also uses data warehouses to strategize and predict outcomes, generate patients' treatment reports, and share data with tie-in insurance companies, medical aid services, etc.
4. Public sector : In the public sector, data warehouses are used for intelligence gathering. They help government agencies maintain and analyze tax records and health policy records for every individual.
5. Investment and insurance sector : In this sector, warehouses are primarily used to analyze data patterns and customer trends, and to track market movements.
6. Retail chain : In retail chains, data warehouses are widely used for distribution and marketing. They also help to track items, customer buying patterns and promotions, and are used for determining pricing policy.
7. Telecommunication : A data warehouse is used in this sector for product promotions, sales decisions and distribution decisions.
8. Hospitality industry : This industry uses warehouse services to design as well as estimate its advertising and promotion campaigns, targeting clients based on their feedback and travel patterns.

PART-6
Web Mining, Spatial Mining and Temporal Mining.

CONCEPT OUTLINE
Web mining is of three types :
i. Web content mining
ii. Web usage mining
iii. Web structure mining

Differences among web content mining, web structure mining and web usage mining :

Basis | Web content mining (IR view) | Web content mining (DB view) | Web structure mining | Web usage mining
View of data | a. Unstructured b. Structured | a. Semi-structured b. Website as DB | Link structure | Interactivity
Main data | a. Text documents b. Hypertext documents | a. Hypertext documents | Link structure | a. Server logs b. Browser logs
Representation | a. Bag of words, n-gram terms b. Phrases, concepts or ontology c. Relational | a. Edge-labeled graph b. Relational | Graph | a. Relational table b. Graph
Method | a. Machine learning b. Statistical (including NLP) | a. Proprietary algorithms b. Association rules | Proprietary algorithms | a. Machine learning b. Statistical c. Association rules
Application categories | a. Categorization b. Clustering c. Finding extract rules d. Finding patterns in text | a. Finding frequent substructures b. Web site schema discovery | a. Categorization b. Clustering | a. Site construction b. Adaptation and management

2. In spatial data mining, analysts use geographical or spatial information to produce business intelligence or other results.

The data mining tasks of temporal data mining are :
a. Data characterization and comparison
b. Cluster analysis
c. Classification
d. Association rules
e. Prediction and trend analysis
f. Pattern analysis
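Prediction and trend analysis (task e) can be illustrated with a short sketch: a simple moving average smooths a time series, and an ordinary least-squares line fitted against the time index gives the trend slope and a one-step forecast. The monthly figures are hypothetical, and a real temporal mining system would also handle seasonality and irregular timestamps.

```python
def moving_average(series, window):
    """Smooth a time series with a simple moving average."""
    return [sum(series[i:i + window]) / window
            for i in range(len(series) - window + 1)]


def linear_trend(series):
    """Least-squares slope and intercept of the series against time 0..n-1."""
    n = len(series)
    mean_t = (n - 1) / 2
    mean_y = sum(series) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(series))
    sxx = sum((t - mean_t) ** 2 for t in range(n))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_t


monthly_sales = [10, 12, 13, 15, 16, 18]  # hypothetical, one value per month
slope, intercept = linear_trend(monthly_sales)
forecast = intercept + slope * len(monthly_sales)  # next month
print(moving_average(monthly_sales, 3))
print(round(slope, 3), round(forecast, 2))
```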
Compare and contrast spatial and temporal mining with relevant examples.
[AKTU 2016-17, Marks 15]

S.No. | Spatial mining | Temporal mining
1. | Spatial mining is the extraction of knowledge, spatial relationships and interesting measures that are not explicitly stored in spatial databases. | Temporal mining is the extraction of knowledge about the occurrence of an event, such as whether it follows cyclic or random variations, etc.
2. | It deals with spatial (location, geo-referenced) data. | It deals with implicit or explicit temporal content from large quantities of data.
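As a relevant example on the spatial side, here is a hedged sketch of mining a "near" relationship: for each school it lists the food outlets within a given radius, then reports the share of schools that have at least one such neighbour (a simple co-location measure). The coordinates are hypothetical points on a flat local grid in kilometres, and the brute-force comparison of every pair stands in for the spatial indexes a real system would use.

```python
from math import dist

# Hypothetical point features on a local grid (coordinates in km).
schools = {"S1": (0.0, 0.0), "S2": (10.0, 10.0), "S3": (0.5, 3.0)}
food_outlets = {"F1": (0.3, 0.4), "F2": (10.6, 10.6), "F3": (20.0, 20.0)}


def neighbours(features_a, features_b, radius_km):
    """For every A feature, list the B features within radius_km ('near')."""
    return {a: sorted(b for b, pb in features_b.items() if dist(pa, pb) <= radius_km)
            for a, pa in features_a.items()}


def participation_ratio(near):
    """Share of A features with at least one B neighbour."""
    return sum(1 for bs in near.values() if bs) / len(near)


near = neighbours(schools, food_outlets, radius_km=1.0)
print(near)
print(round(participation_ratio(near), 3))
```

A temporal counterpart would look for the same relationship changing over time, for example whether outlets near schools open in a cyclic pattern, which is where the two kinds of mining meet.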
