Requirements of Cluster Analysis:
1. Scalability: Highly scalable clustering algorithms are needed to deal
with large databases.
2. Ability to deal with different kinds of attributes: Algorithms
should be capable of being applied to any kind of data, such as
interval-based (numerical), categorical, and binary data.
3. Discovery of clusters with arbitrary shape: The clustering
algorithm should be capable of detecting clusters of arbitrary
shape. It should not be restricted to distance measures
that tend to find only small spherical clusters.
4. High dimensionality: The clustering algorithm should be able to
handle not only low-dimensional data but also high-dimensional
data.
5. Ability to deal with noisy data: Databases contain noisy,
missing, or erroneous data. Some algorithms are sensitive to such
data and may produce poor-quality clusters.
6. Interpretability: The clustering results should be
interpretable, comprehensible, and usable.
Attempt any THREE of the following:
a) Write about Business Analysis Framework for Data Warehouse
Design.
Ans. Business framework for DW design:
The business analyst gets information from the data warehouses
to measure performance and make critical adjustments in order to
win over other business holders in the market.
Having a data warehouse offers the following advantages:
i. Since a data warehouse can gather information quickly and
MAHARASHTRA STATE BOARD OF TECHNICAL EDUCATION
(Autonomous)
(ISO/IEC - 27001 - 2005 Certified)
SUMMER - 2024 EXAMINATION
MODEL ANSWER
Subject: Data Warehousing with Mining Techniques Subject Code: 22621
efficiently, it can enhance business productivity.
ii. A data warehouse provides us a consistent view of customers and
items; hence, it helps us manage customer relationships.
iii. A data warehouse also helps in bringing down costs by
tracking trends and patterns over a long period in a consistent and
reliable manner.
To design an effective and efficient data warehouse, we need to
understand and analyze the business needs and construct a business
analysis framework. Each person has a different view regarding the
design of a data warehouse. These views are as follows:
a. The top-down view: This view allows the selection of relevant
information needed for a data warehouse.
b. The data source view: This view presents the information being
captured, stored, and managed by the operational system.
c. The data warehouse view: This view includes the fact tables and
dimension tables. It represents the information stored inside the data
warehouse.
d. The business query view: It is the view of the data from the
viewpoint of the end user.
b) Give the architecture of typical DM system.
Ans. Architecture of DM System:
Data mining means searching for knowledge (interesting patterns or
useful data) in data. Data mining refers to the extraction of useful
information from a large amount of data.
[Diagram: architecture of a typical data mining system]
Attempt any TWO of the following:
How to generate association rules from Frequent Itemsets?
Explain with example.
Ans. To generate association rules from frequent itemsets, follow these
steps:
1. Identify Frequent Itemsets:
• Find all itemsets that meet the minimum support threshold.
2. Generate Association Rules:
• For each frequent itemset I, create rules A → B where A
and B are non-empty, disjoint subsets of I with A ∪ B = I.
3. Calculate Confidence:
• For each rule A → B, calculate:
confidence(A → B) = support(A ∪ B) / support(A)
• Keep rules that meet the minimum confidence threshold.
Example:
Consider the transaction data:
Transaction ID | Items Bought
1 | Milk, Bread
2 | Milk, Diaper, Beer, Bread
3 | Milk, Diaper, Beer
4 | Milk, Bread
5 | Bread, Diaper, Beer
Step 1: Identify Frequent Itemsets
With a minimum support threshold of 60%:
• Single items: {Milk} (80%), {Bread} (80%), {Diaper} (60%), {Beer}
(60%)
• Pairs: {Diaper, Beer} (60%)
Step 2: Generate Association Rules
From {Diaper, Beer}:
• Diaper → Beer
• Beer → Diaper
Step 3: Calculate Confidence
• Diaper → Beer: Confidence = 60% / 60% = 100%
• Beer → Diaper: Confidence = 60% / 60% = 100%
Both rules meet the confidence threshold (70%).
Summary
The association rules are:
• Diaper → Beer (Confidence = 100%)
• Beer → Diaper (Confidence = 100%)
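The three steps above can be sketched in Python. This is a minimal brute-force illustration under stated assumptions, not an optimized Apriori implementation: the function names (`support`, `frequent_itemsets`, `association_rules`) are illustrative, the transactions are the five baskets of the example, and the thresholds are the 60% support and 70% confidence used in the answer. Exact fractions are used so that threshold comparisons are not affected by floating-point rounding.

```python
from fractions import Fraction
from itertools import combinations

# The five transactions of the worked example (IDs 1 to 5).
transactions = [
    {"Milk", "Bread"},
    {"Milk", "Diaper", "Beer", "Bread"},
    {"Milk", "Diaper", "Beer"},
    {"Milk", "Bread"},
    {"Bread", "Diaper", "Beer"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def frequent_itemsets(transactions, min_support):
    """Step 1: enumerate every itemset and keep those meeting min_support."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            s = support(candidate, transactions)
            if s >= min_support:
                frequent[frozenset(candidate)] = s
    return frequent


def association_rules(frequent, min_confidence):
    """Steps 2 and 3: split each frequent itemset I into A -> B (A union B = I)
    and keep the rules whose confidence meets min_confidence."""
    rules = []
    for itemset, sup in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                a = frozenset(antecedent)
                # Every subset of a frequent itemset is itself frequent,
                # so its support is already stored in `frequent`.
                confidence = sup / frequent[a]
                if confidence >= min_confidence:
                    rules.append((set(a), set(itemset - a), confidence))
    return rules


freq = frequent_itemsets(transactions, Fraction(60, 100))
rules = association_rules(freq, Fraction(70, 100))
for a, b, conf in rules:
    print(sorted(a), "->", sorted(b), f"confidence = {float(conf):.0%}")
```

On these five transactions the enumeration also finds {Milk, Bread} at 60% support, so the rules Milk → Bread and Bread → Milk (confidence 75%) pass the 70% threshold alongside the two Diaper/Beer rules worked through above.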
State how to clean missing values of noisy data with example.
Ans. Clean missing values of noisy data:
Consider the data set:
Roll no | Name | Fees | Class
1 | … | … | …
2 | Atal | 1000 | …
3 | Akash | (missing) | FY
4 | Ami | 2000 | FY
Use the attribute mean to fill in the missing value:
The missing value is replaced by the average value of that column or
attribute.
Ex: for the fees of roll no. 3, we can put the mean value, 1500, because it
does not change the average of that column.
Use the most probable value to fill in the missing value:
We can replace the missing value by the most probable value, which is
consistent for that attribute.
Ex: for the fees of roll no. 3, we can put the relevant value as 2000 or 1000.
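Both fill strategies can be sketched in a few lines of Python. This is a minimal sketch assuming the fees column is held as a dictionary keyed by roll number, with `None` marking the missing entry; only the rows whose fees are given in the example (roll no. 2 and 4) are included, and the helper names are illustrative. The most frequent known value is used here as a simple stand-in for the "most probable value", which in practice may instead come from a predictive model.

```python
from statistics import mean, multimode

# Fees by roll number; None marks the missing fee of roll no. 3.
fees = {2: 1000, 3: None, 4: 2000}


def known_values(column):
    """Return the values of the column that are not missing."""
    return [v for v in column.values() if v is not None]


def fill_with_mean(column):
    """Replace every missing value with the mean of the known values."""
    fill = mean(known_values(column))
    return {key: fill if v is None else v for key, v in column.items()}


def most_frequent_candidates(column):
    """Return the most frequent known value(s); ties yield several candidates."""
    return multimode(known_values(column))


filled = fill_with_mean(fees)
print(filled)                          # roll no. 3 now holds the column mean
print(most_frequent_candidates(fees))  # candidate fill values for roll no. 3
```

Because 1000 and 2000 each occur once, both are returned as candidates, which mirrors the "2000 or 1000" choice in the example; filling with the mean leaves the column average unchanged.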
How is modeling performed with Data Cube? Explain with example.
Ans. A multidimensional model views data in the form of a data cube.
A data cube enables data to be modelled and viewed in multiple
dimensions.
A multidimensional data model consists of a fact table and dimension
tables.
Fact Table:
This table contains the primary keys of multiple dimension tables (as
foreign keys). It contains facts or measures like quantity sold, amount sold, etc.
Dimension Table:
This table provides descriptive information for all measures recorded
in the fact table, like product, item, location, time, etc.
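The relationship between a fact table and its dimension tables can be sketched in Python as below. All table contents (product names, cities, quarters, and figures) are hypothetical, and `aggregate` is an illustrative helper, not part of any library. Each fact row stores dimension keys plus measures, and grouping the measures by dimension attributes gives one view of the cube.

```python
from collections import defaultdict

# Dimension tables: primary key -> descriptive attributes (hypothetical data).
product_dim = {
    1: {"name": "Laptop", "category": "Electronics"},
    2: {"name": "Phone", "category": "Electronics"},
    3: {"name": "Desk", "category": "Furniture"},
}
location_dim = {10: {"city": "Pune"}, 20: {"city": "Mumbai"}}
time_dim = {100: {"quarter": "Q1"}, 200: {"quarter": "Q2"}}

# Fact table: keys of the dimension tables plus the measures.
sales_fact = [
    {"product_id": 1, "location_id": 10, "time_id": 100, "quantity_sold": 5, "amount_sold": 5000},
    {"product_id": 2, "location_id": 10, "time_id": 100, "quantity_sold": 3, "amount_sold": 1500},
    {"product_id": 1, "location_id": 20, "time_id": 100, "quantity_sold": 2, "amount_sold": 2000},
    {"product_id": 3, "location_id": 20, "time_id": 200, "quantity_sold": 4, "amount_sold": 1200},
    {"product_id": 2, "location_id": 10, "time_id": 200, "quantity_sold": 6, "amount_sold": 3000},
]


def aggregate(facts, measure, group_by):
    """Sum `measure` over the cells identified by group_by(row)."""
    totals = defaultdict(int)
    for row in facts:
        totals[group_by(row)] += row[measure]
    return dict(totals)


# Two-dimensional view: quantity sold by (city, quarter).
by_city_quarter = aggregate(
    sales_fact,
    "quantity_sold",
    lambda r: (location_dim[r["location_id"]]["city"],
               time_dim[r["time_id"]]["quarter"]),
)

# One-dimensional view: amount sold by product category.
by_category = aggregate(
    sales_fact,
    "amount_sold",
    lambda r: product_dim[r["product_id"]]["category"],
)

print(by_city_quarter)
print(by_category)
```

Changing the `group_by` function selects which dimensions the measure is viewed along, which is the sense in which the same facts can be modelled in multiple dimensions.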
Example of a Sales Data Cube:
Consider a retail company that wants to analyze its sales data. The