DWM Extra

Notes

Uploaded by

pranav41494
MAHARASHTRA STATE BOARD OF TECHNICAL EDUCATION
(Autonomous)
(ISO/IEC - 27001 - 2005 Certified)
SUMMER - 2024 EXAMINATION
MODEL ANSWER
Subject: Data Warehousing with Mining Techniques    Subject Code: 22621

Requirements of Cluster Analysis:
1. Scalability: Highly scalable clustering algorithms are needed to deal with large databases.
2. Ability to deal with different kinds of attributes: Algorithms should be applicable to any kind of data, such as interval-based (numerical), categorical, and binary data.
3. Discovery of clusters with arbitrary shape: The clustering algorithm should be capable of detecting clusters of arbitrary shape. It should not be bound only to distance measures that tend to find spherical clusters of small size.
4. High dimensionality: The clustering algorithm should be able to handle not only low-dimensional data but also high-dimensional space.
5. Ability to deal with noisy data: Databases contain noisy, missing or erroneous data. Some algorithms are sensitive to such data and may produce poor-quality clusters.
6. Interpretability: The clustering results should be comprehensible and usable.

Attempt any THREE of the following:

a) Write about Business Analysis Framework for Data Warehouse Design. 4M

Ans. Business framework for DW design:
The business analyst gets information from the data warehouse to measure performance and make critical adjustments in order to win over other business holders in the market. Having a data warehouse offers the following advantages:
i. Since a data warehouse can gather information quickly and efficiently, it can enhance business productivity.
ii. A data warehouse provides us a consistent view of customers and items; hence, it helps us manage customer relationships.
iii. A data warehouse also helps in bringing down costs by tracking trends and patterns over a long period in a consistent and reliable manner.

To design an effective and efficient data warehouse, we need to understand and analyze the business needs and construct a business analysis framework. Each person has different views regarding the design of a data warehouse. These views are as follows:
a. The top-down view: This view allows the selection of relevant information needed for a data warehouse.
b. The data source view: This view presents the information being captured, stored, and managed by the operational system.
c. The data warehouse view: This view includes the fact tables and dimension tables. It represents the information stored inside the data warehouse.
d. The business query view: It is the view of the data from the viewpoint of the end user.

b) Give the architecture of typical DM system. 4M

Ans. Architecture of DM System:
Data mining means searching for knowledge (interesting patterns or useful data) in data. Data mining refers to the extraction of a small amount of information from a large amount of data.
[Diagram: architecture of a typical data mining system]

Attempt any TWO of the following:

How to generate association rules from frequent itemsets? Explain with example.

Ans. To generate association rules from frequent itemsets, follow these steps:
1. Identify Frequent Itemsets:
   • Find all itemsets that meet the minimum support threshold.
2. Generate Association Rules:
   • For each frequent itemset I, create rules A → B where A and B are non-empty, disjoint subsets of I with A ∪ B = I.
3. Calculate Confidence:
   • For each rule A → B, calculate: confidence(A → B) = support(A ∪ B) / support(A)
   • Keep rules that meet the minimum confidence threshold.

Example
Consider transaction data:

Transaction ID | Items Bought
1              | Milk, Bread
2              | Milk, Diaper, Beer, Bread
3              | Milk, Diaper, Beer
4              | Milk, Bread
5              | Bread, Diaper, Beer

Step 1: Identify Frequent Itemsets
With a minimum support threshold of 60%:
• Single items: {Milk} (80%), {Bread} (80%), {Diaper} (60%), {Beer} (60%)
• Pairs: {Diaper, Beer} (60%), {Milk, Bread} (60%)
All other pairs appear in only 2 of the 5 transactions (40%), so they are not frequent.

Step 2: Generate Association Rules
From {Diaper, Beer}:
• Diaper → Beer
• Beer → Diaper
From {Milk, Bread}:
• Milk → Bread
• Bread → Milk

Step 3: Calculate Confidence
• Diaper → Beer: Confidence = 60% / 60% = 100%
• Beer → Diaper: Confidence = 60% / 60% = 100%
• Milk → Bread: Confidence = 60% / 80% = 75%
• Bread → Milk: Confidence = 60% / 80% = 75%
All four rules meet the confidence threshold (70%).

Summary
The association rules are:
• Diaper → Beer (Confidence = 100%)
• Beer → Diaper (Confidence = 100%)
• Milk → Bread (Confidence = 75%)
• Bread → Milk (Confidence = 75%)

State how to clean missing values of noisy data with example.

Ans. Clean missing values of noisy data:
Consider the data set:

Roll no | Name        | Fees        | Class
1       | [illegible] | [illegible] | [illegible]
2       | Atal        | 1000        | [illegible]
3       | Akash       | (missing)   | FY
4       | Ami         | 2000        | FY

1. Use the attribute mean to fill in the missing value:
Missing value is replaced by the average value of that column or attribute.
Ex: for fees of roll no 3, we can put the mean value as 1500, because it does not change the average of that column.

2. Use the most probable value to fill in the missing value:
We can replace the missing value by the most probable value which is consistent for that attribute.
Ex: for fees of roll no 3, we can put the relevant value as 2000 or 1000.

How is modeling performed with Data Cube? Explain with example.

Ans. A multidimensional model views data in the form of a data cube. A data cube enables data to be modelled and viewed in multiple dimensions.
A multidimensional data model consists of a fact table and dimension tables.

Fact Table:
This table contains the primary keys of multiple dimension tables (held as foreign keys). It contains facts or measures like quantity sold, amount sold, etc.
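To make the fact-table idea concrete, here is a minimal Python sketch. It assumes a handful of hypothetical sales rows; the products, locations, quarters, figures, and the helper name roll_up are illustrative assumptions, not part of the model answer. Each row carries keys that point at dimensions plus the numeric measures, and summing a measure over chosen keys gives the cells of one view of the data cube.

```python
from collections import defaultdict

# Hypothetical fact-table rows: dimension keys plus numeric measures.
# Every value here is made up purely for illustration.
fact_sales = [
    {"product": "Milk", "location": "Pune", "quarter": "Q1",
     "quantity_sold": 10, "amount_sold": 500},
    {"product": "Milk", "location": "Mumbai", "quarter": "Q1",
     "quantity_sold": 5, "amount_sold": 250},
    {"product": "Bread", "location": "Pune", "quarter": "Q1",
     "quantity_sold": 8, "amount_sold": 320},
    {"product": "Milk", "location": "Pune", "quarter": "Q2",
     "quantity_sold": 7, "amount_sold": 350},
]


def roll_up(rows, dims, measure):
    """Sum one measure over the given dimension keys (hypothetical helper)."""
    totals = defaultdict(int)
    for row in rows:
        # The tuple of dimension values identifies one cell of the view.
        key = tuple(row[d] for d in dims)
        totals[key] += row[measure]
    return dict(totals)


# One-dimensional view: total quantity sold per product.
by_product = roll_up(fact_sales, ["product"], "quantity_sold")

# Two-dimensional view: total amount sold per (product, quarter).
by_product_quarter = roll_up(fact_sales, ["product", "quarter"], "amount_sold")

print(by_product)
print(by_product_quarter)
```

Choosing a different list of keys for dims gives a different view of the same facts, which is the sense in which a cube lets data be viewed in multiple dimensions.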
Dimension Table:
This table provides descriptive information for all measures recorded in the fact table, like product, item, location, time, etc.

Example of a Sales Data Cube
Consider a retail company that wants to analyze its sales data. The