Data Mining and Data Warehouse

Uploaded by

hhc407319

Data Mining - Cluster Analysis

A cluster is a group of objects that belong to the same class. In other words, similar objects are grouped in one cluster and dissimilar objects are grouped in another cluster.

What is Clustering?

Clustering is the process of grouping a set of abstract objects into classes of similar objects.

Points to Remember

* A cluster of data objects can be treated as one group.
* While doing cluster analysis, we first partition the set of data into groups based on data similarity and then assign labels to the groups.
* The main advantage of clustering over classification is that it is adaptable to changes and helps single out useful features that distinguish different groups.

Applications of Cluster Analysis

* Cluster analysis is broadly used in many applications such as market research, pattern recognition, data analysis, and image processing.
* Clustering can help marketers discover distinct groups in their customer base and characterize those groups based on purchasing patterns.
* In the field of biology, it can be used to derive plant and animal taxonomies, categorize genes with similar functionalities, and gain insight into structures inherent to populations.
* Clustering helps in the identification of areas of similar land use in an earth observation database. It also helps in the identification of groups of houses in a city according to house type, value, and geographic location.
* Clustering helps in classifying documents on the web for information discovery.
* Clustering is also used in outlier detection.
* Cluster analysis serves as a tool to gain insight into the distribution of data and to observe the characteristics of each cluster.

Requirements of Clustering in Data Mining

The following points describe what is required of clustering algorithms in data mining -

* Scalability - We need highly scalable clustering algorithms to deal with large databases.
* Ability to deal with different kinds of attributes - Algorithms should be applicable to any kind of data, such as interval-based (numerical), categorical, and binary data.
* Discovery of clusters with arbitrary shape - The clustering algorithm should be capable of detecting clusters of arbitrary shape. It should not be limited to distance measures that tend to find only small spherical clusters.
* High dimensionality - The clustering algorithm should be able to handle not only low-dimensional data but also high-dimensional space.
* Ability to deal with noisy data - Databases contain noisy, missing, or erroneous data. Some algorithms are sensitive to such data and may produce poor-quality clusters.
* Interpretability - The clustering results should be interpretable, comprehensible, and usable.

Clustering Methods

Clustering methods can be classified into the following categories -

* Partitioning Method
* Hierarchical Method
* Density-based Method
* Grid-based Method
* Model-based Method
* Constraint-based Method

Partitioning Method

Suppose we are given a database of ‘n’ objects and the partitioning method constructs ‘k’ partitions of the data. Each partition represents a cluster, and k ≤ n. That is, the method classifies the data into k groups that satisfy the following requirements -

* Each group contains at least one object.
* Each object must belong to exactly one group.

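As an illustration only, the sketch below shows one common partitioning procedure, a k-means-style loop. The text does not name k-means; it is assumed here as a representative example. The function name kmeans_partition, the use of the first k points as starting centres, and the Euclidean distance are all choices made for this sketch. Every object receives exactly one label, but the code does not prevent a group from becoming empty (for example, when two starting centres coincide), so that requirement should be checked separately.

```python
import math


def kmeans_partition(points, k, max_iter=100):
    """Split `points` (a list of equal-length tuples) into k groups.

    Returns (labels, centres); labels[i] is the group index of points[i].
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must satisfy 1 <= k <= n")

    # Initial partitioning: take the first k points as starting centres
    # (an arbitrary, assumed choice for this sketch).
    centres = [tuple(p) for p in points[:k]]
    labels = [None] * len(points)

    for _ in range(max_iter):
        # Relocation step: assign every object to the group whose centre is nearest.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centres[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # no object changed group, so the partitioning is stable
        labels = new_labels

        # Recompute each centre as the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty group keeps its previous centre
                centres[c] = tuple(sum(x) / len(members) for x in zip(*members))

    return labels, centres


points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8.5, 9.5)]
labels, centres = kmeans_partition(points, k=2)
print(labels)
print(centres)
```

Because each object carries a single label, the "exactly one group" requirement holds by construction; the result depends on the starting centres, so a different initial partitioning may give a different grouping.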
Points to remember -

* For a given number of partitions (say k), the partitioning method creates an initial partitioning.
* It then uses an iterative relocation technique to improve the partitioning by moving objects from one group to another.

Hierarchical Methods

This method creates a hierarchical decomposition of the given set of data objects. We can classify hierarchical methods on the basis of how the hierarchical decomposition is formed. There are two approaches here -

* Agglomerative Approach
* Divisive Approach

Agglomerative Approach

This approach is also known as the bottom-up approach. We start with each object forming a separate group. The method keeps merging the objects or groups that are close to one another until all of the groups are merged into one or the termination condition holds.

Divisive Approach

This approach is also known as the top-down approach. We start with all of the objects in the same cluster. In each successive iteration, a cluster is split into smaller clusters. This continues until each object is in its own cluster or the termination condition holds. Hierarchical methods are rigid, i.e., once a merging or splitting is done, it can never be undone.

Approaches to Improve Quality of Hierarchical Clustering

Here are two approaches that are used to improve the quality of hierarchical clustering -

* Perform careful analysis of object linkages at each hierarchical partitioning.
* Integrate hierarchical agglomeration by first using a hierarchical agglomerative algorithm to group objects into micro-clusters, and then performing macro-clustering on the micro-clusters.

Density-based Method

This method is based on the notion of density.
The basic idea is to continue growing a given cluster as long as the density in the neighborhood exceeds some threshold, i.e., for each data point within a given cluster, the neighborhood of a given radius has to contain at least a minimum number of points.

Grid-based Method

In this method, the objects together form a grid. The object space is quantized into a finite number of cells that form a grid structure.

Advantages

* The major advantage of this method is fast processing time.
* The processing time depends only on the number of cells in each dimension of the quantized space.

Model-based Method

In this method, a model is hypothesized for each cluster, and the method finds the best fit of the data to the given model. It locates clusters by constructing a density function that reflects the spatial distribution of the data points.

This method also provides a way to automatically determine the number of clusters based on standard statistics, taking outliers or noise into account. It therefore yields robust clustering methods.

Constraint-based Method

In this method the clustering is performed by
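Returning to the density-based method described earlier, the growth rule (keep extending a cluster while the neighborhood of a given radius holds at least a minimum number of points) can be sketched roughly as follows. This is a DBSCAN-style sketch; the text does not name DBSCAN, and the function name density_cluster, the parameter names eps (radius) and min_pts (minimum number of points), the Euclidean distance, and the use of -1 for noise are all assumptions made for illustration.

```python
import math


def density_cluster(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""

    def neighbours(i):
        # Indices of all points within distance eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster_id = 0

    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # neighborhood too sparse to start a cluster
            continue

        # Point i is dense: start a new cluster and grow it outwards.
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a dense point
            if labels[j] is not UNVISITED:
                continue  # already assigned; do not expand again
            labels[j] = cluster_id
            nearby = neighbours(j)
            if len(nearby) >= min_pts:
                queue.extend(nearby)  # j is dense too, so keep growing
        cluster_id += 1

    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10),
          (50, 50)]
print(density_cluster(points, eps=1.5, min_pts=3))
```

Unlike the partitioning approach, the number of clusters is not supplied in advance here; it follows from the chosen radius and minimum count, and isolated points are left unassigned as noise.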
