
MCQ

1. The k-means clustering algorithm is an example of which type of clustering method?

1. Hierarchical 2. Partitioning 3. Density based 4. Random

2. Which of the following statements correctly describes the difference between agglomerative and divisive
clustering techniques?

1. Agglomerative is a bottom-up technique, but divisive is a top-down technique

2. Agglomerative is a top-down technique, but divisive is a bottom-up technique

3. Agglomerative technique can start with a single cluster

4. Divisive technique can end with a single cluster

3. Which of the following is an advantage of the k-medoids algorithm over the k-means algorithm?

1. both are equally error prone

2. k-medoids can handle larger data set than k-means

3. k-medoids helps in reducing the effect of outliers in the objects

4. k-medoids needs less computation to arrive at the final clustering

4. The principle underlying the Market Basket Analysis is known as

1. Association rule 2. Bisecting rule 3. k-means 4. Bayes’ theorem

5. A Voronoi diagram is used in which type of clustering?

1. Hierarchical 2. Partitioning 3. Density based 4. Intuition based

6. SSE of a clustering measures:

1. Initial number of set clusters

2. Number of clusters generated

3. Cost of clustering

4. Quality of clustering

7. One of the disadvantages of the k-means algorithm is that outliers may reduce the quality of the
final clustering.

1. True 2. False

8. Which of the following can be possible termination conditions in K-Means?

1. For a fixed number of iterations.

2. Assignment of observations to clusters does not change between iterations (except for cases with
a bad local minimum).
3. Centroids do not change between successive iterations.

4. All of the above

9. Which of the following clustering algorithms is most sensitive to outliers?

1. K-means clustering algorithm

2. K-medians clustering algorithm

3. K-medoids clustering algorithm

4. K-modes clustering algorithm

10. In which of the following situations does k-means clustering fail to give good results?

1. Data points with outliers

2. Data points with different densities

3. Data points with round shapes

4. All of the above

Short Questions

1. How is unsupervised learning different from supervised learning? Explain with some examples.

UNSUPERVISED VS SUPERVISED LEARNING

So far, we have discussed supervised learning, where the aim is to predict the outcome variable Y on the basis of the feature set X₁, X₂, …, Xₙ, and we discussed methods such as regression and classification for that purpose. We now introduce the concept of unsupervised learning, where the objective is to observe only the features X₁, X₂, …, Xₙ; we are not going to predict any outcome variable, but rather our intention is to find the associations between the features or their groupings in order to understand the nature of the data. This analysis may reveal an interesting correlation between the features or a common behaviour within a subgroup of the data, which provides a better understanding of the data.

In terms of statistics, a supervised learning algorithm tries to learn the probability of the outcome Y for a particular input X, which is called the posterior probability. Unsupervised learning, by contrast, is closely related to density estimation in statistics. Here, every input and the corresponding target are concatenated to create a new set of inputs such as {(X₁, Y₁), (X₂, Y₂), …, (Xₙ, Yₙ)}, which leads to a better understanding of the correlation of X and Y; this probability notation is called the joint probability.

Let us take an example of how unsupervised learning helps in pushing movie promotions to the correct group of people. In earlier days, movie promotions were a blind push of the same content to all demographics, so everyone watched the same posters or trailers irrespective of their choice or preference. In most cases, the person watching the promotion or trailer would end up ignoring it, which led to a waste of effort and money on the promotion. But with the advent of smart devices and apps, there is now a huge database available for understanding what type of movie is liked by which segment of the demography. Machine learning helps to find the patterns or repeated behaviour of the smaller groups/clusters within this database, providing intelligence about the liking or disliking of certain types of movies by different groups. Using this intelligence, smart apps can push only the relevant movie promotions or trailers to the selected groups, which significantly increases the chance of reaching the right interested person for the movie.
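To make the contrast concrete, here is a minimal sketch (synthetic data; scikit-learn is used as an illustrative library, not the chapter's own example) where the same features are used with and without an outcome variable:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Synthetic feature set X with a known outcome variable y
X, y = make_blobs(n_samples=200, centers=3, random_state=0)

# Supervised: the outcome y is given, and the model learns to predict it
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Unsupervised: only the features X are observed; the algorithm groups
# the data without any outcome variable
groups = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
```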

2. Mention a few application areas of unsupervised learning.

APPLICATION OF UNSUPERVISED LEARNING

Because of its flexibility in working on uncategorized and unlabelled data, there are many domains where unsupervised learning finds its application. A few examples of such applications are as follows:

- Segmentation of target consumer populations by an advertisement consulting agency on the basis of a few dimensions such as demography, financial data, purchasing habits, etc., so that the advertisers can reach their target consumers efficiently
- Anomaly or fraud detection in the banking sector by identifying the pattern of loan defaulters
- Image processing and image segmentation, such as face recognition, expression identification, etc.
- Grouping of important characteristics in genes to identify important influencers in new areas of genetics
- Utilization by data scientists to reduce the dimensionality of sample data to simplify modelling
- Document clustering and identifying potential labelling options

3. What are the three broad categories of clustering techniques? Explain the characteristics of
each briefly.

4. How is the quality of clustering measured in the k-means algorithm?

5. Describe the main difference in the approach of the k-means and k-medoids algorithms with a neat
diagram.

6. What is a dendrogram? Explain its use.

7. What is SSE? What is its use in the context of the k-means algorithm?
The assumption behind selecting random centroids is that multiple subsequent runs will minimize the SSE
and identify the optimal clusters. But this is often not true, depending on the spread of the data set and
the number of clusters sought. So, one effective approach is to employ the hierarchical clustering
technique on sample points from the data set and then arrive at sample K clusters. The centroids of
these initial K clusters are used as the initial centroids. This approach is practical when the data set has
a small number of points and K is relatively small compared to the number of data points. There are
procedures such as bisecting k-means and the use of postprocessing to fix initial clustering issues; these
procedures can produce better-quality initial centroids and thus a better SSE for the final clusters.
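As a minimal sketch of the measure itself (assuming NumPy arrays for the points, labels, and centroids; the names are illustrative), SSE is the total squared distance of each point from the centroid of its assigned cluster, so a lower SSE indicates a better-quality clustering:

```python
import numpy as np

def sse(points, labels, centroids):
    """Sum of squared errors: total squared distance of each point
    from the centroid of the cluster it is assigned to."""
    return sum(
        np.sum((points[labels == k] - c) ** 2)
        for k, c in enumerate(centroids)
    )
```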

8. Explain the k-means method with a step-by-step algorithm.

9. Describe the concept of single link and complete link in the context of hierarchical clustering.

In agglomerative clustering, the merging iterations may be stopped once the minimum distance between two
neighbouring clusters becomes less than a user-defined threshold. So, when an algorithm uses the
minimum distance D_min to measure the distance between clusters, it is referred to as a nearest
neighbour clustering algorithm, and if the decision to stop the algorithm is based on a user-defined limit
on D_min, then it is called a single linkage algorithm. On the other hand, when an algorithm uses the
maximum distance D_max to measure the distance between clusters, it is referred to as a furthest
neighbour clustering algorithm, and if the decision to stop the algorithm is based on a user-defined limit
on D_max, then it is called a complete linkage algorithm.
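A minimal sketch of the two linkages using SciPy's standard hierarchical clustering routines (the data points here are illustrative):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

X = np.array([[1, 2], [1, 4], [5, 8], [6, 8], [9, 1]])

# Single linkage: cluster distance is the MINIMUM pairwise distance
Z_single = linkage(X, method="single")

# Complete linkage: cluster distance is the MAXIMUM pairwise distance
Z_complete = linkage(X, method="complete")

# Stop merging once the linkage distance exceeds a user-defined
# threshold, which yields the flat clusters
labels = fcluster(Z_single, t=3.0, criterion="distance")
```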

10. How does the apriori principle help in reducing the calculation overhead for a market basket analysis?
Provide an example to explain.

For example, if a seller is dealing with 100 different items, then the learner needs to evaluate 2^100 ≈ 1.27 × 10^30
itemsets to arrive at the rules, which is computationally impossible. So, it is important to filter out the
most important (and thus manageable in size) itemsets and spend the resources on those to arrive at
reasonably efficient association rules. The first step for us is to decide the minimum support and
minimum confidence of the association rules. From a set of transactions T, let us assume that we will find
all the rules that have support ≥ minS and confidence ≥ minC, where minS and minC are the support
and confidence thresholds, respectively, for the rules to be considered acceptable. Now, even if we put
minS = 20% and minC = 50%, it is seen that more than 80% of the rules are discarded; this means
that a large portion of the computational effort could have been avoided if the itemsets for
consideration were first pruned, and the itemsets which cannot generate association rules with
reasonable support and confidence were removed.
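A minimal sketch of the pruning idea (illustrative transactions and helper names, not a full apriori implementation): if an itemset fails the minimum support, every superset of it must also fail, so those supersets are never counted:

```python
from itertools import combinations

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Egg", "Milk"},
    {"Milk", "Egg"},
    {"Bread", "Butter"},
]
min_support = 0.5  # minS

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Frequent 1-itemsets; items failing minS are pruned, and with them
# every candidate itemset that would contain them
frequent_items = [i for i in sorted(set().union(*transactions))
                  if support({i}) >= min_support]

# Candidate 2-itemsets are built only from the surviving items, which
# is how the apriori principle cuts the calculation overhead
frequent_pairs = [set(pair) for pair in combinations(frequent_items, 2)
                  if support(set(pair)) >= min_support]
```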

Long Questions

2. Explain how Market Basket Analysis uses the concepts of association analysis.

=>In market basket analysis, association rules are used to predict the likelihood of products being
purchased together. Association rules count the frequency of items that occur together, seeking to find
associations that occur far more often than expected.

=>Examples of market basket analysis

The Amazon website employs a well-known example of market basket analysis. On a product page,
Amazon presents users with related products, under the headings of “Frequently bought together” and
“Customers who bought this item also bought.”

=>Benefits of market basket analysis

Market basket analysis can increase sales and customer satisfaction. By using data to determine which
products are often purchased together, retailers can optimize product placement, offer special deals and
create new product bundles to encourage further sales of these combinations.

3. Explain the Apriori algorithm for association rule learning with an example.

=>Steps Involved in the Apriori Algorithm

1. Set a minimum value for support and confidence. This means that we are only interested in finding rules for items that have a certain minimum presence (support) and a minimum value for co-occurrence with other items (confidence).

2. Extract all the subsets having a higher value of support than the minimum threshold.

3. Select all the rules from the subsets with a confidence value higher than the minimum threshold.

4. Order the rules by descending order of lift.
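These four steps can be sketched with the mlxtend library (an illustrative choice; the transactions and thresholds below are made up):

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# One row per transaction, one boolean column per item
df = pd.DataFrame(
    [[True, True, False], [True, True, True], [False, True, True]],
    columns=["Bread", "Milk", "Egg"],
)

# Steps 1-2: set the support threshold and extract frequent itemsets
frequent = apriori(df, min_support=0.5, use_colnames=True)

# Step 3: keep only the rules above the confidence threshold
rules = association_rules(frequent, metric="confidence", min_threshold=0.5)

# Step 4: order the rules by descending lift
rules = rules.sort_values("lift", ascending=False)
```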


9.5.3 The apriori algorithm for association rule learning

As discussed earlier, the main challenge of discovering an association rule and learning from it is the large volume of transactional data and the related complexity. Because of the variation of features in transactional data, the number of feature sets within a data set usually becomes very large. This leads to the problem of handling a very large number of itemsets, which grows exponentially with the number of features. If there are k items which may or may not be part of an itemset, then there are 2^k ways of creating itemsets with those items. For example, if a seller is dealing with 100 different items, then the learner needs to evaluate 2^100 ≈ 1.27 × 10^30 itemsets to arrive at the rules, which is computationally impossible. So, it is important to filter out the most important (and thus manageable in size) itemsets and spend the resources on those to arrive at reasonably efficient association rules.

The first step for us is to decide the minimum support and minimum confidence of the association rules. From a set of transactions T, let us assume that we will find all the rules that have support ≥ minS and confidence ≥ minC, where minS and minC are the support and confidence thresholds, respectively, for the rules to be considered acceptable. Now, even if we put minS = 20% and minC = 50%, it is seen that more than 80% of the rules are discarded; this means that a large portion of the computational effort could have been avoided if the itemsets for consideration were first pruned, and the itemsets which cannot generate association rules with reasonable support and confidence were removed. The approach to achieve this goal is discussed below.

Step 1: decouple the support and confidence requirements. According to formula 9.8, the support of the rule X → Y is dependent only on the support of its corresponding itemset. For example, all the rules below have the same support because their itemsets are the same, {Bread, Milk, Egg}:

{Bread, Milk} → {Egg}
{Bread, Egg} → {Milk}
{Egg, Milk} → {Bread}
{Bread} → {Egg, Milk}
{Milk} → {Bread, Egg}
{Egg} → {Bread, Milk}

So, the same treatment can be applied to the association rules on the basis of the frequency of the itemset. In this case, if the itemset {Bread, Milk, Egg} is rare in the basket transactions, then all six rules can be discarded without computing their individual support and confidence values. This identifies some important strategies for arriving at the association rules:

1. Generate Frequent Itemsets: Once minS is set for a particular assignment, identify all the itemsets that satisfy minS. These itemsets are called frequent itemsets.
2. Generate Rules: From the frequent itemsets found in the previous step, discover all the high-confidence rules. These are called strong rules.
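A small worked sketch of this point (illustrative transactions): every rule derived from the same itemset shares one support value, while confidence depends on how the itemset is split:

```python
transactions = [
    {"Bread", "Milk", "Egg"},
    {"Bread", "Milk"},
    {"Bread", "Egg", "Milk"},
    {"Milk"},
    {"Bread", "Butter"},
]

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Support of the whole itemset divided by support of the antecedent."""
    return support(antecedent | consequent) / support(antecedent)

# One support value shared by all six rules over {Bread, Milk, Egg}
print(support({"Bread", "Milk", "Egg"}))       # 0.4
# Confidence differs per rule
print(confidence({"Bread", "Milk"}, {"Egg"}))  # 0.666...
print(confidence({"Egg"}, {"Bread", "Milk"}))  # 1.0
```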

4. How is the distance between clusters measured in hierarchical clustering? Explain the use of this measure in deciding when to stop the iteration.

5. How are the cluster centroids recomputed in the k-means algorithm?

1. Initialize cluster centroids.

2. Assign data points to clusters.

3. Update cluster centroids.

4. Repeat steps 2–3 until the stopping condition is met.

OR
1. Select k centroids. These will be the centre points for each segment.
2. Assign each data point to its nearest centroid.
3. Reassign each centroid's value to be the calculated mean of its cluster.
4. Reassign data points to their nearest centroids.
5. Repeat until the data points stay in the same clusters.
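A minimal from-scratch sketch of these steps in NumPy (synthetic data; not the textbook's own implementation). The key point is step 3: each centroid is recomputed as the mean of the points currently assigned to it:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))  # synthetic data points
k = 3

# 1. Initialize centroids from k randomly chosen data points
centroids = X[rng.choice(len(X), size=k, replace=False)]

for _ in range(100):
    # 2. Assign each point to its nearest centroid
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # 3. Recompute each centroid as the mean of its assigned points
    #    (an empty cluster would need special handling in practice)
    new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    # 4. Stop once the centroids no longer change
    if np.allclose(new_centroids, centroids):
        break
    centroids = new_centroids
```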

6. Discuss one technique to choose the appropriate number of clusters at the beginning of a clustering exercise.

7. Discuss the strengths and weaknesses of the k-means algorithm.

8. Explain the concept of clustering with a neat diagram.

Clustering is the task of dividing the population or data points into a number of
groups such that data points in the same group are more similar to each other
than to the data points in other groups. It is basically a grouping of objects
on the basis of the similarity and dissimilarity between them.
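A minimal sketch of such a diagram (synthetic data; scikit-learn and matplotlib are illustrative choices): points within the same colour group are similar to one another and dissimilar to points in other groups:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Three well-separated groups of points
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)
labels = KMeans(n_clusters=3, n_init=10, random_state=42).fit_predict(X)

# Scatter plot coloured by cluster membership
plt.scatter(X[:, 0], X[:, 1], c=labels)
plt.title("Clustering: similar points grouped together")
plt.show()
```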
