This document discusses classification and clustering techniques in data analysis. It covers various classification methods such as Decision Trees, K-Nearest Neighbors, Logistic Regression, and Discriminant Analysis, along with their definitions, advantages, and use cases. Additionally, it introduces clustering techniques like K-Means, Hierarchical Clustering, and DBSCAN, as well as Market Basket Analysis for identifying associations between purchased items.

Uploaded by

ASOK KUMAR

UNIT – IV CLASSIFICATION AND CLUSTERING TECHNIQUES

1. Classification Techniques
Definition: Classification is a supervised learning technique where the goal is
to assign labels (categories) to data based on input features.
A. Decision Trees
• Definition: A tree-like structure where data is split based on conditions to
reach a decision.
• Components:
o Root Node: Represents the entire dataset.
o Internal Nodes: Represent tests on attributes.
o Leaf Nodes: Represent the outcome (class label).
• Advantages:
o Easy to understand and interpret.
o Works well with both numerical and categorical data.
• Use Case: Classifying loan applications as "Approved" or "Rejected"
based on income, credit score, etc.
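To make the root, internal, and leaf roles concrete, the loan use case can be sketched as a small hand-built tree. This is only an illustration: the feature names, the thresholds (650 and 40000), and the `classify` helper are assumptions, not values learned from data or taken from these notes.

```python
# Hypothetical hand-built tree; thresholds are illustrative, not learned.
tree = {
    "feature": "credit_score", "threshold": 650,    # root node test
    "left": {"label": "Rejected"},                   # leaf: credit_score < 650
    "right": {                                       # internal node
        "feature": "income", "threshold": 40000,
        "left": {"label": "Rejected"},               # leaf: income < 40000
        "right": {"label": "Approved"},              # leaf: income >= 40000
    },
}

def classify(node, applicant):
    """Walk from the root to a leaf, following one test per internal node."""
    while "label" not in node:
        go_left = applicant[node["feature"]] < node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["label"]

print(classify(tree, {"credit_score": 700, "income": 55000}))  # Approved
print(classify(tree, {"credit_score": 600, "income": 90000}))  # Rejected
```

In practice the tests and thresholds would be chosen by a training algorithm; the sketch only shows how a finished tree turns inputs into a class label.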
B. K-Nearest Neighbors (KNN)
• Definition: Classifies a new data point based on the majority class of its
k nearest neighbors in the dataset.
• Steps:
1. Choose k (number of neighbors).
2. Calculate distance (e.g., Euclidean) between new point and all
others.
3. Assign the class most common among k neighbors.
• Use Case: Recommender systems (e.g., suggesting products similar to
others liked by a user).
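The three KNN steps can be sketched in a few lines of plain Python. The function name `knn_predict` and the toy data are assumptions made for illustration; a real application would also scale the features and pick k by validation.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (features, label) pairs; query: a feature tuple."""
    # Step 2: sort training points by Euclidean distance to the query.
    by_distance = sorted(train, key=lambda pair: math.dist(pair[0], query))
    # Step 3: majority vote among the k closest points.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Toy data (hypothetical): two well-separated groups labeled "A" and "B".
train = [((1.0, 1.0), "A"), ((1.2, 0.8), "A"), ((0.9, 1.1), "A"),
         ((5.0, 5.0), "B"), ((5.2, 4.9), "B"), ((4.8, 5.1), "B")]

print(knn_predict(train, (1.1, 1.0), k=3))  # A
print(knn_predict(train, (4.9, 5.0), k=3))  # B
```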
C. Logistic Regression
• Definition: A statistical model used for binary classification (e.g.,
Yes/No, 0/1).
• Formula:
$$P(Y=1) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n)}}$$
• Output: Probability of class membership (converted to 0 or 1 using a
threshold).
• Use Case: Predicting if a customer will buy a product or not.
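As a minimal sketch of the formula and the thresholding step, the snippet below evaluates P(Y=1) for given coefficients. The coefficient values, the function names, and the 0.5 threshold are assumptions chosen for illustration; in practice the coefficients are estimated from training data.

```python
import math

def predict_proba(x, beta0, betas):
    """P(Y=1) = 1 / (1 + exp(-(beta0 + sum_i beta_i * x_i)))."""
    z = beta0 + sum(b * xi for b, xi in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-z))

def predict_class(x, beta0, betas, threshold=0.5):
    """Convert the probability into a 0/1 label using a threshold."""
    return 1 if predict_proba(x, beta0, betas) >= threshold else 0

# Hypothetical coefficients: intercept -3.0, one feature with weight 1.0.
print(predict_proba([3.0], -3.0, [1.0]))   # 0.5, because the linear term is 0
print(predict_class([1.0], -3.0, [1.0]))   # 0, probability is below 0.5
print(predict_class([5.0], -3.0, [1.0]))   # 1, probability is above 0.5
```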
D. Discriminant Analysis
• Definition: A classification technique that separates classes using
combinations of features: linear in LDA, quadratic in QDA.
• Types:
o Linear Discriminant Analysis (LDA): Assumes all classes share the same
covariance matrix, which gives linear decision boundaries.
o Quadratic Discriminant Analysis (QDA): Allows each class its own
covariance matrix, which gives quadratic decision boundaries.
• Use Case: Classifying customers into loyalty tiers (e.g., Silver, Gold,
Platinum).
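The difference between LDA and QDA can be seen in a simplified one-feature sketch, where the covariance matrix reduces to a single variance. Everything here is an assumption for illustration: the spend figures, the two tiers, and the `fit`/`predict` helpers, which model each class as a Gaussian and pick the class with the highest discriminant score.

```python
import math
from statistics import mean, variance

def fit(samples_by_class, shared_variance):
    """Estimate prior, mean, and variance per class from 1-D samples."""
    n_total = sum(len(xs) for xs in samples_by_class.values())
    params = {
        label: {"prior": len(xs) / n_total, "mean": mean(xs), "var": variance(xs)}
        for label, xs in samples_by_class.items()
    }
    if shared_variance:
        # LDA assumption: pool the per-class variances into one shared value.
        pooled = sum(
            (len(xs) - 1) * params[label]["var"]
            for label, xs in samples_by_class.items()
        ) / (n_total - len(samples_by_class))
        for p in params.values():
            p["var"] = pooled
    return params

def predict(params, x):
    """Return the class with the highest Gaussian discriminant score."""
    def score(p):
        return (math.log(p["prior"]) - 0.5 * math.log(p["var"])
                - (x - p["mean"]) ** 2 / (2 * p["var"]))
    return max(params, key=lambda label: score(params[label]))

# Hypothetical monthly spend per tier; Gold is far more spread out than Silver.
spend = {"Silver": [10, 12, 11, 9, 13], "Gold": [20, 30, 25, 15, 35]}
lda = fit(spend, shared_variance=True)    # one pooled variance
qda = fit(spend, shared_variance=False)   # one variance per class

print(predict(lda, 16), predict(qda, 16))  # Silver Gold
```

With a shared variance the boundary sits midway between the class means, so 16 falls on the Silver side; with per-class variances the wide Gold distribution claims the same point.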

2. Clustering Techniques
Definition: Clustering is an unsupervised learning method where the goal is to
group data points into clusters based on similarity.
Key Concept: Unlike classification, clustering does not use pre-labeled
data.
Popular Methods:
• K-Means Clustering:
o Divides data into k clusters by minimizing the distance between each
data point and the centroid of its assigned cluster.
o Use Case: Customer segmentation.
• Hierarchical Clustering:
o Builds a tree of clusters using either a bottom-up (agglomerative)
or top-down (divisive) approach.
o Use Case: Grouping documents or behaviors with unknown
categories.
• DBSCAN:
o Groups points that lie in dense regions rather than assigning them to
the nearest centroid; good for arbitrary-shaped clusters, and points in
sparse regions are flagged as noise.
o Use Case: Detecting anomalies or fraud.
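For K-Means, the idea of alternating between assigning points to the nearest centroid and recomputing centroids can be sketched as follows. The `kmeans` function, the fixed iteration count, and the choice of the first k points as starting centroids are simplifying assumptions; library implementations typically use better initialization and a convergence check.

```python
import math

def kmeans(points, k, iterations=10):
    """Plain K-Means sketch: returns (centroids, label per point)."""
    centroids = list(points[:k])  # naive start: the first k points (assumption)
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    labels = [min(range(k), key=lambda i: math.dist(p, centroids[i]))
              for p in points]
    return centroids, labels

# Toy data (hypothetical): two visually obvious groups.
points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, labels = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 0, 1, 1]
```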

3. Market Basket Analysis


• Definition: A technique used to find associations between items
purchased together.
• Goal: Identify frequent itemsets and generate association rules.
Key Terms:
• Support: Fraction of all transactions that contain the itemset.
• Confidence: Likelihood that item B is bought when item A is bought,
i.e., support(A and B) / support(A).
• Lift: Strength of a rule over random occurrence, i.e.,
confidence(A → B) / support(B). A lift above 1 means A and B are bought
together more often than expected if they were independent.
Use Case:
• In a retail store:
o If customers buy bread and butter, they also tend to buy milk.
o Rule: {Bread, Butter} → {Milk}
• Enables cross-selling and store layout optimization.
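The three key terms can be computed directly for the rule {Bread, Butter} → {Milk}. The eight transactions below are made up for illustration, and the helper names `support`, `confidence`, and `lift` are assumptions rather than any particular library's API.

```python
# Hypothetical transaction log; each set is one customer's basket.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread"},
    {"milk"},
    {"eggs"},
    {"eggs", "bread"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """How often the consequent appears in baskets containing the antecedent."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence relative to the consequent's baseline frequency."""
    return confidence(antecedent, consequent) / support(consequent)

a, b = {"bread", "butter"}, {"milk"}
print(support(a | b))     # 0.375
print(confidence(a, b))   # 0.75
print(lift(a, b))         # 1.5
```

A lift of 1.5 on this toy data would suggest that milk is bought with bread and butter about 50% more often than its overall frequency alone predicts.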

Comparison Table
Technique                Type                Use Case                     Output
Decision Trees           Classification      Loan approval                Class label
K-Nearest Neighbors      Classification      Product recommendation       Class label
Logistic Regression      Classification      Purchase prediction          Probability + Class
Discriminant Analysis    Classification      Customer tier prediction     Class label
K-Means Clustering       Clustering          Market segmentation          Cluster assignment
Hierarchical Clustering  Clustering          Document or gene grouping    Cluster tree (dendrogram)
Market Basket Analysis   Association Mining  Cross-sell recommendations   Association rules

Suggested Visual Aids for Lecture


• Flowchart of decision tree classification
• K-means scatterplot with clusters
• ROC curve for logistic regression
• Dendrogram from hierarchical clustering
• Table showing example of association rules: {Milk, Bread} → {Butter}

Conclusion
• Classification helps predict known categories using labeled data.
• Clustering helps discover hidden groupings without labels.
• Market Basket Analysis reveals buying patterns to improve marketing
and sales strategies.
