
Introduction to Classification

Dr. techn. Annisa Maulida Ningtyas, M.Eng.


Learning Methods of Data Mining Algorithms

• Supervised Learning
• Semi-Supervised Learning
• Unsupervised Learning
• Association-based Learning
1. Supervised Learning

• "Learning with a teacher": the dataset has a target/label/class.
• Most data mining algorithms (estimation, prediction/forecasting, classification) are supervised learning.
• The algorithm learns from the values of the target variable associated with the values of the predictor variables.
Dataset with a Class

(Illustration: a table whose columns are attributes/features/dimensions, either nominal or numeric, and one of which is the class/label/target.)
2. Unsupervised Learning

• The data mining algorithm looks for patterns across all variables (attributes).
• No variable (attribute) is designated as the target/label/class.
• Clustering algorithms are unsupervised learning algorithms.

Dataset without a Class

(Illustration: a table whose columns are all attributes/features/dimensions; there is no label column.)
3. Semi-Supervised Learning

• Semi-supervised learning is a data mining method that uses both labeled and unlabeled data in its learning process.
• The labeled data is used to build the model (knowledge); the unlabeled data is used to refine the boundaries between the classes.
Data Mining Methods

1. Estimation:
   Linear Regression (LR), Neural Network (NN), Deep Learning (DL), Support Vector Machine (SVM), Generalized Linear Model (GLM), etc.
2. Forecasting:
   Linear Regression (LR), Neural Network (NN), Deep Learning (DL), Support Vector Machine (SVM), Generalized Linear Model (GLM), etc.
3. Classification:
   Decision Tree (CART, ID3, C4.5, Credal DT, Credal C4.5, Adaptive Credal C4.5), Naive Bayes (NB), K-Nearest Neighbor (kNN), Linear Discriminant Analysis (LDA), Logistic Regression (LogR), etc.
4. Clustering:
   K-Means, K-Medoids, Self-Organizing Map (SOM), Fuzzy C-Means (FCM), etc.
5. Association:
   FP-Growth, Apriori, Coefficient of Correlation, Chi Square, etc.
Output / Pattern / Model / Knowledge

1. Formula/Function (regression formula or function)
   • TRAVEL_TIME = 0.48 + 0.6 DISTANCE + 0.34 TRAFFIC_LIGHTS + 0.2 ORDERS
2. Decision Tree
3. Degree of Correlation
4. Rule
   • IF ips3=2.8 THEN lulustepatwaktu (if the third-semester GPA is 2.8, the student graduates on time)
5. Cluster
Classification

What is Classification?

Approach:
• Given a collection of records (the training set)
• each record contains a set of attributes
• one of the attributes is the class (label) that should be predicted.
• Learn a model for the class attribute as a function of the values of the other attributes.

Variants:
• binary problems (two class labels, e.g. true/false or fraud/no fraud)
• multi-class problems (class labels e.g. low, medium, high)
• multi-label problems (e.g. in text classification, an article can be about 'Technology', 'Health', and 'Travel')
Introduction to Classification

A Couple of Questions:
□ What is this?
□ How do you know?
□ How did you come to that knowledge?

Goal: Learn a model for recognizing a concept, e.g. "fruit".

□ Training data: example images labeled "fruit" and "not a fruit".

Model Learning and Model Application Process
(Illustration: a model is learned from labeled training data and then applied to classify unseen records.)
Classification Examples

□ Credit Risk Assessment
  • Attributes: your age, income, debts, …
  • Class: Will your bank grant you credit?
□ Marketing
  • Attributes: previously bought products, browsing behaviour
  • Class: Are you a target customer for a new product?
□ Tax Fraud
  • Attributes: the values in your tax declaration
  • Class: Are you trying to cheat?
□ SPAM Detection
  • Attributes: words and header fields of an e-mail
  • Class: Is it a spam e-mail?
Classification Techniques

1. K-Nearest-Neighbors
2. Decision Trees
3. Rule Learning
4. Naïve Bayes
5. Support Vector Machines
6. Artificial Neural Networks
7. Deep Neural Networks
8. Many others …
K-Nearest Neighbours (KNN)

Example Problem
• Predict the current weather at a place where there is no weather station.
• How could you do that?
Basic Idea

□ Use the "average" of the nearest stations (here: the majority vote)
□ Example:
  • 3x sunny
  • 2x cloudy
  • result = sunny
□ This approach is called K-Nearest-Neighbors
  • where k is the number of neighbors to consider
  • in the example: k = 5
  • in the example: "near" denotes geographical proximity
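
As a rough illustration of this idea, here is a minimal Python sketch; the station coordinates, the labels, and the name knn_predict are invented for the example. It classifies a query point by a majority vote among its k nearest stations.

```python
# Minimal k-nearest-neighbours sketch for the weather example.
# Station positions and labels are made up purely for illustration.
from collections import Counter
import math

stations = [
    # (x, y) position, observed weather
    ((1.0, 2.0), "sunny"),
    ((2.0, 1.5), "sunny"),
    ((3.0, 3.0), "sunny"),
    ((5.0, 4.0), "cloudy"),
    ((6.0, 5.0), "cloudy"),
]

def knn_predict(query, data, k=5):
    """Return the majority label among the k geographically nearest stations."""
    by_distance = sorted(data, key=lambda item: math.dist(query, item[0]))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(knn_predict((2.5, 2.5), stations, k=5))  # -> 'sunny' (3 sunny vs 2 cloudy)
```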
(Illustrations: the basic idea of kNN and the impact of the k value on the decision boundary.)
Discussion of K-Nearest-Neighbor Classification

□ Often very accurate
  • for instance for optical character recognition (OCR)
□ … but slow
  • as the training data needs to be searched
□ Assumes that all attributes are equally important
  • remedy: attribute selection or attribute weights
□ Can handle decision boundaries which are not parallel to the axes (unlike decision trees)
Lazy versus Eager Learning

□ Lazy Learning
  • Instance-based learning approaches, like kNN, are also called lazy learning, as no explicit knowledge (model) is learned
  • Single goal: classify unseen records as accurately as possible

□ Eager Learning
  • but actually, we might have two goals:
    1. classify unseen records
    2. understand the application domain as a human
  • Eager learning approaches generate models that are (or might be) interpretable by humans
  • Examples of eager techniques: Decision Tree Learning, Rule Learning
Decision Tree
Why Decision Tree?
• A decision tree is a prediction model that works by creating a flowchart-
like structure (tree-like structure).
• Tree-like Structure: The model resembles an upside-down tree that starts
from a single point (the root) and branches out. Information flows from
top to bottom, with decisions being made at each step.
• Internal Nodes Represent Attribute Tests: Each non-leaf node in the tree
represents a question or test about a specific feature (attribute) in your
dataset. For example, in a medical diagnosis tree, a node might ask "Is
the patient's temperature above 100°F?"
• Branches Represent Attribute Values: The lines connecting nodes are
branches that represent possible answers or outcomes of the test at the
internal node. Following our temperature example, the branches might
be "Yes" and "No" coming out from the temperature test node.
• Leaf Nodes Represent Final Decisions or Predictions: When you follow a
path from the root to the end of a branch, you reach a leaf node, which
provides the final prediction or decision. In a classification problem, this
might be a category label (like "has heart disease" or "doesn't have heart
disease"). In a regression problem, it would be a numerical value.
Intuition

1. Root Node (Income)
   First question: "Is the person's income greater than $50,000?"
   • If Yes, proceed to the next question.
   • If No, predict "No Purchase" (leaf node).

2. Internal Node (Age)
   If the person's income is greater than $50,000, ask: "Is the person's age above 30?"
   • If Yes, proceed to the next question.
   • If No, predict "No Purchase" (leaf node).

3. Internal Node (Previous Purchases)
   • If the person is above 30 and has made previous purchases, predict "Purchase" (leaf node).
   • If the person is above 30 and has not made previous purchases, predict "No Purchase" (leaf node).
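
This intuition can be written directly as nested conditionals. The sketch below is only illustrative; the function name and field types are my own assumptions, while the thresholds are the ones stated above.

```python
# The purchase-intuition tree above, written as nested conditionals.
def predict_purchase(income: float, age: int, previous_purchases: bool) -> str:
    if income <= 50_000:          # root node: income test
        return "No Purchase"
    if age <= 30:                 # internal node: age test
        return "No Purchase"
    # internal node: previous purchases test
    return "Purchase" if previous_purchases else "No Purchase"

print(predict_purchase(income=60_000, age=35, previous_purchases=True))   # Purchase
print(predict_purchase(income=40_000, age=35, previous_purchases=True))   # No Purchase
```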
Forming a Decision Tree From Training Data

age     income   student  credit_rating  buys_computer
<=30    high     no       fair           no
<=30    high     no       excellent      no
31..40  high     no       fair           yes
>40     medium   no       fair           yes
>40     low      yes      fair           yes
>40     low      yes      excellent      no
31..40  low      yes      excellent      yes
<=30    medium   no       fair           no
<=30    low      yes      fair           yes
>40     medium   yes      fair           yes
<=30    medium   yes      excellent      yes
31..40  medium   no       excellent      yes
31..40  high     yes      fair           yes
>40     medium   no       excellent      no

First, the following needs to be understood:
• Dataset
• Attributes
• Label / class
• Data types

How can we obtain a model from this training data that can classify automatically?
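
For the entropy and gain sketches later in this material, the 14 training records above can be held in a small Python structure. The variable name training_data and the use of dicts are my own choices.

```python
# The 14-record buys_computer training set from the slide, as a list of dicts.
training_data = [
    {"age": "<=30",   "income": "high",   "student": "no",  "credit_rating": "fair",      "buys_computer": "no"},
    {"age": "<=30",   "income": "high",   "student": "no",  "credit_rating": "excellent", "buys_computer": "no"},
    {"age": "31..40", "income": "high",   "student": "no",  "credit_rating": "fair",      "buys_computer": "yes"},
    {"age": ">40",    "income": "medium", "student": "no",  "credit_rating": "fair",      "buys_computer": "yes"},
    {"age": ">40",    "income": "low",    "student": "yes", "credit_rating": "fair",      "buys_computer": "yes"},
    {"age": ">40",    "income": "low",    "student": "yes", "credit_rating": "excellent", "buys_computer": "no"},
    {"age": "31..40", "income": "low",    "student": "yes", "credit_rating": "excellent", "buys_computer": "yes"},
    {"age": "<=30",   "income": "medium", "student": "no",  "credit_rating": "fair",      "buys_computer": "no"},
    {"age": "<=30",   "income": "low",    "student": "yes", "credit_rating": "fair",      "buys_computer": "yes"},
    {"age": ">40",    "income": "medium", "student": "yes", "credit_rating": "fair",      "buys_computer": "yes"},
    {"age": "<=30",   "income": "medium", "student": "yes", "credit_rating": "excellent", "buys_computer": "yes"},
    {"age": "31..40", "income": "medium", "student": "no",  "credit_rating": "excellent", "buys_computer": "yes"},
    {"age": "31..40", "income": "high",   "student": "yes", "credit_rating": "fair",      "buys_computer": "yes"},
    {"age": ">40",    "income": "medium", "student": "no",  "credit_rating": "excellent", "buys_computer": "no"},
]
```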
Forming a Decision Tree From Training Data

From the 14 training records above we obtain the following decision tree model:

age?
• <=30   → student?
    • no  → no
    • yes → yes
• 31..40 → yes
• >40    → credit_rating?
    • excellent → no
    • fair      → yes

Rule:
IF ((age<=30) AND (student)) OR (age=31..40) OR ((age>40) AND (credit_rating=fair))
THEN buys_computer = YES
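
The extracted rule can be checked mechanically. The sketch below (the function name is mine) assumes the hypothetical training_data list from the earlier sketch and confirms that this tree reproduces every training label.

```python
# The rule extracted from the tree above, as a predicate over one record.
def buys_computer(record: dict) -> str:
    yes = (
        (record["age"] == "<=30" and record["student"] == "yes")
        or record["age"] == "31..40"
        or (record["age"] == ">40" and record["credit_rating"] == "fair")
    )
    return "yes" if yes else "no"

# Sanity check: the tree classifies every training record correctly.
print(all(buys_computer(r) == r["buys_computer"] for r in training_data))  # True
```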
Building a Decision Tree From Training Data

The same training data and decision tree as above, but one question remains:
how do we choose which attribute becomes the root (is tested first), and so on for the lower levels of the tree?
Attribute Selection

Let's try the attribute student as the root of our decision tree:

student?
• yes
• no

The attribute student is not suitable as a separator, because it cannot separate the labels well: both branches still contain a mix of "yes" and "no" records.
Attribute Selection

Now let's try the attribute age:

age?
• <=30
• 31..40
• >40

The age attribute is suitable as a separator, because it can effectively separate the labels.
Attribute Selection

Splitting on age gives branches that come closer to containing a single class:
• More predictiveness
• Less impurity
• Lower entropy
Attribute Selection

The 31..40 branch of the age split contains records of only one class ("yes"): it is a pure node.
Entropy

• A measure of purity, diversity, randomness, or uncertainty.
• The smaller the entropy value, the more homogeneous the distribution, and the purer the node.

Examples (class counts per node):
• 1 yes / 7 no → low entropy
• 3 yes / 5 no → high entropy
• 0 yes / 8 no → entropy = 0
• 4 yes / 4 no → entropy = 1

Info(D) = Entropy(D):

$\mathrm{Info}(D) = -\sum_{i=1}^{m} p_i \log_2 p_i$
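
A small helper implementing the Info(D) formula above reproduces the boundary cases from the slide. The function name entropy is my own, and the values printed for the two mixed nodes are computed here rather than taken from the slide.

```python
# Entropy (Info) of a node given its class counts, e.g. entropy(1, 7) for 1 yes / 7 no.
import math

def entropy(*counts: int) -> float:
    """Info(D) = -sum(p_i * log2(p_i)) over the non-zero class proportions."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    # A node with a single class is perfectly pure -> entropy 0.
    return -sum(p * math.log2(p) for p in probs) if len(probs) > 1 else 0.0

print(entropy(0, 8))            # 0.0   -> pure node
print(entropy(4, 4))            # 1.0   -> maximally mixed node
print(round(entropy(1, 7), 3))  # ~0.544 -> low entropy
print(round(entropy(3, 5), 3))  # ~0.954 -> high entropy
```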
Entropy Calculation

Step 1: calculate the entropy of the 'buys_computer' class/label (the entropy of the data).

• buys_computer = "yes" → 9 records
• buys_computer = "no" → 5 records
• Total → 14 records

$\mathrm{Info}(D) = I(9,5) = -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} = 0.940$

General formulas:
$\mathrm{Info}(D) = -\sum_{i=1}^{m} p_i \log_2 p_i \qquad \mathrm{Info}_A(D) = \sum_{j=1}^{v} \tfrac{|D_j|}{|D|}\, I(D_j)$
Entropy Calculation

Step 2: calculate the entropy of each attribute — attribute "age".

age      "yes"  "no"  I(yes, no)
<=30       2      3     0.971
31..40     4      0     0
>40        3      2     0.971

$I(2,3) = -\tfrac{2}{5}\log_2\tfrac{2}{5} - \tfrac{3}{5}\log_2\tfrac{3}{5} = 0.971$

$\mathrm{Info}_{age}(D) = \tfrac{5}{14} I(2,3) + \tfrac{4}{14} I(4,0) + \tfrac{5}{14} I(3,2) = 0.694$
Entropy Calculation

Step 3: attribute "income".

income   "yes"  "no"  I(yes, no)
low        3      1     0.811
medium     4      2     0.918
high       2      2     1

$\mathrm{Info}_{income}(D) = \tfrac{4}{14} I(3,1) + \tfrac{6}{14} I(4,2) + \tfrac{4}{14} I(2,2) = 0.911$
Entropy Calculation

Step 4: attribute "student".

student  "yes"  "no"  I(yes, no)
yes        6      1     0.592
no         3      4     0.985

$\mathrm{Info}_{student}(D) = \tfrac{7}{14} I(6,1) + \tfrac{7}{14} I(3,4) = 0.788$
Entropy Calculation

Step 5: attribute "credit_rating".

credit_rating  "yes"  "no"  I(yes, no)
fair             6      2     0.811
excellent        3      3     1

$\mathrm{Info}_{credit\_rating}(D) = \tfrac{8}{14} I(6,2) + \tfrac{6}{14} I(3,3) = 0.892$
Information Gain

• Information gain (often referred to simply as 'gain') is the amount by which our certainty about the class increases after splitting on an attribute.
• Gain(A) describes how much the entropy decreases due to attribute A. The larger the gain, the better.

Example (splitting on student):
• Entropy before the split: E = 0.940
• Entropy after the split: E = 0.592 (student = yes) and E = 0.985 (student = no)

$\mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D)$
Calculate the Information Gain of Each Attribute

$\mathrm{Gain}(A) = \mathrm{Info}(D) - \mathrm{Info}_A(D)$

Attribute        Entropy
buys_computer     0.940
age               0.694
income            0.911
student           0.788
credit_rating     0.892

• Gain(age)           = 0.940 − 0.694 = 0.246
• Gain(income)        = 0.940 − 0.911 = 0.029
• Gain(student)       = 0.940 − 0.788 = 0.152
• Gain(credit_rating) = 0.940 − 0.892 = 0.048

The attribute age has the highest information gain, so it is selected as the initial node (root).
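
The whole gain calculation can be reproduced in a few lines. This sketch reuses the hypothetical entropy() helper and training_data list from the earlier sketches; all other names are mine as well.

```python
# Information gain per attribute, assuming entropy() and training_data from above.
from collections import Counter

def info(records, label="buys_computer"):
    """Entropy of the class label over a list of records: Info(D)."""
    return entropy(*Counter(r[label] for r in records).values())

def info_attr(records, attr, label="buys_computer"):
    """Weighted entropy after splitting on attr: Info_A(D)."""
    total = len(records)
    result = 0.0
    for value in {r[attr] for r in records}:
        subset = [r for r in records if r[attr] == value]
        result += len(subset) / total * info(subset, label)
    return result

def gain(records, attr, label="buys_computer"):
    return info(records, label) - info_attr(records, attr, label)

for attr in ("age", "income", "student", "credit_rating"):
    print(attr, round(gain(training_data, attr), 3))
# Expected (up to rounding): age ~0.247, income ~0.029, student ~0.152,
# credit_rating ~0.048 -- matching the slide's 0.246/0.029/0.152/0.048,
# so age is chosen as the root.
```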
Forming a Decision Tree

age?
• <=30   → ?
• 31..40 → yes
• >40    → ?

• After the age attribute, which attribute comes next?
• Each branch is processed further as long as it contains more than one class.
• The 31..40 branch does not need to be processed, because it contains only one possible class ("yes").
Forming a Decision Tree

Next, process the records where age <= 30:

age    income   student  credit_rating  buys_computer
<=30   high     no       fair           no
<=30   high     no       excellent      no
<=30   medium   no       fair           no
<=30   low      yes      fair           yes
<=30   medium   yes      excellent      yes

$\mathrm{Info}(D) = I(2,3) = -\tfrac{2}{5}\log_2\tfrac{2}{5} - \tfrac{3}{5}\log_2\tfrac{3}{5} = 0.97$

Calculate the gain of each remaining attribute (Gain(age) does not need to be calculated again):

$\mathrm{Info}_{income}(D) = \tfrac{2}{5} I(0,2) + \tfrac{2}{5} I(1,1) + \tfrac{1}{5} I(1,0) = 0.4$
Gain(income) = 0.97 − 0.4 = 0.57

$\mathrm{Info}_{student}(D) = \tfrac{3}{5} I(0,3) + \tfrac{2}{5} I(2,0) = 0$
Gain(student) = 0.97 − 0 = 0.97

$\mathrm{Info}_{credit\_rating}(D) = \tfrac{3}{5} I(1,2) + \tfrac{2}{5} I(1,1) = 0.95$
Gain(credit_rating) = 0.97 − 0.95 = 0.02
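
The same hypothetical gain() sketch from earlier, applied to the age <= 30 subset of training_data, reproduces these numbers.

```python
# Gains on the age <= 30 subset, assuming gain() and training_data from above.
subset = [r for r in training_data if r["age"] == "<=30"]
for attr in ("income", "student", "credit_rating"):
    print(attr, round(gain(subset, attr), 2))
# income 0.57, student 0.97, credit_rating 0.02 -> student is chosen next
```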
Forming a Decision Tree

Student has the highest gain on this subset, so it becomes the next test node:

age?
• <=30   → student?
    • no  → no
    • yes → yes
• 31..40 → yes
• >40    → ?

CONTINUE in the same way for the >40 branch…
Another Way to Form a Decision Tree

• Using the Gini Index (GI), or impurity
• Steps:
  1. Calculate the Gini Index (GI) for each attribute
  2. Determine the root based on the GI value: the root is the attribute with the smallest GI value
  3. Repeat steps 1 and 2 for the next levels of the tree until the GI value = 0
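
A minimal sketch of the Gini-index alternative, again reusing the hypothetical training_data list; the names gini and gini_attr are mine, and the printed values are computed here rather than taken from the slides.

```python
# Gini index as an alternative impurity measure; the split with the lowest
# weighted Gini is preferred. Assumes training_data from the earlier sketch.
from collections import Counter

def gini(records, label="buys_computer"):
    """Gini(D) = 1 - sum(p_i^2) over the class proportions."""
    total = len(records)
    counts = Counter(r[label] for r in records).values()
    return 1.0 - sum((c / total) ** 2 for c in counts)

def gini_attr(records, attr, label="buys_computer"):
    """Weighted Gini index after splitting on attr."""
    total = len(records)
    result = 0.0
    for value in {r[attr] for r in records}:
        subset = [r for r in records if r[attr] == value]
        result += len(subset) / total * gini(subset, label)
    return result

for attr in ("age", "income", "student", "credit_rating"):
    print(attr, round(gini_attr(training_data, attr), 3))
# age has the smallest weighted Gini (~0.343), so it would again be the root.
```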
Evaluation Metrics

• Choosing the right evaluation metrics is crucial after selecting a classification model
• We will cover the most commonly used metrics for classification tasks (see the sketch below)
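
As a concrete reference for the metrics discussed in the linked material, here is a small self-contained sketch of the usual confusion-matrix based metrics (accuracy, precision, recall, F1). The function name and the toy label lists are invented for illustration.

```python
# Common classification metrics computed from the confusion matrix counts.
def classification_metrics(y_true, y_pred, positive="yes"):
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

print(classification_metrics(
    ["yes", "yes", "no", "no", "yes"],   # true labels (toy example)
    ["yes", "no", "no", "yes", "yes"],   # predicted labels (toy example)
))
# accuracy 0.6, precision/recall/F1 ~0.667
```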
Evaluation Metrics
(Illustrations from https://www.datacamp.com/blog/classification-machine-learning.)
References:

• https://romisatriawahono.net/lecture/dm/romi-dm-apr2020.pptx
• https://www.geeksforgeeks.org/decision-tree-introduction-example/
• https://www.datacamp.com/blog/classification-machine-learning
• DTS
Thank you
