
Lecture 7: Classification

Kobina Abakah-Paintsil

Lecture Objectives

The objectives of this lecture are to:

1. Equip students to use logistic regression;
2. Equip students to use support vector machines;
3. Equip students to understand how to evaluate classification models.

Logistic Regression
• Logistic regression is a supervised machine learning algorithm widely used for binary classification tasks.
• It estimates the probability that an instance belongs to a given class or not.
• Logistic regression predicts the output of a categorical dependent variable.
• The output can be Yes or No, 0 or 1, True or False, etc., but instead of returning the exact values 0 and 1, it returns probabilistic values that lie between 0 and 1.
• In logistic regression, instead of fitting a regression line, we fit an "S"-shaped logistic function, which saturates at the two extreme values (0 and 1).
• The sigmoid function is the mathematical function used to map the predicted values to probabilities.

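A minimal sketch of this mapping (NumPy only; the coefficients b0 and b1 below are made-up illustration values, not taken from a fitted model):

import numpy as np

def sigmoid(z):
    # Squash any real-valued score into a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

b0, b1 = -1.5, 0.8                  # hypothetical intercept and slope
x = np.array([0.0, 1.0, 2.0, 5.0])
p = sigmoid(b0 + b1 * x)            # linear scores mapped to probabilities
print(p)                            # every value lies strictly between 0 and 1
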
Types of Logistic Regression
• Binomial: In binomial logistic regression, there can be only two possible types of the dependent variable, such as 0 or 1, Pass or Fail, etc.
• Multinomial: In multinomial logistic regression, there can be 3 or more possible unordered types of the dependent variable, such as "cat", "dog", or "sheep".
• Ordinal: In ordinal logistic regression, there can be 3 or more possible ordered types of the dependent variable, such as "low", "medium", or "high".

Assumptions of Logistic Regression
• Independent observations: Each observation is independent of the others, meaning there is no correlation between observations.
• Binary dependent variable: The dependent variable is assumed to be binary or dichotomous, meaning it can take only two values. For more than two categories, the softmax function is used.
• Linear relationship between independent variables and log odds: The relationship between the independent variables and the log odds of the dependent variable should be linear.
• No outliers: There should be no outliers in the dataset.
• Large sample size: The sample size is sufficiently large.

Terminologies of Logistic Regression
• Logistic function: The formula used to represent how the independent and dependent variables relate to one another. The logistic function transforms the input variables into a probability value between 0 and 1.
• Odds: The ratio of something occurring to something not occurring. It differs from probability, which is the ratio of something occurring to everything that could possibly occur.
• Log-odds: The log-odds, also known as the logit function, is the natural logarithm of the odds. In logistic regression, the log odds of the dependent variable are modeled as a linear combination of the independent variables and the intercept.
• Coefficient: The logistic regression model's estimated parameters, which show how the independent and dependent variables relate to one another.
• Intercept: A constant term in the logistic regression model, which represents the log odds when all independent variables are equal to zero.

Logistic Regression
• A probability needs to satisfy two basic conditions:
  • it is always positive, i.e. > 0;
  • it is always less than or equal to 1.
• Start from simple linear regression (SLR): $y = b_0 + b_1 x$.
• To make it always positive, take $e^y$.
• To make it less than 1, take $\frac{e^y}{e^y + 1}$, the probability of success.

Logistic Regression

$$P = \frac{e^y}{e^y + 1}, \qquad \text{probability of failure } Q = 1 - P = 1 - \frac{e^y}{e^y + 1} = \frac{e^y + 1 - e^y}{e^y + 1} = \frac{1}{e^y + 1}$$

Therefore,

$$\text{Odds} = \frac{P(\text{Success})}{P(\text{Failure})} = \frac{P}{1 - P} = e^y$$

$$\log(\text{Odds}) = \log\left(\frac{P}{1 - P}\right) = \log e^y$$

$$\log\left(\frac{P}{1 - P}\right) = y = b_0 + b_1 x$$

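A quick numeric check of the identities above (a sketch; the score y is an arbitrary value):

import numpy as np

y = 0.7                              # any linear score b0 + b1*x
P = np.exp(y) / (np.exp(y) + 1.0)    # probability of success
odds = P / (1.0 - P)
print(np.isclose(odds, np.exp(y)))   # True: odds = e^y
print(np.isclose(np.log(odds), y))   # True: log-odds recovers the linear score
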
Logistic Regression

[Figure: the fitted "S"-shaped logistic curve]

Stratification

[Figure: splitting data without stratification can group like responses together in the train and test data]

[Figure: splitting data without stratification can produce an imbalanced split between the train and test data]

[Figure: stratification with a 50% split keeps class proportions consistent across the train and test data]

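A minimal sketch of a stratified split (assuming scikit-learn; the 50% test size mirrors the example above):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y keeps the class proportions the same in the train and test
# splits, avoiding the imbalanced splits illustrated above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0
)
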
Confusion Matrix
• A confusion matrix, also known as an error matrix, is a table that summarizes the
performance of a machine learning model on a set of test data. It is useful for
evaluating the performance of classification algorithms.

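A small sketch of building one (assuming scikit-learn; the labels are made-up):

from sklearn.metrics import confusion_matrix

y_true = [0, 1, 1, 0, 1, 0, 1, 1]   # actual labels
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]   # model predictions
# Rows are actual classes, columns are predicted classes.
print(confusion_matrix(y_true, y_pred))
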
Support Vector Machine
• A Support Vector Machine (SVM) is a supervised machine learning algorithm that classifies data by finding an optimal line or hyperplane in an N-dimensional space.
• The goal is to maximize the margin, the distance between the hyperplane and the nearest points of each class, ensuring effective separation.
• SVMs can handle both linear and non-linear classification tasks using a technique called the kernel trick, which implicitly maps data into higher-dimensional feature spaces.
• They are useful for binary classification and, via support vector regression, for regression analysis.

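A minimal sketch of fitting an SVM classifier (assuming scikit-learn; the Iris data stands in for any classification dataset):

from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# A linear kernel searches for a separating hyperplane in the input space.
clf = SVC(kernel="linear")
clf.fit(X, y)
print(clf.support_vectors_.shape)   # the points that define the margin
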
Support Vector Machine
Hyperplane
• The separating boundary in 1D is a point, in 2D a line, in 3D a plane, and in 4D and above it is called a hyperplane.

Support Vector Machine
Hyperplane
• The margin is the distance between the hyperplane and the nearest data points; the optimal hyperplane is the one that maximizes this distance.

Support Vector Machine
The Mathematics

$$V_1 = \begin{pmatrix} x_1 \\ y_1 \end{pmatrix}, \qquad V_2 = \begin{pmatrix} x_2 \\ y_2 \end{pmatrix}$$

$$V_1 \cdot V_2 = \lVert V_1 \rVert \, \lVert V_2 \rVert \cos\theta$$

For the projection $w$ of $V_1$ onto $V_2$:

$$\cos\theta = \frac{w}{\lVert V_1 \rVert} \;\Rightarrow\; w = \lVert V_1 \rVert \cos\theta$$

Also, $\cos\theta = \frac{V_1 \cdot V_2}{\lVert V_1 \rVert \, \lVert V_2 \rVert}$, so

$$w = \lVert V_1 \rVert \, \frac{V_1 \cdot V_2}{\lVert V_1 \rVert \, \lVert V_2 \rVert} = V_1 \cdot \frac{V_2}{\lVert V_2 \rVert} = V_1 \cdot u$$

where $u$ is a unit vector.

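A numeric sketch of the projection identity above (NumPy only; the vectors are arbitrary):

import numpy as np

V1 = np.array([3.0, 4.0])
V2 = np.array([1.0, 0.0])

u = V2 / np.linalg.norm(V2)   # unit vector along V2
w = np.dot(V1, u)             # length of V1's projection onto V2
print(w)                      # 3.0 for these example vectors
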
Support Vector Machine
Selecting the Right Hyperplane and Class Determination

[Figure: a data point $V_1$ projected onto the normal vector of the hyperplane, $w = V_1 \cdot u$]

If the normal vector magnitude is $c$ and $b$ is the distance between the hyperplane and the positive hyperplane, then:
• Margin $= 2b$
• positive plane: $c + b$
• negative plane: $c - b$

Support Vector Machine
Changing Perspectives

[Figure: a Gaussian transformation maps the data into a new space, changing the perspective so the classes become separable]

Support Vector Machine
Radial Basis Function

[Figures: the radial basis function kernel applied to the data]

The width of the radial basis function is controlled by the parameter

$$\gamma = \frac{1}{2\sigma^2}$$

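A small sketch of the RBF kernel value between two points (NumPy only; sigma, and hence gamma, is an assumed illustration value):

import numpy as np

def rbf_kernel(a, b, gamma):
    # K(a, b) = exp(-gamma * ||a - b||^2)
    return np.exp(-gamma * np.sum((a - b) ** 2))

sigma = 1.0
gamma = 1.0 / (2.0 * sigma ** 2)
a, b = np.array([0.0, 0.0]), np.array([1.0, 1.0])
print(rbf_kernel(a, b, gamma))   # similarity decays with squared distance
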
Support Vector Machine
Types of Kernel Functions

[Table: types of kernel functions]

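A minimal sketch comparing the standard kernel options in scikit-learn's SVC (the Iris data is an illustrative choice):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
for kernel in ("linear", "poly", "rbf", "sigmoid"):
    clf = SVC(kernel=kernel, gamma="scale")
    score = cross_val_score(clf, X, y, cv=5).mean()   # 5-fold CV accuracy
    print(kernel, round(score, 3))
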
Iris Dataset

[Figures: the Iris dataset]

Evaluating Classification Models
• Accuracy is the proportion of correct predictions out of the total number of predictions.

$$\text{Accuracy} = \frac{\text{True Negatives} + \text{True Positives}}{\text{Total Observations}}$$

• Accuracy is a poor evaluator when dealing with class-imbalanced data.

Evaluating Classification Models
• Precision is the proportion of correct positive results out of all predicted positive results.

$$\text{Precision} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Positives}}$$

• Recall (sensitivity) is the proportion of actual positive cases predicted correctly.

$$\text{Recall} = \frac{\text{True Positives}}{\text{True Positives} + \text{False Negatives}}$$

• The F1 score, also known as the F-measure, combines precision and recall into a single metric. It is the harmonic mean of precision and recall.

$$F_1 = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}}$$

• The F1 score helps you find a balance between avoiding false positives (precision) and minimizing false negatives (recall).

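A minimal sketch computing these metrics (assuming scikit-learn; the labels reuse the made-up confusion-matrix example above):

from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [0, 1, 1, 0, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1, 1, 1]
print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("f1:       ", f1_score(y_true, y_pred))
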
Evaluating Classification Models
• Specificity (or selectivity) is the proportion of actual negative cases predicted correctly.

$$\text{Specificity} = \frac{\text{True Negatives}}{\text{True Negatives} + \text{False Positives}}$$

• High precision is preferred when it is acceptable to have some false negatives.
• High recall is preferred when the cost of a false negative is very high.

Evaluating Classification Models
Threshold and Adjusting Threshold

[Figure: the effect of moving the decision threshold. At new threshold 1 the metrics are Accuracy = 0.8616, Precision = 0.84375, Recall = 0.9818; a second adjusted threshold (new threshold 2) is also shown]

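A minimal sketch of adjusting the threshold (assuming scikit-learn; the breast-cancer dataset and the 0.3 threshold are illustrative choices):

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
clf = LogisticRegression(max_iter=5000).fit(X, y)

proba = clf.predict_proba(X)[:, 1]    # probability of the positive class
y_pred = (proba >= 0.3).astype(int)   # lowering the threshold raises recall,
                                      # usually at the cost of precision
print(y_pred[:10])
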
Evaluating Classification Models
AUC ROC Curve
• The AUC-ROC curve stands for the Area Under the Receiver Operating Characteristic curve.
ROC (Receiver Operating Characteristic) Curve:
• The ROC curve is a graphical representation of how well a binary classification model performs across different classification thresholds.
• It plots the True Positive Rate (TPR) against the False Positive Rate (FPR).
• TPR (recall) represents the proportion of actual positive instances correctly predicted by the model.
• FPR (1 - specificity) represents the proportion of actual negative instances incorrectly predicted as positive by the model.

Evaluating Classification Models
AUC ROC Curve
AUC (Area Under Curve):
• The AUC summarizes the overall performance of the binary classification model.
• It represents the area under the ROC curve.
• A greater AUC value indicates better model performance.
• The AUC measures the probability that the model assigns a randomly chosen
positive instance a higher predicted probability than a randomly chosen negative
instance.

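A minimal sketch of computing the ROC curve and the AUC (assuming scikit-learn; the dataset and model are illustrative choices):

from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve

X, y = load_breast_cancer(return_X_y=True)
proba = LogisticRegression(max_iter=5000).fit(X, y).predict_proba(X)[:, 1]

fpr, tpr, thresholds = roc_curve(y, proba)   # points of the ROC curve
print("AUC:", roc_auc_score(y, proba))       # area under that curve
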
Evaluating Classification Models
AUC ROC Curve

[Figure: an example ROC curve with the area under the curve (AUC) indicated]

Reading Assignment
1. Read and write short notes on the following classification models:
i. Decision Trees
ii. Random Forest
2. Read and write short notes on the following classification evaluation
metrics:
i. Macro-averaged F1 score
ii. Micro-averaged F1 score
iii. Sample-weighted F1 score
iv. Fβ score
