Linear-Classifier Lecture 4 SLB

The document introduces linear classifiers as a foundational method for classification tasks, emphasizing their simplicity compared to other techniques like decision trees. It outlines the classification process, examples, and various classification techniques, while also discussing the importance of model accuracy and evaluation through confusion matrices. Additionally, it highlights the geometric interpretation of classification problems and the role of linear discriminant functions in determining decision boundaries.


Linear Classification

Introduction to Classification using


Linear Classifiers

Last modified 1/1/19


1
Why Start with Linear Classifiers?

• Linear classifiers are the simplest classifiers
  • Simpler than decision trees
• The textbook starts with decision trees
  • We will use decision trees to introduce some of the more advanced concepts
• The learning method is linear regression
• We will use linear classifiers to introduce some concepts in classification
• The linear classifier also provides yet one more classification algorithm
• It also helps demonstrate how different algorithms form different types of decision boundaries

2
Classification: Definition
• Given a collection of records (training set)
• Each record contains a set of attributes and a class attribute
• Model the class attribute as a function of other attributes
• Goal: previously unseen records should be assigned a class
as accurately as possible (predictive accuracy)
• A test set is used to determine the accuracy of the model
• Usually the given labeled data is divided into training and test sets
• Training set used to build the model and test set to evaluate it

3
Classification Examples

• Predicting tumor cells as benign or malignant

• Classifying credit card transactions


as legitimate or fraudulent

• Classifying physical activities based on smartphone sensor data

• Categorizing news stories as finance,


weather, entertainment, sports, etc

4
Classification Techniques

• Decision Tree based Methods


• Memory based reasoning (Nearest Neighbor)
• Neural Networks
• Naïve Bayes
• Support Vector Machines
• Linear Regression (we start with this)

5
The Classification Problem

Given a collection of five instances of Katydids and five Grasshoppers, decide what type of insect the unlabeled example corresponds to.

[Images of five Katydids, five Grasshoppers, and one unlabeled insect]

Katydid or Grasshopper? 6
For any domain of interest, we can measure features.

[Annotated insect diagram; measurable features include: Color {Green, Brown, Gray, Other}, Has Wings?, Abdomen Length, Thorax Length, Antennae Length, Mandible Size, Spiracle Diameter, Leg Length]

7
My_Collection

We can store features in a database. The classification problem can now be expressed as:

• Given a training database, predict the class label of a previously unseen instance

Insect ID | Abdomen Length | Antennae Length | Insect Class
1         | 2.7            | 5.5             | Grasshopper
2         | 8.0            | 9.1             | Katydid
3         | 0.9            | 4.7             | Grasshopper
4         | 1.1            | 3.1             | Grasshopper
5         | 5.4            | 8.5             | Katydid
6         | 2.9            | 1.9             | Grasshopper
7         | 6.1            | 6.6             | Katydid
8         | 0.5            | 1.0             | Grasshopper
9         | 8.3            | 6.6             | Katydid
10        | 8.1            | 4.7             | Katydid

previously unseen instance = 11 | 5.1 | 7.0 | ???????

8
Grasshoppers  Katydids

[Scatter plot: Antenna Length (y-axis, 1-10) vs Abdomen Length (x-axis, 1-10), with Grasshoppers and Katydids forming separate clusters]

9
Grasshoppers  Katydids

[Same scatter plot: Antenna Length vs Abdomen Length]

Each of these data objects is called…
• an exemplar
• a (training) example
• an instance
• a tuple

10
We will return to the previous
slide in two minutes. In the
meantime, we are going to play
a quick game.

11
Problem 1

Examples of class A (left bar, right bar): (3, 4), (1.5, 5), (6, 8), (2.5, 5)
Examples of class B (left bar, right bar): (5, 2.5), (5, 2), (8, 3), (4.5, 3)

12
Problem 1

What class is this object? (8, 1.5)
What about this one, A or B? (4.5, 7)

Examples of class A (left bar, right bar): (3, 4), (1.5, 5), (6, 8), (2.5, 5)
Examples of class B (left bar, right bar): (5, 2.5), (5, 2), (8, 3), (4.5, 3)

13
Problem 2

Oh! This one’s hard! What class is this object? (8, 1.5)

Examples of class A (left bar, right bar): (4, 4), (5, 5), (6, 6), (3, 3)
Examples of class B (left bar, right bar): (5, 2.5), (2, 5), (5, 3), (2.5, 3)

14
Problem 3

This one is really hard! What is this, A or B? (6, 6)

Examples of class A (left bar, right bar): (4, 4), (1, 5), (6, 3), (3, 7)
Examples of class B (left bar, right bar): (5, 6), (7, 5), (4, 8), (7, 7)

15
Why did we spend so much
time with this game?

Because we wanted to
show that almost all
classification problems
have a geometric
interpretation, check out
the next 3 slides… 16
Problem 1

[Scatter plot: Left Bar (y-axis, 1-10) vs Right Bar (x-axis, 1-10), with the class A and class B examples plotted]

Examples of class A (left bar, right bar): (3, 4), (1.5, 5), (6, 8), (2.5, 5)
Examples of class B (left bar, right bar): (5, 2.5), (5, 2), (8, 3), (4.5, 3)

Here is the rule again: if the left bar is smaller than the right bar, it is an A; otherwise it is a B.

17
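The rule above is already a complete one-line classifier. A minimal sketch (the function name is ours):

```python
def classify_problem1(left_bar, right_bar):
    """Slide's rule: if the left bar is smaller than the right bar it is an A,
    otherwise it is a B."""
    return "A" if left_bar < right_bar else "B"

# Examples from the slide
print(classify_problem1(3, 4))    # class A example -> "A"
print(classify_problem1(5, 2.5))  # class B example -> "B"
```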
Problem 2

[Scatter plot: Left Bar (y-axis, 1-10) vs Right Bar (x-axis, 1-10); the class A points lie along the diagonal]

Examples of class A (left bar, right bar): (4, 4), (5, 5), (6, 6), (3, 3)
Examples of class B (left bar, right bar): (5, 2.5), (2, 5), (5, 3), (2.5, 3)

Let me look it up… here it is: the rule is, if the two bars are equal sizes, it is an A. Otherwise it is a B.

18
Problem 3

[Scatter plot: Left Bar (y-axis, 10-100) vs Right Bar (x-axis, 10-100)]

Examples of class A (left bar, right bar): (4, 4), (1, 5), (6, 3), (3, 7)
Examples of class B (left bar, right bar): (5, 6), (7, 5), (4, 8), (7, 7)

The rule again: if the square of the sum of the two bars is less than or equal to 100, it is an A. Otherwise it is a B.

19
Problem 3

• An alternative rule that works on the original training data is X + Y ≤ 10 🡺 Class A; else B
• Which is better?
• Ultimately, is one right and one wrong?

20
20
Grasshoppers  Katydids

[Scatter plot: Antenna Length (y-axis, 1-10) vs Abdomen Length (x-axis, 1-10)]

21
previously unseen instance = 11 | 5.1 | 7.0 | ???????

• We can “project” the previously unseen instance into the same space as the database.
• We have now abstracted away the details of our particular problem. It will be much easier to talk about points in space.

[Scatter plot: Antenna Length vs Abdomen Length (1-10 each), with Katydids, Grasshoppers, and the unseen instance plotted]

22
Simple Linear Classifier

[Scatter plot with a straight line separating Katydids (above the line) from Grasshoppers (below); portrait: R.A. Fisher, 1890-1962]

If previously unseen instance is above the line
then class is Katydid
else class is Grasshopper

23
Fitting a Model to Data

• One way to build a predictive model is to specify the structure of the model with some parameters missing
• This is called parameter learning or parameter modeling
• Common in statistics, but it includes data mining methods since the fields overlap
• Examples: linear regression, logistic regression, support vector machines

24
24
Linear Discriminant Functions

• The equation of a line is y = mx + b
• A classification function may look like:
  • Class +: if 1.0 × age – 1.5 × balance + 60 > 0
  • Class –: if 1.0 × age – 1.5 × balance + 60 ≤ 0
• The general form is f(x) = w0 + w1x1 + w2x2 + …
• This is a parameterized model where the weights for each feature are the parameters
• The larger the magnitude of the weight, the more important the feature
• The separator is a line in 2D, a plane in 3D, and a hyperplane in more than 3D

25
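The age/balance rule above can be written directly as code. The weights are the ones from the slide; the function names (and the example inputs) are ours:

```python
def f(age, balance):
    """Linear discriminant from the slide: w0 = 60, w_age = 1.0, w_balance = -1.5."""
    return 1.0 * age - 1.5 * balance + 60

def classify(age, balance):
    """Class + if f(x) > 0, class - otherwise."""
    return "+" if f(age, balance) > 0 else "-"

print(classify(30, 50))  # f = 30 - 75 + 60 = 15 > 0  -> "+"
print(classify(20, 60))  # f = 20 - 90 + 60 = -10 <= 0 -> "-"
```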
What is the Best Separator?

Each separator has a different margin, which is the distance to the closest point. The orange line has the largest margin.

For support vector machines, the line/plane with the largest margin is best.

[Scatter plot of several candidate separating lines through the same data]

26
Scoring and Ranking Instances

• Sometimes we want to know which examples are most likely to belong to a class
• Linear discriminant functions can give us this
• Closer to the separator is less confident; further away is more confident
• In fact, the magnitude of f(x) gives us this, where larger values are more confident/likely

27
Class Probability Estimation

• Class probability estimation is also something you often want
• It is often free with methods like decision trees
• It is more complicated with linear discriminant functions, since the distance from the separator is not a probability
• Logistic regression solves this
  • We will not go into the details in this class
  • Logistic regression determines a class probability estimate

28
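Roughly speaking, logistic regression passes the discriminant value f(x) through the logistic (sigmoid) function to squash it into a probability between 0 and 1. A minimal sketch, reusing the age/balance weights from the earlier slide (function names are ours):

```python
import math

def f(age, balance):
    # Linear discriminant from the earlier slide
    return 1.0 * age - 1.5 * balance + 60

def prob_positive(age, balance):
    """Logistic function maps f(x) in (-inf, inf) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-f(age, balance)))

# A point exactly on the separator (f(x) = 0) gets probability 0.5
print(prob_positive(15, 50))  # f = 15 - 75 + 60 = 0 -> 0.5
```

Points far from the separator on the positive side get probabilities near 1; points far on the negative side get probabilities near 0, which matches the distance-based confidence intuition from the previous slide.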
Classification Accuracy

Confusion Matrix:

                                  Predicted class
                                  Class = Katydid (1)   Class = Grasshopper (0)
Actual  Class = Katydid (1)       f11                   f10
class   Class = Grasshopper (0)   f01                   f00

Accuracy   = Number of correct predictions / Total number of predictions
           = (f11 + f00) / (f11 + f10 + f01 + f00)

Error rate = Number of wrong predictions / Total number of predictions
           = (f10 + f01) / (f11 + f10 + f01 + f00)

29
Confusion Matrix

• In a binary decision problem, a classifier labels examples as either positive or negative.
• Classifiers produce a confusion/contingency matrix, which shows four entities: TP (true positive), TN (true negative), FP (false positive), FN (false negative)

                      Predicted Positive (+)   Predicted Negative (-)
Actual Positive (Y)   TP                       FN
Actual Negative (N)   FP                       TN

For now, you are responsible for knowing recall and precision.

30
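All four metrics fall out directly from these counts; a minimal sketch (the function name and the example counts are ours):

```python
def metrics(tp, fn, fp, tn):
    """Compute accuracy, error rate, precision, and recall from confusion-matrix counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,
        "precision": tp / (tp + fp),  # of predicted positives, how many were right
        "recall": tp / (tp + fn),     # of actual positives, how many were found
    }

m = metrics(tp=40, fn=10, fp=5, tn=45)
print(m)  # accuracy 0.85, error rate 0.15, precision ~0.889, recall 0.8
```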
The simple linear classifier is defined for higher dimensional spaces…

31
… we can visualize it as being
an n-dimensional hyperplane

32
Which of the “Problems” can be solved by the Simple Linear Classifier?

1) Perfect
2) Useless
3) Pretty Good

[Three scatter plots: Problems 1-3 with the best linear boundary overlaid on each]

Problems that can be solved by a linear classifier are called linearly separable.

33
A Famous Problem

R. A. Fisher’s Iris Dataset.
• 3 classes (Setosa, Versicolor, Virginica)
• 50 of each class

The task is to classify Iris plants into one of the 3 varieties using Petal Length and Petal Width.

Data: https://archive.ics.uci.edu/ml/datasets/iris

[Scatter plot of the three classes by petal measurements]

34
Iris Setosa, Iris Versicolor, Iris Virginica

We can generalize to N classes by fitting N-1 lines. In this case we first learn the line to discriminate between Setosa and Virginica/Versicolor, then we learn to approximately discriminate between Virginica and Versicolor.

[Scatter plot with the two separating lines; Setosa, Versicolor, and Virginica regions labeled]

If petal width > 3.272 – (0.325 * petal length) then class = Virginica
Elseif petal width… 35
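The N-1 line cascade can be sketched as nested if/elif tests. The Virginica line below is the one given on the slide; the Setosa test (petal length < 2.5) is a hypothetical stand-in for the slide's elided second rule, not the lecture's actual coefficients:

```python
def classify_iris(petal_length, petal_width):
    """Cascade of linear tests: each line peels off one class (N-1 lines for N classes)."""
    if petal_width > 3.272 - (0.325 * petal_length):  # line from the slide
        return "Virginica"
    elif petal_length < 2.5:  # hypothetical stand-in for the slide's elided rule
        return "Setosa"
    else:
        return "Versicolor"

print(classify_iris(6.0, 2.0))  # above the slide's line -> Virginica
print(classify_iris(1.4, 0.2))  # short petal -> Setosa
```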
How to Compare Classification Algorithms?

• What criteria do we care about? What matters?
• Performance: predictive accuracy, etc.
• Speed and scalability
  • time to construct the model
  • time to use/apply the model
• Expressive power
  • how flexible the decision boundary is
• Interpretability
  • understanding and insight provided by the model
  • ability to explain/justify the results
• Robustness
  • handling noise, missing values, irrelevant features, and streaming data

36
