MACHINE
LEARNING
INTRODUCTION
COURSE INTRO
Prerequisites
Introductory knowledge of Artificial Intelligence, Probability, Statistics, and Linear Algebra
Course Resources
Lecture slides, assignments (computer/written), solutions to problems, projects.
COURSE INTRO
Books
1. Introduction to Machine Learning, Ethem Alpaydin, MIT Press, 2010.
2. Machine Learning, Tom M. Mitchell, McGraw Hill, 1997.
3. Deep Learning, Ian Goodfellow, Yoshua Bengio, and Aaron Courville (http://www.deeplearningbook.org/).
4. Deep Learning with Python, François Chollet, ISBN-10: 9781617294433, 2017.
INTRODUCTION
To solve a problem on a computer, we need
an algorithm. An algorithm is a sequence of
instructions that should be carried out to
transform the input to output.
For example, one can devise an algorithm for
sorting. The input is a set of numbers and the
output is their ordered list.
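As a minimal sketch in Python (the input list here is just an illustrative example):

# A sorting algorithm: a fixed sequence of instructions that
# transforms the input (a list of numbers) into the output
# (the same numbers in ascending order).
numbers = [5, 2, 9, 1]       # example input
ordered = sorted(numbers)    # Python's built-in sorting algorithm
print(ordered)               # [1, 2, 5, 9]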
INTRODUCTION
For some tasks, however, we do not have an
algorithm—for example, to tell spam emails
from legitimate emails. We know what the
input is: an email document that in the
simplest case is a file of characters.
We know what the output should be: a yes/no
output indicating whether the message is
spam or not. We do not know how to
transform the input to the output.
INTRODUCTION – DATA DRIVEN
SOLUTIONS
What we lack in knowledge, we make up for in DATA.
We can easily compile thousands of example messages, some of which we know to be spam, and what we want
is to “learn” what constitutes spam from them.
In other words, we would like the computer (machine)
to extract automatically the algorithm for this task.
With advances in computer technology, we currently
have the ability to store and process large amounts of
data, as well as to access it from physically distant
locations over a computer network.
INTRODUCTION – DATA DRIVEN
SOLUTIONS
Examples:
In finance, banks analyze their past data to
build models to use in credit applications,
fraud detection, and the stock market.
In manufacturing, learning models are used
for optimization, control, and
troubleshooting.
In medicine, learning programs are used for
medical diagnosis.
In telecommunications, call patterns are analyzed for network optimization and maximizing the quality of service.
INTRODUCTION – DATA DRIVEN
SOLUTIONS
Machine learning also helps us find solutions to
many problems in vision, speech recognition, and
robotics.
Let us take the example of recognizing faces:
This is a task we do effortlessly; every day we
recognize family members.
But we do it unconsciously and are unable to
explain how we do it.
Each person’s face is a pattern composed of a particular combination of features such as the eyes, nose, and mouth. By analyzing sample face images of a person, a learning program captures the pattern specific to that person and then recognizes the person by checking for this pattern in a given image.
INTRODUCTION – DATA DRIVEN
SOLUTIONS
In machine learning, we may not be able to identify
the process completely, but we believe we can
construct a good and useful approximation.
That approximation may not explain everything, but
may still be able to account for some part of the
data.
We believe that though identifying the complete
process may not be possible, we can still detect
certain patterns or regularities.
This is the role of machine learning. Such patterns may help us understand the process, or we can use those patterns to make predictions, assuming that the future, at least the near future, will not be much different from the past when the sample data was collected.
WHAT IS MACHINE LEARNING?
Make the machine ‘learn’ something.
Evaluate how well the machine has ‘learned’.
MACHINE LEARNING
Field of study that gives computers
the ability to learn without being
explicitly programmed.
Arthur Samuel (1959)
MACHINE LEARNING
Machine learning is programming
computers to optimize a
performance criterion using
example data or past experience.
Tom Mitchell (1998)
LEARNING PROBLEMS – EXAMPLES
Learning = Improving with experience over
some task
Improve over task T,
With respect to performance measure P,
Based on experience E.
Example
T = Play checkers
P = % of games won in a tournament
E = opportunity to play against itself
LEARNING PROBLEMS – EXAMPLES
Handwriting recognition
learning problem
Task T: recognizing
handwritten words within
images
Performance measure P:
percent of words correctly
recognized
Training experience E: a
database of handwritten
words with given
classifications
LEARNING PROBLEMS – EXAMPLES
A robot driving learning
problem
Task T: driving on public four-
lane highways using vision
sensors
Performance measure P:
average distance traveled
before an error (as judged by
human overseer)
Training experience E: a sequence of images and steering commands recorded while observing a human driver
MACHINE LEARNING
Nicolas learns about trucks and dumpers
MACHINE LEARNING
But will he recognize others?
So learning involves the ability to generalize from labeled examples
MACHINE LEARNING
There is no need to “learn” to calculate payroll
Learning is used in:
Data mining programs that learn to detect fraudulent
credit card transactions
Programs that learn to filter spam email
Programs that learn to play checkers/chess
Autonomous vehicles that learn to drive on public
highways
Self-customizing programs
And many more…
MACHINE LEARNING
Applications
CREDIT SCORING
Differentiating
between low-risk and
high-risk customers
from their income
and savings
Discriminant: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
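A minimal sketch of this discriminant in Python; the threshold values below are hypothetical stand-ins for the θ1 and θ2 that would be learned from past data:

THETA1 = 30000  # hypothetical income threshold (theta1)
THETA2 = 10000  # hypothetical savings threshold (theta2)

def credit_risk(income, savings):
    # The discriminant rule from the slide.
    if income > THETA1 and savings > THETA2:
        return "low-risk"
    return "high-risk"

print(credit_risk(45000, 15000))  # low-risk
print(credit_risk(45000, 5000))   # high-risk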
Applications
AUTONOMOUS DRIVING
ALVINN* – Drives 70mph on highways
*Autonomous Land Vehicle In a Neural Network
Applications
FACE RECOGNITION
Training examples of a person
Test images
AT&T Laboratories, Cambridge UK
http://www.uk.research.att.com/facedatabase.html
Applications
OCR & HANDWRITING RECOGNITION
TRAINING DATA – SOME IMPORTANT
CONSIDERATIONS
The first design choice is the type of training
experience from which our system will learn. The type
of training experience available can have a significant
impact on success or failure of the learner.
A second important attribute of the training experience
is the degree to which the learner controls the
sequence of training examples.
A third important attribute of the training experience is
how well it represents the distribution of examples
over which the final system performance P must be
measured.
INTRODUCTION TO FEATURES
TEMPLATE MATCHING FOR PATTERN
RECOGNITION
Problem: Recognize letters A to Z
Each image is converted into a 12x12 bitmap.
TEMPLATE MATCHING
The bitmap is represented by a 12x12 matrix or, equivalently, by a 144-vector with 0/1 coordinates.
0 0 0 0 0 0 1 1 0 0 0 0
0 0 0 0 0 1 1 1 0 0 0 0
0 0 0 0 0 1 0 1 1 0 0 0
0 0 0 0 1 1 0 1 1 0 0 0
0 0 0 0 1 0 0 0 1 0 0 0
0 0 0 1 1 0 0 0 1 1 0 0
0 0 0 1 1 0 0 0 1 1 0 0
0 0 1 1 1 1 1 1 1 1 1 0
0 0 1 1 0 0 0 0 0 1 1 0
0 1 1 0 0 0 0 0 0 1 1 0
0 1 1 0 0 0 0 0 0 0 1 1
1 1 0 0 0 0 0 0 0 0 1 1
TEMPLATE MATCHING
Training samples – templates with corresponding classes:
t1 = {(0,0,0,0,1,1,...,0), 'A'}
t2 = {(0,0,0,0,0,1,...,0), 'A'}
.........
tk = {(0,0,1,1,1,1,...,0), 'B'}
..........
Template of the image to be recognized:
T = {(0,0,0,0,1,1,...,0), 'A?'}
Algorithm:
1. Find ti such that ti = T.
2. Assign the image to the same class as ti.
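A sketch of this exact-match algorithm in Python, using shortened toy vectors in place of full 144-vectors:

# Stored templates: (bit vector, class label).
templates = [
    ((0, 0, 0, 0, 1, 1), "A"),
    ((0, 0, 1, 1, 1, 1), "B"),
]

def match(pattern):
    for vector, label in templates:
        if vector == pattern:   # requires an exact bit-for-bit match
            return label
    return None                 # unseen bitmap: not recognized

print(match((0, 0, 0, 0, 1, 1)))  # 'A'
print(match((1, 0, 0, 0, 1, 1)))  # None: one flipped bit defeats matching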
TEMPLATE MATCHING
Number of templates to store: 2^144
If fewer templates are stored, some images might not
be recognized.
Improvements?
FEATURES
Features are the individual measurable
properties of the signal being observed.
The set of features used for learning/recognition is called the feature vector.
The number of features used is the dimensionality of the feature vector.
n-dimensional feature vectors can be represented as points in an n-dimensional feature space.
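For example, a sketch with hypothetical 2-D feature vectors (height in cm, weight in kg), whose similarity can be measured as a distance between points:

import math

x1 = (170.0, 65.0)   # hypothetical object 1
x2 = (155.0, 80.0)   # hypothetical object 2

# Each vector is a point in 2-D feature space; Euclidean distance
# measures how close (similar) the two objects are.
print(math.dist(x1, x2))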
FEATURES
[Figure: objects plotted as points x = (x1, x2) in a 2-D feature space with axes height (x1) and weight (x2); Class 1 and Class 2 form distinct groups.]
FEATURE SPACE
FEATURE EXTRACTION
Feature extraction aims to create
discriminative features good for learning
Good Features
Objects from the same class have similar feature
values.
Objects from different classes have different
values.
[Figure: “good” features separate the classes; “bad” features leave them overlapping.]
FEATURES
Use fewer features if possible
Use features that differentiate classes well
Character recognition example
Good features: aspect ratio, presence of
loops
Bad features: number of black pixels,
number of connected components
CATEGORIZATION OF MACHINE LEARNING ALGORITHMS
Supervised learning
Classification
Regression
Unsupervised learning
Reinforcement learning
CLASSIFICATION
SUPERVISED LEARNING -
CLASSIFICATION
Objective
Make a 2-year-old kid recognize what is an apple and what is an orange
Strategy: Show some examples of each category
Apples Oranges
CLASSIFICATION
You had some training examples, or ‘training data’.
The examples were ‘labeled’.
You used those examples to make the kid ‘learn’ the difference between an apple and an orange.
What is this??? It’s an apple!
CLASSIFICATION
Apple
Pear
Tomato
Cow
Dog
Horse
Given: training images and their categories. What are the categories of these test images?
CLASSIFIER: IDENTIFY THE CLASS OF
GIVEN PATTERN
Distance between Feature Vectors
Instead of finding a template exactly matching the input, look at how close the feature vectors are.
Nearest neighbor classification algorithm:
1. Find the template closest to the input pattern.
2. Classify the pattern to the same class as the closest template.
[Figure: Class 1 and Class 2 points in feature space, with a new pattern assigned to the class of its nearest neighbor.]
CLASSIFIER
K-Nearest Neighbor Classifier
Use the k nearest neighbors, instead of only the single closest one, to classify a pattern.
[Figure: the k nearest neighbors of a new pattern vote on its class, Class 1 vs. Class 2.]
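A minimal sketch of both rules in Python (k = 1 gives the nearest neighbor classifier); the training points are hypothetical:

import math
from collections import Counter

# Labeled training data: (feature vector, class).
training = [
    ((1.0, 1.0), "Class 1"), ((1.5, 2.0), "Class 1"),
    ((5.0, 5.0), "Class 2"), ((6.0, 5.5), "Class 2"),
]

def knn_classify(x, k=3):
    # Sort the training samples by Euclidean distance to x.
    neighbors = sorted(training, key=lambda t: math.dist(t[0], x))
    # Majority vote among the k closest samples.
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]

print(knn_classify((1.2, 1.5), k=1))  # Class 1 (nearest neighbor rule)
print(knn_classify((5.5, 5.0), k=3))  # Class 2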
CLASSIFIER
A classifier partitions the feature space X into class-labeled regions such that
X = X1 ∪ X2 ∪ ... ∪ X|Y| and X1 ∩ X2 ∩ ... ∩ X|Y| = ∅
[Figure: a feature space partitioned into regions labeled X1, X2, X3.]
Classification consists of determining to which region a feature vector x belongs.
Borders between regions are called decision boundaries.
CLASSIFICATION
Cancer Diagnosis – Generally more than one variable
[Figure: patients plotted by Age vs. Tumor Size, labeled Malignant or Benign.]
Why supervised – The algorithm is given a number of patients with the RIGHT ANSWER, and we want the algorithm to learn to predict for new patients.
CLASSIFICATION
Cancer Diagnosis – Generally more than one variable
[Figure: the same Age vs. Tumor Size plot with a separation line and a new patient to predict.]
We want the algorithm to learn the separation line. Once a new patient arrives with a given age and tumor size, predict Malignant or Benign.
SUPERVISED LEARNING - EXAMPLE
Cancer diagnosis – Many more features
Patient ID # of Tumors Avg Area Avg Density Diagnosis
1 5 20 118 Malignant
2 3 15 130 Benign
3 7 10 52 Benign
4 2 30 100 Malignant
Use this training set to learn how to classify patients
where diagnosis is not known:
Patient ID # of Tumors Avg Area Avg Density Diagnosis
101 4 16 95 ?
102 9 22 125 ?
103 1 14 80 ?
(The first four columns are the input data; the Diagnosis column is the classification.)
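A sketch of this workflow using the table above; a k-nearest-neighbor classifier from scikit-learn is just one possible choice of learner here (feature scaling is ignored to keep the sketch short):

from sklearn.neighbors import KNeighborsClassifier

# Training set from the slide: [# of tumors, avg area, avg density].
X_train = [[5, 20, 118], [3, 15, 130], [7, 10, 52], [2, 30, 100]]
y_train = ["Malignant", "Benign", "Benign", "Malignant"]

# Patients 101-103, whose diagnosis is unknown.
X_test = [[4, 16, 95], [9, 22, 125], [1, 14, 80]]

clf = KNeighborsClassifier(n_neighbors=1)
clf.fit(X_train, y_train)      # 'learn' from the labeled examples
print(clf.predict(X_test))     # predicted diagnoses for the new patients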
CONTENTS
Supervised learning
Classification
Regression
Unsupervised learning
Reinforcement learning
REGRESSION
REGRESSION
CLASSIFICATION
The variable we are trying to predict
is DISCRETE
REGRESSION
The variable we are trying to predict
is CONTINUOUS
REGRESSION
Dataset giving the living areas and prices of
50 houses
REGRESSION
We can plot this data
Given data like
this, how can we
learn to predict the
prices of other
houses as a
function of the size
of their living
areas?
REGRESSION
The “input” variables – x(i) (living area in this
example)
The “output” or target variable that we are
trying to predict – y(i) (price)
A pair (x(i), y(i)) is called a training example
A list of m training examples {(x(i), y(i)); i =
1, . . . ,m}—is called a training set
Let X denote the space of input values and Y the space of output values
REGRESSION
Given a training set, the goal is to learn a function h : X → Y so that h(x) is a “good” predictor for the corresponding value of y. For historical reasons, this function h is called a hypothesis.
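A sketch of one very simple hypothesis class, h(x) = θ0 + θ1·x, fitted by least squares; the living-area/price pairs below are hypothetical:

import numpy as np

# Hypothetical training set: (living area in sq ft, price in $1000s).
x = np.array([1000.0, 1500.0, 2000.0, 2500.0])
y = np.array([200.0, 260.0, 330.0, 390.0])

# Fit a line: h(x) = theta0 + theta1 * x.
theta1, theta0 = np.polyfit(x, y, 1)

def h(living_area):
    # The learned hypothesis maps an input x to a predicted y.
    return theta0 + theta1 * living_area

print(h(1800.0))  # predicted price for an unseen house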
REGRESSION
REGRESSION
Example: Price of a
used car
x : car attributes
y : price
CONTENTS
Supervised learning
Classification
Regression
Unsupervised learning
Reinforcement learning
CLUSTERING
UNSUPERVISED LEARNING
CLUSTERING
There are two types of
fruit in the basket,
separate them into two
‘groups’
UNSUPERVISED LEARNING
CLUSTERING
The data was not ‘labeled’: you did not tell which are apples and which are oranges.
Maybe the kid used the idea that things in the same group should be similar to one another, as compared to things in the other group.
Groups = Clusters
[Figure: the fruit separated into two groups or clusters.]
CLUSTERING
[Figure: unlabeled patients plotted by Age vs. Tumor Size, forming two clusters.]
We have the data for patients but NOT the RIGHT ANSWERS. The objective is to find interesting structure in the data (in this case, two clusters).
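A sketch using k-means (one standard clustering algorithm) on hypothetical unlabeled (age, tumor size) data:

from sklearn.cluster import KMeans

# Unlabeled patients: (age, tumor size). No diagnoses are given.
X = [[25, 1.0], [30, 1.2], [28, 0.9],
     [60, 4.5], [65, 5.0], [62, 4.8]]

# Ask for 2 clusters: the algorithm finds structure, not 'right answers'.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)  # cluster index assigned to each patient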
UNSUPERVISED LEARNING –
COCKTAIL PARTY EFFECT
Multiple speakers are recorded speaking simultaneously; the goal is to separate the individual voices from the mixed recordings.
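One common approach to this source-separation problem is independent component analysis; a sketch on synthetic signals (the sources and mixing matrix are hypothetical):

import numpy as np
from sklearn.decomposition import FastICA

# Two synthetic 'voices' (the unknown sources).
t = np.linspace(0, 1, 1000)
S = np.c_[np.sin(8 * np.pi * t),            # speaker 1
          np.sign(np.sin(6 * np.pi * t))]   # speaker 2

# Two microphones, each recording a different mixture of the voices.
A = np.array([[1.0, 0.5], [0.5, 1.0]])      # hypothetical mixing matrix
X = S @ A.T                                 # observed recordings

# ICA recovers the independent sources from the mixtures alone.
S_est = FastICA(n_components=2, random_state=0).fit_transform(X)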
CLASSIFICATION VS CLUSTERING
Challenges
Intra-class variability
Inter-class similarity
INTRA CLASS VARIABILITY
The letter “T” in different typefaces
Same face under different expression, pose,
illumination
INTER CLASS SIMILARITY
Characters that look similar
Identical twins
CONTENTS
Supervised learning
Classification
Regression
Unsupervised learning
Reinforcement learning
REINFORCEMENT
LEARNING
REINFORCEMENT LEARNING
In RL, the computer is simply given a goal to
achieve.
The computer then learns how to achieve that
goal by trial-and-error interactions with its
environment
System learns from success
and failure, reward and
punishment
REINFORCEMENT LEARNING
Similar to training a pet dog:
Every time the dog does something good, you pat him and say ‘good dog’.
Every time the dog does something bad, you scold him, saying ‘bad dog’.
Over time, the dog will learn to do the good things.
LEARNING TO RIDE A BICYCLE
Goal given to RL System - To ride the bicycle
without falling over
The RL system begins riding the bicycle and
performs a series of actions that result in the
bicycle being tilted 45 degrees to the right
At this point two actions are possible: turn the handlebars left or turn them right.
The RL system turns the handlebars to the left, immediately crashes to the ground, and receives a negative reinforcement.
The RL system has just learned not to turn the handlebars left when tilted 45 degrees to the right.
LEARNING TO RIDE A BICYCLE
The RL system turns the handlebars to the RIGHT.
Result: CRASH!!!
It receives negative reinforcement.
The RL system has learned that the “state” of being tilted 45 degrees to the right is bad.
….
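A sketch of the underlying idea as a tiny value table updated from rewards; the state names, actions, and reward values are hypothetical:

# Learned value of each (state, action) pair.
Q = {}

def update(state, action, reward, lr=0.5):
    # Move the stored value toward the observed reward.
    key = (state, action)
    Q[key] = Q.get(key, 0.0) + lr * (reward - Q.get(key, 0.0))

# Tilted 45 degrees right: both actions end in a crash.
update("tilt_45_right", "turn_left", reward=-1.0)
update("tilt_45_right", "turn_right", reward=-1.0)

# Every action from this state now looks bad, so the system learns
# that the state itself should be avoided earlier in the ride.
print(Q)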
A (SIMPLIFIED) CLASSIFICATION
SYSTEM
Two Modes:
Classification mode: test pattern → Preprocessing → Feature Measurement → Classification
Training mode: training pattern → Preprocessing → Feature Extraction/Selection → Learning
A fancy PR Example
A FANCY PROBLEM
Sorting incoming fish on a conveyor
according to species (salmon or sea
bass) using optical sensing
Salmon or sea bass?
(2 categories or
classes)
It is a classification problem. How to solve it?
APPROACH
Data Collection: Take
some images using
optical sensor
APPROACH
Data collection
How to use it?
Preprocessing: Use a segmentation operation to isolate fish
from one another and from the background
Information from a single fish is sent to a feature extractor
whose purpose is to reduce the data by measuring certain
features
But which features to extract?
The features are passed to a classifier that evaluates the evidence and then takes a final decision.
How to design and realize a classifier?
APPROACH
Set up a camera and take some sample
images to extract features
Length
Lightness
Width
Number and shape of fins
Position of the mouth, etc…
This is the set of all suggested features to explore for use in our classifier!
Challenges:
Variations in images – lighting, occlusion, camera view angle
Position of the fish on the conveyor belt, etc.
FEATURE EXTRACTION
Feature extraction: use domain knowledge
The sea bass is generally longer than a salmon
The average lightness of sea bass scales is
greater than that of salmon
We will use training data in order to learn a
classification rule based on these features
(length of a fish and average lightness)
Length of fish and average lightness may not be sufficient features, i.e., they may not guarantee 100% classification results.
CLASSIFICATION – OPTION 1
Select the length of the fish as a possible
feature for discrimination between two
classes
[Figure: histograms of the length feature for the two categories, with a threshold serving as the decision boundary.]
COST OF TAKING A DECISION
A fish-packaging industry uses the system to pack fish in cans.
Two facts:
People do not want to find sea bass in the cans labeled salmon.
People occasionally accept finding salmon in the cans labeled sea bass.
So the cost of taking a decision in favor of sea bass when the true reality is salmon is not the same as the converse.
EVALUATION OF A CLASSIFIER
How to evaluate a certain classifier?
Classification error: the percentage of patterns (e.g. fish) that are assigned to the wrong category
Choose a classifier that gives minimum classification
error
Risk is the total expected cost of decisions
Choose a classifier that minimizes the risk
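A sketch of both criteria; the test results and cost values are hypothetical, with the costs encoding that “sea bass sold as salmon” is the more expensive mistake:

# Hypothetical test results: (true class, predicted class).
results = [("salmon", "salmon"), ("sea bass", "salmon"),
           ("sea bass", "sea bass"), ("salmon", "sea bass")]

# Classification error: fraction of patterns assigned to the wrong class.
error = sum(t != p for t, p in results) / len(results)

# Asymmetric cost matrix: cost[(true, predicted)].
cost = {("sea bass", "salmon"): 10.0,  # bass in a salmon can: costly
        ("salmon", "sea bass"): 1.0,   # salmon in a bass can: tolerable
        ("salmon", "salmon"): 0.0, ("sea bass", "sea bass"): 0.0}

# Risk: expected (here, average) cost of the decisions taken.
risk = sum(cost[(t, p)] for t, p in results) / len(results)
print(error, risk)  # 0.5 2.75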
CLASSIFICATION – OPTION 2
Select the average lightness of the fish as a
possible feature for discrimination between two
classes
[Figure: histograms of the average lightness feature for the two categories.]
CLASSIFICATION – OPTION 3
Use both length and average lightness features
for classification. Use a simple line to
discriminate
The two features of lightness and width for sea bass and
salmon. The dark line might serve as a decision boundary
of our classifier
CLASSIFICATION – OPTION 3
Use both length and average lightness features
for classification. Use a complex model to
discriminate
Overly complex models for the fish will lead to decision boundaries that are complicated. While such a decision rule may lead to perfect classification of our training samples (classification error is zero), it would lead to poor performance on new samples.
COMMENTS
Model selection
A complex model does not seem to be the correct one; it is learning the training data by heart.
So how do we choose the correct model? (a difficult question)
Occam’s Razor says that “simpler models should be preferred over complex ones”.
Generalization error
Minimizing the classification error on the training database does not guarantee minimizing the classification error on the test database (the generalization error).
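A sketch of this effect on hypothetical noisy data: a simple and an overly complex polynomial are fitted to a training half and evaluated on a held-out half:

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + rng.normal(0.0, 0.2, size=x.shape)  # noisy linear process

x_tr, y_tr = x[::2], y[::2]      # training half
x_te, y_te = x[1::2], y[1::2]    # held-out (test) half

for degree in (1, 7):
    coeffs = np.polyfit(x_tr, y_tr, degree)
    mse_tr = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    mse_te = np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)
    # The complex model fits the training half better, but typically
    # does worse on the held-out half: the generalization gap.
    print(degree, mse_tr, mse_te)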
CLASSIFICATION – OPTION 3
Decision boundary with good generalization
The decision boundary shown might represent the optimal tradeoff between performance on the training set and simplicity of the classifier.
WHERE YOU STAND…
Separation of different coins using a robotic
arm
PROPOSE A SET OF STATISTICAL
FEATURES FOR…..
Verifying the claimed identities of people from
images of their hands laid flat with fingers spread
Identifying models of automobiles from side view
photographs of unknown (variable) scales
ACKNOWLEDGEMENTS
The material in these slides has been taken from the following sources:
Machine Learning, Dr. Sneeha Amir, Bahria University, Islamabad
Pattern Recognition, Dr. Imran Siddiqui, Bahria University, Islamabad
Machine Intelligence, Dr. M. Hanif, UET, Lahore
Machine Learning, S. Stock, University of Nebraska
Lecture Slides, Introduction to Machine Learning, E. Alpaydin, MIT Press
Machine Learning, Andrew Ng, Stanford University
Fisher kernels for image representation & generative classification models, Jakob Verbeek