Module 1
CS 467 - Machine Learning
Syllabus
What is Machine Learning, Examples of Machine
Learning applications - Learning associations,
Classification, Regression, Unsupervised Learning,
Reinforcement Learning. Supervised learning- Input
representation, Hypothesis class, Version space,
Vapnik-Chervonenkis (VC) Dimension
2
Machine Learning is...
▷ Machine learning is about predicting the future
based on the past.
-- Hal Daume III
3
What is Machine Learning
Machine learning is a subset of artificial intelligence
in the field of computer science that often uses
statistical techniques to give computers the ability to
"learn" (i.e., progressively improve performance on a
specific task) with data, without being explicitly
programmed.
- Wikipedia
4
What is Machine Learning Cont..
▷ Machine learning is an application of artificial
intelligence (AI) that provides systems the
ability to automatically learn and improve from
experience without being explicitly
programmed. Machine learning focuses on the
development of computer programs that can
access data and use it to learn for themselves.
5
Why “Learn” ?
▷ There is no need to “learn” to calculate payroll
▷ Learning is used when:
○ Human expertise does not exist (navigating on Mars),
○ Humans are unable to explain their expertise (speech
recognition)
○ Solution changes in time (routing on a computer
network)
○ Solution needs to be adapted to particular cases (user
biometrics)
Based on slide by E. Alpaydin 6
A classic example of a task that requires machine
learning: It is very hard to say what makes a 2
Slide credit: Geoffrey Hinton
7
What We Talk About When We Talk
About “Learning”
▷ Learning general models from data of particular
examples
▷ Data is cheap and abundant (data warehouses,
data marts); knowledge is expensive and scarce.
▷ Example in retail: Customer transactions to
consumer behavior:
○ People who bought book A also bought book B
▷ Build a model that is a good and useful
approximation to the data.
8
Data Mining
▷ Retail: Market basket analysis, Customer relationship
management (CRM)
▷ Finance: Credit scoring, fraud detection
▷ Manufacturing: Control, robotics, troubleshooting
▷ Medicine: Medical diagnosis
▷ Telecommunications: Spam filters, intrusion detection
▷ Web mining: Search engines
▷ ...
9
GENERAL CLASSES OF MACHINE
LEARNING PROBLEMS
▷ Learning associations
○ Association rule learning is a machine learning
method for discovering interesting relations,
called “association rules”, between variables in
large databases using some measures of
“interestingness”.
○ How association rules are used: consider an
association rule of the form
X => Y,
that is, if people buy X then they are also likely to
buy Y.
10
Learning associations
▷ We are interested in learning a conditional
probability of the form P(Y|X), where Y is the
product the customer may buy and X is the
product or the set of products the customer
has already purchased.
11
What We Talk About When We Talk
About “Learning”
▷ Learning general models from a data of particular
examples
▷ Data is cheap and abundant (data warehouses,
data marts); knowledge is expensive and scarce.
▷ Example in retail: Customer transactions to
consumer behavior:
○ People who bought book A also bought book B
▷ Build a model that is a good and useful
approximation to the data.
8
Learning associations
▷ There are several algorithms for generating
association rules. Some of the well-known
algorithms are listed below:
○ Apriori algorithm
○ Eclat algorithm
○ FP-Growth algorithm (FP stands for Frequent
Pattern)
13
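The rules these algorithms produce are ranked by "interestingness" measures such as support and confidence. As a minimal sketch (using a made-up toy transaction database, not data from the slides), support and confidence can be computed directly:

```python
# Toy transaction database with hypothetical purchases.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(x, y, transactions):
    """Estimate of P(Y|X): among transactions containing X, the fraction also containing Y."""
    return support(set(x) | set(y), transactions) / support(x, transactions)

# The rule {bread} => {milk}:
print(support({"bread"}, transactions))               # 0.8
print(confidence({"bread"}, {"milk"}, transactions))  # ≈ 0.75 (3 of the 4 bread baskets contain milk)
```

Apriori, Eclat and FP-Growth differ in how they enumerate frequent itemsets efficiently; the measures themselves are computed as above.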
GENERAL CLASSES OF MACHINE
LEARNING PROBLEMS
▷ Classification
○ the problem of identifying to which of a set of
categories a new observation belongs, based on a
training set of data containing observations (or
instances) whose category membership is known.
14
Data Mining
▷ Retail: Market basket analysis, Customer relationship
management (CRM)
▷ Finance: Credit scoring, fraud detection
▷ Manufacturing: Control, robotics, troubleshooting
▷ Medicine: Medical diagnosis
▷ Telecommunications: Spam filters, intrusion detection
▷ Web mining: Search engines
▷ ...
9
Classification
▷ There are several machine-learning algorithms
for classification. The following are some of
the well-known algorithms.
○ Logistic regression
○ Naive Bayes algorithm
○ k-NN algorithm
○ Decision tree algorithm
○ Support vector machine algorithm
○ Random forest algorithm
16
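Of the algorithms listed, k-NN is the simplest to sketch from scratch: classify a new observation by a majority vote among its k nearest labelled neighbours. The training points below are made up for illustration:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D training points with two classes.
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((6, 6), "B"), ((6, 7), "B"), ((7, 6), "B")]
print(knn_predict(train, (2, 2)))  # A
print(knn_predict(train, (6, 5)))  # B
```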
GENERAL CLASSES OF MACHINE
LEARNING PROBLEMS
▷ Regression
○ the problem of predicting the value of a numeric
variable based on observed values of the variable.
17
Regression
Suppose we are required to estimate the price of a car that is 25
years old, has been driven 53,240 km and weighs 1,200 pounds.
18
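A minimal sketch of how such an estimate is produced: fit a line to observed (feature, price) pairs by least squares and evaluate it at the new point. For brevity this uses a single feature (age) and made-up prices, not the slide's data:

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = a*x + b for a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

# Hypothetical (age in years, price) pairs: price falls as the car ages.
ages = [1, 5, 10, 15, 20, 25]
prices = [20000, 16000, 11000, 7000, 4000, 2000]  # made-up values
a, b = fit_line(ages, prices)
print(a < 0)       # True: price decreases with age
print(a * 25 + b)  # rough estimate for a 25-year-old car
```

With several features (age, distance, weight), the same idea extends to multiple linear regression.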
Kinds of Machine Learning
▷ Supervised Learning
○ Classification
○ Regression
▷ Unsupervised Learning
▷ Reinforcement Learning
21
Supervised Learning
▷ A majority of practical machine learning uses
supervised learning.
▷ In supervised learning, the system tries to
learn from the examples it has previously
been given.
22
Supervised Learning
▷ We train the machine using data which is well
labeled.
▷ The machine is then provided with a new set of
examples (data).
▷ The supervised learning algorithm analyses the
training data (the set of training examples) and uses
what it learned from the labeled data to produce the
correct outcome for new examples.
23
Supervised Learning
▷ If the shape of the object is rounded with a depression at
the top and its color is red, then it will be labelled as Apple.
▷ If the shape of the object is a long curving cylinder and its
color is green-yellow, then it will be labelled as Banana.
24
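In practice a supervised learner induces rules like these from labeled training data; the sketch below simply hard-codes the slide's two rules to illustrate the mapping from input features (shape, color) to a label:

```python
def label_fruit(shape, color):
    """Hand-written rules mirroring the slide's two labelled examples."""
    if shape == "rounded with depression at top" and color == "red":
        return "Apple"
    if shape == "long curving cylinder" and color == "green-yellow":
        return "Banana"
    return "unknown"

print(label_fruit("rounded with depression at top", "red"))  # Apple
print(label_fruit("long curving cylinder", "green-yellow"))  # Banana
```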
Supervised Learning
▷ Supervised learning is classified into two categories of
algorithms:
○ Classification: a classification problem is when the output
variable is a category, such as "red" or "blue", or "disease" and
"no disease".
○ Regression: a regression problem is when the output variable
is a real value, such as "dollars" or "weight".
25
Exercise 1
26
Supervised Learning

No. | Size  | Color | Shape                                    | Fruit Name
1   | Big   | Red   | Rounded shape with depression at the top | Apple
2   | Small | Red   | Heart-shaped to nearly globular          | Cherry
3   | Big   | Green | Long curving cylinder                    | Banana
4   | Small | Green | Round to oval, bunch shape, cylindrical  | Grape
28
Unsupervised Learning
▷ Training a machine using information that is neither
classified nor labeled, and allowing the algorithm to
act on that information without guidance.
▷ Here the task of machine is to group unsorted
information according to similarities, patterns and
differences without any prior training of data.
▷ Used in clustering (task of grouping a set of objects in
such a way that objects in the same group are more
similar to each other than to those in other groups. )
29
Unsupervised Learning
▷ Suppose you have a basket and it is filled with some
different types of fruits and your task is to arrange
them as groups.
▷ This time, you don’t know anything about the fruits;
suppose this is the first time you have seen them.
You have no clue about them.
▷ So, how will you arrange them?
▷ What will you do first?
▷ You will take each fruit and group the fruits by
considering the physical characteristics of that
particular fruit.
31
Unsupervised Learning
▷ Suppose you have considered color.
○ Then you will arrange them on considering base condition as color.
○ Then the groups will be something like this.
■ RED COLOR GROUP: apples & cherry fruits.
■ GREEN COLOR GROUP: bananas & grapes.
▷ So now you will take another physical character such as
size.
○ RED COLOR AND BIG SIZE: apple.
○ RED COLOR AND SMALL SIZE: cherry fruits.
○ GREEN COLOR AND BIG SIZE: bananas.
○ GREEN COLOR AND SMALL SIZE: grapes.
32
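The two-step grouping above can be sketched as a simple clustering by observed characteristics; the fruit descriptions here are made up to match the example, and the program never sees the fruit names:

```python
from collections import defaultdict

# Hypothetical unlabeled fruits, described only by physical characteristics.
fruits = [
    {"color": "red", "size": "big"},      # (an apple, but the learner doesn't know that)
    {"color": "red", "size": "small"},    # (a cherry)
    {"color": "green", "size": "big"},    # (a banana)
    {"color": "green", "size": "small"},  # (a grape)
]

# Group by color first, then refine by size -- the slide's two-step grouping.
groups = defaultdict(list)
for f in fruits:
    groups[(f["color"], f["size"])].append(f)

for key in sorted(groups):
    print(key, len(groups[key]))
```

Real clustering algorithms (e.g. k-means) do the same kind of grouping with numeric features and learned, rather than hand-picked, group boundaries.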
Reinforcement Learning
▷ A type of machine learning which allows
software agents and machines to
○ automatically determine the ideal behavior within a
specific context, in order to maximize performance.
▷ A reinforcement learning algorithm, or agent, learns
by interacting with its environment.
▷ The agent receives rewards by performing correctly
and penalties for performing incorrectly.
▷ The agent learns (without intervention from a
human) by maximizing its reward and minimizing its
penalty.
33
Reinforcement Learning
▷ Consider the example of a child learning to walk.
○ The child will observe how you walk.
○ Soon he/she will understand that before walking, one has
to stand up.
○ Now the child attempts to get up, staggering and slipping.
○ Standing up was easy, but remaining still is another task.
○ Now the real task for the child is to start walking.
○ But that is easier said than done.
○ There are so many things to keep in mind, like balancing the
body weight, deciding which foot to put next and where to put
it.
34
Reinforcement Learning Cont..
▷ The “problem statement” of the example is to walk;
the child is an agent trying to manipulate the
environment (the surface on which it walks)
by taking actions (walking), and he/she tries to go
from one state (each step he/she takes) to
another.
▷ The child gets a reward (let’s say chocolate) when
he/she accomplishes a submodule of the task
(taking a couple of steps) and will not receive any
chocolate (negative reward) when he/she is not able
to walk.
35
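The slides describe reinforcement learning informally and do not specify an algorithm; as one concrete sketch, here is tabular Q-learning on a made-up 1-D "walking" task where the agent learns, from rewards alone, that stepping forward is the ideal behavior:

```python
import random

random.seed(0)

# A tiny 1-D "learning to walk" task: states 0..4, with the goal at state 4.
# Actions: step back (-1) or step forward (+1).
# Reward: +1 ("chocolate") on reaching the goal, 0 otherwise.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.2  # learning rate, discount factor, exploration rate

for episode in range(200):
    s = 0
    while s != GOAL:
        # Epsilon-greedy action selection.
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), GOAL)
        r = 1.0 if s2 == GOAL else 0.0
        # Q-learning update toward reward plus discounted best future value.
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

# After training, the greedy action in every non-goal state is to step forward.
print([max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(GOAL)])
```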
Types of Machine Learning
▷ Supervised: task driven (Regression / Classification)
▷ Unsupervised: data driven (Clustering)
▷ Reinforcement: the algorithm learns to react to an environment
37
Input representation
○ The general classification problem is concerned
with assigning a class label to an unknown instance,
given instances whose label assignments are known.
○ In a real-world problem, a given situation or object
will have a large number of features which may
contribute to the assignment of the labels.
○ Only those which are significant need be
considered as inputs for assigning the class labels.
○ These features are referred to as the “input
features” for the problem.
38
Hypothesis
○ In a binary classification problem, a hypothesis
is a statement or a proposition purporting to
explain a given set of facts or observations.
39
Hypothesis
▷ Consider a machine learning problem where the
input is denoted by x and the output by y.
▷ In order to do machine learning, there should
exist a relationship (pattern) between the
input and output values.
▷ Let us say
y = f(x); this is known as the target function.
▷ However, f(·) is an unknown function to us.
40
Hypothesis
▷ So machine learning algorithms try to guess a
“hypothesis” function h(x) that approximates
the unknown f(·).
▷ The set of all possible hypotheses is known as
the hypothesis set H.
▷ The goal of the learning process is to find the
final hypothesis that best approximates the
unknown target function.
41
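As a minimal sketch of that search, assume a small finite hypothesis set of threshold functions and a made-up labeled sample from the unknown f; the final hypothesis is simply the member of H with the lowest training error:

```python
# A small hypothetical sample of (x, y) pairs generated by the unknown f.
data = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1)]

def h(m):
    """Threshold hypothesis: predict 1 when x > m."""
    return lambda x: 1 if x > m else 0

# A finite, assumed hypothesis set H, and the training error of each member.
H = [h(m) for m in (0.5, 1.5, 2.5, 3.5)]
errors = [sum(hyp(x) != y for x, y in data) for hyp in H]

# The final hypothesis is the member of H with the lowest training error.
best = H[errors.index(min(errors))]
print(min(errors))  # 0: the m = 2.5 threshold classifies every example correctly
```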
Hypothesis - Example
▷ Consider the set of observations of a variable
x with the associated class labels given in the
table, with
hollow dots representing positive examples and solid
dots representing negative examples.
42
Hypothesis - Example
The set of all hypotheses obtained by assigning different values to m
constitutes the hypothesis space H
43
Consider a situation with four binary
variables x1, x2, x3, x4 and one binary
output variable y. What is the size of
the hypothesis space?
The input space in the above example has 2^4 = 16 possible
inputs; since each input can be assigned either of the two
output labels, the hypothesis space has 2^16 = 65536 hypotheses.
44
Scatter plot of price-power data
(hollow circles indicate positive examples and solid dots
indicate negative examples)
49
The version space consists of hypotheses corresponding to
axis-aligned rectangles contained in the shaded region.
The inner rectangle is defined by
(34 < price < 47) AND (215 < power < 260)
and the outer rectangle is defined by
(27 < price < 66) AND (170 < power < 290)
50
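Assuming the rectangle bounds above, membership in this version space can be checked directly: a rectangle hypothesis is consistent with the data exactly when it encloses the inner (most specific) rectangle and is enclosed by the outer (most general) one:

```python
def contains(outer, inner):
    """True when the axis-aligned rectangle `inner` lies inside `outer`.

    A rectangle is (price_lo, price_hi, power_lo, power_hi)."""
    return (outer[0] <= inner[0] and inner[1] <= outer[1] and
            outer[2] <= inner[2] and inner[3] <= outer[3])

S_inner = (34, 47, 215, 260)  # the most specific consistent hypothesis
G_outer = (27, 66, 170, 290)  # the most general consistent hypothesis

def in_version_space(rect):
    """In the version space iff `rect` encloses S_inner and fits inside G_outer."""
    return contains(rect, S_inner) and contains(G_outer, rect)

print(in_version_space((30, 50, 200, 270)))  # True: lies between the two bounds
print(in_version_space((40, 50, 200, 270)))  # False: cuts into the inner rectangle
```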
Vapnik-Chervonenkis (VC) Dimension
▷ VC Dimension is a measure of the capacity
(complexity, expressive power, richness, or
flexibility) of a space of functions that can be
learned by a classification algorithm.
51
Vapnik-Chervonenkis (VC) Dimension
Total data points = 2
Classification:
Class A: true, 1, yes (green)
Class B: false, 0, no (red)
52
Vapnik-Chervonenkis (VC) Dimension
Two numbers can be classified in 2^2 = 4 different ways
53
Vapnik-Chervonenkis (VC) Dimension
Total Data Points = 3
Three numbers can be classified in 2^3 = 8 different ways
54
Version space
Consider a binary classification problem. Let D be a set
of training examples and H a hypothesis space for the
problem, and let c denote the true classification function.
The version space for the problem with respect to the set
D and the space H is the set of hypotheses from H consistent
with D; that is, it is the set
VS_{D,H} = {h ∈ H : h(x) = c(x) for all x ∈ D}
45
Version space
▷ A version space learning algorithm is
presented with examples, which it will use to
restrict its hypothesis space;
▷ for each example x, the hypotheses that are
inconsistent with x are removed from the
space.
▷ This iterative refining of the hypothesis space
is called the candidate elimination algorithm;
the hypothesis space maintained inside the
algorithm is called the version space.
46
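A minimal sketch of this filtering, using an assumed finite hypothesis space of threshold functions (not the slides' H) and two made-up labeled examples:

```python
def h(m):
    """Threshold hypothesis: predict 1 when x > m."""
    return lambda x: 1 if x > m else 0

# A finite, assumed hypothesis space of thresholds.
H = {m: h(m) for m in (0.5, 1.5, 2.5, 3.5, 4.5)}

# Labeled training examples (x, c(x)).
examples = [(1.0, 0), (4.0, 1)]

# Candidate elimination: drop every hypothesis inconsistent with each example.
version_space = dict(H)
for x, label in examples:
    version_space = {m: hyp for m, hyp in version_space.items() if hyp(x) == label}

print(sorted(version_space))  # [1.5, 2.5, 3.5] remain consistent with both examples
```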
Version space -Example
47
Vapnik-Chervonenkis (VC) Dimension
Can a line hypothesis class shatter three data points?
58
Vapnik-Chervonenkis (VC) Dimension
Note: when we say H shatters N data points, it does not
mean that H can shatter every set of N data points. It is
enough to find even a single dataset of N data points
which H can shatter.
59
Vapnik-Chervonenkis (VC) Dimension
Total Data Points = 4
We cannot find any dataset of 4 points which can be shattered by the line class
61
Vapnik-Chervonenkis (VC) Dimension
The VC dimension of a hypothesis class H is
the maximum number of data points
which can be shattered by H.
The VC dimension of the line class is 3.
62
Vapnik-Chervonenkis dimension (VC
dimension)
▷ Let H be the hypothesis space for some
machine learning problem.
▷ The Vapnik-Chervonenkis dimension of H,
also called the VC dimension of H and
denoted by VC(H),
▷ is a measure of the complexity (or capacity,
expressive power, richness, or flexibility) of
the space H.
63
Shattering of a set
▷ Let D be a dataset containing N examples for a
binary classification problem with class labels
0 and 1.
▷ Let H be a hypothesis space for the problem.
▷ Each hypothesis h in H partitions D into two
disjoint subsets as follows:
D_h^0 = {x ∈ D : h(x) = 0} and D_h^1 = {x ∈ D : h(x) = 1}
Such a partition of D is called a “dichotomy” of D.
64
Shattering of a set
▷ It can be shown that there are 2^N possible
dichotomies of D.
▷ To each dichotomy of D there is a unique
assignment of the labels “1” and “0” to the
elements of D.
▷ If S is any subset of D, then S defines a unique
hypothesis h_S as follows:
h_S(x) = 1 if x ∈ S, and h_S(x) = 0 otherwise.
65
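The 2^N count and the subset-to-dichotomy correspondence can be checked by brute-force enumeration on a tiny made-up dataset:

```python
from itertools import product

D = ["a", "b", "c"]  # a dataset with N = 3 elements

# Every assignment of labels 0/1 to the elements of D is a dichotomy.
dichotomies = list(product([0, 1], repeat=len(D)))
print(len(dichotomies))  # 8, i.e. 2**3

# Each subset S of D corresponds to exactly one dichotomy:
# label 1 on the elements of S and 0 on the rest.
S = {"a", "c"}
labels = tuple(1 if x in S else 0 for x in D)
print(labels)  # (1, 0, 1)
```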
Example
▷ Let the instance space X be the set of all real
numbers. Consider the hypothesis space
defined by
66
▷ Let D be a subset of X containing only a single
number, say, D = {3.5}.
▷ There are 2 dichotomies for this set.
▷ These correspond to the following assignment
of class labels:
67
▷ h4 ∈ H is consistent with the former
dichotomy and h3 ∈ H is consistent with the
latter.
▷ So, to every dichotomy in D there is a
hypothesis in H consistent with the
dichotomy.
▷ Therefore, the set D is shattered by the
hypothesis space H.
68
▷ Let D be a subset of X containing two
elements, say, D = {3.25, 4.75}.
69
▷ In these dichotomies,
○ h1 is consistent with (a),
○ h2 is consistent with (b), and
○ h3 is consistent with (d).
○ But there is no hypothesis h_m ∈ H consistent with
(c).
▷ Thus the two-element set D is not shattered
by H.
▷ The size of the largest finite subset of X
shattered by H is 1. This number is the VC
dimension of H. 70
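This shattering argument can be checked by brute force. The sketch below assumes a simple threshold family h_m(x) = 1 if x > m (the slides' exact H is given in a display not reproduced here, but the threshold family has the same VC dimension of 1): one point is shattered, while for two points the labeling (1, 0) is unrealisable:

```python
from itertools import product

def shatters(points, hypotheses):
    """True iff every labeling of `points` is realised by some hypothesis."""
    for labeling in product([0, 1], repeat=len(points)):
        if not any(tuple(h(x) for x in points) == labeling for h in hypotheses):
            return False
    return True

# An assumed threshold family h_m(x) = 1 if x > m, for integer thresholds m.
H = [(lambda m: (lambda x: 1 if x > m else 0))(m) for m in range(-10, 10)]

print(shatters([3.5], H))         # True: a single point can be shattered
print(shatters([3.25, 4.75], H))  # False: no threshold gives the labeling (1, 0)
```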
VC Dimension
▷ VC-dim(constant) = 0
▷ VC-dim(single-parameter threshold classifier) = 1
▷ VC-dim(intervals) = 2
▷ VC-dim(line) = 3
▷ VC-dim(axis-aligned rectangles) = 4
71
An axis-aligned rectangle can shatter 4 points
72