Module 1
CS 467 - Machine Learning
Syllabus
What is Machine Learning, Examples of Machine
Learning applications - Learning associations,
Classification, Regression, Unsupervised Learning,
Reinforcement Learning. Supervised learning- Input
representation, Hypothesis class, Version space,
Vapnik-Chervonenkis (VC) Dimension
2
Machine Learning is...
▷ Machine learning is about predicting the future
based on the past.
-- Hal Daume III
3
What is Machine Learning
Machine learning is a subset of artificial intelligence
in the field of computer science that often uses
statistical techniques to give computers the ability to
"learn" (i.e., progressively improve performance on a
specific task) with data, without being explicitly
programmed.
- Wikipedia
4
What is Machine Learning Cont..
▷ Machine learning is an application of artificial
intelligence (AI) that provides systems the
ability to automatically learn and improve from
experience without being explicitly
programmed. Machine learning focuses on the
development of computer programs that can
access data and use it to learn for themselves.
5
Why “Learn” ?
▷ There is no need to “learn” to calculate payroll
▷ Learning is used when:
○ Human expertise does not exist (navigating on Mars),
○ Humans are unable to explain their expertise (speech
recognition)
○ Solution changes in time (routing on a computer
network)
○ Solution needs to be adapted to particular cases (user
biometrics)
Based on slide by E. Alpaydin 6
A classic example of a task that requires machine
learning: It is very hard to say what makes a 2
Slide credit: Geoffrey Hinton
7
What We Talk About When We Talk
About “Learning”
▷ Learning general models from data of particular
examples
▷ Data is cheap and abundant (data warehouses,
data marts); knowledge is expensive and scarce.
▷ Example in retail: Customer transactions to
consumer behavior:
○ People who bought book A also bought book B
▷ Build a model that is a good and useful
approximation to the data.
8
Data Mining
▷ Retail: Market basket analysis, Customer relationship
management (CRM)
▷ Finance: Credit scoring, fraud detection
▷ Manufacturing: Control, robotics, troubleshooting
▷ Medicine: Medical diagnosis
▷ Telecommunications: Spam filters, intrusion detection
▷ Web mining: Search engines
▷ ...
9
GENERAL CLASSES OF MACHINE
LEARNING PROBLEMS
▷ Learning associations
○ Association rule learning is a machine learning
method for discovering interesting relations,
called “association rules”, between variables in
large databases using some measures of
“interestingness”.
○ How association rules are used: consider an
association rule of the form
X => Y,
that is, if people buy X then they are also likely to
buy Y.
10
Learning associations
▷ We are interested in learning a conditional
probability of the form P(Y|X), where Y is the
product the customer may buy and X is the
product or the set of products the customer
has already purchased.
11
What We Talk About When We Talk
About “Learning”
▷ Learning general models from a data of particular
examples
▷ Data is cheap and abundant (data warehouses,
data marts); knowledge is expensive and scarce.
▷ Example in retail: Customer transactions to
consumer behavior:
○ People who bought book A also bought book B
▷ Build a model that is a good and useful
approximation to the data.
8
Learning associations
▷ There are several algorithms for generating
association rules. Some of the well-known
algorithms are listed below:
○ Apriori algorithm
○ Eclat algorithm
○ FP-Growth algorithm (FP stands for Frequent
Pattern)
13
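The rules these algorithms produce are ranked by "interestingness" measures such as support and confidence. As a minimal sketch (using a made-up toy transaction database, not data from the slides), support and confidence can be computed directly:

```python
# Toy transaction database with hypothetical purchases.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(x, y, transactions):
    """Estimate of P(Y|X): among transactions containing X, the fraction also containing Y."""
    return support(set(x) | set(y), transactions) / support(x, transactions)

# The rule {bread} => {milk}:
print(support({"bread"}, transactions))               # 0.8
print(confidence({"bread"}, {"milk"}, transactions))  # ≈ 0.75 (3 of the 4 bread baskets contain milk)
```

Apriori, Eclat and FP-Growth differ in how they enumerate frequent itemsets efficiently; the measures themselves are computed as above.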
GENERAL CLASSES OF MACHINE
LEARNING PROBLEMS
▷ Classification
○ the problem of identifying to which of a set of
categories a new observation belongs, based on a
training set of data containing observations (or
instances) whose category membership is known.
14
Data Mining
▷ Retail: Market basket analysis, Customer relationship
management (CRM)
▷ Finance: Credit scoring, fraud detection
▷ Manufacturing: Control, robotics, troubleshooting
▷ Medicine: Medical diagnosis
▷ Telecommunications: Spam filters, intrusion detection
▷ Web mining: Search engines
▷ ...
9
Classification
▷ There are several machine-learning algorithms
for classification. The following are some of
the well-known algorithms.
○ Logistic regression
○ Naive Bayes algorithm
○ k-NN algorithm
○ Decision tree algorithm
○ Support vector machine algorithm
○ Random forest algorithm
16
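Of the algorithms listed, k-NN is the simplest to sketch from scratch: classify a new observation by a majority vote among its k nearest labelled neighbours. The training points below are made up for illustration:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs."""
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D training points with two classes.
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((6, 6), "B"), ((6, 7), "B"), ((7, 6), "B")]
print(knn_predict(train, (2, 2)))  # A
print(knn_predict(train, (6, 5)))  # B
```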
GENERAL CLASSES OF MACHINE
LEARNING PROBLEMS
▷ Regression
○ the problem of predicting the value of a numeric
variable based on observed values of the variable.
17
Regression
Suppose we are required to estimate the price of a car that is 25
years old, has been driven 53,240 km and weighs 1,200 pounds.
18
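A minimal sketch of how such an estimate is produced: fit a line to observed (feature, price) pairs by least squares and evaluate it at the new point. For brevity this uses a single feature (age) and made-up prices, not the slide's data:

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = a*x + b for a single feature."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

# Hypothetical (age in years, price) pairs: price falls as the car ages.
ages = [1, 5, 10, 15, 20, 25]
prices = [20000, 16000, 11000, 7000, 4000, 2000]  # made-up values
a, b = fit_line(ages, prices)
print(a < 0)       # True: price decreases with age
print(a * 25 + b)  # rough estimate for a 25-year-old car
```

With several features (age, distance, weight), the same idea extends to multiple linear regression.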
Kinds of Machine Learning
▷ Supervised Learning
○ Classification
○ Regression
▷ Unsupervised Learning
▷ Reinforcement Learning
21
Supervised Learning
▷ A majority of practical machine learning uses
supervised learning.
▷ In supervised learning, the system tries to
learn from the examples it has previously
been given.
22
Supervised Learning
▷ We train the machine using data which is well
labeled.
▷ The machine is then provided with a new set of
examples (data).
▷ The supervised learning algorithm analyses the
training data (the set of training examples) and uses
what it learned from the labeled data to produce the
correct outcome for new examples.
23
Supervised Learning
▷ If the shape of the object is rounded with a depression at
the top and its color is red, then it will be labelled as Apple.
▷ If the shape of the object is a long curving cylinder and its
color is green-yellow, then it will be labelled as Banana.
24
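In practice a supervised learner induces rules like these from labeled training data; the sketch below simply hard-codes the slide's two rules to illustrate the mapping from input features (shape, color) to a label:

```python
def label_fruit(shape, color):
    """Hand-written rules mirroring the slide's two labelled examples."""
    if shape == "rounded with depression at top" and color == "red":
        return "Apple"
    if shape == "long curving cylinder" and color == "green-yellow":
        return "Banana"
    return "unknown"

print(label_fruit("rounded with depression at top", "red"))  # Apple
print(label_fruit("long curving cylinder", "green-yellow"))  # Banana
```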
Supervised Learning
▷ Supervised learning is classified into two categories of
algorithms:
○ Classification: a classification problem is when the output
variable is a category, such as "red" or "blue", or "disease" and
"no disease".
○ Regression: a regression problem is when the output variable
is a real value, such as "dollars" or "weight".
25
Exercise 1
26
Supervised Learning

No. | Size  | Color | Shape                                    | Fruit Name
1   | Big   | Red   | Rounded shape with depression at the top | Apple
2   | Small | Red   | Heart-shaped to nearly globular          | Cherry
3   | Big   | Green | Long curving cylinder                    | Banana
4   | Small | Green | Round to oval, bunch shape, cylindrical  | Grape
28
Unsupervised Learning
▷ Training a machine using information that is neither
classified nor labeled, and allowing the algorithm to
act on that information without guidance.
▷ Here the task of machine is to group unsorted
information according to similarities, patterns and
differences without any prior training of data.
▷ Used in clustering (task of grouping a set of objects in
such a way that objects in the same group are more
similar to each other than to those in other groups. )
29
Unsupervised Learning
▷ Suppose you have a basket and it is filled with some
different types of fruits and your task is to arrange
them as groups.
▷ This time, you don’t know anything about the fruits;
suppose this is the first time you have seen them.
You have no clue about them.
▷ So, how will you arrange them?
▷ What will you do first?
▷ You will take each fruit and group the fruits by
considering the physical characteristics of that
particular fruit.
31
Unsupervised Learning
▷ Suppose you have considered color.
○ Then you will arrange them on considering base condition as color.
○ Then the groups will be something like this.
■ RED COLOR GROUP: apples & cherry fruits.
■ GREEN COLOR GROUP: bananas & grapes.
▷ So now you will take another physical character such as
size.
○ RED COLOR AND BIG SIZE: apple.
○ RED COLOR AND SMALL SIZE: cherry fruits.
○ GREEN COLOR AND BIG SIZE: bananas.
○ GREEN COLOR AND SMALL SIZE: grapes.
32
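The two-step grouping above can be sketched as a simple clustering by observed characteristics; the fruit descriptions here are made up to match the example, and the program never sees the fruit names:

```python
from collections import defaultdict

# Hypothetical unlabeled fruits, described only by physical characteristics.
fruits = [
    {"color": "red", "size": "big"},      # (an apple, but the learner doesn't know that)
    {"color": "red", "size": "small"},    # (a cherry)
    {"color": "green", "size": "big"},    # (a banana)
    {"color": "green", "size": "small"},  # (a grape)
]

# Group by color first, then refine by size -- the slide's two-step grouping.
groups = defaultdict(list)
for f in fruits:
    groups[(f["color"], f["size"])].append(f)

for key in sorted(groups):
    print(key, len(groups[key]))
```

Real clustering algorithms (e.g. k-means) do the same kind of grouping with numeric features and learned, rather than hand-picked, group boundaries.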
Reinforcement Learning
▷ A type of machine learning which allows
software agents and machines to
○ automatically determine the ideal behavior within a
specific context, in order to maximize performance.
▷ A reinforcement learning algorithm, or agent, learns
by interacting with its environment.
▷ The agent receives rewards by performing correctly
and penalties for performing incorrectly.
▷ The agent learns (without intervention from a
human) by maximizing its reward and minimizing its
penalty.
33
Reinforcement Learning
▷ Consider the example of a child learning to walk.
○ The child will observe how you walk.
○ Soon he/she will understand that before walking, one has
to stand up.
○ Now the child attempts to get up, staggering and slipping.
○ Standing up was easy, but remaining still is another task.
○ Now the real task for the child is to start walking.
○ But that is easier said than done.
○ There are so many things to keep in mind, like balancing the
body weight, deciding which foot to put next and where to put
it.
34
Reinforcement Learning Cont..
▷ The “problem statement” of the example is to walk;
the child is an agent trying to manipulate the
environment (the surface on which it walks)
by taking actions (walking), and he/she tries to go
from one state (each step he/she takes) to
another.
▷ The child gets a reward (let’s say chocolate) when
he/she accomplishes a submodule of the task
(taking a couple of steps) and will not receive any
chocolate (negative reward) when he/she is not able
to walk.
35
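The slides describe reinforcement learning informally and do not specify an algorithm; as one concrete sketch, here is tabular Q-learning on a made-up 1-D "walking" task where the agent learns, from rewards alone, that stepping forward is the ideal behavior:

```python
import random

random.seed(0)

# A tiny 1-D "learning to walk" task: states 0..4, with the goal at state 4.
# Actions: step back (-1) or step forward (+1).
# Reward: +1 ("chocolate") on reaching the goal, 0 otherwise.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.2  # learning rate, discount factor, exploration rate

for episode in range(200):
    s = 0
    while s != GOAL:
        # Epsilon-greedy action selection.
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: Q[(s, act)])
        s2 = min(max(s + a, 0), GOAL)
        r = 1.0 if s2 == GOAL else 0.0
        # Q-learning update toward reward plus discounted best future value.
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
        s = s2

# After training, the greedy action in every non-goal state is to step forward.
print([max(ACTIONS, key=lambda act: Q[(s, act)]) for s in range(GOAL)])
```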
Types of Machine Learning
▷ Supervised: task driven (Regression / Classification)
▷ Unsupervised: data driven (Clustering)
▷ Reinforcement: the algorithm learns to react to an environment
37
Input representation
○ The general classification problem is concerned
with assigning a class label to an unknown instance,
given instances whose label assignments are known.
○ In a real-world problem, a given situation or object
will have a large number of features which may
contribute to the assignment of the labels.
○ Only those which are significant need be
considered as inputs for assigning the class labels.
○ These features are referred to as the “input
features” for the problem.
38
Hypothesis
○ In a binary classification problem, a hypothesis
is a statement or a proposition purporting to
explain a given set of facts or observations.
39
Hypothesis
▷ Consider a machine learning problem where the
input is denoted by x and the output by y.
▷ In order to do machine learning, there should
exist a relationship (pattern) between the
input and output values.
▷ Let us say
y = f(x); this is known as the target function.
▷ However, f(·) is an unknown function to us.
40
Hypothesis
▷ So machine learning algorithms try to guess a
“hypothesis” function h(x) that approximates
the unknown f(·).
▷ The set of all possible hypotheses is known as
the hypothesis set H.
▷ The goal of the learning process is to find the
final hypothesis that best approximates the
unknown target function.
41
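As a minimal sketch of that search, assume a small finite hypothesis set of threshold functions and a made-up labeled sample from the unknown f; the final hypothesis is simply the member of H with the lowest training error:

```python
# A small hypothetical sample of (x, y) pairs generated by the unknown f.
data = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1)]

def h(m):
    """Threshold hypothesis: predict 1 when x > m."""
    return lambda x: 1 if x > m else 0

# A finite, assumed hypothesis set H, and the training error of each member.
H = [h(m) for m in (0.5, 1.5, 2.5, 3.5)]
errors = [sum(hyp(x) != y for x, y in data) for hyp in H]

# The final hypothesis is the member of H with the lowest training error.
best = H[errors.index(min(errors))]
print(min(errors))  # 0: the m = 2.5 threshold classifies every example correctly
```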
Hypothesis - Example
▷ Consider the set of observations of a variable
x with the associated class labels given in the
table, with
hollow dots representing positive examples and solid
dots representing negative examples.
42
Hypothesis - Example
The set of all hypotheses obtained by assigning different values to m
constitutes the hypothesis space H
43
Consider a situation with four binary
variables x1, x2, x3, x4 and one binary
output variable y. What is the size of
the hypothesis space?
The input space in the above example has 2^4 = 16 possible
inputs; since each input can be assigned either of the two
output labels, the hypothesis space has 2^16 = 65536 hypotheses.
44
Scatter plot of price-power data
(hollow circles indicate positive examples and solid dots
indicate negative examples)
49
The version space consists of hypotheses corresponding to
axis-aligned rectangles contained in the shaded region.
The inner rectangle is defined by
(34 < price < 47) AND (215 < power < 260)
and the outer rectangle is defined by
(27 < price < 66) AND (170 < power < 290)
50
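Assuming the rectangle bounds above, membership in this version space can be checked directly: a rectangle hypothesis is consistent with the data exactly when it encloses the inner (most specific) rectangle and is enclosed by the outer (most general) one:

```python
def contains(outer, inner):
    """True when the axis-aligned rectangle `inner` lies inside `outer`.

    A rectangle is (price_lo, price_hi, power_lo, power_hi)."""
    return (outer[0] <= inner[0] and inner[1] <= outer[1] and
            outer[2] <= inner[2] and inner[3] <= outer[3])

S_inner = (34, 47, 215, 260)  # the most specific consistent hypothesis
G_outer = (27, 66, 170, 290)  # the most general consistent hypothesis

def in_version_space(rect):
    """In the version space iff `rect` encloses S_inner and fits inside G_outer."""
    return contains(rect, S_inner) and contains(G_outer, rect)

print(in_version_space((30, 50, 200, 270)))  # True: lies between the two bounds
print(in_version_space((40, 50, 200, 270)))  # False: cuts into the inner rectangle
```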
Vapnik-Chervonenkis (VC) Dimension
▷ VC Dimension is a measure of the capacity
(complexity, expressive power, richness, or
flexibility) of a space of functions that can be
learned by a classification algorithm.
51
Vapnik-Chervonenkis (VC) Dimension
Total data points = 2
Classification:
Class A: true, 1, yes (green)
Class B: false, 0, no (red)
52
Vapnik-Chervonenkis (VC) Dimension
Two numbers can be classified in 2^2 = 4 different ways
53
Vapnik-Chervonenkis (VC) Dimension
Total Data Points = 3
Three numbers can be classified in 2^3 = 8 different ways
54
Version space
Consider a binary classification problem. Let D be a set
of training examples and H a hypothesis space for the
problem, and let c denote the true classification function.
The version space for the problem with respect to the set
D and the space H is the set of hypotheses from H consistent
with D; that is, it is the set
VS_{D,H} = {h ∈ H : h(x) = c(x) for all x ∈ D}
45
Version space
▷ A version space learning algorithm is
presented with examples, which it will use to
restrict its hypothesis space;
▷ for each example x, the hypotheses that are
inconsistent with x are removed from the
space.
▷ This iterative refining of the hypothesis space
is called the candidate elimination algorithm;
the hypothesis space maintained inside the
algorithm is called the version space.
46
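A minimal sketch of this filtering, using an assumed finite hypothesis space of threshold functions (not the slides' H) and two made-up labeled examples:

```python
def h(m):
    """Threshold hypothesis: predict 1 when x > m."""
    return lambda x: 1 if x > m else 0

# A finite, assumed hypothesis space of thresholds.
H = {m: h(m) for m in (0.5, 1.5, 2.5, 3.5, 4.5)}

# Labeled training examples (x, c(x)).
examples = [(1.0, 0), (4.0, 1)]

# Candidate elimination: drop every hypothesis inconsistent with each example.
version_space = dict(H)
for x, label in examples:
    version_space = {m: hyp for m, hyp in version_space.items() if hyp(x) == label}

print(sorted(version_space))  # [1.5, 2.5, 3.5] remain consistent with both examples
```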
Version space -Example
47
Vapnik-Chervonenkis (VC) Dimension
Can a line hypothesis class shatter three data points?
58
Vapnik-Chervonenkis (VC) Dimension
Note: when we say H shatters N data points, it does not
mean that H can shatter every set of N data points. It is
enough to find even a single dataset of N data points
which H can shatter.
59
Vapnik-Chervonenkis (VC) Dimension
Total Data Points = 4
We cannot find any dataset of 4 points which can be shattered by the line class
61
Vapnik-Chervonenkis (VC) Dimension
The VC dimension of a hypothesis class H is
the maximum number of data points
which can be shattered by H.
The VC dimension of the line class is 3.
62
Vapnik-Chervonenkis dimension (VC
dimension)
▷ Let H be the hypothesis space for some
machine learning problem.
▷ The Vapnik-Chervonenkis dimension of H,
also called the VC dimension of H and
denoted by VC(H),
▷ is a measure of the complexity (or capacity,
expressive power, richness, or flexibility) of
the space H.
63
Shattering of a set
▷ Let D be a dataset containing N examples for a
binary classification problem with class labels
0 and 1.
▷ Let H be a hypothesis space for the problem.
▷ Each hypothesis h in H partitions D into two
disjoint subsets as follows:
D_h^0 = {x ∈ D : h(x) = 0} and D_h^1 = {x ∈ D : h(x) = 1}
Such a partition of D is called a “dichotomy” of D.
64
Shattering of a set
▷ It can be shown that there are 2^N possible
dichotomies of D.
▷ To each dichotomy of D there is a unique
assignment of the labels “1” and “0” to the
elements of D.
▷ If S is any subset of D, then S defines a unique
hypothesis h_S as follows:
h_S(x) = 1 if x ∈ S, and h_S(x) = 0 otherwise.
65
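The 2^N count and the subset-to-dichotomy correspondence can be checked by brute-force enumeration on a tiny made-up dataset:

```python
from itertools import product

D = ["a", "b", "c"]  # a dataset with N = 3 elements

# Every assignment of labels 0/1 to the elements of D is a dichotomy.
dichotomies = list(product([0, 1], repeat=len(D)))
print(len(dichotomies))  # 8, i.e. 2**3

# Each subset S of D corresponds to exactly one dichotomy:
# label 1 on the elements of S and 0 on the rest.
S = {"a", "c"}
labels = tuple(1 if x in S else 0 for x in D)
print(labels)  # (1, 0, 1)
```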
Example
▷ Let the instance space X be the set of all real
numbers. Consider the hypothesis space
defined by
66
▷ Let D be a subset of X containing only a single
number, say, D = {3.5}.
▷ There are 2 dichotomies for this set.
▷ These correspond to the following assignment
of class labels:
67
▷ h4 ∈ H is consistent with the former
dichotomy and h3 ∈ H is consistent with the
latter.
▷ So, to every dichotomy in D there is a
hypothesis in H consistent with the
dichotomy.
▷ Therefore, the set D is shattered by the
hypothesis space H.
68
▷ Let D be a subset of X containing two
elements, say, D = {3.25, 4.75}.
69
▷ In these dichotomies,
○ h1 is consistent with (a),
○ h2 is consistent with (b), and
○ h3 is consistent with (d).
○ But there is no hypothesis h_m ∈ H consistent with
(c).
▷ Thus the two-element set D is not shattered
by H.
▷ The size of the largest finite subset of X
shattered by H is 1. This number is the VC
dimension of H. 70
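This shattering argument can be checked by brute force. The sketch below assumes a simple threshold family h_m(x) = 1 if x > m (the slides' exact H is given in a display not reproduced here, but the threshold family has the same VC dimension of 1): one point is shattered, while for two points the labeling (1, 0) is unrealisable:

```python
from itertools import product

def shatters(points, hypotheses):
    """True iff every labeling of `points` is realised by some hypothesis."""
    for labeling in product([0, 1], repeat=len(points)):
        if not any(tuple(h(x) for x in points) == labeling for h in hypotheses):
            return False
    return True

# An assumed threshold family h_m(x) = 1 if x > m, for integer thresholds m.
H = [(lambda m: (lambda x: 1 if x > m else 0))(m) for m in range(-10, 10)]

print(shatters([3.5], H))         # True: a single point can be shattered
print(shatters([3.25, 4.75], H))  # False: no threshold gives the labeling (1, 0)
```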
VC Dimension
▷ VC-dim(constant) = 0
▷ VC-dim(single-parameter threshold classifier) = 1
▷ VC-dim(intervals) = 2
▷ VC-dim(line) = 3
▷ VC-dim(axis-aligned rectangles) = 4
71
An axis-aligned rectangle can shatter 4 points
72