
MACHINE LEARNING
INTRODUCTION
COURSE INTRO
 Pre-requisite
 Introductory knowledge of Artificial Intelligence, Probability, Statistics, and Linear Algebra

 Course Resources
 Lecture slides, assignments (computer/written), solutions to problems, projects.
COURSE INTRO
 Books
1. Introduction to Machine Learning, Ethem Alpaydin, MIT Press, 2010.
2. Machine Learning, Tom M. Mitchell, McGraw Hill, 1997.
3. Deep Learning, Ian Goodfellow, Yoshua Bengio, Aaron Courville (http://www.deeplearningbook.org/).
4. Deep Learning with Python, François Chollet, ISBN-10: 9781617294433, 2017.
INTRODUCTION
 To solve a problem on a computer, we need
an algorithm. An algorithm is a sequence of
instructions that should be carried out to
transform the input to output.

 For example, one can devise an algorithm for sorting. The input is a set of numbers and the output is their ordered list.

INTRODUCTION
 For some tasks, however, we do not have an
algorithm—for example, to tell spam emails
from legitimate emails. We know what the
input is: an email document that in the
simplest case is a file of characters.

 We know what the output should be: a yes/no output indicating whether the message is spam or not. We do not know how to transform the input to the output.

INTRODUCTION – DATA-DRIVEN SOLUTIONS
 What we lack in knowledge, we make up for it in DATA.

 We can easily compile thousands of example messages, some of which we know to be spam, and what we want is to "learn" what constitutes spam from them.

 In other words, we would like the computer (machine) to automatically extract the algorithm for this task.

 With advances in computer technology, we currently have the ability to store and process large amounts of data, as well as to access it from physically distant locations over a computer network.
INTRODUCTION – DATA-DRIVEN SOLUTIONS
 Examples:
 In finance, banks analyze their past data to build models to use in credit applications, fraud detection, and the stock market.
 In manufacturing, learning models are used for optimization, control, and troubleshooting.
 In medicine, learning programs are used for medical diagnosis.
 In telecommunications, call patterns are analyzed for network optimization and maximizing the quality of service.
INTRODUCTION – DATA-DRIVEN SOLUTIONS
 Machine learning also helps us find solutions to
many problems in vision, speech recognition, and
robotics.
 Let us take the example of recognizing faces:
This is a task we do effortlessly; every day we
recognize family members.
 But we do it unconsciously and are unable to
explain how we do it.
 Each person’s face is a pattern composed of a particular combination of features such as the eyes, nose, and mouth. By analyzing sample face images of a person, a learning program captures the pattern specific to that person and then recognizes the person by checking for this pattern in a given image.
INTRODUCTION – DATA-DRIVEN SOLUTIONS
 In machine learning, we may not be able to identify
the process completely, but we believe we can
construct a good and useful approximation.
 That approximation may not explain everything, but
may still be able to account for some part of the
data.
 We believe that though identifying the complete
process may not be possible, we can still detect
certain patterns or regularities.
 This is the role of machine learning. Such patterns may help us understand the process, or we can use those patterns to make predictions, assuming that the future, at least the near future, will not be much different from the past when the sample data was collected.
WHAT IS MACHINE LEARNING?

 Make the machine ‘learn’ something
 Evaluate how well the machine has ‘learned’
MACHINE LEARNING

“Field of study that gives computers the ability to learn without being explicitly programmed.”

Arthur Samuel (1959)
MACHINE LEARNING

“Machine learning is programming computers to optimize a performance criterion using example data or past experience.”

Ethem Alpaydin (2010)
LEARNING PROBLEMS – EXAMPLES
 Learning = Improving with experience over
some task
 Improve over task T,
 With respect to performance measure P,
 Based on experience E.
 Example
 T = Play checkers
 P = % of games won in a tournament
 E = opportunity to play against itself
LEARNING PROBLEMS – EXAMPLES
 Handwriting recognition
learning problem
 Task T: recognizing
handwritten words within
images
 Performance measure P:
percent of words correctly
recognized
 Training experience E: a
database of handwritten
words with given
classifications
LEARNING PROBLEMS – EXAMPLES
 A robot driving learning
problem
 Task T: driving on public four-
lane highways using vision
sensors
 Performance measure P:
average distance traveled
before an error (as judged by
human overseer)
 Training experience E: a sequence of images and steering commands recorded while observing a human driver
MACHINE LEARNING
 Nicolas learns about trucks and dumpers

MACHINE LEARNING
 But will he recognize others?

So learning involves the ability to generalize from labeled examples.
MACHINE LEARNING
 There is no need to “learn” to calculate payroll
 Learning is used in:
 Data mining programs that learn to detect fraudulent
credit card transactions
 Programs that learn to filter spam email
 Programs that learn to play checkers/chess
 Autonomous vehicles that learn to drive on public
highways
 Self customizing programs
 And many more…

MACHINE LEARNING

Applications

CREDIT SCORING
 Differentiating
between low-risk and
high-risk customers
from their income
and savings

Discriminant: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
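As an illustration, the discriminant above can be written directly in Python; the threshold values θ1 and θ2 below are hypothetical, where in practice they would be learned from past loan data:

# Minimal sketch of the credit-scoring discriminant.
# theta1 and theta2 are illustrative thresholds, not learned values.
def credit_risk(income, savings, theta1=30000, theta2=5000):
    if income > theta1 and savings > theta2:
        return "low-risk"
    return "high-risk"

print(credit_risk(income=45000, savings=8000))  # low-risk
print(credit_risk(income=45000, savings=1000))  # high-risk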
Applications

AUTONOMOUS DRIVING
 ALVINN* – Drives 70 mph on highways

*Autonomous Land Vehicle In a Neural Network
Applications

FACE RECOGNITION
Training examples of a person

Test images

AT&T Laboratories, Cambridge UK
http://www.uk.research.att.com/facedatabase.html
Applications

OCR & HANDWRITING RECOGNITION

TRAINING DATA – SOME IMPORTANT CONSIDERATIONS
 The first design choice is the type of training experience from which our system will learn. The type of training experience available can have a significant impact on the success or failure of the learner.

 A second important attribute of the training experience is the degree to which the learner controls the sequence of training examples.

 A third important attribute of the training experience is how well it represents the distribution of examples over which the final system performance P must be measured.
INTRODUCTION TO FEATURES

TEMPLATE MATCHING FOR PATTERN RECOGNITION
 Problem: Recognize letters A to Z
 The image is converted into a 12x12 bitmap.
TEMPLATE MATCHING
The bitmap is represented by a 12x12 matrix or, equivalently, by a 144-vector with 0/1 coordinates.
0 0 0 0 0 0 1 1 0 0 0 0
0 0 0 0 0 1 1 1 0 0 0 0
0 0 0 0 0 1 0 1 1 0 0 0
0 0 0 0 1 1 0 1 1 0 0 0
0 0 0 0 1 0 0 0 1 0 0 0
0 0 0 1 1 0 0 0 1 1 0 0
0 0 0 1 1 0 0 0 1 1 0 0
0 0 1 1 1 1 1 1 1 1 1 0
0 0 1 1 0 0 0 0 0 1 1 0
0 1 1 0 0 0 0 0 0 1 1 0
0 1 1 0 0 0 0 0 0 0 1 1
1 1 0 0 0 0 0 0 0 0 1 1
TEMPLATE MATCHING
Training samples are templates with their corresponding class:

t_1 = ((0,0,0,0,1,1,...,0), 'A')
t_2 = ((0,0,0,0,0,1,...,0), 'A')
...
t_k = ((0,0,1,1,1,1,...,0), 'B')
...

Template of the image to be recognized:

T = ((0,0,0,0,1,1,...,0), ?)

Algorithm:
1. Find t_i such that t_i = T.
2. Assign the image to the same class as t_i.

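A direct Python rendering of this algorithm, as a toy sketch: 2x2 bitmaps stand in for the 12x12 ones, and the templates and labels are made up.

# Exact template matching: return the label of the stored template
# that equals the input bitmap, or None if no template matches.
def template_match(templates, T):
    for bitmap, label in templates:
        if bitmap == T:
            return label
    return None

templates = [
    ((0, 0, 1, 1), 'A'),
    ((1, 1, 0, 0), 'B'),
]
print(template_match(templates, (0, 0, 1, 1)))  # 'A'
print(template_match(templates, (1, 0, 1, 0)))  # None: not stored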

TEMPLATE MATCHING

Number of templates to store: 2^144

If fewer templates are stored, some images might not be recognized.

Improvements?
FEATURES
 Features are the individual measurable
properties of the signal being observed.
 The set of features used for learning/recognition
is called feature vector.
 The number of features used is the dimensionality of the feature vector.
 n-dimensional feature vectors can be represented as points in an n-dimensional feature space.
FEATURES
Example: two measurements, height (x_1) and weight (x_2), form the feature vector x = (x_1, x_2).

[Scatter plot: Class 1 and Class 2 samples in this 2-D feature space]
FEATURE SPACE

FEATURE EXTRACTION
 Feature extraction aims to create
discriminative features good for learning
 Good Features
 Objects from the same class have similar feature
values.
 Objects from different classes have different
values.

[Illustration: “good” features vs. “bad” features]
FEATURES
 Use fewer features if possible
 Use features that differentiate classes well
 Character recognition example
 Good features: aspect ratio, presence of
loops
 Bad features: number of black pixels,
number of connected components

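For instance, the aspect-ratio feature can be computed from a character bitmap in a few lines; a sketch assuming a NumPy array of 0/1 values with at least one set pixel:

import numpy as np

# Aspect ratio of the ink in a character bitmap:
# bounding-box height divided by bounding-box width.
def aspect_ratio(bitmap):
    rows = np.any(bitmap, axis=1).nonzero()[0]
    cols = np.any(bitmap, axis=0).nonzero()[0]
    height = rows.max() - rows.min() + 1
    width = cols.max() - cols.min() + 1
    return height / width

I = np.zeros((12, 12), dtype=int)
I[1:11, 5:7] = 1        # a tall, narrow vertical stroke
print(aspect_ratio(I))  # 10 / 2 = 5.0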
CATEGORIZATION OF MACHINE LEARNING ALGORITHMS

 Supervised learning
 Classification

 Regression

 Unsupervised learning
 Reinforcement learning

CLASSIFICATION

SUPERVISED LEARNING – CLASSIFICATION
 Objective
 Make a two-year-old kid recognize what is an apple and what is an orange
 Strategy: Show some examples of each category

[Images: apples and oranges]
CLASSIFICATION
 You had some training examples or ‘training data’
 The examples were ‘labeled’
 You used those examples to make the kid ‘learn’ the difference between an apple and an orange

[Image: “What is this???” – “It’s an apple!!!”]
CLASSIFICATION

Apple

Pear

Tomato

Cow

Dog

Horse

Given: training images and their categories.
What are the categories of these test images?
CLASSIFIER: IDENTIFY THE CLASS OF A GIVEN PATTERN
 Distance between feature vectors
 Instead of finding a template that exactly matches the input, look at how close the feature vectors are
 Nearest-neighbor classification algorithm:
1. Find the template closest to the input pattern.
2. Classify the pattern to the same class as the closest template.
CLASSIFIER
K-Nearest-Neighbor Classifier
 Use the k nearest neighbors instead of 1 to classify a pattern.
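A from-scratch sketch of the k-nearest-neighbor rule over feature vectors (with k=1 it reduces to the nearest-neighbor algorithm above; the training points below are made up):

import numpy as np

# Classify x by majority vote among its k nearest training vectors
# (Euclidean distance in feature space).
def knn_predict(X_train, y_train, x, k=1):
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]])
y_train = np.array(['class1', 'class1', 'class2', 'class2'])
print(knn_predict(X_train, y_train, np.array([1.1, 0.9]), k=3))  # class1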
CLASSIFIER
A classifier partitions the feature space X into class-labeled regions such that

X = X_1 ∪ X_2 ∪ ... ∪ X_|Y| and X_i ∩ X_j = ∅ for i ≠ j.

 Classification consists of determining to which region a feature vector x belongs.
 Borders between regions are called decision boundaries.

CLASSIFICATION
 Cancer Diagnosis – Generally more than one variable

[Scatter plot: Tumor Size vs. Age, with Malignant and Benign cases]

Why supervised – the algorithm is given a number of patients with the RIGHT ANSWER, and we want the algorithm to learn to predict for new patients.
CLASSIFICATION
 Cancer Diagnosis – Generally more than one variable

[Same scatter plot, now with a new patient to predict]

 We want the algorithm to learn the separation line.
 Once a new patient arrives with a given age and tumor size, predict Malignant or Benign.
SUPERVISED LEARNING - EXAMPLE
 Cancer diagnosis – Many more features
Patient ID # of Tumors Avg Area Avg Density Diagnosis
1 5 20 118 Malignant
2 3 15 130 Benign
3 7 10 52 Benign
4 2 30 100 Malignant

 Use this training set to learn how to classify patients where the diagnosis is not known:
Patient ID # of Tumors Avg Area Avg Density Diagnosis
101 4 16 95 ?
102 9 22 125 ?
103 1 14 80 ?

Input Data → Classification
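As a sketch, the toy patient table above can be fed to an off-the-shelf classifier; scikit-learn is an assumed dependency here, and the outputs are model predictions, not known diagnoses:

from sklearn.neighbors import KNeighborsClassifier

# Features per patient: number of tumors, average area, average density.
X_train = [[5, 20, 118], [3, 15, 130], [7, 10, 52], [2, 30, 100]]
y_train = ['Malignant', 'Benign', 'Benign', 'Malignant']

clf = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
X_new = [[4, 16, 95], [9, 22, 125], [1, 14, 80]]  # patients 101-103
print(clf.predict(X_new))  # predicted diagnoses for the unknown patients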
CONTENTS

 Supervised learning
 Classification

 Regression

 Unsupervised learning
 Reinforcement learning

REGRESSION

REGRESSION
CLASSIFICATION
The variable we are trying to predict
is DISCRETE

REGRESSION
The variable we are trying to predict
is CONTINUOUS

REGRESSION
 Dataset giving the living areas and prices of
50 houses

REGRESSION
 We can plot this data.

Given data like this, how can we learn to predict the prices of other houses as a function of the size of their living areas?
REGRESSION
 The “input” variables: x(i) (living area in this example)
 The “output” or target variable that we are trying to predict: y(i) (price)
 A pair (x(i), y(i)) is called a training example
 A list of m training examples {(x(i), y(i)); i = 1, ..., m} is called a training set
 X denotes the space of input values, and Y the space of output values
REGRESSION
Given a training set, the goal is to learn a function h : X → Y so that h(x) is a “good” predictor for the corresponding value of y. For historical reasons, this function h is called a hypothesis.
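A minimal sketch of such a hypothesis for the house-price example: a straight line h(x) = w1·x + w0 fit by least squares (the data points below are made up for illustration):

import numpy as np

area = np.array([1000, 1500, 2000, 2500, 3000])  # living area (sq. ft.)
price = np.array([200, 280, 370, 440, 520])      # price ($1000s)

w1, w0 = np.polyfit(area, price, deg=1)          # least-squares line
h = lambda x: w1 * x + w0                        # the hypothesis h
print(round(h(1800)))  # predicted price for an 1800 sq. ft. house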
REGRESSION

REGRESSION
 Example: Price of a used car
 x: car attributes
 y: price
CONTENTS

 Supervised learning
 Classification

 Regression

 Unsupervised learning
 Reinforcement learning

CLUSTERING

UNSUPERVISED LEARNING
 CLUSTERING

There are two types of


fruit in the basket,
separate them into two
‘groups’

57
UNSUPERVISED LEARNING
 CLUSTERING
 The data was not ‘labeled’: you did not tell which are apples and which are oranges
 Maybe the kid used the idea that things in the same group should be similar to one another as compared to things in the other group
 Groups – Clusters

[Image: separate groups or clusters]
CLUSTERING

[Scatter plot: Tumor Size vs. Age, unlabeled]

We have the data for patients but NOT the RIGHT ANSWERS. The objective is to find interesting structures in the data (in this case, two clusters).
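One standard way to find such clusters is k-means; a bare-bones sketch follows (it assumes no cluster goes empty during the iterations):

import numpy as np

# Alternate between assigning points to the nearest centroid and
# re-centering each centroid on its assigned points. No labels are used.
def kmeans(X, k, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        assign = np.argmin(dists, axis=1)
        centers = np.array([X[assign == j].mean(axis=0) for j in range(k)])
    return assign, centers

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(6, 1, (20, 2))])
labels, centers = kmeans(X, k=2)  # two recovered clusters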
UNSUPERVISED LEARNING – COCKTAIL PARTY EFFECT
 Speakers recorded speaking simultaneously
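One classical approach to this blind source separation problem is independent component analysis; a sketch with two synthetic “voices” (scikit-learn’s FastICA is an assumed dependency):

import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 1, 2000)
s1 = np.sin(40 * t)                     # source 1
s2 = np.sign(np.sin(23 * t))            # source 2
S = np.c_[s1, s2]
A = np.array([[1.0, 0.5], [0.4, 1.0]])  # mixing matrix: two microphones
X = S @ A.T                             # observed mixed recordings

recovered = FastICA(n_components=2, random_state=0).fit_transform(X)
# 'recovered' approximates the sources, up to order and scaling.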
CLASSIFICATION VS CLUSTERING

 Challenges
 Intra-class variability
 Inter-class similarity
INTRA-CLASS VARIABILITY

The letter “T” in different typefaces

The same face under different expressions, poses, and illumination
INTER-CLASS SIMILARITY

Characters that look similar

Identical twins
CONTENTS

 Supervised learning
 Classification

 Regression

 Unsupervised learning
 Reinforcement learning

REINFORCEMENT LEARNING
REINFORCEMENT LEARNING
 In RL, the computer is simply given a goal to achieve.
 The computer then learns how to achieve that goal by trial-and-error interactions with its environment.
 The system learns from success and failure, reward and punishment.
REINFORCEMENT LEARNING
 Similar to training a pet dog
 Every time the dog does something good, you pat him and say ‘good dog’
 Every time the dog does something bad, you scold him, saying ‘bad dog’
 Over time, the dog will learn to do good things
LEARNING TO RIDE A BICYCLE
 Goal given to the RL system: ride the bicycle without falling over
 The RL system begins riding the bicycle and performs a series of actions that result in the bicycle being tilted 45 degrees to the right
 At this point two actions are possible: turn the handlebars left or turn them right
 The RL system turns the handlebars to the left, immediately crashes to the ground, and receives a negative reinforcement
 The RL system has just learned not to turn the handlebars left when tilted 45 degrees to the right
LEARNING TO RIDE A BICYCLE
 The RL system turns the handlebars to the RIGHT
 Result: CRASH!!!
 Receives negative reinforcement
 The RL system has learned that the “state” of being tilted 45 degrees to the right is bad
 ….
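One common way to make this trial-and-error idea concrete is tabular Q-learning (the slides do not name a specific method; the states, actions, and numbers below are placeholders for the bicycle example):

# Q[s][a] estimates how good action a is in state s. Crashes give
# negative reward, so bad (state, action) pairs are learned to be avoided.
states = ['upright', 'tilted_right_45']
actions = ['steer_left', 'steer_right']
Q = {s: {a: 0.0 for a in actions} for s in states}
alpha, gamma = 0.5, 0.9  # learning rate, discount factor

def update(s, a, reward, s_next):
    best_next = max(Q[s_next].values())
    Q[s][a] += alpha * (reward + gamma * best_next - Q[s][a])

# After crashing while tilted 45 degrees to the right:
update('tilted_right_45', 'steer_left', reward=-10, s_next='upright')
print(Q['tilted_right_45'])  # steering left now looks bad in that state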
A (SIMPLIFIED) CLASSIFICATION SYSTEM
Two modes:

Classification mode:
test pattern → Preprocessing → Feature Measurement → Classification

Training mode:
training pattern → Preprocessing → Feature Extraction/Selection → Learning
A fancy PR Example

A FANCY PROBLEM
Sorting incoming fish on a conveyor belt according to species (salmon or sea bass) using optical sensing

 Salmon or sea bass? (2 categories or classes)
 It is a classification problem. How to solve it?
APPROACH
Data Collection: Take some images using an optical sensor
APPROACH
 Data collection
How to use it?

 Preprocessing: Use a segmentation operation to isolate fish from one another and from the background

 Information from a single fish is sent to a feature extractor whose purpose is to reduce the data by measuring certain features
But which features to extract?

 The features are passed to a classifier that evaluates the evidence and then takes a final decision
How to design and realize a classifier?
APPROACH
 Set up a camera and take some sample images to extract features
 Length
 Lightness
 Width
 Number and shape of fins
 Position of the mouth, etc.
 This is the set of all suggested features to explore for use in our classifier!
 Challenges:
 Variations in images – lighting, occlusion, camera view angle
 Position of the fish on the conveyor belt, etc.
FEATURE EXTRACTION
 Feature extraction: use domain knowledge
 The sea bass is generally longer than a salmon
 The average lightness of sea bass scales is greater than that of salmon

 We will use training data in order to learn a classification rule based on these features (length of a fish and average lightness)

 Length of fish and average lightness may not be sufficient features, i.e., they may not guarantee 100% classification results
CLASSIFICATION – OPTION 1
 Select the length of the fish as a possible feature for discrimination between the two classes

[Figure: histograms of the length feature for the two categories, with a threshold as the decision boundary]
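A sketch of how such a threshold (decision boundary) could be chosen from training lengths: try each candidate value and keep the one with the fewest training errors (the lengths below are made up; sea bass are generally longer):

import numpy as np

salmon = np.array([20, 22, 25, 27, 30, 33])
sea_bass = np.array([28, 31, 34, 36, 40, 42])

# Rule: length < theta -> salmon, length >= theta -> sea bass.
def errors(theta):
    return np.sum(salmon >= theta) + np.sum(sea_bass < theta)

candidates = np.unique(np.concatenate([salmon, sea_bass]))
theta = min(candidates, key=errors)
print(theta, errors(theta))  # chosen threshold and its training errors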
COST OF TAKING A DECISION
 A fish-packaging industry uses the system to pack fish in cans.
 Two facts
 People do not want to find sea bass in cans labeled salmon
 People occasionally accept finding salmon in cans labeled sea bass

 So the cost of deciding in favor of sea bass when the truth is salmon is not the same as the converse
EVALUATION OF A CLASSIFIER
 How to evaluate a certain classifier?

 Classification error: the percentage of patterns (e.g., fish) that are assigned to the wrong category
 Choose a classifier that gives minimum classification error

 Risk is the total expected cost of decisions
 Choose a classifier that minimizes the risk
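A small sketch of risk as the average loss of a set of decisions, using an asymmetric loss matrix (the loss values and label sequences are illustrative):

# Loss for (true class, decided class); deciding 'salmon' when the
# truth is 'sea_bass' is penalized more heavily than the converse.
loss = {('salmon', 'salmon'): 0.0, ('sea_bass', 'sea_bass'): 0.0,
        ('salmon', 'sea_bass'): 1.0, ('sea_bass', 'salmon'): 5.0}

def risk(truths, decisions):
    return sum(loss[(t, d)] for t, d in zip(truths, decisions)) / len(truths)

truths = ['salmon', 'sea_bass', 'sea_bass', 'salmon']
decisions = ['salmon', 'salmon', 'sea_bass', 'sea_bass']
print(risk(truths, decisions))  # (0 + 5 + 0 + 1) / 4 = 1.5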
CLASSIFICATION – OPTION 2
 Select the average lightness of the fish as a possible feature for discrimination between the two classes

[Figure: histograms of the average lightness feature for the two categories]
CLASSIFICATION – OPTION 3: x = (x1, x2)
 Use both length and average lightness features for classification. Use a simple line to discriminate.

[Figure: the two features of lightness and width for sea bass and salmon; the dark line might serve as a decision boundary of our classifier]
CLASSIFICATION – OPTION 3
 Use both length and average lightness features for classification. Use a complex model to discriminate.

Overly complex models for the fish will lead to decision boundaries that are complicated. While such a decision may lead to perfect classification of our training samples (classification error is zero), it would lead to poor performance on new patterns.
COMMENTS
 Model selection
 A complex model seems not to be the correct one. It is learning the training data by heart.
 So how to choose the correct model? (a difficult question)
 The Occam’s Razor principle says “simpler models should be preferred over complex ones”

 Generalization error
 The minimization of classification error on the training database does not guarantee minimization of classification error on the test database (generalization error)
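The train/test gap is easy to demonstrate with polynomial regression; a sketch on synthetic data: degree 9 interpolates the 10 noisy training points, so its training error collapses while its test error typically grows.

import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(2 * np.pi * x)
x_tr = rng.uniform(0, 1, 10)
x_te = rng.uniform(0, 1, 100)
y_tr = f(x_tr) + rng.normal(0, 0.2, 10)
y_te = f(x_te) + rng.normal(0, 0.2, 100)

for degree in (1, 9):
    coeffs = np.polyfit(x_tr, y_tr, degree)
    mse = lambda x, y: np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(degree, round(mse(x_tr, y_tr), 4), round(mse(x_te, y_te), 4))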
CLASSIFICATION – OPTION 3
 Decision boundary with good generalization

The decision boundary shown might represent the optimal tradeoff between performance on the training set and simplicity of the classifier.
WHERE YOU STAND…
 Separation of different coins using a robotic
arm

PROPOSE A SET OF STATISTICAL FEATURES FOR…
 Verifying the claimed identities of people from
images of their hands laid flat with fingers spread

 Identifying models of automobiles from side-view photographs of unknown (variable) scale
ACKNOWLEDGEMENTS
The material in these slides has been taken from the following sources:
 Machine Learning, Dr. Sneeha Amir, Bahria University, Islamabad
 Pattern Recognition, Dr. Imran Siddiqui, Bahria University, Islamabad
 Machine Intelligence, Dr. M. Hanif, UET, Lahore
 Machine Learning, S. Stock, University of Nebraska
 Lecture Slides, Introduction to Machine Learning, E. Alpaydin, MIT Press
 Machine Learning, Andrew Ng, Stanford University
 Fisher kernels for image representation & generative classification models, Jakob Verbeek
