Lecture Slides for
INTRODUCTION TO MACHINE LEARNING 2e
ETHEM ALPAYDIN
© The MIT Press, 2010
[email protected]
http://www.cmpe.boun.edu.tr/~ethem/i2ml2e
In preparation of these slides, I have benefited from slides prepared by:
E. Alpaydin (Intro. to Machine Learning),
D. Bouchaffra and V. Murino (Pattern Classification and Scene Analysis),
R. Gutierrez-Osuna (Texas A&M),
A. Moore (CMU)
Why “Learn”?
Machine learning is programming computers to optimize a performance criterion using example data or past experience.
There is no need to “learn” to calculate payroll.
Learning is used when:
Human expertise does not exist (navigating on Mars),
Humans are unable to explain their expertise (speech recognition),
Solution changes in time (routing on a computer network),
Solution needs to be adapted to particular cases (user biometrics).
Lecture Notes for E. Alpaydın 2010 Introduction to Machine Learning 2e © The MIT Press (V1.0)
What We Talk About When We Talk About “Learning”
Learning general models from data of particular examples.
Data is cheap and abundant (data warehouses, data marts); knowledge is expensive and scarce.
Example in retail: from customer transactions to consumer behavior:
People who bought “Blink” also bought “Outliers” (www.amazon.com)
Build a model that is a good and useful approximation to the data.
Data Mining
Retail: Market basket analysis, customer relationship management (CRM)
Finance: Credit scoring, fraud detection
Manufacturing: Control, robotics, troubleshooting
Medicine: Medical diagnosis
Telecommunications: Spam filters, intrusion detection
Bioinformatics: Motifs, alignment
Web mining: Search engines
...
What is Machine Learning?
Optimize a performance criterion using example data or past experience.
Role of statistics: Inference from a sample
Role of computer science: Efficient algorithms to
solve the optimization problem,
represent and evaluate the model for inference.
Applications
Association
Supervised Learning
Classification
Regression
Unsupervised Learning
Reinforcement Learning
Learning Associations
Basket analysis:
P(Y | X): probability that somebody who buys X also buys Y, where X and Y are products/services.
Example: P(chips | beer) = 0.7
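Such a conditional probability can be estimated directly from transaction counts. A minimal sketch, with made-up baskets (the function name and data are illustrative, not from the slides):

```python
# Estimate the association P(Y | X) = count(X and Y) / count(X)
# from a list of market baskets (sets of product names).
def conditional_support(baskets, x, y):
    with_x = [b for b in baskets if x in b]
    if not with_x:
        return 0.0
    with_both = [b for b in with_x if y in b]
    return len(with_both) / len(with_x)

baskets = [
    {"beer", "chips"},
    {"beer", "chips", "salsa"},
    {"beer", "soda"},
    {"chips"},
]
# 2 of the 3 baskets containing beer also contain chips.
print(conditional_support(baskets, "beer", "chips"))  # 0.666...
```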
Classification
Example: Credit scoring
Differentiating between low-risk and high-risk customers from their income and savings
Discriminant: IF income > θ1 AND savings > θ2 THEN low-risk ELSE high-risk
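The discriminant above is directly executable. A minimal sketch; the threshold values θ1 and θ2 below are made up for illustration:

```python
# Threshold discriminant from the slide:
# low-risk iff income > theta1 AND savings > theta2.
THETA1, THETA2 = 30_000.0, 5_000.0  # illustrative thresholds

def credit_risk(income, savings, theta1=THETA1, theta2=THETA2):
    return "low-risk" if income > theta1 and savings > theta2 else "high-risk"

print(credit_risk(50_000, 10_000))  # low-risk
print(credit_risk(50_000, 1_000))   # high-risk
```

In practice the thresholds themselves are learned from labelled customer data rather than set by hand.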
Classification: Applications
Aka pattern recognition
Face recognition: Pose, lighting, occlusion (glasses, beard), make-up, hair style
Character recognition: Different handwriting styles
Speech recognition: Temporal dependency
Medical diagnosis: From symptoms to illnesses
Biometrics: Recognition/authentication using physical and/or behavioral characteristics: face, iris, signature, etc.
...
Face Recognition
Training examples of a person
Test images
ORL dataset, AT&T Laboratories, Cambridge UK
Regression
Example: Price of a used car
x: car attributes
y: price
y = g(x | θ)
g(·): model, θ: parameters
Linear model: y = w x + w0
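For the linear model, the parameters θ = (w, w0) can be fit by least squares. A minimal sketch; the "car age vs. price" numbers are made up for illustration:

```python
# Least-squares fit of the linear model y = w*x + w0 from the slide.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # w = cov(x, y) / var(x); w0 chosen so the line passes through the means.
    w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return w, my - w * mx  # (w, w0)

# x: car age in years, y: price (in thousands); price drops with age.
xs = [1, 2, 3, 4, 5]
ys = [20, 18, 16, 14, 12]
w, w0 = fit_line(xs, ys)
print(w, w0)  # -2.0 22.0
```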
Regression Applications
Navigating a car: Angle of the steering wheel
Kinematics of a robot arm: from tip position (x, y) to joint angles
α1 = g1(x, y)
α2 = g2(x, y)
Response surface design
Supervised Learning: Uses
Prediction of future cases: Use the rule to predict the output for future inputs
Knowledge extraction: The rule is easy to understand
Compression: The rule is simpler than the data it explains
Outlier detection: Exceptions that are not covered by the rule, e.g., fraud
Unsupervised Learning
Learning “what normally happens”
No output
Clustering: Grouping similar instances
Example applications:
Customer segmentation in CRM
Image compression: Color quantization
Bioinformatics: Learning motifs
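Clustering can be sketched with k-means, one standard algorithm for grouping similar instances (the 1-D data and parameters below are illustrative, not from the slides):

```python
import random

# Minimal 1-D k-means sketch: alternate between assigning each point to
# its nearest center and moving each center to its cluster mean.
def kmeans(points, k, iters=50, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[i].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return sorted(centers)

# Two obvious groups, around 1 and around 10 -- no labels needed.
points = [0.9, 1.0, 1.1, 9.9, 10.0, 10.1]
print(kmeans(points, 2))  # ≈ [1.0, 10.0]
```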
Reinforcement Learning
Learning a policy: A sequence of outputs
No supervised output, but delayed reward
Credit assignment problem
Game playing
Robot in a maze
Multiple agents, partial observability, ...
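The delayed-reward and credit-assignment ideas can be sketched with Q-learning on a toy maze, a corridor of five states. This is an illustrative example, not from the slides; the environment and parameters are made up:

```python
import random

# Minimal Q-learning sketch: states 0..4 on a corridor, actions
# left (-1) / right (+1), reward +1 only on reaching state 4.
# The discounted backup propagates credit for the delayed reward
# back to the earlier actions that led to it.
def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(5) for a in (-1, 1)}
    for _ in range(episodes):
        s = 0
        while s != 4:
            a = rng.choice((-1, 1))      # explore randomly (off-policy)
            s2 = min(max(s + a, 0), 4)   # walls at both ends
            r = 1.0 if s2 == 4 else 0.0  # delayed reward at the goal
            best_next = max(q[(s2, -1)], q[(s2, 1)])
            q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
            s = s2
    return q

q = train()
policy = [max((-1, 1), key=lambda a: q[(s, a)]) for s in range(4)]
print(policy)  # the learned greedy policy moves right (+1) in every state
```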
ML Applications
Supervised (Classification, Regression) and Unsupervised
Components of a PR System
A basic pattern classification system contains
A sensor, preprocessing and feature extraction mechanism (manual or automated)
Dimensionality reduction step
A classification (regression, clustering, description) algorithm
Model selection mechanism (Cross validation or bootstrap)
A set of examples (training set) already classified or described
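The model-selection step via cross-validation can be sketched as follows: split the labelled examples into k folds, and let each fold serve once as the held-out test set (the function name below is illustrative):

```python
# Sketch of k-fold cross-validation splits for model selection:
# each of the k folds is the test set exactly once, the rest train.
def kfold_indices(n, k):
    folds = []
    for i in range(k):
        test = list(range(i * n // k, (i + 1) * n // k))
        train = [j for j in range(n) if j not in test]
        folds.append((train, test))
    return folds

for train, test in kfold_indices(6, 3):
    print(train, test)
# [2, 3, 4, 5] [0, 1]
# [0, 1, 4, 5] [2, 3]
# [0, 1, 2, 3] [4, 5]
```

Averaging the model's test error over the folds gives the estimate used to compare candidate models.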
Features and patterns (1)
A feature is any distinctive aspect, quality, or characteristic
Features may be symbolic (e.g., color) or numeric (e.g., height)
The combination of d features is represented as a d-dimensional column vector called a feature vector
The d-dimensional space defined by the feature vector is called the feature space
Objects are represented as points in feature space; this representation is called a scatter plot
A pattern is a composite of traits or features characteristic of an individual
In classification tasks, a pattern is a pair of variables {x, r} where
x is a collection of observations or features (feature vector),
r is the concept behind the observation (label); sometimes we will use t instead of r
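A concrete pattern {x, r}, reusing the earlier credit-scoring features (the values are made up for illustration):

```python
# A pattern {x, r}: a d-dimensional feature vector x plus a label r.
# Features here are [income, savings], echoing the credit-scoring slide.
pattern = {"x": [30_000.0, 5_000.0], "r": "low-risk"}
d = len(pattern["x"])
print(d)  # 2: this pattern is a point in 2-dimensional feature space
```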
Features and patterns (2)
What makes a “good” feature vector?
The quality of a feature vector is related to its ability to discriminate examples from different classes:
Examples from the same class should have similar feature values,
Examples from different classes should have different feature values
PR design cycle
Data collection
Probably the most time-intensive component of a PR project
How many examples are enough?
Feature choice
Critical to the success of the PR problem
“Garbage in, garbage out”
Requires basic prior knowledge
Model choice
Statistical, neural and structural approaches
Parameter settings
Training
Given a feature set and a “blank” model, adapt the model to explain the data
Supervised, unsupervised and reinforcement learning
Evaluation
How well does the trained model do?
Overfitting vs. generalization
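The evaluation step can be sketched by scoring a trained model on data it has not seen: training accuracy alone can look perfect while the model overfits, so held-out test accuracy is what estimates generalization. The model and examples below are made up for illustration:

```python
# Accuracy of a model over labelled examples (x, r).
def accuracy(model, examples):
    return sum(model(x) == r for x, r in examples) / len(examples)

# Hypothetical 1-D threshold model and labelled data.
model = lambda x: x > 0.5
train = [(0.1, False), (0.3, False), (0.7, True), (0.9, True)]
test = [(0.2, False), (0.6, True), (0.4, True)]  # one noisy label

print(accuracy(model, train))  # 1.0 on the training set
print(accuracy(model, test))   # ≈ 0.667 on held-out data
```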
Feature Selection (Salmon vs. Sea Bass Recognition Problem)
Candidate features: length; average scale intensity; length and average scale intensity together
Model Selection (Salmon vs. Sea Bass Recognition Problem)
Linear discriminant function: performance 95.7%
Nonlinear (neural network) function: performance 99.9%
Which model should we use? Training vs. test performance.
Resources: Libraries
Python: scikit-learn (UI, lib)
Weka (UI, lib, Java-based)
R (UI, lib)
mlpack (C++)
Spark MLlib (Scala, Java)
MATLAB: prtools, nntools?
KNIME (interesting UI)
Resources: Datasets
UCI Repository: http://www.ics.uci.edu/~mlearn/MLRepository.html
UCI KDD Archive: http://kdd.ics.uci.edu/summary.data.application.html
Statlib: http://lib.stat.cmu.edu/
Delve: http://www.cs.utoronto.ca/~delve/
Kaggle
Resources: Journals (scholar.google)
Journal of Machine Learning Research
Machine Learning
Neural Computation
Neural Networks
IEEE Transactions on Neural Networks
IEEE Transactions on Pattern Analysis and Machine Intelligence
Annals of Statistics
Journal of the American Statistical Association
...
Resources: Conferences (scholar.google)
International Conference on Machine Learning (ICML)
European Conference on Machine Learning (ECML)
Neural Information Processing Systems (NIPS)
Uncertainty in Artificial Intelligence (UAI)
Computational Learning Theory (COLT)
International Conference on Artificial Neural Networks (ICANN)
International Conference on AI & Statistics (AISTATS)
International Conference on Pattern Recognition (ICPR)
ICMLA, ICDM, ICDE, ...
Resources: Video Lectures
Yaser S. Abu-Mostafa: http://work.caltech.edu/telecourse.html
Andrew Ng: http://www.academicearth.org/courses/machine-learning
http://see.stanford.edu/see/lecturelist.aspx?coll=348ca38a-3a6d-4052-937d-cb017338d7b1
Tom Mitchell: http://www.cs.cmu.edu/~tom/10701_sp11/lectures.shtml
Anil Jain: http://ocw.korea.edu/ocw/college-of-engineering/introduction-to-pattern-recognition
Statistical Learning:
Deep NN….
Latest topics: http://videolectures.net/Top/Computer_Science/Machine_Learning/
Istanbul Technical University, BBL514E/BLG527E, Fall 2006 - Zehra Cataltepe