What is Machine Learning?
• Optimize a performance criterion using example data or
past experience.
• Role of Statistics: inference from a sample
• Role of Computer science: efficient algorithms to
– Solve an optimization problem
– Represent and evaluate the model for inference
• Learning is used when:
– Human expertise does not exist (navigating on Mars),
– Humans are unable to explain their expertise (speech recognition)
– Solution changes with time (routing on a computer network)
– Solution needs to be adapted to particular cases (user biometrics)
• There is no need to “learn” to calculate payroll
What We Talk About When We Talk About “Learning”
• Learning general models from data consisting of particular
examples
• Data is cheap and abundant (data warehouses, data marts);
knowledge is expensive and scarce.
• Example in retail: from customer transactions to consumer
behavior.
• Build a model that is a good and useful approximation to
the data.
Types of Learning Tasks
• Association
• Supervised learning
– Learn to predict output when given an input vector
• Reinforcement learning
– Learn action to maximize payoff
– Payoff is often delayed
– Exploration vs. exploitation
– Online setting
• Unsupervised learning
– Create an internal representation of the input e.g. form
clusters; extract features
– How do we know if a representation is good?
– Big datasets do not come with labels.
Learning Associations
• Basket analysis:
P(Y | X): probability that somebody who buys X also buys
Y, where X and Y are products/services.
Example: P(chips | beer) = 0.7
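One way to estimate such a conditional probability is to count baskets. The sketch below assumes each transaction is a set of product names; the baskets and the function name `conditional_probability` are made up for illustration, so the resulting number is not the 0.7 quoted above.

```python
def conditional_probability(transactions, x, y):
    """Estimate P(y | x) as (# baskets with x and y) / (# baskets with x)."""
    baskets_with_x = [t for t in transactions if x in t]
    if not baskets_with_x:
        raise ValueError(f"no transaction contains {x!r}")
    both = sum(1 for t in baskets_with_x if y in t)
    return both / len(baskets_with_x)


# Hypothetical toy data: each set is one customer's basket.
transactions = [
    {"beer", "chips"},
    {"beer", "chips", "salsa"},
    {"beer"},
    {"chips", "bread"},
    {"beer", "chips"},
]

p_chips_given_beer = conditional_probability(transactions, "beer", "chips")
print(p_chips_given_beer)  # 3 of the 4 beer baskets also contain chips
```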
Classification
• Example: Credit scoring
• Differentiating between low-risk and high-risk customers
from their income and savings
Discriminant: IF income > θ1 AND savings > θ2
THEN low-risk ELSE high-risk
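The discriminant above can be written directly as a function. This is a minimal sketch: the threshold values passed in below are hypothetical placeholders, whereas in practice θ1 and θ2 would be learned from labeled customer data.

```python
def credit_risk(income, savings, theta1, theta2):
    """Apply the rule: IF income > theta1 AND savings > theta2 THEN low-risk."""
    if income > theta1 and savings > theta2:
        return "low-risk"
    return "high-risk"


# Hypothetical thresholds, chosen only to make the example concrete.
THETA1 = 30_000  # income threshold
THETA2 = 5_000   # savings threshold

print(credit_risk(45_000, 8_000, THETA1, THETA2))  # both conditions hold
print(credit_risk(45_000, 1_000, THETA1, THETA2))  # savings too low
```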
Classification: Applications
• Also known as pattern recognition
• Face recognition: Pose, lighting, occlusion (glasses, beard),
make-up, hair style
• Character recognition: Different handwriting styles.
• Speech recognition: Temporal dependency.
– Use of a dictionary or the syntax of the language.
– Sensor fusion: Combine multiple modalities; e.g., visual (lip image)
and acoustic for speech
• Medical diagnosis: From symptoms to illnesses
Face Recognition
[Figure: training examples of a person; test images]
The Role of Learning
Regression
• Example: Price of a used car
• x: car attributes
y: price
y = g(x, θ)
g( ): model, θ: parameters
Linear model: y = wx + w0
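For the linear model y = wx + w0, the parameters can be fitted by ordinary least squares. The sketch below assumes a single numeric attribute (car age in years) and uses invented prices; `fit_line` is an illustrative helper, not part of any library.

```python
def fit_line(xs, ys):
    """Least-squares estimates of w and w0 in y = w*x + w0."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    w0 = mean_y - w * mean_x
    return w, w0


# Hypothetical data: car age in years and price in thousands.
ages = [1, 2, 3, 4, 5]
prices = [18, 16, 14, 12, 10]

w, w0 = fit_line(ages, prices)
predicted_price = w * 6 + w0  # predicted price of a 6-year-old car
print(w, w0, predicted_price)
```

Here θ corresponds to the pair (w, w0), and g is the straight line they define.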
Supervised Learning: Uses
• Prediction of future cases: Use the rule to predict the
output for future inputs
• Knowledge extraction: The rule is easy to understand
• Compression: The rule is simpler than the data it explains
• Outlier detection: Exceptions that are not covered by the
rule, e.g., fraud
Unsupervised Learning
• Learning “what normally happens”
• Clustering: Grouping similar instances
• Example applications
– Customer segmentation in CRM (customer relationship
management)
– Image compression: Color quantization
– Bioinformatics: Learning motifs
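As an illustration of clustering used for color quantization, the sketch below runs a simple one-dimensional k-means on gray levels. It is a toy version under stated assumptions: the pixel values and the initial centers are invented, and real images would use three color channels.

```python
def kmeans_1d(values, centers, iterations=10):
    """Alternate between assigning values to the nearest center and
    moving each center to the mean of its assigned values."""
    centers = list(centers)
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers


# Hypothetical gray levels: a dark group and a bright group.
pixels = [10, 12, 14, 200, 202, 204]
palette = kmeans_1d(pixels, centers=[0, 255])

# Replace every pixel by its nearest palette entry (a two-color image).
quantized = [min(palette, key=lambda c: abs(p - c)) for p in pixels]
print(palette, quantized)
```

No labels are used anywhere: the two palette entries emerge only from grouping similar values.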
Example: Netflix
• Application: automatic product recommendation
• Importance: this is the modern/future way of shopping.
• Prediction goal: Based on past preferences, predict which
movies you might want to watch
• Data: Past movies you have watched
• Target: Like or don’t-like
• Features: ?
What makes a 2?
Example: Google
• Application: automatic ad selection
• Importance: this is modern/future advertising.
• Prediction goal: Based on your search query, predict which
ads you might be interested in
• Data: Past queries
• Target: Whether the ad was clicked
• Features: ?
Example: Call Centers
• Application: automatic call routing
• Importance: this is modern/future customer service.
• Prediction goal: Based on your speech recording, predict
which words you said
• Data: Past recordings of various people
• Target: Which word was intended
• Features: ?
Example: Stock Market
• Application: automatic program trading
• Importance: this is modern/future finance.
• Prediction goal: Based on past patterns, predict whether the
stock will go up
• Data: Past stock prices
• Target: Up or down
• Features: ?
Web-based examples of machine learning
• The web contains a lot of data. Tasks with very big datasets
often use machine learning
– especially if the data is noisy or non-stationary.
• Spam filtering, fraud detection:
– The enemy adapts so we must adapt too.
• Recommendation systems:
– Lots of noisy data. Million dollar prize!
• Information retrieval:
– Find documents or images with similar content.
What is a Learning Problem?
• Learning involves improving performance
– at some task T
– with experience E
– evaluated in terms of performance measure P
• Example: learn to play checkers
– Task T: playing checkers
– Experience E: playing against itself
– Performance P: percent of games won
• What exactly should be learned?
– How might this be represented?
– What specific algorithm should be used?
• Develop methods, techniques and tools for building intelligent
learning machines that can solve the problem in combination with
an available data set of training examples.
• When a learning machine improves its performance at a given task
over time, without reprogramming, it can be said to have learned
something.
Components of a Learning Problem
• Task: the behavior or task that’s being improved, e.g.
classification, object recognition, acting in an environment.
• Data: the experiences that are being used to improve
performance in the task.
• Measure of improvements: How can the improvement
be measured? Examples:
– Provide more accurate solutions (e.g. increasing the accuracy in
prediction)
– Cover a wider range of problems
– Obtain answers more economically (e.g. improved speed)
– Simplify codified knowledge
– New skills that were not presented initially
Hypothesis Space
• One way to think about a supervised learning machine is as a device that
explores a “hypothesis space”.
– Each setting of the parameters in the machine is a different hypothesis
about the function that maps input vectors to output vectors.
– If the data is noise-free, each training example rules out a region of
hypothesis space.
– If the data is noisy, each training example scales the posterior
probability of each point in the hypothesis space in proportion to how
likely the training example is given that hypothesis.
• The art of supervised machine learning is in:
– Deciding how to represent the inputs and outputs
– Selecting a hypothesis space that is powerful enough to represent the
relationship between inputs and outputs but simple enough to be
searched.
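The two views of a training example (ruling out hypotheses versus rescaling their posterior) can be sketched on a tiny hypothesis space. Everything below is an assumption made for illustration: the hypotheses are thresholds θ in {1, ..., 5} for the rule "output 1 if x >= θ", the prior is uniform, and noisy labels are assumed to be flipped with probability 0.1.

```python
def predict(theta, x):
    """Hypothesis theta: output 1 if x >= theta, else 0."""
    return 1 if x >= theta else 0


def eliminate(hypotheses, examples):
    """Noise-free case: keep only hypotheses consistent with every example."""
    return [h for h in hypotheses
            if all(predict(h, x) == y for x, y in examples)]


def posterior(hypotheses, examples, noise=0.1):
    """Noisy case: scale each hypothesis by the likelihood of each example."""
    weights = {h: 1.0 for h in hypotheses}  # uniform prior (assumption)
    for x, y in examples:
        for h in hypotheses:
            weights[h] *= (1 - noise) if predict(h, x) == y else noise
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


hypotheses = [1, 2, 3, 4, 5]
examples = [(1, 0), (2, 0), (4, 1)]  # (input, observed output)

survivors = eliminate(hypotheses, examples)
post = posterior(hypotheses, examples)
print(survivors)
print(post)
```

In the noise-free reading only the thresholds 3 and 4 survive; in the noisy reading every threshold keeps some probability, with the inconsistent ones heavily down-weighted.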
Generalization
• The real aim of supervised learning is to do well on test
data that is not known during learning.
• Choosing the values for the parameters that minimize the
loss function on the training data is not necessarily the best
policy.
• We want the learning machine to model the true
regularities in the data and to ignore the noise in the data.
– But the learning machine does not know which
regularities are real and which are accidental quirks of
the particular set of training examples we happen to
pick.
• So how can we be sure that the machine will generalize
correctly to new data?
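A small experiment can make the gap between training and test performance concrete. The sketch below is a toy setup under stated assumptions: the true rule is "label 1 if x >= 6", one training label (at x = 3) is deliberately corrupted, and two hypothetical learners are compared, a nearest-neighbour memorizer and a single-threshold rule.

```python
def error_rate(predict, data):
    """Fraction of (x, y) pairs that the predictor gets wrong."""
    return sum(predict(x) != y for x, y in data) / len(data)


def fit_memorizer(train):
    """1-nearest-neighbour: reproduces every training label, noise included."""
    def predict(x):
        _, label = min(train, key=lambda ex: abs(ex[0] - x))
        return label
    return predict


def fit_threshold(train):
    """Pick the single threshold with the lowest training error."""
    xs = sorted(x for x, _ in train)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]

    def make(theta):
        return lambda x: 1 if x >= theta else 0

    best = min(candidates, key=lambda t: error_rate(make(t), train))
    return make(best)


# Training set with one noisy label at x = 3; test set follows the true rule.
train = [(1, 0), (2, 0), (3, 1), (4, 0), (5, 0), (7, 1), (8, 1), (9, 1)]
test = [(2.6, 0), (3.3, 0), (4.4, 0), (6.5, 1), (8.2, 1)]

memorizer = fit_memorizer(train)
threshold = fit_threshold(train)

print("memorizer:", error_rate(memorizer, train), error_rate(memorizer, test))
print("threshold:", error_rate(threshold, train), error_rate(threshold, test))
```

On this data the memorizer has the lower training error but the higher test error, because it has modeled the accidental quirk at x = 3. The example does not answer the question of how to be sure in general; it only shows why training error alone is not a reliable guide.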
Some Issues in Machine Learning
• Understanding Which Processes Need Automation
• Lack of Quality Data
• Inadequate Infrastructure
• Lack of Skilled Resources
• Bad Predictions Compounded by Biases
• Making the Wrong Assumptions
• Ethics
(If my self-driving car kills someone on the road, whose fault is it?)
Ways of Learning
• Rote learning, i.e. learning by memorization, in a mechanical
way
• Learning from examples and by practice
• Learning from instructions/advice/explanations
• Learning by analogy
• Learning by discovery
• …
Inductive and Deductive Learning
• Inductive Learning: Reasoning from a set of examples to produce general
rules. The rules should be applicable to new examples, but there is no
guarantee that the result will be correct.
• Deductive Learning: Reasoning from a set of known facts and rules to
produce additional rules that are guaranteed to be true.
– Example 1: All humans are mortal, and John is a human. Therefore, John is
mortal.
– Example 2: Bachelors are unmarried men. Bill is an unmarried man.
Therefore, Bill is a bachelor.
– Caution: To get a Bachelor’s degree at a college, a student must have 120
credits. Sally has more than 130 credits. It does not follow that Sally has a
bachelor’s degree, because the credits are a necessary condition, not a
sufficient one.
• Inductive reasoning involves starting from specific premises and forming a
general conclusion, while deductive reasoning involves using general premises
to form a specific conclusion. Inductive examples:
– Example 1: This marble from the bag is black. That marble from the bag is
black. A third marble from the bag is black. Therefore, all the marbles in the
bag are black.
– Example 2: A stockbroker notices that a company's stock decreased
significantly during the summer for the last four years. Therefore, he advises
his clients not to invest in that company during the summer.
Assessment of Learning Algorithms
• The most common criteria for assessing learning
algorithms are:
– Accuracy (e.g. percentages of correctly classified +’s and –’s)
– Efficiency (e.g. examples needed, computational tractability)
– Robustness (e.g. against noise, against incompleteness)
– Special requirements (e.g. incrementality, concept drift)
– Concept complexity (e.g. representational issues – examples &
bookkeeping)
– Transparency (e.g. comprehensibility for the human user)
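Of these criteria, accuracy is the simplest to compute. The sketch below assumes labels are the strings "+" and "-" and uses invented predictions; it reports overall accuracy and the fraction of correctly classified examples within each class.

```python
def accuracy(true_labels, predicted_labels):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def class_accuracy(true_labels, predicted_labels, label):
    """Fraction of examples of one class that were classified correctly."""
    pairs = [(t, p) for t, p in zip(true_labels, predicted_labels) if t == label]
    return sum(t == p for t, p in pairs) / len(pairs)


# Hypothetical true labels and classifier outputs.
true_labels = ["+", "+", "+", "-", "-"]
predicted = ["+", "-", "+", "-", "-"]

overall = accuracy(true_labels, predicted)
positives = class_accuracy(true_labels, predicted, "+")
negatives = class_accuracy(true_labels, predicted, "-")
print(overall, positives, negatives)
```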
Some Theoretical Settings
• Inductive Logic Programming (ILP)
• Probably Approximately Correct (PAC) Learning
• Learning as Optimization (Reinforcement Learning)
• Bayesian Learning
• …
Key Aspects of Learning
• Learner: who or what is doing the learning, e.g. an
algorithm, a computer program.
• Domain: what is being learned, e.g. a function, a concept.
• Goal: why the learning is done.
• Representation: the way the objects to be learned are
represented.
• Algorithmic Technology: the algorithmic framework to be
used, e.g. decision trees, lazy learning, artificial neural
networks, support vector machines
The Role of Learning
• Learning is at the core of
– Understanding high-level cognition
– Performing knowledge-intensive inferences
– Building adaptive, intelligent systems
– Dealing with messy, real-world data
• Learning has multiple purposes
– Knowledge Acquisition
– Integration of various knowledge sources to ensure robust behavior
– Adaptation (human, systems)