Classification:
A machine learning perspective
Emily Fox & Carlos Guestrin
Machine Learning Specialization
University of Washington
Part of a specialization
This course is a part of the
Machine Learning Specialization
1. Foundations
2. Regression
3. Classification
4. Clustering & Retrieval
5. Recommender Systems
6. Capstone
What is the course about?
What is classification?
From features to predictions
[Diagram: Data → ML Method (Classifier) → Intelligence]
Input x: features derived from data
Learn the x → y relationship
Predict y: categorical "output", class or label
Sentiment classifier
Input x: sentence, e.g., "Easily best sushi in Seattle."
→ Sentiment Classifier →
Output y: predicted sentiment
Classifier
Input x: sentence from review
→ Classifier MODEL →
Output y: predicted class
Example multiclass classifier
Output y has more than 2 categories
Input x: webpage
→ Output y: category, e.g., Education, Finance, Technology
Spam filtering
Input x: text of email, sender, IP, …
→ Output y: spam or not spam
Image classification
Input x: image pixels
→ Output y: predicted object
Personalized medical diagnosis
Input x
→ Disease Classifier MODEL →
Output y: Healthy, Cold, Flu, Pneumonia, …
Reading your mind
Inputs x: brain region intensities
→ Output y: e.g., "Hammer" or "House"
Impact of classification
Course overview
Course philosophy: Always use case studies & …
[Diagram: Visual → Core concept → Algorithm → Implement → Practical, plus optional Advanced topics]
Overview of content
Models: Linear classifiers, Logistic regression, Decision trees, Ensembles
Algorithms: Gradient, Stochastic gradient, Recursive greedy, Boosting
Core ML: Alleviating overfitting, Handling missing data, Precision-recall, Online learning
Course outline
Overview of modules
Models:
- Linear classifiers (Module 1)
- Logistic regression (Modules 1, 2, 3)
- Decision trees (Modules 4 & 5)
- Ensembles (Module 7)

Algorithms:
- Gradient (Modules 2 & 3)
- Stochastic gradient (Module 9)
- Recursive greedy (Module 4)
- Boosting (Module 8)

Core ML:
- Alleviating overfitting (Modules 3 & 5)
- Handling missing data (Module 6)
- Precision-recall (Module 8)
- Online learning (Module 9)
Module 1: Linear classifiers
Word       Coefficient
#awesome   1.0
#awful     -1.5

Score(x) = 1.0 · #awesome – 1.5 · #awful

[Plot: decision boundary in the (#awesome, #awful) plane; Score(x) > 0 on one side, Score(x) < 0 on the other]
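As a rough illustration of the score above, here is a minimal Python sketch of a two-word linear sentiment classifier; the word counting and the `score`/`predict` helper names are illustrative assumptions, not course code.

```python
# Minimal sketch of the linear classifier above (helper names are illustrative).
coefficients = {"awesome": 1.0, "awful": -1.5}

def score(sentence):
    """Score(x) = 1.0 * #awesome - 1.5 * #awful."""
    words = sentence.lower().split()
    return sum(coef * words.count(word) for word, coef in coefficients.items())

def predict(sentence):
    """Predict +1 (positive) if Score(x) > 0, else -1 (negative)."""
    return +1 if score(sentence) > 0 else -1

print(predict("awesome awesome awful"))  # Score = 2*1.0 - 1.5 = 0.5 -> +1
```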
Module 1: Logistic regression represents probabilities
P(y = +1 | x, ŵ) = 1 / (1 + e^(-ŵᵀ h(x)))
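A minimal sketch of evaluating this probability with numpy; the coefficient and feature values are made up for illustration.

```python
import numpy as np

def predict_probability(w_hat, h_x):
    """P(y = +1 | x, w_hat) = 1 / (1 + exp(-w_hat^T h(x)))."""
    return 1.0 / (1.0 + np.exp(-np.dot(w_hat, h_x)))

w_hat = np.array([1.0, 0.5, -1.5])  # illustrative coefficients (w0, w1, w2)
h_x = np.array([1.0, 2.0, 1.0])     # features: [constant, #awesome, #awful]
print(predict_probability(w_hat, h_x))  # ~0.62, so the sentence leans positive
```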
Module 2: Learning “best” classifier
Maximize likelihood over all possible w0, w1, w2:

ℓ(w0=0, w1=1, w2=-1.5) = 10^-6
ℓ(w0=1, w1=1, w2=-1.5) = 10^-5
ℓ(w0=1, w1=0.5, w2=-1.5) = 10^-4
…

Best model with gradient ascent: highest likelihood ℓ(w),
ŵ = (w0=1, w1=0.5, w2=-1.5)

[Plot: candidate decision boundaries in the (#awesome, #awful) plane, one per coefficient setting]
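A hedged sketch of what Module 2 covers: climbing the log likelihood of logistic regression with gradient ascent. The dataset, step size, and function names below are illustrative assumptions, not the course implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_ascent(H, y, step_size=0.1, n_iters=1000):
    """H: feature matrix (one row of h(x) per example); y: labels in {+1, -1}."""
    w = np.zeros(H.shape[1])
    for _ in range(n_iters):
        probs = sigmoid(H @ w)                    # P(y = +1 | x_i, w) for each example
        errors = (y == +1).astype(float) - probs  # indicator[y_i = +1] - P(y = +1 | x_i, w)
        w += step_size * (H.T @ errors)           # step up the log-likelihood gradient
    return w

# Tiny made-up dataset: columns = [constant, #awesome, #awful]
H = np.array([[1, 2, 0], [1, 0, 2], [1, 3, 1], [1, 1, 3]], dtype=float)
y = np.array([+1, -1, +1, -1])
print(gradient_ascent(H, y))
```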
Module 3: Overfitting & regularization
[Plot: classification error vs. model complexity; training error keeps falling while true error eventually rises]

Use a regularization penalty to mitigate overfitting: maximize ℓ(w) − λ||w||₂²
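A brief sketch of how the penalty changes the update, under the same logistic regression setup as the previous sketch; the λ value and names are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def regularized_gradient_step(w, H, y, step_size=0.1, l2_penalty=1.0):
    """One ascent step on ell(w) - lambda * ||w||_2^2."""
    errors = (y == +1).astype(float) - sigmoid(H @ w)
    gradient = H.T @ errors - 2.0 * l2_penalty * w  # the penalty term shrinks w toward 0
    return w + step_size * gradient
```

Larger values of λ shrink the coefficients more aggressively, trading a little training accuracy for a simpler decision boundary.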
Module 4: Decision trees
Start: Credit?
- excellent → Safe
- fair → Term?
  - 3 years → Risky
  - 5 years → Safe
- poor → Income?
  - high → Term?
    - 3 years → Risky
    - 5 years → Safe
  - low → Risky
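The same tree, written as nested conditionals in Python to make the prediction path explicit; a sketch following the slide's branch labels, with an illustrative function name.

```python
# Loan-risk decision tree from the slide as nested conditionals.
def predict_risk(credit, term, income):
    if credit == "excellent":
        return "Safe"
    if credit == "fair":
        return "Risky" if term == "3 years" else "Safe"
    # credit == "poor"
    if income == "high":
        return "Risky" if term == "3 years" else "Safe"
    return "Risky"

print(predict_risk("fair", "5 years", "low"))  # Safe
```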
Module 5: Overfitting in decision trees
[Figure: decision boundaries for decision trees of depth 1, 3, and 10, and for logistic regression with degree 1, 2, and 6 features]
Module 5: Alleviate overfitting by learning simpler trees
Occam’s Razor: “Among competing hypotheses, the one with fewest assumptions should be selected” – William of Occam, 13th century
[Diagram: simplify a complex tree into a simpler tree]
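One common way to learn simpler trees, sketched below, is early stopping on a maximum depth; this recursive splitter is an illustrative assumption (the split-selection logic is omitted), not the course's algorithm.

```python
# Hedged sketch: stop splitting early once a maximum depth is reached.
def build_tree(data, features, target, depth=0, max_depth=3):
    labels = [row[target] for row in data]
    # Early stopping: depth cap reached, node is pure, or no features left
    if depth >= max_depth or len(set(labels)) == 1 or not features:
        return {"leaf": max(set(labels), key=labels.count)}  # majority class
    feature = features[0]  # a real learner would pick the best split here
    tree = {"split_on": feature, "children": {}}
    for value in set(row[feature] for row in data):
        subset = [row for row in data if row[feature] == value]
        tree["children"][value] = build_tree(
            subset, features[1:], target, depth + 1, max_depth)
    return tree
```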
Module 6: Handling missing data
Data with missing values ("?"):

Credit     Term   Income  y
excellent  3 yrs  high    safe
fair       ?      low     risky
fair       3 yrs  high    safe
poor       5 yrs  high    risky
excellent  3 yrs  low     risky
fair       5 yrs  high    safe
poor       ?      high    risky
poor       5 yrs  low     safe
fair       ?      high    safe

[Decision tree from Module 4 with "or unknown" appended to selected branches (e.g., "fair or unknown", "5 years or unknown") so examples with missing values can still be routed to a prediction]
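A sketch of one strategy from this module: route missing values down a designated branch of the tree. Which branch absorbs the unknowns is a modeling choice; the assignments below are illustrative.

```python
# Missing values (None) are sent down a chosen "or unknown" branch.
def predict_risk_with_missing(credit, term, income):
    if credit == "excellent":
        return "Safe"
    if credit == "fair" or credit is None:   # "fair or unknown" branch
        if term == "3 years":
            return "Risky"
        return "Safe"                        # "5 years or unknown" branch
    # credit == "poor"
    if income == "high" or income is None:
        return "Risky" if term == "3 years" else "Safe"
    return "Risky"

print(predict_risk_with_missing("fair", None, "high"))  # Safe
```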
Module 7: Boosting question
“Can a set of weak learners be combined to
create a stronger learner?” Kearns and Valiant (1988)
Yes! Schapire (1990)
Boosting
Amazing impact: a simple approach, widely used in
industry, that wins most Kaggle competitions
Module 7: Boosting using AdaBoost
Income > $100K?    (Yes → Safe, No → Risky):   f1(xi) = +1
Credit history?    (Bad → Risky, Good → Safe): f2(xi) = -1
Savings > $100K?   (Yes → Safe, No → Risky):   f3(xi) = -1
Market conditions? (Bad → Risky, Good → Safe): f4(xi) = +1

Ensemble: combine votes from many simple classifiers to learn complex classifiers
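A minimal sketch of how the ensemble combines the four votes above: a weighted vote, ŷ = sign(Σ_t ŵ_t f_t(x)). The coefficient values below are made up; AdaBoost's way of learning them is what this module covers.

```python
import numpy as np

def ensemble_predict(weak_predictions, coefficients):
    """weak_predictions: list of +1/-1 votes f_t(x); coefficients: weights w_t."""
    return int(np.sign(np.dot(coefficients, weak_predictions)))

f_votes = [+1, -1, -1, +1]           # f1..f4 from the slide
w = [0.8, 0.3, 0.3, 0.6]             # illustrative (not learned) weights
print(ensemble_predict(f_votes, w))  # +1 (Safe): 0.8 - 0.3 - 0.3 + 0.6 = 0.8 > 0
```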
Module 8: Precision-recall
Goal: increase # guests by 30% with an automated,
"authentic" marketing campaign:
Reviews → Great quotes → Spokespeople
"Easily best sushi in Seattle."

Accuracy is not the most important metric here:
PRECISION: Did I (mistakenly) show a negative sentence?
RECALL: Did I fail to show a (great) positive sentence?
Module 9: Scaling to huge datasets & online learning
4.8B webpages, 500M Tweets/day, 5B views/day
Stochastic gradient: a tiny modification to gradient ascent
that is a lot faster, but finicky in practice
[Plot: average log likelihood for gradient vs. stochastic gradient; higher is better]
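A hedged sketch contrasting the two updates under the same logistic regression setup as the earlier sketches: the full-gradient step touches every example, while the stochastic step uses a single example per update (names and step sizes are illustrative).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def full_gradient_step(w, H, y, step_size=0.1):
    errors = (y == +1).astype(float) - sigmoid(H @ w)
    return w + step_size * (H.T @ errors)      # uses every example

def stochastic_gradient_step(w, h_i, y_i, step_size=0.1):
    error = float(y_i == +1) - sigmoid(h_i @ w)
    return w + step_size * (h_i * error)       # uses one example only
```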
Assumed background
Courses 1 & 2 in this ML Specialization
• Course 1: Foundations
- Overview of ML case studies
- Black-box view of ML tasks
- Programming & data
manipulation skills
• Course 2: Regression
- Data representation (input, output, features)
- Linear regression model
- Basic ML concepts:
• ML algorithm
• Gradient descent
• Overfitting
• Validation set and cross-validation
• Bias-variance tradeoff
• Regularization
Math background
• Basic calculus
- Concept of derivatives
• Basic vectors
• Basic functions
- Exponentiation e^x
- Logarithm
Programming experience
• Basic Python used
- Can be picked up along the way if you
know another programming language
Reliance on GraphLab Create
• SFrames will be used, though they are not required
- An open-source project of Dato
(the creators of GraphLab Create)
- pandas and numpy can be used instead
• Assignments will:
1. Use GraphLab Create to explore high-level concepts
2. Ask you to implement all algorithms without GraphLab Create
• Net result:
- learn how to code methods in Python
Computing needs
• Basic 64-bit desktop or laptop
• Access to internet
• Ability to:
- Install and run Python (and GraphLab Create)
- Store a few GB of data
Let’s get started!