Revisiting Logistic Regression & Naïve Bayes
Aarti Singh
Machine Learning 10-701/15-781 Jan 27, 2010
Generative and Discriminative Classifiers
Training classifiers involves learning a mapping f: X -> Y, or P(Y|X)
Generative classifiers (e.g. Naïve Bayes)
- Assume some functional form for P(X,Y) (or P(X|Y) and P(Y))
- Estimate parameters of P(X|Y), P(Y) directly from training data
- Use Bayes rule to calculate P(Y|X)
Discriminative classifiers (e.g. Logistic Regression)
- Assume some functional form for P(Y|X)
- Estimate parameters of P(Y|X) directly from training data
Logistic Regression
Assumes the following functional form for P(Y|X):
P(Y=1|X) = \frac{1}{1 + \exp(-(w_0 + \sum_i w_i X_i))}
Alternatively, \ln \frac{P(Y=1|X)}{P(Y=0|X)} = w_0 + \sum_i w_i X_i, so the decision boundary w_0 + \sum_i w_i X_i = 0 is linear (Linear Decision Boundary)
DOES NOT require any conditional independence assumptions
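As a concrete illustration, here is a minimal sketch of this functional form (not from the slides; the weights w, w0 and the input x are hypothetical):

```python
import numpy as np

def p_y1_given_x(x, w, w0):
    """P(Y=1|X=x) under the logistic regression model."""
    return 1.0 / (1.0 + np.exp(-(w0 + np.dot(w, x))))

# Hypothetical 2-feature example: the decision boundary w0 + w.x = 0 is a line.
w, w0 = np.array([2.0, -1.0]), 0.5
x = np.array([0.3, 1.2])
print(p_y1_given_x(x, w, w0))             # predicted probability of class 1
print(int(p_y1_given_x(x, w, w0) > 0.5))  # predicted label
```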
Connection to Gaussian Naïve Bayes
There are several distributions that can lead to a linear decision boundary. As another example, consider a generative model:
Class-conditional densities P(X|Y=y) in the exponential family; observe that the Gaussian is a special case.
Connection to Gaussian Naïve Bayes
Expanding the log-odds \ln \frac{P(Y=1|X)}{P(Y=0|X)} gives a constant term plus a first-order term in X, i.e. a linear decision boundary.
Special case: P(X|Y=y) ~ Gaussian(\mu_y, \Sigma_y) where \Sigma_0 = \Sigma_1 (so the covariance entries satisfy c_{ij,0} = c_{ij,1}); with conditionally independent features, c_{ij,y} = 0 for i \neq j (Gaussian Naïve Bayes).
Generative vs Discriminative
Given infinite data (asymptotically),
- If the conditional independence assumption holds, discriminative and generative classifiers perform similarly.
- If the conditional independence assumption does NOT hold, the discriminative classifier outperforms the generative one.
Generative vs Discriminative
Given finite data (n data points, p features), the Ng-Jordan (2001) paper shows:
Naïve Bayes (generative) requires n = O(log p) samples to converge to its asymptotic error, whereas logistic regression (discriminative) requires n = O(p).
Why? Independent class-conditional densities:
- a smaller (more restricted) class of models is easier to learn
- parameter estimates are not coupled: each parameter is learnt independently, not jointly, from the training data
Naïve Bayes vs Logistic Regression
Verdict: Both learn a linear decision boundary. Naïve Bayes makes more restrictive assumptions and has a higher asymptotic error, BUT it converges faster to its (less accurate) asymptotic error.
Experimental Comparison (Ng-Jordan01)
UCI Machine Learning Repository: 15 datasets (8 with continuous features, 7 with discrete features); more results in the paper.
[Figure: test error vs. training set size on these datasets, comparing Naïve Bayes and Logistic Regression.]
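For intuition only, here is a minimal sketch of the same kind of comparison (a synthetic illustration, not the Ng-Jordan experiments) using scikit-learn's GaussianNB and LogisticRegression; the data-generating process below is my own assumption:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def sample(n, p=20):
    """Synthetic data: class-conditional Gaussians with independent features."""
    y = rng.permutation(np.arange(n) % 2)           # balanced binary labels
    X = rng.normal(loc=y[:, None] * 0.3, size=(n, p))
    return X, y

X_test, y_test = sample(5000)
for n in [10, 50, 200, 1000]:
    X, y = sample(n)
    nb_acc = GaussianNB().fit(X, y).score(X_test, y_test)
    lr_acc = LogisticRegression(max_iter=1000).fit(X, y).score(X_test, y_test)
    print(f"n={n:5d}  NaiveBayes={nb_acc:.3f}  LogisticRegression={lr_acc:.3f}")
```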
Classification so far (Recap)
Classification Tasks
Features, X -> Labels, Y. Examples:
- Diagnosing sickle cell anemia: X = cell image, Y = {Anemic cell, Healthy cell}
- Tax fraud detection
- Web classification: X = document, Y = {Sports, Science, News}
- Predicting Squirrel Hill residency: X = {drive to CMU, Rachel's fan, shop at SH Giant Eagle}, Y = {Resident, Not resident}
Classification
Goal: learn a prediction rule f: Features X -> Labels Y (e.g. X = document, Y = {Sports, Science, News}) with a small Probability of Error, P(f(X) \neq Y).
Classification
Optimal predictor (Bayes classifier): f^*(x) = \arg\max_y P(Y = y | X = x)
It depends on the unknown distribution P(X, Y).
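A short justification of why this rule is optimal: for any classifier f,
P(f(X) \neq Y \mid X = x) = 1 - P(Y = f(x) \mid X = x) \ge 1 - \max_y P(Y = y \mid X = x),
with equality when f(x) = \arg\max_y P(Y = y \mid X = x), so the Bayes classifier minimizes the probability of error.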
Classification algorithms
However, we can learn a good prediction rule from training data (X_1, Y_1), ..., (X_n, Y_n), assumed independent and identically distributed.
[Diagram: training data -> learning algorithm -> prediction rule]
So far:
- Decision Trees
- K-Nearest Neighbor
- Naïve Bayes
- Logistic Regression
Linear Regression
Aarti Singh
Machine Learning 10-701/15-781 Jan 27, 2010
Discrete to Continuous Labels
Classification:
- X = Document, Y = Topic (Sports / Science / News)
- X = Cell Image, Y = Diagnosis (Anemic cell / Healthy cell)
Regression:
- Stock Market Prediction: X = Feb 01 (date), Y = ? (a continuous value, e.g. the share price)
Regression Tasks
- Weather Prediction: X = 7 pm (time), Y = Temperature
- Estimating Contamination: X = new location, Y = sensor reading
Supervised Learning
Goal: learn a prediction rule f: X -> Y from training data.
- Classification (e.g. Y in {Sports, Science, News}): minimize the Probability of Error, P(f(X) \neq Y)
- Regression (e.g. X = Feb 01, Y = ?): minimize the Mean Squared Error, E[(f(X) - Y)^2]
Regression
Optimal predictor (Conditional Mean): f^*(x) = E[Y | X = x]
(Dropping subscripts for notational convenience.)
Intuition: Signal plus (zero-mean) noise model, Y = f^*(X) + \epsilon with E[\epsilon | X] = 0, so the best guess of Y given X is f^*(X).
It depends on the unknown distribution P(X, Y).
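A short derivation of why the conditional mean is optimal under mean squared error: for any predictor f,
E[(f(X) - Y)^2] = E[(f(X) - E[Y|X])^2] + E[(E[Y|X] - Y)^2],
because the cross term vanishes by the tower property; the first term is minimized (made zero) by choosing f(X) = E[Y|X].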
Regression algorithms
[Diagram: training data -> learning algorithm -> prediction rule]
- Linear Regression
- Lasso, Ridge Regression (Regularized Linear Regression)
- Nonlinear Regression, Kernel Regression
- Regression Trees, Splines, Wavelet estimators, ...
Empirical Risk Minimizer: \hat{f}_n = \arg\min_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} (f(X_i) - Y_i)^2   (the empirical mean of the squared error over the training data)
Linear Regression
Least Squares Estimator: \hat{f}_n = \arg\min_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} (f(X_i) - Y_i)^2, where \mathcal{F} is the class of linear functions.
Uni-variate case: f_\beta(X) = \beta_1 + \beta_2 X, where \beta_1 is the intercept and \beta_2 is the slope.
Multi-variate case: f_\beta(X) = X\beta, where X = [X^{(1)}, ..., X^{(p)}] is the row vector of features and \beta = (\beta_1, ..., \beta_p)^T is the vector of coefficients.
Least Squares Estimator
In matrix form, \hat{\beta} = \arg\min_\beta \frac{1}{n} \sum_{i=1}^{n} (X_i\beta - Y_i)^2 = \arg\min_\beta \frac{1}{n} \|A\beta - Y\|_2^2,
where A is the n x p matrix whose rows are the training inputs X_i and Y is the n x 1 vector of training labels.
Setting the gradient to zero, \nabla_\beta \|A\beta - Y\|_2^2 = 2A^T(A\beta - Y) = 0, yields the normal equations.
Normal Equations
(A^T A) \beta = A^T Y
(p x p)(p x 1) = (p x 1)
If A^T A is invertible, \hat{\beta} = (A^T A)^{-1} A^T Y.
- When is A^T A invertible? (Homework 2) Recall: full-rank matrices are invertible. What is the rank of A^T A?
- What if A^T A is not invertible? (Homework 2) Regularization (later).
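A minimal numpy sketch of this solution (the data A, Y below are synthetic; the variable names match the notation above):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
A = rng.normal(size=(n, p))                   # n x p design matrix (rows are training inputs)
beta_true = np.array([1.0, -2.0, 0.5])
Y = A @ beta_true + 0.1 * rng.normal(size=n)  # noisy labels

# Solve (A^T A) beta = A^T Y  (assumes A^T A is invertible)
beta_hat = np.linalg.solve(A.T @ A, A.T @ Y)

# Equivalent, and numerically preferable in practice:
beta_lstsq, *_ = np.linalg.lstsq(A, Y, rcond=None)
print(beta_hat, beta_lstsq)
```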
Geometric Interpretation
Difference in prediction on the training set: Y - A\hat{\beta} = (I - A(A^T A)^{-1}A^T) Y
A\hat{\beta} is the orthogonal projection of Y onto the linear subspace spanned by the columns of A.
Revisiting Gradient Descent
Even when A^T A is invertible, computing (A^T A)^{-1} might be computationally expensive if A is huge.
Gradient Descent:
- Initialize: \beta^{(0)}
- Update: \beta^{(t+1)} = \beta^{(t)} - \eta \nabla_\beta J(\beta^{(t)}), where J(\beta) = \|A\beta - Y\|_2^2 and \nabla_\beta J(\beta) = 2A^T(A\beta - Y) (the gradient is 0 exactly when \beta solves the normal equations)
- Stop: when some criterion is met, e.g. a fixed number of iterations, or \|\nabla_\beta J(\beta^{(t)})\| < \epsilon.
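A minimal sketch of this update for J(\beta) = \|A\beta - Y\|_2^2, with a conservative constant step size chosen from the largest eigenvalue of A^T A (my own choice; the slides leave the step-size schedule open):

```python
import numpy as np

def gd_least_squares(A, Y, eta=None, iters=500, tol=1e-8):
    """Gradient descent on J(beta) = ||A beta - Y||^2."""
    n, p = A.shape
    if eta is None:
        # A safe constant step size: 1 / (largest eigenvalue of the Hessian 2 A^T A)
        eta = 1.0 / (2 * np.linalg.eigvalsh(A.T @ A).max())
    beta = np.zeros(p)
    for _ in range(iters):
        grad = 2 * A.T @ (A @ beta - Y)
        beta = beta - eta * grad
        if np.linalg.norm(grad) < tol:   # stopping criterion from the slide
            break
    return beta

# Example on the same kind of synthetic A, Y as in the previous sketch:
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 3))
Y = A @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
print(gd_least_squares(A, Y))
```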
Effect of step-size
- Large step size => fast convergence but larger residual error; oscillations are also possible
- Small step size => slow convergence but small residual error
When does Gradient Descent succeed?
The algorithm's view is myopic: it only uses local gradient information.
(Figure sources: http://www.ce.berkeley.edu/~bayen/, http://demonstrations.wolfram.com)
Guaranteed to converge to a local minimum (here the global minimum, since J is convex) if the step size is small enough.
Convergence in the jth eigen-direction of A^T A is geometric at a rate governed by \eta\lambda_j, so the overall convergence depends on the eigenvalue spread.
Least Squares and MLE
Intuition: Signal plus (zero-mean) noise model, Y = A\beta^* + \epsilon with \epsilon ~ N(0, \sigma^2 I).
Log likelihood: \log P(Y | A, \beta) = -\frac{1}{2\sigma^2} \|Y - A\beta\|_2^2 + \text{const}
The Least Squares Estimate is the same as the Maximum Likelihood Estimate under a Gaussian noise model!
Regularized Least Squares and MAP
What if A^T A is not invertible?
MAP estimate: \hat{\beta}_{MAP} = \arg\max_\beta [\log P(Y | A, \beta) + \log P(\beta)]   (log likelihood + log prior)
I) Gaussian Prior: \beta ~ N(0, \tau^2 I)
\hat{\beta}_{MAP} = \arg\min_\beta \|Y - A\beta\|_2^2 + \lambda \|\beta\|_2^2   (Ridge Regression)
Closed form: HW. The prior belief that \beta is Gaussian with zero mean biases the solution toward small \|\beta\|.
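A minimal sketch of the ridge solution, assuming the closed form (A^T A + \lambda I)^{-1} A^T Y worked out in the homework; the data and \lambda below are hypothetical:

```python
import numpy as np

def ridge(A, Y, lam):
    """Ridge regression: argmin ||Y - A beta||^2 + lam * ||beta||^2."""
    p = A.shape[1]
    # (A^T A + lam I) is invertible for lam > 0 even when A^T A is not
    return np.linalg.solve(A.T @ A + lam * np.eye(p), A.T @ Y)

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 50))   # p > n, so A^T A is singular
Y = rng.normal(size=20)
print(ridge(A, Y, lam=1.0)[:5])
```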
Regularized Least Squares and MAP
What if A^T A is not invertible?
MAP estimate: \hat{\beta}_{MAP} = \arg\max_\beta [\log P(Y | A, \beta) + \log P(\beta)]   (log likelihood + log prior)
II) Laplace Prior: P(\beta_i) \propto \exp(-|\beta_i| / b)
\hat{\beta}_{MAP} = \arg\min_\beta \|Y - A\beta\|_2^2 + \lambda \|\beta\|_1   (Lasso)
Closed form: HW. The prior belief that \beta is Laplace with zero mean biases the solution toward small \|\beta\|.
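For a general (non-orthogonal) design the lasso is usually solved numerically; here is a minimal sketch of one standard approach, proximal gradient descent with soft-thresholding (my choice of solver, not necessarily the homework's), on hypothetical sparse data:

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(A, Y, lam, iters=2000):
    """argmin ||Y - A beta||^2 + lam * ||beta||_1 via proximal gradient (ISTA)."""
    L = 2 * np.linalg.eigvalsh(A.T @ A).max()   # Lipschitz constant of the smooth part
    beta = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = 2 * A.T @ (A @ beta - Y)
        beta = soft_threshold(beta - grad / L, lam / L)
    return beta

rng = np.random.default_rng(0)
A = rng.normal(size=(50, 100))                    # high-dimensional: p > n
beta_true = np.zeros(100); beta_true[:5] = 3.0    # sparse ground truth
Y = A @ beta_true + 0.1 * rng.normal(size=50)
beta_hat = lasso_ista(A, Y, lam=1.0)
print(np.sum(np.abs(beta_hat) > 1e-6), "nonzero coordinates")  # sparse solution
```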
Ridge Regression vs Lasso
Ridge Regression: \hat{\beta} = \arg\min_\beta \|Y - A\beta\|_2^2 + \lambda \|\beta\|_2^2
Lasso: \hat{\beta} = \arg\min_\beta \|Y - A\beta\|_2^2 + \lambda \|\beta\|_1   (a HOT topic!)
Ideally one would use an l0 penalty (the number of nonzero coefficients), but the optimization then becomes non-convex.
[Figure: \beta's with constant J(\beta) (level sets of J) intersecting \beta's with constant l2 norm, constant l1 norm, and constant l0 norm.]
Lasso (l1 penalty) results in sparse solutions, i.e. a coefficient vector with more zero coordinates. Good for high-dimensional problems: you don't have to store all the coordinates!
Beyond Linear Regression
- Polynomial regression
- Regression with nonlinear features/basis functions
- Kernel regression: local/weighted regression
- Regression trees: spatially adaptive regression
Polynomial Regression
Univariate case: f_\beta(X) = \beta_1 + \beta_2 X + \beta_3 X^2 + ... + \beta_m X^{m-1}, where the \beta_j are the weights of each feature and the nonlinear features are the powers 1, X, X^2, ..., X^{m-1}.
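A minimal sketch: polynomial regression is ordinary least squares on the nonlinear features 1, X, X^2, ...; the degree and data below are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-1, 1, size=50))
Y = np.sin(3 * X) + 0.1 * rng.normal(size=50)     # a nonlinear target

degree = 5
Phi = np.vander(X, degree + 1, increasing=True)   # columns: 1, X, X^2, ..., X^degree
beta, *_ = np.linalg.lstsq(Phi, Y, rcond=None)    # ordinary least squares on the features
Y_hat = Phi @ beta
print("training MSE:", np.mean((Y_hat - Y) ** 2))
```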
Nonlinear Regression
f_\beta(X) = \sum_j \beta_j \phi_j(X), with basis coefficients \beta_j and nonlinear features/basis functions \phi_j(X).
- Fourier basis: a good representation for oscillatory functions
- Wavelet basis: a good representation for functions localized at multiple scales
Local Regression
f_\beta(X) = \sum_j \beta_j \phi_j(X), with basis coefficients \beta_j and nonlinear features/basis functions \phi_j(X).
Globally supported basis functions (polynomial, Fourier) will not yield a good representation for functions with local structure.
Kernel Regression (Local)
Weighted Least Squares: weigh each training point based on its distance to the test point X,
w_i(X) = K\!\left(\frac{\|X_i - X\|}{h}\right),
where K is the kernel and h is the bandwidth of the kernel.
Nadaraya-Watson Kernel Regression
Fit a local constant:
\hat{f}_n(X) = \frac{\sum_{i=1}^{n} K\!\left(\frac{\|X_i - X\|}{h}\right) Y_i}{\sum_{i=1}^{n} K\!\left(\frac{\|X_i - X\|}{h}\right)}
With the box-car kernel, the denominator is the number of points in the h-ball around X and the numerator is the sum of the Y_i's in the h-ball around X, i.e. a local average.
Recall the NN classifier: averaging <-> majority vote.
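A minimal sketch of the Nadaraya-Watson estimator with a box-car kernel (a Gaussian kernel is included as an alternative; the data and bandwidth are hypothetical):

```python
import numpy as np

def boxcar(u):
    return (np.abs(u) <= 1).astype(float)

def gaussian(u):
    return np.exp(-0.5 * u ** 2)

def nadaraya_watson(x, X_train, Y_train, h, K=boxcar):
    """Local average: sum_i K(|X_i - x| / h) Y_i / sum_i K(|X_i - x| / h)."""
    w = K(np.abs(X_train - x) / h)
    if w.sum() == 0:                  # no training points fall in the window
        return np.nan
    return np.sum(w * Y_train) / np.sum(w)

rng = np.random.default_rng(0)
X_train = rng.uniform(0, 1, size=200)
Y_train = np.sin(2 * np.pi * X_train) + 0.2 * rng.normal(size=200)
print(nadaraya_watson(0.3, X_train, Y_train, h=0.05))             # box-car kernel
print(nadaraya_watson(0.3, X_train, Y_train, h=0.05, K=gaussian)) # Gaussian kernel
```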
Choice of Bandwidth
The bandwidth h should depend on n, the number of training data (it determines the variance), and on the smoothness of the function (it determines the bias).
- Large bandwidth: averages more data points, reduces noise (lower variance)
- Small bandwidth: less smoothing, more accurate local fit (lower bias)
Bias-variance tradeoff: more to come in later lectures.
Spatially adaptive regression
If the function's smoothness varies spatially, we want to allow the bandwidth h to depend on X.
Examples: local polynomials, splines, wavelets, regression trees.
Regression trees
Binary decision tree: split on the features (e.g. "Num Children >= 2?" vs. "< 2"); average (fit a constant) on the leaves.
Regression trees
Quad decision tree: recursively split the input space; fit a polynomial on each leaf.
Splitting rule: compare the residual error with and without the split; if the split reduces the error enough, then split, else stop.
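A minimal sketch of a one-dimensional regression tree that fits a constant on each leaf and splits only when the residual error drops enough (the gain threshold and minimum leaf size are my own choices):

```python
import numpy as np

def build_tree(X, Y, min_gain=0.01, min_leaf=5):
    """Recursively split; fit a constant (the mean of Y) on each leaf."""
    node = {"value": Y.mean()}
    err = np.sum((Y - Y.mean()) ** 2)           # residual error without a split
    best = None
    for s in np.unique(X)[:-1]:                 # candidate split points
        left, right = Y[X <= s], Y[X > s]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        split_err = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
        if best is None or split_err < best[1]:
            best = (s, split_err)
    # Split only if the error reduction is large enough; else stop (leaf node)
    if best is not None and err - best[1] > min_gain * err:
        s = best[0]
        node["split"] = s
        node["left"] = build_tree(X[X <= s], Y[X <= s], min_gain, min_leaf)
        node["right"] = build_tree(X[X > s], Y[X > s], min_gain, min_leaf)
    return node

def predict(tree, x):
    while "split" in tree:
        tree = tree["left"] if x <= tree["split"] else tree["right"]
    return tree["value"]

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=200)
Y = np.where(X < 0.5, 0.0, 1.0) + 0.1 * rng.normal(size=200)   # piecewise-constant target
tree = build_tree(X, Y)
print(predict(tree, 0.2), predict(tree, 0.8))
```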
Summary
Discriminative vs Generative Classifiers
- Naïve Bayes vs Logistic Regression
Regression
- Linear Regression: Least Squares Estimator, Normal Equations, Gradient Descent, Geometric Interpretation, Probabilistic Interpretation (connection to MLE)
- Regularized Linear Regression (connection to MAP): Ridge Regression, Lasso
- Polynomial Regression, Basis (Fourier, Wavelet) Estimators
- Kernel Regression (Localized)
- Regression Trees