Data Science L19 - Logistic Regression (Credit to Andrew Ng)
• Ground truth data - Input feature / output (x, y) are the knowns

• Use a model / hypothesis as h(w)

• Develop an error / cost / loss function J(w) = J(y, ȳ) = J(y, h(w))

• The weights are identified by min J(w)

• Essentially, the ML problem is now reduced to an optimization problem.

• Weights are identified using optimization.
Ramanathan Muthuganapathy, Department of Engineering Design, IIT Madras
Linear Regression (Predictive)

• Ground truth data - Input feature / output (x, y) are the knowns

• Use a model / hypothesis as h(w) and cost function J(w)

[Diagram: Input (x) → Hypothesis h(w) with Weights / Parameters → predicted Output (ȳ); the Loss function J(w) compares ȳ against the ground-truth Output (y)]
• Output is either 0 or 1

• Good / bad grades

[Plot: output y (0 or 1) vs. tumor size (x)]
• Output is either 0 or 1

• 0 - Benign

• 1 - Malignant

[Plot: output y (0 or 1) vs. tumor size (x)]
• Output is either 0 or 1

• 0 - Small

• 1 - Large

• ȳ^(i) = h_w(x^(i)) = w0 + w1 x^(i)

[Plot: a straight line fitted through the 0/1 data vs. input size (x)]
• h_w(x^(i)) ≥ 0.5, y = 1

• h_w(x^(i)) < 0.5, y = 0

[Plot: linear hypothesis h_w(x^(i)) with the 0.5 threshold marked; output y (0 or 1) vs. T-shirt size (x)]
• h_w(x^(i)) ≥ 0.5, y = 1

• h_w(x^(i)) < 0.5, y = 0

• Misclassification starts happening

• Not a good idea to use 0.5 with Linear Regression

• With Linear Regression, the prediction can be y < 0 or y > 1

[Plot: the fitted line shifts with additional data, so the 0.5 threshold misclassifies points; output y vs. T-shirt size (x)]
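To see the y < 0 / y > 1 problem concretely, here is a small sketch with made-up 1-D data (feature values and labels are assumptions, not from the slides): a least-squares line fitted to 0/1 labels happily predicts values outside [0, 1].

```python
import numpy as np

# Hypothetical 1-D data: feature x and 0/1 class labels (illustrative only).
x = np.array([1.0, 2.0, 3.0, 4.0, 8.0, 9.0, 10.0, 30.0])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)

# Plain linear regression: least-squares fit of y = w0 + w1*x.
A = np.column_stack([np.ones_like(x), x])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

y_hat = A @ w
# The linear hypothesis is unbounded, so predictions escape [0, 1].
print(y_hat.max())  # > 1 for the far-right point
```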
• h_w(x) = w^T x

• h_w(x) = σ(w^T x)

• σ(z) = 1 / (1 + e^(−z))

• σ(z) is called the Sigmoid or Logistic function.
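A minimal NumPy sketch of the sigmoid, checking the properties the following slides rely on (σ(0) = 0.5, outputs strictly between 0 and 1):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid / logistic function: sigma(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))    # 0.5 exactly
print(sigmoid(10.0))   # close to 1
print(sigmoid(-10.0))  # close to 0
```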
• σ(z) = 1 / (1 + e^(−5z)): σ(z) with 5

• σ(z) = 1 / (1 + e^(−10z)): σ(z) with 10

• σ(z) = 1 / (1 + e^(−100z)): σ(z) with 100

[Plots: as the coefficient of z increases, σ(z) rises more steeply around z = 0]

• σ(z) = 1 / (1 + e^(−z)) is a smoother approximation of the step function

• This means what?
• σ(z) = 1 / (1 + e^(−z))

• 0 < σ(z) < 1

• Value of σ(z) at z = 0? σ(0) = 0.5
• σ(z) = 1 / (1 + e^(−z))

• z ≥ 0, σ(z) ≥ 0.5

• z < 0, σ(z) < 0.5

• σ(z) crosses 0.5 at z = 0
• h_w(x) = σ(w^T x)

• h_w(x) = 1 / (1 + e^(−w^T x))

• w^T x ≥ 0, σ(w^T x) ≥ 0.5

• w^T x < 0, σ(w^T x) < 0.5
• h_w(x^(i)) ≥ 0.5, y = 1

• h_w(x^(i)) < 0.5, y = 0

• h_w(x^(i)) = σ(w0 x0 + w1 x1 + w2 x2)

• w0 = −5, w1 = 1, w2 = 1

• Apply w^T x ≥ 0: with x0 = 1, this gives −5 + x1 + x2 ≥ 0, i.e. x1 + x2 ≥ 5

• Linear decision boundary: h_w(x^(i)) = 0.5 along the line x1 + x2 = 5

[Plot: x1 vs. x2 plane, split by the line x1 + x2 = 5]
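With the slide's weights (w0 = −5, w1 = 1, w2 = 1), the predict-1 region w^T x ≥ 0 reduces to x1 + x2 ≥ 5. A quick sketch (the test points are made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Weights from the slide: w0 = -5, w1 = 1, w2 = 1 (x0 is the constant 1).
w = np.array([-5.0, 1.0, 1.0])

def predict(x1, x2):
    # h_w(x) = sigma(w0*x0 + w1*x1 + w2*x2); predict 1 when w^T x >= 0,
    # i.e. when x1 + x2 >= 5 -- a linear decision boundary.
    z = w @ np.array([1.0, x1, x2])
    return 1 if sigmoid(z) >= 0.5 else 0

print(predict(1.0, 1.0))  # x1 + x2 = 2 < 5  -> 0
print(predict(3.0, 4.0))  # x1 + x2 = 7 >= 5 -> 1
print(predict(2.0, 3.0))  # x1 + x2 = 5, exactly on the boundary -> 1
```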
• We need h_w(x^(i)) = σ(w0 x0 + w1 x1 + w2 x2)

• We need to find the weights w_i's

• Cost function.

[Plot: x1 vs. x2 plane with the decision boundary h_w(x^(i)) = 0.5]
• J(w) = (1/2m) Σ_{i=1}^{m} (h_w(x^(i)) − y^(i))^2

• h_w(x) = 1 / (1 + e^(−w^T x))

• In one variable, plotting J(w) against w shows it is not convex

• Not very desirable
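The non-convexity can be checked numerically: with a sigmoid hypothesis inside a squared-error cost, the discrete second derivative of J along w changes sign. The 1-D data below is made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical 1-D data with conflicting labels (illustrative only).
x = np.array([-4.0, -2.0, 1.0, 3.0])
y = np.array([1.0, 0.0, 1.0, 0.0])

def J(w):
    # Squared-error cost with a sigmoid hypothesis, in one variable w.
    h = sigmoid(w * x)
    return np.sum((h - y) ** 2) / (2 * len(x))

ws = np.linspace(-6, 6, 241)
Js = np.array([J(w) for w in ws])
# Discrete second derivative: a convex function would keep it >= 0
# everywhere; here the sign flips, so J(w) is non-convex.
second_diff = Js[2:] - 2 * Js[1:-1] + Js[:-2]
print(np.any(second_diff < 0) and np.any(second_diff > 0))
```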
• cost(h_w(x), y) = −log(h_w(x)) if y = 1; −log(1 − h_w(x)) if y = 0

• For y = 1: h_w(x) = 1, cost is 0

• For y = 1: h_w(x) = 0, penalization with large cost

• For y = 0: h_w(x) = 0, cost is 0

• For y = 0: h_w(x) = 1, penalization with large cost
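The two branches of the cost can be sketched directly (per-example cost only; the h values below are illustrative):

```python
import numpy as np

def cost(h, y):
    """Per-example logistic cost: -log(h) if y = 1, -log(1 - h) if y = 0."""
    return -np.log(h) if y == 1 else -np.log(1.0 - h)

# y = 1: a confident correct prediction costs ~0, a confident wrong one explodes.
print(cost(0.99, 1))   # small
print(cost(0.01, 1))   # large, about 4.6
# y = 0: symmetric behaviour.
print(cost(0.01, 0))   # small
print(cost(0.99, 0))   # large, about 4.6
```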
• ∂J/∂h = ?

• J = −y log(h) − (1 − y) log(1 − h)

• ∂J/∂h = −y/h + (1 − y)/(1 − h)

• ∂J/∂h = (h − y) / (h(1 − h))
• ∂h/∂w = ?

• σ(z) = 1 / (1 + e^(−z))

• ∂σ/∂z = σ(1 − σ)

• With h = σ(w^T x): ∂h/∂w = σ(1 − σ) x

• ∂J/∂h = (h − y) / (h(1 − h))
• ∂J/∂w = (∂J/∂h) (∂h/∂w)

• ∂J/∂w = [(h − y)/(h(1 − h))] σ(1 − σ) x = (h − y) x, since h = σ and h(1 − h) = σ(1 − σ)
• ∂J/∂wj = Σ_{i=1}^{m} (h_w(x^(i)) − y^(i)) xj^(i)
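The summed gradient above vectorises as X^T (h − y). A sketch with a finite-difference check of the derivation (the data and weights here are made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient(w, X, y):
    """dJ/dw_j = sum_i (h_w(x^(i)) - y^(i)) x_j^(i), i.e. X^T (h - y)."""
    h = sigmoid(X @ w)
    return X.T @ (h - y)

def J(w, X, y):
    # Summed cross-entropy cost, matching the gradient above.
    h = sigmoid(X @ w)
    return np.sum(-y * np.log(h) - (1 - y) * np.log(1 - h))

# Illustrative data: first column of X is the constant feature x0 = 1.
X = np.array([[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]])
y = np.array([1.0, 0.0, 1.0])
w = np.array([0.1, -0.2])

# Central finite differences of J should match the analytic gradient.
eps = 1e-6
num = np.array([(J(w + eps * e, X, y) - J(w - eps * e, X, y)) / (2 * eps)
                for e in np.eye(2)])
print(np.allclose(gradient(w, X, y), num))
```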
• wj^(k+1) = wj^(k) − αk ∂J/∂wj
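Putting the pieces together, a minimal batch gradient-descent trainer for logistic regression; the toy 1-D data, the step size α and the iteration count are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit(X, y, alpha=0.01, iters=5000):
    # Batch gradient descent:
    # w_j <- w_j - alpha * sum_i (h_w(x^(i)) - y^(i)) x_j^(i)
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        h = sigmoid(X @ w)
        w -= alpha * (X.T @ (h - y))
    return w

# Toy separable 1-D data: small x -> class 0, large x -> class 1.
x = np.array([1.0, 2.0, 3.0, 6.0, 7.0, 8.0])
X = np.column_stack([np.ones_like(x), x])  # prepend x0 = 1
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

w = fit(X, y)
preds = (sigmoid(X @ w) >= 0.5).astype(int)
print(preds)  # should recover the labels: [0 0 0 1 1 1]
```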
• One classifier per class: h_w^(1)(x), h_w^(2)(x), h_w^(3)(x)

[Plots: multi-class data; each h_w^(i)(x) separates one class from the rest]

• Predict the class i with max_i h_w^(i)(x)
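The max_i h_w^(i)(x) rule can be sketched as follows; the three weight vectors below are hand-picked for illustration, not trained.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One weight vector per class (rows); values chosen for illustration so that
# each classifier scores high on "its" region of the (x1, x2) plane.
W = np.array([
    [2.0, -1.0, -1.0],   # class 0: high h when x1 and x2 are small
    [-3.0, 2.0, -1.0],   # class 1: high h when x1 is large
    [-3.0, -1.0, 2.0],   # class 2: high h when x2 is large
])

def predict(x1, x2):
    x = np.array([1.0, x1, x2])    # x0 = 1
    scores = sigmoid(W @ x)        # h_w^(i)(x) for each class i
    return int(np.argmax(scores))  # max_i h_w^(i)(x)

print(predict(0.0, 0.0))  # 0
print(predict(3.0, 0.0))  # 1
print(predict(0.0, 3.0))  # 2
```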