
Cost Function of Logistic Regression

A cost function is a mathematical function that measures the difference between the actual target values (ground truth) and the values predicted by the model. Such a function, which assesses a machine learning model's performance, is also referred to as a loss function or objective function. Usually, the objective of a machine learning algorithm is to minimize the output of the cost function, i.e. the error.
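To make the idea concrete, here is a minimal sketch (assuming NumPy is available) of one common cost function, mean squared error, comparing ground-truth values with predictions; the numbers are made up purely for illustration.

import numpy as np

def mse_cost(y_true, y_pred):
    # Mean squared error: average squared gap between ground truth and predictions.
    return np.mean((y_true - y_pred) ** 2)

# Hypothetical values: the smaller the cost, the better the fit.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5,  0.0, 2.0, 8.0])
print(mse_cost(y_true, y_pred))  # 0.375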

Log loss and Cost function for Logistic Regression

One of the popular metrics for evaluating classification models that output probabilities is log loss.

F = −∑(i=1 to M) [yi log(hθ(xi)) + (1 − yi) log(1 − hθ(xi))]

The squared-error cost function (as used in linear regression) can be written as:

F(θ) = (1/n) ∑(i=1 to n) (1/2) [hθ(xi) − yi]²
For Logistic Regression,

hθ(x) = g(θᵀx)

where g is the sigmoid function. Substituting this hypothesis into the squared-error cost above leads to a non-convex function, which makes it a poor choice of cost function here. The cost function used for logistic regression is instead log loss, summarized below.

cost(hθ(x), y) = -log(hθ(x)), when y = 1

and

cost(hθ(x), y) = -log(1 - hθ(x)), when y = 0

where,

• y is the actual value of the target variable,
• hθ(x) is the predicted probability that y = 1 given x, parameterized by θ, and
• yi is the actual label for the i-th training example.
This cost function penalizes the model with a higher loss when its prediction
diverges from the actual label. Specifically, it imposes a large penalty when
the model confidently predicts the wrong class (i.e., high probability for the
incorrect class).
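Below is a minimal sketch of this log loss computed as an average over examples, assuming NumPy; the small clipping epsilon is an added numerical-safety detail, and the probabilities are invented to show how confident wrong predictions blow up the loss.

import numpy as np

def log_loss(y, p, eps=1e-15):
    # y: actual labels (0 or 1); p: predicted probabilities h_theta(x).
    p = np.clip(p, eps, 1 - eps)          # avoid log(0)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1, 0, 1, 0])
p_good = np.array([0.9, 0.1, 0.8, 0.2])   # confident and correct -> small loss
p_bad  = np.array([0.1, 0.9, 0.2, 0.8])   # confident and wrong   -> large loss
print(log_loss(y, p_good))   # about 0.164
print(log_loss(y, p_bad))    # about 1.956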

Why is Mean Squared Error suitable for Linear Regression?

Because in linear regression the MSE cost function is convex: there is a single point of minimum error, the global minimum, which gradient descent can reliably reach.
Why is Mean Squared Error not suitable for Logistic Regression?

Let's consider Mean Squared Error (MSE) as the cost function. It is not suitable for logistic regression because of the nonlinearity introduced by the sigmoid function.

MSE = (1/2m) ∑(i=1 to m) (σ(zi) − yi)², where zi = θᵀxi

In logistic regression, if we substitute the sigmoid function into the above MSE equation, we get

MSE = (1/2m) ∑(i=1 to m) (1/(1 + e^(−θᵀxi)) − yi)²

The term 1/(1 + e^(−z)) is a nonlinear transformation, and evaluating it inside the Mean Squared Error formula results in a non-convex cost function. A non-convex function has multiple local minima, which can make it difficult to optimize using traditional gradient descent algorithms, as discussed below.

Imagine you have a function that looks like a series of hills and valleys, with
multiple peaks and troughs scattered throughout. This type of function is
called non-convex because it doesn't have a single, well-defined minimum
point; instead, it has multiple local minima (valleys) and potentially even
some local maxima (peaks).

When you're trying to optimize such a function, the goal is to find the lowest
point, which corresponds to the global minimum. However, because of the
presence of multiple local minima, traditional gradient descent algorithms
can encounter difficulties.
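The shape difference can be checked numerically. The sketch below (assuming NumPy; the one-feature data and labels are invented for illustration) evaluates both the squared-error cost and the log loss over a grid of values of a single parameter θ. Plotting the two curves, for example with matplotlib, shows the squared-error-through-sigmoid curve flattening out and changing curvature (non-convex), while the log-loss curve stays bowl-shaped (convex).

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy one-feature data (made up); theta is a single parameter.
x = np.array([-3.0, -1.0, 0.5, 2.0, 4.0])
y = np.array([ 0,    0,   1,   1,   0  ])   # one "noisy" label included

thetas = np.linspace(-10, 10, 401)
mse_curve, logloss_curve = [], []
for theta in thetas:
    p = sigmoid(theta * x)
    mse_curve.append(np.mean((p - y) ** 2) / 2)       # squared error through the sigmoid
    p = np.clip(p, 1e-15, 1 - 1e-15)
    logloss_curve.append(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Plot mse_curve and logloss_curve against thetas: the squared-error curve is
# not convex (its curvature changes sign), while the log-loss curve is convex.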

Why is it challenging?

1. Getting Stuck in Local Minima: Gradient descent algorithms, like the one used in logistic regression, work by iteratively moving in the direction of the steepest descent of the function. However, if they start from an initial point that is not near the global minimum and there are multiple local minima, they might get trapped in one of the local minima instead of reaching the global minimum. Once stuck in a local minimum, plain gradient descent cannot escape it to find the true minimum (the sketch after this list illustrates this).

2. Plateaus and Saddle Points: In addition to local minima, non-convex functions may have plateaus (flat regions) and saddle points (points where the gradient is zero but not a minimum or maximum). These features can slow down or stall the convergence of gradient descent algorithms, making optimization even more challenging.
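As a toy illustration of point 1, the sketch below runs plain gradient descent on a made-up one-dimensional non-convex function with two valleys; which minimum it settles in depends entirely on the starting point.

def f(x):
    # A made-up non-convex function with two valleys and one hill in between.
    return x**4 - 4 * x**2 + x

def grad(x):
    # Derivative of f.
    return 4 * x**3 - 8 * x + 1

def gradient_descent(x0, lr=0.01, steps=2000):
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Starting to the right of the hill, descent settles in the shallow local minimum;
# starting to the left, it reaches the deeper (global) minimum.
for x0 in (2.0, -2.0):
    x = gradient_descent(x0)
    print(f"start={x0:+.1f} -> x={x:.3f}, f(x)={f(x):.3f}")
# start=+2.0 -> x approx  1.35, f(x) approx -2.62   (local minimum)
# start=-2.0 -> x approx -1.47, f(x) approx -5.45   (global minimum)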
