Artificial Intelligence & Machine Learning
TEXTBOOKS/LEARNING RESOURCES:
a) Masashi Sugiyama, Introduction to Statistical Machine Learning (1st ed.), Morgan Kaufmann, 2017. ISBN 978-0128021217.
b) T. M. Mitchell, Machine Learning (1st ed.), McGraw Hill, 2017. ISBN 978-1259096952.
REFERENCE BOOKS/LEARNING RESOURCES:
a) Richard Golden, Statistical Machine Learning: A Unified Framework (1st ed.), CRC Press, 2020.
Lecture
What Is Logistic Regression?
Sigmoid Function
Types of LR
Applications of LR in Different Fields
Linear Regression vs. Logistic Regression
Python Program to Implement Logistic Regression (Lab)
Logistic Regression
• It is used for predicting a categorical dependent variable from a given set of independent variables.
• The outcome must be a categorical or discrete value, such as Yes or No, 0 or 1, True or False.
• Instead of giving the exact values 0 and 1, it gives probabilistic values that lie between 0 and 1. It can provide probabilities and classify new data using both continuous and discrete datasets.
• It can classify observations using different types of data and can easily determine the most effective variables for the classification.
Dr. Sanchali 10 October 2023 3
Logistic Regression: Example
• Logistic regression analysis is valuable for predicting the likelihood of an event.
• It helps determine the probabilities between any two classes.
Sigmoid / Logistic function
• A mathematical function used to map predicted values to probabilities. It maps any real value to a value between 0 and 1.
• The output of logistic regression must lie between 0 and 1 and cannot go beyond these limits, so it forms an "S"-shaped curve. This S-shaped curve is called the sigmoid function or the logistic function.
• Logistic regression uses the concept of a threshold value, which decides whether a probability is mapped to 0 or 1.
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
• Here, x is the linear combination of the input features and their corresponding coefficients in the logistic regression model.
• e is the base of the natural logarithm (approximately equal to 2.71828).
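The mapping from any real value to a value between 0 and 1 can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `sigmoid` and the sample inputs are chosen here for illustration and are not part of the lecture.

```python
import math


def sigmoid(x: float) -> float:
    """Return 1 / (1 + e^(-x)), a value between 0 and 1."""
    # Branch on the sign of x so math.exp never receives a large
    # positive argument, which would raise OverflowError.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Large negative inputs approach 0, large positive inputs approach 1,
# and x = 0 maps to exactly 0.5.
for x in (-6.0, -2.0, 0.0, 2.0, 6.0):
    print(f"x = {x:+.1f}  ->  sigmoid(x) = {sigmoid(x):.4f}")
```

Plotting these values against x would trace the S-shaped curve described above.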
Decision boundary in Logistic Regression
• The sigmoid function returns a probability value between 0 and 1, and this probability value is then mapped to a discrete class, which is either "0" or "1".
• To map this probability value to a discrete class, we select a threshold value. This threshold value defines the decision boundary.
• Probability values at or above the threshold are mapped to class 1, and values below it are mapped to class 0.
• With a threshold of 0.5, this can be expressed mathematically as:
• p ≥ 0.5 => class = 1
• p < 0.5 => class = 0
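The thresholding rule can be sketched as follows. This is an illustrative example only: the function name `predict_class`, the default threshold of 0.5, and the sample scores are assumptions made for the sketch.

```python
import math


def predict_class(score: float, threshold: float = 0.5) -> int:
    """Map a linear score to class 0 or 1 via the sigmoid and a threshold."""
    p = 1.0 / (1.0 + math.exp(-score))  # probability of class 1
    return 1 if p >= threshold else 0


# Hypothetical linear scores (the value of x fed into the sigmoid).
for score in (-1.2, 0.0, 0.8):
    print(f"score = {score:+.1f}  ->  class {predict_class(score)}")

# Raising the threshold makes the model more reluctant to predict class 1.
print(predict_class(0.8, threshold=0.9))
```

A score of 0 gives p = 0.5, which falls in class 1 under the rule p ≥ 0.5. In practice the threshold can be moved away from 0.5 when the two kinds of error have different costs.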
The ‘S’ curve
• Instead of fitting a straight regression line, we fit an "S"-shaped logistic function whose output is bounded between 0 and 1.
• The curve from the logistic function indicates the likelihood of something, such as whether cells are cancerous or not, or whether a mouse is obese or not based on its weight.
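As a rough sketch of what fitting the S-shaped curve involves, the example below fits a one-feature logistic model with plain gradient descent on the average log-loss. The mouse weights and labels are made-up illustrative data, and the learning rate and iteration count are arbitrary choices. In the lab, a library such as scikit-learn would normally do this step.

```python
import math

# Hypothetical data: mouse weight in grams and whether the mouse is obese.
weights = [18, 20, 22, 24, 26, 28, 30, 32, 34, 36]
obese = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

# Standardize the feature so gradient descent converges smoothly.
n = len(weights)
mean = sum(weights) / n
std = math.sqrt(sum((w - mean) ** 2 for w in weights) / n)
xs = [(w - mean) / std for w in weights]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# Gradient descent on the average log-loss.
b0, b1 = 0.0, 0.0
learning_rate = 0.1
for _ in range(5000):
    grad0 = grad1 = 0.0
    for x, y in zip(xs, obese):
        error = sigmoid(b0 + b1 * x) - y  # predicted probability minus label
        grad0 += error
        grad1 += error * x
    b0 -= learning_rate * grad0 / n
    b1 -= learning_rate * grad1 / n


def prob_obese(weight: float) -> float:
    """Predicted probability of obesity for a weight in grams."""
    return sigmoid(b0 + b1 * (weight - mean) / std)


for w in (18, 24, 27, 30, 36):
    print(f"weight = {w} g  ->  P(obese) = {prob_obese(w):.3f}")
```

The predicted probability rises smoothly with weight, which is the S-curve the slide describes.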
How it works
Logistic Regression Equation
The logistic regression equation can be obtained from the linear regression equation. The mathematical steps are given below:
We know the equation of a straight line can be written as:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_r x_r$$
In logistic regression, y can only lie between 0 and 1, so we divide y by (1 - y). The ratio is 0 when y = 0 and grows without bound as y approaches 1:
$$\frac{y}{1 - y} \in [0, \infty)$$
But we need a range from -∞ to +∞, so we take the logarithm of this ratio, and the equation becomes:
$$\log\left(\frac{y}{1 - y}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_r x_r$$
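The effect of each transformation step on the range of values can be checked numerically. The sketch below is illustrative; the helper names `odds`, `logit`, and `inverse_logit` are chosen here and are not from the lecture.

```python
import math


def odds(y: float) -> float:
    """y / (1 - y): maps the interval (0, 1) onto (0, +infinity)."""
    return y / (1.0 - y)


def logit(y: float) -> float:
    """log(y / (1 - y)): maps (0, 1) onto (-infinity, +infinity)."""
    return math.log(odds(y))


def inverse_logit(z: float) -> float:
    """The sigmoid function, which undoes the logit."""
    return 1.0 / (1.0 + math.exp(-z))


for y in (0.01, 0.25, 0.5, 0.75, 0.99):
    z = logit(y)
    print(f"y = {y:.2f}  odds = {odds(y):8.4f}  log-odds = {z:+.4f}  "
          f"back to y = {inverse_logit(z):.2f}")
```

Values of y below 0.5 give negative log-odds and values above 0.5 give positive log-odds, so the transformed quantity can take any real value and can be modelled by a straight line.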
Assumptions in Logistic Regression
The relationship between the independent variables and the log-odds of the outcome is assumed to be linear in the logit space. Mathematically, this relationship can be represented as:
$$\log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_r x_r$$
• p is the probability of the binary outcome (0 ≤ p ≤ 1).
• β₀, β₁, β₂, ..., βᵣ are the coefficients for the intercept and each predictor variable (x₁, x₂, ..., xᵣ) respectively.
• p/(1-p) is termed the odds. The odds are the ratio of the chances of success to the chances of failure.
• The left-hand side is called the logit or log-odds function.
As a result, in logistic regression, a linear combination of the inputs is mapped to the log-odds of the outcome being 1.
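One consequence of linearity in the logit space is that increasing a predictor by one unit multiplies the odds by a constant factor, e raised to that predictor's coefficient. The sketch below checks this for a single predictor. The coefficient values are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical coefficients for a one-predictor model (not fitted values).
BETA0 = -3.0
BETA1 = 0.8


def log_odds(x: float) -> float:
    """Linear in x: beta0 + beta1 * x."""
    return BETA0 + BETA1 * x


def probability(x: float) -> float:
    """Probability that the outcome is 1, via the sigmoid of the log-odds."""
    return 1.0 / (1.0 + math.exp(-log_odds(x)))


def odds(x: float) -> float:
    p = probability(x)
    return p / (1.0 - p)


for x in range(0, 8):
    print(f"x = {x}  log-odds = {log_odds(x):+.1f}  "
          f"odds = {odds(x):8.4f}  p = {probability(x):.4f}")

# The ratio of odds for a one-unit increase in x is constant.
print("odds ratio per unit of x:", odds(3) / odds(2))
print("exp(BETA1):              ", math.exp(BETA1))
```

The log-odds column increases in equal steps while the probability column follows the S-curve, which is what the assumption states.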
Assumptions in Logistic Regression
Other important assumptions
• Independence of errors: The observations are assumed to be independent of each other.
• Little or no multicollinearity: The predictor variables are not highly correlated with each other.
• Large sample size: Logistic regression performs well with a sufficient number of observations.
Types of Logistic Regression

| | Binomial LR | Multinomial LR | Ordinal LR |
|---|---|---|---|
| No. of categories for response variable | 2 | 3 or more | 3 or more |
| Does order of categories matter? | No | No | Yes |
| Example | Whether a person is likely to get a positive COVID-19 result or not. | Whether a person has COVID-19, an allergy, a cold, or the flu. | Sorting the severity of a COVID-19 infection into mild, moderate, or severe. |
Binomial and multinomial LR example
Binomial LR
• Whether or not to lend to a bank customer (outcomes are yes or no).
• Assessing cancer risk (outcomes are high or low).
• Will a team win tomorrow's game (outcomes are yes or no).
Multinomial LR
• Classifying texts by the language they come from.
• Predicting whether a student will go to college, trade school, or into the workforce.
• Does your cat prefer wet food, dry food, or human food?
Ordinal Logistic Regression: Examples
• Ranking restaurants on a scale of 0 to 5 stars.
• Predicting the podium results of an Olympic event.
• Assessing a choice of candidates, specifically in places that use ranked-choice voting.
• Ranking students' marks on a scale of poor, average, or best.
Linear Regression vs. Logistic Regression
Application of LR
• Medicine
• Credit scoring
• Hotel Booking
• Text editing
Application of LR in Blockchain
• Fraud Detection: Blockchain-based systems often deal with transactions and financial activities. Logistic regression can be used for fraud detection, identifying suspicious transactions, or determining the likelihood of fraudulent behavior based on historical data.
• User Authentication: Blockchain networks may implement user access controls. Logistic regression could be employed to assess the probability of a user being genuine or to detect potential unauthorized access based on user behavior patterns.
• Market Analysis: For blockchain-based marketplaces or token sales, logistic regression could be used to predict the probability of success for certain projects based on various factors such as team experience, project specifications, market conditions, etc.
• Risk Assessment: In blockchain-based lending platforms or insurance systems, logistic regression might be used to assess the risk associated with a particular loan or insurance application.
• Sentiment Analysis: Logistic regression can be utilized in sentiment analysis of blockchain-related news, social media posts, or community discussions to gauge the sentiment towards specific projects or tokens.
Application of LR in Full-Stack Development
• User Behavior Analysis: Logistic regression can be applied to analyze user behavior, such as predicting whether a user will churn or not, whether they will click on a specific element, or whether they will make a purchase.
• A/B Testing: Full-stack developers often conduct A/B tests to compare different versions of a website or application. Logistic regression can be used to assess the significance of the results and determine which version performs better in terms of user engagement or conversion rate.
• Spam Detection: Logistic regression can be used in spam detection systems to classify incoming emails as either spam or non-spam.
• Sentiment Analysis: If your full-stack application deals with user-generated content like reviews or comments, sentiment analysis can be performed to determine whether the sentiment expressed is positive or negative.
• Recommendation Systems: Logistic regression can help build recommendation systems by predicting whether a user will like or dislike a particular item based on their previous behavior and preferences.
• User Authentication: Logistic regression can be used in combination with other algorithms for user authentication based on behavior patterns or other characteristics.
Application of LR in DevOps
1. Anomaly Detection: DevOps teams often deal with large volumes of data from various systems and processes. Logistic regression can be used to build anomaly detection models that help identify unusual behavior or outliers in system logs, performance metrics, or user activities.
2. Quality Assurance and Testing: In software development, it can be employed to create predictive models for software quality assurance and testing. It can help determine whether a software component is likely to pass or fail certain tests based on historical data.
3. Incident Management: It can predict the severity of incidents or issues in a system based on certain parameters, so that resources can be allocated accordingly.
4. Resource Optimization: It can predict resource usage patterns, such as CPU, memory, or network bandwidth. By understanding resource demands, DevOps teams can optimize infrastructure provisioning and scaling strategies.
5. Continuous Integration and Deployment (CI/CD): It can evaluate the success or failure of continuous integration and deployment pipelines based on various factors such as code quality, test coverage, or build times.
Further Readings:
1. Logistic regression introduction: https://www.youtube.com/watch?v=wmznPOlEk2k
2. Logistic regression explained: https://www.analyticsvidhya.com/blog/2021/08/conceptual-understanding-of-logistic-regression-for-data-science-beginners/
3. Logistic Regression — Detailed Overview: https://towardsdatascience.com/logistic-regression-detailed-overview-46c4da4303bc
4. Python library requirement: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
5. StatQuest: Logistic Regression: https://www.youtube.com/watch?v=yIYKR4sgzI8
6. Linear Regression Vs. Logistic Regression: https://www.youtube.com/watch?v=OCwZyYH14uw
7. Linear Regression Vs. Logistic Regression: https://shorturl.at/rsMP9