Basic Supervised Machine Learning Models
By Dr Shantanu Pathak
Tasks in Supervised Learning
• Classification
• Regression
Classification Task
• Predict a class / label using the features / columns
• Types of Classification in Supervised Learning
• Binary Classification
• Multi-Class Classification
Binary Classification
• Only two classes in the output
• Positive class (represented by 1)
• When the test result is positive
• Ex. In diabetes detection, when the test result is positive, the person has diabetes
• Negative class (represented by 0)
• When the test result is negative
• So, in a patient's blood report, "diabetes detected" is the positive class and "no diabetes detected" is the negative class
Ex. Is a dog present in an image?
• Dog present is the positive class
• Dog absent is the negative class
Multi-Class Classification
• More than two classes in the output
• Example:
• Predict a breathing problem from a chest X-ray
• Normal
• Asthma
• Covid-19
Classification Models
• Basic Models
• Logistic Regression
• Naïve Bayes
• SVM
• K Nearest Neighbors
• Decision Tree
• Ensemble Learning
• Random Forest (Bagging)
• XGBoost (Boosting)
• CatBoost (Boosting)
Logistic Regression
Logistic Regression (logit classifier)
• Used for classification (binary and multi-class) ONLY, despite the name
• Models a linear relation between the independent variables to predict a binary or categorical output
• The target is minimum loss, so it finds the right coefficients over multiple iterations
• The output is a probability, between 0 and 1
• The output gives the probability that Y is class 1
• t = b0 + b1X1 + b2X2 + ... + bnXn
• where b0 is the intercept
• b1 to bn are coefficients, just like in linear regression
• Y = sigmoid(t) = 1 / (1 + e^(-t))
• The sigmoid / logistic function converts the real-valued t into a value between 0 and 1
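A minimal sketch of this computation in plain NumPy; the coefficient and feature values are made up for illustration:

```python
import numpy as np

def sigmoid(t):
    # Squashes any real value into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-t))

# Made-up coefficients: b0 (intercept), then b1, b2 for two features
b = np.array([0.5, 1.2, -0.8])
x = np.array([1.0, 2.0, 0.5])   # leading 1 multiplies the intercept b0

t = b @ x            # t = b0 + b1*X1 + b2*X2 = 2.5
p = sigmoid(t)       # probability that Y is class 1
print(p)             # ~0.924 -> predict class 1 at a 0.5 threshold
```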
Logistic / Sigmoid Function
• S-shaped curve
• Output (y) is always between 0 and 1
• Input (x) can be any value from -inf to +inf
• Converts a real-valued input into a binary decision or a probability
• Figure: plot of the sigmoid curve (by Qef, public domain, https://commons.wikimedia.org/w/index.php?curid=4310325)
Logistic Regression Parameters
• solver = 'saga', 'liblinear', 'lbfgs'
• According to the scikit-learn documentation, the SAGA solver is often the best choice
• penalty: 'l1', 'l2', 'elasticnet'
• The 'liblinear' solver supports both L1 and L2 regularization
• Elastic-Net regularization is only supported by the 'saga' solver
• 'saga' supports all penalties
• class_weight: 'balanced' or a dictionary
• l1_ratio: used when penalty is 'elasticnet'; it sets what portion of the penalty is L1
• multi_class: support for multi-class classification problems
• 'multinomial': supported by the 'saga' solver; models multi-class classification directly
• 'ovr': one-vs-rest strategy, supported by the 'liblinear' solver
Further reading: https://stackoverflow.com/questions/38640109/logistic-regression-python-solvers-definitions
SAGA is a variant of SAG (Stochastic Average Gradient)
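A minimal scikit-learn sketch wiring these parameters together; the dataset is synthetic and the parameter values are illustrative, not recommendations:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data, just for illustration
X, y = make_classification(n_samples=500, n_features=10, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# 'saga' supports all penalties; l1_ratio applies because penalty='elasticnet'
clf = LogisticRegression(solver='saga', penalty='elasticnet', l1_ratio=0.5,
                         class_weight='balanced', max_iter=5000)
clf.fit(X_train, y_train)

print(clf.intercept_, clf.coef_)      # b0 and b1..bn
print(clf.score(X_test, y_test))      # accuracy on held-out data
print(clf.predict_proba(X_test[:3]))  # probabilities, between 0 and 1
```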
Naïve Bayes
Probability
• Joint probability is the probability of two events occurring
simultaneously.
• Marginal probability is the probability of an event irrespective of the
outcome of another variable.
• Conditional probability is the probability of one event occurring in the
presence of a second event.
Probability
• Joint Probability to Bayes' Rule
• P(AB) = P(A) * P(B), when A and B are independent
• P(AB) = P(A) * P(B|A) = P(B) * P(A|B), in general (A and B may be dependent)
• Rearranging gives Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)
Naive Bayes
• Bayesian Model
▪ P(C|x) = P(x|C) * P(C) / P(x)
▪ P(C|x) -> probability of the given sample 'x' belonging to class 'C'
▪ P(x) -> probability of 'x' in the complete sample space / training data
▪ P(C) -> probability of 'C' in the complete sample space / training data
▪ P(x|C) -> probability of occurrence of 'x' when 'C' is given, in the training data
Naive Bayes
• Used for classification tasks (binary and multi-class)
• Assumption: the input consists of a set of independent features / columns
• A Bayesian probabilistic model is used
• Simple yet effective
• Highly scalable
• Application: automatic medical case classification / diagnosis
Data : Example of Naive Bayes

Sr No | Time (X1)     | Vehicle Type (X2) | Accident (Target/Y)
------|---------------|-------------------|--------------------
1     | Early Morning | Heavy             | Yes
2     | Early Morning | Light             | No
3     | Early Morning | Light             | Yes
4     | Early Morning | Heavy             | No
5     | Early Morning | Heavy             | Yes
6     | Early Morning | Heavy             | No
7     | Evening       | Light             | Yes
8     | Evening       | Light             | No
9     | Evening       | Heavy             | No
10    | Evening       | Heavy             | Yes
11    | Day Time      | Heavy             | No
12    | Day Time      | Heavy             | Yes
13    | Day Time      | Light             | No
14    | Night         | Light             | Yes
Probabilities : Example of Naive Bayes
• P(Accident) = 7/14 = 0.5      P(No Accident) = 7/14 = 0.5
• P(em) = 6/14                  P(day time) = 3/14
• P(ev) = 4/14                  P(night) = 1/14
• P(light) = 6/14               P(heavy) = 8/14
Probabilities : Example of Naive Bayes
• P(daytime | Accident) = 1/7
• P(em | Accident) = 3/7
• P(ev | Accident) = 2/7
• P(n | Accident) =1/7
• P(Light | Accident) = 3/7
• P(Heavy | Accident) = 4/7
Probabilities : Example of Naive Bayes
• P(daytime | Accident = ‘No’) = ??
• P(em | Accident = ‘No’) = ??
• P(ev | Accident = ‘No’) = ??
• P(n | Accident = ‘No’) =??
• P(Light | Accident = ‘No’) = ??
• P(Heavy | Accident = ‘No’) = ??
Example of Naive Bayes
• What is the probability of an accident given that it is early morning?
P(Accident | em) = P(em | Accident) * P(Accident) / P(em)
                 = (3/7 * 7/14) / (6/14) = 3/6 = 0.5
P(Accident | DT) = P(DT | Accident) * P(Accident) / P(DT)
                 = (1/7 * 7/14) / (3/14) = 1/3
For several features, naive Bayes combines the class prior with the per-feature likelihoods:
P(C | (x1, x2, ... xn)) ∝ P(C) * P(x1|C) * P(x2|C) * ... * P(xn|C)
P(Accident | (em, light)) ∝ P(Accident) * P(em|Accident) * P(Light|Accident)
                          = (7/14) * (3/7) * (3/7) = 9/98
The same product for 'No Accident' is also (7/14) * (3/7) * (3/7) = 9/98, so after normalizing the two scores, P(Accident | em, light) = 0.5
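A small Python sketch that recomputes these numbers from the 14-row table above; `posterior` is an illustrative helper name:

```python
# The 14 (time, vehicle, accident) rows from the table above
rows = [("em","heavy","yes"), ("em","light","no"),  ("em","light","yes"),
        ("em","heavy","no"),  ("em","heavy","yes"), ("em","heavy","no"),
        ("ev","light","yes"), ("ev","light","no"),  ("ev","heavy","no"),
        ("ev","heavy","yes"), ("dt","heavy","no"),  ("dt","heavy","yes"),
        ("dt","light","no"),  ("night","light","yes")]

def posterior(time, vehicle):
    # Naive Bayes: P(C | x1, x2) is proportional to P(C) * P(x1|C) * P(x2|C)
    scores = {}
    for c in ("yes", "no"):
        in_c = [r for r in rows if r[2] == c]
        prior = len(in_c) / len(rows)                           # P(C)
        p_time = sum(r[0] == time for r in in_c) / len(in_c)    # P(time|C)
        p_veh = sum(r[1] == vehicle for r in in_c) / len(in_c)  # P(vehicle|C)
        scores[c] = prior * p_time * p_veh
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}            # normalize

print(posterior("em", "light"))   # {'yes': 0.5, 'no': 0.5}, matching the slide
```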
Types of Naïve Bayes
• Gaussian Naive Bayes
• For numeric (continuous) data
• Works well for large data
• Multinomial Naïve Bayes
• For count-based categorical features, like word counts in text data
• Bernoulli Naïve Bayes
• For binary variables
• Applications of Naive Bayes / the Bayesian formula
• Incremental learning
• Probabilistic reasoning
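A short sketch of the three scikit-learn variants on synthetic data of the matching type; the data is random, so the scores themselves are meaningless:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=100)                   # binary labels

# GaussianNB: continuous numeric features
X_num = rng.normal(size=(100, 3))
print(GaussianNB().fit(X_num, y).score(X_num, y))

# MultinomialNB: non-negative count features, e.g. word counts from text
X_counts = rng.integers(0, 10, size=(100, 5))
print(MultinomialNB().fit(X_counts, y).score(X_counts, y))

# BernoulliNB: binary (0/1) features
X_bin = rng.integers(0, 2, size=(100, 5))
print(BernoulliNB().fit(X_bin, y).score(X_bin, y))
```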
Support Vector Machine
Support Vector Machines (SVM)
• Used for binary classification or regression (multi-class problems are handled by combining several binary SVMs)
• Can handle a linear or non-linear boundary
• Goal: find the decision boundary between the two classes with the maximum margin
• The margin is based on the samples close to the boundary; these samples are called "support vectors"
Support Vector Machines (SVM)
• Margin
• The minimum distance between the proposed boundary and the closest point(s) from each class
• Which boundary to choose?
• A single dataset may have multiple possible boundaries between the two classes
• The boundary with the maximum margin is the most beneficial
• Such a boundary avoids misclassification even if new data deviates somewhat from the original data
Support Vector Machines (SVM)
• Binary classification:
• Points on either side of the boundary are assigned to the respective class
• For a new point, the class is predicted from which side of the boundary it falls on
• Regression (SVR):
• A linear function is fitted so that as many points as possible lie within a margin (a tube of width epsilon) around it
• Points outside the tube become the support vectors, and the fitted function itself is used to predict continuous values
SVM Kernel Trick
• Transform the data by adding non-linear features
• (x^2, cos(x), x^5, converting points to polar coordinates, etc.)
• This makes a linear boundary between the classes possible in the transformed space
• The kernel trick does this efficiently: it computes similarities as if the data had been transformed, without ever creating the new features explicitly
• RBF Kernel: Radial Basis Function / Gaussian kernel
• Considers the similarity between points in the dataset
• Similarity is based on gamma and the distance between the points
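A small sketch of the RBF similarity, K(a, b) = exp(-gamma * ||a - b||^2), comparing the manual formula with scikit-learn's `rbf_kernel`:

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

a = np.array([[0.0, 0.0]])
b = np.array([[1.0, 1.0]])

# RBF similarity is 1 for identical points and decays toward 0 with distance;
# gamma controls how fast it decays
for gamma in (0.1, 1.0, 10.0):
    manual = np.exp(-gamma * np.sum((a - b) ** 2))
    print(gamma, rbf_kernel(a, b, gamma=gamma)[0, 0], manual)
```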
SVM Parameters
• C: regularization parameter (l2 penalty)
• Higher value -> less regularization -> fewer misclassifications are tolerated (risk of overfitting)
• Lower value -> more regularization -> more misclassification is allowed
• Always positive
• kernel: 'linear', 'rbf'
• gamma: kernel coefficient for 'rbf'
• Controls how far each support vector's influence reaches
• Low value: influence reaches far -> smoother boundary
• High value: influence stays local -> more complex boundary, risk of overfitting
• Needs fine tuning
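A minimal sketch of these parameters on a non-linearly separable toy dataset; the C and gamma values are illustrative only:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaving half-moons: not separable by a straight line
X, y = make_moons(n_samples=200, noise=0.2, random_state=42)

# Higher C -> fewer margin violations tolerated; gamma controls how local
# each support vector's influence is
clf = SVC(kernel='rbf', C=1.0, gamma=0.5)
clf.fit(X, y)

print(clf.n_support_)   # number of support vectors per class
print(clf.score(X, y))  # training accuracy
```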
K Nearest Neighbors
K Nearest Neighbors (KNN)
• Supervised or unsupervised
• Works by building a tree (e.g. a KD-tree or ball tree) over the data
• The tree gives the closest points / neighbors quickly
• The K neighbors are used to make predictions for any new data
• The number K is given by the user
• In classification, the majority class of the K neighbors is taken
• In regression, the mean of the K neighbors is taken
• Used for classification (binary and multi-class) & regression
• Used in unsupervised learning to understand the distribution of the data and the distances between neighbors
• Figure: the star denotes a new point; class 1 in red, class 0 in blue; the new point's class is predicted using the majority class among its neighbors
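A minimal sketch of both KNN modes in scikit-learn; the synthetic data and targets are illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Classification: the majority class among the 5 nearest neighbors wins
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X, y)
print(knn.predict(X[:3]))

# Regression: the mean of the 5 nearest neighbors' target values is used
y_cont = 2.0 * X[:, 0] + np.random.default_rng(0).normal(size=200)
reg = KNeighborsRegressor(n_neighbors=5).fit(X, y_cont)
print(reg.predict(X[:3]))
```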
Regression Task
Regression Task
• Linear Regression
• Multiple Linear Regression
• Polynomial Regression
• LASSO
• RIDGE
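The slide above only names these models; here is a short sketch fitting them with scikit-learn on synthetic quadratic data (multiple linear regression is the same LinearRegression estimator with more than one feature; the alpha values are illustrative, not recommendations):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(100, 1))
y = 0.5 * X[:, 0] ** 2 + rng.normal(scale=0.3, size=100)  # quadratic target

models = {
    "linear":     LinearRegression(),
    "polynomial": make_pipeline(PolynomialFeatures(degree=2), LinearRegression()),
    "lasso":      Lasso(alpha=0.1),   # l1 penalty can zero out coefficients
    "ridge":      Ridge(alpha=1.0),   # l2 penalty shrinks coefficients smoothly
}
for name, model in models.items():
    print(name, model.fit(X, y).score(X, y))   # R^2 on the training data
```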
Extra
• Solvers in Logistic regression
• https://stackoverflow.com/questions/38640109/logistic-regression-python-solvers-definitions
• https://scikit-learn.org/stable/modules/linear_model.html