
Basic Supervised Machine Learning Models
By Dr Shantanu Pathak
Tasks in Supervised Learning
• Classification
• Regression
Classification Task
• Predict a class / label using features/ columns

• Types of Classification in Supervised Learning


• Binary Classification
• Multi-Class Classification
Binary Classification
• Only two classes in the output
• Positive class (represented by 1)
• When test results are positive
• Ex: In diabetes detection, a positive test result means the person has diabetes
• Negative class (represented by 0)
• When test results are negative
• So, in a patient's blood report, "diabetes detected" is the positive class and "no diabetes detected" is the negative class

Ex: Is a dog present in an image?
• Dog present is the positive class
• Dog absent is the negative class
Multi-Class Classification
• More than two classes in the output

• Example:
• Predict Breathing problem using X-ray
• Normal
• Asthma
• Covid-19
Classification Models
• Basic Models
• Logistic Regression
• Naïve Bayes
• SVM
• K Nearest Neighbors
• Decision Tree
• Ensemble Learning
• Random Forest (Bagging)
• XGBoost (Boosting)
• CatBoost (Boosting)
Logistic Regression
Logistic Regression (logit classifier)
• Used for classification (binary and multi-class) ONLY
• Models a linear relation between the independent variables and the log-odds of a binary or categorical output
• The target is minimum loss, so the right coefficients are found over multiple iterations
• Output is a probability, between 0 and 1
• The output gives the probability that Y will be class 1
• t = b0 + b1X1 + b2X2 + ... + bnXn
• where b0 is the intercept
• b1 to bn are coefficients, just like in linear regression
• Y = sigmoid(t) = 1 / (1 + e^(-t))
• The sigmoid / logistic function converts the real-valued t into a value between 0 and 1
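A minimal sketch of these equations, assuming scikit-learn (the data values are made up for illustration):

import numpy as np
from sklearn.linear_model import LogisticRegression

# toy data: two features, binary target
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([0, 0, 1, 1])

model = LogisticRegression().fit(X, y)

# t = b0 + b1*X1 + b2*X2, then Y = sigmoid(t)
t = model.intercept_[0] + X @ model.coef_.ravel()
prob_class_1 = 1.0 / (1.0 + np.exp(-t))   # matches model.predict_proba(X)[:, 1]
print(prob_class_1)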
Logistic / Sigmoid Function
• S-shaped curve
• Output (y) is always between 0 and 1
• Input (x) can be any value from -inf to +inf
• Converts a real-valued input into a probability, which can then be thresholded to a binary output

[Figure: sigmoid curve. By Qef (talk), created with gnuplot, Public Domain: https://commons.wikimedia.org/w/index.php?curid=4310325]
Logistic Regression Parameters
• solver = 'saga', 'liblinear', 'lbfgs'
• According to the scikit-learn documentation, the SAGA solver is often the best choice
• SAGA is a variant of SAG (Stochastic Average Gradient)
• penalty: 'l1', 'l2', 'elasticnet'
• The 'liblinear' solver supports both L1 and L2 regularization
• Elastic-Net regularization is only supported by the 'saga' solver
• 'saga' supports all penalties
• class_weight: 'balanced' or in dictionary form
• l1_ratio: used when penalty is 'elasticnet'; it sets what portion of the penalty is L1
• multi_class: support for multi-class classification problems
• 'multinomial': the 'saga' solver supports this; it can model multi-class classification directly
• 'ovr': one-vs-rest strategy, supported by the 'liblinear' solver

Further reading: https://stackoverflow.com/questions/38640109/logistic-regression-python-solvers-definitions
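A hedged sketch of how these parameters fit together in scikit-learn (the specific values are illustrative, not recommendations):

from sklearn.linear_model import LogisticRegression

# 'saga' supports all penalties, including elasticnet;
# l1_ratio splits the penalty between L1 and L2
clf = LogisticRegression(
    solver='saga',
    penalty='elasticnet',
    l1_ratio=0.5,                 # 50% L1, 50% L2
    class_weight='balanced',
    multi_class='multinomial',
    max_iter=5000,                # saga can need many iterations to converge
)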
Naïve Bayes
Probability
• Joint probability is the probability of two events occurring
simultaneously.
• Marginal probability is the probability of an event irrespective of the
outcome of another variable.
• Conditional probability is the probability of one event occurring in the
presence of a second event.
Probability
• Joint probability to Bayes rule:
• P(AB) = P(A) * P(B) ... when A and B are independent
• P(AB) = P(A) * P(B|A) = P(B) * P(A|B) ... when A and B are dependent
• Rearranging the last equality gives Bayes rule: P(A|B) = P(B|A) * P(A) / P(B)
Naive Bayes
• Bayesian Model
▪ P(C|x) = P(x|C) * P(C) / P(x)
▪ P(C|x) -> probability of a given sample 'x' belonging to class 'C'
▪ P(x) -> probability of 'x' in the complete sample space / training data
▪ P(C) -> probability of 'C' in the complete sample space / training data
▪ P(x|C) -> probability of occurrence of 'x' when 'C' is given in the training data
Naive Bayes
• Used for classification tasks (binary and multi-class)
• Assumption: the input consists of a set of independent features / columns
• A Bayesian probabilistic model is used
• Simple yet effective
• Highly scalable

• Application: automatic medical case classification / diagnosis
Data : Example of Naive Bayes

Sr No  Time (X1)      Vehicle Type (X2)  Accident (Target / Y)
1      Early Morning  Heavy              Yes
2      Early Morning  Light              No
3      Early Morning  Light              Yes
4      Early Morning  Heavy              No
5      Early Morning  Heavy              Yes
6      Early Morning  Heavy              No
7      Evening        Light              Yes
8      Evening        Light              No
9      Evening        Heavy              No
10     Evening        Heavy              Yes
11     Day Time       Heavy              No
12     Day Time       Heavy              Yes
13     Day Time       Light              No
14     Night          Light              Yes
Probabilities : Example of Naive Bayes
• P(Accident) = 7/14 = 0.5    P(No Accident) = 7/14 = 0.5
• P(em) = 6/14                P(day time) = 3/14
• P(ev) = 4/14                P(night) = 1/14
• P(light) = 6/14             P(heavy) = 8/14
Probabilities : Example of Naive Bayes
• P(daytime | Accident) = 1/7
• P(em | Accident) = 3/7
• P(ev | Accident) = 2/7
• P(n | Accident) = 1/7
• P(Light | Accident) = 3/7
• P(Heavy | Accident) = 4/7
Probabilities : Example of Naive Bayes
• P(daytime | Accident = 'No') = ??
• P(em | Accident = 'No') = ??
• P(ev | Accident = 'No') = ??
• P(n | Accident = 'No') = ??
• P(Light | Accident = 'No') = ??
• P(Heavy | Accident = 'No') = ??
Example of Naive Bayes
• What is the probability of an accident given that it is early morning?
P(Accident|em) = P(em|Accident) * P(Accident) / P(em)
= (3/7 * 7/14) / (6/14) = 3/6 = 0.5
P(Accident|DT) = P(DT|Accident) * P(Accident) / P(DT)
= (1/7 * 7/14) / (3/14) = 1/3
With multiple features, the naive independence assumption gives, up to a normalizing constant:
P(C | (x1, x2, ..., xn)) ∝ P(C) * P(x1|C) * P(x2|C) * ... * P(xn|C)
Ex: P(Accident | (em, light)) ∝ P(Accident) * P(em|Accident) * P(light|Accident) = 1/2 * 3/7 * 3/7 ≈ 0.092
The same product is computed for the 'No' class, and the class with the larger score is predicted.
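A minimal sketch of this computation in plain Python (the rows mirror the table above; the unnormalized scoring follows the proportional formula just shown):

# (time, vehicle, accident) rows from the table above
rows = [
    ("em", "heavy", "yes"), ("em", "light", "no"),  ("em", "light", "yes"),
    ("em", "heavy", "no"),  ("em", "heavy", "yes"), ("em", "heavy", "no"),
    ("ev", "light", "yes"), ("ev", "light", "no"),  ("ev", "heavy", "no"),
    ("ev", "heavy", "yes"), ("dt", "heavy", "no"),  ("dt", "heavy", "yes"),
    ("dt", "light", "no"),  ("n",  "light", "yes"),
]

def score(c, time, vehicle):
    # unnormalized P(C) * P(time|C) * P(vehicle|C)
    in_c = [r for r in rows if r[2] == c]
    p_c = len(in_c) / len(rows)
    p_time = sum(r[0] == time for r in in_c) / len(in_c)
    p_vehicle = sum(r[1] == vehicle for r in in_c) / len(in_c)
    return p_c * p_time * p_vehicle

print(score("yes", "em", "light"))   # 1/2 * 3/7 * 3/7 ≈ 0.092
print(score("no",  "em", "light"))   # also ≈ 0.092: a tie on this toy data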
Types of Naïve Bayes
• Gaussian Naive Bayes
• For numeric data
• For large data
• Multinomial Naïve Bayes
• For count / frequency features, such as word counts in text data
• Bernoulli Naïve Bayes
• For binary variables

• Applications of Naive Bayes / the Bayesian formula
• Incremental learning
• Probabilistic reasoning
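A brief sketch of the three variants in scikit-learn (toy arrays, purely illustrative):

import numpy as np
from sklearn.naive_bayes import GaussianNB, MultinomialNB, BernoulliNB

y = np.array([0, 1, 0, 1])

GaussianNB().fit(np.random.randn(4, 3), y)                                       # continuous features
MultinomialNB().fit(np.array([[2, 0, 1], [0, 3, 1], [1, 1, 0], [0, 2, 2]]), y)   # counts (e.g., word counts)
BernoulliNB().fit(np.array([[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]]), y)     # binary features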
Support Vector Machine
Support Vector Machines (SVM)
• Natively used for binary classification or regression
• It can handle a linear / non-linear boundary
• Goal: find the decision boundary between two classes with the maximum margin
• The margin is based on the samples close to the boundary; these samples are called "support vectors"
Support Vector Machines (SVM)
• Margin
• Minimum distance between the proposed boundary and the closest point(s) from each class
• Which boundary to choose?
• A single dataset may have multiple possible boundaries between two classes
• The boundary with the maximum margin is the most beneficial
• Such a boundary avoids misclassification, even if new data deviates somewhat from the original data
Support Vector Machines (SVM)
• Binary classification:
• Points on either side of the boundary are assigned to the respective class
• For a new point, the class is predicted from which side of the boundary it falls on

• Regression (SVR):
• A linear function is fitted to the data using the same maximum-margin idea
• Errors within a tolerance band around the function (the epsilon tube) are not penalized; the model is fit to keep points inside this band
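A minimal SVR sketch under these assumptions (scikit-learn; epsilon sets the width of the tolerance band, and the data is made up):

import numpy as np
from sklearn.svm import SVR

X = np.linspace(0, 10, 50).reshape(-1, 1)
y = 2.0 * X.ravel() + np.random.normal(0, 0.5, 50)   # noisy line

# errors smaller than epsilon are not penalized
reg = SVR(kernel='linear', epsilon=0.5).fit(X, y)
print(reg.predict([[5.0]]))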
SVM Kernel Trick
• Transform the data by adding non-linear features
• (x^2, cos(x), x^5, converting points to polar coordinates, etc.)
• This makes a linear boundary between the classes possible in the transformed space
• The kernel performs this transformation implicitly and efficiently, computing similarities without recomputing new features at every iteration
• RBF kernel: Radial Basis Function / Gaussian kernel
• Considers the similarity between points in the dataset
• Similarity is based on gamma and the distance between points
SVM Parameters
• C: regularization parameter (l2 penalty)
• Lower value -> more regularization -> more misclassification is allowed
• Higher value -> less regularization -> the model tries harder to classify every training point correctly
• Always positive
• kernel: 'linear', 'rbf'
• gamma: kernel coefficient for 'rbf'
• Controls how much influence each support vector has
• Low value: influence reaches far -> smoother, simpler boundary
• High value: influence is local -> more complex boundary, risk of overfitting
• Needs fine tuning
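A sketch of how these parameters are passed in scikit-learn (values and data are illustrative):

import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [1, 1], [4, 4], [5, 5]])
y = np.array([0, 0, 1, 1])

# large C -> fewer training errors tolerated; gamma controls RBF locality
clf = SVC(kernel='rbf', C=1.0, gamma='scale').fit(X, y)
print(clf.support_vectors_)   # the samples that define the margin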
K Nearest Neighbors
K Nearest Neighbors (KNN)
• Used for classification (binary and multi-class) and regression
• Also used in unsupervised learning to understand the distribution of the data and the distances between neighbors
• Works by building a tree over the data
• The tree gives the closest points / neighbors
• The K neighbors are used for making predictions for any new data
• The number K is given by the user
• In classification, the majority class of the K neighbors is taken
• In regression, the mean of the K neighbors is taken

[Figure: stars denote new points; class 1 in red, class 0 in blue. A new point's class is predicted from the majority class among its neighbors.]
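A minimal KNN sketch (scikit-learn; toy data for illustration):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y = np.array([0, 0, 0, 1, 1, 1])

# K = 3: the prediction is the majority class among the 3 nearest neighbors
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[4.5, 5.5]]))   # -> [1]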
Regression Task
• Linear Regression
• Multiple Linear Regression
• Polynomial Regression
• LASSO
• RIDGE
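As a brief sketch of fitting some of these models (scikit-learn; the data and alpha values are illustrative):

import numpy as np
from sklearn.linear_model import LinearRegression, Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# alpha controls regularization strength (L1 for Lasso, L2 for Ridge)
for model in (LinearRegression(), Lasso(alpha=0.1), Ridge(alpha=1.0)):
    model.fit(X, y)
    print(type(model).__name__, model.coef_.round(2))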
Extra
• Solvers in Logistic regression
• https://stackoverflow.com/questions/38640109/logistic-regression-python-solvers-definitions
• https://scikit-learn.org/stable/modules/linear_model.html
