
21CSC305P

MACHINE LEARNING
UNIT 2
Maximum likelihood estimation - Least squares, Robust linear
regression, Ridge regression, Bayesian linear regression, Linear models
for classification: Discriminant functions, Probabilistic generative
models, Probabilistic discriminative models, Laplace approximation,
Bayesian logistic regression, Kernel functions, Using kernels in GLMs,
Kernel trick, SVMs.
Introduction to Linear
Regression
Linear Regression
In Machine Learning,
 Linear Regression is a supervised machine learning algorithm.
 It tries to find out the best linear relationship that describes the data you have.
 It assumes that there exists a linear relationship between a dependent variable and independent variable(s).
 The value of the dependent variable in a linear regression model is continuous, i.e. a real number.

Representing a Linear Regression Model-

A linear regression model represents the linear relationship between a dependent variable and independent variable(s) via a sloped straight line.

The sloped straight line that fits the given data best is called the regression line.
Types of Linear Regression
Based on the number of independent variables, there are two types of linear regression-

1. Simple Linear Regression-


In simple linear regression, the dependent variable depends only on a single independent variable.
For simple linear regression, the form of the model is-
Y = β0 + β1X

Here,
 Y is a dependent variable.
 X is an independent variable.
 β0 and β1 are the regression coefficients.
 β0 is the intercept (bias) that fixes the offset of the line.
 β1 is the slope (weight) that specifies the factor by which X has an impact on Y.
Types of Linear Regression

2. Multiple Linear Regression-

In multiple linear regression, the dependent variable depends on more than one independent variable.
For multiple linear regression, the form of the model is-
Y = β0 + β1X1 + β2X2 + β3X3 + …… + βnXn

Here,
 Y is a dependent variable.
 X1, X2, …., Xn are independent variables.
 β0, β1,…, βn are the regression coefficients.
 βj (1 ≤ j ≤ n) is the slope or weight that specifies the factor by which Xj has an impact on Y.
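To make the least-squares estimation of these coefficients concrete, here is a minimal NumPy sketch (not from the original slides): it stacks a column of ones onto the inputs so that β0 acts as the intercept and solves the least-squares problem. The data values are made up for illustration.

import numpy as np

# Illustrative data: 5 observations, 2 independent variables (X1, X2)
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
y = np.array([6.1, 6.9, 12.2, 13.1, 17.0])

# Add a column of ones so that beta[0] plays the role of the intercept beta_0
X_design = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: solve min ||X_design @ beta - y||^2
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print("Estimated coefficients [beta0, beta1, beta2]:", beta)
print("Predictions:", X_design @ beta)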
Assumptions of Linear Regression
The standard assumptions behind a linear regression model are:
 Linearity: the relationship between the independent variables and the dependent variable is linear.
 Independence: the errors (residuals) are independent of one another.
 Homoscedasticity: the errors have constant variance across all values of the independent variables.
 Normality: the errors are normally distributed.
 No multicollinearity: the independent variables are not highly correlated with each other.
Maximum likelihood estimation- Least squares
What is Maximum Likelihood Estimation (MLE)?
Maximum Likelihood Estimation (MLE) is a method to estimate the parameters (like
weights w or θ) of a model such that the likelihood of observing the given data is
maximized.
In simple terms:
Find the parameters that make the data “most likely.”

Key idea:
We find the parameters θ (e.g., weights w) such that the likelihood of the data D given θ is maximized:

θ̂ = argmaxθ p(D | θ)

For linear regression, this becomes maximizing the likelihood of observing the target values y given the inputs X and weights w.
What is Least Squares?

Least squares is a method to estimate parameters (often in linear regression) by minimizing the sum of squared errors between predicted and actual values.
For linear models, Least Squares = MLE when the errors are assumed to be normally distributed.
Geometric interpretation & convexity
Linear regression can be illustrated geometrically as an orthogonal projection (of the target vector onto the space spanned by the inputs) in 3D space.
Example : Perform Linear Regression using MLE on a small
dataset
Here is the visual plot showing:
•Blue dots: Actual data points from the sample.
•Green line: Linear regression line computed
using the Least Squares Method (which is
equivalent to Maximum Likelihood
Estimation under Gaussian noise).
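As a hedged illustration of the "Least Squares = MLE under Gaussian noise" point, the sketch below (with synthetic data, not the slide's dataset) fits a line by least squares and then checks that the Gaussian log-likelihood is higher at the fitted weights than at perturbed ones.

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 20)
y = 2.5 * x + 1.0 + rng.normal(0, 1.0, size=x.shape)   # y = 2.5x + 1 + Gaussian noise

# Least-squares fit (equivalent to MLE when the noise is Gaussian)
A = np.column_stack([np.ones_like(x), x])
w_ls, *_ = np.linalg.lstsq(A, y, rcond=None)

def gaussian_log_likelihood(w, sigma=1.0):
    resid = y - A @ w
    n = len(y)
    return -0.5 * n * np.log(2 * np.pi * sigma**2) - np.sum(resid**2) / (2 * sigma**2)

print("log-likelihood at least-squares weights :", gaussian_log_likelihood(w_ls))
print("log-likelihood at perturbed weights     :", gaussian_log_likelihood(w_ls + 0.3))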
Note: for understanding only.
Robust Linear Regression
A robust linear regression is like regular linear regression, but it’s designed to handle
outliers better.

Why Robust Regression?


•Ordinary Least Squares (OLS) works well if errors are normally distributed.
•But outliers can heavily influence the regression line, pulling it away from the main data
pattern.
•Robust regression methods reduce the influence of outliers, giving a more “stable” fit.
The Problem with Ordinary Least Squares (OLS)
Motivation for Robust Regression
We want:
•A line that fits most of the data points well.
•Minimal influence from extreme points (outliers).
Goal: Reduce the penalty for large errors so they don’t dominate the solution.

Core idea about Robust Regression


Common Loss Functions in Regression
Loss functions measure how far predictions are from actual values.
In robust regression, we choose loss functions that penalize outliers less severely than squared error.
Approaches to Robust Regression
Here’s the plot comparing OLS and Robust
Regression (Huber):
•Green Line (OLS): Gets pulled strongly toward
the outliers, so the fit for normal points worsens.
•Orange Line (Robust): Less sensitive to outliers,
so it stays close to the majority of the data.
This is why robust regression is preferred when you
expect extreme or erroneous points in your
dataset.
Example: Robust linear regression (Huber)
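One possible way to reproduce this kind of example is scikit-learn's HuberRegressor; the sketch below (with made-up data and a few planted outliers) compares its slope with the OLS slope. It is an illustrative sketch, not the exact example from the slides.

import numpy as np
from sklearn.linear_model import LinearRegression, HuberRegressor

rng = np.random.default_rng(1)
X = np.linspace(0, 10, 30).reshape(-1, 1)
y = 3.0 * X.ravel() + 2.0 + rng.normal(0, 1.0, 30)
y[-3:] += 40.0                      # plant a few large outliers

ols = LinearRegression().fit(X, y)
huber = HuberRegressor().fit(X, y)  # Huber loss: quadratic for small errors, linear for large ones

print("OLS slope   :", ols.coef_[0])     # pulled toward the outliers
print("Huber slope :", huber.coef_[0])   # stays close to the true slope of 3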
Ridge Regression
What is Ridge Regression?
Ridge regression, also known as L2 regularization, is a technique used in linear
regression to address the problem of multicollinearity among predictor variables.
Multicollinearity occurs when independent variables in a regression model are highly
correlated, which can lead to unreliable and unstable estimates of regression coefficients.

Ridge regression mitigates this issue by adding a regularization term to the ordinary least
squares (OLS) objective function, which penalizes large coefficients and thus reduces
their variance.
Why Do We Need Ridge Regression?
The ridge estimate of the weights is

w = (XᵀX + λI)⁻¹ Xᵀy

where λ is the ridge parameter and I is the identity matrix.
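A minimal sketch of the closed-form ridge solution above, assuming an illustrative λ = 1 and synthetic data with two nearly identical (collinear) predictors:

import numpy as np

# Two highly correlated predictors (multicollinearity) plus noise
rng = np.random.default_rng(2)
x1 = rng.normal(size=50)
x2 = x1 + rng.normal(scale=0.01, size=50)        # almost identical to x1
X = np.column_stack([x1, x2])
y = 3.0 * x1 + rng.normal(scale=0.1, size=50)

lam = 1.0                                        # ridge parameter (lambda), illustrative value
I = np.eye(X.shape[1])

w_ols   = np.linalg.solve(X.T @ X, X.T @ y)              # ill-conditioned under multicollinearity
w_ridge = np.linalg.solve(X.T @ X + lam * I, X.T @ y)    # w = (X^T X + lambda*I)^-1 X^T y

print("OLS weights  :", w_ols)
print("Ridge weights:", w_ridge)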
Bayesian Linear Regression
Bayesian Linear Regression
Bayesian linear regression is a statistical method that combines prior knowledge with
observed data to estimate the parameters of a linear model. It uses probability
distributions to represent uncertainty in the parameters and predictions.

Bayesian Linear Regression is an extension of simple linear regression where:


•The parameters (coefficients) are treated as random variables.
•We use probability distributions to represent what we know about them before and after
seeing data.
•It produces predictions with uncertainty instead of a single “best guess.”

Bayesian Linear Regression learns a linear relationship just like normal regression but also
gives uncertainty and confidence by combining prior belief and observed data.
Why Bayesian Regression?
•In ordinary linear regression, we find one best line that fits the data.
•But in reality, we are often uncertain (because of noise, small data, or randomness).
•Bayesian regression says: instead of finding one line, let’s find a distribution of possible
lines (with probabilities).
This way, we quantify uncertainty.
Why do we need Bayesian Linear Regression?
The Bayesian Approach

The starting point is a Gaussian noise model for the targets:

y = wᵀx + ε, with ε ~ N(0, σ²),   or equivalently   p(y | x, w) = N(y | wᵀx, σ²)

This is a clear, testable assumption that connects data to parameters; everything else (likelihood, posterior, predictions) builds on it.
This assumption leads to the formulation of a likelihood, which tells us how well the parameters explain the data. Combined with a prior over w, it gives us a regularization effect and incorporates prior knowledge.
From the noise model, the likelihood can be expressed mathematically as

p(y | X, w, σ²) = ∏i N(yi | wᵀxi, σ²)

and this serves as the expression that links the parameters to the observed data.
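A hedged sketch of the standard conjugate-Gaussian treatment: assuming a zero-mean Gaussian prior with precision α and a known noise precision β (both values below are illustrative), the posterior over the weights is Gaussian with mean m_N and covariance S_N, and predictions come with a variance.

import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(-1, 1, 15)
t = 0.5 + 2.0 * x + rng.normal(0, 0.2, size=x.shape)   # noisy line, made up for illustration

Phi = np.column_stack([np.ones_like(x), x])   # design matrix [1, x]
alpha, beta = 2.0, 25.0                       # prior precision, noise precision (assumed known)

# Posterior over weights: p(w | data) = N(w | m_N, S_N)
S_N_inv = alpha * np.eye(2) + beta * Phi.T @ Phi
S_N = np.linalg.inv(S_N_inv)
m_N = beta * S_N @ Phi.T @ t

print("Posterior mean of [intercept, slope]:", m_N)
print("Posterior covariance:\n", S_N)

# Predictive distribution at a new point x* = 0.5 (mean and variance)
phi_star = np.array([1.0, 0.5])
pred_mean = phi_star @ m_N
pred_var = 1.0 / beta + phi_star @ S_N @ phi_star
print("Predictive mean, variance at x*=0.5:", pred_mean, pred_var)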
Linear Model for Classification :
Discriminant Functions
1.What is Classification?
•Definition: Classification is the process of assigning an input data point into one of a
set of discrete categories (classes).
•Example:
• Email → {Spam, Not Spam}
• Medical test results → {Healthy, Diseased}
• Handwritten digit → {0,1,2,…,9}

The Goal:
•Given an input vector x (with D features), assign it to one of K classes.
•Classes are usually disjoint → each input belongs to only one class.

Input Space:
•The entire feature space is divided into regions → decision regions.
•Boundaries between these regions are called:
• Decision boundaries (lines/curves in 2D)
• Decision surfaces (planes/hyperplanes in higher dimensions)
Two-Class Discriminant Functions
Two-Class Linear Discriminant Function
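The figure for this slide is not reproduced here, but the two-class linear discriminant it refers to is usually written y(x) = wᵀx + w0, with x assigned to class C1 if y(x) ≥ 0 and to C2 otherwise. A minimal sketch with illustrative weights:

import numpy as np

w = np.array([1.0, -2.0])   # illustrative weight vector
w0 = 0.5                    # illustrative bias

def classify(x):
    """Two-class linear discriminant: y(x) = w.x + w0, the sign decides the class."""
    y = w @ x + w0
    return "C1" if y >= 0 else "C2"

print(classify(np.array([3.0, 1.0])))   # C1 (y = 1.5)
print(classify(np.array([0.0, 2.0])))   # C2 (y = -3.5)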
Multiclass Discriminant Functions
Why Multiclass Discriminant Functions?

Solution: Multiclass Discriminant Functions

Three Approaches
 One-vs-Rest (OvR)
 One-vs-One (OvO)
 Direct Multiclass Discriminant Function
One-vs-Rest (OvR)
•Build K classifiers, one for each class vs. "all
the rest".
•Example: To classify into {C1, C2, C3}, train:
• C1 vs (C2+C3)
• C2 vs (C1+C3)
• C3 vs (C1+C2)
•Problem: Sometimes two classifiers both "vote yes", giving ambiguous regions (shown in green in the figure).

OvR: "Each class builds a wall around itself, but walls can overlap."
One-vs-One (OvO)
•Build classifiers for every pair of classes:
• (C1 vs C2), (C1 vs C3), (C2 vs C3), etc.
•A new point is classified by majority vote.
•There can still be ambiguous regions, but fewer than with OvR.

OvO: "Classes fight pairwise, and the majority wins."
Example
Direct multiclass: “Everyone builds their own wall, and you pick the class with
the strongest wall.”
Summary:
• In OvR / OvO, sometimes we got ambiguous regions where two classifiers give
conflicting votes.
• In Direct Multiclass, each point is assigned only one maximum value → clear, convex
decision regions → no ambiguity.
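A minimal sketch of the direct multiclass approach: one linear function y_k(x) = w_kᵀx + w_k0 per class, and a point is assigned to the class whose discriminant is largest. The weight values are illustrative.

import numpy as np

# One (w_k, w_k0) pair per class; values are illustrative
W = np.array([[ 1.0,  0.0],    # class C1
              [ 0.0,  1.0],    # class C2
              [-1.0, -1.0]])   # class C3
b = np.array([0.0, 0.0, 0.5])

def predict(x):
    scores = W @ x + b              # y_k(x) for k = 1..K
    return int(np.argmax(scores))   # pick the class with the largest discriminant

print(predict(np.array([2.0, 0.5])))    # 0  (class C1 wins)
print(predict(np.array([-2.0, -2.0])))  # 2  (class C3 wins)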
Least Squares for Classification
Discriminant Functions
Least Squares Classification (boundary found by a regression idea)
Fisher’s Linear Discriminant Functions
Fisher’s Linear Discriminant (best separating line found by maximizing class separation)
Perceptron Algorithm
Perceptron Algorithm (boundary adjusted iteratively based on errors).
Probabilistic Generative Models
Probabilistic Generative Models
When the features are continuous, prediction is based on Gaussian class-conditional probability densities combined via Bayes' rule.
Example: Probabilistic Generative Model (Naïve Bayes with Gaussian
Assumption)
We want to classify whether a student will pass or fail an exam based on study
hours.
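A minimal sketch of this generative pass/fail example, assuming made-up study-hours data: class priors and a Gaussian density per class are estimated from the data, and Bayes' rule turns them into P(pass | hours).

import numpy as np

# Hypothetical study hours (feature) and outcomes (1 = pass, 0 = fail)
hours = np.array([1.0, 2.0, 3.0, 4.5, 5.0, 6.0, 7.5, 8.0])
passed = np.array([0,   0,   0,   1,   1,   1,   1,   1 ])

def gaussian_pdf(x, mu, var):
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

# Class priors and Gaussian parameters estimated from the data (generative model)
params = {}
for c in (0, 1):
    xc = hours[passed == c]
    params[c] = (len(xc) / len(hours), xc.mean(), xc.var())

def predict_pass_probability(x):
    # Bayes' rule: p(c | x) is proportional to p(x | c) * p(c)
    joint = {c: prior * gaussian_pdf(x, mu, var) for c, (prior, mu, var) in params.items()}
    return joint[1] / (joint[0] + joint[1])

print("P(pass | 3.5 hours) =", round(predict_pass_probability(3.5), 3))
print("P(pass | 6.5 hours) =", round(predict_pass_probability(6.5), 3))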
Probabilistic Discriminative Models
Discriminative model
A discriminative model in machine learning is an algorithm
designed to directly learn the decision boundary between different
classes or categories within a dataset.
Probabilistic Discriminative model
Example Problem
(Binary Classification)
We want to predict whether a student passes (1) or fails (0) an exam based on
marks.
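A hedged sketch of the discriminative counterpart using scikit-learn's LogisticRegression; the marks and labels are made up, and the model directly learns p(pass | marks) = sigmoid(w·marks + b) without modelling how the marks themselves are distributed.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical marks and pass/fail labels
marks = np.array([[20], [32], [45], [50], [58], [66], [74], [85]])
passed = np.array([0, 0, 0, 1, 0, 1, 1, 1])

# Discriminative model: directly learns the decision boundary p(pass | marks)
clf = LogisticRegression().fit(marks, passed)

print("P(pass | marks=55) =", clf.predict_proba([[55]])[0, 1])
print("Predicted class for marks=40:", clf.predict([[40]])[0])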
Laplace Approximation
What is Laplace Approximation?
Laplace Approximation is a method to approximate a complicated probability
distribution (usually a posterior in Bayesian inference) with a Gaussian distribution.
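A small numeric sketch of the idea, on an illustrative Gamma-shaped target rather than a real posterior: find the mode of the log-density, take the negative second derivative there as the precision, and read off the approximating Gaussian.

import numpy as np
from scipy.optimize import minimize_scalar

# Unnormalized log-density of a Gamma(shape=3, rate=1) distribution (illustrative target)
def log_p(z):
    return 2.0 * np.log(z) - z          # defined for z > 0

# Step 1: find the mode z0 by maximizing log p(z)
res = minimize_scalar(lambda z: -log_p(z), bounds=(1e-6, 20.0), method="bounded")
z0 = res.x                              # analytically the mode is z0 = 2

# Step 2: curvature at the mode via a numeric second derivative
h = 1e-4
second_deriv = (log_p(z0 + h) - 2 * log_p(z0) + log_p(z0 - h)) / h**2
A = -second_deriv                       # precision of the Gaussian approximation

print("Laplace approximation: N(mean=%.3f, variance=%.3f)" % (z0, 1.0 / A))
# Expected roughly N(2, 2): the mode is 2 and the negative curvature there is 1/2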
Bayesian Logistic Regression
The distribution of the decision boundary in Bayesian Logistic Regression is useful
because it captures uncertainty in predictions, prevents overconfidence, and
supports better risk-aware decisions.
Difference between Linear Regression and Logistic Regression

Linear Regression | Logistic Regression
Used to solve regression problems | Used to solve classification problems
The response variable is continuous in nature | The response variable is categorical in nature
It helps estimate the dependent variable when there is a change in the independent variable | It helps calculate the probability of a particular event taking place
The fitted model is a straight line | The fitted model is an S-curve (S = Sigmoid)

Kernel Functions
Kernel Functions – Overview
The kernels covered in this overview are:
1. RBF (Radial Basis Function) kernels
2. Kernels for comparing documents
3. Mercer (positive definite) kernels
4. Linear kernels
5. Matern kernels
6. String kernels
7. Pyramid match kernels
8. Kernels derived from probabilistic generative models

A Mercer kernel is a "valid similarity function" that machine learning algorithms can use. It guarantees that even if we map data into some hidden higher-dimensional space, the math will still work properly. The condition says: for a function to be a Mercer kernel, the kernel (Gram) matrix it produces must always be positive semi-definite (PSD).

Pyramid Match Kernel (PMK) score interpretation (for the example shown):
0 --- no overlap, the sets are entirely different
< 10 --- moderate similarity
> 10 --- high similarity

For kernels derived from probabilistic generative models, a simplified formula is κ(x, x′) = g(x)ᵀ g(x′), where g(x) = ∇θ log p(x | θ) is the Fisher score of x.
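As an illustration of the Mercer condition, the sketch below builds an RBF kernel (Gram) matrix on a few made-up points and checks empirically that its eigenvalues are non-negative, i.e. that the matrix is positive semi-definite.

import numpy as np

def rbf_kernel(x1, x2, gamma=0.5):
    """RBF kernel k(x, x') = exp(-gamma * ||x - x'||^2)."""
    return np.exp(-gamma * np.sum((x1 - x2) ** 2))

X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5], [3.0, 3.0]])   # illustrative points

# Build the kernel (Gram) matrix K with K[i, j] = k(x_i, x_j)
n = len(X)
K = np.array([[rbf_kernel(X[i], X[j]) for j in range(n)] for i in range(n)])

eigvals = np.linalg.eigvalsh(K)
print("Eigenvalues of K:", np.round(eigvals, 4))
print("All non-negative (positive semi-definite)?", bool(np.all(eigvals >= -1e-10)))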
Using Kernels in GLMs
Example:

Normal logistic regression (GLM form): p(y = 1 | x) = σ(wᵀx)
Kernel logistic regression: p(y = 1 | x) = σ( Σi αi κ(x, xi) )

So GLMs + kernels = non-linear decision boundaries.
This simple graph explains why we use kernels in GLMs.
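A hedged sketch of one common way to kernelize a GLM: replace the raw inputs with RBF similarities to the training points and fit ordinary logistic regression on those kernel features. The concentric-rings dataset and the gamma value are illustrative.

import numpy as np
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import rbf_kernel

X, y = make_circles(n_samples=200, noise=0.05, factor=0.4, random_state=0)

# Plain GLM: linear logistic regression on the raw 2-D features (cannot separate the rings)
linear_acc = LogisticRegression().fit(X, y).score(X, y)

# Kernelized GLM: the features are RBF similarities to every training point
K = rbf_kernel(X, X, gamma=2.0)
kernel_acc = LogisticRegression(max_iter=1000).fit(K, y).score(K, y)

print("Linear logistic regression accuracy :", linear_acc)   # roughly chance level
print("Kernel logistic regression accuracy :", kernel_acc)   # near 1.0 on this data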
Kernel Trick
Kernel Trick
The kernel trick is the idea that:
Instead of mapping data into high-dimensional space (which is
costly), we directly compute the inner product in that space using a
kernel function.
Consider the Example
•Both methods give the same result.
•Kernel Trick avoids the hard work of computing explicit higher-dimensional vectors.
•Instead, we compute directly in the original space using a kernel function.
In simple words: Kernel Trick = Shortcut to high dimensions without ever going there!
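A minimal sketch of the shortcut: for 2-D inputs, the explicit degree-2 feature map φ(x) = (x1², x2², √2·x1·x2) and the polynomial kernel k(x, z) = (x·z)² give exactly the same inner product, but the kernel never leaves the original space. The two vectors are made up.

import numpy as np

def phi(v):
    """Explicit degree-2 feature map for 2-D input: (x1^2, x2^2, sqrt(2)*x1*x2)."""
    x1, x2 = v
    return np.array([x1**2, x2**2, np.sqrt(2) * x1 * x2])

def poly_kernel(x, z):
    """Polynomial kernel k(x, z) = (x . z)^2, computed in the original 2-D space."""
    return (x @ z) ** 2

x = np.array([2.0, 3.0])
z = np.array([1.0, 4.0])

print("Explicit mapping then dot product:", phi(x) @ phi(z))   # 196.0
print("Kernel trick (no mapping needed) :", poly_kernel(x, z)) # 196.0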
Support Vector Machine (SVM)
What is a Support Vector Machine (SVM)?
A Support Vector Machine (SVM) is a machine learning algorithm used for classification and
regression. This finds the best line (or hyperplane) to separate data into groups, maximizing
the distance between the closest points (support vectors) of each group. It can handle complex
data using kernels to transform it into higher dimensions. In short, SVM helps classify data
effectively.

Types of Support Vector Machine (SVM)


Linear SVM: Linear SVM is used for linearly separable data. If a dataset can be classified into two classes using a single straight line, it is termed linearly separable data, and the classifier used is called a Linear SVM classifier.
Non-linear SVM: Non-linear SVM is used for non-linearly separable data. If a dataset cannot be classified using a straight line, it is termed non-linear data, and the classifier used is called a Non-linear SVM classifier. In such cases, we use advanced techniques like the kernel trick to classify the data.
BASIC TERMS OF
SUPPORT VECTOR MACHINE
(SVM)
What is Hyper Plane
Hyperplanes are decision boundaries that help classify the data
points. Data points falling on either side of the hyperplane can
be attributed to different classes.
There can be multiple lines/decision boundaries to segregate the classes, but we need to
find out the best decision boundary that helps to classify the data points. This best
boundary is known as the hyperplane of SVM.

The dimension of the hyperplane depends upon the number of features. If the number of input features is 2, then the hyperplane is just a line. If the number of input features is 3, then the hyperplane becomes a two-dimensional plane. It becomes difficult to imagine when the number of features exceeds 3.
What is Support Vector

The data points or vectors that are the closest to the hyperplane, and which affect the position of the hyperplane, are termed Support Vectors. Since these vectors support the hyperplane, they are called support vectors.

Support Vectors are simply the coordinates of individual observations.
What is Margin

The width that the boundary could be increased by before hitting a data point.

A Support Vector Machine (SVM) performs classification by finding the hyperplane that maximizes the margin between the two classes.
Hard Margin & Soft Margin
When the data is linearly separable, and we don’t
want to have any misclassifications, we use SVM
with a Hard margin.

When a linear boundary is not feasible, or we want to allow some misclassifications in the hope of achieving better generalization, we can opt for a soft margin for our classifier.

Sometimes, the data is linearly separable, but the margin is so small that the
model becomes prone to overfitting or being too sensitive to outliers. Also, in
this case, we can opt for a larger margin by using soft margin SVM in order
to help the model generalize better.
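A hedged sketch of the hard-vs-soft distinction using scikit-learn's SVC, where the C parameter plays this role: a very large C approximates a hard margin, while a small C gives a softer margin that tolerates some misclassification. The blob data and C values are illustrative.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)
# Two roughly separable blobs
X = np.vstack([rng.normal([2, 2], 0.8, size=(40, 2)),
               rng.normal([-2, -2], 0.8, size=(40, 2))])
y = np.array([1] * 40 + [-1] * 40)

hardish = SVC(kernel="linear", C=1e6).fit(X, y)   # large C: (almost) hard margin
soft = SVC(kernel="linear", C=0.1).fit(X, y)      # small C: soft margin, wider tolerance

print("Support vectors (C=1e6):", len(hardish.support_vectors_))
print("Support vectors (C=0.1):", len(soft.support_vectors_))  # usually more with a softer margin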
How does SVM Work?
At first approximation, SVM finds a separating line (or hyperplane) between
data of two classes. SVM is an algorithm that takes the data as an input and
outputs a line that separates those classes if possible.

Suppose we have a dataset as shown and we need to classify the red rectangles from the blue ellipses. So our task is to find an ideal line that separates this dataset in two classes (say red and blue).
How does SVM Work (Contd.)

Not a big task, right…?

But, as we notice, there isn’t a unique line that does the job. In fact, there are infinitely many lines that can separate these two classes. So how does SVM find the ideal one?
Let’s take some probable candidates and figure it out ourselves.
How does SVM Work (Contd.)
We have two candidates here, the green colored line
and the yellow colored line. Which line according to
you best separates the data?

The green line in the image above is quite close to the red class. Though it classifies the current dataset, it is not a generalized line.

If we select the yellow line, then it is visually quite intuitive in this case that the yellow line classifies better. But we need something concrete to fix our line.
How does SVM Work (Contd.)

According to the SVM algorithm, we find the points closest to the line from both the classes. These points are called support vectors. Now, we compute the distance between the line and the support vectors. This distance is called the margin. Our goal is to maximize the margin. The hyperplane for which the margin is maximum is the optimal hyperplane.
How does SVM Work (Contd.)

Thumb Rule to identify the Right Hyperplane
• Select the hyper-plane which segregates the two classes better.
• Maximize the distance between the nearest data point (of either class) and the hyper-plane. This distance is called the Margin.
How does SVM Work (Contd.)

Identify the right hyper-plane (Scenario-1):


• Here, we have three hyper-planes (A, B, and C). Now, identify
the right hyper-plane to classify stars and circles.

• You need to remember a thumb rule to identify the right hyper-plane: “Select the hyper-plane which segregates the two classes better”. In this scenario, hyper-plane “B” has performed this job excellently.
How does SVM Work (Contd.)

Identify the right hyper-plane (Scenario-2):
Here, we have three hyper-planes (A, B, and C) and all are segregating the classes well. Now, how can we identify the right hyper-plane?

• You need to remember a thumb rule to identify the right hyper-plane: maximizing the distance between the nearest data point (of either class) and the hyper-plane will help us to decide the right hyper-plane. This distance is called the Margin.
How does SVM Work (Contd.)

You can see that the margin for hyper-plane C is high as compared to both A and B. Hence, we name the right hyper-plane as C. Another compelling reason for selecting the hyper-plane with the higher margin is robustness: if we select a hyper-plane having a low margin, then there is a high chance of misclassification.
How does SVM Work (Contd.)

Identify the right hyper-plane (Scenario-3):
• Hint: Use the rules as discussed in the previous section to identify the right hyper-plane.

• Some of you may have selected hyper-plane B as it has a higher margin compared to A. But, here is the catch: SVM selects the hyper-plane which classifies the classes accurately prior to maximizing the margin. Here, hyper-plane B has a classification error and A has classified all points correctly. Therefore, the right hyper-plane is A.
How does SVM Work (Contd.)

Can we classify two classes (Scenario-4)?

• Here, we are unable to segregate the two classes using a straight line, as one of the stars lies in the territory of the other (circle) class as an outlier.
How does SVM Work (Contd.)
Find the hyper-plane to segregate two classes (Scenario-5):

• In this scenario, we can’t have a linear hyper-plane between the two classes, so how does SVM classify these two classes? Till now, we have only looked at linear hyper-planes.
How does SVM Work (Contd.)
Let’s consider a slightly more complex dataset, which is not linearly separable.

This data is clearly not linearly separable: we cannot draw a straight line that can classify this data. But this data can be converted to linearly separable data in a higher dimension. Let’s add one more dimension and call it the z-axis, and let the coordinates on the z-axis be governed by the constraint
z = x² + y²
Basically, the z coordinate is the square of the distance of the point from the origin. Let’s plot the data against the z-axis.
How does SVM Work (Contd.)
Now the data is clearly linearly separable. Let the purple line separating the data in the higher dimension be z = k, where k is a constant. Since z = x² + y², we get x² + y² = k, which is the equation of a circle. So we can project this linear separator in the higher dimension back into the original dimensions using this transformation.

Remember: this feature is not added manually. This is done by the kernel trick, as the sketch below illustrates.
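A hedged sketch of exactly this situation: concentric-ring data that no straight line can separate, handled by an RBF-kernel SVM without manually adding the z = x² + y² feature. The dataset comes from scikit-learn's make_circles and the comparison is illustrative.

import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, noise=0.05, factor=0.4, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)   # a straight line cannot separate the rings
rbf_svm = SVC(kernel="rbf").fit(X, y)         # kernel trick: implicit higher-dimensional mapping

print("Linear kernel training accuracy:", linear_svm.score(X, y))  # close to 0.5
print("RBF kernel training accuracy   :", rbf_svm.score(X, y))     # close to 1.0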
Kernel Method
Kernel methods represent the techniques that are used to deal with linearly inseparable data or non-linear
data set shown figure below. The idea is to create nonlinear combinations of the original features to
project them onto a higher-dimensional space via a mapping function, where the data becomes linearly
separable. In the diagram given below, the two-dimensional dataset (X1, X2) is projected into a new three-
dimensional feature space (Z1, Z2, Z3) where the classes become separable.
Hinge Loss Function
Hinge Loss is a specific type of loss function primarily used for classification tasks, especially in Support Vector
Machines (SVMs). It measures how well a model’s predictions align with the actual labels and encourages
predictions that are not only correct but confidently separated by a margin.
Hinge loss penalizes predictions that are:
1.Incorrectly classified.
2.Correctly classified but too close to the decision boundary (within a “margin”).
It is designed to create a “margin” around the decision boundary to improve the robustness of the classifier.
Formula
The hinge loss for a single data point is given by:

L(y, f(x)) = max(0, 1 − y·f(x))

where
y – the actual class (−1 or 1)
f(x) – the output of the classifier for the data point
Hinge Loss Function

Case 1: Correct and confident classification, y·f(x) ≥ 1 (blue) → zero loss.
Case 2: Correct but not confident classification, 0 < y·f(x) < 1 (faded blue) → small positive loss.
Case 3: Incorrect classification, y·f(x) ≤ 0 (red) → loss of at least 1.
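A minimal sketch that evaluates the hinge loss for the three cases above; the y and f(x) values are made up.

import numpy as np

def hinge_loss(y, fx):
    """Hinge loss: max(0, 1 - y * f(x)), with y in {-1, +1}."""
    return np.maximum(0.0, 1.0 - y * fx)

# Case 1: correct and confident (y*f(x) >= 1)            -> loss 0
print(hinge_loss(+1, 2.5))    # 0.0
# Case 2: correct but inside the margin (0 < y*f(x) < 1) -> small positive loss
print(hinge_loss(+1, 0.4))    # 0.6
# Case 3: incorrect classification (y*f(x) <= 0)         -> loss >= 1
print(hinge_loss(-1, 1.2))    # 2.2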
Advantages of SVM
• Training of the model is relatively easy
• The model scales relatively well to high dimensional data
• SVM is a useful alternative to neural networks
• Trade-off amongst classifier complexity and error can be
controlled explicitly
• It is useful for both Linearly Separable and Non-linearly
Separable data
• Assured Optimality: The solution is guaranteed to be the
global minimum due to the nature of Convex Optimization
Disadvantages of SVM
• Picking the right kernel and its parameters can be computationally intensive
• In Natural Language Processing (NLP), a structured representation of text yields better performance. However, standard SVMs cannot accommodate such structures (e.g., word embeddings)
Applications of SVM
Geostatistics: It is a branch of statistics concentrating on spatial or spatiotemporal datasets. It was originally created to
predict the probability distributions of ore grading at mining operations. Now it is applied in diverse disciplines
including petroleum geology, hydrogeology, hydrology, meteorology, oceanography, geochemistry, geometallurgy,
geography, forestry, environmental control, landscape ecology, soil science, and agriculture (specifically in precision
farming).
Inverse distance weighting: Type of deterministic method for multivariate interpolation with a known scattered set of
points. The values assigned to unknown points are calculated with a weighted average of the values existing at the
known points.
3D Reconstruction: Process of capturing the shape and appearance of real objects.
Bioinformatics: An interdisciplinary field that involves molecular biology, genetics, computer science, mathematics and
statistics. Software tools and methods are developed to understand biological data better.
Chemoinformatics: Application of computational and informational techniques over the field of chemistry to solve a
wide range of problems.
Information Extraction: Abbreviated as IE, it is a method of automated extraction or retrieval of structured information from unstructured and semi-structured text documents, databases and websites.
Handwriting Recognition: Abbreviated as HWR, it is the ability of a computer system to receive and interpret handwritten input comprehensibly from different sources such as paper documents, photographs, touch-screens and other devices.
