ML Assignment 2 PDF

This document is Anubhav Monga's submission for Assignment-2 of his Machine Learning course. It provides detailed explanations of various machine learning techniques including: decision trees, k-means clustering, support vector machines, Naive Bayes classification, k-nearest neighbors, random forests, and linear regression. For each technique, it describes the primary purpose, basic methodology, and provides an example to illustrate how it works. The document aims to elaborate Anubhav's understanding of these fundamental machine learning algorithms.

Uploaded by

Anubhav Monga


ASSIGNMENT-2

OF
Machine Learning

SUBMITTED TO: Dr Varun Malik
SUBMITTED BY: Anubhav Monga
1955991509
B.Tech IT 7A

DEPARTMENT OF COMPUTER APPLICATIONS

CHITKARA UNIVERSITY, PUNJAB
Q1 Elaborate your understanding on various machine learning
techniques in detail including their primary purpose and an
example or a case.
Ans:
Various Machine Learning Techniques:
Decision Tree
Ø Decision Tree is a Supervised learning technique that can be used for both
classification and Regression problems, but mostly it is preferred for solving
Classification problems. It is a tree-structured classifier, where internal nodes
represent the features of a dataset, branches represent the decision rules and
each leaf node represents the outcome.
Ø In a Decision tree, there are two types of nodes: the Decision Node and the
Leaf Node. Decision nodes are used to make a decision and have multiple
branches, whereas Leaf nodes are the outputs of those decisions and do not
contain any further branches.
Ø It is a graphical representation for getting all the possible solutions to a
problem/decision based on given conditions.
Ø It is called a decision tree because, similar to a tree, it starts with the root
node, which expands on further branches and constructs a tree-like structure.
Ø To build a tree, we can use the CART algorithm, which stands for
Classification and Regression Tree algorithm.
Ø A decision tree simply asks a question, and based on the answer (Yes/No), it
further splits the tree into subtrees.
Ø Example: Suppose a candidate has a job offer and wants to decide whether to
accept it or not. To solve this problem, the decision tree starts with the
root node (the Salary attribute, chosen by an attribute selection measure, or
ASM). The root node splits into the next decision node (distance from the
office) and one leaf node, based on the corresponding labels. That decision
node in turn splits into one decision node (Cab facility) and one leaf node.
Finally, the Cab facility node splits into two leaf nodes (Accepted offer and
Declined offer).
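The job-offer tree described above can be sketched as a chain of nested conditions. This is only an illustration: the salary and distance thresholds, the function name decide_offer, and the choice of which branch ends in which leaf are assumptions, since the text does not specify them.

```python
def decide_offer(salary, distance_km, cab_facility,
                 min_salary=50_000, max_distance_km=10):
    """Walk a hand-built decision tree for the job-offer example.

    All thresholds and leaf labels are illustrative assumptions.
    """
    # Root node: Salary attribute.
    if salary < min_salary:
        return "Declined"            # leaf node
    # Decision node: distance from the office.
    if distance_km > max_distance_km:
        return "Declined"            # leaf node
    # Decision node: cab facility, which ends in two leaf nodes.
    return "Accepted" if cab_facility else "Declined"


print(decide_offer(60_000, 5, True))    # Accepted
print(decide_offer(60_000, 5, False))   # Declined
```

A learned tree (for example one built with CART) would pick the attributes and thresholds from data instead of having them written by hand.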
K means Clustering
Ø K-Means Clustering is an Unsupervised Learning algorithm, which groups an
unlabeled dataset into different clusters. Here K defines the number of
predefined clusters that need to be created in the process: if K=2, there will
be two clusters, for K=3, there will be three clusters, and so on.
Ø It is an iterative algorithm that divides the unlabeled dataset into k different
clusters in such a way that each data point belongs to only one group, whose
members have similar properties.
Ø It allows us to cluster the data into different groups and is a convenient way
to discover the categories of groups in an unlabeled dataset on its own,
without the need for any labeled training data.
Ø It is a centroid-based algorithm, where each cluster is associated with a
centroid. The main aim of this algorithm is to minimize the sum of squared
distances between the data points and the centroids of their clusters.
Ø The algorithm takes the unlabeled dataset as input, divides the dataset into
k clusters, and repeats the process until the cluster assignments stop
changing. The value of k should be predetermined in this algorithm.
Ø The k-means clustering algorithm mainly performs two tasks:
Ø Determines the best positions for the K center points (centroids) by an
iterative process.
Ø Assigns each data point to its closest center. The data points that are
nearest to a particular center form a cluster.
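The two tasks listed above (assigning points to the nearest centroid, then moving each centroid) can be sketched in plain Python. This is a minimal sketch under assumptions: the function name k_means, the 2-D toy points, and the starting centroids are all made up for illustration, and the starting centroids are passed in by hand rather than chosen at random.

```python
import math


def k_means(points, centroids, max_iters=100):
    """Plain k-means on tuples of coordinates; returns (centroids, labels)."""
    centroids = list(centroids)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:      # assignments stopped changing
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:               # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, labels


# Hypothetical toy data: two visibly separate groups, K = 2.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, labels = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels)      # [0, 0, 0, 1, 1, 1]
```

The result depends on the starting centroids, which is why practical implementations usually try several initializations.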
Support Vector Machine Algorithm
Ø Support Vector Machine or SVM is one of the most popular Supervised
Learning algorithms, which is used for Classification as well as Regression
problems. However, primarily, it is used for Classification problems in Machine
Learning.
Ø The goal of the SVM algorithm is to create the best line or decision boundary
that can segregate n-dimensional space into classes, so that we can easily put
a new data point in the correct category in the future. This best decision
boundary is called a hyperplane.
Ø SVM chooses the extreme points/vectors that help in creating the hyperplane.
These extreme cases are called support vectors, and hence the algorithm is
termed Support Vector Machine.
Ø Example: SVM can be understood with the same example that is used for the
KNN classifier. Suppose we see a strange cat that also has some features of
dogs, and we want a model that can accurately identify whether it is a cat or
a dog. Such a model can be created by using the SVM algorithm. We first
train our model with lots of images of cats and dogs so that it can learn the
different features of cats and dogs, and then we test it with this strange
creature. The SVM creates a decision boundary between the two classes (cat
and dog) using the extreme cases (support vectors) of each class. On the basis
of the support vectors and the side of the boundary the new image falls on, it
will classify it as a cat.
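The roles of the hyperplane and the support vectors can be illustrated on a tiny 2-D dataset. This sketch does not train an SVM: the hyperplane (w, b) is chosen by hand for this toy data, where the maximum-margin boundary happens to be the vertical line x1 = 2, and all names and data points are assumptions made for illustration.

```python
import math


def decision_value(w, b, x):
    """Signed score w.x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def distance_to_hyperplane(w, b, x):
    """Perpendicular distance from point x to the hyperplane w.x + b = 0."""
    return abs(decision_value(w, b, x)) / math.sqrt(sum(wi * wi for wi in w))


# Hypothetical toy data: class -1 on the left, class +1 on the right.
data = [((0.0, 0.0), -1), ((1.0, 0.0), -1), ((3.0, 0.0), +1), ((4.0, 1.0), +1)]
w, b = (1.0, 0.0), -2.0      # hand-picked hyperplane: x1 = 2

distances = [distance_to_hyperplane(w, b, x) for x, _ in data]
margin = min(distances)
# The points closest to the boundary play the role of support vectors.
support_vectors = [x for (x, _), d in zip(data, distances) if math.isclose(d, margin)]


def classify(x):
    return 1 if decision_value(w, b, x) >= 0 else -1


print(support_vectors)        # [(1.0, 0.0), (3.0, 0.0)]
print(classify((2.5, 5.0)))   # 1
```

A real SVM solver would search for the w and b that maximize this margin instead of taking them as given.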

Naïve Bayes Classifier Algorithm

Ø Naïve Bayes algorithm is a supervised learning algorithm, which is based on
Bayes theorem and used for solving classification problems.
Ø It is mainly used in text classification that includes a high-dimensional training
dataset.
Ø Naïve Bayes Classifier is one of the simplest and most effective Classification
algorithms, and it helps in building fast machine learning models that can
make quick predictions.
Ø It is a probabilistic classifier, which means it predicts the class of an object
on the basis of probabilities.
Ø Some popular applications of the Naïve Bayes Algorithm are spam filtering,
sentiment analysis, and classifying articles.
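A spam filter of the kind mentioned above can be sketched with word counts and Bayes' theorem. This is a minimal sketch under assumptions: the four training messages are invented, the function names are hypothetical, and add-one (Laplace) smoothing is used so that unseen words do not produce a zero probability, a detail the text does not discuss.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """docs: list of (text, label). Returns class counts, word counts, vocabulary."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return class_counts, word_counts, vocab


def predict(text, class_counts, word_counts, vocab):
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        # log P(class) + sum of log P(word | class), with add-one smoothing.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical training messages.
training = [
    ("win money now", "spam"),
    ("free prize win", "spam"),
    ("meeting at noon", "ham"),
    ("project meeting notes", "ham"),
]
model = train_naive_bayes(training)
print(predict("win free money", *model))          # spam
print(predict("notes for the meeting", *model))   # ham
```

The "naive" part is the assumption that words are independent given the class, which is what allows the per-word probabilities to be multiplied (here, their logarithms are summed).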
K-Nearest Neighbor (KNN) Algorithm for Machine Learning
Ø The K-NN algorithm assumes similarity between the new case/data and the
available cases, and puts the new case into the category that is most similar
to the available categories.
Ø The K-NN algorithm stores all the available data and classifies a new data
point based on similarity. This means that when new data appears, it can be
easily classified into a well-suited category by using the K-NN algorithm.
Ø K-NN algorithm can be used for Regression as well as for Classification but
mostly it is used for the Classification problems.
Ø K-NN is a non-parametric algorithm, which means it does not make any
assumption about the underlying data distribution.
Ø It is also called a lazy learner algorithm because it does not learn from the
training set immediately; instead it stores the dataset and, at the time of
classification, performs an action on the dataset.
Ø In the training phase, the KNN algorithm just stores the dataset, and when it
gets new data, it classifies that data into the category that is most similar
to the new data.
Ø Example: Suppose we have an image of a creature that looks similar to both a
cat and a dog, and we want to know whether it is a cat or a dog. For this
identification, we can use the KNN algorithm, as it works on a similarity
measure. Our KNN model will compare the features of the new image with those
of the cat and dog images and, based on the most similar features, will put it
in either the cat or the dog category.
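The cat/dog example can be sketched as a majority vote among the k closest stored examples. This is an illustrative sketch only: the two numeric features (ear pointiness and snout length), the stored examples, and the function name knn_classify are assumptions, and Euclidean distance is used as the similarity measure.

```python
import math
from collections import Counter


def knn_classify(query, training, k=3):
    """Return the majority label among the k training points nearest to query."""
    # "Lazy" learning: nothing is fitted in advance; all work happens here.
    nearest = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical features: (ear pointiness, snout length), both scaled to 0..1.
training = [
    ((0.9, 0.2), "cat"), ((0.8, 0.3), "cat"), ((0.85, 0.25), "cat"),
    ((0.3, 0.8), "dog"), ((0.2, 0.9), "dog"), ((0.4, 0.7), "dog"),
]

print(knn_classify((0.7, 0.4), training))    # cat
print(knn_classify((0.3, 0.75), training))   # dog
```

Choosing an odd k for a two-class problem avoids tied votes.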

Random Forest Algorithm

Ø Random Forest is a popular machine learning algorithm that belongs to the
supervised learning technique. It can be used for both Classification and
Regression problems in ML. It is based on the concept of ensemble learning,
which is a process of combining multiple classifiers to solve a complex
problem and to improve the performance of the model.
Ø As the name suggests, "Random Forest is a classifier that contains a number
of decision trees on various subsets of the given dataset and takes the
average to improve the predictive accuracy of that dataset." Instead of relying
on one decision tree, the random forest takes the prediction from each tree
and, based on the majority vote of those predictions, predicts the final output.
Ø A greater number of trees in the forest generally leads to higher accuracy and
reduces the problem of overfitting, although the gains level off as more trees
are added.
Ø Example: Suppose there is a dataset that contains multiple fruit images, and
this dataset is given to the Random Forest classifier. The dataset is divided
into subsets, and one subset is given to each decision tree. During the
training phase, each decision tree learns from its own subset. When a new data
point occurs, each tree produces a prediction result, and based on the majority
of those results, the Random Forest classifier predicts the final decision.
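The majority-vote step of the fruit example can be sketched as follows. This shows only the voting, not the training: the three "trees" are hand-written stand-in rules rather than trees learned from subsets of a dataset, and the feature names and thresholds are hypothetical.

```python
from collections import Counter


# Hand-written stand-ins for trained decision trees (hypothetical rules).
def tree_by_colour(fruit):
    return "apple" if fruit["colour"] == "red" else "banana"


def tree_by_shape(fruit):
    return "banana" if fruit["shape"] == "long" else "apple"


def tree_by_weight(fruit):
    return "apple" if fruit["weight_g"] > 140 else "banana"


def forest_predict(fruit, trees):
    """Collect one prediction per tree and return the majority vote."""
    votes = Counter(tree(fruit) for tree in trees)
    return votes.most_common(1)[0][0]


trees = [tree_by_colour, tree_by_shape, tree_by_weight]
sample = {"colour": "yellow", "shape": "round", "weight_g": 150}
print(forest_predict(sample, trees))   # apple (two votes against one)
```

Here one tree is misled by the colour, but the other two outvote it, which is the intuition behind combining many trees.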
Linear Regression in Machine Learning
Ø It is a statistical method that is used for predictive analysis.
Ø Linear regression makes predictions for continuous/real or numeric variables
such as sales, salary, age, product price, etc.
Ø The linear regression algorithm shows a linear relationship between a dependent
(y) variable and one or more independent (x) variables, hence it is called
linear regression. Since linear regression shows a linear relationship, it
finds how the value of the dependent variable changes according to the value
of the independent variable.
Ø The linear regression model provides a sloped straight line representing the
relationship between the variables.

Mathematically, we can represent a linear regression as:

y = a0 + a1x + ε
Here,
y = Dependent Variable (Target Variable)
x = Independent Variable (Predictor Variable)
a0 = Intercept of the line (gives an additional degree of freedom)
a1 = Linear regression coefficient (scale factor applied to each input value)
ε = Random error
The observed values of the x and y variables form the training dataset for the
Linear Regression model.
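The coefficients a0 and a1 in the equation above can be estimated from training data. The sketch below uses ordinary least squares, a standard method that the text does not itself name; the function name fit_line and the noise-free toy data are assumptions chosen so that the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a0 + a1*x with one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # a1 = covariance(x, y) / variance(x); a0 makes the line pass through the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a1 = sxy / sxx
    a0 = mean_y - a1 * mean_x
    return a0, a1


# Hypothetical data generated from y = 1 + 2x with no random error.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
a0, a1 = fit_line(xs, ys)
print(a0, a1)           # 1.0 2.0
print(a0 + a1 * 6)      # prediction for x = 6: 13.0
```

With real data the points would not lie exactly on a line, and the fitted a0 and a1 would be the values that minimize the sum of squared errors.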

Logistic Regression in Machine Learning

Ø It is used for predicting the categorical dependent variable using a given set of
independent variables.
Ø Logistic regression predicts the output of a categorical dependent variable.
Therefore the outcome must be a categorical or discrete value. It can be
either Yes or No, 0 or 1, True or False, etc., but instead of giving the exact
values 0 and 1, it gives probabilistic values which lie between 0 and 1.
Ø Logistic Regression is very similar to Linear Regression except in how they
are used. Linear Regression is used for solving Regression problems,
whereas Logistic Regression is used for solving classification problems.
Ø In Logistic regression, instead of fitting a regression line, we fit an "S" shaped
logistic function, whose output is bounded by two limiting values (0 and 1).
Ø The curve from the logistic function indicates the likelihood of something such
as whether the cells are cancerous or not, a mouse is obese or not based on
its weight, etc.
Ø Logistic Regression is a significant machine learning algorithm because it has
the ability to provide probabilities and classify new data using continuous and
discrete datasets.
Ø Logistic Regression can be used to classify the observations using different
types of data and can easily determine the most effective variables used for
the classification.

Logistic Function (Sigmoid Function):

The sigmoid function is a mathematical function used to map the predicted values to
probabilities.
It maps any real value to a value in the range between 0 and 1.
The output of logistic regression must lie between 0 and 1 and cannot go
beyond this limit, so it forms a curve like the "S" form. The S-form curve is called the
Sigmoid function or the logistic function.
In logistic regression, we use the concept of a threshold value, which decides
whether a probability is mapped to 0 or 1: values above the threshold are
assigned to 1, and values below the threshold are assigned to 0.
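The sigmoid function and the threshold rule can be written in a few lines. In this sketch the function is sigmoid(z) = 1 / (1 + e^(-z)); the names sigmoid and predict_class and the default threshold of 0.5 are assumptions, since the text does not fix a threshold value.

```python
import math


def sigmoid(z):
    """Map any real value z to a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_class(z, threshold=0.5):
    """Turn the probability into a class label using the threshold."""
    return 1 if sigmoid(z) >= threshold else 0


for z in (-4.0, 0.0, 4.0):
    print(z, round(sigmoid(z), 3), predict_class(z))
```

In a logistic regression model, z would be the linear combination of the input variables (for example a0 + a1x), so the sigmoid converts the output of a line into a probability.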
Q2 Explain the concept of over-fitting problem. How can this be
avoided?
Ans:
Overfitting
Ø Overfitting occurs when our machine learning model tries to cover all the data
points, or more than the required data points, present in the given dataset.
Because of this, the model starts capturing the noise and inaccurate values
present in the dataset, and all these factors reduce the efficiency and
accuracy of the model. The overfitted model has low bias and high variance.
Ø The chance of overfitting increases the more we train our model on the same
data: the longer we train, the more likely the model is to become overfitted.
Ø Overfitting is one of the main problems that occurs in supervised learning.
Example: The concept of overfitting can be understood from the graph of a
regression output:

[Figure: a fitted curve passing through every data point of a scatter plot]

As we can see from the graph, the model tries to cover all the data points
present in the scatter plot. It may look efficient, but in reality it is not. The
goal of the regression model is to find the best-fit line, but here we have not
got a best fit, so the model will generate prediction errors on new data.
How to avoid the Overfitting in Model
Both overfitting and underfitting degrade the performance of a machine
learning model. Overfitting is the more common cause, so there are some ways by
which we can reduce the occurrence of overfitting in our model.
• Cross-Validation
• Training with more data
• Removing features
• Early stopping the training
• Regularization
• Ensembling
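As an illustration of the first item, k-fold cross-validation splits the data into k parts and uses each part once for validation while training on the rest. The sketch below only produces the index splits; the function name k_fold_splits is hypothetical, and fitting and scoring a model on each split is left out.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


for train, validation in k_fold_splits(6, 3):
    print(train, validation)
# [2, 3, 4, 5] [0, 1]
# [0, 1, 4, 5] [2, 3]
# [0, 1, 2, 3] [4, 5]
```

A model that scores well on its training indices but poorly on the validation indices of most folds is likely overfitting. In practice the data is usually shuffled before splitting.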
