AIMLCZG567: AI & ML Techniques for Cyber Security
Jagdish Prasad
BITS Pilani, Pilani Campus (WILP)
Session: 04
Title: Basics for Machine Learning - II
Agenda
• Converting Categorical Data into Numerical Data
• Feature Encoding, Vectorization, Normalization
• Issues: Overfitting, Underfitting, Class Imbalance
• Evaluation Metrics: Precision, Recall, F1-score
• Overview of Machine learning algorithms
• Support Vector Machine (SVM)
• Bayesian Networks
• Decision Trees
• Random Forests
• Artificial Neural Networks (ANN)
• Hierarchical Algorithms
• Genetic Algorithms
• Similarity Algorithms
Categorical to Numerical
Conversion
Why Categorical to Numerical Conversion?
• Machines only understand numerical data.
• Text data ‘as-is’ cannot be used as input to a Machine Learning
algorithm.
• Process of converting text data into numbers is called Feature
Extraction (also called text vectorization).
• In NLP, Feature Extraction is an important step for a better
understanding of the context of what we are dealing with.
Conversion Techniques
• One Hot Encoding
• Bag of Words (BoW)
• N-grams
• TF-IDF
• Word2Vec (Word Embedding)
One Hot Encoding
• One Hot Encoding converts the words of a document into a Vocabulary
dimension vector.
• Example:
• Documents (Single record): “We are learning Natural Language Processing”, “We are learning
Data Science”, and “Natural Language Processing comes under Data Science”.
• Corpus (Total words): We are learning Natural Language Processing, We are learning Data
Science, Natural Language Processing comes under Data Science
• Vocabulary (Unique words): We are learning Natural Language Processing Data Science
comes under
Example: One Hot Encoding
Advantage
• It is intuitive.
• Easy to implement.
Disadvantage
• It creates sparsity.
• Size of each document after one-hot encoding may differ.
• Out of Vocabulary (OOV) problem.
• No capturing of semantic meaning.
Simplest but not very effective technique - Not much in use
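A minimal Python sketch of the idea (illustrative only; it builds the vocabulary for the three example documents above and one-hot encodes each word):

# One-hot encoding sketch: each word becomes a vector of Vocabulary size.
docs = ["We are learning Natural Language Processing",
        "We are learning Data Science",
        "Natural Language Processing comes under Data Science"]

vocab = []                          # unique words, in order of first appearance
for doc in docs:
    for word in doc.split():
        if word not in vocab:
            vocab.append(word)

def one_hot(word):
    # Vocabulary-dimension vector with a single 1 at the word's index
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

# A document becomes a (num_words x vocab_size) matrix, so document
# sizes differ -- one of the disadvantages listed above.
doc_matrix = [one_hot(w) for w in docs[0].split()]
print(vocab)
print(doc_matrix)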
Bag of Words
• Representation of text that describes the Frequency of words
within a document.
• Specially used in the Text Classification task.
• Can directly use the CountVectorizer class from Scikit-learn.
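A minimal sketch using scikit-learn's CountVectorizer on the three example documents from the One Hot Encoding slide (toy data, for illustration):

from sklearn.feature_extraction.text import CountVectorizer

docs = ["We are learning Natural Language Processing",
        "We are learning Data Science",
        "Natural Language Processing comes under Data Science"]

vectorizer = CountVectorizer()             # counts word frequencies per document
bow = vectorizer.fit_transform(docs)       # sparse (3 x vocab_size) matrix

print(vectorizer.get_feature_names_out())  # learned vocabulary
print(bow.toarray())                       # every document now has the same length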
Bag of Words
• Advantage
• Simple and intuitive.
• Size of each document after conversion is the same.
• Disadvantage
• Creates Sparsity (scattered and lacks denseness)
• Does not consider sentence ordering issues.
One of the most used text vectorization techniques.
TF-IDF: Term Frequency and Inverse Document
Frequency
• TF-IDF is a statistical measure that evaluates how relevant a word is to a
document in a collection of documents.
• Term Frequency (TF):
• Number of times a word appears in a document divided by the total number
of words in that document; 0 ≤ TF ≤ 1.
TF_ij = (Number of times term i appears in document j) / (Total number of terms in document j)
• Inverse Document Frequency (IDF):
• Logarithm of the number of documents in the corpus divided by the number of
documents where the specific term appears: IDF_i = log(N / n_i).
• Scikit-learn uses the formula log(N / n_i) + 1 (with a smoothed variant by default).
Example: TF-IDF: Term Frequency and Inverse
Document Frequency
• Document 1: It is going to rain today
• Document 2: Today I am not going outside
• Document 3: I am going to watch a season premiere
Word:    going   to      today   i       am      it      is      rain
Doc 1    0       0.07    0.07    0       0       0.17    0.17    0.17
Doc 2    0       0       0.07    0.07    0.07    0       0       0
Doc 3    0       0.05    0       0.05    0.05    0       0       0
• It is evident that words like ‘it’, ‘is’ and ‘rain’ are important for Doc 1 but not for
Doc 2 and Doc 3.
• This means Doc 1 differs from Docs 2 and 3 with respect to talking about rain.
• Notice that Doc 1 and Doc 2 talk about something happening ‘today’, and Doc 2 and Doc 3
say something about the writer because of the word ‘I’.
• The table helps find similarities and dissimilarities between documents and
words much better than BOW.
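A minimal sketch using scikit-learn's TfidfVectorizer on the three documents above (the library's IDF formula and tokenizer differ slightly from the hand-computed table, so the numbers will not match exactly):

from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["It is going to rain today",
        "Today I am not going outside",
        "I am going to watch a season premiere"]

tfidf = TfidfVectorizer()
matrix = tfidf.fit_transform(docs)       # sparse (3 x vocab_size) TF-IDF matrix

print(tfidf.get_feature_names_out())
print(matrix.toarray().round(2))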
TF-IDF: Term Frequency and Inverse Document
Frequency
Advantage
• Widely used technique for Information retrieval like a search
engine.
Disadvantage
• Sparsity
• Dimensionality increases with a large dataset, slowing down
algorithm.
• Does not capture Semantic meaning.
Word Embedding
• Representation of words for text analysis:
• In the form of a real-valued vector that encodes the meaning of the word
• Words that are closer in the vector space are similar in meaning.
• Example:
• Boy : Man
• Boy : Table
• First pair has more similar words to each other
• Easier for human to understand the associations between words
in a language.
• Helps machines understand this kind of relation automatically in
language.
Word Embedding Types
• Frequency-based – Count frequency of word
• Bag of Words (BOW)
• TF-IDF
• GloVe (based on Matrix Factorization)
• Prediction based
• Word2Vec
Word2Vec
• A Deep learning-based word embedding technique that converts
a given word into a vector as a collection of numbers.
• Why word2vec?
• captures semantic meaning like happiness and joy have the same
meaning.
• creates low dimension vector
• creates a Dense vector (non-zeros)
• Two approaches for using Word2Vec:
• Use a pre-trained model
• Self-Trained model
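A minimal self-trained sketch using the gensim library (assumed installed; the toy corpus is made up, and a real model needs far more text or a pre-trained download):

from gensim.models import Word2Vec

# Toy corpus: in practice Word2Vec needs a large corpus or a pre-trained model.
sentences = [["network", "traffic", "looks", "normal"],
             ["malware", "traffic", "looks", "suspicious"],
             ["normal", "user", "login"],
             ["suspicious", "user", "login", "attempt"]]

model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=50)

print(model.wv["traffic"][:5])                       # dense, low-dimensional vector
print(model.wv.most_similar("suspicious", topn=2))   # semantically closest words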
Feature Encoding, Vectorization
and Normalization
What is Feature Encoding?
• Process of transformation of categorical values of the relevant
features into numerical value is called feature encoding.
• Data frame analytics automatically performs feature encoding.
• Input data is pre-processed with the following encoding
techniques:
• One-Hot encoding: Assigns vectors to each category. The vector represents
whether the corresponding feature is present (1) or not (0).
• Target-Mean encoding: Replaces categorical values with the mean value of
the target variable.
• Frequency encoding: Takes into account how many times a given
categorical value occurs within a feature.
Label/Ordinal Encoder
• Label Encoder and Ordinal
Encoder encode categories into
numerical values directly.
• Label Encoder is used for nominal categorical variables (categories without
order, e.g. red, green, blue).
• Ordinal Encoder is used for ordinal categorical variables (categories with
order, e.g. small, medium, large).
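A minimal scikit-learn sketch of both encoders (toy values; note that LabelEncoder is primarily intended for target labels):

from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Nominal variable (no order): LabelEncoder assigns an integer per category.
colors = ["red", "green", "blue", "green"]
print(LabelEncoder().fit_transform(colors))           # e.g. [2 1 0 1]

# Ordinal variable (has an order): OrdinalEncoder with an explicit ordering.
sizes = [["small"], ["large"], ["medium"], ["small"]]
enc = OrdinalEncoder(categories=[["small", "medium", "large"]])
print(enc.fit_transform(sizes))                       # small=0, medium=1, large=2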
One Hot Encoding
• In One-Hot Encoding and Dummy Encoding, the categorical column is
split into multiple columns consisting of ones and zeros.
• Addresses the drawback of Label and Ordinal Encoding: no artificial ordering is
imposed, because the encoded data is represented as multiple Boolean columns
rather than as a single integer column.
Original data          One-hot encoded
User   City            User   Rome   Madrid   Istanbul
1      Rome            1      1      0        0
2      Madrid          2      0      1        0
1      Madrid          1      0      1        0
3      Istanbul        3      0      0        1
2      Istanbul        2      0      0        1
1      Istanbul        1      0      0        1
1      Rome            1      1      0        0
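A minimal pandas sketch reproducing the table above with get_dummies (one-hot and dummy variants):

import pandas as pd

df = pd.DataFrame({"User": [1, 2, 1, 3, 2, 1, 1],
                   "City": ["Rome", "Madrid", "Madrid", "Istanbul",
                            "Istanbul", "Istanbul", "Rome"]})

one_hot = pd.get_dummies(df, columns=["City"])                    # one column per city
dummy   = pd.get_dummies(df, columns=["City"], drop_first=True)   # dummy encoding

print(one_hot)
print(dummy)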
Count/Frequency Encoding
• Count/Frequency Encoding
encodes categorical variables
to the count of occurrences
and frequency of occurrences
respectively.
• Utilizes the frequency of the
categories as labels.
• When the frequency is correlated with the target variable, it helps the model
understand and assign weights in direct or inverse proportion, depending on
the nature of the data.
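A minimal pandas sketch of count and frequency encoding (toy city values, for illustration):

import pandas as pd

df = pd.DataFrame({"City": ["Rome", "Madrid", "Madrid", "Istanbul",
                            "Istanbul", "Istanbul", "Rome"]})

counts = df["City"].value_counts()                  # occurrences of each category
freqs  = df["City"].value_counts(normalize=True)    # relative frequency

df["City_count"] = df["City"].map(counts)
df["City_freq"]  = df["City"].map(freqs)
print(df)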
Binary/BaseN Encoding
• Binary Encoding encodes
categorical variables into
integers, then converts them to
binary code.
• Output is similar to One-Hot Encoding, but fewer columns are created.
• Addresses the drawback of One-Hot Encoding: a cardinality of n results in
roughly log2(n) columns instead of n columns.
• BaseN Encoding follows the same idea but uses other base values instead
of 2, resulting in logN(n) columns.
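A minimal sketch assuming the third-party category_encoders package is installed (toy city values):

import pandas as pd
import category_encoders as ce   # third-party package: pip install category_encoders

df = pd.DataFrame({"City": ["Rome", "Madrid", "Istanbul", "Paris",
                            "Berlin", "Rome", "Madrid"]})

binary = ce.BinaryEncoder(cols=["City"]).fit_transform(df)     # ~log2(n) columns
base4  = ce.BaseNEncoder(cols=["City"], base=4).fit_transform(df)

print(binary)
print(base4)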
Target/Mean Encoding
• Target encoding is similar to label encoding, except labels are
correlated directly with the target.
• Each category of the feature is replaced with the mean
value of the target variable computed on the training data.
• Advantages of the Target encoding are that it does not affect
the volume of the data and helps in faster learning.
• Target Encoding or Mean Encoding is a popular encoding
approach.
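A minimal pandas sketch of target/mean encoding (the 'city' and 'fraud' columns are hypothetical):

import pandas as pd

# Hypothetical data: 'city' feature and a binary 'fraud' target.
df = pd.DataFrame({"city":  ["Rome", "Madrid", "Rome", "Istanbul", "Madrid", "Rome"],
                   "fraud": [0, 1, 1, 0, 1, 0]})

# Replace each category with the mean of the target computed on the training data.
target_means = df.groupby("city")["fraud"].mean()
df["city_encoded"] = df["city"].map(target_means)
print(df)
# In practice, compute the means on the training split only (and consider
# smoothing) to avoid target leakage.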
Feature Vector
• Feature vector is an ordered list of numerical properties of observed data set.
• Represents input features to a machine learning model that makes a
prediction.
• Humans can analyze qualitative data to make a decision.
• Example: we see the cloudy sky, feel the damp breeze, and decide to take an
umbrella when going outside.
• Machine learning models can only deal with quantitative data.
• Must always convert features of observed phenomena into numerical values and
feed them into a machine learning model in the same order.
• Must represent features in feature vectors.
Feature Scaling
• Feature scaling is a data pre-processing technique that transforms the values
of features or variables in a dataset to a similar scale.
• Done to ensure that all features contribute equally to the model and to
prevent features with larger values from dominating the model.
• Transforms the data to a more consistent scale, making it easier to build
accurate and effective machine learning models.
• Essential when working with datasets where the features have different
ranges, units of measurement, or orders of magnitude.
• Feature scaling techniques:
• Normalization (Min-Max scaling)
• Standardization
Feature Normalization
• Normalization: Values are shifted and rescaled so that they end
up ranging between 0 and 1 (also known as Min-Max scaling).
• Normalization adjusts the values of features in a dataset to a
common scale.
• Reduces the impact of different scales on the accuracy of
machine learning models.
• Formula for normalization: X_norm = (X − X_min) / (X_max − X_min)
Feature Standardization
• Standardization: Values are centered around the Mean with a unit
Standard Deviation.
• Under standardization, Mean of the attribute becomes zero, and
the resultant distribution has a unit Standard Deviation.
• Formula for standardization: X_std = (X − μ) / σ, where μ is the mean and σ is the
standard deviation of the attribute.
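A minimal scikit-learn sketch contrasting the two scalers (the two feature columns are made up):

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical features on very different scales (e.g. packet size, duration).
X = np.array([[1500.0, 0.2],
              [  60.0, 3.5],
              [ 900.0, 1.1]])

print(MinMaxScaler().fit_transform(X))     # normalization: values in [0, 1]
print(StandardScaler().fit_transform(X))   # standardization: mean 0, std 1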
Under / Over fit Models
Under Fitting
• A Machine Learning algorithm is said to underfit when it cannot capture the underlying
trend of the data, i.e., it performs poorly even on the training data (and therefore also
on testing data).
• Underfitting reduces the accuracy of the machine learning model.
• Usually happens when there is too little data to build an accurate model, or when a
linear model is fitted to non-linear data.
• Underfitting can be avoided by using more data and increasing model complexity
(e.g., adding relevant features).
• Reasons for Under fitting:
– High bias and low variance.
– Size of the training dataset used is not enough.
– Model is too simple.
– Training data is not cleaned and also contains noise in it.
Over Fitting
• A Machine Learning model is said to be overfitted when it fits the training data too
closely and does not make accurate predictions on testing data.
• When a model gets trained with large data, it starts learning from the noise and
inaccurate data from the data set.
• During testing the model does not categorize the data correctly, because of too
many details and noise.
• Overfitting is caused by non-parametric and non-linear methods:
• these algorithms have more freedom in building the model based on the dataset
• they can build unrealistic models.
• Using a linear algorithm for linear data, or constraining parameters such as the maximal
depth of decision trees, helps avoid overfitting.
• Reasons for Overfitting:
– High variance and low bias.
– Model is too complex.
– Size of the training data is too small relative to the model's complexity.
Feature Extraction
• Feature Extraction aims to reduce the number of features in a
dataset by:
• Creating new features from the existing ones (and then discarding the
original features)
• Discarding some features altogether
• Reduced set of features should be able to summarize most of the
information contained in the original set of features.
Imbalanced Classes
• Examples of Imbalanced classes:
• Fraudulent transactions are significantly lower than normal healthy transactions i.e.
around 1-2 % of the total number of observations.
• Malicious cyber incidents are far fewer than benign events.
• Electricity theft transactions are much less compared to normal transactions.
• Advanced Analytics and Machine Learning algorithms try to identify patterns in such
transactions to indicate abnormal behaviour.
• Machine Learning algorithms tend to produce unsatisfactory classifiers when
faced with imbalanced datasets.
• Challenge is to improve identification of the rare minority class as opposed to
achieving higher overall accuracy.
• For an imbalanced data set, if the event to be predicted belongs to the minority
class and the event rate is less than 5%, it is referred to as a rare event.
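A minimal scikit-learn sketch of one common mitigation, class weighting (the 98:2 labels and random features are made up):

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced labels: 1 = attack (rare), 0 = benign.
y = np.array([0] * 98 + [1] * 2)
X = np.random.rand(100, 5)

weights = compute_class_weight(class_weight="balanced", classes=np.array([0, 1]), y=y)
print(dict(zip([0, 1], weights)))          # the minority class gets a much larger weight

clf = LogisticRegression(class_weight="balanced").fit(X, y)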
Evaluation Metrics
• Evaluation metrics are quantitative measures used to assess the performance
and effectiveness of a Machine Learning model.
• Metrics indicate how well a model is performing and help in comparing different
models or algorithms.
• Evaluation metrics provide objective criteria to evaluate a Machine Learning
model for its:
• Predictive ability
• Generalization capability
• Overall quality
• Choice of evaluation metrics depends on the specific problem domain, the type
of data, and the desired outcome.
Evaluation Metrics: Terms Definitions
Term                                      Definition
True Positives                            Predicted Value = Yes, Real Value = Yes
True Negatives                            Predicted Value = No,  Real Value = No
False Positives                           Predicted Value = Yes, Real Value = No
False Negatives                           Predicted Value = No,  Real Value = Yes
Accuracy                                  Ratio of the number of correct predictions to the total number of predictions.
Positive Predictive Value or Precision    Ratio of predicted positive cases that were correctly identified.
Negative Predictive Value                 Ratio of predicted negative cases that were correctly identified.
Sensitivity or Recall                     Ratio of actual positive cases which are correctly identified.
Specificity                               Ratio of actual negative cases which are correctly identified.
Precision & Recall
Precision
• Defined as ratio of correctly predicted Attacks to all the samples predicted as
Attacks.
• Calculated as the number of true positive predictions divided by the sum of
number of true positive and false positive predictions.
Precision = TP / (TP + FP)
Recall
• Defined as ratio of all samples correctly classified as Attacks to all the samples
that are actually Attacks. It is also called a Detection Rate.
• Calculated as the number of true positive predictions divided by sum of number
of true positive and false negative predictions:
Recall = TP / (TP + FN)
F1-Score
• Defined as the harmonic mean of the Precision and Recall. It is a statistical
technique for examining the accuracy of a system by considering both precision
and recall of the system.
• F1 score ranges between 0 and 1 and is calculated as:
F1 = 2 × (Precision × Recall) / (Precision + Recall)
• F1-Score tells how precise (number of instances correctly classified) and robust
(does not miss any significant number of instances) the classifier is.
• Harmonic Mean punishes extreme values more.
• Example: Assume a binary classification model with the following results:
• Precision: 0, Recall: 1
• If we take the arithmetic mean, we get 0.5. It indicates that the above result comes
from a classifier that ignores the input and predicts one of the classes as output.
• If we were to take HM, we would get 0 which is accurate as this model is useless for
all purposes.
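A minimal scikit-learn sketch computing these metrics on hypothetical attack/benign labels:

from sklearn.metrics import precision_score, recall_score, f1_score, confusion_matrix

# Hypothetical labels: 1 = attack, 0 = benign.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

print(confusion_matrix(y_true, y_pred))                  # [[TN FP] [FN TP]]
print("Precision:", precision_score(y_true, y_pred))     # TP / (TP + FP)
print("Recall:   ", recall_score(y_true, y_pred))        # TP / (TP + FN)
print("F1-score: ", f1_score(y_true, y_pred))            # harmonic mean of the two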
F1-Score
• F1 score gives the same importance to both Recall and Precision.
• If we want to give more weight to one of them, an Fβ score can be calculated by
attaching a weight to either Recall or Precision depending on how important it is.
• In the equation below, β is the weight (β > 1 favours Recall, β < 1 favours Precision):
Fβ = (1 + β²) × (Precision × Recall) / (β² × Precision + Recall)
Feature Pipeline
• Machine Learning process combines a series of transformers on raw data,
transforming the dataset each step of the way until it is passed to the fit
method of a final estimator.
• If we don’t vectorize our documents in the same exact manner, we will end
up with wrong or, at the very least, unintelligible results.
• Pipeline objects enable us to integrate a series of transformers that combine
normalization, vectorization, and feature analysis into a single, well-defined
mechanism.
• Pipeline objects move data from a loader into feature extraction mechanisms
to finally an estimator object that implements our predictive models.
• Pipelines are Directed Acyclic Graphs (DAGs) that can be simple linear chains
of transformers to arbitrarily complex branching and joining paths.
• Ref: https://www.oreilly.com/library/view/applied-text-analysis/9781491963036/ch04.html
Feature Pipeline
• Feature Pipeline chains together multiple estimators representing a fixed
sequence of steps into a single unit.
• All estimators in the pipeline, except the last one, must be transformers—that
is, implement the transform method, while the last estimator can be of any
type, including predictive estimators.
• Pipelines provide convenience - fit and transform can be called for single inputs
across multiple objects at once.
• Pipelines provide a single interface for grid search of multiple estimators at
once.
• Pipelines provide operationalization of text models by coupling a vectorization
methodology with a predictive model.
• Pipelines are constructed by describing a list of (key, value) pairs where
the key is a string that names the step and the value is the estimator object.
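A minimal scikit-learn Pipeline sketch in that (key, value) style (the toy messages and labels are made up):

from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# (key, value) pairs: the key names the step, the value is the estimator.
model = Pipeline([
    ("vectorize", TfidfVectorizer()),      # transformer
    ("classify",  LogisticRegression()),   # final estimator
])

# Hypothetical toy data: messages labelled 1 = malicious, 0 = benign.
docs   = ["click this link now", "meeting at noon",
          "reset your password here", "lunch tomorrow"]
labels = [1, 0, 1, 0]

model.fit(docs, labels)
print(model.predict(["please reset your password"]))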
Enriching Feature Extraction with Feature Unions
• Pipelines do not have to be simple linear sequences of steps; in fact, they can
be arbitrarily complex through the implementation of feature unions.
• Pipelines fit and transform data in sequence through each transformer.
• A Feature Union object combines several transformer objects into a new, single
transformer, similar to the Pipeline object.
• Feature Union transformers are evaluated independently and their results are
concatenated into a composite vector.
Enriching Feature Extraction with Feature Unions
• Consider the example of an HTML parser transformer that uses BeautifulSoup or an XML
library to parse the HTML and return the body of each document.
• We then perform a feature engineering step, where entities and keyphrases are each
extracted from the documents and the results passed into the feature union.
• Using frequency encoding on the entities is more sensible since they are relatively small,
but TF–IDF makes more sense for the keyphrases.
• Feature Union then concatenates the two resulting vectors such that our decision space
ahead of the logistic regression separates word dimensions in the title from word
dimensions in the body.
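A minimal scikit-learn sketch of a FeatureUnion; two generic vectorizers stand in for the entity and keyphrase extractors described above (toy data):

from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# The two transformers run independently; their outputs are concatenated
# into one composite feature vector.
features = FeatureUnion([
    ("counts", CountVectorizer()),   # frequency-style encoding
    ("tfidf",  TfidfVectorizer()),   # TF-IDF encoding
])

model = Pipeline([("features", features),
                  ("classify", LogisticRegression())])

docs   = ["invoice attached open now", "weekly status report",
          "your account is locked click here", "project plan for review"]
labels = [1, 0, 1, 0]
model.fit(docs, labels)
print(model.predict(["open the attached invoice now"]))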
ML Algorithms
Data Nomenclature
           x1             x2             x3             x4           x5              x6               Y
           GDP            Per Capita     Human          Life         Poverty Index   Mean Household   Dev /
Country    (Trillion USD) GDP ('1000 USD) Development   Expectancy   (Gini as %age)  Income ('1000 USD) UnderDev
                                          Index
Canada     1.577          39.17          0.908          80.7         32.6            67.293           D
China      5.878          7.54           0.687          73           46.9            10.22            U
India      1.632          3.41           0.547          64.7         36.8            0.735            U
Russia     1.48           19.84          0.755          65.5         39.9            0.72             U
Singapore  0.223          56.69          0.866          80           42.5            67.1             D
USA        14.527         46.86          0.91           78.3         40.8            84.3             D
…          …              …              …              …            …               …                …
Each row is an Instance; x1–x6 are the Attributes and Y is the Output.
[Ref: en.wikipedia.org]
Example Problem
• Given input x, compute output y.
• Example: Compute the price of a house from attributes of the house:
• Size of house
• Number of bedrooms
• Construction age
• Locality
• Segment – affordable, premium, luxury
• …
• A straight-line model: y = w0 + w1·x (w0 is the intercept, w1 the slope).
• A line through the origin: y = w·x.
• In general, y = f(x), or y = h(x), called the Hypothesis.
[Figure: straight-line fits y = w0 + w1·x (red line) and y = w·x (violet line) on the x–y plane]
Example Problem
• Generalized linear regression model is a linear combination of the
input variables and parameters:
h(x) = w0 + w1x1 + w2x2 + … + wnxn
• Key components of this model are:
• parameters w0, w1, …, wn (called Weights or Parameters)
• input variables x1, x2, x3, …, xn (called Attributes or Features)
• The equation can be simplified in vector form:
h(x) = w0x0 + w1x1 + w2x2 + … + wnxn, where x0 = 1
If W = [w0 w1 w2 … wn] and X = [x0 x1 x2 … xn],
then h(x) = Wᵀ · X
This is an Example of Linear Regression algorithm
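A minimal NumPy sketch of the hypothesis h(x) = Wᵀ·X (the weights and instance values are made up):

import numpy as np

# Hypothetical weights and one instance, with x0 = 1 prepended for the intercept.
W = np.array([50.0, 0.8, 10.0])     # w0 (intercept), w1, w2
X = np.array([1.0, 1200.0, 3.0])    # x0 = 1, area, number of bedrooms

h = np.dot(W, X)                    # h(x) = W^T . X
print(h)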
Linear Regression: Normal Regression
[Figure 4-2: Linear Regression model predictions]
Gradient Descent
• A generic optimization algorithm capable of finding optimal solutions to a problem.
• Gradient Descent tweaks parameters iteratively in order to minimize the error or cost
function, using the update rule θ = θ − η · ∇θ J(θ), where η is the learning rate.
• It measures the local gradient of the cost function with regard to the parameter vector θ
and goes in the direction of descending gradient; once the gradient is zero, a minimum
has been reached.
• Concretely, start by filling θ with random values (random initialization), then improve it
gradually, taking one small step at a time, each step attempting to decrease the cost
function (e.g., the MSE), until the algorithm converges to a minimum.
• The Learning Rate hyperparameter determines the size of the steps:
– If the learning rate is too small, the algorithm has to go through many iterations to
converge, which takes a long time.
– If the learning rate is too high, it might jump across the valley and end up on the other
side, possibly even higher than before; the algorithm may diverge and fail to find a
good solution.
• Not all cost functions look like nice regular bowls; there may be holes, ridges, plateaus
and all sorts of irregular terrain, making convergence to the minimum difficult.
[Figures 4-3 to 4-5: Gradient Descent; learning rate too small; learning rate too large]
Gradient Descent: Example
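A minimal NumPy sketch of batch gradient descent for linear regression (toy data generated around y = 4 + 3x; the learning rate and iteration count are illustrative):

import numpy as np

# Toy data roughly following y = 4 + 3x, with noise.
rng = np.random.default_rng(0)
X = 2 * rng.random((100, 1))
y = 4 + 3 * X + rng.normal(size=(100, 1))

X_b = np.c_[np.ones((100, 1)), X]     # prepend x0 = 1 for the intercept term
eta = 0.1                             # learning rate
theta = rng.random((2, 1))            # random initialization

for _ in range(1000):
    gradients = 2 / 100 * X_b.T @ (X_b @ theta - y)    # gradient of the MSE
    theta = theta - eta * gradients                    # step in the descending direction

print(theta.ravel())                  # should approach [4, 3]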
Naïve Bayes Algorithm
• Naïve Bayes classifier is a supervised machine learning algorithm used for
classification tasks such as text classification.
• Belongs to generative learning algorithm family - it models the distribution of
inputs for a given class or category.
• Assumes that the features of the input data are conditionally independent given
the class.
• Naive Bayes classifiers are among the simplest Bayesian network models with
high accuracy levels.
• Naïve Bayes model is easy to build and particularly useful for very large data
sets.
• Naive Bayes can outperform highly sophisticated classification methods.
Naïve Bayes Algorithm
• Example:
P(Yes|Sunny) = P(Sunny|Yes) * P(Yes) / P(Sunny)
P(Sunny|Yes) = 3/9 = 0.33
P(Sunny) = 5/14 = 0.36
P(Yes) = 9/14 = 0.64
P(Yes|Sunny) = 0.33 * 0.64 / 0.36 = 0.59
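A minimal scikit-learn sketch in the spirit of the example above (a small made-up Outlook/Play table, not the full 14-row dataset):

from sklearn.naive_bayes import CategoricalNB
from sklearn.preprocessing import OrdinalEncoder

# Simplified weather/play data for illustration only.
outlook = [["Sunny"], ["Sunny"], ["Overcast"], ["Rain"], ["Rain"],
           ["Overcast"], ["Sunny"], ["Rain"], ["Overcast"], ["Sunny"]]
play    = ["No", "No", "Yes", "Yes", "No", "Yes", "Yes", "Yes", "Yes", "No"]

enc = OrdinalEncoder()
X = enc.fit_transform(outlook)            # categories encoded as integer codes

clf = CategoricalNB().fit(X, play)
print(clf.classes_)                                     # ['No' 'Yes']
print(clf.predict_proba(enc.transform([["Sunny"]])))    # P(No|Sunny), P(Yes|Sunny)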
Decision Tree
• Decision Tree is a Supervised learning technique that can be used for both
classification and Regression problems, but mostly it is preferred for
classification.
• A tree-structured classifier, where
• Internal nodes represent the features of a dataset,
• Branches represent the decision rules
• Each leaf node represents the outcome.
• A Decision tree has two type of nodes: Decision Node and Leaf Node.
• Decision nodes are used to make any decision and have multiple branches
• Leaf nodes are the output of those decisions and do not contain any further
branches.
• The decisions or the test are performed on the basis of features of the given
dataset.
• A decision tree simply asks a question and, based on the answer (Yes/No), further
splits into subtrees.
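A minimal scikit-learn sketch of a decision tree classifier (using the built-in Iris dataset for illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# max_depth limits how far the tree can split, which also helps against overfitting.
tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)
print(tree.score(X_test, y_test))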
Random Forests
• Random Forest is a popular machine learning algorithm that belongs to the
supervised learning technique. It can be used for both Classification and
Regression problems in ML.
• It is based on the concept of ensemble learning, which is a process of combining
multiple classifiers to solve a complex problem and to improve the performance
of the model.
• As the name suggests, "Random Forest is a classifier that contains a number of
decision trees on various subsets of the given dataset and takes the average to
improve the predictive accuracy of that dataset."
• Instead of relying on one decision tree, the random forest takes the prediction
from each tree and, based on the majority vote of predictions, predicts
the final output.
• A greater number of trees in the forest generally leads to higher accuracy and helps
prevent overfitting.
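A minimal scikit-learn sketch of a random forest (again on the built-in Iris dataset for illustration):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# n_estimators = number of decision trees; the forest votes on the final class.
forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_train, y_train)
print(forest.score(X_test, y_test))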
K-Means Clustering
• K-Means Clustering is an Unsupervised Learning algorithm, which groups the
unlabelled dataset into different clusters.
• K defines the number of pre-defined clusters that need to be created in the
process.
• Allows clustering the data into different groups, discovering the categories of
groups in the unlabelled dataset on its own without the need for any training.
• A centroid-based algorithm, where each cluster is associated with a centroid.
• Main aim of this algorithm is to minimize the sum of distances between the
data point and their corresponding clusters.
• K-Means clustering algorithm mainly performs two tasks:
• Determines the best value for K center points or centroids by an iterative process.
• Assigns each data point to its closest k-center. Those data points which are near to the
particular k-center, create a cluster.
• Each cluster contains data points with some commonalities and is far away from the
other clusters.
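A minimal scikit-learn sketch of K-Means on synthetic unlabelled data:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Unlabelled data with three natural groups.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print(kmeans.cluster_centers_)     # the learned centroids
print(kmeans.labels_[:10])         # cluster assignment of the first data points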
Support Vector Machine (SVM)
• SVM is one of the most popular Supervised Learning algorithms, which is used
for Classification (primarily) as well as Regression problems.
• SVM algorithm creates the best line or decision boundary (hyperplane) that can
segregate n-dimensional space into classes so that we can easily put the new
data point in the correct category in the future.
• SVM chooses the extreme points/vectors called support vectors, to create the
hyperplane.
• Linear SVM: Used for linearly separable data; if a dataset can be classified into two
classes using a single straight line, it is termed linearly separable data, and the
classifier used is called a Linear SVM classifier.
• Non-linear SVM: Used for non-linearly separable data; if a dataset cannot be classified
using a straight line, it is termed non-linear data, and the classifier used is called a
Non-linear SVM classifier.
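A minimal scikit-learn sketch of linear and non-linear SVM classifiers on synthetic data:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

linear_svm    = SVC(kernel="linear").fit(X_train, y_train)   # linearly separable case
nonlinear_svm = SVC(kernel="rbf").fit(X_train, y_train)      # non-linear case
print(linear_svm.score(X_test, y_test), nonlinear_svm.score(X_test, y_test))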
Genetic Algorithm
• Genetic algorithm is an adaptive heuristic search algorithm inspired by
"Darwin's theory of evolution in Nature."
• Used to solve complex and time-consuming optimization problems in machine
learning.
• Genetic Algorithms are used in real-world applications, for example, Designing
electronic circuits, code-breaking, image processing, and artificial creativity.
• Genetic algorithm works on the evolutionary generational cycle to generate
high-quality solutions.
• Uses different operations that either enhance or replace the population to give
an improved fit solution.
• Five phases to solve the complex optimization problems:
• Initialization
• Fitness Assignment
• Selection
• Reproduction
• Termination
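A minimal sketch of the five phases on a toy "one-max" problem (all constants are illustrative):

import random

# Toy problem: evolve a bit-string with as many 1s as possible.
GENES, POP, GENERATIONS, MUTATION = 20, 30, 50, 0.02

def fitness(ind):                        # Fitness Assignment
    return sum(ind)

def select(pop):                         # Selection (tournament of 3)
    return max(random.sample(pop, 3), key=fitness)

def crossover(p1, p2):                   # Reproduction: single-point crossover
    point = random.randint(1, GENES - 1)
    return p1[:point] + p2[point:]

def mutate(ind):                         # Reproduction: mutation
    return [1 - g if random.random() < MUTATION else g for g in ind]

# Initialization: random population
population = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]

for _ in range(GENERATIONS):             # Termination after a fixed number of generations
    population = [mutate(crossover(select(population), select(population)))
                  for _ in range(POP)]

best = max(population, key=fitness)
print(best, fitness(best))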
Artificial Neural Network (ANN)
• "Artificial Neural Network" is derived from Biological neural networks that
develop the structure of a human brain.
• ANN is usually a computational network based on biological neural networks
that construct the structure of the human brain.
• Like human brain, ANN has neurons that are linked to each other in various
layers of the networks known as nodes.
• An Artificial Neural Network attempts to mimic the network of neurons makes
up a human brain so that computers will have an option to understand things
and make decisions in a human-like manner.
Artificial Neural Network
Biological Neural Network     Artificial Neural Network
Dendrites                     Inputs
Cell nucleus                  Nodes
Synapse                       Weights
Axon                          Output
Artificial Neural Network
• There are on the order of 100 billion neurons in the human brain.
• Each neuron is connected to somewhere between 1,000 and 100,000 others.
• In the human brain, data is stored in a distributed manner, and we can extract more
than one piece of this data from memory in parallel when necessary.
• We can say that the human brain is a massively parallel processor.
• We can understand the artificial neural network with an example:
• Consider a digital logic gate that takes inputs and gives an output.
• An "OR" gate takes two inputs: if one or both inputs are "On," the output is "On";
if both inputs are "Off," the output is "Off."
• In the brain, by contrast, the output-to-input relationship keeps changing because the
neurons are "learning."
Artificial Neural Network
Input Layer:
• Accepts inputs in several different formats provided by the programmer.
Hidden Layer:
• Sits between the input and output layers.
• Performs all the calculations to find hidden features and patterns.
Output Layer:
• The input goes through a series of transformations using the hidden layer(s), and the
final result is conveyed by the output layer.
• The Artificial Neural Network takes the inputs, computes the weighted sum of the
inputs, and includes a bias.
• This computation is represented in the form of a transfer function.
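A minimal NumPy sketch of a single artificial neuron: weighted sum of inputs plus bias, passed through a transfer (activation) function (the inputs and weights are made up):

import numpy as np

def neuron(x, w, b):
    # Weighted sum of inputs plus bias, passed through a sigmoid transfer function.
    z = np.dot(w, x) + b
    return 1 / (1 + np.exp(-z))

x = np.array([0.5, 0.8, 0.1])     # inputs from the input layer
w = np.array([0.4, -0.6, 0.9])    # learned weights
b = 0.1                           # bias
print(neuron(x, w, b))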
Other Algorithms
• Logistic Regression
• Hierarchical Networks
Thank You