AIMLCZG567: AI & ML Techniques for Cyber Security
Jagdish Prasad
BITS Pilani, Pilani Campus (WILP)
Session 04: Basics for Machine Learning - II
Agenda
• Feature Extraction
• Feature Encoding, Vectorization, Normalization
• Issues: Overfitting, Underfitting, Class Imbalance
• Evaluation Metrics: Precision, Recall, F1-score
• Overview of Machine learning algorithms
• Support Vector Machine (SVM)
• Bayesian Networks
• Decision Trees
• Random Forests
• Hierarchical Algorithms
• Genetic Algorithms
• Similarity Algorithms
• Artificial Neural Networks (ANN)
Feature Extraction
Feature Extraction
• Machines only understand numerical data.
• Thus text data 'as-is' cannot be used as input to a machine learning algorithm.
• The process of converting text data into numbers is called Feature Extraction (also called text vectorization).
• In NLP, feature extraction is an important step toward a better understanding of the context of what we are dealing with.
Feature Extraction
• Feature extraction aims to reduce the number of features in a dataset by creating new features from the existing ones (and then discarding the original features).
• This new, reduced set of features should be able to summarize most of the information contained in the original set of features.
• A summarized version of the original features can be created from a combination of the original set.
Feature Extraction Techniques
• One-Hot Encoding
• Bag of Words (BoW)
• N-grams
• TF-IDF
• Custom features
• Word2Vec (Word Embedding)
Commonly Used Terms
• Corpus (c): the collection of all words present in the whole dataset.
• Vocabulary (V): the total number of unique words available in the corpus.
• Document (D): a single record or review out of the multiple records in a dataset.
• Word (w): an individual word used in a document.
One Hot Encoding
• One-hot encoding converts each word of a document into a V-dimensional vector.
• Example:
• We have the documents "We are learning Natural Language Processing", "We are learning Data Science", and "Natural Language Processing comes under Data Science".
• Corpus: We are learning Natural Language Processing, We are learning Data Science, Natural Language Processing comes under Data Science
• Vocabulary (unique words): We, are, learning, Natural, Language, Processing, Data, Science, comes, under (V = 10)
• Simplest but not very effective technique; not much in use.
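A minimal sketch of word-level one-hot encoding for the three documents above; plain Python, written here only to make the idea concrete:

```python
docs = [
    "We are learning Natural Language Processing",
    "We are learning Data Science",
    "Natural Language Processing comes under Data Science",
]

# Build the vocabulary: the V unique words in the corpus.
vocab = sorted({w for d in docs for w in d.split()})

def one_hot(word):
    """Return a V-dimensional vector with a 1 at the word's index."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

# Each document becomes a (num_words x V) list of one-hot vectors,
# so documents of different lengths yield differently sized encodings.
encoded = [[one_hot(w) for w in d.split()] for d in docs]
print(len(vocab))        # V = 10
print(encoded[0][0])     # one-hot vector for the first word of doc 1
```

Note how the per-document encoding size varies with document length, which is one of the disadvantages listed on the next slide.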
One Hot Encoding: Example
Advantages
• Intuitive.
• Easy to implement.
Disadvantages
• It creates sparsity.
• The size of each document's encoding after one-hot encoding may be different.
• Out of Vocabulary (OOV) problem: words unseen during training cannot be encoded.
• Does not capture semantic meaning.
Bag of Words
• A representation of text that describes the frequency of words within a document.
• Especially used in text classification tasks.
• We can directly use the CountVectorizer class from scikit-learn (see the sketch below).
• One of the most used text vectorization techniques.
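A short sketch using scikit-learn's CountVectorizer on the same toy documents from the one-hot example:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "We are learning Natural Language Processing",
    "We are learning Data Science",
    "Natural Language Processing comes under Data Science",
]

cv = CountVectorizer()             # counts word occurrences per document
X = cv.fit_transform(docs)         # sparse (3 x V) document-term matrix

print(cv.get_feature_names_out())  # the learned vocabulary
print(X.toarray())                 # every document now has the same length V
```

Unlike one-hot encoding, every document maps to one fixed-length count vector, which is the advantage noted on the next slide.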
Bag of Words
Advantages
• Simple and intuitive.
• The size of each document's representation after BoW is the same.
Disadvantages
• BoW also creates sparsity.
• Does not take word order into account.
TF-IDF: Term Frequency and Inverse Document Frequency
• TF-IDF is a statistical measure that evaluates how relevant a word is to a document in a collection of documents.
• Term Frequency (TF):
• The number of times a word appears in a document divided by the total number of words in that document; 0 < TF < 1.
• Inverse Document Frequency (IDF):
• The logarithm of the number of documents in the corpus divided by the number of documents where the specific term appears.
• Scikit-learn uses the formula idf(t) = log(N / n_t) + 1 (with smoothing disabled), as in the sketch below.
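A sketch of scikit-learn's TfidfVectorizer on the same toy corpus; smooth_idf=False is set so that the idf(t) = log(N / n_t) + 1 form quoted above is used:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "We are learning Natural Language Processing",
    "We are learning Data Science",
    "Natural Language Processing comes under Data Science",
]

tfidf = TfidfVectorizer(smooth_idf=False)  # idf(t) = log(N / n_t) + 1
X = tfidf.fit_transform(docs)              # sparse (3 x V) weighted matrix

print(tfidf.get_feature_names_out())
print(X.toarray().round(2))  # rarer words receive higher weights than common ones
```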
TF-IDF: Term Frequency and Inverse Document Frequency
Advantages
• Widely used technique in information retrieval, e.g., search engines.
Disadvantages
• Sparsity.
• Dimensionality increases with a large dataset, slowing down the algorithm.
• Does not capture semantic meaning.
Custom Features
• Creating new custom features using domain knowledge.
• Examples:
• Number of occurrences of a given word in the document.
• Number of negative words in the document.
• Ratio of positive reviews to negative reviews.
• Word count.
• Character count.
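A small pandas sketch of such hand-crafted features; the review texts and the negative-word list are made-up examples for illustration:

```python
import pandas as pd

reviews = pd.DataFrame({
    "text": ["great product, loved it", "bad quality, terrible support"]
})
negative_words = {"bad", "terrible", "poor"}  # hypothetical domain lexicon

reviews["word_count"] = reviews["text"].str.split().str.len()
reviews["char_count"] = reviews["text"].str.len()
reviews["negative_count"] = reviews["text"].apply(
    lambda t: sum(w.strip(",.") in negative_words for w in t.lower().split())
)
print(reviews)
```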
Word2Vec
Word Embeddings
• Word embedding is a term used for the representation of words for text analysis, typically in the form of a real-valued vector that encodes the meaning of the word such that words closer in the vector space are expected to be similar in meaning.
• Example: Boy : Man vs. Boy : Table — which pair has words more similar to each other?
• It is easy for humans to understand such associations between words in a language.
• Word embeddings help machines capture this kind of relation in language automatically.
Word2Vec
Word Embedding Types:
• Frequency-based (count the frequency of words)
• BoW
• TF-IDF
• GloVe (based on Matrix Factorization)
• Prediction-based
• Word2Vec
Word2Vec
• Word2Vec is a deep-learning-based technique.
• Word2Vec is a word embedding technique that converts a given word into a vector, i.e., a collection of numbers.
• Why Word2Vec?
• Word2Vec captures semantic meaning, e.g., "happiness" and "joy" have similar meanings.
• Word2Vec creates low-dimensional vectors.
• Word2Vec creates dense vectors (mostly non-zero values).
• Two approaches to using Word2Vec (see the sketch below):
• Use a pre-trained model
• Train your own model
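A minimal self-trained Word2Vec sketch, assuming the gensim package is available; the tiny corpus and all hyperparameter values are illustrative only:

```python
from gensim.models import Word2Vec  # assumes gensim is installed

sentences = [
    ["we", "are", "learning", "natural", "language", "processing"],
    ["we", "are", "learning", "data", "science"],
    ["natural", "language", "processing", "comes", "under", "data", "science"],
]

# Self-trained model: each word becomes a dense 50-dimensional vector.
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=200)

print(model.wv["science"][:5])            # dense, low-dimensional vector
print(model.wv.most_similar("language"))  # nearest words in the vector space
```

In practice, a pre-trained model (e.g., vectors trained on a large news corpus) is usually preferred over training on a corpus this small.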
Feature Encoding, Vectorization
and Normalization
BITS Pilani, Pilani Campus
What is Feature Encoding?
• The process of transforming categorical values of the relevant features into numerical values is called feature encoding.
• Many data frame analytics tools perform feature encoding automatically.
• The input data is pre-processed with the following encoding techniques:
• One-Hot encoding: assigns a vector to each category. The vector components represent whether the corresponding category is present (1) or not (0).
• Target-Mean encoding: replaces categorical values with the mean value of the target variable.
• Frequency encoding: takes into account how many times a given categorical value is present in relation to a feature.
Label/Ordinal Encoder
• Label Encoder and Ordinal Encoder encode categories directly into numerical values.
• Label Encoder is used for nominal categorical variables (categories without order, e.g., red, green, blue).
• Ordinal Encoder is used for ordinal categorical variables (categories with order, e.g., small, medium, large).
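A short scikit-learn sketch of both encoders on made-up category values:

```python
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

colors = ["red", "green", "blue", "green"]             # nominal
sizes = [["small"], ["large"], ["medium"], ["small"]]  # ordinal (2-D input)

# LabelEncoder assigns integers in alphabetical order: blue=0, green=1, red=2.
print(LabelEncoder().fit_transform(colors))   # [2 1 0 1]

# OrdinalEncoder lets us fix the order explicitly: small < medium < large.
enc = OrdinalEncoder(categories=[["small", "medium", "large"]])
print(enc.fit_transform(sizes))               # [[0.] [2.] [1.] [0.]]
```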
One Hot / Dummy Encoding
• In One-Hot Encoding and Dummy Encoding, the categorical column is split into multiple columns consisting of ones and zeros.
• This addresses a drawback of Label and Ordinal Encoding: because the encoded data is represented as multiple Boolean columns, the columns are read as categorical indicators and no artificial ordering is imposed.
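A pandas sketch contrasting the two; the color column is a made-up example:

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# One-hot: one Boolean column per category (n categories -> n columns).
print(pd.get_dummies(df, columns=["color"]))

# Dummy: drop the first level, so n categories -> n-1 columns.
print(pd.get_dummies(df, columns=["color"], drop_first=True))
```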
Count/Frequency Encoding
• Count Encoding and Frequency Encoding encode categorical variables as the count of occurrences and the frequency (proportion) of occurrences, respectively.
• They utilize the frequency of the categories as labels.
• In cases where the frequency is related to the target variable, this helps the model understand and assign weights in direct or inverse proportion, depending on the nature of the data.
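A minimal pandas sketch of both variants on an invented city column:

```python
import pandas as pd

df = pd.DataFrame({"city": ["Pilani", "Delhi", "Pilani", "Mumbai", "Pilani", "Delhi"]})

counts = df["city"].value_counts()                   # occurrences per category
df["city_count"] = df["city"].map(counts)            # count encoding
df["city_freq"] = df["city"].map(counts / len(df))   # frequency encoding
print(df)
```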
Binary/BaseN Encoding
• Binary Encoding first encodes categorical variables as integers, then converts those integers to binary code.
• The output is similar to One-Hot Encoding, but fewer columns are created.
• This addresses a drawback of One-Hot Encoding: a cardinality of n results not in n columns but in about log2(n) columns.
• BaseN Encoding follows the same idea but uses other base values instead of 2, resulting in about logN(n) columns.
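A hand-rolled sketch of binary encoding in pandas (libraries such as category_encoders offer this directly; the manual version below just makes the two steps explicit):

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "yellow"]})

# Step 1: label-encode the categories into integers 0..n-1.
codes = df["color"].astype("category").cat.codes

# Step 2: write each integer in binary, one column per bit
# (ceil(log2(n)) columns instead of n one-hot columns).
n_bits = int(codes.max()).bit_length()
for bit in range(n_bits):
    df[f"color_bit{bit}"] = (codes >> bit) & 1
print(df)   # 4 categories -> only 2 bit columns
```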
Target/Mean Encoding
• Target encoding is similar to label encoding, except that here the labels are correlated directly with the target.
• In target encoding, the label for each category of the feature is the mean value of the target variable computed on the training data.
• The advantages of target encoding are that it does not inflate the volume of the data and that it helps in faster learning.
• Target Encoding (Mean Encoding) is a very popular encoding approach.
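A minimal pandas sketch of target (mean) encoding; the city/target toy data is invented:

```python
import pandas as pd

train = pd.DataFrame({
    "city":   ["Pilani", "Delhi", "Pilani", "Mumbai", "Delhi", "Pilani"],
    "target": [1, 0, 1, 0, 1, 0],
})

# Mean of the target per category, computed on training data only
# (applying it to unseen data with the same mapping avoids leakage).
means = train.groupby("city")["target"].mean()
train["city_te"] = train["city"].map(means)
print(train)   # Pilani -> 0.67, Delhi -> 0.5, Mumbai -> 0.0
```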
Feature Vector
• A feature vector is an ordered list of numerical properties of observed
phenomena. It represents input features to a machine learning model that
makes a prediction.
• Humans can analyze qualitative data to make a decision.
• Example: we see the cloudy sky, feel the damp breeze, and decide to take an
umbrella when going outside.
• However, machine learning models can only deal with quantitative data.
• We must always convert features of observed phenomena into numerical
values and feed them into a machine learning model in the same order.
• We must represent features in feature vectors.
Feature Scaling
• Feature scaling is a data pre-processing technique that involves
transforming the values of features or variables in a dataset to a similar
scale.
• This is done to ensure that all features contribute equally to the model
and to prevent features with larger values from dominating the model.
• Feature scaling is essential when working with datasets where the
features have different ranges, units of measurement, or orders of
magnitude.
• Common feature scaling techniques include standardization,
normalization, and min-max scaling.
• Feature scaling transforms the data to a more consistent scale, making
it easier to build accurate and effective machine learning models.
Feature Normalization
• Normalization is a feature scaling technique in which values are
shifted and rescaled so that they end up ranging between 0 and
1 (also known as Min-Max scaling).
• Normalization is done as part of data pre-processing to adjust
the values of features in a dataset to a common scale.
• Reduce the impact of different scales on the accuracy of
machine learning models.
• Formula for normalization (Min-Max scaling):
  x_norm = (x − x_min) / (x_max − x_min)
Feature Standardization
• Standardization is a feature scaling technique where the values are centered around the mean with a unit standard deviation.
• Under standardization, the mean of the attribute becomes zero, and the resulting distribution has a unit standard deviation.
• Formula for standardization:
  x_std = (x − μ) / σ
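A short scikit-learn sketch of both scalers on a single toy feature:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [5.0], [10.0]])   # one feature, three samples

# Min-max scaling: values end up in [0, 1].
print(MinMaxScaler().fit_transform(X).ravel())    # [0.    0.444 1.   ]

# Standardization: zero mean, unit standard deviation.
print(StandardScaler().fit_transform(X).ravel())
```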
Underfit / Overfit Models
Underfitting
• A machine learning model is said to underfit when it cannot capture the underlying trend of the data; it performs poorly on the training data itself, and consequently on testing data as well.
• Underfitting destroys the accuracy of our machine learning model.
• It means that the model or the algorithm does not fit the data well enough.
• It usually happens when we have too little data to build an accurate model, or when we try to fit a linear model to non-linear data.
• Underfitting can be reduced by using more data and by increasing the model's complexity (e.g., adding more informative features).
• Reasons for underfitting:
– High bias and low variance.
– The size of the training dataset used is not enough.
– The model is too simple.
– The training data is not cleaned and contains noise.
Overfitting
• A machine learning model is said to be overfitted when it fits the training data too closely and, as a result, does not make accurate predictions on testing data.
• When a model is trained on a lot of data, it starts learning from the noise and inaccurate entries in the data set.
• During testing, the model then does not categorize the data correctly, because of too many details and noise.
• Overfitting is often caused by non-parametric and non-linear methods, because these machine learning algorithms have more freedom in building the model from the dataset and can therefore build unrealistic models.
• Using a linear algorithm for linear data, or constraining hyperparameters such as the maximal depth of decision trees, helps avoid overfitting.
• Reasons for overfitting:
– High variance and low bias.
– The model is too complex.
– The size of the training data is large.
Challenges of Imbalanced Classes
• Electricity theft (the third-largest form of theft) is one of the main challenges faced by the utility industry today.
• Advanced analytics and machine learning algorithms are used to identify consumption patterns that indicate theft.
• The biggest challenge here is the humongous data and its skewed distribution.
• Fraudulent transactions are significantly fewer than normal, healthy transactions, typically around 1-2% of the total number of observations.
• The ask is to improve identification of the rare minority class, as opposed to achieving higher overall accuracy.
• Machine learning algorithms tend to produce unsatisfactory classifiers when faced with imbalanced datasets (see the sketch below).
• For an imbalanced data set, if the event to be predicted belongs to the minority class and the event rate is less than 5%, it is referred to as a rare event.
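A tiny synthetic sketch of why overall accuracy misleads on imbalanced data; the ~2% positive rate mirrors the fraud figure above:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.choice([0, 1], size=10_000, p=[0.98, 0.02])  # ~2% minority (fraud) class
print("fraud rate:", y.mean())

# A useless baseline that always predicts the majority class
# is ~98% accurate, yet catches zero fraud cases.
majority = np.zeros_like(y)
print("accuracy of always-majority:", (majority == y).mean())
```

This is why the metrics on the following slides (precision, recall, F1) matter more than accuracy for rare-event problems.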
Evaluation Metrics
• Evaluation metrics are quantitative measures used to assess the performance
and effectiveness of a Machine Learning model.
• Metrics indicate how well a model is performing and help in comparing different
models or algorithms.
• Evaluation metrics provide objective criteria to evaluate a Machine Learning
model for its:
• Predictive ability
• Generalization capability
• Overall quality
• Choice of evaluation metrics depends on the specific problem domain, the type
of data, and the desired outcome.
Evaluation Metrics: Term Definitions
• True Positives (TP): Predicted Value = Yes, Real Value = Yes
• True Negatives (TN): Predicted Value = No, Real Value = No
• False Positives (FP): Predicted Value = Yes, Real Value = No
• False Negatives (FN): Predicted Value = No, Real Value = Yes
• Accuracy: ratio of correct predictions to the total number of predictions.
• Positive Predictive Value (Precision): ratio of predicted positive cases that were correctly identified.
• Negative Predictive Value: ratio of predicted negative cases that were correctly identified.
• Sensitivity (Recall): ratio of actual positive cases that are correctly identified.
• Specificity: ratio of actual negative cases that are correctly identified.
Precision & Recall
Precision
• Precision is a measure of a model’s performance that tells how many of the
positive predictions made by the model are actually correct.
• It is calculated as the number of true positive predictions divided by the number
of true positive and false positive predictions.
Precision = TP / (TP + FP)
Recall
• Lower recall and higher precision give better accuracy but then it misses a large
number of instances.
• The more the F1 score better will be performance. It can be expressed
mathematically in this way:
Recall = TP / (TP + FN)
BITS Pilani, Pilani Campus
F1-Score
• F1-Score is the harmonic mean of precision and recall values for a classification problem. The formula for F1-Score is as follows:
  F1 = 2 × (Precision × Recall) / (Precision + Recall)
• Its range is 0 to 1.
• F1-Score tells how precise (correctly classifies how many instances) and robust
(does not miss any significant number of instances) the classifier is.
• Harmonic Mean punishes extreme values more.
• Example: Assume a binary classification model with the following results:
• Precision: 0, Recall: 1
• If we take the arithmetic mean, we get 0.5, which looks acceptable; yet the result comes from a classifier that ignores the input and always predicts one of the classes.
• If we take the harmonic mean instead, we get 0, which is accurate, as this model is useless for all practical purposes.
F1-Score
• The F1 score gives the same importance to both recall and precision.
• If we want to give more weight to one of them, a generalized Fβ score can be calculated, where the weightage β expresses how many times more important recall is than precision:
  Fβ = (1 + β²) × (Precision × Recall) / (β² × Precision + Recall)
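A sketch computing these metrics with scikit-learn on a made-up prediction vector:

```python
from sklearn.metrics import precision_score, recall_score, f1_score, fbeta_score

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]   # 3 TP, 1 FP, 1 FN

print("precision:", precision_score(y_true, y_pred))      # TP/(TP+FP) = 3/4
print("recall:   ", recall_score(y_true, y_pred))          # TP/(TP+FN) = 3/4
print("F1:       ", f1_score(y_true, y_pred))              # harmonic mean
print("F2:       ", fbeta_score(y_true, y_pred, beta=2))   # beta=2 favours recall
```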
ML Algorithms
Data Nomenclature
Attributes x1-x6 and output Y; each row of the dataset is an instance.

Country   | x1: GDP (Trillion USD) | x2: Per Capita GDP ('000 USD) | x3: Human Development Index | x4: Mean Life Expectancy | x5: Poverty Index (Gini as %age) | x6: Household Income ('000 USD) | Y: Dev/UnderDev
Canada    | 1.577  | 39.17 | 0.908 | 80.7 | 32.6 | 67.293 | D
China     | 5.878  | 7.54  | 0.687 | 73   | 46.9 | 10.22  | U
India     | 1.632  | 3.41  | 0.547 | 64.7 | 36.8 | 0.735  | U
Russia    | 1.48   | 19.84 | 0.755 | 65.5 | 39.9 | 0.72   | U
Singapore | 0.223  | 56.69 | 0.866 | 80   | 42.5 | 67.1   | D
USA       | 14.527 | 46.86 | 0.91  | 78.3 | 40.8 | 84.3   | D
…         | …      | …     | …     | …    | …    | …      | …

[Ref: en.wikipedia.org]
Example Problem
• Given input x, compute output y.
• Example: compute the price of a house from attributes such as:
• Size of house
• Number of bedrooms
• Construction age
• Locality
• Segment: affordable, premium, luxury
• …
• A line with intercept w0 and slope w1 models this as y = w0 + w1·x (red line); a line through the origin is y = w·x (violet line).
• In general, y = f(x); the learned function y = h(x) is called the hypothesis.
Example Problem
• A generalized linear regression model is a linear combination of the input variables and parameters:
  h(x) = w0 + w1x1 + w2x2 + … + wnxn
• Key components of this model:
• Parameters w0, w1, …, wn (called weights)
• Input variables x1, x2, …, xn (called attributes or features)
• The equation can be simplified in vector form:
  h(x) = w0x0 + w1x1 + w2x2 + … + wnxn, where x0 = 1
  If W = [w0 w1 w2 … wn] and X = [x0 x1 x2 … xn], then h(x) = Wᵀ·X
• This is an example of the Linear Regression algorithm.
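A small numpy sketch of the hypothesis h(x) = WᵀX, fitting W by least squares on made-up housing-style data:

```python
import numpy as np

# Hypothetical data: one feature (area), with x0 = 1 prepended for the intercept.
X = np.array([[1.0, 50], [1.0, 80], [1.0, 120], [1.0, 200]])
y = np.array([150, 220, 320, 510])   # invented prices

# Closed-form least squares gives the weight vector W = [w0, w1].
W, *_ = np.linalg.lstsq(X, y, rcond=None)
print("w0 (intercept), w1 (slope):", W)

# h(x) = W^T . X for a new house of area 100.
print("prediction:", np.array([1.0, 100]) @ W)
```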
Linear Regression
(Figure 4-2: Linear Regression model predictions)
Gradient Descent
• A generic optimization algorithm capable of finding optimal solutions to a problem.
• Gradient Descent tweaks parameters iteratively in order to minimize the error or cost function (e.g., the MSE), using the update rule θ ← θ − η·∇θ J(θ).
• Concretely, you start by filling θ with random values (random initialization) and then improve it gradually, taking one baby step at a time, until the algorithm converges to a minimum (Figure 4-3).
• It measures the local gradient of the cost function with regard to the parameter vector θ and goes in the direction of descending gradient. Once the gradient is zero, you have reached a minimum.
• The learning rate η determines the size of the steps:
– If the learning rate is too small, the algorithm will have to go through many iterations to converge, which takes a long time (Figure 4-4).
– If the learning rate is too high, you might jump across the valley and end up on the other side, possibly even higher than before; the algorithm may diverge, skipping the minimum and failing to find a good solution (Figure 4-5).
• Finally, not all cost functions look like nice regular bowls. There may be holes, ridges, plateaus, and all sorts of irregular terrain, making convergence to the minimum difficult.
Gradient Descent: Example
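A minimal batch-gradient-descent sketch for linear regression on synthetic data; the data-generating line y = 4 + 3x and all hyperparameter values are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1))                    # feature in [0, 2]
y = 4 + 3 * X + rng.standard_normal((100, 1))   # y = 4 + 3x + noise

Xb = np.c_[np.ones((100, 1)), X]     # prepend x0 = 1 to each instance
eta, n_iterations = 0.1, 1000        # learning rate and step budget
theta = rng.standard_normal((2, 1))  # random initialization

for _ in range(n_iterations):
    gradients = 2 / 100 * Xb.T @ (Xb @ theta - y)  # gradient of the MSE
    theta = theta - eta * gradients                # step down the gradient

print(theta.ravel())   # should be close to [4, 3]
```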
Naïve Bayes Algorithm
• Naïve Bayes classifier is a popular supervised machine learning algorithm used
for classification tasks such as text classification.
• Belongs to the family of generative learning algorithms, which means that it
models the distribution of inputs for a given class or category.
• Based on the assumption that the features of the input data are conditionally
independent given the class, allowing the algorithm to make predictions quickly
and accurately.
• Naive Bayes classifiers are among the simplest Bayesian network models, yet
they can achieve high accuracy levels.
• An NB model is easy to build and particularly useful for very large data sets.
• In many applications, Naive Bayes performs surprisingly well, sometimes matching far more sophisticated classification methods.
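A minimal Naïve Bayes text-classification sketch with scikit-learn; the spam/ham toy data is invented for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["win a free prize now", "meeting at noon today",
         "free money click now", "project status update"]
labels = [1, 0, 1, 0]   # 1 = spam, 0 = ham

# Bag-of-words features feeding a multinomial Naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["free prize today"]))   # likely [1] (spam)
```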
Decision Tree
• Decision Tree is a supervised learning technique that can be used for both classification and regression problems, but it is mostly preferred for solving classification problems.
• It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents the outcome.
• A decision tree contains two types of nodes: decision nodes and leaf nodes.
• Decision nodes are used to make decisions and have multiple branches, whereas leaf nodes are the outputs of those decisions and do not contain any further branches.
• The decisions or tests are performed on the basis of the features of the given dataset.
• It is a graphical representation for getting all the possible solutions to a problem/decision based on given conditions.
• To build a tree, we use the CART algorithm, which stands for Classification And Regression Tree.
• A decision tree simply asks a question and, based on the answer (Yes/No), splits further into subtrees.
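A short scikit-learn sketch (scikit-learn's trees use an optimized CART, as the slide notes); the iris dataset stands in for any classification data:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# Limiting the depth constrains the tree and helps curb overfitting.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# The learned rules: internal nodes test features, leaves give the class.
print(export_text(tree, feature_names=load_iris().feature_names))
```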
Random Forests
• Random Forest is a popular machine learning algorithm that belongs to the supervised learning techniques. It can be used for both classification and regression problems in ML.
• It is based on the concept of ensemble learning: combining multiple classifiers to solve a complex problem and to improve the performance of the model.
• As the name suggests, a Random Forest is a classifier that contains a number of decision trees built on various subsets of the given dataset, and it averages them to improve the predictive accuracy on that dataset.
• Instead of relying on one decision tree, the random forest takes the prediction from each tree and, based on the majority vote of those predictions, predicts the final output.
• A greater number of trees in the forest generally leads to higher accuracy and reduces the problem of overfitting.
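A scikit-learn sketch of the majority-vote ensemble just described, again using iris as placeholder data:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 trees, each trained on a bootstrap sample with random feature subsets;
# the final class is decided by majority vote across the trees.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)
print("test accuracy:", rf.score(X_test, y_test))
```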
K-Means Clustering
• K-Means Clustering is an unsupervised learning algorithm that groups an unlabelled dataset into different clusters.
• K defines the number of pre-defined clusters to be created in the process.
• It clusters the data into different groups, discovering the categories in the unlabelled dataset on its own, without the need for any labels.
• It is a centroid-based algorithm, where each cluster is associated with a centroid.
• The main aim of the algorithm is to minimize the sum of distances between the data points and their corresponding cluster centroids.
• The K-Means clustering algorithm mainly performs two tasks:
• Determines the best positions for the K center points (centroids) by an iterative process.
• Assigns each data point to its closest centroid; the data points near a particular centroid form a cluster.
• Each cluster contains data points with some commonalities and is away from the other clusters.
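A scikit-learn sketch on synthetic unlabelled blobs; the three cluster centers are arbitrary:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0, 0.5, (50, 2)),        # blob around (0, 0)
    rng.normal(5, 0.5, (50, 2)),        # blob around (5, 5)
    rng.normal([0, 5], 0.5, (50, 2)),   # blob around (0, 5)
])

km = KMeans(n_clusters=3, n_init=10, random_state=0)  # K = 3 pre-defined clusters
labels = km.fit_predict(X)              # each point -> its nearest centroid

print(km.cluster_centers_.round(1))     # recovered centroids
print(np.bincount(labels))              # points per cluster
```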
Support Vector Machine (SVM)
• SVM is one of the most popular supervised learning algorithms; it is used primarily for classification, as well as for regression problems.
• The SVM algorithm creates the best line or decision boundary (hyperplane) that can segregate the n-dimensional space into classes, so that a new data point can easily be put into the correct category in the future.
• SVM chooses the extreme points/vectors, called support vectors, to create the hyperplane.
• Linear SVM: used for linearly separable data. If a dataset can be classified into two classes using a single straight line, the data is termed linearly separable, and the classifier used is called a Linear SVM classifier.
• Non-linear SVM: used for non-linearly separable data. If a dataset cannot be classified using a straight line, the data is termed non-linear, and the classifier used is called a Non-linear SVM classifier.
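A scikit-learn sketch contrasting the two variants on a standard non-linearly separable toy dataset:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaving half-moons: not separable by a straight line.
X, y = make_moons(n_samples=200, noise=0.15, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)  # straight-line boundary
rbf_svm = SVC(kernel="rbf").fit(X, y)        # non-linear boundary (RBF kernel)

print("linear SVM accuracy:", linear_svm.score(X, y))
print("RBF SVM accuracy:   ", rbf_svm.score(X, y))
print("support vectors per class:", rbf_svm.n_support_)
```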
Genetic Algorithm
• A genetic algorithm is an adaptive heuristic search algorithm inspired by Darwin's theory of evolution in nature.
• It is used to solve complex, time-consuming optimization problems in machine learning.
• Genetic algorithms are used in real-world applications, for example, designing electronic circuits, code-breaking, image processing, and artificial creativity.
• A genetic algorithm works through an evolutionary, generational cycle to generate high-quality solutions.
• It uses operations that either enhance or replace the population to give an improved, fitter solution.
• Five phases are used to solve complex optimization problems (see the sketch below):
• Initialization
• Fitness Assignment
• Selection
• Reproduction
• Termination
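A toy sketch of all five phases on the classic "OneMax" problem (evolve a bit-string of all 1s); population size, mutation rate, and problem size are arbitrary:

```python
import random

random.seed(0)
TARGET_LEN = 16   # toy goal: a 16-bit string of all 1s

def fitness(ind):          # fitness assignment: count the 1-bits
    return sum(ind)

# Initialization: a random population of bit-strings.
pop = [[random.randint(0, 1) for _ in range(TARGET_LEN)] for _ in range(20)]

for gen in range(50):      # generational cycle
    pop.sort(key=fitness, reverse=True)
    if fitness(pop[0]) == TARGET_LEN:
        break              # termination: perfect solution found
    parents = pop[:10]     # selection: keep the fittest half
    children = []
    for _ in range(10):    # reproduction: crossover + mutation
        a, b = random.sample(parents, 2)
        cut = random.randrange(1, TARGET_LEN)
        child = a[:cut] + b[cut:]          # single-point crossover
        if random.random() < 0.2:          # mutation: flip one random bit
            i = random.randrange(TARGET_LEN)
            child[i] = 1 - child[i]
        children.append(child)
    pop = parents + children

pop.sort(key=fitness, reverse=True)
print("best fitness:", fitness(pop[0]), "found by generation", gen)
```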
Artificial Neural Network (ANN)
• The term "Artificial Neural Network" is derived from the biological neural networks that make up the structure of the human brain.
• An ANN is a computational network inspired by these biological neural networks.
• Like the human brain, an ANN has neurons, known as nodes, linked to each other across the various layers of the network.
• An artificial neural network attempts to mimic the network of neurons that makes up the human brain, so that computers can understand things and make decisions in a human-like manner.
Artificial Neural Network
Biological Neural Network → Artificial Neural Network
• Dendrites → Inputs
• Cell nucleus → Nodes
• Synapse → Weights
• Axon → Output
Artificial Neural Network
• The human brain contains on the order of 100 billion neurons.
• Each neuron has connection points somewhere in the range of 1,000 to 100,000.
• In the human brain, data is stored in a distributed manner, and we can extract more than one piece of this data from memory in parallel when necessary.
• We can say that the human brain is made up of incredibly powerful parallel processors.
• We can understand the artificial neural network with an example:
• Consider a digital logic gate that takes inputs and gives an output.
• An "OR" gate takes two inputs. If one or both inputs are "On", the output is "On".
• If both inputs are "Off", the output is "Off".
• In the brain, unlike a fixed gate, the output-to-input relationship keeps changing, because the neurons are "learning".
Artificial Neural Network
Input Layer:
• Accepts inputs in several different formats provided by the programmer.
Hidden Layer:
• Sits in between the input and output layers.
• Performs all the calculations to find hidden features and patterns.
Output Layer:
• The input goes through a series of transformations using the hidden layers; the final result is conveyed by the output layer.
• An artificial neural network takes the inputs, computes the weighted sum of the inputs, and adds a bias.
• This computation is represented in the form of a transfer (activation) function.
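A numpy sketch of one forward pass through this layer structure; the layer sizes, random weights, and sigmoid transfer function are illustrative choices:

```python
import numpy as np

def sigmoid(z):                      # transfer (activation) function
    return 1 / (1 + np.exp(-z))

x = np.array([0.5, 0.1, 0.9])        # input layer: 3 features

W1 = np.random.default_rng(0).normal(size=(3, 4))  # input -> hidden weights
b1 = np.zeros(4)                                   # hidden-layer bias
W2 = np.random.default_rng(1).normal(size=(4, 1))  # hidden -> output weights
b2 = np.zeros(1)                                   # output-layer bias

h = sigmoid(x @ W1 + b1)   # hidden layer: weighted sum + bias, then activation
y = sigmoid(h @ W2 + b2)   # output layer conveys the final result
print(y)
```

Training (e.g., backpropagation) then adjusts the weights and biases; only the forward computation is shown here.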
Other Algorithms
• Logistic Regression
• Hierarchical Algorithms
Thank You