AIMLCZG567: AI & ML Techniques for Cyber Security
Jagdish Prasad
BITS Pilani, Pilani Campus (WILP)
Session 04: Basics for Machine Learning - II
Agenda
• Feature Extraction
• Feature Encoding, Vectorization, Normalization
• Issues: Overfitting, Underfitting, Class Imbalance
• Evaluation Metrics: Precision, Recall, F1-score
• Overview of Machine learning algorithms
• Support Vector Machine (SVM)
• Bayesian Networks
• Decision Trees
• Random Forests
• Hierarchical Algorithms
• Genetic Algorithms
• Similarity Algorithms
• Artificial Neural Networks (ANN)
Feature Extraction
Feature Extraction
• Machines only understand numerical data.
• Thus text data 'as-is' cannot be used as input to a machine learning algorithm.
• The process of converting text data into numbers is called Feature Extraction (also called text vectorization).
• In NLP, feature extraction is an important step toward a better understanding of the context of what we are dealing with.
Feature Extraction
• Feature extraction aims to reduce the number of features in a dataset by creating new features from the existing ones (and then discarding the original features).
• This new, reduced set of features should be able to summarize most of the information contained in the original set of features.
• A summarized version of the original features can be created from a combination of the original set.
Feature Extraction Techniques
• One-Hot Encoding
• Bag of Words (BoW)
• N-grams
• TF-IDF
• Custom features
• Word2Vec (Word Embedding)
Commonly Used Terms
• Corpus (c): the collection of all words present in the whole dataset.
• Vocabulary (V): the total number of unique words available in the corpus.
• Document (D): a single record or review out of the multiple records in a dataset.
• Word (w): an individual word used in a document.
One Hot Encoding
• One-hot encoding converts each word of a document into a V-dimensional vector.
• Example:
• We have the documents "We are learning Natural Language Processing", "We are learning Data Science", and "Natural Language Processing comes under Data Science".
• Corpus: We are learning Natural Language Processing, We are learning Data Science, Natural Language Processing comes under Data Science
• Vocabulary (unique words): We, are, learning, Natural, Language, Processing, Data, Science, comes, under (V = 10)
• Simplest but not very effective technique; not much in use.
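A minimal sketch of word-level one-hot encoding for the three documents above; plain Python, written here only to make the idea concrete:

```python
docs = [
    "We are learning Natural Language Processing",
    "We are learning Data Science",
    "Natural Language Processing comes under Data Science",
]

# Build the vocabulary: the V unique words in the corpus.
vocab = sorted({w for d in docs for w in d.split()})

def one_hot(word):
    """Return a V-dimensional vector with a 1 at the word's index."""
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

# Each document becomes a (num_words x V) list of one-hot vectors,
# so documents of different lengths yield differently sized encodings.
encoded = [[one_hot(w) for w in d.split()] for d in docs]
print(len(vocab))        # V = 10
print(encoded[0][0])     # one-hot vector for the first word of doc 1
```

Note how the per-document encoding size varies with document length, which is one of the disadvantages listed on the next slide.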
One Hot Encoding: Example
Advantages
• Intuitive.
• Easy to implement.
Disadvantages
• It creates sparsity.
• The size of each document's encoding after one-hot encoding may be different.
• Out of Vocabulary (OOV) problem: words unseen during training cannot be encoded.
• Does not capture semantic meaning.
Bag of Words
• A representation of text that describes the frequency of words within a document.
• Especially used in text classification tasks.
• We can directly use the CountVectorizer class from scikit-learn (see the sketch below).
• One of the most used text vectorization techniques.
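A short sketch using scikit-learn's CountVectorizer on the same toy documents from the one-hot example:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "We are learning Natural Language Processing",
    "We are learning Data Science",
    "Natural Language Processing comes under Data Science",
]

cv = CountVectorizer()             # counts word occurrences per document
X = cv.fit_transform(docs)         # sparse (3 x V) document-term matrix

print(cv.get_feature_names_out())  # the learned vocabulary
print(X.toarray())                 # every document now has the same length V
```

Unlike one-hot encoding, every document maps to one fixed-length count vector, which is the advantage noted on the next slide.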
Bag of Words
Advantages
• Simple and intuitive.
• The size of each document's representation after BoW is the same.
Disadvantages
• BoW also creates sparsity.
• Does not take word order into account.
TF-IDF: Term Frequency and Inverse Document Frequency
• TF-IDF is a statistical measure that evaluates how relevant a word is to a document in a collection of documents.
• Term Frequency (TF):
• The number of times a word appears in a document divided by the total number of words in that document; 0 < TF < 1.
• Inverse Document Frequency (IDF):
• The logarithm of the number of documents in the corpus divided by the number of documents where the specific term appears.
• Scikit-learn uses the formula idf(t) = log(N / n_t) + 1 (with smoothing disabled), as in the sketch below.
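A sketch of scikit-learn's TfidfVectorizer on the same toy corpus; smooth_idf=False is set so that the idf(t) = log(N / n_t) + 1 form quoted above is used:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "We are learning Natural Language Processing",
    "We are learning Data Science",
    "Natural Language Processing comes under Data Science",
]

tfidf = TfidfVectorizer(smooth_idf=False)  # idf(t) = log(N / n_t) + 1
X = tfidf.fit_transform(docs)              # sparse (3 x V) weighted matrix

print(tfidf.get_feature_names_out())
print(X.toarray().round(2))  # rarer words receive higher weights than common ones
```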
TF-IDF: Term Frequency and Inverse Document Frequency
Advantages
• Widely used technique in information retrieval, e.g., search engines.
Disadvantages
• Sparsity.
• Dimensionality increases with a large dataset, slowing down the algorithm.
• Does not capture semantic meaning.
Custom Features
• Creating new custom features using domain knowledge.
• Examples:
• Number of occurrences of a given word in the document.
• Number of negative words in the document.
• Ratio of positive reviews to negative reviews.
• Word count.
• Character count.
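A small pandas sketch of such hand-crafted features; the review texts and the negative-word list are made-up examples for illustration:

```python
import pandas as pd

reviews = pd.DataFrame({
    "text": ["great product, loved it", "bad quality, terrible support"]
})
negative_words = {"bad", "terrible", "poor"}  # hypothetical domain lexicon

reviews["word_count"] = reviews["text"].str.split().str.len()
reviews["char_count"] = reviews["text"].str.len()
reviews["negative_count"] = reviews["text"].apply(
    lambda t: sum(w.strip(",.") in negative_words for w in t.lower().split())
)
print(reviews)
```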
Word2Vec
Word Embeddings
• Word embedding is a term used for the representation of words for text analysis, typically in the form of a real-valued vector that encodes the meaning of the word such that words closer in the vector space are expected to be similar in meaning.
• Example: Boy : Man vs. Boy : Table — which pair has words more similar to each other?
• It is easy for humans to understand such associations between words in a language.
• Word embeddings help machines capture this kind of relation in language automatically.
Word2Vec
Word Embedding Types:
• Frequency-based (count the frequency of words)
• BoW
• TF-IDF
• GloVe (based on Matrix Factorization)
• Prediction-based
• Word2Vec
Word2Vec
• Word2Vec is a deep-learning-based technique.
• Word2Vec is a word embedding technique that converts a given word into a vector, i.e., a collection of numbers.
• Why Word2Vec?
• Word2Vec captures semantic meaning, e.g., "happiness" and "joy" have similar meanings.
• Word2Vec creates low-dimensional vectors.
• Word2Vec creates dense vectors (mostly non-zero values).
• Two approaches to using Word2Vec (see the sketch below):
• Use a pre-trained model
• Train your own model
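A minimal self-trained Word2Vec sketch, assuming the gensim package is available; the tiny corpus and all hyperparameter values are illustrative only:

```python
from gensim.models import Word2Vec  # assumes gensim is installed

sentences = [
    ["we", "are", "learning", "natural", "language", "processing"],
    ["we", "are", "learning", "data", "science"],
    ["natural", "language", "processing", "comes", "under", "data", "science"],
]

# Self-trained model: each word becomes a dense 50-dimensional vector.
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, epochs=200)

print(model.wv["science"][:5])            # dense, low-dimensional vector
print(model.wv.most_similar("language"))  # nearest words in the vector space
```

In practice, a pre-trained model (e.g., vectors trained on a large news corpus) is usually preferred over training on a corpus this small.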
Feature Encoding, Vectorization
and Normalization
BITS Pilani, Pilani Campus
What is Feature Encoding?
• The process of transforming categorical values of the relevant features into numerical values is called feature encoding.
• Many data frame analytics tools perform feature encoding automatically.
• The input data is pre-processed with the following encoding techniques:
• One-Hot encoding: assigns a vector to each category. The vector components represent whether the corresponding category is present (1) or not (0).
• Target-Mean encoding: replaces categorical values with the mean value of the target variable.
• Frequency encoding: takes into account how many times a given categorical value is present in relation to a feature.
Label/Ordinal Encoder
• Label Encoder and Ordinal Encoder encode categories directly into numerical values.
• Label Encoder is used for nominal categorical variables (categories without order, e.g., red, green, blue).
• Ordinal Encoder is used for ordinal categorical variables (categories with order, e.g., small, medium, large).
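A short scikit-learn sketch of both encoders on made-up category values:

```python
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

colors = ["red", "green", "blue", "green"]             # nominal
sizes = [["small"], ["large"], ["medium"], ["small"]]  # ordinal (2-D input)

# LabelEncoder assigns integers in alphabetical order: blue=0, green=1, red=2.
print(LabelEncoder().fit_transform(colors))   # [2 1 0 1]

# OrdinalEncoder lets us fix the order explicitly: small < medium < large.
enc = OrdinalEncoder(categories=[["small", "medium", "large"]])
print(enc.fit_transform(sizes))               # [[0.] [2.] [1.] [0.]]
```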
One Hot / Dummy Encoding
• In One-Hot Encoding and Dummy Encoding, the categorical column is split into multiple columns consisting of ones and zeros.
• This addresses a drawback of Label and Ordinal Encoding: because the encoded data is represented as multiple Boolean columns, the columns are read as categorical indicators and no artificial ordering is imposed.
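A pandas sketch contrasting the two; the color column is a made-up example:

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "green"]})

# One-hot: one Boolean column per category (n categories -> n columns).
print(pd.get_dummies(df, columns=["color"]))

# Dummy: drop the first level, so n categories -> n-1 columns.
print(pd.get_dummies(df, columns=["color"], drop_first=True))
```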
Count/Frequency Encoding
• Count Encoding and Frequency Encoding encode categorical variables as the count of occurrences and the frequency (proportion) of occurrences, respectively.
• They utilize the frequency of the categories as labels.
• In cases where the frequency is related to the target variable, this helps the model understand and assign weights in direct or inverse proportion, depending on the nature of the data.
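A minimal pandas sketch of both variants on an invented city column:

```python
import pandas as pd

df = pd.DataFrame({"city": ["Pilani", "Delhi", "Pilani", "Mumbai", "Pilani", "Delhi"]})

counts = df["city"].value_counts()                   # occurrences per category
df["city_count"] = df["city"].map(counts)            # count encoding
df["city_freq"] = df["city"].map(counts / len(df))   # frequency encoding
print(df)
```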
Binary/BaseN Encoding
• Binary Encoding first encodes categorical variables as integers, then converts those integers to binary code.
• The output is similar to One-Hot Encoding, but fewer columns are created.
• This addresses a drawback of One-Hot Encoding: a cardinality of n results not in n columns but in about log2(n) columns.
• BaseN Encoding follows the same idea but uses other base values instead of 2, resulting in about logN(n) columns.
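A hand-rolled sketch of binary encoding in pandas (libraries such as category_encoders offer this directly; the manual version below just makes the two steps explicit):

```python
import pandas as pd

df = pd.DataFrame({"color": ["red", "green", "blue", "yellow"]})

# Step 1: label-encode the categories into integers 0..n-1.
codes = df["color"].astype("category").cat.codes

# Step 2: write each integer in binary, one column per bit
# (ceil(log2(n)) columns instead of n one-hot columns).
n_bits = int(codes.max()).bit_length()
for bit in range(n_bits):
    df[f"color_bit{bit}"] = (codes >> bit) & 1
print(df)   # 4 categories -> only 2 bit columns
```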
Target/Mean Encoding
• Target encoding is similar to label encoding, except that here the labels are correlated directly with the target.
• In target encoding, the label for each category of the feature is the mean value of the target variable computed on the training data.
• The advantages of target encoding are that it does not inflate the volume of the data and that it helps in faster learning.
• Target Encoding (Mean Encoding) is a very popular encoding approach.
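A minimal pandas sketch of target (mean) encoding; the city/target toy data is invented:

```python
import pandas as pd

train = pd.DataFrame({
    "city":   ["Pilani", "Delhi", "Pilani", "Mumbai", "Delhi", "Pilani"],
    "target": [1, 0, 1, 0, 1, 0],
})

# Mean of the target per category, computed on training data only
# (applying it to unseen data with the same mapping avoids leakage).
means = train.groupby("city")["target"].mean()
train["city_te"] = train["city"].map(means)
print(train)   # Pilani -> 0.67, Delhi -> 0.5, Mumbai -> 0.0
```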
Feature Vector
• A feature vector is an ordered list of numerical properties of observed
phenomena. It represents input features to a machine learning model that
makes a prediction.
• Humans can analyze qualitative data to make a decision.
• Example: we see the cloudy sky, feel the damp breeze, and decide to take an
umbrella when going outside.
• However, machine learning models can only deal with quantitative data.
• We must always convert features of observed phenomena into numerical
values and feed them into a machine learning model in the same order.
• We must represent features in feature vectors.
Feature Scaling
• Feature scaling is a data pre-processing technique that involves
transforming the values of features or variables in a dataset to a similar
scale.
• This is done to ensure that all features contribute equally to the model
and to prevent features with larger values from dominating the model.
• Feature scaling is essential when working with datasets where the
features have different ranges, units of measurement, or orders of
magnitude.
• Common feature scaling techniques include standardization,
normalization, and min-max scaling.
• Feature scaling transforms the data to a more consistent scale, making
it easier to build accurate and effective machine learning models.
Feature Normalization
• Normalization is a feature scaling technique in which values are
shifted and rescaled so that they end up ranging between 0 and
1 (also known as Min-Max scaling).
• Normalization is done as part of data pre-processing to adjust
the values of features in a dataset to a common scale.
• Reduce the impact of different scales on the accuracy of
machine learning models.
• Formula for normalization (Min-Max scaling):
  x_norm = (x − x_min) / (x_max − x_min)
Feature Standardization
• Standardization is a feature scaling technique where the values are centered around the mean with a unit standard deviation.
• Under standardization, the mean of the attribute becomes zero, and the resulting distribution has a unit standard deviation.
• Formula for standardization:
  x_std = (x − μ) / σ
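A short scikit-learn sketch of both scalers on a single toy feature:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

X = np.array([[1.0], [5.0], [10.0]])   # one feature, three samples

# Min-max scaling: values end up in [0, 1].
print(MinMaxScaler().fit_transform(X).ravel())    # [0.    0.444 1.   ]

# Standardization: zero mean, unit standard deviation.
print(StandardScaler().fit_transform(X).ravel())
```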
Underfit / Overfit Models
Underfitting
• A machine learning model is said to underfit when it cannot capture the underlying trend of the data; it performs poorly on the training data itself, and consequently on testing data as well.
• Underfitting destroys the accuracy of our machine learning model.
• It means that the model or the algorithm does not fit the data well enough.
• It usually happens when we have too little data to build an accurate model, or when we try to fit a linear model to non-linear data.
• Underfitting can be reduced by using more data and by increasing the model's complexity (e.g., adding more informative features).
• Reasons for underfitting:
– High bias and low variance.
– The size of the training dataset used is not enough.
– The model is too simple.
– The training data is not cleaned and contains noise.
Overfitting
• A machine learning model is said to be overfitted when it fits the training data too closely and, as a result, does not make accurate predictions on testing data.
• When a model is trained on a lot of data, it starts learning from the noise and inaccurate entries in the data set.
• During testing, the model then does not categorize the data correctly, because of too many details and noise.
• Overfitting is often caused by non-parametric and non-linear methods, because these machine learning algorithms have more freedom in building the model from the dataset and can therefore build unrealistic models.
• Using a linear algorithm for linear data, or constraining hyperparameters such as the maximal depth of decision trees, helps avoid overfitting.
• Reasons for overfitting:
– High variance and low bias.
– The model is too complex.
– The size of the training data is large.
Challenges of Imbalanced Classes
• Electricity theft (the third-largest form of theft) is one of the main challenges faced by the utility industry today.
• Advanced analytics and machine learning algorithms are used to identify consumption patterns that indicate theft.
• The biggest challenge here is the humongous data and its skewed distribution.
• Fraudulent transactions are significantly fewer than normal, healthy transactions, typically around 1-2% of the total number of observations.
• The ask is to improve identification of the rare minority class, as opposed to achieving higher overall accuracy.
• Machine learning algorithms tend to produce unsatisfactory classifiers when faced with imbalanced datasets (see the sketch below).
• For an imbalanced data set, if the event to be predicted belongs to the minority class and the event rate is less than 5%, it is referred to as a rare event.
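A tiny synthetic sketch of why overall accuracy misleads on imbalanced data; the ~2% positive rate mirrors the fraud figure above:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.choice([0, 1], size=10_000, p=[0.98, 0.02])  # ~2% minority (fraud) class
print("fraud rate:", y.mean())

# A useless baseline that always predicts the majority class
# is ~98% accurate, yet catches zero fraud cases.
majority = np.zeros_like(y)
print("accuracy of always-majority:", (majority == y).mean())
```

This is why the metrics on the following slides (precision, recall, F1) matter more than accuracy for rare-event problems.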
Evaluation Metrics
• Evaluation metrics are quantitative measures used to assess the performance
and effectiveness of a Machine Learning model.
• Metrics indicate how well a model is performing and help in comparing different
models or algorithms.
• Evaluation metrics provide objective criteria to evaluate a Machine Learning
model for its:
• Predictive ability
• Generalization capability
• Overall quality
• Choice of evaluation metrics depends on the specific problem domain, the type
of data, and the desired outcome.
Evaluation Metrics: Term Definitions
• True Positives (TP): Predicted Value = Yes, Real Value = Yes
• True Negatives (TN): Predicted Value = No, Real Value = No
• False Positives (FP): Predicted Value = Yes, Real Value = No
• False Negatives (FN): Predicted Value = No, Real Value = Yes
• Accuracy: ratio of correct predictions to the total number of predictions.
• Positive Predictive Value (Precision): ratio of predicted positive cases that were correctly identified.
• Negative Predictive Value: ratio of predicted negative cases that were correctly identified.
• Sensitivity (Recall): ratio of actual positive cases that are correctly identified.
• Specificity: ratio of actual negative cases that are correctly identified.
Precision & Recall
Precision
• Precision is a measure of a model’s performance that tells how many of the
positive predictions made by the model are actually correct.
• It is calculated as the number of true positive predictions divided by the number
of true positive and false positive predictions.
Precision = TP / (TP + FP)
Recall
• Lower recall and higher precision give better accuracy but then it misses a large
number of instances.
• The more the F1 score better will be performance. It can be expressed
mathematically in this way:
Recall = TP / (TP + FN)
BITS Pilani, Pilani Campus
F1-Score
• F1-Score is the harmonic mean of precision and recall values for a classification problem. The formula for F1-Score is as follows:
  F1 = 2 × (Precision × Recall) / (Precision + Recall)
• Its range is 0 to 1.
• F1-Score tells how precise (correctly classifies how many instances) and robust
(does not miss any significant number of instances) the classifier is.
• Harmonic Mean punishes extreme values more.
• Example: Assume a binary classification model with the following results:
• Precision: 0, Recall: 1
• If we take the arithmetic mean, we get 0.5, which looks acceptable; yet the result comes from a classifier that ignores the input and always predicts one of the classes.
• If we take the harmonic mean instead, we get 0, which is accurate, as this model is useless for all practical purposes.
F1-Score
• The F1 score gives the same importance to both recall and precision.
• If we want to give more weight to one of them, a generalized Fβ score can be calculated, where the weightage β expresses how many times more important recall is than precision:
  Fβ = (1 + β²) × (Precision × Recall) / (β² × Precision + Recall)
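A sketch computing these metrics with scikit-learn on a made-up prediction vector:

```python
from sklearn.metrics import precision_score, recall_score, f1_score, fbeta_score

y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]   # 3 TP, 1 FP, 1 FN

print("precision:", precision_score(y_true, y_pred))      # TP/(TP+FP) = 3/4
print("recall:   ", recall_score(y_true, y_pred))          # TP/(TP+FN) = 3/4
print("F1:       ", f1_score(y_true, y_pred))              # harmonic mean
print("F2:       ", fbeta_score(y_true, y_pred, beta=2))   # beta=2 favours recall
```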
ML Algorithms
Data Nomenclature
Attributes x1-x6 and output Y; each row of the dataset is an instance.

Country   | x1: GDP (Trillion USD) | x2: Per Capita GDP ('000 USD) | x3: Human Development Index | x4: Mean Life Expectancy | x5: Poverty Index (Gini as %age) | x6: Household Income ('000 USD) | Y: Dev/UnderDev
Canada    | 1.577  | 39.17 | 0.908 | 80.7 | 32.6 | 67.293 | D
China     | 5.878  | 7.54  | 0.687 | 73   | 46.9 | 10.22  | U
India     | 1.632  | 3.41  | 0.547 | 64.7 | 36.8 | 0.735  | U
Russia    | 1.48   | 19.84 | 0.755 | 65.5 | 39.9 | 0.72   | U
Singapore | 0.223  | 56.69 | 0.866 | 80   | 42.5 | 67.1   | D
USA       | 14.527 | 46.86 | 0.91  | 78.3 | 40.8 | 84.3   | D
…         | …      | …     | …     | …    | …    | …      | …

[Ref: en.wikipedia.org]
Example Problem
• Given input x, compute output y.
• Example: compute the price of a house from attributes such as:
• Size of house
• Number of bedrooms
• Construction age
• Locality
• Segment: affordable, premium, luxury
• …
• A line with intercept w0 and slope w1 models this as y = w0 + w1·x (red line); a line through the origin is y = w·x (violet line).
• In general, y = f(x); the learned function y = h(x) is called the hypothesis.
Example Problem
• A generalized linear regression model is a linear combination of the input variables and parameters:
  h(x) = w0 + w1x1 + w2x2 + … + wnxn
• Key components of this model:
• Parameters w0, w1, …, wn (called weights)
• Input variables x1, x2, …, xn (called attributes or features)
• The equation can be simplified in vector form:
  h(x) = w0x0 + w1x1 + w2x2 + … + wnxn, where x0 = 1
  If W = [w0 w1 w2 … wn] and X = [x0 x1 x2 … xn], then h(x) = Wᵀ·X
• This is an example of the Linear Regression algorithm.
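A small numpy sketch of the hypothesis h(x) = WᵀX, fitting W by least squares on made-up housing-style data:

```python
import numpy as np

# Hypothetical data: one feature (area), with x0 = 1 prepended for the intercept.
X = np.array([[1.0, 50], [1.0, 80], [1.0, 120], [1.0, 200]])
y = np.array([150, 220, 320, 510])   # invented prices

# Closed-form least squares gives the weight vector W = [w0, w1].
W, *_ = np.linalg.lstsq(X, y, rcond=None)
print("w0 (intercept), w1 (slope):", W)

# h(x) = W^T . X for a new house of area 100.
print("prediction:", np.array([1.0, 100]) @ W)
```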
Linear Regression
(Figure 4-2: Linear Regression model predictions)
Gradient Descent
• A generic optimization algorithm capable of finding optimal solutions to a problem.
• Gradient Descent tweaks parameters iteratively in order to minimize the error or cost function (e.g., the MSE), using the update rule θ ← θ − η·∇θ J(θ).
• Concretely, you start by filling θ with random values (random initialization) and then improve it gradually, taking one baby step at a time, until the algorithm converges to a minimum (Figure 4-3).
• It measures the local gradient of the cost function with regard to the parameter vector θ and goes in the direction of descending gradient. Once the gradient is zero, you have reached a minimum.
• The learning rate η determines the size of the steps:
– If the learning rate is too small, the algorithm will have to go through many iterations to converge, which takes a long time (Figure 4-4).
– If the learning rate is too high, you might jump across the valley and end up on the other side, possibly even higher than before; the algorithm may diverge, skipping the minimum and failing to find a good solution (Figure 4-5).
• Finally, not all cost functions look like nice regular bowls. There may be holes, ridges, plateaus, and all sorts of irregular terrain, making convergence to the minimum difficult.
Gradient Descent: Example
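A minimal batch-gradient-descent sketch for linear regression on synthetic data; the data-generating line y = 4 + 3x and all hyperparameter values are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(42)
X = 2 * rng.random((100, 1))                    # feature in [0, 2]
y = 4 + 3 * X + rng.standard_normal((100, 1))   # y = 4 + 3x + noise

Xb = np.c_[np.ones((100, 1)), X]     # prepend x0 = 1 to each instance
eta, n_iterations = 0.1, 1000        # learning rate and step budget
theta = rng.standard_normal((2, 1))  # random initialization

for _ in range(n_iterations):
    gradients = 2 / 100 * Xb.T @ (Xb @ theta - y)  # gradient of the MSE
    theta = theta - eta * gradients                # step down the gradient

print(theta.ravel())   # should be close to [4, 3]
```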
Naïve Bayes Algorithm
• Naïve Bayes classifier is a popular supervised machine learning algorithm used
for classification tasks such as text classification.
• Belongs to the family of generative learning algorithms, which means that it
models the distribution of inputs for a given class or category.
• Based on the assumption that the features of the input data are conditionally
independent given the class, allowing the algorithm to make predictions quickly
and accurately.
• Naive Bayes classifiers are among the simplest Bayesian network models, yet
they can achieve high accuracy levels.
• An NB model is easy to build and particularly useful for very large data sets.
• In many applications, Naive Bayes performs surprisingly well, sometimes matching far more sophisticated classification methods.
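A minimal Naïve Bayes text-classification sketch with scikit-learn; the spam/ham toy data is invented for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["win a free prize now", "meeting at noon today",
         "free money click now", "project status update"]
labels = [1, 0, 1, 0]   # 1 = spam, 0 = ham

# Bag-of-words features feeding a multinomial Naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["free prize today"]))   # likely [1] (spam)
```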
Decision Tree
• Decision Tree is a supervised learning technique that can be used for both classification and regression problems, but it is mostly preferred for solving classification problems.
• It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents the outcome.
• A decision tree contains two types of nodes: decision nodes and leaf nodes.
• Decision nodes are used to make decisions and have multiple branches, whereas leaf nodes are the outputs of those decisions and do not contain any further branches.
• The decisions or tests are performed on the basis of the features of the given dataset.
• It is a graphical representation for getting all the possible solutions to a problem/decision based on given conditions.
• To build a tree, we use the CART algorithm, which stands for Classification And Regression Tree.
• A decision tree simply asks a question and, based on the answer (Yes/No), splits further into subtrees.
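A short scikit-learn sketch (scikit-learn's trees use an optimized CART, as the slide notes); the iris dataset stands in for any classification data:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# Limiting the depth constrains the tree and helps curb overfitting.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# The learned rules: internal nodes test features, leaves give the class.
print(export_text(tree, feature_names=load_iris().feature_names))
```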
Random Forests
• Random Forest is a popular machine learning algorithm that belongs to the supervised learning techniques. It can be used for both classification and regression problems in ML.
• It is based on the concept of ensemble learning: combining multiple classifiers to solve a complex problem and to improve the performance of the model.
• As the name suggests, a Random Forest is a classifier that contains a number of decision trees built on various subsets of the given dataset, and it averages them to improve the predictive accuracy on that dataset.
• Instead of relying on one decision tree, the random forest takes the prediction from each tree and, based on the majority vote of those predictions, predicts the final output.
• A greater number of trees in the forest generally leads to higher accuracy and reduces the problem of overfitting.
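A scikit-learn sketch of the majority-vote ensemble just described, again using iris as placeholder data:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 trees, each trained on a bootstrap sample with random feature subsets;
# the final class is decided by majority vote across the trees.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)
print("test accuracy:", rf.score(X_test, y_test))
```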
K-Means Clustering
• K-Means Clustering is an unsupervised learning algorithm that groups an unlabelled dataset into different clusters.
• K defines the number of pre-defined clusters to be created in the process.
• It clusters the data into different groups, discovering the categories in the unlabelled dataset on its own, without the need for any labels.
• It is a centroid-based algorithm, where each cluster is associated with a centroid.
• The main aim of the algorithm is to minimize the sum of distances between the data points and their corresponding cluster centroids.
• The K-Means clustering algorithm mainly performs two tasks:
• Determines the best positions for the K center points (centroids) by an iterative process.
• Assigns each data point to its closest centroid; the data points near a particular centroid form a cluster.
• Each cluster contains data points with some commonalities and is away from the other clusters.
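A scikit-learn sketch on synthetic unlabelled blobs; the three cluster centers are arbitrary:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(0, 0.5, (50, 2)),        # blob around (0, 0)
    rng.normal(5, 0.5, (50, 2)),        # blob around (5, 5)
    rng.normal([0, 5], 0.5, (50, 2)),   # blob around (0, 5)
])

km = KMeans(n_clusters=3, n_init=10, random_state=0)  # K = 3 pre-defined clusters
labels = km.fit_predict(X)              # each point -> its nearest centroid

print(km.cluster_centers_.round(1))     # recovered centroids
print(np.bincount(labels))              # points per cluster
```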
Support Vector Machine (SVM)
• SVM is one of the most popular supervised learning algorithms; it is used primarily for classification, as well as for regression problems.
• The SVM algorithm creates the best line or decision boundary (hyperplane) that can segregate the n-dimensional space into classes, so that a new data point can easily be put into the correct category in the future.
• SVM chooses the extreme points/vectors, called support vectors, to create the hyperplane.
• Linear SVM: used for linearly separable data. If a dataset can be classified into two classes using a single straight line, the data is termed linearly separable, and the classifier used is called a Linear SVM classifier.
• Non-linear SVM: used for non-linearly separable data. If a dataset cannot be classified using a straight line, the data is termed non-linear, and the classifier used is called a Non-linear SVM classifier.
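A scikit-learn sketch contrasting the two variants on a standard non-linearly separable toy dataset:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# Two interleaving half-moons: not separable by a straight line.
X, y = make_moons(n_samples=200, noise=0.15, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)  # straight-line boundary
rbf_svm = SVC(kernel="rbf").fit(X, y)        # non-linear boundary (RBF kernel)

print("linear SVM accuracy:", linear_svm.score(X, y))
print("RBF SVM accuracy:   ", rbf_svm.score(X, y))
print("support vectors per class:", rbf_svm.n_support_)
```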
Genetic Algorithm
• A genetic algorithm is an adaptive heuristic search algorithm inspired by Darwin's theory of evolution in nature.
• It is used to solve complex, time-consuming optimization problems in machine learning.
• Genetic algorithms are used in real-world applications, for example, designing electronic circuits, code-breaking, image processing, and artificial creativity.
• A genetic algorithm works through an evolutionary, generational cycle to generate high-quality solutions.
• It uses operations that either enhance or replace the population to give an improved, fitter solution.
• Five phases are used to solve complex optimization problems (see the sketch below):
• Initialization
• Fitness Assignment
• Selection
• Reproduction
• Termination
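A toy sketch of all five phases on the classic "OneMax" problem (evolve a bit-string of all 1s); population size, mutation rate, and problem size are arbitrary:

```python
import random

random.seed(0)
TARGET_LEN = 16   # toy goal: a 16-bit string of all 1s

def fitness(ind):          # fitness assignment: count the 1-bits
    return sum(ind)

# Initialization: a random population of bit-strings.
pop = [[random.randint(0, 1) for _ in range(TARGET_LEN)] for _ in range(20)]

for gen in range(50):      # generational cycle
    pop.sort(key=fitness, reverse=True)
    if fitness(pop[0]) == TARGET_LEN:
        break              # termination: perfect solution found
    parents = pop[:10]     # selection: keep the fittest half
    children = []
    for _ in range(10):    # reproduction: crossover + mutation
        a, b = random.sample(parents, 2)
        cut = random.randrange(1, TARGET_LEN)
        child = a[:cut] + b[cut:]          # single-point crossover
        if random.random() < 0.2:          # mutation: flip one random bit
            i = random.randrange(TARGET_LEN)
            child[i] = 1 - child[i]
        children.append(child)
    pop = parents + children

pop.sort(key=fitness, reverse=True)
print("best fitness:", fitness(pop[0]), "found by generation", gen)
```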
Artificial Neural Network (ANN)
• The term "Artificial Neural Network" is derived from the biological neural networks that make up the structure of the human brain.
• An ANN is a computational network inspired by these biological neural networks.
• Like the human brain, an ANN has neurons, known as nodes, linked to each other across the various layers of the network.
• An artificial neural network attempts to mimic the network of neurons that makes up the human brain, so that computers can understand things and make decisions in a human-like manner.
Artificial Neural Network
Biological Neural Network → Artificial Neural Network
• Dendrites → Inputs
• Cell nucleus → Nodes
• Synapse → Weights
• Axon → Output
Artificial Neural Network
• The human brain contains on the order of 100 billion neurons.
• Each neuron has connection points somewhere in the range of 1,000 to 100,000.
• In the human brain, data is stored in a distributed manner, and we can extract more than one piece of this data from memory in parallel when necessary.
• We can say that the human brain is made up of incredibly powerful parallel processors.
• We can understand the artificial neural network with an example:
• Consider a digital logic gate that takes inputs and gives an output.
• An "OR" gate takes two inputs. If one or both inputs are "On", the output is "On".
• If both inputs are "Off", the output is "Off".
• In the brain, unlike a fixed gate, the output-to-input relationship keeps changing, because the neurons are "learning".
Artificial Neural Network
Input Layer:
• Accepts inputs in several different formats provided by the programmer.
Hidden Layer:
• Sits in between the input and output layers.
• Performs all the calculations to find hidden features and patterns.
Output Layer:
• The input goes through a series of transformations using the hidden layers; the final result is conveyed by the output layer.
• An artificial neural network takes the inputs, computes the weighted sum of the inputs, and adds a bias.
• This computation is represented in the form of a transfer (activation) function.
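A numpy sketch of one forward pass through this layer structure; the layer sizes, random weights, and sigmoid transfer function are illustrative choices:

```python
import numpy as np

def sigmoid(z):                      # transfer (activation) function
    return 1 / (1 + np.exp(-z))

x = np.array([0.5, 0.1, 0.9])        # input layer: 3 features

W1 = np.random.default_rng(0).normal(size=(3, 4))  # input -> hidden weights
b1 = np.zeros(4)                                   # hidden-layer bias
W2 = np.random.default_rng(1).normal(size=(4, 1))  # hidden -> output weights
b2 = np.zeros(1)                                   # output-layer bias

h = sigmoid(x @ W1 + b1)   # hidden layer: weighted sum + bias, then activation
y = sigmoid(h @ W2 + b2)   # output layer conveys the final result
print(y)
```

Training (e.g., backpropagation) then adjusts the weights and biases; only the forward computation is shown here.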
Other Algorithms
• Logistic Regression
• Hierarchical Algorithms
Thank You