Deep Learning - Unit 1 Notes
I) Introduction to Neural Networks
Neural Networks are a subset of Artificial Intelligence (AI) and Machine
Learning (ML) that are designed to mimic the way the human brain
processes information. They consist of interconnected layers of nodes,
also called neurons, which work together to analyse and interpret data.
1. Structure:
A neural network typically has three types of layers:
o Input Layer: Accepts input features.
o Hidden Layers: Perform computations and extract patterns
by applying weights, biases, and activation functions.
o Output Layer: Produces the final output or prediction.
2. Learning Mechanism:
Neural networks learn through a process called training, where
they adjust the weights and biases using optimization techniques
like gradient descent to minimize the error (loss function).
3. Applications:
They are widely used in areas such as image and speech
recognition, natural language processing, robotics, and predictive
analytics.
4. Types of Neural Networks:
o Feedforward Neural Networks
o Convolutional Neural Networks (CNNs) for image data
o Recurrent Neural Networks (RNNs) for sequential data
Neural networks are powerful tools due to their ability to handle large
datasets, detect complex patterns, and improve performance through
training.
Working of Neural Networks
Neural networks process data through interconnected layers of neurons.
Each neuron applies a weighted sum of inputs, adds a bias, and passes
the result through an activation function to introduce non-linearity. The
network adjusts weights during training to minimize prediction errors.
Forward Propagation
1. Input Layer: Accepts input features and passes them to the next
layer.
2. Hidden Layers: Perform weighted computations z=∑(w⋅x)+b and
apply an activation function (e.g., ReLU, sigmoid) to generate outputs.
3. Output Layer: Produces predictions by transforming the final
computed values.
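The computation in step 2 can be sketched for a single neuron as follows. This is a minimal illustration, assuming a sigmoid activation; the function names and the input values are made up for the example and do not come from the notes.

```python
import math

def sigmoid(z):
    """Logistic activation: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron_forward(inputs, weights, bias):
    """Compute z = sum(w * x) + b, then apply the activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)

# Illustrative values only (assumed, not from the notes).
output = neuron_forward(inputs=[1.0, 2.0], weights=[0.5, -0.25], bias=0.0)
print(output)  # z = 0.5 - 0.5 + 0.0 = 0, and sigmoid(0) = 0.5
```

A full layer repeats this computation once per neuron, each with its own weights and bias.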
Backpropagation
1. Error Calculation: Compares the output with the true label using a
loss function (e.g., Mean Squared Error, Cross-Entropy).
2. Gradient Computation: Calculates the gradient of the loss with
respect to weights using the chain rule of calculus.
3. Weight Update: Adjusts weights in the network using optimization
techniques like Gradient Descent to minimize error:
w ← w − η · ∂L/∂w
(where η is the learning rate and L is the loss).
This iterative process ensures the network learns patterns in data
effectively.
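The gradient descent update can be tried on the smallest possible model. The sketch below assumes a one-weight model ŷ = w·x with a squared-error loss; the model, the data point (x = 2, y = 4), and the learning rate are illustrative choices, not part of the notes.

```python
def loss(w, x, y):
    """Squared error of the one-weight model y_hat = w * x."""
    return (w * x - y) ** 2

def grad(w, x, y):
    """Derivative of the loss with respect to w: 2 * (w*x - y) * x."""
    return 2.0 * (w * x - y) * x

def gradient_descent(w, x, y, eta, steps):
    """Repeat the update w <- w - eta * dL/dw a fixed number of times."""
    for _ in range(steps):
        w = w - eta * grad(w, x, y)
    return w

# Assumed example: the loss is smallest at w = 2, because 2 * 2 = 4.
w_final = gradient_descent(w=0.0, x=2.0, y=4.0, eta=0.1, steps=20)
print(w_final)  # moves from 0 toward 2
```

With a learning rate that is too large the same loop overshoots and diverges, which is why η is usually kept small.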
II) Deep Learning
Deep Learning is a subset of Machine Learning (ML) that focuses on
training models with multiple layers of artificial neural networks to
automatically extract hierarchical patterns and representations from data.
It is inspired by the structure and functioning of the human brain.
Key Characteristics
1. Deep Architectures: Deep learning models use multiple hidden
layers in neural networks to learn progressively complex features.
For example, in image recognition:
o Early layers detect edges.
o Middle layers detect shapes and textures.
o Deeper layers identify objects.
2. Representation Learning: Unlike traditional ML, which requires
manual feature extraction, deep learning automatically learns
relevant features from raw data.
Working of Deep Learning
1. Input Layer: Takes raw data (e.g., images, text, or numerical data).
2. Hidden Layers: Consist of neurons that process data using
weights, biases, and activation functions. These layers learn feature
representations.
3. Output Layer: Provides predictions or classifications.
4. Training: Uses algorithms like backpropagation and optimizers
(e.g., Gradient Descent) to minimize the loss function and improve
model accuracy.
Key Algorithms and Architectures
1. Feedforward Neural Networks (FNNs): Basic networks used for
simple tasks.
2. Convolutional Neural Networks (CNNs): Specialized for image
data; extract spatial features.
3. Recurrent Neural Networks (RNNs): Handle sequential data like
time series or text.
4. Transformers: Used for Natural Language Processing (NLP),
including models like GPT.
5. Generative Adversarial Networks (GANs): Used for generating
new data, such as images or videos.
Applications of Deep Learning
1. Image Recognition: Face detection, medical imaging (e.g.,
detecting tumors).
2. Natural Language Processing (NLP): Machine translation,
chatbots, sentiment analysis.
3. Speech Recognition: Voice assistants like Alexa and Siri.
4. Autonomous Vehicles: Lane detection, object recognition.
5. Finance: Fraud detection, stock market prediction.
Advantages
1. Handles large, unstructured data efficiently (e.g., images, audio,
text).
2. Reduces reliance on feature engineering.
3. Provides state-of-the-art results in various fields.
Challenges
1. Requires large datasets and significant computational power.
2. Training can be time-consuming.
3. Models are often considered "black boxes," making them harder to
interpret.
III) Comparison of Machine Learning and Deep Learning
| Aspect | Machine Learning (ML) | Deep Learning (DL) |
|---|---|---|
| Definition | A subset of AI where models learn patterns from data using algorithms. | A subset of ML that uses neural networks with multiple layers to learn complex patterns. |
| Feature Engineering | Requires manual feature extraction by domain experts. | Features are automatically learned from data by the network. |
| Data Dependency | Performs well with small to medium-sized datasets. | Requires large amounts of labeled data for effective learning. |
| Complexity | Works well with structured/tabular data and simpler tasks. | Excels with unstructured data (images, audio, text) and complex tasks. |
| Algorithm Examples | Linear Regression, Decision Trees, Random Forest, SVM, etc. | Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Transformers, etc. |
| Processing Power | Requires moderate computational power. | Requires high computational resources (GPUs/TPUs). |
| Training Time | Faster to train, especially for smaller datasets. | Takes longer to train due to large models and multiple layers. |
| Interpretability | Models are easier to interpret (e.g., feature importance). | Models are often considered "black boxes" and harder to interpret. |
| Applications | Fraud detection, predictive modelling, basic classification tasks. | Image recognition, NLP, autonomous vehicles, deep reinforcement learning. |
| Examples of Use Cases | Predicting house prices, customer churn analysis. | Face recognition, self-driving cars, speech-to-text systems. |
IV) Biological Motivation of Neural Networks
The development of artificial neural networks (ANNs) is inspired by the
structure and functioning of the human brain. Neural networks aim to
replicate how biological neurons in the brain process, learn, and store
information. Below are the key biological motivations:
1. Structure of the Brain
Biological Neurons: The brain consists of billions of interconnected
neurons. Each neuron has:
o Dendrites: Receive signals from other neurons.
o Cell Body (Soma): Processes the input signals.
o Axon: Transmits the output signal to other neurons.
Artificial Neurons: Modeled as nodes in a neural network, they
mimic this structure by receiving input, applying a function, and
passing output to other nodes.
2. Synaptic Connections
Biological Synapses: Connections between neurons that
strengthen or weaken based on learning (synaptic plasticity).
Weights in Neural Networks: In ANNs, weights mimic synapses,
determining the strength of influence one neuron has on another.
Adjusting weights during training is analogous to the brain's learning
process.
3. Learning and Adaptation
Hebbian Learning: In biological systems, learning is based on the
principle "neurons that fire together, wire together."
Gradient Descent in ANNs: Neural networks adjust weights using
algorithms like backpropagation to minimize error, analogous to
how the brain adapts its connections to optimize responses.
4. Non-Linearity
Non-Linear Responses in the Brain: Biological neurons do not
respond in a linear manner to stimuli; instead, they have activation
thresholds.
Activation Functions: Neural networks use activation functions
(e.g., sigmoid, ReLU) to introduce non-linearity, mimicking the non-
linear nature of biological neuron responses.
5. Parallel Processing
Brain’s Parallel Processing: The brain processes multiple signals
simultaneously through its vast network of neurons.
Neural Networks: ANNs perform parallel computations, making
them suitable for complex tasks like image recognition and
language processing.
6. Learning from Data
Human Learning: The brain learns from sensory data (vision,
sound, etc.) and builds representations over time.
Neural Networks: Similarly, ANNs learn patterns and features from
raw data (structured or unstructured) through training.
V) Fundamentals of TensorFlow - Data Structures in TensorFlow
Refer to the link below for study material:
https://classroom.google.com/c/NzI1MDIyNDYwODc0/m/NzQ0NzQxNDMwNTc4/details
VI) Perceptron Model and Perceptron Learning Rule
The Perceptron is the simplest type of artificial neural network,
introduced by Frank Rosenblatt in 1958. It is a binary classifier that
works by mapping input features to one of two possible outputs using a
linear decision boundary.
Structure of Perceptron
1. Input Layer:
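o Takes input features x1, x2, ..., xn.
2. Weights (w):
o Each input is associated with a weight w1, w2, ..., wn to determine its
importance.
3. Summation Function:
o Computes the weighted sum of the inputs plus a bias: z = ∑(wi⋅xi) + b
4. Activation Function:
o Applies a step function to decide the output. A common convention is
y = 1 if z ≥ 0, otherwise y = 0.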
Working of the Perceptron Model
1. Initialize weights and bias to zero or to small random values.
2. Compute the weighted sum of inputs.
3. Pass the result through the step activation function.
4. Compare the predicted output to the actual output and update
weights if there is an error (using the Perceptron Learning Rule).
5. Repeat the process for all training examples until convergence (no
errors).
Perceptron Learning Rule
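The Perceptron Learning Rule is used to update weights and bias
during training to reduce the error. It adjusts weights incrementally for
each misclassified example:
wi ← wi + η · (y − ŷ) · xi
b ← b + η · (y − ŷ)
(where η is the learning rate, y is the actual output, and ŷ is the
predicted output).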
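The training procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: weights start at zero, the learning rate is 1, the step function fires at z ≥ 0, and the AND gate is used as an example of linearly separable data. None of these choices are prescribed by the notes.

```python
def step(z):
    """Step activation: 1 if z >= 0, otherwise 0 (assumed threshold convention)."""
    return 1 if z >= 0 else 0

def predict(weights, bias, x):
    """Weighted sum of the inputs plus bias, passed through the step function."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return step(z)

def train_perceptron(samples, eta=1.0, max_epochs=20):
    """Apply the perceptron learning rule until an epoch has no errors."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, target in samples:
            error = target - predict(weights, bias, x)
            if error != 0:
                # w_i <- w_i + eta * (y - y_hat) * x_i ;  b <- b + eta * (y - y_hat)
                weights = [w + eta * error * xi for w, xi in zip(weights, x)]
                bias += eta * error
                errors += 1
        if errors == 0:
            break
    return weights, bias

# AND gate: linearly separable, so the rule converges.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_gate)
predictions = [predict(weights, bias, x) for x, _ in and_gate]
print(predictions)  # [0, 0, 0, 1]
```

Weights change only on misclassified examples, so training stops as soon as one full pass over the data produces no errors.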
Limitations of Perceptron
1. Can only classify linearly separable data.
2. Cannot solve problems like XOR (requires multi-layer networks).
VII) Learning XOR Problems
The XOR problem is significant because it highlights the limitations of
single-layer perceptrons. A single-layer perceptron can only learn
linearly separable patterns, that is, patterns where a straight line or
hyperplane can separate the data points. XOR, however, requires a
non-linear decision boundary to classify the inputs accurately. This
means that a single-layer perceptron fails to solve the XOR problem,
emphasizing the need for more complex neural networks.
Explaining the XOR Problem
To understand the XOR problem better, let’s take a look at the XOR
gate and its truth table. The XOR gate takes two binary inputs and
returns true if exactly one of the inputs is true. The truth table for the
XOR gate is as follows:
| Input 1 | Input 2 | Output |
|---------|---------|--------|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |
As we can see from the truth table, the XOR gate produces a true
output only when the inputs are different. This non-linear relationship
between the inputs and the output poses a challenge for single-layer
perceptrons, which can only learn linearly separable patterns.
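The failure of a single-layer perceptron on XOR can be observed directly. The sketch below trains a perceptron on the XOR truth table and records the number of misclassifications in each epoch. The setup is an assumption made for illustration: zero initial weights, learning rate 1, and a threshold at z ≥ 0.

```python
def step(z):
    """Step activation: 1 if z >= 0, otherwise 0 (assumed threshold convention)."""
    return 1 if z >= 0 else 0

def errors_per_epoch(samples, eta=1.0, epochs=50):
    """Train a two-input perceptron and count the mistakes made in each epoch."""
    w1, w2, b = 0.0, 0.0, 0.0
    history = []
    for _ in range(epochs):
        errors = 0
        for (x1, x2), target in samples:
            error = target - step(w1 * x1 + w2 * x2 + b)
            if error != 0:
                w1 += eta * error * x1
                w2 += eta * error * x2
                b += eta * error
                errors += 1
        history.append(errors)
    return history

xor_gate = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
history = errors_per_epoch(xor_gate)
print(min(history))  # never reaches 0: no epoch is error-free
```

An error-free epoch would mean that one fixed line separates the four XOR points, which is impossible, so the weights keep changing no matter how long training runs.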
Solving the XOR Problem with Neural Networks
To solve the XOR problem, we need to introduce multi-layer
perceptrons (MLPs) and the backpropagation algorithm. MLPs are
neural networks with one or more hidden layers between the input and
output layers. These hidden layers allow the network to learn non-
linear relationships between the inputs and outputs.
The backpropagation algorithm is a learning algorithm that adjusts the
weights of the neurons in the network based on the error between the
predicted output and the actual output. It works by propagating the
error backwards through the network and updating the weights using
gradient descent.
In addition to MLPs and the backpropagation algorithm, the choice of
activation functions also plays a crucial role in solving the XOR
problem. Activation functions introduce non-linearity into the
network, allowing it to learn complex patterns. Popular activation
functions for solving the XOR problem include the sigmoid function and
the hyperbolic tangent function.
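To see why one hidden layer is enough, the sketch below builds a 2-2-1 network that computes XOR. Note the assumptions: the weights are picked by hand rather than learned with backpropagation, and a step activation is used in place of sigmoid or tanh so the arithmetic is easy to follow.

```python
def step(z):
    """Step activation: 1 if z >= 0, otherwise 0."""
    return 1 if z >= 0 else 0

def xor_network(x1, x2):
    """Two hidden neurons (OR and NAND) feeding one output neuron (AND)."""
    h_or = step(x1 + x2 - 0.5)        # 1 if at least one input is 1
    h_nand = step(-x1 - x2 + 1.5)     # 1 unless both inputs are 1
    return step(h_or + h_nand - 1.5)  # 1 only if both hidden neurons are 1

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, xor_network(x1, x2))  # outputs 0, 1, 1, 0
```

Each hidden neuron draws one straight line; the output neuron combines the two half-planes into a region that no single line could describe. A trained MLP would find its own weights for an equivalent mapping.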
VIII) Activation Functions in Neural Networks
Before delving into the details of activation functions in deep learning,
let us quickly go through the concept of activation functions in neural
networks and how they work. A neural network is a powerful
machine learning mechanism that loosely mimics how a human
brain learns.
The brain receives the stimulus from the outside world, does the
processing on the input, and then generates the output. As the task
gets complicated, multiple neurons form a complex network, passing
information among themselves.
An Artificial Neural Network tries to mimic similar behavior. It is
made of interconnected neurons, each characterized by its weights,
bias, and activation function.
The input is fed to the input layer, the neurons perform a linear
transformation on this input using the weights and biases.
x = (weight * input) + bias
After that, an activation function is applied to the above result.
Finally, the output from the activation function moves to the next
hidden layer and the same process is repeated. This forward movement
of information is known as the forward propagation.
What if the output generated is far away from the actual value? Using
the output from the forward propagation, error is calculated. Based on
this error value, the weights and biases of the neurons are updated.
This process is known as back-propagation.
Can we do without an activation function?
We understand that using an activation function introduces an
additional step at each layer during the forward propagation. Now the
question is – if the activation function increases the complexity so
much, can we do without an activation function?
Imagine a neural network without activation functions. In that case,
every neuron would only perform a linear transformation on the
inputs using the weights and biases. Although linear transformations
make the neural network simpler, such a network would be less
powerful and would not be able to learn complex patterns from the
data.
A neural network without activation functions is
essentially just a linear regression model.
Thus we use a non-linear transformation to the inputs of the neuron
and this non-linearity in the network is introduced by an activation
function.
Why do we need Non-linear activation function?
Here is why non-linear activation functions are essential for
neural networks, broken down into steps:
1. Data Processing: Neural networks process data layer by layer.
Each layer performs a weighted sum of its inputs and adds a bias.
2. Linear Limitation: If all layers used linear activation functions (y
= mx + b), stacking these layers would just create another linear
function. No matter how many layers you add, the output would
still be a straight line.
3. Introducing Non-linearity: Non-linear activation functions are
introduced after the linear step in each layer. These functions
transform the linear data into a non-linear form (e.g., sigmoid
function curves the output).
4. Learning Complex Patterns: Because of this non-linear
transformation, the network can learn complex patterns in the
data that wouldn’t be possible with just linear functions. Imagine
stacking multiple curved shapes instead of straight lines.
5. Beyond Linear Separation: This allows the network to move
beyond simply separating data linearly, like logistic regression. It
can learn intricate relationships between features in the data.
6. Foundation for Complex Tasks: By enabling the network to
represent complex features, non-linear activation functions
become the building blocks for neural networks to tackle tasks
like image recognition, natural language processing, and
more.
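The "Linear Limitation" point can be checked numerically. The sketch below uses one-input layers with made-up weights (assumed for illustration) and shows that two stacked linear layers equal a single linear layer, while placing a ReLU between them breaks that equivalence.

```python
def linear(x, w, b):
    """One linear layer with a single input and output: w * x + b."""
    return w * x + b

def relu(z):
    return max(0.0, z)

def two_linear_layers(x, w1, b1, w2, b2):
    """Layer 2 applied to layer 1, with no activation in between."""
    return linear(linear(x, w1, b1), w2, b2)

def collapsed(x, w1, b1, w2, b2):
    """The same mapping as ONE layer: weight w2*w1, bias w2*b1 + b2."""
    return linear(x, w2 * w1, w2 * b1 + b2)

def two_layers_with_relu(x, w1, b1, w2, b2):
    """Same two layers, but with a non-linear activation between them."""
    return linear(relu(linear(x, w1, b1)), w2, b2)

params = (2.0, 1.0, 3.0, -1.0)  # w1, b1, w2, b2 (illustrative values)
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
same = all(two_linear_layers(x, *params) == collapsed(x, *params) for x in xs)
print(same)                                # True: stacking adds no expressive power
print(two_layers_with_relu(-2.0, *params)) # -1.0, whereas the linear stack gives -10.0
```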
Different types of activation functions
Comparison of Activation Functions
| Function | Range | Use Case | Advantages | Disadvantages |
|---|---|---|---|---|
| Sigmoid | (0, 1) | Binary classification | Smooth, easy to understand | Vanishing gradient |
| Tanh | (−1, 1) | Binary classification | Zero-centered | Vanishing gradient |
| ReLU | [0, ∞) | Hidden layers of deep networks | Efficient, avoids saturation for positive inputs | Dying ReLU problem |
| Leaky ReLU | (−∞, ∞) | Hidden layers of deep networks | Solves dying ReLU | Predefined slope (α) |
| Softmax | (0, 1) | Multi-class classification | Outputs probabilities | High computational cost |
| Swish | approx. [−0.28, ∞) | Complex deep networks | Smooth, adaptive | Higher computational cost |
| GELU | approx. [−0.17, ∞) | NLP tasks (e.g., Transformers) | Smooth, improves performance | Computationally complex |
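The functions in the comparison can be written in plain Python as below. This is a sketch under a few assumptions: the Leaky ReLU slope α = 0.01 is a commonly used default rather than a value given in the notes, Swish is shown with its scaling parameter fixed to 1, and GELU uses the exact form based on the error function.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    return math.tanh(z)

def relu(z):
    return max(0.0, z)

def leaky_relu(z, alpha=0.01):
    """Like ReLU, but negative inputs keep a small slope alpha (assumed 0.01)."""
    return z if z > 0 else alpha * z

def swish(z):
    """Swish with scaling parameter 1: z * sigmoid(z)."""
    return z * sigmoid(z)

def gelu(z):
    """GELU: z times the standard normal CDF evaluated at z."""
    return 0.5 * z * (1.0 + math.erf(z / math.sqrt(2.0)))

def softmax(zs):
    """Turn a list of scores into probabilities that sum to 1."""
    m = max(zs)  # subtracting the max avoids overflow in exp
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

print(sigmoid(0.0))          # 0.5
print(relu(-3.0))            # 0.0
print(softmax([0.0, 0.0]))   # [0.5, 0.5]
```

Softmax differs from the others in that it acts on a whole vector of outputs at once, which is why it is used only in the output layer.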
IX) Feedforward Neural Networks
Feedforward Neural Networks (FNNs)
A Feedforward Neural Network (FNN) is the simplest type of artificial
neural network where information flows in one direction—from the input
layer, through the hidden layers (if any), to the output layer. There are no
loops or cycles in the network.
Key Components of FNN
1. Input Layer:
o Accepts the features of the dataset.
o Each neuron corresponds to a specific feature in the input.
2. Hidden Layers:
o One or more layers where neurons apply weights and biases
to the input and pass it through an activation function to
model non-linearity.
o These layers learn representations of the data.
3. Output Layer:
o Provides the final output of the network, such as a prediction
or classification.
o Uses appropriate activation functions:
Sigmoid for binary classification.
Softmax for multi-class classification.
No activation or linear activation for regression.
4. Weights and Biases:
o Weights determine the importance of each input to a neuron.
o Bias shifts the activation function, providing additional
flexibility.
5. Activation Function:
o Introduces non-linearity to help the network learn complex
patterns.
o Common activation functions include ReLU, Sigmoid, Tanh,
and Softmax.
Working of a Feedforward Neural Network
1. Initialization:
o Weights and biases are initialized randomly or using
techniques like Xavier or He initialization.
2. Forward Propagation:
o Information flows from the input layer to the output layer.
o Each neuron performs the following computation:
z = ∑(w⋅x) + b
a = f(z)
where z is the weighted sum, a is the activated output, w are the
weights, x are the inputs, b is the bias, and f is the activation function.
3. Output Calculation:
The output layer generates predictions based on the final layer's
activations.
4. Loss Function:
Calculates the error (loss) between the predicted output and the
actual target value.
Common loss functions:
o Mean Squared Error (MSE) for regression.
o Cross-Entropy Loss for classification.
5. Backpropagation:
The error is propagated backward through the network to update
weights and biases using optimization algorithms like Gradient
Descent.
6. Weight Update:
Weights are updated using the rule
w ← w − η · ∂L/∂w
where η is the learning rate and L is the loss.
7. Iteration:
Steps 2–6 are repeated for a fixed number of iterations (epochs) or
until convergence.
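Forward propagation, output calculation, and the loss (steps 2 to 4) can be sketched for a tiny network as follows. The 2-2-1 layout, the hand-picked weights, and the helper names `dense`, `forward`, and `mse` are assumptions made for this example; a real network would start from random weights and then be trained.

```python
def relu(z):
    return max(0.0, z)

def identity(z):
    """Linear (no-op) activation, used for a regression output."""
    return z

def dense(inputs, weights, biases, activation):
    """One fully connected layer: a_j = f(sum_i(w_ji * x_i) + b_j)."""
    outputs = []
    for neuron_weights, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(neuron_weights, inputs)) + bias
        outputs.append(activation(z))
    return outputs

def forward(x, layers):
    """Pass the input through every layer in order, input to output."""
    a = x
    for weights, biases, activation in layers:
        a = dense(a, weights, biases, activation)
    return a

def mse(predicted, target):
    """Mean Squared Error between prediction and target."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)

# Hypothetical 2-2-1 regression network with hand-picked weights.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),  # hidden layer, 2 neurons
    ([[1.0, 2.0]], [0.5], identity),                # output layer, 1 neuron
]
prediction = forward([2.0, 1.0], layers)
loss = mse(prediction, [4.0])
print(prediction, loss)  # [4.5] 0.25
```

Here the hidden activations are 1.0 and 1.5, so the output is 1.0 + 2 × 1.5 + 0.5 = 4.5 and the squared error against the target 4.0 is 0.25.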
Applications of FNN
1. Regression:
o Predicting continuous values (e.g., house prices, stock prices).
2. Classification:
o Binary classification (e.g., spam detection).
o Multi-class classification (e.g., digit recognition).
3. Pattern Recognition:
o Image and speech recognition tasks.
4. Forecasting:
o Time series forecasting (e.g., weather prediction, demand
forecasting).
Advantages of FNN
1. Simple Architecture:
o Easy to implement and understand.
2. Flexibility:
o Can approximate any continuous function with sufficient
neurons and layers (Universal Approximation Theorem).
3. Versatility:
o Applicable to a wide range of supervised learning problems.
Limitations of FNN
1. Inefficiency with Complex Data:
o Struggles with high-dimensional or unstructured data like
images and text.
2. Overfitting:
o Can memorize training data if not regularized properly.
3. No Memory:
o Cannot handle sequential data, as there is no feedback or
memory mechanism (unlike Recurrent Neural Networks).
X) Backpropagation in Neural Networks
Backpropagation (short for Backward Propagation of Errors) is the
algorithm used to compute gradients when training neural networks. It
uses the chain rule of calculus to compute the gradient of the loss with
respect to every weight and bias, propagating the error backward from
the output layer to the input layer. An optimizer such as Gradient
Descent then uses these gradients to adjust the weights and biases so
that the error between the predicted output and the actual target
values is minimized.
Key Steps in Backpropagation
1. Initialization:
o Randomly initialize the weights and biases of the network.
2. Forward Propagation:
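o Compute the output of the network by passing inputs through
the layers: each neuron computes z = ∑(w⋅x) + b and a = f(z).
3. Loss Calculation:
o Compare the output with the true label using a loss function
(e.g., Mean Squared Error, Cross-Entropy).
4. Backward Propagation:
o Calculate the gradient of the loss with respect to each weight
and bias using the chain rule, moving from the output layer
back toward the input layer.
5. Weight Update:
o Adjust weights and biases using an optimizer like Gradient
Descent: w ← w − η · ∂L/∂w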
Advantages of Backpropagation
1. Efficient:
o Reduces the computational complexity of training deep
networks.
2. Widely Applicable:
o Can be used for a variety of network architectures and tasks.
3. Automatic Differentiation:
o Frameworks like TensorFlow and PyTorch automate gradient
computation.
Limitations
1. Vanishing Gradient Problem:
o Gradients can become very small in deep networks, slowing
learning.
2. Overfitting:
o May overfit the training data if the model is too complex or
lacks regularization.
3. Computationally Intensive:
o Training deep networks requires significant computational
resources.
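Backpropagation can be written out by hand for a very small network. The sketch below assumes one input, one sigmoid hidden neuron, one linear output, and a squared-error loss; the parameter values are arbitrary. The hand-derived gradients are compared with finite-difference estimates as a sanity check.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    """h = sigmoid(w1*x + b1), y_hat = w2*h + b2."""
    w1, b1, w2, b2 = params
    h = sigmoid(w1 * x + b1)
    return h, w2 * h + b2

def loss(params, x, y):
    _, y_hat = forward(params, x)
    return (y_hat - y) ** 2

def backward(params, x, y):
    """Propagate the error from the output back to each parameter."""
    w1, b1, w2, b2 = params
    h, y_hat = forward(params, x)
    d_yhat = 2.0 * (y_hat - y)     # dL/dy_hat
    d_w2 = d_yhat * h              # output layer weight
    d_b2 = d_yhat                  # output layer bias
    d_h = d_yhat * w2              # error reaching the hidden neuron
    d_z1 = d_h * h * (1.0 - h)     # through the sigmoid derivative
    return [d_z1 * x, d_z1, d_w2, d_b2]  # order: w1, b1, w2, b2

def numerical_gradient(params, x, y, eps=1e-6):
    """Central finite differences, used only to check the result."""
    grads = []
    for i in range(len(params)):
        plus, minus = list(params), list(params)
        plus[i] += eps
        minus[i] -= eps
        grads.append((loss(plus, x, y) - loss(minus, x, y)) / (2.0 * eps))
    return grads

params = [0.5, -0.3, 0.8, 0.1]  # w1, b1, w2, b2 (illustrative values)
analytic = backward(params, x=1.5, y=1.0)
numeric = numerical_gradient(params, x=1.5, y=1.0)
print(all(abs(a - n) < 1e-6 for a, n in zip(analytic, numeric)))  # True
```

Frameworks such as TensorFlow and PyTorch perform the `backward` step automatically, so hand derivations like this are mainly useful for understanding and debugging.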
XI) Chain Rule
The chain rule is a fundamental concept in calculus used to compute the
derivative of a composite function. It is crucial in the backpropagation
algorithm in neural networks, where it helps calculate gradients for
updating weights and biases.
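For a composite function y = f(g(x)), the chain rule states dy/dx = f'(g(x)) · g'(x). A minimal worked example follows, using the functions g(x) = 3x + 1 and f(u) = u², which are chosen here purely for illustration.

```python
def g(x):
    """Inner function: g(x) = 3x + 1."""
    return 3.0 * x + 1.0

def f(u):
    """Outer function: f(u) = u^2."""
    return u ** 2

def dg_dx(x):
    return 3.0

def df_du(u):
    return 2.0 * u

def chain_rule_derivative(x):
    """d/dx f(g(x)) = f'(g(x)) * g'(x)."""
    return df_du(g(x)) * dg_dx(x)

print(chain_rule_derivative(2.0))  # 2 * 7 * 3 = 42.0
```

In a neural network each layer plays the role of one nested function, so the gradient for an early weight is a product of one such factor per layer that follows it.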
Advantages of the Chain Rule
1. Handles Complex Functions:
o Allows computation of derivatives for nested functions, which
is critical in deep learning.
2. Scalable:
o Can be applied iteratively to compute derivatives for deep
neural networks with multiple layers.
XII) Loss Function in Deep Learning
A loss function in deep learning is a mathematical function that
measures the difference between the predicted output (ŷ) of a
model and the actual target value (y). It quantifies the error or "loss" in
the model’s predictions and serves as a guide for optimizing the model
during training.
The goal of training a neural network is to minimize the loss function,
thereby improving the model's accuracy and performance.
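Two commonly used loss functions, Mean Squared Error for regression and binary Cross-Entropy for classification, can be sketched as follows. The function names, the clipping constant `eps`, and the sample values are assumptions for illustration only.

```python
import math

def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Average of -[y*log(p) + (1-y)*log(1-p)] over all examples."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip so that log(0) never occurs
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

print(mean_squared_error([1.0, 2.0], [1.0, 4.0]))  # (0 + 4) / 2 = 2.0
print(binary_cross_entropy([1, 0], [0.5, 0.5]))    # -ln(0.5), about 0.693
```

Both functions return 0 for perfect predictions and grow as predictions move away from the targets, which is what lets an optimizer use them as a training signal.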
Key Concepts of Loss Function
1. Purpose:
o Quantify how well or poorly the model is performing.
o Provide feedback to update model parameters (weights and
biases) using optimization algorithms like Gradient Descent.
2. Optimization Objective:
o Minimize the loss function to reduce the error.
o The weights and biases of the network are adjusted iteratively
to achieve this goal.
3. Global vs. Local Minima:
o The optimizer seeks to find the global minimum of the loss
function, where the error is the smallest.