Deep Learning - Unit 1 Notes

The document provides an overview of neural networks and deep learning, detailing their structure, learning mechanisms, and applications. It explains key concepts such as forward propagation, backpropagation, and the importance of activation functions in enabling non-linearity for complex pattern recognition. Additionally, it compares machine learning and deep learning, highlights the biological motivations behind neural networks, and discusses the limitations of simple models like perceptrons, particularly in solving non-linear problems like XOR.

Uploaded by

Bhuvana H

I. Introduction to Neural Networks

Neural Networks are a subset of Artificial Intelligence (AI) and Machine
Learning (ML) that are designed to mimic the way the human brain
processes information. They consist of interconnected layers of nodes,
also called neurons, which work together to analyse and interpret data.

1. Structure:
A neural network typically has three types of layers:
o Input Layer: Accepts input features.
o Hidden Layers: Perform computations and extract patterns
by applying weights, biases, and activation functions.
o Output Layer: Produces the final output or prediction.

2. Learning Mechanism:
Neural networks learn through a process called training, where
they adjust the weights and biases using optimization techniques
like gradient descent to minimize the error (loss function).
3. Applications:
They are widely used in areas such as image and speech
recognition, natural language processing, robotics, and predictive
analytics.
4. Types of Neural Networks:
o Feedforward Neural Networks
o Convolutional Neural Networks (CNNs) for image data
o Recurrent Neural Networks (RNNs) for sequential data

Neural networks are powerful tools due to their ability to handle large
datasets, detect complex patterns, and improve performance through
training.

Working of Neural Networks

Neural networks process data through interconnected layers of neurons.
Each neuron computes a weighted sum of its inputs, adds a bias, and passes
the result through an activation function to introduce non-linearity. The
network adjusts its weights and biases during training to minimize
prediction errors.

Forward Propagation

1. Input Layer: Accepts input features and passes them to the next
layer.
2. Hidden Layers: Perform weighted computations z=∑(w⋅x)+b and
apply an activation function (e.g., ReLU, sigmoid) to generate outputs.
3. Output Layer: Produces predictions by transforming the final
computed values.

Backpropagation

1. Error Calculation: Compares the output with the true label using a
loss function (e.g., Mean Squared Error, Cross-Entropy).
2. Gradient Computation: Calculates the gradient of the loss with
respect to weights using the chain rule of calculus.
3. Weight Update: Adjusts weights in the network using optimization
techniques like Gradient Descent to minimize error:

$w \leftarrow w - \eta \, \frac{\partial L}{\partial w}$

(where η is the learning rate and L is the loss).

Repeating this process over many examples lets the network gradually
learn patterns in the data.
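
The forward and backward steps above can be sketched for a single neuron. This is a minimal illustration in plain Python, not a full training routine; the sigmoid activation, the squared-error loss, the learning rate of 0.5, and the function names are all assumptions chosen to keep the arithmetic easy to follow.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w, b, x):
    """Forward propagation for one neuron: z = sum(w*x) + b, a = sigmoid(z)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(z)

def loss(a, y):
    """Squared error for a single example (assumed loss for this sketch)."""
    return 0.5 * (a - y) ** 2

def gradient_step(w, b, x, y, eta):
    """One backpropagation pass followed by one gradient-descent update."""
    a = forward(w, b, x)
    # Chain rule: dL/dz = dL/da * da/dz = (a - y) * a * (1 - a)
    dz = (a - y) * a * (1.0 - a)
    new_w = [wi - eta * dz * xi for wi, xi in zip(w, x)]  # dz/dw_i = x_i
    new_b = b - eta * dz                                  # dz/db = 1
    return new_w, new_b

w, b = [0.0, 0.0], 0.0      # initial weights and bias
x, y = [1.0, 2.0], 1.0      # one training example and its label
loss_before = loss(forward(w, b, x), y)
w, b = gradient_step(w, b, x, y, eta=0.5)
loss_after = loss(forward(w, b, x), y)
print(loss_before, loss_after)  # the loss should drop after the update
```

A single update with a reasonably small learning rate moves the prediction toward the label, which is why the second printed loss is smaller than the first.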

II) Deep Learning


Deep Learning is a subset of Machine Learning (ML) that focuses on
training models with multiple layers of artificial neural networks to
automatically extract hierarchical patterns and representations from data.
It is inspired by the structure and functioning of the human brain.

Key Characteristics

1. Deep Architectures: Deep learning models use multiple hidden
layers in neural networks to learn progressively complex features.
For example, in image recognition:
o Early layers detect edges.
o Middle layers detect shapes and textures.
o Deeper layers identify objects.
2. Representation Learning: Unlike traditional ML, which requires
manual feature extraction, deep learning automatically learns
relevant features from raw data.

Working of Deep Learning

1. Input Layer: Takes raw data (e.g., images, text, or numerical data).
2. Hidden Layers: Consist of neurons that process data using
weights, biases, and activation functions. These layers learn feature
representations.
3. Output Layer: Provides predictions or classifications.
4. Training: Uses algorithms like backpropagation and optimizers
(e.g., Gradient Descent) to minimize the loss function and improve
model accuracy.

Key Algorithms and Architectures

1. Feedforward Neural Networks (FNNs): Basic networks used for
simple tasks.
2. Convolutional Neural Networks (CNNs): Specialized for image
data; extract spatial features.
3. Recurrent Neural Networks (RNNs): Handle sequential data like
time series or text.
4. Transformers: Used for Natural Language Processing (NLP),
including models like GPT.
5. Generative Adversarial Networks (GANs): Used for generating
new data, such as images or videos.

Applications of Deep Learning

1. Image Recognition: Face detection, medical imaging (e.g.,
detecting tumors).
2. Natural Language Processing (NLP): Machine translation,
chatbots, sentiment analysis.
3. Speech Recognition: Voice assistants like Alexa and Siri.
4. Autonomous Vehicles: Lane detection, object recognition.
5. Finance: Fraud detection, stock market prediction.

Advantages

1. Handles large, unstructured data efficiently (e.g., images, audio,
text).
2. Reduces reliance on feature engineering.
3. Provides state-of-the-art results in various fields.

Challenges

1. Requires large datasets and significant computational power.
2. Training can be time-consuming.
3. Models are often considered "black boxes," making them harder to
interpret.

III) Comparison of Machine Learning and Deep Learning

| Aspect | Machine Learning (ML) | Deep Learning (DL) |
|---|---|---|
| Definition | A subset of AI where models learn patterns from data using algorithms. | A subset of ML that uses neural networks with multiple layers to learn complex patterns. |
| Feature Engineering | Requires manual feature extraction by domain experts. | Features are automatically learned from data by the network. |
| Data Dependency | Performs well with small to medium-sized datasets. | Requires large amounts of labeled data for effective learning. |
| Complexity | Works well with structured/tabular data and simpler tasks. | Excels with unstructured data (images, audio, text) and complex tasks. |
| Algorithm Examples | Linear Regression, Decision Trees, Random Forest, SVM, etc. | Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Transformers, etc. |
| Processing Power | Requires moderate computational power. | Requires high computational resources (GPUs/TPUs). |
| Training Time | Faster to train, especially for smaller datasets. | Takes longer to train due to large models and multiple layers. |
| Interpretability | Models are easier to interpret (e.g., feature importance). | Models are often considered "black boxes" and harder to interpret. |
| Applications | Fraud detection, predictive modelling, basic classification tasks. | Image recognition, NLP, autonomous vehicles, deep reinforcement learning. |
| Examples of Use Cases | Predicting house prices, customer churn analysis. | Face recognition, self-driving cars, speech-to-text systems. |

IV) Biological Motivation of Neural Networks

The development of artificial neural networks (ANNs) is inspired by the
structure and functioning of the human brain. Neural networks aim to
replicate how biological neurons in the brain process, learn, and store
information. Below are the key biological motivations:

1. Structure of the Brain

• Biological Neurons: The brain consists of billions of interconnected
neurons. Each neuron has:
o Dendrites: Receive signals from other neurons.
o Cell Body (Soma): Processes the input signals.
o Axon: Transmits the output signal to other neurons.
• Artificial Neurons: Modeled as nodes in a neural network, they
mimic this structure by receiving input, applying a function, and
passing output to other nodes.

2. Synaptic Connections

• Biological Synapses: Connections between neurons that
strengthen or weaken based on learning (synaptic plasticity).
• Weights in Neural Networks: In ANNs, weights mimic synapses,
determining the strength of influence one neuron has on another.
Adjusting weights during training is analogous to the brain's learning
process.

3. Learning and Adaptation

• Hebbian Learning: In biological systems, learning is based on the
principle "neurons that fire together, wire together."
• Gradient Descent in ANNs: Neural networks adjust weights using
algorithms like backpropagation to minimize error, analogous to
how the brain adapts its connections to optimize responses.

4. Non-Linearity

• Non-Linear Responses in the Brain: Biological neurons do not
respond in a linear manner to stimuli; instead, they have activation
thresholds.
• Activation Functions: Neural networks use activation functions
(e.g., sigmoid, ReLU) to introduce non-linearity, mimicking the
non-linear nature of biological neuron responses.

5. Parallel Processing

• Brain's Parallel Processing: The brain processes multiple signals
simultaneously through its vast network of neurons.
• Neural Networks: ANNs perform parallel computations, making
them suitable for complex tasks like image recognition and
language processing.

6. Learning from Data

• Human Learning: The brain learns from sensory data (vision,
sound, etc.) and builds representations over time.
• Neural Networks: Similarly, ANNs learn patterns and features from
raw data (structured or unstructured) through training.

V) Fundamentals of TensorFlow - Data Structures in TensorFlow

Refer to the link below for study material:

https://classroom.google.com/c/NzI1MDIyNDYwODc0/m/NzQ0NzQxNDMwNTc4/details

VI) Perceptron model-perceptron learning rule

The Perceptron is the simplest type of artificial neural network,
introduced by Frank Rosenblatt in 1958. It is a binary classifier that
works by mapping input features to one of two possible outputs using a
linear decision boundary.

Structure of Perceptron

1. Input Layer:
o Takes input features x1, x2, ..., xn.
2. Weights (w):
o Each input is associated with a weight w1, w2, ..., wn that determines its
importance.
3. Summation Function:
o Computes the weighted sum of the inputs plus a bias b:
$z = \sum_{i=1}^{n} w_i x_i + b$
4. Activation Function:
o Applies a step function to decide the output. In a common convention
(some texts use outputs of −1 and +1 instead):
$\hat{y} = 1$ if $z \ge 0$, otherwise $\hat{y} = 0$

Working of the Perceptron Model

1. Initialize the weights and bias to zero or to small random values.
2. Compute the weighted sum of inputs.
3. Pass the result through the step activation function.
4. Compare the predicted output to the actual output and update
weights if there is an error (using the Perceptron Learning Rule).
5. Repeat the process for all training examples until convergence (no
errors).

Perceptron Learning Rule

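The Perceptron Learning Rule is used to update weights and bias
during training to reduce classification errors. It adjusts weights
incrementally for each misclassified example. In its standard form, with
learning rate η, true label y, and predicted output ŷ:

$w_i \leftarrow w_i + \eta \, (y - \hat{y}) \, x_i$
$b \leftarrow b + \eta \, (y - \hat{y})$

Limitations of Perceptron

1. Can only classify linearly separable data.
2. Cannot solve problems like XOR (requires multi-layer networks).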
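
The perceptron model and its learning rule can be sketched in a few lines of plain Python. This is an illustrative sketch under assumptions: a step function that outputs 1 when the weighted sum is at least 0, zero-initialized weights, a learning rate of 1.0, and hypothetical helper names (`predict`, `train_perceptron`). It trains on the AND gate, which is linearly separable, and on XOR, which is not.

```python
def step(z):
    """Step activation: 1 if z >= 0, else 0 (an assumed convention)."""
    return 1 if z >= 0 else 0

def predict(w, b, x):
    return step(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_perceptron(data, eta=1.0, max_epochs=100):
    """Returns (weights, bias, converged) using the perceptron learning rule."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, y in data:
            err = y - predict(w, b, x)       # 0 when the example is correct
            if err != 0:
                # w_i <- w_i + eta * (y - y_hat) * x_i,  b <- b + eta * (y - y_hat)
                w = [wi + eta * err * xi for wi, xi in zip(w, x)]
                b += eta * err
                errors += 1
        if errors == 0:                      # a full pass with no mistakes
            return w, b, True
    return w, b, False

AND = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
XOR = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]

w_and, b_and, and_converged = train_perceptron(AND)
w_xor, b_xor, xor_converged = train_perceptron(XOR)
print("AND converged:", and_converged)
print("XOR converged:", xor_converged)
```

Training stops early on AND because a separating line exists. On XOR every pass contains at least one mistake, so the loop runs until `max_epochs` and reports no convergence.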

VII) Learning XOR Problems

The XOR problem is significant because it highlights the limitations of
single-layer perceptrons. A single-layer perceptron can only learn
linearly separable patterns, that is, patterns in which a straight line or
hyperplane can separate the data points. XOR, however, requires a non-linear
decision boundary to classify the inputs accurately. This means that a
single-layer perceptron fails to solve the XOR problem, emphasizing the
need for more complex neural networks.

Explaining the XOR Problem

To understand the XOR problem better, let's take a look at the XOR
gate and its truth table. The XOR gate takes two binary inputs and
returns true if exactly one of the inputs is true. The truth table for the
XOR gate is as follows:

| Input 1 | Input 2 | Output |
|---|---|---|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 0 | 1 |
| 1 | 1 | 0 |

As we can see from the truth table, the XOR gate produces a true
output only when the inputs are different. This non-linear relationship
between the inputs and the output poses a challenge for single-layer
perceptrons, which can only learn linearly separable patterns.

Solving the XOR Problem with Neural Networks


To solve the XOR problem, we need to introduce multi-layer
perceptrons (MLPs) and the backpropagation algorithm. MLPs are
neural networks with one or more hidden layers between the input and
output layers. These hidden layers allow the network to learn
non-linear relationships between the inputs and outputs.

The backpropagation algorithm is a learning algorithm that adjusts the
weights of the neurons in the network based on the error between the
predicted output and the actual output. It works by propagating the
error backwards through the network and updating the weights using
gradient descent.

In addition to MLPs and the backpropagation algorithm, the choice of
activation functions also plays a crucial role in solving the XOR
problem. Activation functions introduce non-linearity into the
network, allowing it to learn complex patterns. Popular activation
functions for solving the XOR problem include the sigmoid function and
the hyperbolic tangent function.
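
To see why one hidden layer is enough for XOR, here is a small sketch of a two-layer network. The weights are chosen by hand for illustration and are not learned with backpropagation, and a step activation is used in place of sigmoid or tanh so the logic is easy to trace. The function name `xor_mlp` and all weight values are assumptions of this sketch.

```python
def step(z):
    """Step activation: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0

def xor_mlp(x1, x2):
    # Hidden layer: two neurons, each a linear boundary followed by a step.
    h_or = step(1.0 * x1 + 1.0 * x2 - 0.5)    # fires if at least one input is 1
    h_and = step(1.0 * x1 + 1.0 * x2 - 1.5)   # fires only if both inputs are 1
    # Output layer: "OR but not AND", which is exactly XOR.
    return step(1.0 * h_or - 1.0 * h_and - 0.5)

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, xor_mlp(x1, x2))
```

Each hidden neuron draws one straight line in the input plane. The output neuron combines the two half-planes, which gives the non-linear boundary that a single perceptron cannot produce.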

VIII) Activation Function in neural networks

Before delving into the details of activation functions in deep learning,
let us quickly go through the concept of activation functions in neural
networks and how they work. A neural network is a very powerful
machine learning mechanism that loosely mimics how a human
brain learns.

The brain receives a stimulus from the outside world, processes
the input, and then generates an output. As the task
gets complicated, multiple neurons form a complex network, passing
information among themselves.

An Artificial Neural Network tries to mimic similar behavior. It is
made of interconnected neurons, and each neuron is characterized by
its weight, bias, and activation function.

The input is fed to the input layer, and the neurons perform a linear
transformation on this input using the weights and biases:

x = (weight * input) + bias

After that, an activation function is applied to the above result.

Finally, the output from the activation function moves to the next
hidden layer and the same process is repeated. This forward movement
of information is known as forward propagation.

What if the output generated is far away from the actual value? Using
the output from the forward propagation, the error is calculated. Based on
this error value, the weights and biases of the neurons are updated.
This process is known as back-propagation.

Can we do without an activation function?

We understand that using an activation function introduces an
additional step at each layer during the forward propagation. Now the
question is: if the activation function adds this much complexity,
can we do without an activation function?

Imagine a neural network without activation functions. In that case,
every neuron will only be performing a linear transformation on the
inputs using the weights and biases. Although linear transformations
make the neural network simpler, such a network would be less
powerful and would not be able to learn complex patterns from the
data.

A neural network without an activation function is
essentially just a linear regression model.

Thus we apply a non-linear transformation to the inputs of the neuron,
and this non-linearity in the network is introduced by an activation
function.

Why do we need non-linear activation functions?

Here is why non-linear activation functions are essential for
neural networks, broken down into steps:

1. Data Processing: Neural networks process data layer by layer.
Each layer performs a weighted sum of its inputs and adds a bias.

2. Linear Limitation: If all layers used linear activation functions
(y = mx + b), stacking these layers would just create another linear
function. No matter how many layers you add, the output would
still be a linear function of the input (a straight line in one dimension).

3. Introducing Non-linearity: Non-linear activation functions are
applied after the linear step in each layer. These functions
transform the linear data into a non-linear form (e.g., the sigmoid
function curves the output).

4. Learning Complex Patterns: Because of this non-linear
transformation, the network can learn complex patterns in the
data that wouldn't be possible with just linear functions. Imagine
stacking multiple curved shapes instead of straight lines.

5. Beyond Linear Separation: This allows the network to move
beyond simply separating data linearly, as logistic regression does. It
can learn intricate relationships between features in the data.

6. Foundation for Complex Tasks: By enabling the network to
represent complex features, non-linear activation functions
become the building blocks for neural networks to tackle tasks
like image recognition, natural language processing, and
more.
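
The "linear limitation" can be checked numerically. The sketch below uses plain Python lists; the weight values and helper names are arbitrary assumptions. It shows that two stacked linear layers give exactly the same output as one linear layer with W = W2·W1 and b = W2·b1 + b2, while inserting a ReLU between the layers breaks that equivalence.

```python
def matvec(W, x):
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def add(u, v):
    return [a + b for a, b in zip(u, v)]

# Two layers with no activation function (values are arbitrary).
W1, b1 = [[1.0, 2.0], [0.0, -1.0]], [1.0, 0.0]
W2, b2 = [[3.0, 1.0], [-2.0, 4.0]], [0.0, 2.0]

def two_linear_layers(x):
    h = add(matvec(W1, x), b1)
    return add(matvec(W2, h), b2)

# The single equivalent layer: W = W2 W1, b = W2 b1 + b2.
W = matmul(W2, W1)
b = add(matvec(W2, b1), b2)

def one_linear_layer(x):
    return add(matvec(W, x), b)

def two_layers_with_relu(x):
    h = [max(0.0, v) for v in add(matvec(W1, x), b1)]  # non-linearity
    return add(matvec(W2, h), b2)

x = [1.0, 1.0]
print(two_linear_layers(x), one_linear_layer(x), two_layers_with_relu(x))
```

For this input the hidden value before activation contains a negative entry, so the ReLU changes it and the output no longer matches the collapsed linear layer.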

Different types of activation functions


Comparison of Activation Functions

| Function | Range | Use Case | Advantages | Disadvantages |
|---|---|---|---|---|
| Sigmoid | (0, 1) | Binary classification | Smooth, easy to understand | Vanishing gradient |
| Tanh | (−1, 1) | Binary classification | Zero-centered | Vanishing gradient |
| ReLU | [0, ∞) | Hidden layers of deep networks | Efficient, avoids saturation for positive inputs | Dying ReLU problem |
| Leaky ReLU | (−∞, ∞) | Hidden layers of deep networks | Solves dying ReLU | Predefined slope (α) |
| Softmax | (0, 1) | Multi-class classification | Outputs probabilities | High computational cost |
| Swish | approx. [−0.28, ∞) | Complex deep networks | Smooth, adaptive | Higher computational cost |
| GELU | approx. [−0.17, ∞) | NLP tasks (e.g., Transformers) | Smooth, improves performance | Computationally complex |
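
The functions in the table can be written directly from their usual definitions. The sketch below uses only Python's `math` module; the default slope α = 0.01 for Leaky ReLU and the Swish form x·sigmoid(x) are common choices assumed here, not values given in these notes.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return math.tanh(x)

def relu(x):
    return max(0.0, x)

def leaky_relu(x, alpha=0.01):
    """alpha is the fixed slope for negative inputs (assumed default)."""
    return x if x > 0 else alpha * x

def softmax(xs):
    """Turns a list of scores into probabilities that sum to 1."""
    m = max(xs)                                # subtract max for numerical stability
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]

def swish(x):
    return x * sigmoid(x)

def gelu(x):
    """Exact GELU: x times the standard normal CDF of x."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

for name, f in [("sigmoid", sigmoid), ("tanh", tanh), ("relu", relu),
                ("leaky_relu", leaky_relu), ("swish", swish), ("gelu", gelu)]:
    print(name, [round(f(v), 4) for v in (-2.0, 0.0, 2.0)])
print("softmax", [round(p, 4) for p in softmax([1.0, 2.0, 3.0])])
```

Note that softmax works on a whole vector of scores, while the other functions are applied to each value independently.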
IX) Feedforward Neural Networks

Feedforward Neural Networks (FNNs)

A Feedforward Neural Network (FNN) is the simplest type of artificial
neural network, in which information flows in one direction: from the input
layer, through the hidden layers (if any), to the output layer. There are no
loops or cycles in the network.

Key Components of FNN

1. Input Layer:
o Accepts the features of the dataset.
o Each neuron corresponds to a specific feature in the input.

2. Hidden Layers:
o One or more layers where neurons apply weights and biases
to the input and pass it through an activation function to
model non-linearity.
o These layers learn representations of the data.

3. Output Layer:
o Provides the final output of the network, such as a prediction
or classification.
o Uses appropriate activation functions:
   • Sigmoid for binary classification.
   • Softmax for multi-class classification.
   • No activation or linear activation for regression.

4. Weights and Biases:
o Weights determine the importance of each input to a neuron.
o Bias shifts the activation function, providing additional
flexibility.

5. Activation Function:
o Introduces non-linearity to help the network learn complex
patterns.
o Common activation functions include ReLU, Sigmoid, Tanh,
and Softmax.

Working of a Feedforward Neural Network

1. Initialization:
o Weights and biases are initialized randomly or using
techniques like Xavier or He initialization.

2. Forward Propagation:
o Information flows from the input layer to the output layer.
o Each neuron performs the following computation:
$z = \sum_i w_i x_i + b$, $a = f(z)$
where z is the weighted sum, a is the activated output, w_i are the
weights, x_i are the inputs, b is the bias, and f is the activation function.

3. Output Calculation:
o The output layer generates predictions based on the final layer's
activations.

4. Loss Function:
o Calculates the error (loss) between the predicted output and the
actual target value.
o Common loss functions:
   • Mean Squared Error (MSE) for regression.
   • Cross-Entropy Loss for classification.

5. Backpropagation:
o The error is propagated backward through the network to compute the
gradients used to update weights and biases with optimization
algorithms like Gradient Descent.

6. Weight Update:
o Weights are updated using the rule
$w \leftarrow w - \eta \, \frac{\partial L}{\partial w}$
where η is the learning rate and L is the loss.

7. Iteration:
o Steps 2–6 are repeated for a fixed number of passes over the training
data (epochs) or until convergence.
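
The seven steps above can be put together in a short training loop. The following is a minimal sketch in plain Python rather than a reference implementation. It assumes one tanh hidden layer, a linear output, squared-error loss, per-example (stochastic) updates, and a made-up regression target. The function name `train_fnn`, the hyperparameters, and the dataset are all illustrative assumptions.

```python
import math
import random

def train_fnn(data, hidden=4, eta=0.05, epochs=500, seed=0):
    """Trains a tiny feedforward network and returns the mean loss per epoch."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # 1. Initialization: small random weights, zero biases.
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    losses = []
    for _ in range(epochs):                       # 7. Iteration over epochs
        total = 0.0
        for x, y in data:
            # 2-3. Forward propagation: z = sum(w*x) + b, a = f(z)
            z1 = [sum(w * xi for w, xi in zip(row, x)) + b
                  for row, b in zip(W1, b1)]
            a1 = [math.tanh(z) for z in z1]
            y_hat = sum(w * a for w, a in zip(W2, a1)) + b2
            # 4. Loss: squared error for this example
            err = y_hat - y
            total += err ** 2
            # 5. Backpropagation (chain rule, output layer first)
            d_out = 2.0 * err
            d_hidden = [d_out * w * (1.0 - a * a) for w, a in zip(W2, a1)]
            # 6. Weight update: w <- w - eta * dL/dw
            W2 = [w - eta * d_out * a for w, a in zip(W2, a1)]
            b2 -= eta * d_out
            W1 = [[w - eta * dh * xi for w, xi in zip(row, x)]
                  for row, dh in zip(W1, d_hidden)]
            b1 = [b - eta * dh for b, dh in zip(b1, d_hidden)]
        losses.append(total / len(data))
    return losses

# Hypothetical toy dataset: y = 0.5*x1 - 0.25*x2 on a 3x3 grid.
data = [([x1, x2], 0.5 * x1 - 0.25 * x2)
        for x1 in (-1.0, 0.0, 1.0) for x2 in (-1.0, 0.0, 1.0)]
losses = train_fnn(data)
print("first epoch loss:", losses[0])
print("last epoch loss:", losses[-1])
```

The hidden-layer gradients are computed before the output weights are changed, so every update in a step uses the same forward pass.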

Applications of FNN

1. Regression:
o Predicting continuous values (e.g., house prices, stock prices).

2. Classification:
o Binary classification (e.g., spam detection).
o Multi-class classification (e.g., digit recognition).

3. Pattern Recognition:
o Image and speech recognition tasks.

4. Forecasting:
o Time series forecasting (e.g., weather prediction, demand
forecasting).

Advantages of FNN

1. Simple Architecture:
o Easy to implement and understand.

2. Flexibility:
o Can approximate any continuous function with sufficient
neurons and layers (Universal Approximation Theorem).

3. Versatility:
o Applicable to a wide range of supervised learning problems.

Limitations of FNN

1. Inefficiency with Complex Data:
o Struggles with high-dimensional or unstructured data like
images and text.

2. Overfitting:
o Can memorize training data if not regularized properly.

3. No Memory:
o Cannot handle sequential data, as there is no feedback or
memory mechanism (unlike Recurrent Neural Networks).

X) Backpropagation in Neural Networks

Backpropagation (short for Backward Propagation of Errors) is the
algorithm used to compute gradients when training neural networks. An
optimizer such as gradient descent uses these gradients to adjust the
weights and biases of the network to minimize the error between the
predicted output and the actual target values. Backpropagation uses the
chain rule of calculus to compute gradients and propagates the error
backward from the output layer to the input layer.

Key Steps in Backpropagation

1. Initialization:
o Randomly initialize the weights and biases of the network.

2. Forward Propagation:
o Compute the output of the network by passing inputs through
the layers, where each neuron computes:
$z = \sum_i w_i x_i + b$, $a = f(z)$
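
A common way to check a backpropagation implementation is to compare its gradients with finite-difference estimates. The sketch below does this for a tiny network with one hidden neuron and one output neuron. The sigmoid activations, squared-error loss, parameter values, and function names are assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(params, x):
    w1, b1, w2, b2 = params
    a1 = sigmoid(w1 * x + b1)        # hidden neuron
    a2 = sigmoid(w2 * a1 + b2)       # output neuron
    return a1, a2

def loss(params, x, y):
    _, a2 = forward(params, x)
    return 0.5 * (a2 - y) ** 2

def backprop(params, x, y):
    """Gradients of the loss with respect to [w1, b1, w2, b2] via the chain rule."""
    w1, b1, w2, b2 = params
    a1, a2 = forward(params, x)
    delta2 = (a2 - y) * a2 * (1.0 - a2)        # dL/dz2 at the output
    delta1 = delta2 * w2 * a1 * (1.0 - a1)     # error propagated back to z1
    return [delta1 * x, delta1, delta2 * a1, delta2]

def numerical_gradient(params, x, y, eps=1e-6):
    """Central finite differences, used only to verify backprop."""
    grads = []
    for i in range(len(params)):
        up = list(params)
        up[i] += eps
        down = list(params)
        down[i] -= eps
        grads.append((loss(up, x, y) - loss(down, x, y)) / (2.0 * eps))
    return grads

params = [0.5, -0.3, 0.8, 0.1]   # arbitrary starting values
x, y = 1.5, 1.0
analytic = backprop(params, x, y)
numeric = numerical_gradient(params, x, y)
print(analytic)
print(numeric)
```

The two printed lists should agree to several decimal places. Backpropagation needs one forward and one backward pass, while the numerical check needs two extra forward passes per parameter, which is why it is used only for verification.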

Advantages of Backpropagation

1. Efficient:
o Computes all gradients in a single backward pass by reusing
intermediate results, which keeps the cost of training deep
networks manageable.
2. Widely Applicable:
o Can be used for a variety of network architectures and tasks.
3. Automatic Differentiation:
o Frameworks like TensorFlow and PyTorch automate gradient
computation.

Limitations

1. Vanishing Gradient Problem:
o Gradients can become very small in deep networks, slowing
learning.
2. Overfitting:
o May overfit the training data if the model is too complex or
lacks regularization.
3. Computationally Intensive:
o Training deep networks requires significant computational
resources.

XI) Chain Rule

The chain rule is a fundamental concept in calculus used to compute the
derivative of a composite function. It is crucial in the backpropagation
algorithm in neural networks, where it helps calculate gradients for
updating weights and biases.
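
As a worked illustration, the chain rule for a composite function and its use on a single neuron can be written out as follows. The squared-error loss and the single-input neuron are assumptions chosen to keep the derivation short; the symbols z, a, f, w, b, and L follow the notation used elsewhere in these notes.

```latex
% General form: if y = g(u) and u = h(x), then
\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx}

% Single neuron with one input (assumed setup):
%   z = w x + b, \quad a = f(z), \quad L = \tfrac{1}{2}(a - y)^2
\frac{\partial L}{\partial w}
  = \frac{\partial L}{\partial a} \cdot \frac{\partial a}{\partial z} \cdot \frac{\partial z}{\partial w}
  = (a - y) \, f'(z) \, x

\frac{\partial L}{\partial b}
  = \frac{\partial L}{\partial a} \cdot \frac{\partial a}{\partial z} \cdot \frac{\partial z}{\partial b}
  = (a - y) \, f'(z)
```

For a sigmoid activation, f'(z) = a(1 − a), so the weight gradient becomes (a − y)·a·(1 − a)·x.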

Advantages of the Chain Rule

1. Handles Complex Functions:
o Allows computation of derivatives for nested functions, which
is critical in deep learning.
2. Scalable:
o Can be applied iteratively to compute derivatives for deep
neural networks with multiple layers.

XII) Loss Function in Deep Learning

A loss function in deep learning is a mathematical function that
measures the difference between the predicted output ($\hat{y}$) of a
model and the actual target value ($y$). It quantifies the error or "loss" in
the model's predictions and serves as a guide for optimizing the model
during training.

The goal of training a neural network is to minimize the loss function,
thereby improving the model's accuracy and performance.

Key Concepts of Loss Function

1. Purpose:
o Quantify how well or poorly the model is performing.
o Provide feedback to update model parameters (weights and
biases) using optimization algorithms like Gradient Descent.

2. Optimization Objective:
o Minimize the loss function to reduce the error.
o The weights and biases of the network are adjusted iteratively
to achieve this goal.

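3. Global vs. Local Minima:
o The optimizer seeks to find the global minimum of the loss
function, where the error is the smallest. In practice,
gradient-based optimizers may instead settle in a local minimum,
where the loss is lower than at nearby points but not the lowest overall.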
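
Two loss functions named earlier in these notes, Mean Squared Error and Cross-Entropy, can be sketched as follows. This is a plain-Python illustration under assumptions: the cross-entropy shown is the binary form, and the small clipping constant `eps` is added here only to avoid taking log(0).

```python
import math

def mse(y_true, y_pred):
    """Mean Squared Error: average of (y - y_hat)^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Binary cross-entropy for labels in {0, 1} and predicted probabilities."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)      # clip to keep log() finite
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(y_true)

y_true = [1, 0, 1, 1]
good = [0.9, 0.1, 0.8, 0.7]   # predictions close to the labels
bad = [0.2, 0.9, 0.3, 0.4]    # predictions far from the labels

print("MSE  good/bad:", mse(y_true, good), mse(y_true, bad))
print("BCE  good/bad:", binary_cross_entropy(y_true, good),
      binary_cross_entropy(y_true, bad))
```

Both losses are smaller for the predictions that are closer to the labels, which is the signal the optimizer uses when it adjusts weights and biases.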
