Module-04
Convolutional Networks
1. Definition of Convolution
• Convolution: A mathematical operation that combines two functions (input signal/image
and filter/kernel) to produce a third function.
• Purpose: Captures important patterns and structures in the input data, crucial for tasks like
image recognition.
2. Mathematical Formulation
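For a 2-D input image I and a kernel K, the discrete convolution is typically written as

S(i, j) = (I * K)(i, j) = \sum_{m} \sum_{n} I(m, n) \, K(i - m, j - n)

where S is the output feature map. In practice, most deep learning libraries implement the closely related cross-correlation, S(i, j) = \sum_{m} \sum_{n} I(i + m, j + n) K(m, n), which skips the kernel flip; the learned filter weights simply adapt to whichever convention is used.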
3. Parameters of Convolution
a. Stride
• Definition: The number of pixels the filter moves over the input.
• Types:
o Stride of 1: Filter moves one pixel at a time, preserving most of the spatial detail of the input.
o Stride of 2: Filter moves two pixels at a time, reducing output size (downsampling).
b. Padding
• Definition: Adding extra pixels (typically zeros) around the border of the input image (see the output-size sketch after this list).
• Types:
o Valid Padding: No padding applied; results in a smaller output feature map.
o Same Padding: Padding applied to maintain the same output dimensions as the
input.
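As a quick check of how stride and padding determine the output size, here is a minimal sketch (the helper name conv_output_size and the example numbers are illustrative only):

```python
import math

def conv_output_size(input_size, kernel_size, stride=1, padding=0):
    """Side length of the output feature map along one spatial dimension."""
    return math.floor((input_size + 2 * padding - kernel_size) / stride) + 1

# Valid padding: no padding, so the feature map shrinks.
print(conv_output_size(32, kernel_size=5, stride=1, padding=0))  # 28
# Same padding for a 3x3 kernel at stride 1: pad by 1 to keep the size at 32.
print(conv_output_size(32, kernel_size=3, stride=1, padding=1))  # 32
# Stride 2 downsamples the map even with padding.
print(conv_output_size(32, kernel_size=3, stride=2, padding=1))  # 16
```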
4. Significance in Neural Networks
• Application: Used in convolutional layers of CNNs to extract features from images.
• Learning Hierarchical Representations: Stacked convolutional layers enable learning of
complex patterns, essential for image classification and other tasks.
Pooling
1. Purpose of Pooling
• Spatial Size Reduction: Decreases the dimensions of the feature maps.
• Parameter and Computation Reduction: Reduces the number of parameters and
computations in the network.
• Overfitting Control: Helps to control overfitting by providing a form of translational
invariance.
2. Types of Pooling
a. Max Pooling
• Definition: Selects the maximum value from each patch (sub-region) of the feature map.
• Purpose: Captures the most prominent features while reducing spatial dimensions.
b. Average Pooling
• Definition: Takes the average value from each patch of the feature map.
• Purpose: Provides a smooth representation of features, reducing sensitivity to noise.
3. Operation of Pooling
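A minimal numerical sketch of the two pooling operations, applied with a 2x2 window and a stride of 2 (the 4x4 values are made up for illustration):

```python
import numpy as np

x = np.array([[1, 3, 2, 0],
              [4, 6, 1, 2],
              [0, 2, 5, 7],
              [1, 1, 3, 4]], dtype=float)

# Split the 4x4 map into non-overlapping 2x2 patches (2x2, stride-2 pooling).
patches = x.reshape(2, 2, 2, 2).transpose(0, 2, 1, 3).reshape(2, 2, 4)

print(patches.max(axis=-1))   # max pooling     -> [[6. 2.] [2. 7.]]
print(patches.mean(axis=-1))  # average pooling -> [[3.5  1.25] [1.   4.75]]
```

Each 2x2 patch collapses to a single value, so the 4x4 map becomes 2x2 while the strongest (or average) response in each region is kept.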
4. Significance in Neural Networks
• Feature Extraction: Reduces the size of the feature maps while retaining the most relevant
features.
• Efficiency: Decreases computational load, allowing deeper networks to train faster.
• Robustness: Provides a degree of invariance to small translations in the input, making the
model more robust.
Convolution and Pooling as Infinitely Strong Priors
1. Convolution as an Infinitely Strong Prior
• Focus on Local Patterns: Using convolution amounts to an infinitely strong prior that useful interactions are local (e.g., edges and textures) and that the same weights are shared across every position; weights outside each small receptive field are forced to zero.
• Effectiveness in CNNs: This locality and weight-sharing assumption is what makes Convolutional Neural Networks (CNNs) effective for image and video analysis.
2. Pooling as an Infinitely Strong Prior
• Enhances Translational Invariance: Encodes the prior that each unit should be invariant to small translations, allowing the network to recognize objects regardless of their exact position within the image.
• Reduces Sensitivity to Position: By downsampling, pooling reduces sensitivity to the
exact location of features, improving generalization.
3. Significance in Neural Networks
• Feature Learning: Both operations prioritize local features, enabling efficient learning of
essential characteristics from input data.
• Improved Generalization: The combination of convolution and pooling enhances the
model's ability to generalize across various input variations.
Variants of the Basic Convolution Function
1. Dilated Convolutions
• Definition: Introduces spacing (dilation) between kernel elements.
• Wider Context: Allows the model to incorporate a wider context of the input data without increasing the number of parameters.
• Applications: Useful in tasks where understanding broader spatial relationships is
important, such as in semantic segmentation.
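As a rough illustration of the wider-context point, the sketch below computes the span of input covered by a dilated kernel (the helper name effective_kernel_size is made up for this example):

```python
def effective_kernel_size(kernel_size, dilation):
    """Span of input positions touched by a dilated kernel along one dimension."""
    return dilation * (kernel_size - 1) + 1

print(effective_kernel_size(3, dilation=1))  # 3: a standard 3-tap kernel
print(effective_kernel_size(3, dilation=2))  # 5: same 3 weights, wider context
print(effective_kernel_size(3, dilation=4))  # 9: receptive field grows, parameters do not
```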
2. Depthwise Separable Convolutions
• Two-Stage Process:
o Depthwise Convolution: Applies a separate convolution for each input channel,
reducing computational complexity.
o Pointwise Convolution: Uses 1x1 convolutions to combine the outputs from the
depthwise convolution.
• Parameter Efficiency: Reduces the number of parameters and computations compared to standard convolutions while maintaining comparable performance (see the count comparison after this list).
• Applications: Commonly used in lightweight models, such as MobileNets, for mobile and
edge devices.
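The parameter savings can be checked with a short sketch (bias terms are ignored and the channel counts are arbitrary):

```python
def standard_conv_params(in_ch, out_ch, k):
    """Weights in a standard k x k convolution mapping in_ch -> out_ch channels."""
    return in_ch * out_ch * k * k

def depthwise_separable_params(in_ch, out_ch, k):
    depthwise = in_ch * k * k    # one k x k filter per input channel
    pointwise = in_ch * out_ch   # 1x1 convolution that mixes the channels
    return depthwise + pointwise

print(standard_conv_params(64, 128, 3))        # 73728 weights
print(depthwise_separable_params(64, 128, 3))  # 8768 weights, roughly 8x fewer
```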
Structured Outputs
1. Definition of Structured Outputs
• Structured Outputs: Refers to tasks where the output has a specific structure or spatial
arrangement, such as pixel-wise predictions in image segmentation or keypoint localization
in object detection.
2. Importance in Semantic Segmentation
• Maintaining Spatial Structure: For tasks like semantic segmentation, it’s crucial to
maintain the spatial relationships between pixels in predictions to ensure that the output
accurately represents the original input image.
3. Specialized Networks
• Network Design: Specialized neural network architectures, such as Fully Convolutional
Networks (FCNs), are designed to handle structured outputs by replacing fully connected
layers with convolutional layers, allowing for spatially consistent predictions.
• Skip Connections: Techniques like skip connections (used in U-Net and ResNet) help
preserve high-resolution features from earlier layers, improving the accuracy of the output.
4. Adjusted Loss Functions
• Loss Function Modification: Loss functions may be adjusted to enforce structural
consistency in the predictions. Common approaches include:
o Pixel-wise Loss: Evaluating the loss on a per-pixel basis (e.g., Cross-Entropy Loss
for segmentation).
o Structural Loss: Incorporating penalties for structural deviations, such as Dice
Loss or Intersection over Union (IoU) metrics, which consider the overlap between
predicted and true regions.
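As an example of such a structural loss, here is a minimal soft Dice loss for a binary segmentation mask (a NumPy sketch; real implementations usually operate on per-pixel probabilities produced by the network):

```python
import numpy as np

def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss: 1 - 2*|overlap| / (|pred| + |target|)."""
    pred, target = pred.ravel(), target.ravel()
    overlap = (pred * target).sum()
    return 1.0 - (2.0 * overlap + eps) / (pred.sum() + target.sum() + eps)

pred   = np.array([[0.9, 0.1], [0.8, 0.2]])  # predicted foreground probabilities
target = np.array([[1.0, 0.0], [1.0, 0.0]])  # ground-truth mask
print(dice_loss(pred, target))               # about 0.15: high overlap, low loss
```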
5. Applications
• Use Cases: Structured output networks are widely used in various applications, including:
o Semantic Segmentation: Assigning class labels to each pixel in an image.
o Instance Segmentation: Identifying and segmenting individual object instances
within an image.
o Object Detection: Predicting bounding boxes and class labels for objects in an
image while maintaining spatial relations.
Data Types
1. 2D Images
• Standard Input: The most common input type for CNNs, typically used in image
classification, object detection, and segmentation tasks.
• Format: Represented as height × width × channels (e.g., RGB images have three channels).
2. 3D Data
• Definition: Includes video processing and volumetric data, such as those found in medical
imaging (e.g., MRI or CT scans).
• Format: Represented as depth × height × width × channels, allowing the network to
capture spatial and temporal information.
• Applications: Useful in tasks like action recognition in videos or analyzing 3D medical
images for diagnosis.
3. 1D Data
• Definition: Consists of sequential data, such as time-series data or audio signals.
• Format: Represented as sequences of data points, often one-dimensional.
• Applications: Used in tasks like speech recognition, audio classification, and analyzing
sensor data from IoT devices.
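These three data types map directly onto the 1-D, 2-D, and 3-D convolution layers of a framework such as PyTorch (a sketch assuming PyTorch is installed; note that PyTorch places the channel dimension before the spatial ones, and the batch size of 8 is arbitrary):

```python
import torch
import torch.nn as nn

x1d = torch.randn(8, 1, 1000)        # (batch, channels, length): e.g. an audio clip
x2d = torch.randn(8, 3, 224, 224)    # (batch, channels, height, width): RGB images
x3d = torch.randn(8, 1, 16, 64, 64)  # (batch, channels, depth, height, width): volumes/video

print(nn.Conv1d(1, 16, kernel_size=5)(x1d).shape)  # torch.Size([8, 16, 996])
print(nn.Conv2d(3, 16, kernel_size=3)(x2d).shape)  # torch.Size([8, 16, 222, 222])
print(nn.Conv3d(1, 16, kernel_size=3)(x3d).shape)  # torch.Size([8, 16, 14, 62, 62])
```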
Efficient Convolution Algorithms
1. Fast Fourier Transform (FFT)
• Definition: A mathematical algorithm that computes the discrete Fourier transform (DFT)
and its inverse, converting signals between time (or spatial) domain and frequency domain.
• Convolution in Frequency Domain:
o Convolution in the time or spatial domain can be transformed into multiplication in
the frequency domain, which is often more computationally efficient for large
kernels.
• Applications: Commonly used in applications requiring large kernel convolutions, such as
in image processing and signal analysis.
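A minimal 1-D sketch of the equivalence (NumPy; the signal is padded to the full linear-convolution length so circular wrap-around does not distort the result):

```python
import numpy as np

signal = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
kernel = np.array([0.25, 0.5, 0.25])

n = len(signal) + len(kernel) - 1  # length of the full linear convolution
fft_result = np.fft.irfft(np.fft.rfft(signal, n) * np.fft.rfft(kernel, n), n)

print(np.convolve(signal, kernel))  # direct convolution
print(np.round(fft_result, 6))      # identical values computed via the FFT
```

For small kernels the direct method is faster, but as the kernel grows the O(n log n) FFT route wins.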
2. Winograd's Algorithms
• Definition: A set of algorithms designed to optimize convolution operations by reducing
the number of multiplications needed.
• Efficiency Improvement:
o Winograd's algorithms work by rearranging the computation of convolution to
minimize redundant calculations.
o They can reduce the complexity of convolution operations, particularly for small
kernels, making them more efficient in terms of computational resources.
• Key Concepts:
o The algorithms break down the convolution operation into smaller components,
allowing for fewer multiplicative operations and leveraging addition and
subtraction instead.
o They are particularly effective in scenarios where computational efficiency is
critical, such as mobile devices or real-time applications.
• Applications: Frequently used in lightweight models and resource-constrained
environments where computational power and memory usage are limited.
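A minimal sketch of the 1-D case F(2, 3), which produces two outputs of a 3-tap filter using 4 multiplications instead of the 6 a direct computation needs (illustrative only; production implementations use the 2-D F(2x2, 3x3) variant and precompute the filter transform):

```python
def winograd_f23(d, g):
    """Two outputs of a 3-tap filter over d[0:4] with only 4 multiplications."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    return [m1 + m2 + m3, m2 - m3 - m4]

d, g = [1.0, 2.0, 3.0, 4.0], [0.5, 1.0, 0.25]
print(winograd_f23(d, g))                   # [3.25, 5.0]
print([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],   # same result, computed directly
       d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])  # with 6 multiplications
```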
Unsupervised Feature Learning
1. Random Feature Maps
• Definition: A technique that uses random projections to map input data into a higher-
dimensional space, facilitating the extraction of features without the need for labels.
• Purpose: Helps to approximate kernel methods, enabling linear models to learn complex
functions.
• Advantages:
o Efficiency: Reduces the computational burden of traditional kernel methods while
retaining useful information.
o Scalability: Suitable for large datasets as it allows for faster training times.
• Applications: Commonly used in tasks where labeled data is scarce, such as clustering and
anomaly detection.
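One widely used form of random feature map is the random Fourier feature, which approximates an RBF kernel; a minimal NumPy sketch follows (the dimensions and random seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d, D = 10, 500                         # input dimension, number of random features
W = rng.normal(size=(d, D))            # random projection directions
b = rng.uniform(0, 2 * np.pi, size=D)  # random phases

def random_features(x):
    """Map x so that dot products approximate exp(-||x1 - x2||^2 / 2)."""
    return np.sqrt(2.0 / D) * np.cos(x @ W + b)

x1 = rng.normal(size=d)
x2 = x1 + 0.1 * rng.normal(size=d)                 # a nearby point
print(random_features(x1) @ random_features(x2))   # kernel approximation
print(np.exp(-0.5 * np.sum((x1 - x2) ** 2)))       # exact RBF kernel value
```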
2. Autoencoders
• Definition: A type of neural network designed to learn efficient representations of data
through unsupervised learning by encoding the input into a lower-dimensional space and
then reconstructing it back.
• Structure:
o Encoder: Compresses the input data into a latent representation.
o Decoder: Reconstructs the original input from the latent representation.
• Purpose: Learns to capture important features and structures in the data without
supervision, making it effective for dimensionality reduction and feature extraction.
• Advantages:
o Robustness: Can learn from noisy data and still produce meaningful
representations.
o Flexibility: Can be adapted for various tasks, including denoising, anomaly
detection, and generative modeling.
• Applications: Used in scenarios such as image compression, data denoising, and
generating new data samples.
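A minimal encoder/decoder sketch in PyTorch (assuming PyTorch; the layer sizes are arbitrary and only the reconstruction loss is shown, not a full training loop):

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    def __init__(self, input_dim=784, latent_dim=32):
        super().__init__()
        # Encoder: compresses the input to a low-dimensional latent code.
        self.encoder = nn.Sequential(nn.Linear(input_dim, 128), nn.ReLU(),
                                     nn.Linear(128, latent_dim))
        # Decoder: reconstructs the input from the latent code.
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 128), nn.ReLU(),
                                     nn.Linear(128, input_dim))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
x = torch.rand(16, 784)           # a batch of flattened 28x28 images
loss = nn.MSELoss()(model(x), x)  # reconstruction error drives the training
print(loss.item())
```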
3. Facilitation of Unsupervised Learning
• Role in Unsupervised Learning: Both methods enable the extraction of meaningful
features from unlabelled data, facilitating learning in scenarios where obtaining labeled
data is challenging or expensive.
• Enhancing Model Performance: By leveraging these techniques, models can improve
their performance on downstream tasks, such as clustering, classification, or regression,
even in the absence of labels.
Notable Architectures
1. LeNet-5
• Introduction:
o Developed by Yann LeCun and colleagues in 1998.
o One of the first convolutional networks designed specifically for image recognition
tasks.
• Architecture Details:
o Input Layer: Takes in grayscale images of size 32x32 pixels.
o Convolutional Layer 1:
▪ 6 filters (5x5) with a stride of 1.
▪ Output size: 28x28x6.
o Activation Function: Sigmoid or hyperbolic tangent (tanh).
o Pooling Layer 1:
▪ Average pooling (subsampling) with a 2x2 filter and a stride of 2.
▪ Output size: 14x14x6.
o Convolutional Layer 2:
▪ 16 filters (5x5).
▪ Output size: 10x10x16.
o Pooling Layer 2:
▪ Average pooling (2x2) with a stride of 2.
▪ Output size: 5x5x16.
o Fully Connected Layers:
▪ 120 neurons in the first layer.
▪ 84 neurons in the second layer.
▪ Output layer with 10 neurons (for digit classes 0-9).
• Significance:
o Introduced the concept of using convolutional layers for feature extraction followed
by pooling layers for dimensionality reduction.
o Paved the way for modern CNNs, influencing later architectures.
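A compact PyTorch re-creation of the layer sequence described above (a sketch for illustration; it uses tanh everywhere and a plain linear output layer, whereas the original used scaled tanh units and an RBF output layer):

```python
import torch
import torch.nn as nn

lenet5 = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(),   # 32x32x1  -> 28x28x6
    nn.AvgPool2d(2, stride=2),                   # 28x28x6  -> 14x14x6
    nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(),  # 14x14x6  -> 10x10x16
    nn.AvgPool2d(2, stride=2),                   # 10x10x16 -> 5x5x16
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
    nn.Linear(120, 84), nn.Tanh(),
    nn.Linear(84, 10),                           # 10 digit classes
)

print(lenet5(torch.randn(1, 1, 32, 32)).shape)   # torch.Size([1, 10])
```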
2. AlexNet
• Introduction:
o Developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton in 2012.
o Marked a breakthrough in deep learning by achieving top performance in the
ImageNet competition.
• Architecture Details:
o Input Layer: Accepts images of size 224x224 pixels (RGB).
o Convolutional Layer 1:
▪ 96 filters (11x11) with a stride of 4.
▪ Output size: 55x55x96.
o Activation Function: ReLU, introduced to improve training speed.
o Pooling Layer 1:
▪ Max pooling (3x3) with a stride of 2.
▪ Output size: 27x27x96.
o Convolutional Layer 2:
▪ 256 filters (5x5).
▪ Output size: 27x27x256.
o Pooling Layer 2:
▪ Max pooling (3x3) with a stride of 2.
▪ Output size: 13x13x256.
o Convolutional Layer 3:
▪ 384 filters (3x3).
▪ Output size: 13x13x384.
o Convolutional Layer 4:
▪ 384 filters (3x3).
▪ Output size: 13x13x384.
o Convolutional Layer 5:
▪ 256 filters (3x3).
▪ Output size: 13x13x256.
o Pooling Layer 3:
▪ Max pooling (3x3) with a stride of 2.
▪ Output size: 6x6x256.
o Fully Connected Layers:
▪ First layer with 4096 neurons.
▪ Second layer with 4096 neurons.
▪ Output layer with 1000 neurons (for 1000 classes).
• Innovative Techniques Introduced:
o ReLU Activation:
▪ Enabled faster convergence during training compared to traditional
activation functions like sigmoid or tanh.
o Dropout:
▪ Regularization method that randomly drops neurons during training to
prevent overfitting, significantly improving generalization.
o Data Augmentation:
▪ Used techniques like image rotation, translation, and flipping to artificially
expand the training dataset and improve robustness.
o GPU Utilization:
▪ Leveraged parallel processing power of GPUs, enabling training on large
datasets in a reasonable timeframe.
• Significance:
o Established deep learning as a powerful approach for image classification and
sparked widespread research and development in CNN architectures.
o Highlighted the importance of large labeled datasets and robust training techniques
in achieving state-of-the-art performance.
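A PyTorch sketch of the stack described above (for illustration only; padding values are chosen so the spatial sizes match the figures listed, and the original network actually split these layers across two GPUs):

```python
import torch
import torch.nn as nn

alexnet = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2), nn.ReLU(),  # -> 55x55x96
    nn.MaxPool2d(3, stride=2),                                         # -> 27x27x96
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),           # -> 27x27x256
    nn.MaxPool2d(3, stride=2),                                         # -> 13x13x256
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),          # -> 13x13x384
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),          # -> 13x13x384
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),          # -> 13x13x256
    nn.MaxPool2d(3, stride=2),                                         # -> 6x6x256
    nn.Flatten(),
    nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),                                             # 1000 ImageNet classes
)

print(alexnet(torch.randn(1, 3, 224, 224)).shape)  # torch.Size([1, 1000])
```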