Lecture05 DeepLearningCNN Trang 2


Specialized Convolutional Layers: Motivation

• Computational Cost of Standard Convolutions: Standard convolutions can be computationally
demanding, especially with many channels, large kernels, or high-resolution inputs.
• Need for Efficiency: Resource-constrained applications (e.g., mobile, embedded systems) require
more efficient alternatives.
• Benefits of Specialized Convolutions: These offer:
• Reduced computational cost (fewer parameters and FLOPs).
• Improved efficiency (faster inference/training).
• Potential performance gains.
• Types of Specialized Convolutions (covered next):
• Depthwise Convolution
• Grouped Convolution
• Pointwise Convolution (1x1)
• Depthwise Separable Convolution

Thien Huynh-The - HCMUTE Convolutional Neural Networks February 10, 2025 40 / 84


Standard Convolution: Recap

• Input: H × W × Cin
• Kernel: K × K × Cin
• Number of filters: Cout
• Output: H ′ × W ′ × Cout
• Number of parameters: K × K × Cin × Cout
• FLOPs: H ′ × W ′ × K × K × Cin × Cout



Depthwise Convolution
• Applies a single filter to each input channel independently.
• Input: H × W × Cin
• Kernel: K × K × 1 (one filter per channel)
• Output: H ′ × W ′ × Cin (same number of channels as input)
• Number of parameters: K × K × Cin
• FLOPs: H ′ × W ′ × K × K × Cin
• Much more efficient than standard convolution, especially when Cout >> 1.



Pointwise Convolution (1x1 Convolution)
• Input: H × W × Cin
• Kernel: 1 × 1 × Cin - 1x1 kernel to perform a linear combination of the input channels.
• Number of filters: Cout
• Output: H × W × Cout (spatial dimensions remain the same)
• Number of parameters: 1 × 1 × Cin × Cout = Cin × Cout
• FLOPs: H × W × Cin × Cout
• Used for:
• Reducing/increasing the number of channels.
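• Mixing information across channels after a depthwise convolution (the 1x1 convolution itself is linear; any non-linearity comes from the activation that follows it).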



Grouped Convolution
• Divides the input channels into G groups and applies a standard convolution independently within each
group (depthwise convolution is a special case of grouped convolution where G = Cin).
• Input: H × W × Cin
• Kernel: K × K × (Cin / G)
• Number of filters per group: Cout / G
• Output: H ′ × W ′ × Cout
• Number of parameters: K × K × (Cin / G) × (Cout / G) × G = K × K × Cin × Cout / G
• FLOPs: H ′ × W ′ × K × K × Cin × Cout / G
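As a quick check of the grouped-convolution parameter formula, the sketch below counts weights in plain Python. The function name `grouped_conv_params` and the 3x3, 64-channel example are illustrative assumptions rather than part of the lecture, and biases are ignored.

```python
def grouped_conv_params(k: int, c_in: int, c_out: int, groups: int = 1) -> int:
    """Weight count (biases ignored) of a K x K convolution split into `groups` groups."""
    if c_in % groups or c_out % groups:
        raise ValueError("c_in and c_out must both be divisible by groups")
    # Each group maps c_in/groups input channels to c_out/groups output channels.
    per_group = k * k * (c_in // groups) * (c_out // groups)
    return per_group * groups


# Hypothetical layer: 3x3 kernel, 64 input and 64 output channels.
standard = grouped_conv_params(3, 64, 64, groups=1)    # G = 1 is a standard convolution
grouped = grouped_conv_params(3, 64, 64, groups=4)     # 4 times fewer weights
depthwise = grouped_conv_params(3, 64, 64, groups=64)  # G = Cin is a depthwise convolution

print(standard, grouped, depthwise)
```

Setting G = 1 recovers the standard count K × K × Cin × Cout, and G = Cin recovers the depthwise count K × K × Cin.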



Depthwise Separable Convolution
• Combines depthwise and pointwise convolutions.
• First, a depthwise convolution is applied.
• Then, a pointwise convolution is used to combine the output channels.
• Significantly reduces computational cost compared to standard convolution.
• Used in MobileNet and Xception.
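To make the cost reduction concrete, the following sketch compares the FLOP counts of a standard convolution and a depthwise separable one, using the formulas from the earlier slides. The layer dimensions (56x56 output, 3x3 kernel, 128 channels in and out) are assumed for illustration only.

```python
def standard_conv_flops(h: int, w: int, k: int, c_in: int, c_out: int) -> int:
    """H' x W' x K x K x Cin x Cout multiply-accumulates."""
    return h * w * k * k * c_in * c_out


def depthwise_separable_flops(h: int, w: int, k: int, c_in: int, c_out: int) -> int:
    """Depthwise (K x K per channel) followed by pointwise (1 x 1 across channels)."""
    depthwise = h * w * k * k * c_in
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise


# Hypothetical layer dimensions.
h_out, w_out, k, c_in, c_out = 56, 56, 3, 128, 128

std = standard_conv_flops(h_out, w_out, k, c_in, c_out)
sep = depthwise_separable_flops(h_out, w_out, k, c_in, c_out)
ratio = sep / std  # algebraically equal to 1/Cout + 1/K^2

print(f"standard: {std:,}  separable: {sep:,}  ratio: {ratio:.4f}")
```

The ratio simplifies to 1/Cout + 1/K², so for 3x3 kernels and a large number of output channels the separable version needs roughly 8 to 9 times fewer operations.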



Convolutional Layers: Animated Explanation

Groups, Depthwise, and Depthwise-Separable Convolution



Backbone CNN Models: Review



Introduction to LeNet-5
• Historical Significance: LeNet-5, developed by Yann LeCun et al. in the 1990s, is one of the
earliest and most influential Convolutional Neural Network (CNN) architectures.
• Purpose: Designed for handwritten and machine-printed character recognition (e.g., MNIST
dataset).
• Key Innovations: Introduced fundamental CNN concepts:
• Convolutional layers with learnable weights.
• Local receptive fields.
• Spatial subsampling (pooling).
• Shared weights (parameter sharing).



LeNet-5 Architecture

• LeNet-5 consists of seven layers (excluding the input):
• Input Layer: 32x32 grayscale image.
• Convolutional Layer C1: six 5x5 filters, stride 1, no padding. Output: 28x28x6.
• Subsampling Layer S2 (Average Pooling): 2x2 pooling, stride 2. Output: 14x14x6.
• Convolutional Layer C3: sixteen 5x5 filters. Output: 10x10x16. Note: in the original LeNet-5 paper,
each C3 feature map is connected to only a subset of the S2 feature maps.
• Subsampling Layer S4 (Average Pooling): 2x2 pooling, stride 2. Output: 5x5x16.
• Layer C5: 120 units. In the original paper this is a convolution with 5x5 filters which, applied to
the 5x5x16 input, is equivalent to a fully connected layer.
• Fully Connected Layer F6: 84 neurons.
• Output Layer: 10 neurons (one for each digit 0-9). The original paper uses RBF (Radial Basis
Function) units; modern reimplementations typically use Softmax.



Key Concepts and Impact

Key concepts:
• Convolutional Layers: Local receptive fields, feature extraction.
• Subsampling (Pooling): Reducing spatial resolution, increasing robustness to small shifts and distortions.
• Parameter Sharing: Reducing the number of parameters and improving generalization.
• Hierarchical Feature Learning: Lower layers detect simple features (edges, lines), higher layers detect more
complex features (combinations of edges, shapes).
Impacts:
• LeNet-5 laid the foundation for modern CNN architectures.
• Its key concepts are still used in many state-of-the-art models.
• It demonstrated the power of CNNs for image recognition and other tasks involving structured data.



Introduction to AlexNet
• Revolutionary Impact: AlexNet, developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey
Hinton, won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012 by a
significant margin, marking a turning point in DL for computer vision.
• Key Contributions:
• Deeper architecture than previous CNNs.
• Use of ReLU activation functions.
• Training on GPUs for faster training.
• Local response normalization (LRN).
• Overlapping pooling and data augmentation.



AlexNet Architecture
AlexNet consists of eight layers with learnable weights (five convolutional and three fully connected), interleaved with three max-pooling layers:
• Input Layer: 227x227x3 RGB image.
• Convolutional Layer 1: 96 11x11 filters, stride 4, no padding. Output: 55x55x96.
• Max Pooling Layer 1: 3x3 pooling, stride 2. Output: 27x27x96.
• Convolutional Layer 2: 256 5x5 filters, stride 1, padding 2. Output: 27x27x256.
• Max Pooling Layer 2: 3x3 pooling, stride 2. Output: 13x13x256.
• Convolutional Layer 3: 384 3x3 filters, stride 1, padding 1. Output: 13x13x384.
• Convolutional Layer 4: 384 3x3 filters, stride 1, padding 1. Output: 13x13x384.
• Convolutional Layer 5: 256 3x3 filters, stride 1, padding 1. Output: 13x13x256.
• Max Pooling Layer 3: 3x3 pooling, stride 2. Output: 6x6x256.
• Fully Connected Layer 1: 4096 neurons.
• Fully Connected Layer 2: 4096 neurons.
• Output Layer: 1000 neurons (for 1000 ImageNet classes) with Softmax activation.
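The spatial sizes listed above follow from the usual output-size formula floor((N + 2P - K) / S) + 1. The sketch below traces them through the convolution and pooling layers; the helper name `out_size` and the layer table are written for this illustration and are not taken from the lecture.

```python
def out_size(n: int, k: int, stride: int = 1, pad: int = 0) -> int:
    """Output width/height of a convolution or pooling layer on an n x n input."""
    return (n + 2 * pad - k) // stride + 1


# (name, kernel, stride, padding) for each spatial layer of AlexNet as listed on the slide.
layers = [
    ("conv1", 11, 4, 0),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool3", 3, 2, 0),
]

size = 227  # input width and height
sizes = {}
for name, k, s, p in layers:
    size = out_size(size, k, s, p)
    sizes[name] = size
    print(f"{name}: {size}x{size}")
```

The final 6x6x256 feature map flattens to 9216 values, which is the input to the first fully connected layer.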



Key Innovations and Impact

Key Innovations:
• ReLU Activations: Accelerated training by mitigating vanishing gradients.
• GPU Training: Enabled training of larger models on larger datasets.
• Local Response Normalization (LRN): Local channel normalization (minor impact).
• Overlapping Pooling: Reduced overfitting.
• Data Augmentation: Improved generalization by increasing training data diversity.
Impact:
• Deep Learning Resurgence in CV: Sparked renewed interest and rapid progress in deep learning for
computer vision.
• Foundation for Modern CNNs: Influenced many subsequent CNN architectures.
• Influence on Other Fields: Impacted other areas of deep learning like NLP and speech recognition.



Introduction to VGG-16
• Visual Geometry Group (VGG): Developed by the VGG at the University of Oxford.
• Key Insight: Demonstrated the importance of network depth in achieving better performance in
image classification.
• Uniform Architecture: Used very small (3x3) convolutional filters throughout the entire network,
leading to a much deeper architecture than AlexNet.
• ILSVRC 2014: Placed first in the localization task and second in the classification task of the
ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014.



VGG-16 Architecture
• Key Characteristics:
• Only 3x3 convolutional filters with stride 1 and padding 1 are used.
• 2x2 max pooling with stride 2 is used for downsampling.
• Multiple convolutional layers are stacked before each pooling layer.
• Layers (simplified): VGG-16 refers to 16 layers with weights (convolutional or fully connected):
• Input: 224x224x3 RGB image.
• Conv1 (2 layers): 64 filters. Output: 224x224x64
• Max Pool 1: Output: 112x112x64
• Conv2 (2 layers): 128 filters. Output: 112x112x128
• Max Pool 2: Output: 56x56x128
• Conv3 (3 layers): 256 filters. Output: 56x56x256
• Max Pool 3: Output: 28x28x256
• Conv4 (3 layers): 512 filters. Output: 28x28x512
• Max Pool 4: Output: 14x14x512
• Conv5 (3 layers): 512 filters. Output: 14x14x512
• Max Pool 5: Output: 7x7x512
• FC1: 4096 neurons
• FC2: 4096 neurons
• Output (FC3): 1000 neurons (ImageNet classes) with Softmax
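As a sanity check on the layer list above, this sketch counts the weighted layers and the total number of parameters (weights plus biases) implied by it. The configuration list `cfg` simply transcribes the slide; the variable names are illustrative.

```python
# (number of 3x3 conv layers, output channels) for Conv1 .. Conv5 as listed on the slide.
cfg = [(2, 64), (2, 128), (3, 256), (3, 512), (3, 512)]

c_in = 3  # RGB input
conv_layers = 0
conv_params = 0
for n_layers, c_out in cfg:
    for _ in range(n_layers):
        conv_params += (3 * 3 * c_in + 1) * c_out  # +1 accounts for the bias of each filter
        c_in = c_out
        conv_layers += 1

# Fully connected part: flattened 7x7x512 -> 4096 -> 4096 -> 1000.
fc_sizes = [7 * 7 * 512, 4096, 4096, 1000]
fc_params = sum((n_in + 1) * n_out for n_in, n_out in zip(fc_sizes, fc_sizes[1:]))

weight_layers = conv_layers + len(fc_sizes) - 1
total_params = conv_params + fc_params

print(f"weighted layers: {weight_layers}")
print(f"conv params: {conv_params:,}  fc params: {fc_params:,}  total: {total_params:,}")
```

Most of the roughly 138 million parameters sit in the three fully connected layers, not in the thirteen convolutional layers.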



VGG-16 vs VGG-19



Advantages of Small 3x3 Convolutions

• Deeper Network: Stacking multiple 3x3 convolutions allows for a deeper network, which
can learn more complex features.
• Reduced Number of Parameters: Two stacked 3x3 convolutions have the same
receptive field as one 5x5 convolution but with fewer parameters (counted per
input-output channel pair, ignoring biases):
• One 5x5: 5 × 5 = 25 parameters
• Two 3x3: (3 × 3) + (3 × 3) = 18 parameters
• More Non-linearities: Stacking more layers increases the number of non-linear
activations (ReLU), which makes the network more expressive.
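The receptive-field and parameter claims above can be checked with a few lines of Python. The helper names are hypothetical, the layers are assumed to use stride 1, and the channel count C = 64 is an arbitrary example.

```python
def stacked_receptive_field(kernel: int, n_layers: int) -> int:
    """Receptive field of n stacked stride-1 convolutions with the same kernel size."""
    # Each additional layer widens the field by (kernel - 1) pixels.
    return 1 + n_layers * (kernel - 1)


def stacked_weights(kernel: int, n_layers: int, channels: int) -> int:
    """Weights (biases ignored) of n stacked K x K layers with `channels` in and out."""
    return n_layers * kernel * kernel * channels * channels


C = 64  # assumed channel count, identical for input and output

rf_two_3x3 = stacked_receptive_field(3, 2)    # same field as one 5x5
rf_three_3x3 = stacked_receptive_field(3, 3)  # same field as one 7x7

w_one_5x5 = stacked_weights(5, 1, C)  # 25 * C^2
w_two_3x3 = stacked_weights(3, 2, C)  # 18 * C^2

print(rf_two_3x3, rf_three_3x3, w_two_3x3 / w_one_5x5)
```

The 18/25 ratio holds for any channel count, since both totals scale with C².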



Impact of VGG Networks

• Emphasis on Depth: Solidified the importance of network depth for achieving high
performance.
• Simple and Effective Design: The uniform architecture with small filters made VGG
networks easy to understand and implement.
• Transfer Learning: VGG models pretrained on ImageNet became widely used for transfer
learning in various computer vision tasks.



Introduction to GoogLeNet

• ILSVRC 2014 Winner: GoogLeNet, developed by Google, won the ImageNet Large Scale Visual
Recognition Challenge (ILSVRC) 2014, achieving a significant improvement over previous
architectures (including VGG).
• Key Innovation: Inception Module: Introduced the Inception module, a novel building block
that significantly improved efficiency and performance.
• Depth and Efficiency: Achieved greater depth than previous networks while maintaining
manageable computational cost.
• Reduced Parameters: Significantly fewer parameters than AlexNet, making it more efficient and
less prone to overfitting.



The Inception Module
• Motivation: To capture features at multiple scales simultaneously.
• Structure: Consists of parallel branches with different convolutional filter sizes (1x1, 3x3, 5x5)
and max pooling.
• 1x1 Convolutions: Used 1x1 convolutions for dimensionality reduction before the more expensive
3x3 and 5x5 convolutions, significantly reducing computational cost.
• Concatenation: The outputs of all branches are concatenated along the channel dimension.
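To illustrate why the 1x1 reduction matters, the sketch below compares the cost of a 5x5 branch with and without a preceding 1x1 convolution. The feature-map size (28x28) and channel counts (192 in, 16 after reduction, 32 out) are assumed example values, not figures from the lecture.

```python
def conv_flops(h: int, w: int, k: int, c_in: int, c_out: int) -> int:
    """Multiply-accumulates of a K x K convolution producing an h x w x c_out output."""
    return h * w * k * k * c_in * c_out


# Assumed example dimensions for one 5x5 branch of an Inception module.
h, w = 28, 28
c_in, c_reduced, c_out = 192, 16, 32

# Option A: apply the 5x5 convolution directly to all 192 input channels.
direct = conv_flops(h, w, 5, c_in, c_out)

# Option B: reduce to 16 channels with a 1x1 convolution, then apply the 5x5.
reduced = conv_flops(h, w, 1, c_in, c_reduced) + conv_flops(h, w, 5, c_reduced, c_out)

print(f"direct: {direct:,}  with 1x1 reduction: {reduced:,}  saving: {direct / reduced:.1f}x")
```

Under these assumptions the branch with the 1x1 bottleneck needs close to ten times fewer operations while producing an output of the same shape.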



GoogLeNet Architecture
• Stacking Inception Modules: GoogLeNet consists of multiple Inception modules stacked on top
of each other.
• Auxiliary Classifiers: Included auxiliary classifiers at intermediate layers to improve gradient flow
during training and prevent vanishing gradients.
• No Large Fully Connected Layers at the End: Used Global Average Pooling (GAP) before the
classifier instead of a stack of large fully connected layers, further reducing the number of
parameters (a single linear layer still feeds the softmax).
• Simplified Structure (Conceptual): Input -> Several Convolutional Layers -> Stacked Inception
Modules -> Global Average Pooling -> Softmax Output



Advantages of GoogLeNet

• Increased Depth and Width: The Inception module allows for increasing both the
depth and width of the network without a significant increase in computational cost.
• Computational Efficiency: Using 1x1 convolutions for dimensionality reduction
significantly reduces the number of parameters and FLOPs.
• Improved Performance: Achieved state-of-the-art performance on ImageNet with
significantly fewer parameters than previous models.
• Reduced Overfitting: The reduced number of parameters and the use of auxiliary
classifiers helped to reduce overfitting.



Impact of GoogLeNet

• Shift Towards Efficient Architectures: Influenced the development of more efficient
CNN architectures.
• Inception Module as a Building Block: The Inception module became a popular
building block in many subsequent CNNs.
• Focus on Computational Cost: Highlighted the importance of considering
computational cost in deep learning model design.



Introduction to Inception-v3

• Evolution of Inception: Inception-v3 is the third iteration of the Inception architecture,
building upon the ideas introduced in GoogLeNet (Inception-v1).
• Focus on Efficiency and Performance: Aimed to further improve both computational
efficiency and classification performance.
• Key Improvements: Introduced several architectural refinements:
• Factorization of larger convolutions into smaller ones.
• Asymmetric convolutions.
• Auxiliary classifiers with improved loss.
• Batch Normalization in auxiliary classifiers.



Factorization of Convolutions
• Factorizing 5x5 Convolutions: A 5x5 convolution can be factorized into two consecutive 3x3
convolutions, reducing the number of parameters and computations:
• One 5x5: 5 × 5 = 25 parameters
• Two 3x3: (3 × 3) + (3 × 3) = 18 parameters
This increases depth, adding more non-linearities (ReLU activations) and thus increasing the
network’s expressiveness.
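• Factorizing n × n Convolutions: More generally, an n × n convolution can be replaced by a 1 × n
convolution followed by an n × 1 convolution, cutting the weights per channel pair from n × n to 2n.
The replacement is exact only for separable (rank-1) kernels; otherwise the network simply learns the
factorized form directly. For example, a 3x3 convolution can be replaced by a 1x3 followed by a 3x1.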
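The following NumPy sketch illustrates the separable case: when a 3x3 kernel is the outer product of a 3x1 and a 1x3 kernel, applying the two small kernels in sequence gives the same result as applying the full kernel once. The helper `correlate2d_valid` is a deliberately naive single-channel cross-correlation written for this example, and no activation is placed between the two passes (in a real network there would usually be one).

```python
import numpy as np


def correlate2d_valid(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Naive 'valid' 2D cross-correlation (what deep learning calls convolution)."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


rng = np.random.default_rng(0)
image = rng.standard_normal((8, 8))
row = rng.standard_normal((1, 3))  # 1 x 3 kernel: 3 weights
col = rng.standard_normal((3, 1))  # 3 x 1 kernel: 3 weights

full_kernel = col @ row  # rank-1 3 x 3 kernel: 9 weights

one_pass = correlate2d_valid(image, full_kernel)
two_pass = correlate2d_valid(correlate2d_valid(image, row), col)

print(np.allclose(one_pass, two_pass))
```

A general 3x3 kernel has rank up to 3, so the factorized pair can represent only a subset of all 3x3 kernels; the saving in weights is paid for with this restriction.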



Asymmetric Convolutions

• Further Factorization: Inception-v3 further factorizes convolutions by using asymmetric
convolutions, such as 1xn followed by nx1.
• Example: Instead of a 3x3 convolution, Inception-v3 uses a 1x3 convolution followed by a 3x1
convolution.
• Benefits: This further reduces the number of parameters and computations: a 1x3 followed by a
3x1 uses 3 + 3 = 6 weights per channel pair instead of the 9 of a full 3x3 convolution.



Improved Auxiliary Classifiers
• Purpose of Auxiliary Classifiers: To improve gradient flow during training, especially in very
deep networks, and prevent vanishing gradients.
• Improvements in v3: In Inception-v3, the auxiliary classifiers were improved by:
• Using batch normalization in the auxiliary classifiers.
• Using a different loss function (softmax cross-entropy) for the auxiliary classifiers.
• Contribution to Final Loss: The loss from the auxiliary classifiers is added to the main loss with
a smaller weight (e.g., 0.3).



Overall Impact of Inception-v3

• State-of-the-Art Performance: Achieved even better performance on ImageNet
compared to its predecessors.
• Emphasis on Efficient Design: Further emphasized the importance of efficient network
design.
• Influence on Subsequent Architectures: Influenced the design of many subsequent
CNN architectures by demonstrating the effectiveness of factorization and asymmetric
convolutions.



Introduction to ResNet

• Challenge of Deep Networks: Training very deep neural networks was a major challenge due to
the vanishing gradient problem.
• Key Innovation: Residual Connections (Skip Connections): ResNet, introduced by He et al.,
addressed this problem with the concept of residual connections (also known as skip connections
or shortcuts).
• ILSVRC 2015 Winner: Won the ILSVRC 2015 classification task with a top-5 error of 3.57%
(ensemble), below the roughly 5% error commonly cited for human annotators.



The Vanishing Gradient Problem
• Gradient Propagation: During backpropagation, gradients are multiplied as they are passed
through multiple layers.
• Vanishing Gradients: In very deep networks, these repeated multiplications can cause the
gradients to become extremely small, effectively preventing the earlier layers from learning.
• Impact: This makes it difficult to train very deep networks effectively.
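A small numerical illustration of the repeated multiplication described above, under simplifying assumptions that are not part of the lecture: every layer uses a sigmoid activation, and the weights contribute a factor of magnitude 1, so each layer scales the gradient by at most the largest sigmoid slope.

```python
import math


def sigmoid_grad(z: float) -> float:
    """Derivative of the logistic sigmoid at z."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)


max_grad = sigmoid_grad(0.0)  # the sigmoid slope peaks at z = 0

# Upper bound on the gradient scaling factor after passing through `depth` layers.
bounds = {depth: max_grad ** depth for depth in (5, 10, 20, 50)}
for depth, bound in bounds.items():
    print(f"{depth:2d} layers: gradient factor <= {bound:.3e}")
```

Since the peak slope is only 0.25, the bound shrinks geometrically with depth, which is one reason ReLU activations and skip connections are preferred in deep networks.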



Residual Connections
• Concept: Instead of directly learning a mapping H(x), ResNet learns a residual mapping
F (x) = H(x) − x, where x is the input to the layer.
• Residual Block: The output of a residual block is then H(x) = F (x) + x. The addition is
performed using element-wise addition.
• Identity Mapping: If the identity mapping is optimal, the network can easily learn it by setting
F (x) = 0.
• Gradient Flow: Residual connections provide a direct path for gradients to flow through,
mitigating the vanishing gradient problem.
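The relation H(x) = F(x) + x can be sketched in NumPy as follows. This is a simplified illustration: the residual branch F uses two dense layers instead of convolutions, and batch normalization and the final activation are omitted. All names and sizes are assumptions made for the example.

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def residual_block(x: np.ndarray, w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """H(x) = F(x) + x, with a two-layer residual branch F(x) = relu(x @ w1) @ w2."""
    residual = relu(x @ w1) @ w2
    return residual + x  # element-wise addition via the skip connection


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 16))         # batch of 4 feature vectors
w1 = rng.standard_normal((16, 16)) * 0.1
w2 = rng.standard_normal((16, 16)) * 0.1

y = residual_block(x, w1, w2)

# If the last layer of the branch has zero weights, F(x) = 0 and the block is the identity.
y_identity = residual_block(x, w1, np.zeros((16, 16)))
print(np.allclose(y_identity, x))
```

The element-wise addition requires F(x) and x to have the same shape; when they differ, the shortcut needs a projection, which is not shown here.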



ResNet Architectures
• Different Depths: ResNet comes in various depths (e.g., ResNet-18, ResNet-34, ResNet-50,
ResNet-101, ResNet-152), with the number giving the count of weighted layers (convolutional and
fully connected).
• Bottleneck Layers: Deeper ResNet architectures (e.g., ResNet-50 and above) use bottleneck
layers to reduce computational cost. A bottleneck layer consists of a 1x1 convolution, a 3x3
convolution, and another 1x1 convolution.
• Overall Structure (General):
1. Input Convolution and Pooling
2. Several Blocks of Residual Layers (repeated)
3. Global Average Pooling
4. Fully Connected Layer (for classification)
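To show how a bottleneck layer reduces cost, the sketch below compares its weight count with that of two plain 3x3 convolutions at the same outer width. The channel counts (256 outer, 64 inner) are assumed for illustration, and biases and normalization parameters are ignored.

```python
def conv_weights(k: int, c_in: int, c_out: int) -> int:
    """Weights of a K x K convolution (biases ignored)."""
    return k * k * c_in * c_out


channels, reduced = 256, 64  # assumed outer and inner channel counts

# Bottleneck: 1x1 reduces channels, 3x3 works on the reduced width, 1x1 restores channels.
bottleneck = (
    conv_weights(1, channels, reduced)
    + conv_weights(3, reduced, reduced)
    + conv_weights(1, reduced, channels)
)

# Alternative: two 3x3 convolutions operating directly on all 256 channels.
plain = 2 * conv_weights(3, channels, channels)

print(f"bottleneck: {bottleneck:,}  plain: {plain:,}  ratio: {plain / bottleneck:.1f}x")
```

With these numbers the bottleneck uses about 17 times fewer weights, which is what makes the deeper variants affordable.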



Benefits and Impact of ResNet

• Training Very Deep Networks: Enabled the training of significantly deeper networks
than previously possible.
• Improved Performance: Achieved state-of-the-art results on various computer vision
tasks.
• Foundation for Future Architectures: The concept of residual connections has become
a fundamental building block in many subsequent CNN architectures.



MobileNet: Efficient Mobile-First CNNs

• Key Idea: Focuses on extreme computational efficiency for mobile and embedded devices.
• Key Components:
• Depthwise Separable Convolutions: Factorizes standard convolutions into depthwise and pointwise
convolutions to significantly reduce computation.
• Width Multiplier: A hyperparameter to control the number of channels, further reducing
computation.
• Resolution Multiplier: A hyperparameter to control the input image resolution, also impacting
computation.
• Goal: Achieve a good balance between accuracy and latency/model size.



DenseNet: Dense Connections for Feature Reuse
• Idea: Maximizes information flow between layers by connecting each layer to all preceding layers.
• Dense Blocks: Each layer receives feature maps from all preceding layers as input and passes its
own feature maps to all subsequent layers.
• Benefits:
• Strong feature reuse, leading to more compact models.
• Mitigates vanishing gradient problem.



SENet (Squeeze-and-Excitation Networks): Channel Attention

• Key Idea: Introduces channel-wise attention mechanisms to dynamically recalibrate channel-wise
feature responses.
• Key Component: Squeeze-and-Excitation (SE) Block:
• Squeeze: Global average pooling to obtain channel-wise statistics.
• Excitation: Two fully connected layers (ReLU after the first, sigmoid after the second) to learn
channel-wise weights.
• Scale: Channel-wise multiplication of the original feature maps by the learned weights.
• Benefit: Improves feature discrimination by emphasizing important channels and suppressing less
important ones.
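The three steps of the SE block can be sketched in NumPy for a single feature map of shape (H, W, C). The weights here are random placeholders rather than trained values, and the reduction ratio `r` (which shrinks the hidden layer to C / r units) is an assumption taken from common practice, not from the lecture.

```python
import numpy as np


def se_block(x, w1, b1, w2, b2):
    """Apply squeeze, excitation and scale to a feature map x of shape (H, W, C)."""
    z = x.mean(axis=(0, 1))                            # squeeze: one statistic per channel, shape (C,)
    hidden = np.maximum(z @ w1 + b1, 0.0)              # excitation, first FC layer + ReLU, shape (C // r,)
    scale = 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))  # second FC layer + sigmoid, shape (C,)
    return x * scale, scale                            # scale: broadcast over H and W


rng = np.random.default_rng(0)
h, w, c, r = 8, 8, 16, 4  # assumed sizes; r is the reduction ratio

x = rng.standard_normal((h, w, c))
w1 = rng.standard_normal((c, c // r)) * 0.1
b1 = np.zeros(c // r)
w2 = rng.standard_normal((c // r, c)) * 0.1
b2 = np.zeros(c)

y, scale = se_block(x, w1, b1, w2, b2)
print(y.shape, scale.round(3))
```

The output has the same shape as the input, so the block can be dropped into an existing architecture after any convolutional stage.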



ResNeXt: Aggregated Residual Transformations
• Key Idea: Extends ResNet by replicating multiple parallel paths (transformations) within each
residual block, aggregating their outputs.
• Key Component: Cardinality: The number of parallel paths, acting as a new dimension besides
depth and width.
• Benefit: Improves performance by exploring a richer set of transformations while maintaining
computational efficiency.



Recent Cutting-Edge Models (Brief Overview)

• EfficientNet: Focuses on compound scaling of network width, depth, and resolution using a
principled approach.
• RegNet: Explores network design space using a population-based search to find optimal
architectures.
• Vision Transformers (ViT): Applies the Transformer architecture from NLP to image
classification, treating images as sequences of patches.
• ConvNeXt: A modern take on the classical ConvNet design inspired by the Transformer
architecture, showing the strong potential of carefully designed ConvNets.



Performance: Accuracy vs Complexity

A good neural network combines high accuracy with low computational cost (fast inference).


Python Code - Image Classification (Part 01)

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.applications import ResNet50V2  # Example: ResNet50V2
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model
import matplotlib.pyplot as plt
import numpy as np

# Data paths (adjust these for your data)
train_dir = '/content/drive/MyDrive/Colab Notebooks/final_project_dataset/training_set'
validation_dir = '/content/drive/MyDrive/Colab Notebooks/final_project_dataset/test_set'
IMG_SIZE = (224, 224)  # ResNet50V2 input size

# Data augmentation and preprocessing
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True
)
Python Code - Image Classification (Part 02)
validation_datagen = ImageDataGenerator(rescale=1./255)

try:
    # Attempt to create data generators
    train_generator = train_datagen.flow_from_directory(
        train_dir,
        target_size=IMG_SIZE,
        batch_size=32,
        class_mode='categorical'  # or 'binary' if you have two classes
    )

    validation_generator = validation_datagen.flow_from_directory(
        validation_dir,
        target_size=IMG_SIZE,
        batch_size=32,
        class_mode='categorical'  # or 'binary' if you have two classes
    )
except OSError as e:
    print(f"Error creating data generators: {e}")
    raise  # Re-raise to stop execution on data generator errors



Python Code - Image Classification (Part 03)

# Load pre-trained model (ResNet50V2 in this example)
base_model = ResNet50V2(
    weights='imagenet',
    include_top=False,  # Exclude the classification layer
    input_shape=IMG_SIZE + (3,)
)

# Freeze the base model layers
base_model.trainable = False

# Add custom classification head
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)  # Add a dense layer
predictions = Dense(train_generator.num_classes, activation='softmax')(x)  # Output layer

model = Model(inputs=base_model.input, outputs=predictions)



Python Code - Image Classification (Part 04)
# Compile the model (adjust the loss so that it matches class_mode)
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

# Train the model
epochs = 10  # Adjust as needed
try:
    history = model.fit(
        train_generator,
        steps_per_epoch=train_generator.samples // train_generator.batch_size,
        epochs=epochs,
        validation_data=validation_generator,
        validation_steps=validation_generator.samples // validation_generator.batch_size
    )
except Exception as e:  # Catch any training errors
    print(f"Error during training: {e}")
    raise  # Re-raise: `history` would be undefined below if training failed

# Access training history (outside the try block)
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']



Python Code - Image Classification (Part 05)

epochs_range = range(epochs)

plt.figure(figsize=(15, 5))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()

# Save the model
model.save('image_classifier_model.h5')

