Specialized Convolutional Layers: Motivation
• Computational Cost of Standard Convolutions: Standard convolutions can be computationally
demanding, especially with many channels, large kernels, or high-resolution inputs.
• Need for Efficiency: Resource-constrained applications (e.g., mobile, embedded systems) require
more efficient alternatives.
• Benefits of Specialized Convolutions: These offer:
• Reduced computational cost (fewer parameters and FLOPs).
• Improved efficiency (faster inference/training).
• Potential performance gains.
• Types of Specialized Convolutions (covered next):
• Depthwise Convolution
• Grouped Convolution
• Pointwise Convolution (1x1)
• Depthwise Separable Convolution
Standard Convolution: Recap
• Input: H × W × Cin
• Kernel: K × K × Cin
• Number of filters: Cout
• Output: H ′ × W ′ × Cout
• Number of parameters: K × K × Cin × Cout
• FLOPs: H ′ × W ′ × K × K × Cin × Cout
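As a quick check on these formulas, here is a small Python helper (the function and its argument names are ours, not from any library) that counts one multiply-accumulate per weight per output position:

def standard_conv_cost(h_out, w_out, k, c_in, c_out):
    params = k * k * c_in * c_out   # K x K x Cin x Cout
    flops = h_out * w_out * params  # H' x W' x K x K x Cin x Cout
    return params, flops

print(standard_conv_cost(112, 112, 3, 64, 128))  # (73728, 924844032)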
Depthwise Convolution
• Applies a single filter to each input channel independently.
• Input: H × W × Cin
• Kernel: K × K × 1 (one filter per channel)
• Output: H ′ × W ′ × Cin (same number of channels as input)
• Number of parameters: K × K × Cin
• FLOPs: H ′ × W ′ × K × K × Cin
• Much more efficient than standard convolution: roughly Cout times fewer parameters and FLOPs, so the savings grow with the number of output channels.
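A minimal Keras sketch, assuming a 32x32x16 input; note the output keeps Cin channels and the parameter count matches K × K × Cin (plus one bias per channel):

import tensorflow as tf

x = tf.random.normal((1, 32, 32, 16))  # batch of one H x W x Cin input
dw = tf.keras.layers.DepthwiseConv2D(kernel_size=3, padding='same')
print(dw(x).shape)        # (1, 32, 32, 16): one filter per input channel
print(dw.count_params())  # 3*3*16 weights + 16 biases = 160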
Pointwise Convolution (1x1 Convolution)
• Input: H × W × Cin
• Kernel: 1 × 1 × Cin (each filter computes a linear combination of the input channels).
• Number of filters: Cout
• Output: H × W × Cout (spatial dimensions remain the same)
• Number of parameters: 1 × 1 × Cin × Cout = Cin × Cout
• FLOPs: H × W × Cin × Cout
• Used for:
• Reducing or increasing the number of channels.
• Adding non-linearity (through the activation that follows it), e.g., after a depthwise convolution; see the sketch below.
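In Keras a pointwise convolution is just Conv2D with kernel_size=1; a minimal sketch, assuming a 32x32x16 input mapped to 32 channels:

import tensorflow as tf

pw = tf.keras.layers.Conv2D(filters=32, kernel_size=1, activation='relu')
y = pw(tf.random.normal((1, 32, 32, 16)))
print(y.shape)            # (1, 32, 32, 32): channels mixed, spatial size unchanged
print(pw.count_params())  # 1*1*16*32 weights + 32 biases = 544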
Grouped Convolution
• Divides the input channels into G groups and applies a standard convolution independently within each group (depthwise convolution is the special case G = Cin)
• Input: H × W × Cin
• Kernel: K × K × (Cin/G)
• Number of filters per group: Cout/G
• Output: H′ × W′ × Cout
• Number of parameters: K × K × (Cin/G) × (Cout/G) × G = K × K × Cin × Cout/G
• FLOPs: H′ × W′ × K × K × Cin × Cout/G
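In Keras this corresponds to the groups argument of Conv2D (available since TF 2.3; note that some CPU builds do not implement grouped convolution). A sketch with Cin = 16, Cout = 32, G = 4:

import tensorflow as tf

g = tf.keras.layers.Conv2D(filters=32, kernel_size=3, groups=4, padding='same')
y = g(tf.random.normal((1, 32, 32, 16)))  # 16 input channels split into 4 groups of 4
print(g.count_params())  # 3*3*(16/4)*32 weights + 32 biases = 1184 (vs 4640 for G = 1)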
Depthwise Separable Convolution
• Combines depthwise and pointwise convolutions.
• First, a depthwise convolution is applied.
• Then, a pointwise convolution is used to combine the output channels.
• Significantly reduces computational cost compared to standard convolution.
• Used in MobileNet and Xception.
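Relative to a standard convolution, the cost drops by a factor of roughly 1/Cout + 1/K². Keras packages the two steps as SeparableConv2D; a sketch reusing the 16-to-32-channel example:

import tensorflow as tf

sep = tf.keras.layers.SeparableConv2D(filters=32, kernel_size=3, padding='same')
y = sep(tf.random.normal((1, 32, 32, 16)))
print(sep.count_params())  # depthwise 3*3*16 + pointwise 16*32 + 32 biases = 688
# A standard 3x3 convolution with the same shapes would need 3*3*16*32 + 32 = 4640.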
Convolutional Layers: Animated Explanation
Groups, Depthwise, and Depthwise-Separable Convolution
Backbone CNN Models: Review
Introduction to LeNet-5
• Historical Significance: LeNet-5, developed by Yann LeCun et al. in the 1990s, is one of the
earliest and most influential Convolutional Neural Network (CNN) architectures.
• Purpose: Designed for handwritten and machine-printed character recognition (e.g., MNIST
dataset).
• Key Innovations: Introduced fundamental CNN concepts:
• Convolutional layers with learnable weights.
• Local receptive fields.
• Spatial subsampling (pooling).
• Shared weights (parameter sharing).
LeNet-5 Architecture
• LeNet-5 consists of seven layers (excluding the input):
• Input Layer: 32x32 grayscale image.
• Convolutional Layer C1: 6 5x5 filters, stride 1, no padding. Output: 28x28x6.
• Subsampling Layer S2 (Average Pooling): 2x2 pooling, stride 2. Output: 14x14x6.
• Convolutional Layer C3: 16 5x5 filters. Output: 10x10x16. Note: in the original LeNet-5 paper, each C3 feature map connects to only a subset of the S2 feature maps (partial connectivity).
• Subsampling Layer S4 (Average Pooling): 2x2 pooling, stride 2. Output: 5x5x16.
• Fully Connected Layer F5: 120 neurons.
• Fully Connected Layer F6: 84 neurons.
• Output Layer: 10 neurons (one for each digit 0-9), using RBF (Radial Basis Function) units in the original paper; modern implementations use Softmax.
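A Keras sketch of this architecture (our re-implementation, not the original code), using tanh activations, a fully connected C3, and Softmax in place of the RBF output:

from tensorflow import keras

lenet5 = keras.Sequential([
    keras.layers.Input((32, 32, 1)),
    keras.layers.Conv2D(6, 5, activation='tanh'),   # C1: 28x28x6
    keras.layers.AveragePooling2D(2),               # S2: 14x14x6
    keras.layers.Conv2D(16, 5, activation='tanh'),  # C3: 10x10x16
    keras.layers.AveragePooling2D(2),               # S4: 5x5x16
    keras.layers.Flatten(),
    keras.layers.Dense(120, activation='tanh'),     # F5
    keras.layers.Dense(84, activation='tanh'),      # F6
    keras.layers.Dense(10, activation='softmax'),   # Output: digits 0-9
])
lenet5.summary()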
Key Concepts and Impact
Key concepts:
• Convolutional Layers: Local receptive fields, feature extraction.
• Subsampling (Pooling): Reducing spatial resolution, increasing robustness to small shifts and distortions.
• Parameter Sharing: Reducing the number of parameters and improving generalization.
• Hierarchical Feature Learning: Lower layers detect simple features (edges, lines), higher layers detect more
complex features (combinations of edges, shapes).
Impacts:
• LeNet-5 laid the foundation for modern CNN architectures.
• Its key concepts are still used in many state-of-the-art models.
• It demonstrated the power of CNNs for image recognition and other tasks involving structured data.
Introduction to AlexNet
• Revolutionary Impact: AlexNet, developed by Alex Krizhevsky, Ilya Sutskever, and Geoffrey
Hinton, won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012 by a
significant margin, marking a turning point in DL for computer vision.
• Key Contributions:
• Deeper architecture than previous CNNs.
• Use of ReLU activation functions.
• Training on GPUs for faster training.
• Local response normalization (LRN).
• Overlapping pooling and data augmentation.
AlexNet Architecture
AlexNet consists of eight layers with weights (five convolutional and three fully connected), interleaved with pooling layers:
• Input Layer: 227x227x3 RGB image.
• Convolutional Layer 1: 96 11x11 filters, stride 4, no padding. Output: 55x55x96.
• Max Pooling Layer 1: 3x3 pooling, stride 2. Output: 27x27x96.
• Convolutional Layer 2: 256 5x5 filters, stride 1, padding 2. Output: 27x27x256.
• Max Pooling Layer 2: 3x3 pooling, stride 2. Output: 13x13x256.
• Convolutional Layer 3: 384 3x3 filters, stride 1, padding 1. Output: 13x13x384.
• Convolutional Layer 4: 384 3x3 filters, stride 1, padding 1. Output: 13x13x384.
• Convolutional Layer 5: 256 3x3 filters, stride 1, padding 1. Output: 13x13x256.
• Max Pooling Layer 3: 3x3 pooling, stride 2. Output: 6x6x256.
• Fully Connected Layer 1: 4096 neurons.
• Fully Connected Layer 2: 4096 neurons.
• Output Layer: 1000 neurons (for 1000 ImageNet classes) with Softmax activation.
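A single-tower Keras sketch of these layers (the original split computation across two GPUs; LRN is omitted here, and dropout is applied to the fully connected layers as in the paper):

from tensorflow import keras

alexnet = keras.Sequential([
    keras.layers.Input((227, 227, 3)),
    keras.layers.Conv2D(96, 11, strides=4, activation='relu'),       # 55x55x96
    keras.layers.MaxPooling2D(3, strides=2),                         # 27x27x96
    keras.layers.Conv2D(256, 5, padding='same', activation='relu'),  # 27x27x256
    keras.layers.MaxPooling2D(3, strides=2),                         # 13x13x256
    keras.layers.Conv2D(384, 3, padding='same', activation='relu'),  # 13x13x384
    keras.layers.Conv2D(384, 3, padding='same', activation='relu'),  # 13x13x384
    keras.layers.Conv2D(256, 3, padding='same', activation='relu'),  # 13x13x256
    keras.layers.MaxPooling2D(3, strides=2),                         # 6x6x256
    keras.layers.Flatten(),
    keras.layers.Dense(4096, activation='relu'),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(4096, activation='relu'),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(1000, activation='softmax'),                  # ImageNet classes
])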
Key Innovations and Impact
Key Innovations:
• ReLU Activations: Accelerated training by mitigating vanishing gradients.
• GPU Training: Enabled training of larger models on larger datasets.
• Local Response Normalization (LRN): Local channel normalization (minor impact).
• Overlapping Pooling: Reduced overfitting.
• Data Augmentation: Improved generalization by increasing training data diversity.
Impact:
• Deep Learning Resurgence in CV: Sparked renewed interest and rapid progress in deep learning for
computer vision.
• Foundation for Modern CNNs: Influenced many subsequent CNN architectures.
• Influence on Other Fields: Impacted other areas of deep learning like NLP and speech recognition.
Introduction to VGG-16
• Visual Geometry Group (VGG): Developed by the VGG at the University of Oxford.
• Key Insight: Demonstrated the importance of network depth in achieving better performance in
image classification.
• Uniform Architecture: Used very small (3x3) convolutional filters throughout the entire network,
leading to a much deeper architecture than AlexNet.
• ILSVRC 2014: Placed second in classification (behind GoogLeNet) and first in localization at the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014.
VGG-16 Architecture
• Key Characteristics:
• Only 3x3 convolutional filters with stride 1 and padding 1 are used.
• 2x2 max pooling with stride 2 is used for downsampling.
• Multiple convolutional layers are stacked before each pooling layer.
• Layers (simplified): VGG-16 refers to 16 layers with weights (convolutional or fully connected):
• Input: 224x224x3 RGB image.
• Conv1 (2 layers): 64 filters. Output: 224x224x64
• Max Pool 1: Output: 112x112x64
• Conv2 (2 layers): 128 filters. Output: 112x112x128
• Max Pool 2: Output: 56x56x128
• Conv3 (3 layers): 256 filters. Output: 56x56x256
• Max Pool 3: Output: 28x28x256
• Conv4 (3 layers): 512 filters. Output: 28x28x512
• Max Pool 4: Output: 14x14x512
• Conv5 (3 layers): 512 filters. Output: 14x14x512
• Max Pool 5: Output: 7x7x512
• FC1: 4096 neurons
• FC2: 4096 neurons
• Output (FC3): 1000 neurons (ImageNet classes) with Softmax
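VGG-16 ships with Keras, so the layer list above can be inspected directly:

from tensorflow.keras.applications import VGG16

vgg = VGG16(weights=None)  # pass weights='imagenet' for the pretrained model
vgg.summary()              # 13 conv + 3 FC weight layers, roughly 138M parameters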
VGG-16 vs VGG-19
Advantages of Small 3x3 Convolutions
• Deeper Network: Stacking multiple 3x3 convolutions allows for a deeper network, which
can learn more complex features.
• Reduced Number of Parameters: Two stacked 3x3 convolutions have the same receptive field as one 5x5 convolution but with fewer parameters (counted per input-output channel pair):
• One 5x5: 5 × 5 = 25 weights
• Two 3x3: (3 × 3) + (3 × 3) = 18 weights (with C channels throughout: 18C² vs 25C²)
• More Non-linearities: Stacking more layers increases the number of non-linear
activations (ReLU), which makes the network more expressive.
Impact of VGG Networks
• Emphasis on Depth: Solidified the importance of network depth for achieving high
performance.
• Simple and Effective Design: The uniform architecture with small filters made VGG
networks easy to understand and implement.
• Transfer Learning: VGG models pretrained on ImageNet became widely used for transfer
learning in various computer vision tasks.
Introduction to GoogLeNet
• ILSVRC 2014 Winner: GoogLeNet, developed by Google, won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2014 classification task, a marked improvement over earlier architectures and narrowly ahead of VGG.
• Key Innovation: Inception Module: Introduced the Inception module, a novel building block
that significantly improved efficiency and performance.
• Depth and Efficiency: Achieved greater depth than previous networks while maintaining
manageable computational cost.
• Reduced Parameters: Significantly fewer parameters than AlexNet, making it more efficient and
less prone to overfitting.
The Inception Module
• Motivation: To capture features at multiple scales simultaneously.
• Structure: Consists of parallel branches with different convolutional filter sizes (1x1, 3x3, 5x5)
and max pooling.
• 1x1 Convolutions: Used 1x1 convolutions for dimensionality reduction before the more expensive
3x3 and 5x5 convolutions, significantly reducing computational cost.
• Concatenation: The outputs of all branches are concatenated along the channel dimension.
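A functional-API sketch of such a module (the helper name and per-branch filter counts are our choices, not fixed by the paper):

from tensorflow.keras import layers

def inception_module(x, f1, f3r, f3, f5r, f5, fp):
    b1 = layers.Conv2D(f1, 1, padding='same', activation='relu')(x)
    b3 = layers.Conv2D(f3r, 1, padding='same', activation='relu')(x)  # 1x1 reduction
    b3 = layers.Conv2D(f3, 3, padding='same', activation='relu')(b3)
    b5 = layers.Conv2D(f5r, 1, padding='same', activation='relu')(x)  # 1x1 reduction
    b5 = layers.Conv2D(f5, 5, padding='same', activation='relu')(b5)
    bp = layers.MaxPooling2D(3, strides=1, padding='same')(x)
    bp = layers.Conv2D(fp, 1, padding='same', activation='relu')(bp)  # pool projection
    return layers.Concatenate()([b1, b3, b5, bp])  # concatenate along channels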
GoogLeNet Architecture
• Stacking Inception Modules: GoogLeNet consists of multiple Inception modules stacked on top
of each other.
• Auxiliary Classifiers: Included auxiliary classifiers at intermediate layers to improve gradient flow
during training and prevent vanishing gradients.
• No Fully Connected Layers at the End: Used Global Average Pooling (GAP) at the end
instead of fully connected layers, further reducing the number of parameters.
• Simplified Structure (Conceptual): Input - Initial Convolutional Layers - Stacked Inception Modules - Global Average Pooling - Softmax Output
Advantages of GoogLeNet
• Increased Depth and Width: The Inception module allows for increasing both the
depth and width of the network without a significant increase in computational cost.
• Computational Efficiency: Using 1x1 convolutions for dimensionality reduction
significantly reduces the number of parameters and FLOPs.
• Improved Performance: Achieved state-of-the-art performance on ImageNet with
significantly fewer parameters than previous models.
• Reduced Overfitting: The reduced number of parameters and the use of auxiliary
classifiers helped to reduce overfitting.
Impact of GoogLeNet
• Shift Towards Efficient Architectures: Influenced the development of more efficient
CNN architectures.
• Inception Module as a Building Block: The Inception module became a popular
building block in many subsequent CNNs.
• Focus on Computational Cost: Highlighted the importance of considering
computational cost in deep learning model design.
Introduction to Inception-v3
• Evolution of Inception: Inception-v3 is the third iteration of the Inception architecture,
building upon the ideas introduced in GoogLeNet (Inception-v1).
• Focus on Efficiency and Performance: Aimed to further improve both computational
efficiency and classification performance.
• Key Improvements: Introduced several architectural refinements:
• Factorization of larger convolutions into smaller ones.
• Asymmetric convolutions.
• Auxiliary classifiers with improved loss.
• Batch Normalization in auxiliary classifiers.
Factorization of Convolutions
• Factorizing 5x5 Convolutions: A 5x5 convolution can be factorized into two consecutive 3x3
convolutions, reducing the number of parameters and computations:
• One 5x5: 5 × 5 = 25 parameters
• Two 3x3: (3 × 3) + (3 × 3) = 18 parameters
This increases depth, adding more non-linearities (ReLU activations) and thus increasing the
network’s expressiveness.
• Factorizing n × n Convolutions: More generally, any n × n convolution can be factorized into a
sequence of 1 × n and n × 1 convolutions. For example, a 3x3 convolution can be factorized into a
1x3 followed by a 3x1.
Asymmetric Convolutions
• Further Factorization: Inception-v3 further factorizes convolutions by using asymmetric
convolutions, such as 1xn followed by nx1.
• Example: Instead of a 3x3 convolution, Inception-v3 uses a 1x3 convolution followed by a 3x1
convolution.
• Benefits: This further reduces the number of parameters and computations compared to using
two 3x3 convolutions.
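A sketch of this substitution in Keras, assuming 64 channels throughout; the 1x3 + 3x1 pair uses 6C² weights versus 9C² for a single 3x3 (about a third fewer):

from tensorflow import keras
from tensorflow.keras import layers

inp = keras.Input((35, 35, 64))
x = layers.Conv2D(64, (1, 3), padding='same', activation='relu')(inp)  # 1xn
x = layers.Conv2D(64, (3, 1), padding='same', activation='relu')(x)    # nx1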
Improved Auxiliary Classifiers
• Purpose of Auxiliary Classifiers: To improve gradient flow during training, especially in very
deep networks, and prevent vanishing gradients.
• Improvements in v3: In Inception-v3, the auxiliary classifiers were improved by:
• Using batch normalization in the auxiliary classifiers.
• Using a different loss function (softmax cross-entropy) for the auxiliary classifiers.
• Contribution to Final Loss: The loss from the auxiliary classifiers is added to the main loss with
a smaller weight (e.g., 0.3).
Overall Impact of Inception-v3
• State-of-the-Art Performance: Achieved even better performance on ImageNet
compared to its predecessors.
• Emphasis on Efficient Design: Further emphasized the importance of efficient network
design.
• Influence on Subsequent Architectures: Influenced the design of many subsequent
CNN architectures by demonstrating the effectiveness of factorization and asymmetric
convolutions.
Introduction to ResNet
• Challenge of Deep Networks: Training very deep neural networks was a major challenge due to
the vanishing gradient problem.
• Key Innovation: Residual Connections (Skip Connections): ResNet, introduced by He et al.,
addressed this problem with the concept of residual connections (also known as skip connections
or shortcuts).
• ILSVRC 2015 Winner: Achieved state-of-the-art results on ImageNet in 2015, surpassing
human-level performance on the classification task.
The Vanishing Gradient Problem
• Gradient Propagation: During backpropagation, gradients are multiplied as they are passed
through multiple layers.
• Vanishing Gradients: In very deep networks, these repeated multiplications can cause the
gradients to become extremely small, effectively preventing the earlier layers from learning.
• Impact: This makes it difficult to train very deep networks effectively.
Residual Connections
• Concept: Instead of directly learning a mapping H(x), ResNet learns a residual mapping
F (x) = H(x) − x, where x is the input to the layer.
• Residual Block: The output of a residual block is then H(x) = F (x) + x. The addition is
performed using element-wise addition.
• Identity Mapping: If the identity mapping is optimal, the network can easily learn it by setting
F (x) = 0.
• Gradient Flow: Residual connections provide a direct path for gradients to flow through,
mitigating the vanishing gradient problem.
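A minimal sketch of a basic residual block (identity shortcut; it assumes the input already has `filters` channels, and BatchNorm, which real ResNets insert after each convolution, is omitted):

from tensorflow.keras import layers

def residual_block(x, filters):
    shortcut = x                                                         # identity path
    y = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    y = layers.Conv2D(filters, 3, padding='same')(y)                     # F(x)
    y = layers.Add()([y, shortcut])                                      # H(x) = F(x) + x
    return layers.Activation('relu')(y)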
ResNet Architectures
• Different Depths: ResNet comes in various depths (e.g., ResNet-18, ResNet-34, ResNet-50,
ResNet-101, ResNet-152), with the number indicating the number of layers.
• Bottleneck Layers: Deeper ResNet architectures (e.g., ResNet-50 and above) use bottleneck
layers to reduce computational cost. A bottleneck layer consists of a 1x1 convolution, a 3x3
convolution, and another 1x1 convolution.
• Overall Structure (General):
1. Input Convolution and Pooling
2. Several Blocks of Residual Layers (repeated)
3. Global Average Pooling
4. Fully Connected Layer (for classification)
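A sketch of the bottleneck block used by the deeper variants (assumes the input already has 4·c channels; BatchNorm again omitted for brevity):

from tensorflow.keras import layers

def bottleneck_block(x, c):
    y = layers.Conv2D(c, 1, activation='relu')(x)                  # 1x1: reduce channels
    y = layers.Conv2D(c, 3, padding='same', activation='relu')(y)  # 3x3 at reduced width
    y = layers.Conv2D(4 * c, 1)(y)                                 # 1x1: expand channels
    return layers.Activation('relu')(layers.Add()([y, x]))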
Benefits and Impact of ResNet
• Training Very Deep Networks: Enabled the training of significantly deeper networks
than previously possible.
• Improved Performance: Achieved state-of-the-art results on various computer vision
tasks.
• Foundation for Future Architectures: The concept of residual connections has become
a fundamental building block in many subsequent CNN architectures.
MobileNet: Efficient Mobile-First CNNs
• Key Idea: Focuses on extreme computational efficiency for mobile and embedded devices.
• Key Components:
• Depthwise Separable Convolutions: Factorizes standard convolutions into depthwise and pointwise convolutions to significantly reduce computation.
• Width Multiplier: A hyperparameter to control the number of channels, further reducing computation.
• Resolution Multiplier: A hyperparameter to control the input image resolution, also impacting computation.
• Goal: Achieve a good balance between accuracy and latency/model size.
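Both hyperparameters are exposed by the Keras MobileNet constructor: alpha is the width multiplier, and the input shape plays the role of the resolution multiplier. A sketch with random weights:

from tensorflow.keras.applications import MobileNet

full = MobileNet(weights=None, alpha=1.0)                             # full width at 224x224
thin = MobileNet(weights=None, alpha=0.5, input_shape=(160, 160, 3))  # half width, lower resolution
print(full.count_params(), thin.count_params())                       # the thin model is several times smaller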
DenseNet: Dense Connections for Feature Reuse
• Idea: Maximizes information flow between layers by connecting each layer to all preceding layers.
• Dense Blocks: Each layer receives feature maps from all preceding layers as input and passes its
own feature maps to all subsequent layers.
• Benefits:
• Strong feature reuse, leading to more compact models.
• Mitigates the vanishing gradient problem.
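A sketch of one dense block (the layer count and growth rate are example values; real DenseNets also use BatchNorm and 1x1 bottleneck convolutions):

from tensorflow.keras import layers

def dense_block(x, num_layers=4, growth_rate=12):
    for _ in range(num_layers):
        y = layers.Conv2D(growth_rate, 3, padding='same', activation='relu')(x)
        x = layers.Concatenate()([x, y])  # each new layer sees all earlier feature maps
    return x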
SENet (Squeeze-and-Excitation Networks): Channel Attention
• Key Idea: Introduces channel-wise attention mechanisms to dynamically recalibrate channel-wise
feature responses.
• Key Component: Squeeze-and-Excitation (SE) Block:
• Squeeze: Global average pooling to obtain channel-wise statistics.
• Excitation: Two fully connected layers with a sigmoid activation to learn channel-wise weights.
• Scale: Element-wise multiplication of the channel weights with the original feature maps.
• Benefit: Improves feature discrimination by emphasizing important channels and suppressing less
important ones.
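The three steps map directly onto a few Keras layers; a sketch using the common reduction ratio of 16:

from tensorflow.keras import layers

def se_block(x, reduction=16):
    c = x.shape[-1]
    s = layers.GlobalAveragePooling2D()(x)                  # squeeze: channel statistics
    s = layers.Dense(c // reduction, activation='relu')(s)  # excitation: bottleneck FC
    s = layers.Dense(c, activation='sigmoid')(s)            # per-channel weights in (0, 1)
    s = layers.Reshape((1, 1, c))(s)
    return layers.Multiply()([x, s])                        # scale the original features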
ResNeXt: Aggregated Residual Transformations
• Key Idea: Extends ResNet by replicating multiple parallel paths (transformations) within each
residual block, aggregating their outputs.
• Key Component: Cardinality: The number of parallel paths, acting as a new dimension besides
depth and width.
• Benefit: Improves performance by exploring a richer set of transformations while maintaining
computational efficiency.
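In its grouped-convolution form (shown equivalent to the multi-branch form in the ResNeXt paper), the block is a bottleneck whose 3x3 convolution uses groups = cardinality; a sketch assuming the input width matches the block output:

from tensorflow.keras import layers

def resnext_block(x, c=128, cardinality=32):
    y = layers.Conv2D(c, 1, activation='relu')(x)
    y = layers.Conv2D(c, 3, padding='same', groups=cardinality, activation='relu')(y)
    y = layers.Conv2D(x.shape[-1], 1)(y)                    # project back to input width
    return layers.Activation('relu')(layers.Add()([y, x]))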
Recent Cutting-Edge Models (Brief Overview)
• EfficientNet: Focuses on compound scaling of network width, depth, and resolution using a
principled approach.
• RegNet: Explores network design space using a population-based search to find optimal
architectures.
• Vision Transformers (ViT): Applies the Transformer architecture from NLP to image
classification, treating images as sequences of patches.
• ConvNeXt: A modern take on the classical ConvNet design inspired by the Transformer
architecture, showing the strong potential of carefully designed ConvNets.
Performance: Accuracy vs Complexity
A good neural network achieves high accuracy while remaining fast.
Python Code - Image Classification (Part 01)
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.applications import ResNet50V2  # Example: ResNet50V2
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model
import matplotlib.pyplot as plt
import numpy as np

# Data paths (adjust these for your data)
train_dir = '/content/drive/MyDrive/Colab Notebooks/final_project_dataset/training_set'
validation_dir = '/content/drive/MyDrive/Colab Notebooks/final_project_dataset/test_set'

IMG_SIZE = (224, 224)  # ResNet50V2 input size

# Data augmentation and preprocessing
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True
)
Python Code - Image Classification (Part 02)
validation_datagen = ImageDataGenerator(rescale=1./255)

try:
    # Attempt to create data generators
    train_generator = train_datagen.flow_from_directory(
        train_dir,
        target_size=IMG_SIZE,
        batch_size=32,
        class_mode='categorical'  # or 'binary' if you have two classes
    )
    validation_generator = validation_datagen.flow_from_directory(
        validation_dir,
        target_size=IMG_SIZE,
        batch_size=32,
        class_mode='categorical'  # or 'binary' if you have two classes
    )
except OSError as e:
    print(f"Error creating data generators: {e}")
    raise  # Re-raise to stop execution on data generator errors
Python Code - Image Classification (Part 03)
# Load pre-trained model (ResNet50V2 in this example)
base_model = ResNet50V2(
    weights='imagenet',
    include_top=False,  # Exclude the classification layer
    input_shape=IMG_SIZE + (3,)
)

# Freeze the base model layers
base_model.trainable = False

# Add custom classification head
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)  # Add a dense layer
predictions = Dense(train_generator.num_classes, activation='softmax')(x)  # Output layer
model = Model(inputs=base_model.input, outputs=predictions)
Python Code - Image Classification (Part 04)
# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])  # Adjust loss (e.g., binary_crossentropy for two classes)

# Train the model
epochs = 10  # Adjust as needed
try:
    history = model.fit(
        train_generator,
        steps_per_epoch=train_generator.samples // train_generator.batch_size,
        epochs=epochs,
        validation_data=validation_generator,
        validation_steps=validation_generator.samples // validation_generator.batch_size
    )
except Exception as e:  # Catch any training errors
    print(f"Error during training: {e}")
    raise  # Re-raise: `history` would be undefined below if training failed

# Access training history (outside the try block)
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
Python Code - Image Classification (Part 05)
epochs_range = range(epochs)

plt.figure(figsize=(15, 5))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()

# Save the model
model.save('image_classifier_model.h5')