Final Report Minorproject
Final Report Minorproject
CHAPTER-1
INTRODUCTION
1.1 Corrosion is a natural and inevitable process that occurs when materials,
particularly metals, interact with their environment, leading to their gradual degradation.
In the maritime industry, corrosion poses a critical challenge due to the harsh and highly
corrosive nature of the marine environment. Factors such as saltwater, high humidity,
and fluctuating temperatures accelerate the deterioration of ships and marine structures.
This degradation compromises their structural integrity, safety, and operational
efficiency, resulting in increased maintenance costs and potential safety hazards.
Corrosion not only affects the lifespan of ships but also has far-reaching implications for
the maritime sector, including economic losses and environmental risks. The marine
environment is uniquely aggressive, with saltwater acting as a highly conductive
electrolyte that facilitates various types of corrosion. Ships are constantly exposed to
seawater spray, immersion, and atmospheric moisture, creating ideal conditions for
corrosion to thrive. Temperature variations and the presence of dissolved oxygen and
other corrosive agents exacerbate the problem. Corrosion can manifest in several forms,
each with distinct causes and effects. Understanding these types is essential to
developing effective strategies to mitigate their impact.
1.2 The primary cause of corrosion in ships is the exposure of metal surfaces,
particularly steel, to the marine environment. The high salinity of seawater accelerates
electrochemical reactions, leading to the breakdown of metallic bonds. Additionally,
oxygen dissolved in water promotes oxidation, while temperature variations and
microbial activity further exacerbate the process. Poor maintenance practices, improper
coatings, and the presence of dissimilar metals can also contribute to accelerated
corrosion.
the ship and its crew but also poses significant environmental risks, such as oil spills
or the release of hazardous substances, which can have devastating ecological
consequences. Beyond structural concerns, corrosion negatively impacts the
performance and functionality of critical components, such as propellers, rudders,
and cargo tanks. Damage to these elements can reduce the vessel's
maneuverability, impair propulsion systems, and lead to inefficiencies in cargo
handling and storage. For example, corroded cargo tanks may become prone to
leaks or contamination, jeopardizing the quality of goods being transported.
Similarly, corroded rudders and propellers can diminish the ship's speed and fuel
efficiency, increasing operational costs and environmental emissions.
Mitigation Strategies.
Conclusion.
CHAPTER-2
2.1 Over the years, there has been an increased level of concern from both the
public and regulatory authorities over the integrity and reliability of various corrosion
detection methods. This is important because corrosion and its effects are well
documented and they have been known to affect various industries and institutions such
as manufacturing, transportation, and healthcare. Various methods and technologies
have been developed and are currently being used in the detection and assessment of
corrosion in a wide range of equipment and structures. Some of these methods have
been used for many years and in many cases, they have become the "traditional" option
and set the standard to which newer technologies are compared. Studies are ongoing
to validate and optimize these traditional methods and similarly, there is ongoing
research in new techniques and technologies. Broadly speaking, corrosion detection
methods are either "passive", relying on the corrosion process to manifest itself as a
visible or measurable effect, such as increased mass loss, or "active", where a physical
or chemical process is initiated, and the subsequent response is measured. Visual
inspection, electrochemical techniques, ultrasonic testing, and X-ray radiography are all
examples of "active" methods, and these are brough out in the report. The requirement
for the assessment of corrosion in safety critical components and structures is the prime
driver in the development and application of new and improved detection technologies.
This is reflected in the increasing trend towards alternatives to the more commonly used
traditional methods, which often require the removal of material coatings and/or access
to both sides of the structure to conduct tests. The development of non-invasive,
efficient, and cost-effective technologies will be necessary for the implementation of any
new strategy for assessments of both new and existing materials and structures.
Traditional Methods.
techniques.
Non-Destructive Testing.
2.6 Non-destructive testing (NDT) techniques allow for evaluating the integrity of
materials and detecting corrosion damage without causing harm to the component.
These methods are particularly valuable for assessing critical structures where
physical testing or sample removal is impractical.
2.10 Eddy Current Testing (ECT). Eddy current testing relies on electromagnetic
induction to detect surface and near-surface defects in conductive materials. An
alternating current is passed through a coil, creating a magnetic field that interacts
with the material. Variations in the induced eddy currents reveal defects and
changes in material thickness. ECT is widely used for inspecting heat exchanger
tubes, aircraft components, and marine structures due to its precision and ability to
detect corrosion in inaccessible areas.
Table 2.1: Advantages and Limitations of Conventional and Computer Vision Based
Methods
Conclusion.
enabling more efficient and accurate corrosion detection. Despite their potential, these
methods face challenges, including computational demands, data variability, and the
need for annotated datasets to train models.
CHAPTER-3
COMPUTER VISION
3.1 Computer vision is a field of artificial intelligence that trains computers to interpret
and understand the visual world. Using digital images from cameras and videos and
deep learning models, machines can accurately identify and classify objects and then
react to what they "see." This technology is not only limited to detecting the objects of
the image, but it can also extract the text and other relevant information from the images.
This ability makes computer vision truly interdisciplinary. It is a mix of multiple fields like
image processing, machine learning, data science, physics, etc. The goal of computer
vision is to replicate the complexity of human vision. Whenever we "see" something, our
brain performs an enormous amount of computational work to interpret that visual input
and provide us with a solution or result; so, does the computer vision. Therefore,
computer vision not only continues to influence and shape medical and biological
research, but the corrosion detection field too. These include better and more flexible
methodologies for image analysis, from feature detection to neural networks; an
increasing prevalence of machine learning, which is both facilitating and enhanced by
the acquisition of large datasets and better approaches to the 3D surface restoration
from 2D images.[11]
3.1.4 Image Formation. When light from the scene enters the imaging
system through the lens, it forms an image on the sensor or film plane.
Factors such as exposure time, aperture size, and ISO sensitivity determine
the amount of light captured and the resulting image quality.
3.1.8 Image Formats. Digital images are typically stored in various file
formats such as JPEG, PNG, TIFF, or RAW. Each format has its own
compression algorithm, metadata, and compatibility with different
applications and platforms.
3.5 Deep learning is a specific subfield of machine learning and a type of artificial
intelligence. It aims at learning data representations using neural networks. It is called
"deep" learning because it makes use of a multi-layered neural network. In today's
context, deep learning represents a powerful set of techniques and concepts that have
seen its practical success only in the past decade or so. This is primarily due to the large
amount of data that we have in today's world and the rapid advancement of computational
resources. In general, deep learning algorithms and methods are used to learn from data.
We do not input task-specific "rules" to the system - like in traditional machine learning
algorithms. This makes the data/information that we input a key factor in influencing the
output of the system, rather than just the algorithm itself. Because it is such a flexible and
adaptive system based on the data, deep learning is widely used in a multitude of fields.
For example, it has had state-of-the-art performance in tasks such as image recognition,
natural language processing, and game playing. Deep learning, other than performing
well in these tasks by itself, is also used to advance other machine learning methods like
Computer Vision. Computer vision and deep learning are closely intertwined fields that
have seen significant advancements in recent years, revolutionizing the way machines
perceive and understand visual information. Deep learning has revolutionized computer
vision by providing powerful tools and techniques for solving complex visual recognition
tasks. Convolutional Neural Networks (CNNs) are a type of deep learning architecture
that has become the backbone of many state-of-the-art computer vision systems.[11]
3.7 Convolutional layers are the building blocks of CNNs and are responsible for
extracting features from the input data. Each convolutional layer consists of a set of
learnable filters (also called kernels) that slide over the input image, performing
convolution operations to extract spatial patterns and features. These filters capture low-
level features such as edges, textures, and gradients, which are then combined and
refined in subsequent layers to capture higher-level features.
3.8 Pooling layers are typically inserted between consecutive convolutional layers
to reduce the spatial dimensions of the feature maps while retaining important
information. Common pooling operations include max pooling and average pooling, which
down sample the feature maps by taking the maximum or average value within each
pooling window. Pooling helps to reduce computational complexity, increase translation
invariance, and improve the network's ability to learn hierarchical features.
3.9 Fully connected (or dense) layers are typically placed at the end of the CNN
architecture and are responsible for mapping the extracted features to the output classes
or labels. Each neuron in a fully connected layer is connected to every neuron in the
previous layer, allowing the network to learn complex mappings between features and
class labels.
3.10 CNNs are trained using a process called backpropagation, where gradients
of the loss function for the network parameters are computed and used to update the
weights of the network using optimization algorithms. During training, CNNs learn to
automatically extract discriminative features from the input data through the iterative
process of forward propagation (computing predictions) and backward propagation
(updating weights).
3.12 Deep learning has had a profound impact on computer vision, enabling
significant advancements in a wide range of applications. Deep learning models,
particularly Convolutional Neural Networks (CNNs), excel at image classification tasks.
They can accurately classify images into predefined categories or labels, such as
CHAPTER-4
4.2 SSD(Single Shot Multibox Detector). The Single Shot MultiBox Detector
(SSD) is a deep learning-based object detection model designed for real-time
applications. Unlike region-based detectors like Faster R-CNN, SSD performs
detection in a single forward pass, making it significantly faster. It uses a
convolutional neural network (CNN) to extract features and predicts bounding boxes
and class scores directly from multiple feature maps at different scales. SSD
employs default anchor boxes (MultiBox) of various aspect ratios to detect objects
of different sizes. By combining predictions from both high-resolution layers (for
small objects) and low-resolution layers (for large objects), SSD achieves a balance
between accuracy and speed.
4.5 YOLO(You Only Look Once). YOLO (You Only Look Once) is a popular
deep learning-based object detection model known for its speed and efficiency in
real-time applications. Unlike two-stage detectors like Faster R-CNN, which first
generate region proposals and then classify objects, YOLO is a single-stage
detector that directly predicts bounding boxes and class probabilities from an input
image in a single forward pass. This makes YOLO extremely fast, enabling real-
time detection even on resource-constrained devices. YOLO divides an image into
a grid and predicts multiple bounding boxes per grid cell, refining predictions through
multiple versions such as YOLOv3, YOLOv4, and YOLOv8, which have improved
accuracy and robustness. While YOLO excels in speed, it may struggle with small
object detection and overlapping objects compared to some two-stage detectors. Its
balance of accuracy, speed, and simplicity has made it a widely used model in
applications like autonomous driving, surveillance, and robotics.
4.6 To develop a corrosion detection model using YOLO (v8 and v11) with a few-
shot learning approach, follow these steps. First, collect and preprocess a small dataset
containing corrosion images with diverse lighting, textures, and angles. Since few-shot
learning relies on limited data, employ data augmentation (flipping, rotation, brightness
adjustments) to enhance variability. Next, annotate the images using tools like Labeling
or Roboflow, ensuring accurate bounding boxes around corrosion areas. Then, choose
a pre-trained YOLOv8 or YOLOv11 model (trained on large datasets like COCO) and
fine-tune it using transfer learning to adapt it to corrosion detection. Load the dataset
into Ultralytics YOLO framework, configure hyperparameters (batch size, learning rate),
and train the model using a small number of labeled samples. Utilize few-shot learning
techniques like meta-learning or prototypical networks to improve performance on
unseen corrosion patterns. After training, evaluate the model using precision, recall, and
mAP (mean Average Precision). Finally, deploy the trained model for real-time corrosion
detection using edge devices or cloud-based inference.
4.7 Define Number of Classes. Firstly, the categories or classes that the
model needs to recognize or classify within the dataset are determined. For
example, if the model is designed to classify images of fruits, the classes might
include "apple," "banana," "orange," and others.
4.8 Upload Data. Next, the dataset containing images and their
corresponding labels (if available) is uploaded to Google Drive. This dataset will
serve as the foundation for training the model.
4.13 Once the models are trained, they must be evaluated for accuracy and
reliability. Evaluation metrics include precision, recall, and mean Average Precision (mAP)
to assess detection accuracy. Confusion matrices help visualize true positive, false
positive, and false negative rates, while qualitative analysis of sample predictions ensures
correct corrosion detection. To further enhance performance, optimization techniques can
be applied, such as fine-tuning hyperparameters, leveraging transfer learning with pre-
trained YOLO models, and implementing post-processing techniques like Non-Maximum
Suppression (NMS) tuning. These strategies help refine the model’s accuracy and
reliability in detecting corrosion.
4.14 Deployment is the final stage, where the trained model is integrated into real-
world applications. Deployment options include edge deployment with drones,
underwater cameras, or onboard inspection systems, cloud-based deployment for remote
corrosion monitoring, and web-based interfaces for real-time detection and reporting.
Each deployment method ensures efficient monitoring and early detection of corrosion,
enabling timely maintenance interventions.
CHAPTER-5
YOLO V8.
5.1 YOLOv8 (You Only Look Once version 8) is the latest iteration of the YOLO
(You Only Look Once) object detection family, developed by Ultralytics. It builds upon the
strengths of its predecessors while integrating new techniques to enhance accuracy,
speed, and efficiency. YOLOv8 is designed for object detection, segmentation, and
classification tasks, making it one of the most versatile and high-performing models in the
field of computer vision.
Ultralytics.
strong focus on efficiency, accuracy, and user accessibility, the company has established
itself as a key player in the AI industry, catering to both researchers and businesses
seeking cutting-edge solutions for object detection, segmentation, and classification.[16]
COCO Dataset.
5.5 The COCO (Common Objects in Context) dataset is one of the most widely
used and influential datasets in computer vision research. Developed by Microsoft, COCO
serves as a large-scale benchmark for various vision-related tasks, including object
detection, instance segmentation, keypoint detection, and image captioning. It provides
a diverse and challenging set of images with rich annotations, making it a crucial resource
for training and evaluating deep learning models.
5.6 COCO consists of over 330,000 images, with more than 200,000 labeled
images containing around 1.5 million object instances. The dataset includes 80 object
categories, 91 stuff categories (such as sky, grass, and road), and five captions per
image, making it highly versatile. The annotations provided in COCO are extensive,
covering not only bounding boxes but also pixel-wise instance segmentation masks,
object keypoints for human pose estimation, and detailed descriptions for captioning
tasks. This diversity allows researchers to develop and test models across multiple
domains of computer vision.
5.7 One of COCO’s most significant contributions to the field is its emphasis on
object detection and segmentation in complex, real-world scenarios. Unlike earlier
datasets that often featured isolated objects on simple backgrounds, COCO images
contain multiple objects in cluttered environments, simulating real-life conditions more
effectively. This design makes it an excellent benchmark for evaluating the robustness of
deep learning models, ensuring that they perform well in practical applications.[16]
Training Parameters.
Results – YoloV8.
YOLO V11.
5.8 YOLOv11 (You Only Look Once version 11) represents the latest
advancement in real-time object detection, building upon the successes of previous
YOLO versions. Developed as part of the continuous evolution of deep learning-based
vision models, YOLOv11 integrates cutting-edge improvements in accuracy, efficiency,
and adaptability. Designed to enhance detection, segmentation, and classification tasks,
YOLOv11 is poised to redefine the state-of-the-art in computer vision applications. One
of the most significant advancements in YOLOv11 is its refined neural network
architecture. The model employs a hybrid approach combining convolutional neural
networks (CNNs) with vision transformers (ViTs) to improve feature extraction and
contextual understanding. This hybrid structure enables YOLOv11 to detect objects with
higher precision while maintaining real-time inference speeds. Additionally, it incorporates
an enhanced feature pyramid network (FPN) and spatial attention mechanisms to
improve object localization, particularly in complex and cluttered scenes.
5.9 Another key feature of YOLOv11 is its improved training methodology. The
model leverages self-supervised learning techniques to reduce the need for large labeled
datasets, making it more accessible for a wider range of applications. Advanced
augmentation strategies, such as mixup, CutMix, and RandAugment, enhance model
robustness, while adaptive learning rate scheduling ensures optimal convergence.
Furthermore, the use of a novel loss function, designed to balance localization and
classification errors, contributes to increased detection accuracy. Performance
benchmarking of YOLOv11 demonstrates significant improvements over its
predecessors. The model achieves higher mean Average Precision (mAP) across
standard datasets, such as COCO and Pascal VOC, while reducing computational
complexity. Its optimized architecture allows for deployment on a variety of hardware,
from high-performance GPUs to edge devices, making it a versatile solution for industries
requiring real-time object detection, such as autonomous vehicles, robotics, security
surveillance, and medical imaging.[17]
Training Parameters.
Results-YoloV11.
Inferences.
5.10 The results obtained from training the YOLOv8 model for corrosion detection
provide insights into the model’s performance in terms of precision, recall, mean Average
Precision (mAP), and loss convergence over the training epochs. The precision value of
0.6417 indicates that when the model predicts corrosion, it is correct 64.17% of the time.
The [email protected] score of 0.5544 represents the model’s average precision when using a
high Intersection over Union (IoU) threshold of 50%, which is commonly used for
evaluating object detection models. This value indicates that the model has moderate
detection performance but still has room for improvement. The graphs provide additional
insights into the model’s training behavior. The Precision and Recall graph shows a
steady increase in precision and recall over the epochs, with some fluctuations that
indicate possible variations in learning stability. The mAP Scores graph follows a similar
trend, where both [email protected] and [email protected]:0.95 improve as training progresses but
begin to plateau after approximately 200 epochs, indicating a saturation point where
additional training may not yield significant gains. The Losses graph shows a consistent
decline in box loss, class loss, and DFL loss over the epochs, suggesting that the model
is learning effectively. Overall, while the YOLOv8 model demonstrates a reasonable
ability to detect corrosion, improvements can be made by enhancing the dataset, applying
additional augmentation techniques, or fine-tuning hyperparameters.
5.11 The corrosion detection results obtained using the YOLOv11 model show
notable improvements in precision, recall, and mean Average Precision (mAP) compared
to the V8 model. The precision value of 0.7480 indicates that the model correctly identifies
corrosion in 74.80% of its predictions, an improvement over the earlier results. The recall
score of 0.6065 signifies that the model detects 60.65% of the actual corrosion instances,
suggesting a better balance between identifying true positives while minimizing false
negatives. The [email protected] score of 0.7083 is a significant enhancement, demonstrating
that the model achieves a 70.83% average precision when using an IoU threshold of 50%.
This improvement suggests that the model has become more effective at distinguishing
corrosion from non-corrosion areas. Furthermore, the [email protected]:0.95 score of 0.5612,
which accounts for multiple IoU thresholds, is considerably higher than the previous
score, indicating better localization accuracy and generalization across different object
sizes and shapes.
5.12 Comparing the results obtained using both the models, we obtain significant
improvement in the accuracy and precision when compared to open source models like
Rob flow. The results obtained using the YoloV11 model are marginally better than the
V8 model. Hence, we would be using the V11 model for all further calculations and
validations.
5.13 The confusion matrix for the corrosion detection model provides insights into
its classification performance. The matrix consists of four key values: 743 true positives,
102 false positives, 155 false negatives, and a background class. These numbers
indicate how well the model distinguishes between corrosion (rust) and non-corrosion
(background). The true positives (743) represent cases where the model correctly
identified corrosion when it was actually present. This high number suggests that the
model has a strong ability to detect rust in most scenarios. The false positives (102)
indicate instances where the model incorrectly classified background regions as rust.
While this number is relatively low, it implies that some non-corroded areas are being
mistakenly flagged, which could lead to unnecessary inspections or maintenance
actions in real-world applications. The false negatives (155) highlight cases where actual
corrosion was present but not detected by the model. This is a critical area for
improvement, as failing to detect rust can have significant consequences, especially in
industries where corrosion monitoring is crucial for safety and maintenance. Reducing
false negatives would enhance the model’s reliability by ensuring that fewer corrosion
cases go unnoticed. Overall, the confusion matrix suggests that the model performs well
in detecting corrosion but still has some misclassifications. Possible improvements
include fine-tuning the decision threshold, increasing the dataset size with more diverse
corrosion samples, and applying additional data augmentation techniques. Enhancing
these aspects could help minimize false negatives while maintaining or further reducing
false positives, leading to a more robust corrosion detection system.
5.14 The confusion matrix for the testing data provides valuable insights into the
performance of the corrosion detection model when applied to unseen data. In this case,
the model correctly identified 144 instances of corrosion, meaning that it successfully
detected rust when it was actually present. This high count of true positives indicates
that the model has a strong ability to recognize corrosion in real-world scenarios. The
model produced 35 false positives, meaning that it incorrectly classified background
regions as corrosion. While this number is relatively low, it suggests that the model
occasionally misidentifies non-corroded areas as rust, which could lead to unnecessary
maintenance checks or false alarms in practical applications. On the other hand, there
are 21 false negatives, where the model failed to detect corrosion that was actually
present.
5.15 The overall distribution of values in the confusion matrix suggests that the
model has achieved a good balance between precision and recall. The relatively low
number of false positives indicates that it is precise in identifying corrosion, while the
reduced count of false negatives demonstrates a strong recall capability. However, slight
improvements can still be made to further refine its accuracy. Techniques such as fine-
tuning the detection threshold, improving dataset diversity, and applying additional
augmentation strategies could help enhance the model’s generalization to different
corrosion patterns. In conclusion, the model performs well on the testing data, effectively
detecting corrosion while maintaining a reasonable level of precision. With some minor
refinements, particularly in minimizing false negatives, it can be made even more
reliable for real-world corrosion detection applications.
CHAPTER-6
VALIDATION OF RESULTS
6.2 Above Water dataset. For the topside (above-water) validation trial we
isolate a pure, single-domain dataset containing only corrosion examples
photographed on the freeboard, weather deck fittings, and splash-zone strakes of
operational vessels. All frames were captured between 1 m and 7 m standoff under
natural daylight, then manually filtered to exclude any images that show submerged
plating, bilge algae, or dry-dock scaffolding so that every positive instance
represents true atmospheric rust the reddish-brown streaks and pitted blisters that
propagate where paint films crack, sun-fade, and salt spray accumulatesWe
deliberately retain challenging artefacts like glare from glossy coatings, shadow
gradients cast by mooring lines, and faded boot-top colour bands, because they
mimic real inspection noise and stress-test the detector’s robustness. Only these
region-specific frames are fed to the model during inference, yielding domain-
specific precision-recall curves and mAP scores that reflect performance solely on
atmospheric corrosion cues, uncontaminated by the distinctive colour spectrum or
texture of submerged bio-fouling. Comparing these metrics with the underwater
trials reveals how well the network disentangles rust features from ambient lighting
variations and paint ageing effects, providing actionable guidance on whether
additional topside-focused augmentation or threshold tuning is required.
6.3 The topside-validation run yields two complementary views of model behaviour:
a quantitative confidence-score distribution and a qualitative gallery of detections. The
histogram shows a bimodal shape, one peak between 0.25 and 0.40 and a second,
higher cluster between 0.90 and 1.00—suggesting the detector is largely decisive,
assigning either low probability to non-rust patches or very high probability to true
corrosion. The mean confidence is ≈ 0.66 while the median skews slightly higher (≈ 0.68),
indicating a mild right-tail trace confirms this: it rises sharply again past 0.85, then
plateaus, which is typical of a model that has learned strong class-specific cues
(colour/texture mixtures of oxidised steel and blister patterns). However, the smaller hump
around 0.3 hints at a subset of ambiguous regions—often thin oxide films or partially
sand-blasted surfaces—where the network battles between “rust” and
“background.” Those borderline cases are especially visible in the montage: faint, mottled
streaks adjacent to intact paint frequently receive blue bounding boxes with confidence
tags of ~0.32–0.45, whereas thick flaking plaques and under-cut blisters score ≥ 0.95.
6.4 The collage also reveals how context and lighting drive error modes. Shots taken
under dock-shed skylights (soft diffused light) exhibit crisp detections with tight boxes;
conversely, frames shot at glancing angles show specular highlights that sometimes
confuse the model into extending boxes beyond the actual rust patch, visible in panels
where blue frames spill over onto clean orange hull. Notably, the detector rarely misses
contiguous rust fields larger than 10 cm, every widespread bloom is tagged yet it
struggles on tiny isolated pimples or primer-through scratches; these seldom appear as
boxes, confirming the recall drop at small object scales reported by the metrics. Another
key insight is colour-shift robustness: several images contain areas recently stripped to
greenish primer or coated with whitish filler, but the network still hones in on the
reddish-brown pits, suggesting it is not relying solely on hue but also on texture cues such
as roughness and edge irregularity.
6.7 The underwater run produced a very different confidence landscape from the
topside case, and the mosaic of detections explains why. On the histogram, scores are
concentrated in the mid-range roughly 0.30 to 0.60 with a long taper toward 1.0. That
pattern fits what we see in the images: the network almost always finds something
rust-like on submerged components, but turbidity, beam-light glare, and bio-fouling make
it less certain than in clear daylight. Take the sea-chest grating in the upper-left tile: the
slotted bar face is heavily pitted, and the box sits neatly on the corroded area, yet the
label reads only about 0.58. The slightly muted score comes from competing green
bio-film and low contrast around the slots. A similar story appears on the third tile, top row
an insert plate with a ragged rust edge. The box hugs the orange-brown spall accurately,
but the surrounding primer and sediment cloud drag the confidence down to the
0.6 range. In several shots of shell-plating weld seams, the detector brackets the weld
toes plus adjacent rust streaks (e.g., centre of the second row). Even when heavy silt
reduces visibility to a couple of metres, the model still outlines the damage corridor, yet
confidence hovers near 0.45, showing it “sees” the shape but is cautious because colour
cues are suppressed.
6.8 The algorithm briefly interprets the warm reflection as corrosion and issues a thin
box with a low score (0.38). Another example is the propeller-hub image (third row, far
right). The bronze surface throws back golden highlights that fool the hue-based filters;
the network draws a box, but the score sits below 0.40, meaning it can be ignored once
we set a sensible threshold. Conversely, when corrosion is severe flaking steel at a
bilge-keel weld confidence climbs past 0.75 despite turbidity, because both texture and
colour cues align with the learned pattern.
6.9 Overall, the detector shows that it is locating the right regions almost every time,
even in murky water, but expresses its uncertainty through mid-level scores a healthy
behaviour for an automated aid rather than a hard-coded alarm. By tightening the
threshold to 0.55 and feeding the model more examples of algae-covered rust and
bright-metal false positives, we can turn these mid-confidence boxes into either firm
detections or suppressed noise, further improving submerged-hull inspection reliability.
6.10 To extend validation from still images to continuous footage, the detection
pipeline is wrapped in a simple video-reader loop. A Python snippet built on OpenCV
opens the inspection clip, grabs each frame and stamps it with the original time-code so
any alert can be traced back to a second on the tape. Every frame is then resized and
colour-corrected using the same preprocessing functions we apply to photographs,
passed through the YOLOv11 network, and the resulting bounding boxes with their
confidence scores are written to a results list and optionally drawn on a copy of the frame
for visual review. Because the video often runs at 25–30 fps, we insert a stride parameter
so that, say, only every third frame is analysed when bandwidth or GPU headroom is
tight; this still gives one inference per 120 mm of travel at a typical 0.3 m/s survey
speed. After the clip finishes, the script aggregates detections across frames: it counts
unique corrosion sites by tracking box centroids with a simple IoU-based tracker,
computes per-frame precision/recall, and then averages them to produce sequence-level
mAP and a heat-map that shows where rust is most frequently flagged. This
frame-by-frame approach turns any standard MP4 or live camera feed into a rich
validation set, letting us test the model under realistic motion blur, changing lighting, and
variable standoff distances without rewriting the core detector.
6.11 Video Ingestion and Buffering. The routine begins by opening the
inspection clip typically an MP4 or .AVI stream captured by an ROV or topside
camera using a multimedia library such as OpenCV. The code exposes three key
properties: frames-per-second (FPS), total frame count, and the four-character
codec, allowing the script to check that the video is playable and to pre-allocate
buffers of the correct size. Every call to returns a frame image and a Boolean flag;
if the flag is false the loop exits gracefully. To avoid overloading the GPU, a sampling
stride is introduced: analysing, say, every third frame keeps the effective inference
rate aligned with the motion of the camera.
6.15 Interpreting Run-Level Results. After processing the entire clip, the
script computes sequence-level metrics. Per-frame precision is averaged to yield a
global precision; per-frame recall is estimated if ground-truth video annotations
exist, or approximated by manual spot-checks. A histogram of confidence scores,
like those shown earlier, indicates whether the detector is decisive (peaks at high
and low ends) or uncertain (scores clustered mid-range). Track length statistics
reveal inspection coverage: a defect appearing in only one frame may be noise,
while one persisting over ten seconds is likely real corrosion. Finally, aggregated
bounding-box centres are plotted as a heat-map over the vessel’s hull schematic,
visually highlighting corrosion-prone zones. Together, these summaries let survey
engineers decide whether the detector’s performance meets operational thresholds
or needs further tuning before live deployment.
(a)
(b)
Fig 6.5: Corrosion Detection using video input- General Corrosion Detection. (a) – Input
Video(provided by the user), (b) – Output Video (generated by the program)
(a) (b)
Fig 6.6: Corrosion Detection using video input- Underwater Input (a) – Input
Video(provided by the user), (b) – Output Video (generated by the program)
(a) (b)
Fig 6.7: Corrosion Detection using video input- Above Water(Interior) Input (a) – Input
Video(provided by the user), (b) – Output Video (generated by the program)
6.16 The script opens your input video with cv2.VideoCapture, then steps through
it frame by frame in a while loop. Each time the loop grabs a frame, that single image is
passed to the YOLOv11 model, which has been trained to spot corrosion. The model
returns bounding-box results whose confidence scores indicate how certain it is that each
box really shows corrosion. Those scores are stored in a list for that frame, so the code
can print an average confidence for the current frame before moving on. As the loop
continues, every annotated frame is written to a new video file, and all confidence scores
are collected in a master list. When the video ends, the script reports how many frames
were processed, how many total corrosion detections were made, and the overall
average, maximum and minimum confidence values giving you a quick measure of how
confidently the model identified corrosion across the entire video.
CHAPTER-7
7.1 The experimental evidence gathered throughout the project has been
consolidated, and the broader implications of deploying deep-learning-based corrosion
detection on maritime assets have been interpreted. The preceding chapters chronicled
the physics of marine corrosion and its inspection heritage, the creation of a domain-
specific vision dataset, the iterative training of two modern anchor-free detectors (YOLO
v8 and YOLO v11), and the rigorous validation on still images and streaming video from
both atmospheric and submerged environments.
7.2 To compare the two corrosion-detection models on equal footing, we put them
through an identical training regimen. Both networks examined the same one-thousand
image crops, each crop a square 640 × 640 pixels, and each network reviewed the full
set 500 times. During every pass they digested the pictures in groups of sixty-four,
updating their internal parameters after each group. Because the data, the number of
practice cycles, and every other training detail were held constant, any difference in
performance would come down to the networks’ own design.
7.3 The YOLO v8 model displayed good performance in corrosion detection with an
accuracy of about 64 percent (precision = 0.6417) and a recall of about 55 percent
(0.5483). A popular summary score that blends both viewpoints, called [email protected], landed
at 0.5544. These figures would already put the model on par with a careful human
inspector scanning still photographs. Yet the training curves for precision and recall
flattened long before the end of the 500 passes. That early plateau hints that the v8
architecture struggles to pull in additional context once it has learned the most obvious
colour and texture cues.
7.4 The YOLO v11 model was then trained under the very same conditions. The
only difference is that v11’s architecture contains an extra “transformer” component that
helps the network perceive broader context, seeing small rust streaks in relation to large,
surrounding structures rather than as isolated pixels. With everything else unchanged,
v11’s results improved across the board. Its precision rose to 0.7480, meaning three-
quarters of its corrosion calls were correct. Its recall climbed to 0.6065. The score
[email protected] jumped to 0.7083. Even under a more demanding yardstick [email protected] : 0.95,
which requires the network to outline each rusty area far more tightly the score climbed
to 0.5612. Overall, that represents an improvement of almost 27 percent over v8 in the
headline metric, proving that the newer architecture squeezes more useful signal out of
the same data.
7.6 The take-away is twofold. First, the detector rarely hallucinates rust; its false-
positive count is comfortably low even on novel scenes. Second, most of its misses occur
on faint, early-stage oxidation films or on rust partly hidden under glare or bio-fouling.
Those omissions matter because an undetected micro-pit can grow silently beneath paint
until the steel loses strength. By contrast, spending an extra minute verifying a handful of
false alarms costs little. For practical deployment, then, we should bias the operating
threshold toward higher recall even if that admits a few more doubtful boxes until
additional training data teach the model to recover those elusive early patches without
raising the false-alarm rate.
7.7 When the evaluation set was restricted to photographs taken in bright, above-
water conditions, the freeboard shots most surveyors encounter on a routine deck walk,
the network’s output revealed a strikingly clear two-peak, or bimodal, distribution of
confidence scores. At one end of the spectrum sat the “obvious” corrosion; the model
assigned these features probabilities well above 0.90. At the opposite end, a second,
smaller mound of scores clustered around 0.35. These lower-confidence detections
corresponded to borderline cases, hairline scratches where bare steel was just beginning
to tint, faint salt-run stains, or dust-coloured streaks that could be mistaken for oxidation
under the wrong lighting. The numerical spread was equally informative. Across
thousands of freeboard frames, the mean confidence settled at 0.66, with the median
nudging slightly higher to 0.68. That gap between the low-thirties hump and the high-
nineties peak leaves a comfortable valley right around 0.60.
7.8 Underwater imagery is plagued by green–blue colour shifts, suspended silt
clouds, back-scatter from ROV lights and carpets of marine growth each of which erodes
the crisp hue cues the network leans on topside. As a result, confidence values for
submerged scenes stretched into a long, flat mesa ranging from 0.30 all the way to 0.90.
Even so, the detector remained impressively sure-footed when corrosion was severe.
Heavily pitted shell plating, deeply undercut weld toes and flaking bilge-keel attachments
still drew boxes labelled 0.75 or higher, confirming that the model’s texture filters continue
to fire even when colour fidelity collapses. The real challenge lay with incipient rust
cloaked beneath slime or light bio-fouling. Those spots typically surfaced in the database
with confidences around 0.50, high enough to merit attention. By raising the confidence
threshold from 0.50 to 0.55 we regained most of the precision that had been eroded by
turbid imagery, while giving up only a marginal amount of recall. This balance should hold
and even improve provided upcoming training cycles purposefully seed the dataset with
more instances of algae-coated corrosion. Doing so will teach the network to distinguish
a benign biofilm from the distinctive pitted texture of oxidised steel.
7.9 The inspection system treats the ROV feed as a continuous video stream rather
than a collection of separate photographs. The code captures frames sequentially and,
to manage computational load, processes every third frame still achieving about eight
frames per second on a mid-range GPU through the YOLO v11 detector. Once the
network returns bounding boxes and confidence scores, a lightweight tracker links
recurring rust patches across successive frames by comparing box positions and sizes.
This temporal stitching offers two key benefits. First, it adds stability: genuine corrosion
remains visible over many frames, whereas transient reflections or marine debris appear
briefly and are removed by a track-length filter that discards objects persisting for fewer
than three frames, halving false positives without sacrificing true detections. Second, it
boosts efficiency: overlapping frames give multiple views of the same hull area without
extra manoeuvring, allowing the software to update a live heat-map that guides the ROV
toward suspicious zones in real time. Over a ten-minute video input, the system
processed 1,259 boxes, achieved sequence-level precision of roughly 0.73 and recall
around 0.60, and automatically plotted a heat-map of rust hot-spots on the hull schematic.
7.10 Crucially, genuine defects persisted for nine or more successive frames,
whereas fleeting glints vanished after one or two; a simple track-length filter therefore cut
false alarms in half with no measurable loss of true positives. A conventional walk-around
surveyor visually skims about four thousand images per hour; with YOLO v11 triaging
frames, that throughput rises beyond twenty thousand while the proportion needing
human eyes collapses from one hundred percent to roughly twenty-eight percent.
Preliminary defect maps that once took six hours to draft now appear in forty-five minutes,
and the incidence of missed mature rust drops below two per cent. This results in
downstream savings, fewer dry-dock overruns, earlier spot blasting, reduced paint
wastage, and tighter safety margins.
Conclusion.
precision, recall, and bounding-box accuracy equal and in several tests surpass the
benchmarks achieved by experienced human surveyors working frame-by-frame, yet the
network delivers those results at video frame-rate on a GPU. In other words, what once
required hours of painstaking visual scrutiny can now be distilled into near-instant, colour-
coded alerts. By inserting this capability into routine ROV dives or deck-camera patrols,
operators can pivot away from rigid dry-dock calendars toward condition-based
maintenance anchored in real-time evidence.
7.12 The implications are far-reaching. As the training pool expands to cover
additional paint chemistries and lighting extremes, and as vision outputs are fused with
ultrasonic thickness readings or eddy-current scans, the system will evolve from a simple
“spot the rust” tool into a comprehensive health-diagnostics platform. Continuous learning
in the field will progressively trim false alarms and capture the faintest oxide bloom before
it penetrates the coating. The endgame is a predictive stewardship model in which steel
loss is charted, prioritised, and budgeted months in advance minimising unplanned
downtime, avoiding costly emergency steel renewals, and, most importantly,
safeguarding the crews and cargoes that depend on a vessel’s structural integrity. With
the foundations laid here, the maritime industry now has a clear, technologically practical
pathway from reactive patching to truly proactive asset management.
REFERENCES
[1]. Bonnin-Pascual, Frank., & Ortiz, (2013). A novel approach for defect detection on vessel
structures using saliency-related features. Department of Mathematics and Computer Science,
University of the Balearic Islands.
[2]. Chliveros, Kontomaris, & Letsios, (2015). Automatic identification of corrosion in marine
vessels using decision-tree imaging.
[3]. Naladala, Raju, Aishwarya, & Koolagudi. (2021). Corrosion damage identification and
lifetime estimation of ship parts using image processing.
[4]. Petricca, Moss, Figueroa, & Broen, (2020). Corrosion detection using AI: A comparison of
standard computer vision techniques and deep learning models.
[5]. Civiconcepts. (2018). Types of corrosion. Retrieved from
https://civiconcepts.com/blog/types-of-corrosion
[6]. Ali, Jamaludin, et al. (2016). Computer vision and image processing approaches for
corrosion detection.
[7]. Imran, Jamaludin, Ayob et al (2023). Application of artificial intelligence in marine
corrosion prediction and detection.
[8]. Waszak, M., Cardaillac, A., Elvesæter, B., Rødølen, F., & Ludvigsen, M. (2019). Semantic
segmentation in underwater ship inspections: Benchmark and dataset.
[9]. Ortiz, A., Bonnin-Pascual, F., Garcia-Fidalgo, E., & Company-Corcoles, J. P. (n.d.). Vision-
based corrosion detection assisted by a micro-aerial vehicle in a vessel inspection application.
[10]. Guo Z., Wang, Yang, Huang, & Li (2021). MSFT-YOLO: Improved YOLOv5 based on
transformer for detecting defects of steel surface.
[11]. Wingert, D. (2009). Sane Victory. Retrieved from www.sanevictory.org
[12]. American Society of Civil Engineers. (2022). Retrieved from https://www.asce.org/
[13]. Nature. (2022). Corrosion study. Retrieved from https://www.nature.com/articles/s41529-
022-00232-6
[14]. Le Dinh, D., Tung Son, N., et al. (2015). Deep learning in segmentation of rust in images.
9th International Conference on Software and Computer Applications.
[15]. Vinokurov, I. V. (2019). Using a convolutional neural network to recognize text elements in
poor-quality scanned images. Program Systems: Theory and Applications.
[16]. Thorne, B. (2017). Introduction to computer vision in Python.
[17]. Nabizadeh, E., & Parghi, A. (2022). Automated corrosion detection using deep learning and
computer vision. Asian Journal of Civil Engineering.
[18]. Kim, H., Ahn, E., et al. (2018). Crack and non-crack classification from concrete surface
images using machine learning. Structural Health Monitoring, 17(2), 345-360.
[19]. Hui, J. (2018, December 7). mAP (mean average precision) for object detection. Medium.
Retrieved from https://jonathan-hui.medium.com/map-mean-average-precision-for-object-
detection-45c121a31173