Textile Defect Detection Algorithm Based on the Improved YOLOv8
ABSTRACT Automatic detection of textile defects is a crucial factor in improving textile quality, and fast, accurate detection of these defects is key to achieving automation in the textile industry. However, textile defect detection faces challenges such as small defect targets, low contrast between defects and the background, and large variations in the aspect ratio of defects. To address these issues, this study proposes a new method for textile defect detection based on an improved version of YOLOv8 (You Only Look Once version 8), called DA-YOLOv8s. The Deep & Cross Network (DCNv2) is introduced into the Backbone network to replace the C2f module, enhancing the network's feature extraction; a self-attention mechanism, Polarized Self-Attention (PSA), is adopted to increase the feature fusion capability and reduce feature loss in both the channel and spatial dimensions; finally, a Small Object Detection Head (SOHead) is added to improve feature extraction for small targets. Experimental results show that the improved YOLOv8 algorithm achieves [email protected] and mAP of 44.6% and 48.6% respectively, an improvement of 4.2% and 3.8% over the original algorithm, and it also outperforms the optimal YOLOv9s model and the latest YOLOv11s model on these two metrics. The speed of textile defect detection reaches 257.38 frames per second (FPS) at a computational cost of 36.6 GFLOPs, ensuring both the accuracy and the speed of textile defect detection and giving the method practical engineering application value.
INDEX TERMS Interest Point Detection, Textile Industry, Quality Management, YOLOv8, Textile Defect Detection, Polarized Self-Attention, Deep & Cross Network
characterize the original image, and computed the Euclidean distance of GLCMs between the images to be detected and the defect-free template image to achieve defect detection. Guan et al. [5] first highlighted the defect areas using image enhancement techniques, then used the first-order derivative for edge detection while employing the Roberts operator to detect the edges of the defect areas to improve detection accuracy.

Spectral analysis methods treat images as two-dimensional signals with amplitude variations and perform frequency-domain analysis through certain transformation algorithms, commonly the Fourier transform, the Wavelet transform, and the Gabor filter transform. Hu et al. [6] proposed an unsupervised method based on the combination of the discrete Fourier transform (DFT) and the discrete wavelet transform (DWT), which performs wavelet shrinkage denoising on the residual image after Fourier recovery and applies the inverse transformation to the approximation coefficients and the processed wavelet coefficients separately to achieve defect segmentation with simple thresholding. Li et al. [7] proposed a Defect Direction Projection Algorithm (DDPA) based on the characteristics of fabric defects, which filters the input image using Gabor filters, performs a Radon transform projection after hard-threshold segmentation, and selects the optimal Gabor filter channel, that is, the channel with the maximum defect value, to detect defects. Xiang et al. [8] proposed a defect detection algorithm based on Fourier convolution, which generates image pairs for training using random masking in the training phase and incorporates a Fourier convolution layer into the autoencoder to achieve automatic detection of dyed fabric defects.

The defect detection algorithms based on traditional computer vision have high computational requirements, and their detection speed and accuracy need to be improved.

Deep learning is a newer framework in computer vision research, and it has been widely applied to defect detection with the rapid development of big data and artificial intelligence technologies [9]–[10]. Deep learning can automatically extract features and optimize and iterate parameters, and can therefore detect defects in textile images. Mei et al. [11] designed a Multi-Scale Convolutional Denoising Autoencoder Network (MSCDAE) that achieved unsupervised detection of textile defects. The algorithm trains a Convolutional AutoEncoder (CAE) on positive samples, enabling it to extract fabric features and reconstruct fabric images; defects are then identified from the difference in features between defective images and normal fabric images. Jing et al. [12] proposed an automatic detection method for fabric defects based on convolutional neural networks. This method decomposes textile images into multiple local patches and labels them, transmits them to a pre-trained deep CNN for learning, and uses the trained model to detect each patch, thereby obtaining the category and position of each defect. Ma et al. [13] used a VGG16 model with improved parameters to train a classifier for detecting and recognizing defects in denim fabrics and constructed a defect detection algorithm based on a cascading architecture by merging the two models. However, these deep learning methods still fall short in detection speed and accuracy.

In recent years, with the further development of computer vision technology, research on object detection algorithms has largely focused on candidate-region-based and regression-based deep convolutional neural networks. The Faster R-CNN algorithm is a representative of the candidate-region-based object detection algorithms, demonstrating excellent performance in the field of object detection [14]–[17]. Wei et al. [18] proposed a Faster R-CNN model based on an improved VGG structure, which adapts to the characteristics of fabric defect images by reducing the number of anchor points in the Faster R-CNN. The VGG16 was modified to include 13 convolutional layers (with ReLU activations) and four pooling layers to extract feature maps; additionally, the Region Proposal Network (RPN) and the ROI pooling layer were improved to enhance the model. An et al. [19] improved the Faster R-CNN network for textile defect detection by using deep residual networks instead of the traditional VGG-16 for feature extraction, and by incorporating methods such as adding feature pyramid modules and increasing the number of anchor boxes. Chen et al. [20] designed a Genetic Algorithm Gabor Faster R-CNN (Faster GG R-CNN) model, which embeds Gabor kernels into Faster R-CNN and employs a two-stage training method based on a Genetic Algorithm (GA) and backpropagation for textile defect detection. Faster R-CNN is a two-stage object detection algorithm, with the first stage producing region proposals and the second stage recognizing objects within the proposed boxes, which limits the speed of object detection.

Regression-based object detection algorithms directly regress the bounding-box coordinates and object categories at multiple positions in the input image, balancing accuracy and speed; YOLO is a typical representative of such algorithms. The diversity of YOLO's applications in industrial defect detection also verifies the effectiveness of the algorithm [21]–[24]. Yue et al. [25] proposed an improved YOLOv4 textile defect detection algorithm which, after expanding the dataset using combined data augmentation methods, improved the head prediction layer and integrated the Convolutional Block Attention Module (CBAM) to achieve accurate classification and localization of tiny target defects. Jin et al. [26] improved the YOLOv5 network, introducing spatial and channel attention models into the backbone network and designing a multi-task learning strategy with two detection heads, one for detecting common defects and one for identifying specific defects, to improve the accuracy of defect recognition. These methods have weaker detection capabilities for irregularly sized defects, and their accuracy can be further improved.

Our research focuses on improving the accuracy of textile defect detection using deep neural network technology.
The main contributions of this work are as follows. (1) We introduce DCNv2 into the Backbone network to replace the C2f module, enhancing feature extraction. (2) We adopt the Polarized Self-Attention (PSA) mechanism to strengthen feature fusion in the channel and spatial dimensions. (3) We add a detection head for small objects, SOHead, to prevent the loss of small-object features, thereby improving detection performance.

Experiments show that, with YOLOv8s as the base network model, textile defect detection and recognition is achieved; the improved DA-YOLOv8s increases [email protected] and [email protected],0.3,0.1 by 4.2% and 3.8%, reaching 44.6% and 48.6% respectively, although there is still room for improvement in model accuracy.

Section I introduces the background of textile defect detection and the development of detection methods, summarizes the characteristics of these methods, and proposes improvements and innovations to the YOLOv8s benchmark model. Section II describes the principles of convolutional neural networks and the feasibility of incorporating attention mechanisms into neural networks, and introduces the basic network structure of YOLOv8. Section III presents the construction of the DA-YOLOv8s model, including the introduction of DCNv2 to enhance the feature extraction capability of the backbone network, the introduction of PSA to improve the feature fusion capability of the neck network, and the addition of the SOHead detection head to enhance the detection of small targets. Section IV introduces the dataset and evaluation metrics, conducts comparative experiments and an ablation study, and analyzes the experimental results. Section V summarizes the achievements of this paper and looks ahead to future research. The structure of the paper is shown in Fig 1.

II. RELATED WORK
To enhance the feature extraction capability for computer images and improve detection accuracy, extensive research has been conducted. In this section, we introduce the relevant research work and, drawing on these studies, propose an improved method, as summarized in Table 1.

TABLE 1. Related Studies and Improvement Methods
Research Focus | Improved Method | Section
CNN | DCNv2 | Section III, Part B
Attention Mechanism | PSA | Section III, Part C
YOLOv8 | DCNv2+PSA+SOHead | Section III, Part D
Pooling Layer: Replaces the output of the network at a given location with a statistical measure of its neighboring area. Common pooling functions include the average pooling and max pooling strategies.

Fully Connected Layer: While the convolutional layers are capable of extracting features from the input data, the role of the fully connected layer is to perform non-linear combinations of the features extracted by the convolutional and pooling layers to produce the output.

Output Layer: For image classification problems, the output layer uses a logistic function or a normalized exponential function (the softmax function) to output classification labels. In object detection problems, the output layer can be designed to output the center coordinates, size, and classification of the object.

Traditional CNNs extract features in the form of linear models, which have limited extraction capabilities. In contrast, the Cross Network can achieve multi-layer feature interactions, with each layer producing higher-order interactions on top of existing ones while retaining the interactions from previous layers. The cross network can be trained jointly with a deep neural network [30]. Here, we introduce the DCNv2 model to improve the C2f in YOLOv8, which includes the classic CNN, as discussed in Section III, Part B.

The YOLOv8 network consists of the input end, the Backbone network, the Neck network, and the Head module, as shown in Fig 2. The input end mainly includes Mosaic data augmentation, adaptive anchor box calculation, and adaptive grayscale padding. The Backbone network contains structures such as Conv, C2f, and SPPF, among which the C2f module is the primary module for learning residual features; its cross-layer branch connections enrich the gradient flow of the model and form a neural network module with stronger feature representation capabilities. The Neck network adopts the PAN (Path Aggregation Network) structure, which enhances the network's ability to fuse features of objects at different scales. The Head module is the output end, which decouples the classification and regression processes. The loss calculation mainly includes the positive/negative sample assignment strategy and the loss computation: assignment uses the dynamic Task Aligned Assigner [33], which selects positive samples based on the weighted combination of classification and regression scores; the classification branch adopts BCE Loss, while the regression branch uses the distribution focal loss [34] and the CIoU (complete intersection over union) loss. The network structure is shown in Fig 2.
FIGURE 2. YOLOv8 Network Structure Diagram. Here w represents the width of the convolutional kernel and r represents the scale factor.
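For reference, the CIoU regression loss used in the Head can be sketched as below. This is a standard formulation of CIoU written by us for illustration, not necessarily the exact variant in YOLOv8's implementation:

```python
import math
import torch

def ciou_loss(box1: torch.Tensor, box2: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """CIoU loss between boxes given as (x1, y1, x2, y2):
    loss = 1 - (IoU - rho^2/c^2 - alpha * v)."""
    # intersection area
    xi1 = torch.max(box1[..., 0], box2[..., 0])
    yi1 = torch.max(box1[..., 1], box2[..., 1])
    xi2 = torch.min(box1[..., 2], box2[..., 2])
    yi2 = torch.min(box1[..., 3], box2[..., 3])
    inter = (xi2 - xi1).clamp(0) * (yi2 - yi1).clamp(0)
    # union area and IoU
    w1, h1 = box1[..., 2] - box1[..., 0], box1[..., 3] - box1[..., 1]
    w2, h2 = box2[..., 2] - box2[..., 0], box2[..., 3] - box2[..., 1]
    union = w1 * h1 + w2 * h2 - inter + eps
    iou = inter / union
    # squared center distance over squared diagonal of the enclosing box
    cw = torch.max(box1[..., 2], box2[..., 2]) - torch.min(box1[..., 0], box2[..., 0])
    ch = torch.max(box1[..., 3], box2[..., 3]) - torch.min(box1[..., 1], box2[..., 1])
    c2 = cw ** 2 + ch ** 2 + eps
    rho2 = ((box1[..., 0] + box1[..., 2] - box2[..., 0] - box2[..., 2]) ** 2 +
            (box1[..., 1] + box1[..., 3] - box2[..., 1] - box2[..., 3]) ** 2) / 4
    # aspect-ratio consistency term
    v = (4 / math.pi ** 2) * (torch.atan(w2 / (h2 + eps)) - torch.atan(w1 / (h1 + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - (iou - rho2 / c2 - alpha * v)
```

Compared with plain IoU, the extra distance and aspect-ratio terms keep the gradient informative even when predicted and ground-truth boxes do not overlap.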
DCNv2 combines explicit and implicit feature interactions, thereby achieving automated feature cross-encoding and improving the efficiency of high-order feature extraction. Stacked and parallel structures were designed; here we adopt the parallel structure, passing the input features through the cross network layer and the deep network layer and finally connecting them. The network structures of DCNv2 and the DCNv2 Block are shown in Fig 4. This not only enables efficient learning of the intersection of sparse and dense features in images but also enhances the model's perception of, and ability to learn, defect details.
FIGURE 4. The Network Structure of DCNv2. The left figure details the specific algorithm of DCNv2 Block.
Embedding Layer: Classifies the input features into a combination of sparse and dense features, transforms the sparse features into embedding vectors, and normalizes the dense features; its output is the concatenation of all embedding vectors and the normalized dense features: x_0 = [x_{embed,1}; \cdots; x_{embed,n}; x_{dense}].

Cross network and deep network: The cross network is characterized by the features of the l-th layer being operated on with a learned weight matrix and bias vector and then combined with the first-order original features of the base layer to produce the features of the next layer. The operation rule for a single layer is shown in Fig 5.

FIGURE 5. Single-Layer Operation Rule for Cross Networks.

The deep network, by contrast, takes the features of a given layer, operates on them with a weight matrix and bias vector, and then applies the ReLU activation to obtain the feature input for the next layer, following the operation rule shown in (1):

h_{l+1} = f(W_l h_l + b_l) \quad (1)

Deep and cross combination: Combining the cross network and the deep network yields two structures, namely the stacked structure and the parallel structure. The stacked structure feeds the output of the cross network into the deep network as its input; the parallel structure runs the two networks in parallel and combines their outputs in a single output layer. In practice, which architecture performs better depends on the data. The predictive function is given in (2):

\hat{y}_i = \sigma\left(W_{logit}^{T} x_{final}\right) \quad (2)

where W_{logit} is the weight vector of the logit and \sigma(x) = 1/(1 + \exp(-x)). For the final loss, the logarithmic loss function (Log Loss) with L2 regularization is used, as shown in (3):

loss = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i \log(\hat{y}_i) + (1 - y_i)\log(1 - \hat{y}_i)\right] + \lambda\sum_{l}\lVert W_l \rVert_2^2 \quad (3)

where \hat{y}_i is the prediction, y_i is the true label, N is the total number of inputs, and \lambda is the L2 regularization parameter.
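As a hedged, concrete illustration of (1)-(3) and the parallel deep-and-cross combination, the following PyTorch sketch is our own reading of the structure, not the authors' released implementation; all layer sizes are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossLayer(nn.Module):
    """One cross layer (the rule of Fig 5): x_{l+1} = x_0 * (W_l x_l + b_l) + x_l.
    Each layer adds one more order of feature interaction while keeping
    the interactions already accumulated in x_l."""
    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)  # learned W_l and b_l

    def forward(self, x0: torch.Tensor, xl: torch.Tensor) -> torch.Tensor:
        return x0 * self.linear(xl) + xl   # element-wise product with base features

class ParallelDCN(nn.Module):
    """Parallel deep & cross structure: both branches read x0; their outputs
    are concatenated into x_final and mapped to one logit, eq. (2)."""
    def __init__(self, dim: int, n_cross: int = 3, hidden: int = 128, depth: int = 2):
        super().__init__()
        self.cross = nn.ModuleList([CrossLayer(dim) for _ in range(n_cross)])
        layers, d = [], dim
        for _ in range(depth):                    # deep branch, eq. (1), ReLU-activated
            layers += [nn.Linear(d, hidden), nn.ReLU()]
            d = hidden
        self.deep = nn.Sequential(*layers)
        self.logit = nn.Linear(dim + hidden, 1)   # W_logit in eq. (2)

    def forward(self, x0: torch.Tensor) -> torch.Tensor:
        xc = x0
        for layer in self.cross:                  # cross branch
            xc = layer(x0, xc)
        x_final = torch.cat([xc, self.deep(x0)], dim=-1)
        return torch.sigmoid(self.logit(x_final)).squeeze(-1)

def dcn_loss(y_hat, y, model, lam=1e-4):
    """Log Loss with L2 regularization, eq. (3)."""
    bce = F.binary_cross_entropy(y_hat, y)
    l2 = sum((w ** 2).sum() for w in model.parameters())
    return bce + lam * l2
```

Stacking L cross layers yields feature interactions of order up to L + 1, while the parallel deep branch captures implicit interactions; the two are only combined at the output layer.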
C. INTRODUCING PSA TO ENHANCE THE FEATURE FUSION ABILITY OF THE NECK NETWORK
The Neck network in YOLOv8 fuses low-level feature maps with high-level feature maps through a series of convolutional and upsampling layers, thereby enhancing the accuracy of object detection. However, capturing object details and the feature information within small target regions more effectively requires feature extraction focused on the channel and specific spatial levels. To address this, Polarized Self-Attention (PSA) is introduced to enhance the representation fusion capability of the entire network.

Polarized self-attention is designed for pixel-level regression tasks. It maintains relatively high resolution in both the channel and spatial dimensions (retaining C/2 channels and the full [H, W] spatial extent), which reduces the information loss caused by dimensionality reduction, and it composes nonlinear functions that directly correspond to the typical fine-grained regression output distribution, making the fitted output more refined and closer to the actual output. The structure includes self-attention in both the channel and spatial dimensions and fuses the results of the two to obtain the polarized self-attention output. The polarized attention mechanism has two variants, a parallel structure and a sequential structure, shown in the left figure of Fig 6 and in Fig 7 respectively. In this work, the parallel structure is integrated into the YOLOv8 network. Within the Neck network, the PSA module receives the output of the C2F module and processes it in parallel with channel-wise self-attention and spatial self-attention, applying convolutions, reshaping, and the Sigmoid function, among other operations; the results are then combined and passed on to the detection head and Conv modules. The enhanced structure of the YOLOv8 neck network is illustrated in Fig 6. The PSA computation proceeds as follows.

Channel Dimension Self-Attention A^{ch}(X) \in R^{C \times 1 \times 1}: First, the input features X are transformed by W_q and W_v, each a 1 \times 1 convolution; the channels of W_q are fully compressed, while the channel dimension of W_v remains at a relatively high level (C/2). Because the channel dimension of W_q is compressed, its information must be enhanced (HDR), which is done with Softmax. W_q and W_v are then multiplied as matrices, followed by a 1 \times 1 convolution and LN to raise the channel dimension back to C. Finally, the Sigmoid function keeps all values between 0 and 1. The operation is shown in (4):

A^{ch}(X) = F_{SG}\left[ W_{z|\theta_1}\left( \sigma_1(W_v(X)) \times F_{SM}(\sigma_2(W_q(X))) \right) \right] \quad (4)

where W_q, W_v, and W_z are 1 \times 1 convolutional layers, \sigma_1 and \sigma_2 are two tensor reshaping operators, F_{SM}(\cdot) is the SoftMax operator given in (5), and "\times" is the matrix dot-product operation. The number of internal channels between W_v, W_q, and W_z is C/2, and the output of channel self-attention is Z^{ch} = A^{ch}(X) \odot^{ch} X \in R^{C \times H \times W}, where \odot^{ch} is the channel-wise multiplication operator.

F_{SM}(X) = \sum_{j=1}^{N_p} \frac{e^{x_j}}{\sum_{m=1}^{N_p} e^{x_m}} x_j \quad (5)

Spatial Dimension Self-Attention A^{sp}(X) \in R^{1 \times H \times W}: First, the input features are transformed by W_q and W_v, each a 1 \times 1 convolution. For the W_q branch, global pooling F_{GP} of (7) compresses the spatial dimension to 1 \times 1, while the spatial dimension of the W_v branch is maintained at the relatively large level H \times W. Since the spatial dimension of W_q is compressed, Softmax is used to enhance its information. Matrix multiplication followed by the Sigmoid then produces the spatial attention map, as shown in (6):

A^{sp}(X) = F_{SG}\left[ \sigma_3\left( F_{SM}\left(\sigma_1(F_{GP}(W_q(X)))\right) \times \sigma_2(W_v(X)) \right) \right] \quad (6)

where W_q and W_v are standard 1 \times 1 convolutional layers, \sigma_1, \sigma_2, and \sigma_3 are three tensor reshaping operators, F_{SM}(\cdot) is the SoftMax operator, F_{GP}(\cdot) is the global pooling operator of (7), and "\times" denotes the matrix dot product. The output of spatial self-attention is Z^{sp} = A^{sp}(X) \odot^{sp} X \in R^{C \times H \times W}, where \odot^{sp} is the spatial multiplication operator.

F_{GP}(X) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} X(:, i, j) \quad (7)

Combination of Channel and Spatial Self-Attention: The parallel combination of channel and spatial self-attention forms the PSA parallel structure of (8), while composing them sequentially gives the sequential structure of (9):

PSA_p(X) = Z^{ch} + Z^{sp} = A^{ch}(X) \odot^{ch} X + A^{sp}(X) \odot^{sp} X \quad (8)

PSA_s(X) = A^{sp}\left(A^{ch}(X) \odot^{ch} X\right) \odot^{sp} \left(A^{ch}(X) \odot^{ch} X\right) \quad (9)
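A compact PyTorch sketch of the parallel PSA module of (4)-(8) follows. It reflects our reading of the public PSA formulation; the exact channel counts and normalization placement are assumptions with respect to this paper's implementation:

```python
import torch
import torch.nn as nn

class PSAParallel(nn.Module):
    """Parallel Polarized Self-Attention: channel and spatial branches are
    computed on the same input and their re-weighted maps are summed, eq. (8)."""
    def __init__(self, c: int):
        super().__init__()
        # channel branch, eq. (4)
        self.ch_wv = nn.Conv2d(c, c // 2, 1)   # W_v: keep C/2 channels
        self.ch_wq = nn.Conv2d(c, 1, 1)        # W_q: fully compress channels
        self.ch_wz = nn.Conv2d(c // 2, c, 1)   # W_z: restore to C
        self.ln = nn.LayerNorm(c)
        # spatial branch, eq. (6)
        self.sp_wv = nn.Conv2d(c, c // 2, 1)
        self.sp_wq = nn.Conv2d(c, c // 2, 1)
        self.pool = nn.AdaptiveAvgPool2d(1)     # F_GP, eq. (7)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # --- channel self-attention, A_ch(X) in R^{C x 1 x 1} ---
        v = self.ch_wv(x).reshape(b, c // 2, h * w)            # sigma_1
        q = self.ch_wq(x).reshape(b, h * w, 1).softmax(dim=1)  # sigma_2 + F_SM
        z = torch.matmul(v, q).unsqueeze(-1)                   # (b, c/2, 1, 1)
        z = self.ch_wz(z).reshape(b, c)                        # restore channels + LN
        a_ch = torch.sigmoid(self.ln(z)).reshape(b, c, 1, 1)   # F_SG
        z_ch = a_ch * x                                        # channel multiplication
        # --- spatial self-attention, A_sp(X) in R^{1 x H x W} ---
        q = self.pool(self.sp_wq(x)).reshape(b, 1, c // 2).softmax(dim=-1)
        v = self.sp_wv(x).reshape(b, c // 2, h * w)
        a_sp = torch.sigmoid(torch.matmul(q, v).reshape(b, 1, h, w))
        z_sp = a_sp * x                                        # spatial multiplication
        return z_ch + z_sp                                     # parallel fusion, eq. (8)
```

Because the attention maps only re-weight X, the module preserves the input shape and can be dropped between the C2f output and the detection head as described above.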
Hyperparameter Value
Batch Size 16
Epochs 100
Learning Rate 0.0001
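With the Ultralytics training API, a baseline run using these hyperparameters could look like the following sketch; the dataset YAML name is a placeholder, and training DA-YOLOv8s itself would additionally require the modified model definition from Section III:

```python
from ultralytics import YOLO

# Baseline YOLOv8s with the hyperparameters tabulated above;
# 'textile_defects.yaml' is a hypothetical dataset config.
model = YOLO("yolov8s.pt")
model.train(
    data="textile_defects.yaml",
    epochs=100,     # Epochs
    batch=16,       # Batch Size
    lr0=0.0001,     # initial Learning Rate
)
```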
The Cascade R-CNN Algorithm Model: This represents an enhancement of Faster R-CNN, proposing a cascaded R-CNN detection and inference architecture composed of a sequence of detectors trained with increasing IoU thresholds. This cascaded sampling progressively enhances the quality of detection.

The YOLOv3 Algorithm Model: Introduced in 2018, this YOLO detection model utilizes a Feature Pyramid Network (FPN) for feature fusion and incorporates residual connection modules to enable multi-scale training. It outputs feature maps at three scales; in this context, we adopt YOLOv3-tiny as the baseline model.

The YOLOv5 algorithm model: YOLOv5 is an object detection model released by Ultralytics in June 2020 that excels in inference speed. This paper adopts the YOLOv5s version, which has a smaller number of parameters and is suitable for lightweight devices or scenarios requiring fast inference. The YOLOv5 network consists of an input end (Input), a Backbone network, a Neck network, and a detection end (Head). The input end uses data augmentation techniques and adaptive anchor calculation to enrich the dataset and reduce GPU occupation; the Backbone network utilizes the Focus and CSPDarknet53 structures to optimize the classifier, enhancing the diversity and robustness of features; the Neck network adopts the SPP module and the FPN+PAN structure, which strengthens both semantic and positional information, fuses the features extracted by the Backbone network, and further improves the model's performance and accuracy.

The YOLOX algorithm model: YOLOX is an object detection algorithm proposed by Megvii Technology in 2021. It improves on YOLOv3-SPP, adopting decoupled heads in which classification and position prediction are handled by two separate branches; this reduces information redundancy and improves detection accuracy. YOLOX introduces an anchor-free design, directly predicting the center coordinates and the width and height of objects, which enhances detection flexibility and avoids the performance bottlenecks caused by anchor boxes. Moreover, YOLOX employs an advanced label assignment strategy that takes into account the size, position, and shape of objects, making label assignment more rational and accurate.

The YOLOv6 algorithm model: YOLOv6 is an object detection framework developed by Meituan's Visual Intelligence Department and released in 2022, with numerous improvements to the Backbone, Neck, and Head. In this paper, the YOLOv6s version is used as the comparison algorithm. The Backbone of YOLOv6s is inspired by the RepVGG [43] style structure: it is composed of RepBlocks in the training phase, and in the inference phase each RepBlock is converted into a stack of 3x3 convolutional layers (RepConv) with ReLU activation, which reduces inference latency while enhancing representation capability. The Neck replaces the CSPBlock used in YOLOv5 with RepBlock, i.e., Rep-PAN, and correspondingly adjusts width and depth; the Head adopts a hybrid-channel strategy to construct a more efficient decoupled head, further reducing computational cost.

YOLOv9 introduced the concept of Programmable Gradient Information (PGI) to address the diverse changes deep networks require to meet various objectives. Furthermore, YOLOv9 developed a new lightweight network architecture, the Generalized Efficient Layer Aggregation Network (GELAN), which employs gradient path planning to significantly improve detection performance.

YOLOv10 introduced a consistent dual assignment strategy, with dual label assignments and a consistent matching metric, to eliminate redundant predictions in post-processing without Non-Maximum Suppression (NMS). It also proposed a lightweight classification head, spatial-channel decoupled downsampling, and rank-guided block design to reduce explicit computational redundancy and achieve a more efficient architecture.

The YOLOv11 Algorithm Model: This is the latest iteration of the YOLO series, building upon YOLOv8 by refining the network architecture. It replaces the C2f module with the C3K2 module and adds an attention module, C2PSA, after the SPPF layer; it also improves the structure of the detection head. We include this model as a comparison model in our experimental evaluation.

D. EVALUATION METRICS
To validate the effectiveness and execution time of the model, this paper uses GFLOPS (giga floating-point operations per second, i.e., one billion floating-point operations per second) as a measure of the computational cost of the network model, and the mean average precision (mAP) to evaluate the accuracy of the model, calculated as shown in (10):

mAP = \frac{1}{N} \sum P_A \quad (10)

where N is the total number of categories and P_A is the area enclosed by the precision-recall curve of a category, with recall on the horizontal axis and precision on the vertical axis. We use [email protected] and mAP as evaluation metrics, where [email protected] is the mean average precision at an Intersection over Union (IoU) threshold of 0.5, and mAP is the mean average precision averaged over IoU thresholds of 0.5, 0.3, and 0.1. We also employ FPS (frames per second) to evaluate the detection speed of the model; FPS denotes the number of image frames that can be processed and output within one second, calculated as shown in (11), where t_1 is the image preprocessing time, t_2 the inference time, and t_3 the post-processing time, all in milliseconds:

FPS = \frac{1000\ \mathrm{ms}}{t_1 + t_2 + t_3} \quad (11)
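Both formulas translate directly into code. The minimal sketch below is our own illustration, assuming per-class AP values have already been computed elsewhere:

```python
def mean_average_precision(per_class_ap: list[float]) -> float:
    """Eq. (10): mAP is the mean of per-class AP values, where each AP
    is the area under that class's precision-recall curve."""
    return sum(per_class_ap) / len(per_class_ap)

def frames_per_second(t_pre_ms: float, t_infer_ms: float, t_post_ms: float) -> float:
    """Eq. (11): FPS from per-image preprocessing (t1), inference (t2),
    and post-processing (t3) times, all in milliseconds."""
    return 1000.0 / (t_pre_ms + t_infer_ms + t_post_ms)

# Example with assumed timings: t1 = 0.5 ms, t2 = 3.0 ms, t3 = 0.4 ms -> ~256 FPS
print(frames_per_second(0.5, 3.0, 0.4))
```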
E. EXPERIMENTAL RESULTS
1) Comparative Detection Experiments with Different Models
To fully evaluate the detection algorithm of the improved YOLOv8s model in this paper, ten algorithms were selected for experimental comparison: Faster R-CNN, Cascade R-CNN, and the unimproved YOLOv3-tiny, YOLOv5s, YOLOv6s, YOLOX, YOLOv8s, YOLOv9s, YOLOv10s, and YOLOv11s. Among these, Faster R-CNN and Cascade R-CNN represent the typical two-stage object detection algorithms. The mean average precision (mAP) after 100 iterations was used as the evaluation criterion for the different detection algorithms, which scientifically and reasonably assesses both the object detection capability and the computational speed of the various algorithms. The results of the experimental comparison are shown in Table 5; the mean precision after each training iteration is shown in Fig 12. The results indicate that, considering the balance between detection accuracy and speed, the YOLOv8s, YOLOv9s, and YOLOv11s models significantly outperform the other models. The improved DA-YOLOv8s model outperforms all compared models, with a 4.2% increase in [email protected] and a 3.8% improvement in average mAP relative to the baseline YOLOv8s model, although its detection speed decreases by 52.2% and its GFLOPs rise by 12.1. Given YOLO's excellent performance in detection speed, DA-YOLOv8s still achieves 257.38 FPS, which meets the requirements of industrial applications.

FIGURE 12. Comparison Chart of Training Accuracy Across Epochs. (a) Comparison of [email protected] across different epochs. (b) Comparison of average precision mAP across epochs.
FIGURE 15. Detection Results of the DA-YOLOv8s Model on the Test Dataset.

V. CONCLUSION AND FUTURE PROSPECTS
A textile defect detection algorithm based on an improved YOLOv8 is proposed to address the low detection accuracy and poor real-time performance of traditional methods. Experimental results show that replacing the C2f with DCNv2 in the YOLOv8s baseline network enhances the feature extraction capability of the network, incorporating the self-attention mechanism PSA increases the feature fusion capability in the channel and spatial dimensions, and adding a detection head improves the detection performance for small targets.

Textile defect detection is mostly dominated by small objects, so the detection of small targets remains the research focus for the next step.

REFERENCES
[1] L. Tong, W. K. Wong, and C. K. Kwong, "Fabric defect detection for apparel industry: a nonlocal sparse representation approach," IEEE Access, vol. 5, pp. 5947–5964, Feb. 2017, 10.1109/ACCESS.2017.2667890.
[2] A. Rasheed, B. Zafar, A. Rasheed, et al., "Fabric Defect Detection Using Computer Vision Techniques: A Comprehensive Review," Math. Probl. Eng., vol. 2020, no. 41, pp. 8189403.1–24, Dec. 2020, 10.1155/2020/8189403.
[3] W. Y. Zhang, J. Zhang, Y. Hou, and S. Geng, "MWGR: a New Method for Real-time Detection of Cord Fabric Defects," in Proc. 2012 Int. Conf. on Adv. Mechatronic Syst., Tokyo, Japan, Sep. 2012, pp. 458–461.
[4] D. D. Zhu, R. R. Pan, W. D. Gao, et al., "Yarn-Dyed Fabric Defect Detection Based on Autocorrelation Function and GLCM," Autex Research Journal, vol. 15, no. 3, pp. 226–232, 2015, 10.1515/aut-2015-0001.
[5] M. Guan, Z. Zhong, and Y. Rui, "Automatic Defect Segmentation for Plain Woven Fabric Images," in Proc. of 2019 Int. Conf. on Commun., Inf. Syst. and Comput. Eng. (CISCE), Haikou, China, Jul. 2019, pp. 465–468, 10.1109/CISCE.2019.00108.
[6] G. H. Hu, Q. H. Wang, and G. H. Zhang, "Unsupervised Defect Detection in Textiles Based on Fourier Analysis and Wavelet Shrinkage," Appl. Opt., vol. 54, no. 10, pp. 2963–2980, Feb. 2015, 10.1364/AO.54.002963.
[7] Y. H. Li and X. Y. Zhou, "Fabric Defect Detection with Optimal Gabor Wavelet Based on Radon," in Proc. of 2020 IEEE Int. Conf. on Power, Intell. Comput. and Syst. (ICPICS), Shenyang, China, Sep. 2020, pp. 788–793, 10.1109/ICPICS50287.2020.9202242.
[8] J. Xiang, R. R. Pan, and W. D. Gao, "Yarn-dyed Fabric Defect Detection Based on an Improved Autoencoder with Fourier Convolution," Text. Res. J., vol. 93, no. 5/6, pp. 1153–1165, Mar. 2023, 10.1177/00405175221130519.
[9] A. M. Kamoona, A. K. Gostar, A. Bab-Hadiashar, and R. Hoseinnezhad, "Point Pattern Feature-Based Anomaly Detection for Manufacturing Defects, in the Random Finite Set Framework," IEEE Access, vol. 9, pp. 158672–158681, Nov. 2021, 10.1109/ACCESS.2021.3130261.
[10] F. Alghanim, M. Azzeh, A. El-Hassan, and H. Qattous, "Software Defect Density Prediction Using Deep Learning," IEEE Access, vol. 10, pp. 114629–114641, Oct. 2022, 10.1109/ACCESS.2022.3217480.
[11] S. Mei, Y. D. Wang, and G. J. Wen, "Automatic Fabric Defect Detection with a Multi-scale Convolutional Denoising Autoencoder Network Model," Sensors, vol. 18, no. 4, pp. 1064.1–18, Apr. 2018, 10.3390/s18041064.
[12] J. F. Jing, H. Ma, and H. H. Zhang, "Automatic fabric defect detection using a deep convolutional neural network," Color. Technol., vol. 135, no. 3, pp. 213–223, Mar. 2019, 10.1111/cote.12394.
[13] S. Ma, R. Zhang, Y. Dong, Y. H. Feng, and G. Zhang, "A Defect Detection Algorithm of Denim Fabric Based on Cascading Feature Extraction Architecture," J. Inf. Process. Syst., vol. 19, no. 1, pp. 109–117, Feb. 2023, 10.3745/JIPS.04.0265.
[14] F. Xu, Y. Liu, B. Zi, and L. Zheng, "Application of Deep Learning for Defect Detection of Paint Film," in Proc. of 6th Int. Conf. on Intell. Comput. and Signal Proc. (ICSP), Xi'an, China, Apr. 2021, pp. 1118–1121, 10.1109/ICSP51882.2021.9408956.
[15] B. Zhao, M. Dai, P. Li, and X. Ma, "Data Mining in Railway Defect Image Based on Object Detection Technology," in Proc. of 2019 Int. Conf. on Data Mining Workshops (ICDMW), Beijing, China, Nov. 2019, pp. 814–819, 10.1109/ICDMW.2019.00120.
[16] Y. Zhang, Z. Zhang, K. Fu, and X. Luo, "Adaptive Defect Detection for 3-D Printed Lattice Structures Based on Improved Faster R-CNN," IEEE Trans. Instrum. Meas., vol. 71, no. 5020509, pp. 1–9, Aug. 2022, 10.1109/TIM.2022.3200362.
[17] X. Gao, M. Jian, M. Hu, M. Tanniru, and S. Li, "Faster multi-defect detection system in shield tunnel using combination of FCN and faster RCNN," Adv. in Structural Eng., vol. 22, no. 13, pp. 2907–2921, May 2019, 10.1177/1369433219849829.
[18] B. Wei, K. Hao, X. Tang, et al., "Fabric Defect Detection Based on Faster RCNN," in Proc. of Artif. Intell. on Fashion and Textiles Conf., Shanghai, China, 2019, pp. 45–51.
[19] M. An, S. Wang, L. Zheng, and X. Liu, "Fabric defect detection using deep learning: An Improved Faster R-CNN approach," in IEEE Proc. of 2020 Int. Conf. on Comput. Vis., Image and Deep Learn. (CVIDL), Chongqing, China, Jul. 2020, pp. 319–324, 10.1109/CVIDL51233.2020.00-78.
[20] M. Chen, L. Yu, C. Zhi, et al., "Improved faster R-CNN for fabric defect detection based on Gabor filter with Genetic Algorithm optimization," Comput. Ind., vol. 134, pp. 103551, Jan. 2022, 10.1016/j.compind.2021.103551.
[21] Z. Liu, W. Wu, X. Gu, et al., "Application of combining YOLO models and 3D GPR images in road detection and maintenance," Remote Sens., vol. 13, no. 6, pp. 1081.1–19, Mar. 2021, 10.3390/rs13061081.
[22] X. Liao, S. Lv, D. Li, Y. Luo, Z. Zhu, and C. Jiang, "YOLOv4-MN3 for PCB Surface Defect Detection," Appl. Sci., vol. 11, no. 24, pp. 11701.1–17, Dec. 2021, 10.3390/app112411701.
[23] S. Teng, Z. Liu, and X. Li, "Improved YOLOv3-based bridge surface defect detection by combining high- and low-resolution feature images," Buildings, vol. 12, no. 8, pp. 1225.1–18, Aug. 2022, 10.3390/buildings12081225.
[24] Z. Cong, X. Li, and Z. Huang, "Research on Brake Pad Surface Defect Detection Method based on Deep Learning," in IEEE Proc. of 2023 Int. Conf. on Advances in Elect. Eng. and Comput. Appl. (AEECA), Dalian, China, Aug. 2023, pp. 813–818, 10.1109/AEECA59734.2023.00149.
[25] X. Yue, Q. Wang, L. He, Y. Li, and D. Tang, "Research on tiny target detection technology of fabric defects based on improved Yolo," Appl. Sci., vol. 12, no. 13, pp. 6823.1–16, Jul. 2022, 10.3390/app12136823.
[26] Y. Jin and L. Di, "Textile defect detection based on multi-proportion spatial attention mechanism and channel memory feature fusion network," IET Image Process., vol. 18, no. 2, pp. 412–427, Feb. 2024, 10.1049/ipr2.12957.
[27] R. Wang, R. Shivanna, D. Cheng, et al., "DCN V2: Improved deep & cross network and practical lessons for web-scale learning to rank systems," in Proc. of the Web Conf. 2021, New York, NY, USA, Jun. 2021, pp. 1785–1797, 10.1145/3442381.3450078.
[28] H. Liu, F. Liu, X. Fan, et al., "Polarized self-attention: Towards high-quality pixel-wise regression," 2021, arXiv:2107.00782.
[29] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998, 10.1109/5.726791.
[30] R. Wang, B. Fu, G. Fu, et al., "Deep & cross network for ad click predictions," in Proc. of the ADKDD'17, New York, NY, USA, Aug. 2017, pp. 1–7, 10.1145/3124749.3124754.
[31] M. H. Guo, T. X. Xu, J. J. Liu, et al., "Attention mechanisms in computer vision: A survey," Comput. Vis. Media, vol. 8, no. 3, pp. 331–368, Mar. 2022, 10.1007/s41095-022-0271-y.
[32] A. Vaswani, N. Shazeer, N. Parmar, et al., "Attention is all you need," 2017, arXiv:1706.03762.
[33] C. Feng, Y. Zhong, Y. Gao, et al., "TOOD: Task-aligned one-stage object detection," in Proc. of 2021 IEEE Int. Conf. on Comput. Vis. (ICCV), Montreal, QC, Canada, Oct. 2021, pp. 3490–3499, 10.1109/ICCV48922.2021.00349.
[34] X. Li, W. Wang, L. Wu, et al., "Generalized focal loss: Learning qualified and distributed bounding boxes for dense object detection," Advances in Neural Inf. Proc. Syst., vol. 33, 2020, pp. 21002–21012.
[35] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 6, pp. 1137–1149, Jun. 2017, 10.1109/TPAMI.2016.2577031.
[36] Z. Cai and N. Vasconcelos, "Cascade R-CNN: Delving into high quality object detection," in Proc. of the IEEE Conf. on Comput. Vis. and Pattern Recognit. (CVPR), Salt Lake City, USA, Jun. 2018, pp. 6154–6162.
[37] J. Redmon and A. Farhadi, "YOLOv3: An incremental improvement," 2018, arXiv:1804.02767.
[38] Z. Ge, S. Liu, F. Wang, Z. Li, and J. Sun, "YOLOX: Exceeding YOLO Series in 2021," 2021, arXiv:2107.08430.
[39] C. Li, L. Li, H. Jiang, et al., "YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications," 2022, arXiv:2209.02976.
[40] C. Y. Wang, I. H. Yeh, and H. Y. M. Liao, "YOLOv9: Learning what you want to learn using programmable gradient information," 2024, arXiv:2402.13616.
[41] A. Wang, H. Chen, L. Liu, et al., "YOLOv10: Real-Time End-to-End Object Detection," 2024, arXiv:2405.14458.
[42] R. Khanam and M. Hussain, "YOLOv11: An Overview of the Key Architectural Enhancements," 2024, arXiv:2410.17725.
[43] X. Ding, X. Zhang, N. Ma, et al., "RepVGG: Making VGG-style ConvNets great again," in Proc. of the IEEE/CVF Conf. on Comput. Vis. and Pattern Recognit., Nashville, TN, USA, Jun. 2021, pp. 13733–13742, 10.1109/CVPR46437.2021.01352.