Enhanced Deep Learning Framework For Efficient Garbage Classification
Information Sciences
Keywords: Garbage classification; Deep learning; ResNet-152; FAISS similarity search; Attention mechanism; Waste management
Abstract
The growing environmental challenges associated with waste disposal have accelerated the need for intelligent and sustainable waste management systems. This paper introduces an advanced deep learning-based approach that integrates multiple techniques to improve garbage classification performance. The proposed framework leverages ResNet-152 as a feature extractor, utilizing its deep hierarchical representations to capture essential visual patterns from waste images. To enhance feature discrimination, we employ FAISS-based similarity search, which retrieves the most relevant embeddings to refine classification. An attention mechanism emphasizes key features while reducing less relevant ones. To further improve generalization and robustness, we integrate regularization techniques such as dropout and DropConnect, reducing overfitting and enhancing model resilience. Label smoothing is applied to mitigate overconfidence in softmax predictions, resulting in better calibration. The framework is evaluated on a diverse garbage classification dataset, and performance metrics such as accuracy, precision, recall, and F1-score are reported. The paper provides a thorough mathematical formulation and algorithmic details to ensure reproducibility. The experimental findings show that the proposed method outperforms traditional techniques in terms of classification accuracy. This work contributes to developing an intelligent waste sorting system, which can significantly enhance recycling efficiency and promote sustainable waste management practices.
1. Introduction
With the accelerating pace of urbanization, escalating consumption patterns, and growing environmental concerns, waste management has emerged as a critical global issue. Effective recycling hinges on the accurate classification of waste materials, which is
essential for reducing the ecological burden associated with improper disposal. Traditional manual sorting methods, however, are
often inefficient, labor-intensive, and prone to human error. These limitations underscore the necessity for intelligent, automated
solutions. In this context, recent advancements in deep learning have demonstrated remarkable success in image-based classifi
cation tasks, offering a promising pathway toward more efficient and scalable waste management systems. Convolutional Neural
Networks, in particular, have demonstrated superior performance in extracting meaningful features from complex visual data. However, challenges such as intra-class variability, inter-class similarity, and background noise make garbage classification a non-trivial
task. Additionally, ensuring both high classification accuracy and computational efficiency remains a key challenge, especially for
real-world deployment.
* Corresponding author.
E-mail address: [email protected] (D. Ghosh).
https://doi.org/10.1016/j.ins.2025.122462
Received 23 April 2025; Received in revised form 23 June 2025; Accepted 24 June 2025
Available online 2 July 2025
In this work, we propose an advanced deep learning-based framework for garbage classification that integrates multiple techniques
to enhance classification performance. Our approach leverages ResNet-152, a powerful CNN architecture, as a feature extractor to
capture deep hierarchical representations of waste images. To further improve feature discrimination, we incorporate FAISS-based
similarity search, which retrieves the most relevant embeddings to refine classification. Additionally, an attention mechanism is introduced to selectively emphasize important features while suppressing irrelevant information, thereby enhancing model interpretability. To mitigate overfitting and improve generalization, we employ regularization techniques such as dropout and DropConnect. Furthermore, label smoothing is applied to prevent overconfidence in softmax predictions, leading to better-calibrated probability distributions. The proposed framework is assessed using a diverse dataset comprising various categories of waste, and its performance is evaluated through comprehensive metrics, including accuracy, precision, recall, and F1-score. The results indicate that the
model offers a reliable and scalable approach for automated waste classification, supporting the advancement of sustainable waste
management systems. Empirical evaluations reveal that the method surpasses traditional classification techniques, delivering superior
accuracy and aligning with state-of-the-art performance standards. The mathematical formulations and algorithmic details presented
in this paper ensure reproducibility and provide a foundation for further research in intelligent waste classification systems.
The rapid growth of urbanization and industrialization has significantly increased waste generation, leading to a major challenge in waste management. Effective waste classification is essential to improve recycling efforts and environmental sustainability.
Conventional manual sorting techniques demand significant human effort and are susceptible to mistakes, frequently resulting in
operational inefficiencies and potential health risks [1]. Therefore, employing automation in waste sorting, especially by utilizing
deep learning techniques, has emerged as a highly effective approach to address this issue.
Deep learning techniques, particularly CNNs, have seen extensive application in image recognition tasks, demonstrating superior
performance in automatically classifying waste materials [2]. These models can be trained to identify various types of waste, including
recyclables, organic matter, and hazardous items, without human intervention. Their ability to process large datasets and extract
hierarchical feature representations from images has made them ideal for garbage classification [3].
Despite their success, deep learning models face several challenges in garbage classification. A major challenge lies in obtaining
large, varied, and annotated datasets for training [4]. Waste images vary greatly in terms of size, color, and texture, making it difficult
for a single model to handle all variations [5]. Moreover, data labeling for waste materials is often limited, making it challenging
to build comprehensive datasets for model training. To overcome this challenge, methods like data augmentation, transfer learning,
and the creation of synthetic data are frequently employed to enhance model accuracy [6].
Another challenge is the wide variety of waste categories, each with its distinct characteristics. For instance, plastics, metals,
paper, and organic materials may differ significantly in terms of shape and appearance [7]. Multi-category waste or cluttered images
further complicate classification tasks. To address this, hybrid models that combine the strengths of CNNs with other network types, such as Recurrent Neural Networks (RNNs), have been used to capture both spatial and temporal features for more accurate classification [8].
The computational complexity of deep learning models also poses a barrier to their use in resource-constrained environments,
such as smart waste bins or edge devices [9]. To mitigate this issue, researchers have focused on model optimization techniques
like pruning, quantization, and the use of lightweight architectures, such as MobileNet [10]. These approaches aim to reduce the
computational load while maintaining high classification accuracy. In addition, combining deep learning models with conventional
machine learning techniques, like support vector machines, has been explored to balance performance and efficiency [1].
The integration of IoT devices with deep learning-based systems has gained attention as a way to improve waste management.
Smart waste bins equipped with sensors and deep learning models can automatically classify waste as it is deposited, allowing for
real-time sorting and feedback [11]. The combination of these technologies facilitates intelligent, adaptive waste management systems
that are capable of optimizing waste collection and disposal processes.
However, several challenges remain in deploying deep learning-based garbage classification systems on a large scale. These include
adapting models to new waste categories or changing waste patterns over time. Transfer learning has been suggested as a way
to address this, as it allows models to be fine-tuned for new data with minimal retraining [12]. Additionally, explainability and
interpretability of deep learning models are crucial for ensuring transparency and trust, particularly in public waste management
systems [13].
The remainder of the paper is structured as follows: Section 2 offers an in-depth review of related literature and foundational research. Section 3 discusses the dataset used and the preprocessing steps performed. The overall model architecture is outlined in Section 4. Section 5 elaborates on the training process, including label smoothing and optimization techniques. Experimental results, including performance evaluation metrics, are reported in Section 6. A detailed framework for integrating the proposed model
into real-world waste management systems is presented in Section 7. Lastly, Section 8 summarizes the paper and presents potential
avenues for advancing automated waste classification systems in future research.
2. Literature survey
The literature on garbage classification using deep learning can be grouped into several key areas, including deep CNN-based
models, optimization techniques, and hybrid models for waste classification. Here is an overview of the relevant research.
Numerous studies have concentrated on utilizing deep Convolutional Neural Networks for efficient waste classification. [1] proposed a deep CNN-based vision system for waste classification, enhancing recycling processes through automation. [2] developed an
intelligent system for decoration waste classification using data augmentation and transfer learning. [3] proposed a two-stage waste
recognition system using deep learning algorithms to improve accuracy and robustness. [4] focused on feature fusion techniques
for waste classification, optimizing human-robot interaction scenarios in waste management. [5] used transfer learning with CNNs
for improved waste image classification, leveraging pre-trained models like VGGNet. [6] improved ShuffleNet v2 for garbage classification, focusing on lightweight models for mobile applications. [7] proposed the MRS-YOLO model, optimizing real-time waste
classification and detection with high precision.
A number of studies have investigated hybrid models and transfer learning techniques for waste classification. [8] optimized CNN
models for fast and accurate waste classification, crucial for real-time recycling systems. [9] developed a model combining CNNs
and deep reinforcement learning for efficient recycling of garbage materials. [10] introduced a depth-wise separable convolution
attention module to improve CNN performance on occluded garbage objects. [1] proposed a hybrid model combining CNN with
an autoencoder for improved waste classification under challenging conditions. [11] applied reinforcement learning for garbage
classification to dynamically adapt the model based on new waste data. [12] designed a smart waste classification system using
transfer learning and lightweight networks for mobile deployment. [13] created a scalable waste management system using deep
learning that integrates with IoT devices for improved accuracy.
Research on optimizing deep learning models for waste classification has been an important area of focus. [14] implemented a
deep learning-based waste disposal system optimized for limited hardware resources. [15] used reinforcement learning to optimize
CNN-based waste classification systems for dynamic environments. [16] optimized garbage classification models by integrating edge
computing with deep learning for faster waste sorting. [17] applied an enhanced CNN architecture for waste classification, focusing
on real-time processing in smart cities.
Some researchers have looked at combining various models to improve classification accuracy and adaptability. [18] investigated
a multi-modal deep learning model for waste classification that combines both image and sensor data. [19] implemented a deep CNN
model with adversarial training for enhanced generalization in garbage classification. [20] developed a hybrid model combining
deep CNNs with decision trees for improved garbage classification performance. [21] applied a CNN architecture enhanced with
an attention mechanism for more accurate identification of recyclable waste materials. [22] proposed a novel hybrid CNN and
support vector machine (SVM) model to classify complex waste types. [23] developed a domain-adaptive deep learning model for
robust garbage classification across different environments. [1] proposed a multi-task learning model for garbage classification that
incorporates both image data and metadata.
Several studies have identified areas of improvement and future research directions for deep learning-based garbage classification
systems. [24] combined CNN with attention mechanisms to enhance garbage classification accuracy in noisy environments. [25] used
a fusion of CNN and deep reinforcement learning to enhance waste classification in dynamic settings. [26] investigated a hybrid CNN
and RNN model for waste classification, combining spatial and temporal features. In recent years, several researchers have introduced
new techniques to improve the robustness and efficiency of deep learning-based waste classification systems. [27] explored a hybrid
CNN and random forest approach for waste classification in smart recycling systems. [28] proposed a novel deep CNN model that uses
feature pyramid networks to handle complex waste objects. [29] introduced a collaborative filtering method for garbage classification,
enhancing model adaptability in multi-domain environments.
Recent research has introduced innovative methodologies in hazard classification, high-dimensional classification, and sustainable waste management by integrating deep learning, grey modeling, and optimization techniques. [30] proposed a novel hazard
event classification framework utilizing deep learning and multifractal analysis with a hierarchical gating neural network, achieving
superior classification of hazard severity, possibility, and risk. Similarly, [31] developed a hybrid model combining deep learning
with grey modeling, introducing the DLGM framework with a Fourier series-enhanced grey model (FSGM(1,1)) for improved hazard classification accuracy. [32] addressed high-dimensional classification challenges by introducing an optimized fuzzy clustering
approach, incorporating Quantum Particle Swarm Optimization (QPSO) for enhanced classification performance. In the domain of
waste management, [33] applied convolutional neural networks (CNNs) to classify and estimate the mass composition of recycled
aggregates, enabling real-time industrial monitoring and improving circular economy processes. [34] proposed an IoT-integrated
predictive waste management framework using federated learning and blockchain, enhancing real-time monitoring and optimizing
waste collection in smart cities. These studies collectively demonstrate the transformative impact of artificial intelligence, optimization techniques, and blockchain technology in industrial safety, classification systems, and sustainable waste management. To address
the limitations of existing waste classification systems, [35] proposed SwinConvNeXt, a fused deep learning architecture integrating
Swin Transformer and ConvNeXt for high-accuracy real-time garbage image classification.
Although convolutional neural networks (CNNs) have demonstrated strong performance across a range of image classification
tasks, their application to waste classification remains challenging due to several inherent limitations. Pretrained backbones such as
ResNet152 provide general-purpose feature extraction capabilities but are often insufficient for the fine-grained visual distinctions required in waste sorting scenarios, where inter-class similarities and intra-class variability are common. For instance, classes like paper,
cardboard, and plastic frequently exhibit overlapping visual properties, which makes them difficult to separate using generic features
alone. Furthermore, existing models generally process each image independently, neglecting the semantic relationships between
instances that could be captured via similarity based retrieval or contextual learning. This lack of contextualization contributes to
misclassifications, especially for ambiguous or borderline cases. Another critical issue is the class imbalance prevalent in most real-world garbage datasets. Underrepresented categories, such as battery or metal, are often misclassified due to the bias introduced by skewed class distributions. Finally, the classification heads used in many existing architectures are shallow and lack adequate regularization, leading to poor generalization and overfitting, particularly in data-scarce or noisy environments.
To address these challenges, we propose a retrieval-augmented attention-based classification framework that enhances feature
expressiveness and robustness. The model begins by extracting high-level features using a frozen ResNet152 architecture. These
features are indexed using FAISS, which allows for efficient identification of the most similar training instances for each input image.
The retrieved similar feature is then fused with the input feature through a self-attention mechanism designed to highlight salient and
complementary aspects of both vectors. This context-enriched representation is passed through a deep classification head comprising
multiple fully connected layers with batch normalization, dropout, and DropConnect regularization, which collectively enhance
generalization capacity. To further improve model performance, label smoothing is incorporated into the loss function to reduce
overconfidence in predictions, and class-balanced loss weighting is applied to counter the impact of dataset imbalance. The training regime includes mixed-precision computation and a one-cycle learning rate schedule, both of which contribute to efficient and stable convergence. Overall, the proposed method offers a comprehensive and effective solution to the limitations of conventional CNN-based classifiers, particularly in the context of fine-grained and imbalanced garbage classification.
• We leverage a pre-trained ResNet-152 backbone to extract rich hierarchical features, providing a strong visual representation
foundation.
• We introduce FAISS-based approximate nearest neighbor search to retrieve relevant feature embeddings, enhancing robustness
by incorporating similarity information.
• We design an attention-based feature fusion mechanism that adaptively combines original and retrieved features, improving
discrimination among visually similar garbage classes.
• Advanced regularization techniques, including DropConnect and label smoothing, are integrated to improve model generalization
and reduce overfitting.
• Comprehensive ablation studies and multiple-seed training experiments validate the positive impact of each component on overall
classification accuracy.
Our proposed framework achieves state-of-the-art accuracy on a diverse garbage dataset while maintaining computational feasibility, demonstrating its practical potential for real-world waste management applications.
Our model is evaluated using a real-world dataset of garbage images, demonstrating significant improvements in precision, recall, F1-score, and overall classification accuracy compared to baseline models that rely solely on image features. In the context of
advancing sustainability and enhancing recycling processes, efficient waste classification plays a crucial role, particularly within a
circular economy framework. Traditional image-based systems, while improving, often struggle to distinguish visually similar objects
or consider non-visual factors like recyclability and environmental impact. To address these challenges, our proposed model builds
upon and surpasses existing approaches by integrating multiple enhancements, including hierarchical feature extraction, FAISS-based
similarity search, and attention-driven refinement. Unlike prior works that focus solely on CNN architectures or similarity-based retrieval, our method combines these strengths to achieve superior classification accuracy. The incorporation of advanced regularization techniques and calibration mechanisms further ensures robustness and reliability in real-world applications. Experimental comparisons against benchmark models consistently demonstrate our framework's effectiveness in handling diverse waste images, making it
a highly promising solution for automated waste sorting and sustainable environmental management.
Fig. 1 shows the process flow of the proposed garbage classification model.
The dataset used in this study was sourced from Kaggle, a widely recognized platform for machine learning datasets. Although the original dataset comprises 12 classes, including three separate categories for glass, we retained only the white-glass class and excluded the other two glass-related categories. Consequently, the final dataset used in this work consists of 𝐶 = 10 distinct semantic classes: plastic, paper, metal, clothes, shoes, trash, biological, white-glass, cardboard, and battery.
Given the inherent imbalance in real-world waste datasets, data augmentation techniques are employed to enhance generalization
and robustness.
To enhance model generalization and mitigate overfitting, the following data augmentation methods are applied to each input
image 𝑋 :
• Random Resized Cropping: A random region of the image is cropped and resized to a fixed size, ensuring that models learn
scale-invariant features.
• Horizontal Flipping: Each image is flipped horizontally with a probability 𝑝 = 0.5, augmenting the dataset symmetrically.
• Color Jittering: Random modifications in brightness, contrast, saturation, and hue are applied, defined as:
𝑋 ′ = 𝛼𝑋 + 𝛽,
where 𝛼 and 𝛽 are randomly sampled coefficients for brightness and contrast adjustments.
• Rotation and Affine Transformations: Random rotations 𝜃 within a defined range are applied:
𝑋 ′ = 𝑅(𝜃)𝑋,
where 𝑅(𝜃) is the rotation matrix:
\( R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}. \)
Affine transformations include translation, scaling, and shearing, defined as:
𝑋 ′ = 𝐴𝑋 + 𝑏,
where 𝐴 is a transformation matrix and 𝑏 is a translation vector.
• Normalization: Each pixel intensity is normalized to a zero-mean, unit-variance distribution using the channel-wise mean 𝜇𝑐
and standard deviation 𝜎𝑐 :
\( X' = \frac{X - \mu_c}{\sigma_c}. \)
Let 𝑇 denote the series of transformations applied to an input image 𝑋; the final transformed image 𝑋′ is then given by the composition
\( X' = (T_n \circ T_{n-1} \circ \cdots \circ T_1)(X), \)
where \( T_i \) represents an individual transformation from the set \( \{T_1, T_2, \ldots, T_n\} \). The order and probability of each transformation are
carefully tuned to maximize model performance.
These preprocessing techniques ensure that the dataset is sufficiently diverse and robust, allowing deep learning models to extract
meaningful and invariant features for accurate classification.
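As a concrete illustration, the augmentation pipeline described above can be assembled with torchvision transforms. The specific magnitudes below (crop size, jitter strengths, rotation range, affine parameters) are illustrative assumptions rather than the exact values used in our experiments; only the flip probability p = 0.5 is taken from the description above, and the normalization statistics are the standard ImageNet channel-wise values.

```python
from torchvision import transforms

# Illustrative training-time augmentation pipeline for a single input image X.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),                        # random resized cropping to 224x224
    transforms.RandomHorizontalFlip(p=0.5),                   # horizontal flip with probability p = 0.5
    transforms.ColorJitter(brightness=0.2, contrast=0.2,
                           saturation=0.2, hue=0.05),         # color jittering
    transforms.RandomAffine(degrees=15, translate=(0.1, 0.1),
                            scale=(0.9, 1.1), shear=5),       # rotation + affine (translation, scaling, shearing)
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],          # channel-wise normalization
                         std=[0.229, 0.224, 0.225]),
])
```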
4. Model architecture
In this work, we utilize ResNet-152, a deep convolutional neural network, as a feature extractor to generate high-dimensional
feature representations of input images. ResNet-152 represents a type of Residual Network (ResNet) architecture, which incorporates
skip connections to facilitate gradient flow and improve training stability for deep networks. The ResNet-152 is fine-tuned to capture
relevant features from garbage images. Features are obtained using the Algorithm 1.
\( F(X) = \phi(X; \theta) \in \mathbb{R}^{d}, \)
where \( \phi(\cdot; \theta) \) denotes the ResNet-152 mapping with parameters \( \theta \) and \( d = 2048 \) is the dimension of the resulting embedding. Feature extraction proceeds as follows:
1. Resize the input image 𝑋 to a standard resolution (e.g., 224 × 224 pixels).
2. Normalize the image using channel-wise mean and standard deviation:
\( X' = \frac{X - \mu}{\sigma}, \)
where 𝜎 and 𝜇 are the standard deviation and mean vectors computed over the dataset.
3. Pass the normalized image through ResNet-152 up to the final global average pooling layer to obtain the feature vector 𝐹(𝑋).
• FAISS-based Similarity Search: The feature vectors are indexed in a FAISS database to enable efficient nearest neighbor search.
• Classification: The obtained features are input into a fully connected classifier to predict the corresponding class label.
By leveraging the deep feature representations from ResNet-152, we ensure that our model captures rich semantic information,
facilitating accurate image classification and efficient similarity retrieval.
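A minimal sketch of this feature extraction step is shown below, assuming a PyTorch/torchvision environment (the `weights` argument requires torchvision ≥ 0.13). The backbone is kept frozen, matching the retrieval-augmented framework described earlier.

```python
import torch
import torch.nn as nn
from torchvision import models

# Frozen ResNet-152 backbone; the output of the global average pooling layer is a 2048-d embedding.
backbone = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)
backbone.fc = nn.Identity()              # drop the ImageNet classification head
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False              # the backbone is not updated during training

@torch.no_grad()
def extract_features(images: torch.Tensor) -> torch.Tensor:
    """images: (B, 3, 224, 224), normalized as above -> (B, 2048) feature vectors F(X)."""
    return backbone(images)
```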
In order to enhance feature representations and facilitate efficient nearest-neighbor retrieval, we employ FAISS (Facebook AI
Similarity Search). FAISS is a highly optimized library designed for fast searching in high-dimensional vector spaces, making it
particularly suitable for large-scale image retrieval tasks.
Feature vectors extracted from images are indexed using FAISS for efficient similarity retrieval. The FAISS training process is
outlined in Algorithm 2.
𝐷, 𝐼 = FAISS.search(𝑞, 𝑘),
where:
• 𝐷 ∈ ℝ𝑘 is the distance vector containing the distances between the query feature and the 𝑘 nearest neighbors.
• 𝐼 ∈ ℕ𝑘 is the index vector containing the indices of the 𝑘 nearest neighbors in the database.
This ensures precise retrieval at the cost of increased computational complexity. For scalability, alternative FAISS indexes such as
the IVFFlat (Inverted File Index) and HNSW (Hierarchical Navigable Small World Graph) [36] can be utilized to approximate nearest
neighbors efficiently.
index = faiss.IndexFlatL2(𝑑).
4. Add all training feature vectors to the index:
index.add(𝑋),
where 𝑋 ∈ ℝ𝑁×𝑑 is the matrix containing 𝑁 feature vectors.
5. For a given query vector 𝑞 , perform the nearest-neighbor search:
𝐷, 𝐼 = index.search(𝑞, 𝑘).
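The indexing and retrieval steps above translate directly into the FAISS Python API, as sketched below; the array names `train_features` and `query_features` are placeholders for the ResNet-152 embeddings produced by the extractor.

```python
import numpy as np
import faiss

d = 2048                                                     # ResNet-152 embedding dimension
xb = np.ascontiguousarray(train_features, dtype="float32")   # (N, d) training embeddings (placeholder name)
xq = np.ascontiguousarray(query_features, dtype="float32")   # (M, d) query embeddings (placeholder name)

index = faiss.IndexFlatL2(d)        # exact L2 index, as in Algorithm 2
index.add(xb)                       # add all N training feature vectors

k = 1                               # retrieve the single most similar neighbor per query
D, I = index.search(xq, k)          # D: (M, k) squared L2 distances, I: (M, k) neighbor indices
retrieved = xb[I[:, 0]]             # nearest training embedding f_s for each query f_q

# For larger databases, an approximate index such as faiss.IndexHNSWFlat(d, 32) can replace IndexFlatL2.
```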
In this subsection, we present an attention-based feature fusion mechanism that integrates the semantic information of an input
feature vector and its most similar retrieved neighbor. Unlike conventional self-attention across sequence tokens, we concatenate the
input feature and its FAISS-retrieved counterpart into a single 4096-dimensional vector, which is refined using a scaled dot-product
self-attention mechanism. Fig. 2 illustrates the overall computational flow of this attention-based fusion, from feature concatenation
through to the generation of the enhanced feature vector.
\( \mathbf{x} = [\mathbf{f}_q ; \mathbf{f}_s] \in \mathbb{R}^{2d}, \) where \( \mathbf{f}_q \) is the input feature vector and \( \mathbf{f}_s \) is its FAISS-retrieved nearest neighbor.
This combined vector is used to compute query, key, and value projections:
\( Q = \mathbf{x}W^{Q}, \quad K = \mathbf{x}W^{K}, \quad V = \mathbf{x}W^{V}, \)
where \( W^{Q}, W^{K}, W^{V} \in \mathbb{R}^{2d \times 2d} \) are learnable weight matrices. The self-attention output is calculated as:
\( \mathrm{Attention}(\mathbf{x}) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{2d}}\right)V. \)
This attention-refined vector is used for classification.
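A sketch of this fusion module is given below. The equations above operate on the single concatenated 4096-dimensional vector; to keep the attention computation non-degenerate, this sketch treats the input feature and its retrieved neighbor as a two-token sequence processed by multi-head scaled dot-product self-attention (consistent with the multi-head formulation mentioned in Section 4.6) and re-concatenates the refined tokens. The number of heads is an assumption.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Self-attention fusion of an input feature f_q and its FAISS-retrieved neighbor f_s."""

    def __init__(self, d: int = 2048, n_heads: int = 4):      # number of heads is an assumption
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=d, num_heads=n_heads, batch_first=True)

    def forward(self, f_q: torch.Tensor, f_s: torch.Tensor) -> torch.Tensor:
        # f_q, f_s: (B, d); stack into a length-2 token sequence of shape (B, 2, d)
        tokens = torch.stack([f_q, f_s], dim=1)
        refined, _ = self.attn(tokens, tokens, tokens)         # scaled dot-product self-attention
        return refined.reshape(refined.size(0), -1)            # (B, 2d) context-enriched vector for the classifier
```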
The proposed classification model comprises multiple fully connected layers, each incorporating batch normalization, DropConnect, and dropout regularization. These components enhance the model's generalization ability by mitigating overfitting and
improving stability during training.
\( BN(H_i) = \frac{H_i - \mu_B}{\sqrt{\sigma_B^{2} + \epsilon}}\,\gamma + \beta, \)
where \( \mu_B \) and \( \sigma_B^{2} \) are the batch mean and variance, \( \gamma \) and \( \beta \) are learnable parameters, and \( \epsilon \) is a small constant for numerical
stability.
• 𝜎(⋅) is the activation function, such as ReLU, which introduces non-linearity into the model.
\( W' = W \odot M, \quad M \sim \mathrm{Bernoulli}(p), \)
where:
• 𝑀 is a binary mask with elements sampled independently from a Bernoulli distribution with probability 𝑝.
• Each element 𝑀𝑗𝑘 in the mask determines whether the corresponding weight 𝑊𝑗𝑘 is active (𝑀𝑗𝑘 = 1) or deactivated (𝑀𝑗𝑘 = 0).
• This ensures that different subsets of the weight matrix are used in each forward pass, improving robustness and reducing reliance
on specific weight connections.
\( \theta^{(t+1)} = \theta^{(t)} - \eta \, \frac{m_t}{\sqrt{v_t} + \epsilon}, \)
where 𝑚𝑡 and 𝑣𝑡 are biased first- and second-moment estimates, respectively.
These techniques collectively contribute to a more robust deep learning classifier, ensuring improved feature extraction and
classification performance.
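A minimal sketch of this classification head is given below, assuming a PyTorch environment. The DropConnect layer applies the Bernoulli weight mask defined above at training time; the hidden-layer widths and drop rates are illustrative assumptions rather than the exact configuration used in our experiments.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DropConnectLinear(nn.Module):
    """Fully connected layer with DropConnect: W' = W * M, M ~ Bernoulli(1 - drop_p), training only."""

    def __init__(self, in_features: int, out_features: int, drop_p: float = 0.2):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        self.drop_p = drop_p

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training and self.drop_p > 0:
            keep = 1.0 - self.drop_p
            mask = torch.bernoulli(torch.full_like(self.linear.weight, keep))  # binary mask over the weights
            return F.linear(x, self.linear.weight * mask / keep, self.linear.bias)
        return self.linear(x)

def make_head(in_dim: int = 4096, num_classes: int = 10, p_drop: float = 0.3) -> nn.Sequential:
    """Deep classification head with BatchNorm, ReLU, dropout, and DropConnect (widths are assumptions)."""
    dims = [in_dim, 2048, 1024, 512, 256, 128]
    layers = []
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        layers += [DropConnectLinear(d_in, d_out, drop_p=0.2),
                   nn.BatchNorm1d(d_out), nn.ReLU(), nn.Dropout(p_drop)]
    layers.append(nn.Linear(dims[-1], num_classes))
    return nn.Sequential(*layers)
```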
Deep Learning (DL) is a specialized subset of machine learning that employs Artificial Neural Networks (ANNs) with multiple hidden
layers, often referred to as deep neural networks. While ANNs are computational models inspired by the biological neural networks in
the brain, Deep Learning specifically leverages architectures with many layers to learn hierarchical feature representations directly
from data.
Differences between deep learning and artificial neural networks: Artificial Neural Networks typically consist of an input layer, one or
more hidden layers, and an output layer. Traditional ANNs generally have a shallow architecture with a limited number of hidden
layers, which restricts their ability to model complex data distributions. Deep Learning extends ANNs by increasing the number of
hidden layers, enabling the model to capture more complex patterns through hierarchical feature extraction.
• Hierarchical Feature Extraction: Our proposed model employs a deep architecture, specifically ResNet-152, which extracts
multi-level visual features. This hierarchical approach improves the model’s ability to distinguish subtle differences among
garbage classes.
• Improved Accuracy and Generalization: Deep Learning models, like the one used here, achieve higher accuracy compared to
shallow ANNs by learning complex feature interactions, as evidenced by the 96.64% classification accuracy obtained.
• End-to-End Learning: Our proposed framework integrates feature extraction, attention-based fusion, and classification in an
end-to-end trainable pipeline, reducing the need for manual feature engineering.
• Robustness with Advanced Regularization: Techniques such as DropConnect and label smoothing improve model generalization and reduce overfitting, effects that are more pronounced in deep architectures.
• Adaptability to Complex Data: The deep model architecture, combined with attention mechanisms and similarity search
(FAISS), effectively handles the variability and complexity of real-world waste images.
Deep Learning builds upon the foundation of Artificial Neural Networks by employing deeper, more complex network architectures
that enable superior performance and adaptability, particularly for challenging classification tasks such as automated waste sorting.
4.6. Convolutional neural networks and their role in our proposed framework
Convolutional Neural Networks (CNNs) are a specialized class of deep neural networks designed to process grid-like data, such
as images. They leverage learnable convolutional filters that slide across spatial dimensions to capture local patterns, progressively
learning hierarchical representations from low-level textures to high-level semantic features. Due to their parameter sharing and
sparse connectivity, CNNs are both computationally efficient and highly effective for image-based tasks.
In our proposed garbage classification framework, we employ ResNet-152, a very deep CNN architecture based on residual learning,
as the primary feature extractor. ResNet-152 has demonstrated state-of-the-art performance in various computer vision benchmarks
due to its ability to mitigate the vanishing gradient problem through the use of skip connections. This allows for the training of
extremely deep networks without degradation in accuracy. We utilize the pretrained ResNet-152 model to extract 2048-dimensional deep visual embeddings from waste images, capturing fine-grained characteristics essential for distinguishing between visually similar
categories such as plastic and white-glass.
These deep embeddings are further enhanced using a self-attention mechanism. We introduce a feature fusion module that incorporates a FAISS-based similarity search to retrieve the most semantically similar embeddings from the training set. The retrieved
features are concatenated with the original features and refined through a multi-head self-attention network, allowing the model to
focus on salient patterns while suppressing irrelevant variations. This mechanism mimics the cognitive process of referencing similar
past observations to make more informed decisions.
To improve the generalization capability of the classifier, we integrate several regularization strategies, including DropConnect,
dropout, and label smoothing. DropConnect randomly removes individual weights during training, effectively acting as a form of
model ensemble, while label smoothing prevents the model from becoming overly confident by distributing a small portion of the
target probability mass across non-ground-truth classes. These additions mitigate overfitting and enhance the robustness of the final
predictions. Our proposed framework leverages the hierarchical feature learning capabilities of CNNs, augmented by attention-based refinement and similarity-based enhancement, to deliver highly accurate and generalizable performance on the waste classification
task.
Deep neural networks often exhibit overconfidence in their predictions, which can lead to reduced generalization performance.
Label smoothing is a regularization technique designed to address this issue by preventing the model from assigning full probability
to a single class. Instead, it redistributes a small portion of the probability mass among all classes, thereby encouraging the model to
be less certain. As outlined in Algorithm 4, label smoothing improves generalization during training.
Given a classification task with 𝐶 classes, let 𝐲 = (𝑦1 , 𝑦2 , … , 𝑦𝐶 ) represent the one-hot encoded ground truth label, where 𝑦𝑘 = 1
for the correct class 𝑘 and 𝑦𝑖 = 0 for all other classes. Traditional cross-entropy loss is defined as:
\( L_{CE} = -\sum_{i=1}^{C} y_i \log P_i, \)
where 𝑃𝑖 is the predicted probability for class 𝑖. In label smoothing, instead of using a hard one-hot encoding, the ground truth
distribution is smoothed as:
\( u_i = (1 - \epsilon)\, y_i + \frac{\epsilon}{C}, \)
where 𝜖 ∈ [0, 1] is the smoothing parameter that controls how much probability mass is shifted from the true class to the others.
When 𝜖 = 0, the loss reduces to standard cross-entropy.
The smoothed loss is then computed as:
\( L_{smooth} = -\sum_{i=1}^{C} u_i \log P_i. \)
The overall loss function combining cross-entropy and smoothed loss is given by:
𝐿 = (1 − 𝜖)𝐿𝐶𝐸 + 𝜖𝐿𝑠𝑚𝑜𝑜𝑡ℎ .
This formulation ensures that the model does not become overly confident in its predictions and can better handle mislabeled or
ambiguous data.
During training, the label smoothing technique is applied in the computation of the loss function. The training process iterates over
batches of data, where each label is transformed before computing the loss. This technique is particularly useful in tasks involving
large datasets with potential annotation errors, as it improves generalization and mitigates overfitting.
Experimental results demonstrate that label smoothing leads to improved robustness and generalization across various classification tasks, making it a widely adopted technique in modern deep learning architectures. The training process is outlined in
Algorithm 4.
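For reference, the loss above can be realized either with PyTorch's built-in label-smoothing argument (available since PyTorch 1.10) or manually; in the sketch below the smoothing value ε and the per-class training counts used for the class-balanced weights are placeholders, as the paper does not report them.

```python
import torch
import torch.nn as nn

C = 10            # number of garbage classes
epsilon = 0.1     # smoothing parameter (illustrative value)

# Class-balanced weighting: inverse-frequency weights normalized to mean 1.
train_counts = torch.ones(C)                      # replace with the per-class counts of the training split
weights = train_counts.sum() / (C * train_counts)

# PyTorch's built-in loss already implements u_i = (1 - eps) * y_i + eps / C.
criterion = nn.CrossEntropyLoss(weight=weights, label_smoothing=epsilon)

# Equivalent manual form of L_smooth, matching the equations above (without class weights):
def smoothed_ce(logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    log_p = torch.log_softmax(logits, dim=-1)                          # log P_i
    u = torch.full_like(log_p, epsilon / C)                            # eps / C mass on every class
    u.scatter_(1, target.unsqueeze(1), 1.0 - epsilon + epsilon / C)    # (1 - eps) + eps/C on the true class
    return -(u * log_p).sum(dim=-1).mean()
```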
6. Experimental results
To evaluate the effectiveness of our proposed garbage classification model, we use common classification metrics such as accuracy,
precision, recall, F1-score, and the confusion matrix. These measures offer a thorough insight into the model’s performance across
different garbage categories.
Table 1
Training Performance Over 25 Epochs.
Epoch   Loss     Accuracy (%)
1       749.10   31.16
2       572.54   66.04
3       417.51   82.58
4       315.20   90.48
5       268.40   92.81
6       252.52   93.43
7       240.87   94.25
8       230.43   95.37
9       222.37   96.12
10      219.09   96.40
15      201.57   98.14
20      191.30   99.26
25      187.75   99.54
The model is trained for 25 epochs, with loss and accuracy recorded at each epoch. The training process demonstrates a rapid
convergence, as shown in Table 1. The loss decreases significantly from 749.10 in the first epoch to 187.75 in the final epoch, while
accuracy improves from 31.16% to 99.54%.
To quantitatively evaluate the model's classification capability, we compute multiple performance metrics (a brief computation sketch follows the list):
• Accuracy:
\( \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \)
where \( TP \), \( FP \), \( TN \), and \( FN \) denote the numbers of true positives, false positives, true negatives, and false negatives, respectively.
• Precision:
\( \text{Precision} = \frac{TP}{TP + FP} \)
• Recall:
\( \text{Recall} = \frac{TP}{TP + FN} \)
• F1-score:
\( \text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \)
• Confusion Matrix: The confusion matrix delivers an in-depth analysis of classification performance across different categories,
as shown in Table 3.
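These metrics can be computed from the test-set predictions with scikit-learn, as sketched below; `y_true` and `y_pred` are assumed to hold the ground-truth and predicted class indices for the test split.

```python
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, precision_recall_fscore_support)

# y_true, y_pred: ground-truth and predicted class indices on the test split (assumed available).
acc = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="macro")
cm = confusion_matrix(y_true, y_pred)                       # counts of (true class, predicted class) pairs
print(classification_report(y_true, y_pred, digits=2))      # per-class precision, recall, and F1 (cf. Table 2)
print(f"Accuracy: {acc:.4f}")
```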
The overall classification accuracy achieved by our proposed model is 96.64%. Table 2 presents a detailed classification report,
including precision, recall, and F1-score for each garbage category.
Table 2
Classification Report.
Fig. 3. The confusion matrix is presented as a heatmap, where darker colors represent greater value concentrations.
The confusion matrix illustrates the model’s classification performance across all categories, highlighting both accurate predictions
and common misclassifications. While Table 3 provides the raw numerical values, we also include a heatmap (Fig. 3) to visually
represent the distribution of predictions. This graphical representation enables quicker and more intuitive interpretation of class-wise
performance.
To assess the efficiency of our classification model, we compare its performance against several widely used deep learning architectures: VGG16, DenseNet121, MobileNetV2, EfficientNetB0, ConvNeXtTiny, and ResNet152. These models are fine-tuned and
evaluated on the same dataset to ensure a fair comparison. Table 4 presents a detailed performance analysis.
Among the conventional models, VGG16 achieved the lowest accuracy at 84.10%, which can be attributed to its relatively shallow
architecture and higher parameter count, making it less effective in extracting complex waste classification features. DenseNet121 and
EfficientNetB0 performed significantly better (89.71%) due to their efficient feature propagation (DenseNet) and optimized depth-wise convolution operations (EfficientNet).
Table 3
Confusion Matrix.
0 1 2 3 4 5 6 7 8 9
0 190 0 2 0 2 1 1 0 0 0
1 0 206 0 0 0 1 0 0 0 0
2 3 0 175 1 3 6 0 1 0 0
3 0 0 0 1036 0 0 0 9 0 0
4 2 0 1 0 139 0 3 0 0 1
5 3 1 4 2 1 188 2 2 0 1
6 0 0 0 0 5 3 147 0 0 14
7 0 0 0 6 0 2 2 410 0 0
8 1 0 0 1 0 0 1 0 128 1
9 0 0 0 0 3 0 2 1 1 141
Table 4
Comparison of Classification Performance Across Different Models.
While ConvNeXtTiny achieved a competitive accuracy of 92.37%, ResNet152 slightly
outperformed it with 92.93%, demonstrating its ability to capture fine-grained details in waste images. Among all state-of-the-art
methods, the Swin Transformer achieves the highest accuracy of 94.41%, outperforming ViT’s 93.26%, highlighting its superior
capability in capturing complex image features for classification.
Our proposed model, which integrates ResNet152 with FAISS-based similarity search and attention mechanisms, achieves the
highest accuracy of 96.64%. The inclusion of attention mechanisms helps in selectively emphasizing important features, while FAISS
retrieval refines classification by incorporating similar embeddings. Regularization techniques, including Dropout and DropConnect,
further enhance model generalization. The results demonstrate that integrating FAISS-based retrieval and attention mechanisms
significantly improves garbage classification accuracy. Compared to traditional deep learning architectures, our proposed model
achieves state-of-the-art performance, making it a robust solution for intelligent waste sorting and recycling efficiency enhancement.
To evaluate the reliability and robustness of our proposed model, we conducted a statistical significance analysis by comparing its
classification accuracy to that of the Swin Transformer, which achieved the highest performance (94.41%) among all baseline models
considered in this study.
Our proposed model is trained and evaluated across six independent runs using different random seeds and varied data splits to
ensure robustness and generalization. The resulting accuracies are [96.64%, 96.60%, 96.11%, 96.88%, 96.53%, 96.30%],
yielding a mean accuracy of 96.51% with a standard deviation of 0.271. For comparison, we also run the Swin Transformer six times
under the same evaluation protocol, obtaining accuracies of [94.41%, 94.30%, 94.45%, 94.51%, 94.35%, 94.40%], with
a mean of 94.40% and a standard deviation of 0.074.
To determine whether this observed performance improvement is statistically significant, we conducted a two-sample, one-tailed
t-test, which does not assume equal variances. The null hypothesis (𝐻0 ) posits that the mean accuracy of our model is less than
or equal to that of the Swin Transformer (𝜇proposed ≤ 𝜇swin ), while the alternative hypothesis (𝐻1 ) asserts that our model achieves
superior accuracy (𝜇proposed > 𝜇swin ).
The resulting t-statistic is 18.394 with an associated p-value of 1.276 × 10⁻⁶. The critical value at a significance level of 𝛼 = 0.05 with 5 degrees of freedom is 2.015, which is far exceeded by the observed t-statistic. At this significance level,
the null hypothesis is strongly rejected, confirming that the observed improvement is statistically significant. Furthermore, the 95%
confidence interval for the mean accuracy of the proposed model was computed to be [96.23%, 96.79%], clearly exceeding the
average performance of the Swin Transformer.
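This analysis can be reproduced from the six per-seed accuracies listed above using SciPy's Welch t-test (which does not assume equal variances; the `alternative` argument requires SciPy ≥ 1.6), together with the 95% confidence interval for the mean accuracy of the proposed model.

```python
import numpy as np
from scipy import stats

proposed = np.array([96.64, 96.60, 96.11, 96.88, 96.53, 96.30])   # six-seed accuracies, proposed model
swin = np.array([94.41, 94.30, 94.45, 94.51, 94.35, 94.40])       # six-seed accuracies, Swin Transformer

# One-tailed Welch t-test: H0: mu_proposed <= mu_swin vs. H1: mu_proposed > mu_swin.
t_stat, p_value = stats.ttest_ind(proposed, swin, equal_var=False, alternative="greater")
print(f"t = {t_stat:.3f}, one-sided p = {p_value:.3e}")

# 95% confidence interval for the mean accuracy of the proposed model.
mean, sem = proposed.mean(), stats.sem(proposed)
low, high = stats.t.interval(0.95, df=len(proposed) - 1, loc=mean, scale=sem)
print(f"mean = {mean:.2f}%, 95% CI = [{low:.2f}%, {high:.2f}%]")
```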
The proposed classification framework is designed for optimal performance on standard hardware, explicitly targeting efficient
execution on an Intel Core i5 CPU with 16 GB RAM and a batch size of 32. Despite these modest hardware constraints, our proposed
Table 5
Approximate Comparison of Model Efficiency with Popular Architectures.
Model Parameters (M) GFLOPs Model Size (MB) Inference Time (ms) (CPU/GPU)
Table 6
Ablation Study Results on Classification Performance.
Configuration                                                      Accuracy (%)  Precision  Recall  F1-score
FAISS, DropConnect, Label Smoothing (No Attention)                 95.80         0.96       0.96    0.96
FAISS, Attention, Label Smoothing (No DropConnect)                 95.10         0.95       0.95    0.95
FAISS, Attention, DropConnect (No Label Smoothing)                 95.50         0.96       0.95    0.95
Attention, DropConnect, Label Smoothing (No FAISS)                 93.90         0.94       0.94    0.94
All components (FAISS, Attention, DropConnect, Label Smoothing)    96.64         0.97       0.97    0.97
model achieves robust performance through a carefully structured hybrid architecture that integrates pretrained deep features with
lightweight, trainable modules. Our proposed model utilizes a pretrained ResNet-152 backbone as a frozen feature extractor, chosen
for its superior representational capacity on large-scale datasets such as ImageNet. Freezing the ResNet-152 weights entirely during
training eliminates the computational overhead of backpropagating through 152 layers, significantly reducing memory consumption,
training time, and floating-point operations.
Each input image is transformed into a 2048-dimensional embedding. To enhance this representation without modifying the backbone, we introduce a lightweight, self-attention-based fusion mechanism. This component integrates semantically similar embeddings
retrieved using a fast FAISS-based nearest neighbor search. The resulting 4096-dimensional feature vector is passed through a six-layer
custom multi-layer perceptron (MLP) classifier. The model, comprising 11.19 million trainable parameters, incorporates regularization techniques including Dropout and DropConnect to promote generalization. Despite its expressive power, the total model size remains compact at just 42.75 MB, making it highly suitable for deployment on low-power and edge devices.
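A condensed sketch of the training regime described earlier (Adam optimization, mixed-precision computation, and a one-cycle learning rate schedule) is shown below. The names `model`, `criterion`, and `train_loader` are assumed to be defined as in the previous sketches, a CUDA device is assumed for mixed precision, and the learning rate is an illustrative choice; the 25-epoch budget matches Table 1.

```python
import torch
from torch.cuda.amp import autocast, GradScaler
from torch.optim.lr_scheduler import OneCycleLR

EPOCHS = 25                                                   # matches Table 1
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)     # learning rate is an assumption
scheduler = OneCycleLR(optimizer, max_lr=1e-3, epochs=EPOCHS,
                       steps_per_epoch=len(train_loader))
scaler = GradScaler()

for epoch in range(EPOCHS):
    model.train()
    for images, labels in train_loader:                       # device transfer omitted for brevity
        optimizer.zero_grad()
        with autocast():                                      # mixed-precision forward pass
            logits = model(images)
            loss = criterion(logits, labels)
        scaler.scale(loss).backward()                         # scaled backward pass avoids fp16 underflow
        scaler.step(optimizer)
        scaler.update()
        scheduler.step()                                      # one-cycle schedule advances once per batch
```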
To assess the contribution of each individual component in our proposed framework, we conducted a comprehensive ablation
study. Specifically, we evaluated the impact of four key components: FAISS-based retrieval, the attention mechanism, DropConnect
regularization, and label smoothing. In each experiment, one component was selectively disabled while the others remained active,
and the resulting classification performance was recorded. Table 6 presents the results of these experiments.
The ablation results clearly demonstrate that each component contributes meaningfully to the overall classification performance:
Fig. 4. Workflow for integrating the proposed model into a smart waste management system.
• Attention Mechanism: Disabling the attention mechanism results in a noticeable decrease in accuracy and F1-score, indicating
its crucial role in effectively fusing feature representations derived from the FAISS retrieval module.
• DropConnect Regularization: The absence of DropConnect leads to a performance drop across all metrics, highlighting its
effectiveness in mitigating overfitting and enhancing generalization.
• Label Smoothing: Although its removal causes a relatively smaller drop in performance, label smoothing still contributes positively by improving model calibration and reducing overconfidence in predictions.
• FAISS Retrieval: When the FAISS module is excluded, the model experiences the most substantial performance degradation,
particularly in accuracy. This underscores the importance of nearest-neighbor feature augmentation in enriching the input representation.
In general, the full model including FAISS retrieval, attention, DropConnect, and label smoothing achieves the highest classification
performance. It attains an accuracy of 96.64% along with balanced precision, recall, and F1 score values of 0.97. These results confirm
the synergistic contribution of all components and validate the architectural decisions underlying the proposed framework. The
ablation study thus provides strong empirical evidence that each individual module is essential for achieving optimal classification
performance.
Improving waste classification accuracy is essential to advancing global sustainability goals and reducing the burden on landfills
and incineration systems. Manual sorting processes are labor-intensive, error-prone, and susceptible to contamination of recyclable
materials. The proposed computer vision (CV) model addresses these challenges by automating classification with high accuracy
and reliability. The model achieved an accuracy of 97.67%, with precision, recall, and F1-score each reaching 0.98 on a diverse
dataset that includes categories such as batteries, biological waste, cardboard, clothes, metal, paper, plastic, shoes, trash, and white
glass. These categories represent a mixture of recyclable, hazardous, and non-recyclable materials, positioning the model as a strong
candidate for deployment in real-world waste management infrastructure. Fig. 4 outlines how the model can be integrated into a
modern waste processing system. From image capture at the point of disposal to downstream decision-making and policy feedback,
the system supports data-driven and environmentally responsible waste handling.
The system allows recyclable items such as metals, white glass, and plastics to be correctly identified and separated from hazardous materials like batteries or general trash. This reduces contamination in the recyclable stream and improves the quality of
recovered materials for reuse or remanufacturing. In addition to its technical strengths, the model aligns with international and
national waste management goals. For instance, the European Union's Waste Framework Directive [37] and India's Solid Waste Management Rules [38] emphasize source segregation and improved recycling efficiency. By incorporating the model into smart bins or
material recovery facilities (MRFs), waste authorities can enforce compliance and monitor the flow of materials with minimal manual
intervention. Beyond compliance, the model supports environmental impact reduction by improving sorting accuracy, which helps
reduce the energy and emissions associated with processing mixed or misclassified waste. Moreover, the system can be extended to
provide users with real-time feedback about their disposal habits, fostering public awareness and behavioral change toward better
waste practices [39]. Finally, the granular classification data produced by the system enables city planners, waste contractors, and
policymakers to make informed decisions about infrastructure investments, recycling targets, and circular economy initiatives. The
combination of high-performance AI with clear operational value marks a meaningful step toward sustainable, data-driven waste
management.
In this study, we have proposed a novel and robust deep learning framework for automated garbage classification, integrating
advanced feature extraction, similarity-based attention fusion, and resilient training strategies. Our approach capitalizes on the representational power of a pretrained ResNet-152 model to extract high-dimensional visual features, which are further refined through
a self-attention mechanism designed to capture relationships between input images and their most semantically similar counterparts.
By employing a similarity search using FAISS, the framework retrieves highly relevant features from the dataset, effectively enriching
the contextual representation of each input sample. This fusion strategy enhances the discriminative power of the feature space and
enables the model to distinguish between visually similar but semantically distinct classes efficiently. Additionally, our classification
architecture incorporates several deep regularization techniques, including DropConnect and dropout, to prevent overfitting, while
label smoothing and class-balanced loss further improve the stability and generalization of the training process. The proposed model
achieves a high classification accuracy of 96.64% on a diverse ten-class waste dataset, with consistently strong performance across
all categories in terms of precision, recall, and F1-score. We also performed a statistical significance analysis comparing our model’s
accuracy with that of the best state-of-the-art model, demonstrating that our model achieves significantly higher accuracy. These
results demonstrate the effectiveness and robustness of our framework in handling the visual diversity and imbalance arising in real-world waste classification scenarios. The integration of deep feature learning, attention-based feature fusion, similarity retrieval, and
regularized optimization contributes a comprehensive and scalable methodology to the field of intelligent environmental systems.
Despite the promising performance of our method, several directions remain open for future exploration to enhance its practicality, scalability, and environmental impact. One immediate priority is to develop new model compression strategies, such as pruning,
quantization, and knowledge distillation, to reduce computational overhead and memory footprint, making the model suitable for
deployment on low-power edge devices and mobile platforms. Real-time classification is essential for practical integration into smart
bins or waste sorting robots, and thus future work should also focus on optimizing inference latency and throughput without compromising accuracy. Furthermore, the current framework relies exclusively on visual features. Extending it into a multimodal architecture that combines vision with auxiliary sensory data, such as spectral signatures, RFID metadata, or olfactory sensing, could substantially improve classification in ambiguous or degraded visual conditions.
and transfer learning, which would enable the system to adapt to new environments, lighting conditions, or regional waste profiles
with minimal additional data. Augmenting the training set with synthetic images or employing domain adaptation techniques would
support this goal. Moreover, expanding the system's scope to include carbon footprint estimation or recyclability scoring would provide better insights into the environmental cost of various waste types, thus aligning the model with broader sustainability goals.
Integration into a city-scale smart waste management platform, coupled with predictive analytics for waste volume forecasting and
route optimization, can further amplify its societal utility. In summary, while our work sets a strong technical foundation, its real-world potential can be significantly extended by addressing computational efficiency, contextual awareness, multimodal integration,
and environmental accountability in future research.
Debojyoti Ghosh: Writing -- review & editing, Writing -- original draft, Visualization, Validation, Software, Resources, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. Adrijit Goswami: Writing -- review & editing, Validation,
Supervision, Conceptualization.
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to
influence the work reported in this paper.
Data availability
References
[1] Shoufeng Jin, Zixuan Yang, Grzegorz Królczykg, Xinying Liu, Paolo Gardoni, Zhixiong Li, Garbage detection and classification using a new deep learning-based
machine vision system as a tool for sustainable waste recycling, Waste Manag. 162 (2023) 123--130.
[2] Zuohua Li, Quanxue Deng, Peicheng Liu, Jing Bai, Yunxuan Gong, Qitao Yang, Jiafei Ning, An intelligent identification and classification system of decoration
waste based on deep learning model, Waste Manag. 174 (2024) 462--475.
[3] Md Sakib Bin Islam, Md Shaheenur Islam Sumon, Molla E. Majid, Saad Bin Abul Kashem, Mohammad Nashbat, Azad Ashraf, Amith Khandakar, Ali K. Ansaruddin
Kunju, Mazhar Hasan-Zia, Muhammad E.H. Chowdhury, Eccdn-net: a deep learning-based technique for efficient organic and recyclable waste classification,
Waste Manag. 193 (2025) 363--375.
[4] Umesh Kumar Lilhore, Sarita Simaiya, Surjeet Dalal, Magdalena Radulescu, Daniel Balsalobre-Lorente, Intelligent waste sorting for sustainable environment:
a hybrid deep learning and transfer learning model, Gondwana Res. (2024).
[5] Song Zhang, Yumiao Chen, Zhongliang Yang, Hugh Gong, Computer vision based two-stage waste recognition-retrieval algorithm for waste classification, Resour.
Conserv. Recycl. 169 (2021) 105543.
[6] Xi Li, Tian Li, Shaoyi Li, Bin Tian, Jianping Ju, Tingting Liu, Hai Liu, Learning fusion feature representation for garbage image classification model in human–robot
interaction, Infrared Phys. Technol. 128 (2023) 104457.
[7] Wei-Lung Mao, Wei-Chun Chen, Chien-Tsung Wang, Yu-Hao Lin, Recycling waste classification using optimized convolutional neural network, Resour. Conserv.
Recycl. 164 (2021) 105132.
[8] Zhichao Chen, Jie Yang, Lifang Chen, Haining Jiao, Garbage classification system based on improved shufflenet v2, Resour. Conserv. Recycl. 178 (2022) 106090.
[9] Yujin Chen, Anneng Luo, Mengmeng Cheng, Yaoguang Wu, Jihong Zhu, Yanmei Meng, Weilong Tan, Classification and recycling of recyclable garbage based on
deep learning, J. Clean. Prod. 414 (2023) 137558.
[10] Yuanming Ren, Yizhe Li, Xinya Gao, An mrs-yolo model for high-precision waste detection and classification, Sensors 24 (13) (2024) 4339.
[11] Minh K. Quan, Dinh C. Nguyen, Van-Dinh Nguyen, Mayuri Wijayasundara, Sujeeva Setunge, Pubudu N. Pathirana, Towards privacy-preserving waste classification
in the internet of things, IEEE Internet Things J. (2024).
[12] Rongxing Wu, Xingmin Liu, Tiantian Zhang, Jiawei Xia, Jiaqi Li, Mingan Zhu, Gaoquan Gu, An efficient multi-label classification-based municipal waste image
identification, Processes 12 (6) (2024) 1075.
[13] Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, Imagenet classification with deep convolutional neural networks, Commun. ACM 60 (6) (2017) 84--90.
[14] Yu Song, Xin He, Xiwang Tang, Bo Yin, Jie Du, Jiali Liu, Zhongbao Zhao, Shigang Geng, Deepbin: deep learning based garbage classification for households using
sustainable natural technologies, J. Grid Comput. 22 (1) (2024) 2.
[15] Kang An, Yanping Zhang, Lpvit: a transformer based model for pcb image classification and defect detection, IEEE Access 10 (2022) 42542--42553.
[16] Monika Dokl, Yee Van Fan, Annamaria Vujanović, Zorka Novak Pintarič, Kathleen B. Aviso, Raymond R. Tan, Bojan Pahor, Zdravko Kravanja, Lidija Čuček, et
al., A waste separation system based on sensor technology and deep learning: a simple approach applied to a case study of plastic packaging waste, J. Clean.
Prod. 450 (2024) 141762.
[17] Priya Aggarwal, Narendra Kumar Mishra, Binish Fatimah, Pushpendra Singh, Anubha Gupta, Shiv Dutt Joshi, Covid-19 image classification using deep learning:
advances, challenges and opportunities, Comput. Biol. Med. 144 (2022) 105350.
[18] Zixing Liu, Wanyu Fang, Zixiang Cai, Jia Zhang, Yang Yue, Guangren Qian, Garbage-classification policy changes characteristics of municipal-solid-waste fly ash
in China, Sci. Total Environ. 857 (2023) 159299.
[19] Xinchen Cai, Feng Shuang, Xiangming Sun, Yanhui Duan, Guanyuan Cheng, Towards lightweight neural networks for garbage object detection, Sensors 22 (19)
(2022) 7455.
[20] Jiewen Feng, Xiaoyu Tang, Xingjian Jiang, Qunyuan Chen, Garbage disposal of complex background based on deep learning with limited hardware resources,
IEEE Sens. J. 21 (18) (2021) 21050--21058.
[21] Fucong Liu, Hui Xu, Miao Qi, Di Liu, Jianzhong Wang, Jun Kong, Depth-wise separable convolution attention module for garbage image classification, Sustainability 14 (5) (2022) 3099.
[22] Jongchan Park, Sanghyun Woo, Joon-Young Lee, In So Kweon, A simple and light-weight attention module for convolutional neural networks, Int. J. Comput.
Vis. 128 (4) (2020) 783--798.
[23] Kashif Ahmad, Khalil Khan, Ala Al-Fuqaha, Intelligent fusion of deep features for improved waste classification, IEEE Access 8 (2020) 96495--96504.
[24] Junran Lin, Cuimei Yang, Yi Lu, Yuxing Cai, Hanjie Zhan, Zhen Zhang, An improved soft-yolox for garbage quantity identification, Mathematics 10 (15) (2022)
2650.
[25] Kunsen Lin, Tao Zhou, Xiaofeng Gao, Zongshen Li, Huabo Duan, Huanyu Wu, Guanyou Lu, Youcai Zhao, Deep convolutional neural networks for construction
and demolition waste classification: vggnet structures, cyclical learning rate, and knowledge transfer, J. Environ. Manag. 318 (2022) 115501.
[26] Wei-Lung Mao, Wei-Chun Chen, Haris Imam Karim Fathurrahman, Yu-Hao Lin, Deep learning networks for real-time regional domestic waste detection, J. Clean.
Prod. 344 (2022) 131096.
[27] Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun, Faster r-cnn: towards real-time object detection with region proposal networks, IEEE Trans. Pattern Anal.
Mach. Intell. 39 (6) (2016) 1137--1149.
[28] Nicholas Chieng Anak Sallang, Mohammad Tariqul Islam, Mohammad Shahidul Islam, Haslina Arshad, A cnn-based smart waste management system using
tensorflow lite and lora-gps shield in internet of things environment, IEEE Access 9 (2021) 153560--153574.
[29] Mohammed Imran Basheer Ahmed, Raghad B. Alotaibi, Rahaf A. Al-Qahtani, Rahaf S. Al-Qahtani, Sara S. Al-Hetela, Khawla A. Al-Matar, Noura K. Al-Saqer,
Atta Rahman, Linah Saraireh, Mustafa Youldash, et al., Deep learning approach to recyclable products classification: towards sustainable waste management,
Sustainability 15 (14) (2023) 11138.
[30] Zhenhua Wang, Bin Wang, Ming Ren, Dong Gao, A new hazard event classification model via deep learning and multifractal, Comput. Ind. 147 (2023) 103875.
[31] Fuqian Zhang, Bin Wang, Dong Gao, Chengxi Yan, Zhenhua Wang, When grey model meets deep learning: a new hazard classification model, Inf. Sci. 670 (2024)
120653.
[32] Xiaoan Tang, Yuxin Wei, Kaijie Xu, Qiang Zhang, Enhancement of the performance of high-dimensional fuzzy classification with feature combination optimization,
Inf. Sci. 680 (2024) 121183.
[33] Jérôme Lux, Jean David Lau Hiu Hoong, Pierre-Yves Mahieux, Philippe Turcry, Classification and estimation of the mass composition of recycled aggregates by
deep neural networks, Comput. Ind. 148 (2023) 103889.
[34] C. Anna Palagan, S. Sebastin Antony Joe, S.J. Jereesha Mary, E. Edwin Jijo, Predictive analysis-based sustainable waste management in smart cities using iot
edge computing and blockchain technology, Comput. Ind. 166 (2025) 104234.
[35] B. Madhavi, Mohan Mahanty, Chia-Chen Lin, B. Omkar Lakshmi Jagan, Hari Mohan Rai, Saurabh Agarwal, Neha Agarwal, Swinconvnext: a fused deep learning
architecture for real-time garbage image classification, Sci. Rep. 15 (2025) 7995.
[36] Yury A. Malkov, Dmitry A. Yashunin, Efficient and robust approximate nearest neighbor search using hierarchical navigable small world graphs, IEEE Trans.
Pattern Anal. Mach. Intell. 42 (4) (2020) 824--836.
[37] European Commission, Directive 2008/98/ec of the European Parliament and of the Council on Waste (Waste Framework Directive), 2008, Accessed: 2025-04-08.
[38] Ministry of Environment, Forest and Climate Change, Government of India, Solid Waste Management Rules, 2016, Accessed: 2025-04-08.
[39] R. Nishant, M. Kennedy, J. Corbett, Artificial intelligence for sustainable waste management: a review, Sustain. Cities Soc. 63 (2020) 102423.