Efficient Brain Tumor Classification Using Deep Learning

This dissertation presents a novel approach for brain tumor classification using Data-Efficient Image Transformers (DeiT), addressing the limitations of traditional Convolutional Neural Networks (CNNs). The proposed model achieved a classification accuracy of 98% and a rapid identification time of approximately 0.5 seconds, making it suitable for real-world clinical applications. The research emphasizes the importance of efficiency, precision, and scalability in medical imaging, paving the way for future integration of transformer-based models in clinical settings.


A Dissertation Report

on

Efficient Brain Tumor Classification Using Deep Learning


Submitted in partial fulfilment of the
requirement for the award of the degree of

MASTER OF TECHNOLOGY
in
Computer Science & Engineering
Submitted By

Vikas Maurya (23SCSE2010007)

Under The Supervision of


Dr. Abdul Aleem
Professor C.S.E

SCHOOL OF COMPUTER SCIENCE AND ENGINEERING


DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
GALGOTIAS UNIVERSITY, GREATER NOIDA, INDIA
JULY, 2025
SCHOOL OF COMPUTER SCIENCE AND
ENGINEERING
GALGOTIAS UNIVERSITY, GREATER NOIDA

CANDIDATE’S DECLARATION

I hereby certify that the work which is being presented in the thesis, entitled “EFFICIENT
BRAIN TUMOR CLASSIFICATION USING DEEP LEARNING” in partial fulfilment of the
requirements for the award of the M. TECH in CSE submitted in the School of Computer Science
and Engineering of Galgotias University, Greater Noida, is an original work carried out during the
period of July 2024 to July 2025, under the supervision of Dr. Abdul Aleem, Professor,
Department of Computer Science and Engineering of School of Computer Science and
Engineering, Galgotias University, Greater Noida.
The matter presented in the thesis has not been submitted by me for the award of any other
degree of this or any other institution.

Vikas Maurya- 23SCSE2010007

This is to certify that the above statement made by the candidate is correct to the best of my
knowledge.

Supervisor
(Dr. Abdul Aleem, Professor)

SCHOOL OF COMPUTER SCIENCE AND
ENGINEERING
GALGOTIAS UNIVERSITY, GREATER NOIDA

CERTIFICATE

The Final Thesis “Efficient Brain Tumor Classification Using Deep Learning” Viva-Voce
examination of Vikas Maurya (23SCSE2010007) has been held on 2 July 2025 and his work is
recommended for the award of M. TECH in Computer Science & Engineering.

Signature of Examiner(s) Signature of Supervisor(s)

Signature of M.Tech Coordinator Signature of Dean

Date: July 2025


Place: Greater Noida

ACKNOWLEDGEMENT

The satisfaction and euphoria that accompany the successful completion of any task would be
incomplete without the mention of the people who made it possible, whose constant guidance and
encouragement crown all efforts with success.

I take this opportunity to express my profound gratitude and deep regards to my guide, Dr. Abdul
Aleem, Professor, School of Computer Science & Engineering, Galgotias University, for his
exemplary guidance, monitoring, and constant encouragement throughout the project work.

I extend my sincere appreciation to all other Professors, the Program Chair, and the Dean for their
valuable insights and tips during the design of the project. I would also like to thank all lab
assistants for helping me with my project work. Their contributions have been valuable in so many
ways that I find it difficult to acknowledge them individually.

I am also thankful to all those who helped me directly or indirectly in the completion of this work.

Date: July 2025 Vikas Maurya


Place: Greater Noida (23SCSE2010007)

ABSTRACT
MRI-based brain tumor detection is highly relevant in medical diagnosis, as it shapes
patient treatment strategies. However, traditional approaches, both manual assessment and
Convolutional Neural Networks (CNNs), face significant challenges. These include high
computational requirements, the need for large pre-labeled datasets, and low speed in real-world
clinical use. To tackle these problems, this work introduces a new solution based on
Data-Efficient Image Transformers (DeiT) for classifying brain tumors effectively.

The approach combines DeiT for accurate classification of MR images with bounding box
localization that identifies the location of tumors in the brain. Through its attention
mechanisms, DeiT can extract generalizable patterns from MRI scans even when data are limited.
Bounding boxes make the model better at localizing tumors by focusing on these areas rather than
on the scan as a whole, improving interpretability for clinicians. Preprocessing includes
normalization and data augmentation to handle MRI scans from different acquisition protocols and
machine types.

The model was tested on the BraTS dataset and reached a classification accuracy of 98%,
precision of 97%, recall of 96%, and an IoU score of 93%. Furthermore, the model was fast: the
time to process an image was approximately 0.5 seconds, which is suitable for real-world medical
practice. A comparative analysis against conventional CNN-based models, ResNet and EfficientNet,
showed higher accuracy with comparably efficient resource consumption.

This research addresses shortcomings of current methods for detecting brain tumors and
offers a solution that is scalable, fast, and reliable for clinical use. It also considers ethical
issues, prominent among which are the protection of patients’ right to privacy and the control of
possible bias. Moreover, and perhaps more importantly, the framework’s adaptability to other
imaging procedures such as CT scans points to its future applicability within the larger spectrum
of medical imaging.

In sum, this work presents a practical path that addresses the actual requirements of
the application, namely efficiency, precision, and extensibility, creating a basis for the future
incorporation of transformer-based models into clinical applications. Future work will include
the integration of multi-modal imaging, improved domain adaptation for better generalization, and
the use of AI interpretability techniques to enhance acceptance in the medical environment. This
is a significant step toward changing the manner in which brain tumors are diagnosed and treated.

TABLE OF CONTENTS
CANDIDATE’S DECLARATION
CERTIFICATE
ACKNOWLEDGEMENT
ABSTRACT
CONTENTS
LIST OF TABLES
LIST OF FIGURES
ACRONYMS

CHAPTER 1 INTRODUCTION
1.1 Overview
1.2 Problem Statement
1.2.1 Data Dependency
1.2.2 Computational Complexity
1.2.3 Limited Contextual Understanding
1.3 Motivation
1.4 Objectives
1.4.1 Design of Accurate Model
1.4.2 Enhancing Data Efficiency
1.4.3 The present work is aimed at enhancing the computational efficiency
1.4.4 Maintaining the capability of scaling up and down and being flexible or changeable
1.5 Challenges
1.5.1 Data Security
1.5.2 Variability in MRI Data
1.5.3 Computational Requirements
1.5.4 Ethical and Regulatory Issues
1.6 Methodology Overview
1.6.1 Preprocessing
1.6.2 Model Training
1.6.3 Evaluation
1.6.4 Comparison
1.7 Contributions
1.7.1 Efficient Classification
1.7.2 Real-Time Applicability
1.7.3 Improved Localization
1.7.4 Ethical Considerations
1.8 Summary

CHAPTER 2 LITERATURE SURVEY
2.1 Traditional Method
2.2 Advances in Transformers
2.3 Bounding Box Techniques
2.4 Gap Analysis
2.5 Summary

CHAPTER 3 PROPOSED METHODOLOGY
3.1 Datasets
3.2 Model Design
3.3 Implementation Details
3.4 Training Strategy
3.5 Summary

CHAPTER 4 RESULT & DISCUSSION
4.1 Performance Metrics
4.2 Comparative Metrics
4.3 Real-Time Efficiency
4.4 Qualitative Results
4.5 Final Output Visualization
4.6 Summary

CHAPTER 5 ETHICAL AND PRACTICAL IMPLICATIONS
5.1 Interpretability
5.2 Scalability
5.3 Ethical Considerations
5.4 Summary

CHAPTER 6 CONCLUSION & FUTURE SCOPE
REFERENCES
PUBLICATIONS

LIST OF TABLES

S.NO. CAPTION

2.1 Summary of Related Work in Brain Tumor Detection and Classification

LIST OF FIGURES

S.NO. TITLE

1.1 Structure of human brain
3.1 MRI Image of a Brain
3.2 BRATS Dataset
4.1 Performance metrics of the Proposed Model
4.2 Comparative Analysis of Accuracy Models
4.3 Comparative Analysis of IoU Among Models
4.4 Real-Time Computational Efficiency Comparison
4.5 Qualitative Analysis of DeiT-Based Framework
4.6 Final Output Visualization of Tumor Detection

ACRONYMS

DeiT Data-efficient Image Transformers
MRI Magnetic Resonance Imaging
CNN Convolutional Neural Network
ViT Vision Transformer
XAI Explainable Artificial Intelligence
VGG Visual Geometry Group
BraTS Brain Tumor Segmentation
CT Computed Tomography
Grad-CAM Gradient-weighted Class Activation Mapping
AI Artificial Intelligence

CHAPTER 1

INTRODUCTION

1.1 Overview

Brain tumors are among the most urgent problems in medicine today, since early and
accurate detection both improves the effectiveness of treatment and increases a patient’s chances
of survival. Given the severity of brain tumors as a life-threatening disease, it is critically
important to determine their type, size, location, and stage when developing therapeutic
management plans. Early diagnosis contributes significantly to the overall treatment approach,
minimizes complications, and improves the chances of a positive outcome. Of the available
diagnostic techniques, MRI is the most preferred for brain tumor diagnosis because it gives
detailed images of soft tissue, including the brain, without the use of ionizing radiation [1].

MRI has transformed the way brain tumors are diagnosed by allowing clinicians to
visualize brain tissue without invasive procedures. Unlike CT or other X-ray scans, MRI relies on
magnetic fields and radio waves to produce images. This makes it safer for repeated use, which is
frequently required to monitor tumor size or assess the results of therapy. Moreover, MRI can
generate different image sequences, such as T1, T2, and FLAIR, each of which shows the tumor
differently and helps differentiate between edema, necrotic tissue, and active parts of the tumor.
However, manual assessment of MRI scans remains a complex and lengthy process [1].

Figure 1.1 Structure of human brain

Diagnostic assessment of brain tumors on MRI images is a labor-intensive process that
involves analyzing differences in tissue texture that may be subtle and that vary with the type,
size, and location of the tumor. The process is tedious and susceptible to inconsistencies and
inaccuracies, particularly where the tumor is small or complex. Human interpretation is inherently
subjective, which is why even experienced radiologists may miss minute abnormalities or
misinterpret what they observe. These challenges are worsened by the rising demand for diagnostic
imaging, which puts further pressure on already stretched healthcare systems. To overcome such
limitations, newer approaches, mainly artificial intelligence (AI), have been considered for
integration into medical image analysis.

AI-based solutions have been shown to make automated and precise diagnosis of brain
tumors possible. Deep learning in particular has recently enabled models that can accurately
analyze medical images. Among these, Convolutional Neural Networks (CNNs) have been employed in
tumor classification, segmentation, and detection. CNNs are effective at identifying hierarchical
features in images, from edges and textures up to higher-level structures. For this reason, CNNs
are used in many medical applications, including the analysis of brain tumors [2].

Despite this success, CNN-based methods have critical drawbacks that limit their
usefulness in clinical decision-making. One major issue is computational complexity. Training CNN
models is computationally intensive and requires powerful GPUs and considerable time, which is
often impractical in developing countries. Furthermore, CNNs depend heavily on large annotated
datasets for training. Such datasets are especially difficult to obtain in the medical domain
because of privacy constraints, the cost of expert annotation, and the fact that some tumor types
are extremely rare. This dependency means that CNN models often work well only for the particular
dataset, or the institution, that provided the training data [2].

Beyond high data dependency, CNNs also have a weak ability to learn and use global
contextual information. CNNs mainly depend on the local spatial relationships in an image and
hence may fail to recognize complex or diffuse tumors that require a view of the whole image. For
example, a CNN may have difficulty distinguishing tumor tissue from the surrounding edema if the
two have similar characteristics. Such a lack of global awareness can lead to misclassification or
inadequate segmentation, which can be fatal mistakes in a clinical environment [3].

To address these challenges, researchers have turned to transformer-based
architectures, which have found wide success in computer vision. Transformers, initially designed
for natural language processing (NLP), use the self-attention mechanism to process data. Unlike
CNNs, transformers do not rely on convolutional layers; instead they use attention to learn
features that capture both local and global dependencies in an image. This ability to model
long-range dependencies is particularly helpful in medical imaging, where both local detail and
wider context are vital for diagnosis [3].
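
To make the idea of global attention concrete, the following is a minimal NumPy sketch of single-head self-attention over image patches. It is an illustration only, not the model used in this work: the 32x32 random image, the 8x8 patch size, the projection width of 16, and the random projection matrices are all assumed values chosen for the example, and the function names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def image_to_patches(image, patch):
    # Split a 2D image into non-overlapping patch x patch tiles,
    # then flatten each tile into one token vector.
    h, w = image.shape
    assert h % patch == 0 and w % patch == 0
    rows, cols = h // patch, w // patch
    tiles = image.reshape(rows, patch, cols, patch).transpose(0, 2, 1, 3)
    return tiles.reshape(rows * cols, patch * patch)

def self_attention(tokens, w_q, w_k, w_v):
    # Scaled dot-product attention: every token attends to every other
    # token, which is how global context enters the representation.
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights

rng = np.random.default_rng(0)
image = rng.random((32, 32))                 # stand-in for one MRI slice (assumed size)
tokens = image_to_patches(image, patch=8)    # 16 tokens, 64 values each
dim = 16                                     # assumed projection width
w_q, w_k, w_v = (rng.normal(scale=0.1, size=(64, dim)) for _ in range(3))
out, weights = self_attention(tokens, w_q, w_k, w_v)
print(out.shape, weights.shape)
```

Each row of `weights` sums to one and spans all patches, so a patch in one corner of the slice can draw on information from any other patch in a single layer, in contrast to the limited receptive field of one convolution.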

Among transformer-based architectures, Data-Efficient Image Transformers (DeiT) have
been proposed in recent studies to deliver efficient and robust image analysis. In particular,
DeiT addresses weaknesses of traditional transformers, which require massive computation and large
datasets for training. DeiT shows that optimizing certain aspects of the transformer architecture
can yield a lightweight model that performs nearly on par with CNNs while using less computation
and less data. This makes DeiT an appealing solution for medical applications, where annotated
datasets are often scarce and computational power may be limited [27].

DeiT offers several advantages for the diagnosis of brain tumors. Its self-attention
mechanism enables it to consider the entire MRI scan at once, attending both to the detailed
features of tumor regions and to their spatial context. This improves the model’s robustness in
distinguishing between brain tumor types such as gliomas, meningiomas, and tumors of the sellar
region such as prolactinomas. Moreover, the approach is computationally efficient when processing
images, which makes real-time clinical use of DeiT possible. For example, the feasibility of using
DeiT for highly accurate brain tumor classification with low inference time was preliminarily
demonstrated in [3].

Altogether, the development of transformer-based architectures, and DeiT in particular,
marks clear progress in brain tumor detection. By addressing the deficiencies of traditional CNNs,
DeiT provides a new way of analyzing MRI scans to locate and classify tumors accurately. Because
it can work well even with limited data and computational power, it can be viewed as a useful
instrument for improving the precision and speed of diagnostics in clinics. Work that applies deep
learning, and specifically DeiT, to the diagnosis of brain tumors is needed to close the gap
between advanced AI techniques and clinical practice.

1.2 Problem Statement

Diagnosing a brain tumor is challenging even at the detection stage and more so at the
classification stage. Although progress has been made, particularly since the introduction of
CNNs, drawbacks remain that hinder the adoption of this type of AI in clinical work. Specifically,
these challenges concern data dependency, computational cost, and limited contextual
understanding.

1.2.1. Data Dependency

CNNs require large annotated datasets in order to train effectively. In medical
imaging, creating such datasets is an even bigger challenge. Annotation has to be done manually
and requires input from radiologists, whose time is scarce and expensive. The lack of large and
varied databases is further aggravated by privacy concerns and strict regulation of personal
health information. Additionally, the distribution of samples across brain tumor types is
extremely skewed, and rare forms of the disease make generalization during training difficult. As
a result, CNNs trained on small or biased datasets show low accuracy and limited versatility for
redeployment and further modification [4].

1.2.2. Computational Complexity

CNNs are complex and consume substantial hardware resources during both training and
deployment. Training a CNN involves passing data through many layers of convolution, pooling, and
activation, which is time-consuming and requires a powerful GPU or cloud services. At inference
time the resource requirements remain high, which is a problem in real settings such as rural
clinics or relatively small hospitals. The energy demand and latency of CNNs also hamper
scalability, especially where timely decision-making is imperative [5].

1.2.3. Limited Contextual Understanding

CNNs are good at capturing localized spatial relations in an image, such as edges,
boundaries, textures, and shapes, but are less suited to capturing the overall scene. This matters
in brain tumor detection, where the boundary between cancerous and healthy tissue is often
blurred. Some tissue may be surrounded by tumor, or a tumor may have subtle features that are less
conspicuous than distinct margins would imply, and these cases call for a broader view of the
image. Because most CNNs extract features from a relatively small spatial region at a time, they
are prone to misclassification, inexact segmentation, or failure to identify the location of the
tumor accurately, all of which reduce diagnostic accuracy [6].

These challenges create a need for more effective models that can overcome the
drawbacks of CNNs. An ideal model should perform well on small datasets, use as little
computational power as possible, and capture both local and global characteristics of MRI scans.
Resolving these points is important for building diagnostic tools that are accurate in real time,
cost-effective, and usable across different healthcare settings. Transformer-based architectures,
which use attention mechanisms to analyze dependencies comprehensively, are a promising way to
overcome these difficulties. In this work, Data-Efficient Image Transformers (DeiT) are adopted to
address these issues and provide a new approach to effective and precise brain tumor detection and
classification.

1.3 Motivation

This research is motivated by the need for real-time, point-of-care diagnostic tools in
clinical practice. Detecting brain tumors early and accurately is increasingly important for
informing the treatment regime and improving patients’ likelihood of survival. However, current
diagnostic methods often fail to provide prompt, accurate results, especially in critical
situations where decisions must be made urgently. The interpretation of MRI scans by radiologists
is time-consuming, prone to error, and generally insufficient to meet the significant demand for
diagnostic imaging in today’s healthcare systems. Further problems arise from tumor
characteristics that are difficult to determine, including size and boundary irregularities; these
can lead to misdiagnosis or delayed intervention [7].

Deep learning models such as Convolutional Neural Networks (CNNs) have shown that these
challenges can be addressed with Artificial Intelligence (AI). CNNs have brought considerable
advances in automated medical imaging, but their real-time use in clinical environments is still a
challenge. CNNs are computationally expensive and depend on a large set of annotated examples,
characteristics that do not scale well. In addition, CNNs are less capable of capturing the larger
context needed for precise tumor localization and classification when the tumor presents as
complex or diffuse.

Transformers have emerged as effective rivals to CNNs because of their ability to model
long-range dependencies in high-dimensional data. Whereas a CNN processes only a local region of
an image at a time, a transformer, by virtue of its self-attention mechanism, can relate all parts
of an image at once. This capability is particularly critical in medical imaging, where
understanding the tumor together with the surrounding structures is vital for decision-making.
However, standard transformer models are slow and require large datasets for training, which makes
them unsuitable for many medical applications.

To avoid such drawbacks, architectures such as Data-Efficient Image Transformers (DeiT)
adapt the transformer for training on smaller datasets. Performance and computation analyses of
DeiT indicate that the model is as capable as, or superior to, CNNs while requiring fewer
processing resources. Because this approach can pick out the features that matter in MRI scans and
can perform well even with a low volume of data, it could be a more fitting option for real-time
detection of brain tumors. DeiT thus offers the accuracy, speed, and scalability that could
transform diagnostic practice in the clinic. This research stems from DeiT’s potential to mitigate
the drawbacks of previous approaches and to support the creation of new, fast diagnostic systems
in the healthcare sector [9].

1.4 Objectives

This research proposes a new method for the identification and classification of brain
tumors using the Data-Efficient Image Transformer (DeiT). Considering the drawbacks of
conventional diagnostic tools and the critical need for precise, efficient, and scalable methods,
the objectives are set out as follows:

1.4.1. Design of Accurate Model

The framework developed in this work is based on DeiT and focuses on the classification
of brain tumor types. The objective is to train DeiT to make the best use of both global and local
features in MRI scans. It focuses on the macro- and micro-structure of the most frequent primary
brain tumors, namely gliomas, meningiomas, and pituitary tumors, with the aim of yielding better
results than other techniques. In addition, bounding box localization is implemented so that the
model not only classifies scans containing tumors but also pinpoints the tumor region with high
confidence.
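
Bounding box localization is usually scored with Intersection over Union (IoU), the overlap metric reported in the abstract. The sketch below shows how IoU can be computed for two axis-aligned boxes. The `(x1, y1, x2, y2)` corner convention and the function name `box_iou` are assumptions made for this example; the thesis does not specify its box format.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2), with x1 < x2 and y1 < y2."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle; zero if the boxes do not overlap.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

predicted = (0, 0, 10, 10)      # hypothetical predicted tumor box
ground_truth = (5, 5, 15, 15)   # hypothetical annotated tumor box
print(round(box_iou(predicted, ground_truth), 4))  # 25 / 175 -> 0.1429
```

An IoU of 1.0 means the predicted box coincides with the annotation, and 0.0 means no overlap at all.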

1.4.2. Enhancing Data Efficiency

Medical imaging datasets are usually scarce because labeling them is expensive and
technically demanding. To overcome this limitation, this research employs data augmentation,
transfer learning, and fine-tuning. Augmentation methods such as image rotation, flipping, and
intensity variation increase the number of training samples and make the model more robust.
Transfer learning starts from pretrained weights in order to reduce the amount of training needed
on the target task. Unlike approaches that rely on very large datasets, this work aims to maximize
the model’s performance on relatively small, domain-specific datasets.

1.4.3. The present work is aimed at enhancing the computational efficiency

In real-world clinical applications, computational efficiency is a crucial factor. The
model has to extract useful information from MRI scans quickly to meet time-sensitive
decision-making needs. In this work, the DeiT architecture is adapted to increase its processing
speed, with a target of less than 1 second per image. Methods such as model pruning and
quantization will also be adopted to reduce computational cost without substantially reducing the
model’s accuracy. This optimization makes the model deployable in resource-limited environments
such as rural clinics and other small healthcare facilities.
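
The two cost-reduction ideas named above can be illustrated on a single weight matrix: magnitude pruning and, as an assumed reading of the precision-reduction step, symmetric 8-bit quantization. This is a NumPy sketch under those assumptions, not the procedure applied to the DeiT model in this work; the matrix size, the 50% sparsity level, and all function names are hypothetical.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    # Zero out the fraction `sparsity` of weights with the smallest magnitude.
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]   # k-th smallest magnitude
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

def quantize_int8(weights):
    # Symmetric quantization: map [-max|w|, +max|w|] onto the integers [-127, 127].
    max_abs = float(np.max(np.abs(weights)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Recover approximate floating-point weights from the int8 codes.
    return q.astype(np.float64) * scale

rng = np.random.default_rng(0)
weights = rng.normal(size=(64, 64))       # stand-in for one layer's weights
pruned = magnitude_prune(weights, 0.5)    # assumed 50% sparsity target
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print("fraction zeroed:", np.mean(pruned == 0))
print("max quantization error:", np.max(np.abs(restored - weights)))
```

Pruning reduces the number of non-zero multiplications, while quantization cuts storage per weight from 32 or 64 bits to 8 at the price of a rounding error bounded by half the scale. Whether accuracy is preserved has to be checked empirically on the trained model.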

1.4.4. Maintaining the capability of scaling up and down and being flexible or changeable

Scalability and adaptability are critical characteristics of a model intended for
real-life use. The purpose of this work is to develop a framework that is not constrained to a
specific dataset, imaging protocol, or scanner model. The research validates the model using
datasets from different institutions, which reduces bias arising from demographic and technical
variation. In addition, the project outcomes are intended to be transferable to CT scans or other
imaging modalities, broadening the framework’s application in healthcare. By achieving these
objectives, this study aims to provide a sound, fast, and highly scalable solution to the brain
tumor classification problem, supporting improvements in clinical procedures and, consequently,
in patients’ lives [9].

1.5 Challenges

Approaches such as Data-Efficient Image Transformers (DeiT) represent important strides forward
for brain tumor classification; however, difficulties remain in applying them to medical imaging.
If these challenges are not addressed, the applicability and usefulness of such models in clinical
practice may be constrained. The key challenges include the following:

1.5.1. Data Security

High-quality annotated datasets are fundamental to developing and testing ML models,
and this is especially true in medicine. In medical imaging, however, obtaining such datasets is
still very hard. Annotating MRI scans takes a lot of time and money, as it should be done only by
specialized radiologists. Additional challenges are privacy issues and regulatory limits on the
use of patient information. Moreover, some types of brain tumor are quite rare, which leads to
imbalanced datasets and under-representation of various tumor types. Such scarcity compromises the
capacity of models to generalize across different patient populations and imaging conditions [10].

1.5.2. Variability in MRI Data

MRI data are highly variable because of differences in scanner type, sequence,
resolution, and patient population. For example, different MRI machines can produce images with
different intensity ranges or pixel densities. Differences in technical factors, such as contrast
settings or slice thickness, cause further variation. Patient-related factors, including motion
artifacts and anatomical differences, add to the challenge. These differences create difficulties
for both model training and assessment, because a model may not transfer what it has learned from
one dataset to data obtained from a different hospital or different patients [11].

1.5.3. Computational Requirements

Although DeiT is optimized relative to traditional transformer models, it still has to
process complete MRI scans, which is computationally expensive. Both training and inference of
transformer-based models require significant computation, particularly for large 3D MRI data. This
is a challenge in settings that may not have high-end hardware, such as rural clinics or small
health facilities. In addition, the energy required to train large models is a growing concern as
ecological awareness rises [12].

1.5.4. Ethical and Regulatory Issues

The application of AI models in a health context is subject to ethical and legal norms.
The security and privacy of data are always important, and patient identity in medical records
must be properly protected. Models must therefore be developed with care to remove biases related
to factors such as age, gender, and ethnicity that may lead to differential healthcare delivery.
It is equally crucial to understand and adhere to laws such as GDPR or HIPAA, which govern the
legal and ethical use of AI in clinical practice. Additionally, the ability to explain the model’s
decisions is a crucial criterion for building trust with healthcare professionals and patients
[13].

1.6 Methodology Overview

This research outlines a robust method that uses Data-Efficient Image Transformers
(DeiT) for the classification and segmentation of brain tumor images. The methodology centers on
fine-tuning the model for clinical use, where usability criteria such as speed and precision are
paramount. The key steps are as follows:

1.6.1 Preprocessing

Preprocessing is an essential step that brings the input MRI scans into a consistent
form. Normalization is used to bring pixel intensity values onto a common scale, reducing
contrast variations that result from different MRI scanners or acquisition protocols. Noise reduction
methods such as Gaussian filtering are used to suppress noise arising from patient movement or
equipment variation, which can obscure tumor boundaries. Data augmentation with rotations, flips, and
intensity changes increases the diversity of the training dataset, thereby reducing the risk
of overfitting and improving performance under varied imaging conditions [14].
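
The preprocessing operations named above can be pictured with a minimal, dependency-free Python sketch. It is only an illustration under stated assumptions: images are plain 2D lists, normalization is min-max scaling to [0, 1], the filter is a fixed 3x3 Gaussian kernel, and the function names are hypothetical rather than taken from the actual pipeline.

```python
from typing import List

Image = List[List[float]]

# Fixed 3x3 Gaussian kernel (an assumed choice); its weights sum to 16.
KERNEL = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]


def min_max_normalize(img: Image) -> Image:
    """Rescale pixel intensities to the range [0, 1]."""
    flat = [p for row in img for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant image: avoid division by zero
        return [[0.0 for _ in row] for row in img]
    return [[(p - lo) / (hi - lo) for p in row] for row in img]


def gaussian_blur3(img: Image) -> Image:
    """Apply the 3x3 Gaussian filter, replicating edge pixels at the border."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += KERNEL[dy + 1][dx + 1] * img[yy][xx]
            out[y][x] = acc / 16.0
    return out


def horizontal_flip(img: Image) -> Image:
    """Mirror the image left to right (one simple augmentation)."""
    return [list(reversed(row)) for row in img]
```

In practice a library such as OpenCV would perform these steps far more efficiently; the sketch only makes the arithmetic explicit.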

1.6.2. Model Training

The DeiT model is trained on the Brain Tumor Segmentation (BraTS) dataset, which is a
standard benchmark dataset in brain tumor research. Transfer
learning allows the model to reuse weights learned on larger
and more extensive datasets, which cuts training time and improves accuracy
when working with a relatively small dataset of medical images. Hyperparameter
tuning aims to find the learning rate, batch size, and other settings that give the most
efficient and effective learning.

1.6.3. Evaluation

The model is evaluated using standard metrics: accuracy, precision,
recall, F1-score, and Intersection over Union (IoU). Together they enable a comprehensive evaluation of the model's performance on
classification and on the precise localization of the regions that contain the tumor.
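
For reference, the sketch below shows how such metrics are commonly computed. It assumes binary labels for the classification metrics and axis-aligned boxes given as (x1, y1, x2, y2) for IoU; precision is included because the F1-score is defined from precision and recall. The function names are illustrative, not part of the evaluated system.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def box_iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```

For a multi-class problem such as tumor-type classification, these per-class values would typically be averaged across classes; the averaging scheme is not specified in the text.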

1.6.4. Comparison

The results are compared with traditional CNN-based models, such as ResNet and
EfficientNet, to establish the effectiveness of DeiT. The comparison validates the superiority of DeiT in
terms of both computational efficiency and diagnostic accuracy, making it a potential candidate
for real-time clinical applications [15].

1.7 Contributions

This study contributes to medical imaging research,
especially in the area of brain tumor detection and characterization. To address the shortcomings
of existing methods and to propose new approaches that drive research forward, this work builds on Data-Efficient Image
Transformers (DeiT). The primary contributions are as follows:

1.7.1. Efficient Classification

This work demonstrates that DeiT can be a viable approach to classifying brain tumors.
Through its self-attention mechanism, DeiT captures both global and local features within MRI
scans, allowing the proper classification of various types of tumors: gliomas, meningiomas, and
pituitary tumors. The model can be
fine-tuned to compensate for performance limitations inherent to CNN-based models. This
contribution shows that DeiT is capable of processing high-dimensional medical images with
little reliance on large datasets, which addresses one of the biggest problems
affecting medical imaging [16].

1.7.2. Real-Time Applicability

The framework is intended to work in real time in clinical practice, especially where
decisions have to be made quickly. This research optimizes the DeiT architecture
for performance, bringing inference time to less than a second per image so that it is
useful in time-critical use cases. Techniques such as model pruning and post-training
quantization are used to reduce computational cost and model complexity, making the
model workable even in resource-poor settings such as rural healthcare centers [16].
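
To make the two compression techniques named above concrete, here is a minimal Python sketch operating on a flat list of weights. The specific variants shown (magnitude-based pruning and symmetric per-tensor int8 quantization) are assumptions chosen for illustration; the text does not state which schemes or settings are actually used, and the function names are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    pruned, removed = [], 0
    for w in weights:
        if abs(w) <= threshold and removed < k:
            pruned.append(0.0)
            removed += 1
        else:
            pruned.append(w)
    return pruned


def quantize_int8(weights):
    """Symmetric per-tensor quantization; returns (integer codes, scale)."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    codes = [max(-127, min(127, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]
```

Pruning reduces the number of non-zero weights, while quantization stores each remaining weight in 8 bits instead of 32, at the cost of a rounding error of at most half the scale per weight.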

1.7.3. Improved Localization

Precise localization of the tumor is very important in determining the best treatment
plan. This work applies bounding box methods to improve the localization ability of
the DeiT model, so that tumor areas in MRI images are precisely delineated. This also enhances the
accuracy of diagnosis and supports decisions regarding tumor size and location,
helping clinicians during diagnosis and treatment [16].

1.7.4. Ethical Considerations

Besides the technical design, this study also addresses significant ethical
and legal concerns regarding AI in healthcare. It follows data protection
laws and employs techniques to keep models balanced across age, gender, and ethnicity where this
information is available in the training dataset. In addition, model interpretability is emphasized as a
significant priority, since it allows clinicians to trust the results of the model. These measures support
the usefulness of AI within clinical facilities and the ethical, fair, and equal
delivery of health care to all [16].

1.8 Summary
This chapter highlighted the significance of brain tumor detection, the limitations of traditional
methods, and the promise of DeiT as a transformative solution. It outlined the research objectives,
methodology, challenges, and key contributions, emphasizing DeiT's potential for accurate,
efficient, and real-time medical imaging. The next chapter, Literature Review, will explore
existing research on brain tumor detection using deep learning. It will focus on the limitations of
CNN-based models and the advancements of transformers, establishing the foundation for the
research gaps addressed in this study.

CHAPTER 2

LITERATURE SURVEY

AI has contributed significantly to improving the detection and classification of brain
tumors. The field has moved from simple manual interpretation of MRI scans to
complex automated techniques; however, fundamental issues with efficiency, accuracy, and
scalability persist. Although once the gold standard of diagnosis, manual interpretation depends
on radiologist experience and knowledge, which makes it slow, labor-intensive,
and potentially inconsistent. These limitations created the basis for adopting
automated methods.

CNNs have played a crucial role in advancing automated brain tumor detection in
the recent literature. CNNs capture hierarchical features, ranging from
basic ones such as edges up to higher-level patterns, which makes them well suited to the classification and
segmentation of tumors. Notable architectures include ResNet and U-Net, which have
achieved high accuracy in tumor identification and delineation tasks. However,
they have several drawbacks. First, CNNs are relatively slow in terms of
processing. Second, they depend heavily on large annotated datasets, and
data labeling is cumbersome and expensive, especially in the medical sector. Further, CNNs are
computationally intensive and therefore require powerful hardware both for training
and for inference. In addition, CNNs have a limited ability to learn the global context
of images, so in complex cases the distinction between tumors and
healthy tissue may be inaccurate.

Transformer-based models are reshaping the field of computer vision, including
medical imaging. Unlike CNNs, transformers use self-attention mechanisms
to process both local and global dependencies in data, which makes them a better
fit for more exhaustive image analysis tasks. Data-Efficient Image Transformers (DeiT)
have been proposed as a variant that tries to overcome the high computation
and data needs of the base transformers. DeiT is designed for the low-data regime, a prevalent
issue in medical imaging, and offers comparable performance under hardware constraints. Its
ability to analyze MRI scans with lower hardware and time requirements
suggests suitability for real-time application in the clinic.

In addition to the above approaches, bounding box techniques improve
automated detection by locating the tumor areas specifically within MRI scans.
Bounding boxes, when incorporated into DeiT, give a more specific indication of tumor
position so that boundaries can be effectively distinguished. This improves both classification
and localization, and makes the results more valuable for clinicians. Nonetheless,
challenges remain, including real-time processing, model interpretability, and the ability of the
same model to perform well across different datasets and imaging protocols. Solutions to these
challenges are critical, yet still absent in the literature, for establishing improved, scalable, and highly effective approaches for
brain tumor detection and classification.

2.1 Traditional Methods

Diagnosis of brain tumors has traditionally relied on MRI images that are
interpreted by radiologists. This traditional approach, as noted earlier, is useful in many ways
but also rigid, time-consuming, and labor-intensive. While interpreting scans, radiologists need to
identify abnormal tissue, a process that may be significantly hampered by tumor depth
or complexity. In turn, diagnostic accuracy varies, since interpretation is involved
and depends on the individual radiologist. In recent years, the volume of diagnostic imaging data in healthcare
settings has grown rapidly, further straining manual processes to the point of needing scalable
automation.

The introduction of automated approaches in medical imaging, especially Convolutional Neural
Networks (CNNs), was a major step forward in the identification of brain tumors. CNNs are very
effective because they can learn hierarchical features of medical images.
The input of these architectures is formed of local image patches, and by analyzing
the pattern of the pixels in those regions, edges, textures, and more complex shapes such as those
seen in tumor classification and segmentation can be distinguished. State-of-the-art
CNN architectures such as ResNet and U-Net have enabled enhanced performance in the detection
and segmentation of tumors within MRI scans. These models have become the standard reference point
for research on automated analysis of medical imaging [17].

2.2 Advances in Transformers

Transformers, initially developed for natural language processing, are now widely applied in
computer vision, including medical imaging. While CNNs process data through the
convolution operation, transformers use self-attention, which lets them model
relationships within the data on both a local and a global scale. This characteristic is beneficial
for medical imagery, since in most cases precise analysis
depends on assessing both the overall picture and the fine details. For example,
identifying a brain tumor involves describing its local characteristics, such as texture
and shape, as well as its relation to the surrounding tissue. The ability to integrate these
perspectives has made transformers a preferred choice for such applications.
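
As a rough illustration of how self-attention relates every part of the input to every other part, the following Python sketch computes scaled dot-product attention over a few toy tokens. It is a deliberate simplification: queries, keys, and values are the tokens themselves, whereas a real transformer such as DeiT applies learned projections and multiple attention heads. The function names are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention with Q = K = V = tokens.

    Returns (outputs, weights); weights[i][j] is how strongly token i
    attends to token j, regardless of how far apart they are.
    """
    d = len(tokens[0])
    weights = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in tokens]
        weights.append(softmax(scores))
    outputs = [[sum(w * v[j] for w, v in zip(row, tokens)) for j in range(d)]
               for row in weights]
    return outputs, weights
```

Because each row of the weight matrix spans all tokens, a patch at one corner of an image can directly influence the representation of a patch at the opposite corner, which is the global context that stacked convolutions only reach gradually.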

Data-Efficient Image Transformers (DeiT) are a transformer architecture designed
specifically for settings with low data availability. Traditional
transformers are expensive to optimize and need large datasets to train, which has restrained their
application in medical imaging. DeiT addresses these issues by
improving the transformer architecture to reduce computational cost while using data
more efficiently. By starting from weights pre-trained on large-scale datasets and then
fine-tuning on smaller domain-specific datasets, DeiT achieves high performance without
requiring large annotated medical imaging datasets. This makes it well suited to the medical field, where
obtaining annotated data is both costly and time-consuming [19].

In the context of brain tumor detection, DeiT can match or surpass the performance of
CNNs. Its self-attention mechanism enables the model to take long-range dependencies into account,
so it performs well on classification and localization of tumors in MRI scans. Furthermore, there is
evidence that DeiT significantly reduces training time and inference latency, making it a promising
candidate for real-time clinical applications. These efficiencies, in addition to its high accuracy
and scalability, make DeiT a disruptive technology for medical imaging workflows, mainly in
environments where rapid, reliable diagnostic outputs are desired [20].

2.3 Bounding Box Techniques

Bounding box approaches are a fundamental part of object detection tasks, allowing
particular areas of interest in images to be detected. For brain tumor detection, bounding boxes play an important
role by giving the spatial cues necessary to describe the tumor in MRI images. This capability is
most important in clinical situations where the location of the disease is directly pertinent to the
treatment plan. For instance, precise identification of the tumor's location is needed to plan
surgery without harming healthy tissue, and in radiation oncology to irradiate a limited area
and spare the adjacent healthy tissue.

The integration of bounding box techniques with Data-Efficient Image Transformers
(DeiT) greatly improves performance in tumor localization. Taken together, the global and local
features learned by DeiT and the bounding box approach are well suited to
emphasizing tumor-specific regions in the MRI scan. This
combined approach enhances the efficiency of classification and localization while
addressing difficult scenarios, such as when the tumor is small or merges with
surrounding tissue structures. Even subtle irregularities become easier to detect when
DeiT works in conjunction with bounding boxes to clearly outline the tumor area.

In addition to enhancing accuracy, bounding boxes improve the interpretability of AI
models, which is a critical component in clinical adoption. By visualizing where the model
focuses, bounding boxes give clinicians intuitive
insight into how the model makes its decisions. This transparency builds trust in AI-driven
diagnostic tools, allowing their integration into workflows where reliability and accountability are
paramount. With bounding box integration, DeiT is
competitive with other efficient detectors for brain tumor detection
and treatment planning, because it combines precision and efficiency with better
interpretability [21].

2.4. Gap Analysis

Despite the progress in brain tumor detection, several open issues still limit the application
of AI approaches in clinical practice. First, there is the problem of real-time
processing: most present models, including classical transformers, are computationally
demanding, which leads to high inference time. This limits their
practical use in real-life clinical situations, where timely and accurate decisions are paramount.
Although Data-Efficient Image Transformers (DeiT) offset several of these computational
demands, further efficiency improvements are still needed to support strict real-time
operation, especially in health care
infrastructure with limited computational resources [22].

The second major challenge is interpretability. DeiT and many
other deep learning models are essentially 'black boxes', so it is often
difficult for the clinician to understand why a model makes a certain prediction. This lack of
transparency breeds mistrust and rejection in critical medical areas. While methods like
attention maps and visualization tools provide partial interpretability, more comprehensive methods are
required to bring insights to clinical practice. Increasing the
interpretability of these models is crucial to gain and maintain the trust of health care
professionals and to guarantee the safe use of AI in health care settings [23].

Finally, dataset variability presents a major challenge for building AI models that
generalize. MRI datasets differ substantially from one another in
imaging protocols, scanner types, and patient populations. These variations reduce the chances of
observing similar model performance across different institutions or populations.
Models trained on particular datasets are less reliable and do not
scale easily when confronted with other datasets. To overcome these challenges, there must be
robust training processes, better data augmentation methods, and domain adaptation to
achieve consistent performance across environments and image acquisition protocols [24]. Addressing
these challenges is paramount for the successful application of AI models in clinical practice in
an efficient, understandable, and generalizable manner.

Table 2.1. Summary of Related Work in Brain Tumor Detection and Classification

| Study | Model/Technique | Dataset Used | Objective | Key Findings |
|---|---|---|---|---|
| Waqas et al. (2021) [25] | Faster R-CNN with bounding boxes | Custom Dataset | Classification and localization | Bounding boxes improved tumor localization accuracy. |
| Mehmood et al. (2022) [26] | Hybrid CNN and NASNet-large | King Saud Dataset | Tumor classification | Improved accuracy with hybrid models but required high computational resources. |
| Liu et al. (2023) [27] | Transformer-based models for segmentation | Various Medical Sets | Brain tumor segmentation | Transformers outperformed CNNs in data-scarce scenarios. |
| Basturk et al. (2021) [28] | Data-Efficient Image Transformer (DeiT) | Custom Dataset | Image classification | Showed superior performance with small datasets compared to CNNs. |
| Havaei et al. (2017) [29] | Deep neural networks for brain tumor segmentation | Public Dataset | Tumor segmentation | Highlighted the limitations of CNNs in handling noisy MRI scans. |
| Nadeem et al. (2020) [30] | Deep learning for analysis and detection | Medical Imaging | Brain tumor detection | Identified challenges in interpretability and dataset dependency. |
| Öksüz et al. (2022) [31] | Fused features with deep learning models | Biomedical Dataset | Tumor classification | Achieved better classification with fused features but needed high-quality data. |
| Aljohani et al. (2024) [32] | Metaheuristic-optimized CNN | BRATS Dataset | Diagnosis and classification | Improved optimization led to higher accuracy but required significant computational resources. |
| Kaldera et al. (2019) [33] | Bounding boxes with Faster R-CNN | Custom Dataset | Localization and classification | Bounding boxes improved interpretability in medical imaging tasks. |
| Ari and Hanbay (2018) [34] | Deep learning-based classification system | Turkish Dataset | Tumor classification | Demonstrated good accuracy but limited generalization across diverse datasets. |
| Lee et al. (2014) [35] | Self-attention transformers for brain imaging | Experimental Dataset | Brain imaging and tumor analysis | Highlighted limitations of traditional methods in high-dimensional medical data. |
| Ramdlon et al. (2019) [36] | K-nearest neighbor method | Public Dataset | Brain tumor classification | Highlighted limitations of traditional methods in high-dimensional medical data. |
| Ait et al. (2022) [37] | CNN with Bayesian optimization | Healthcare Dataset | Brain tumor classification | Bayesian optimization improved CNN performance but required detailed preprocessing. |
| Ali et al. (2020) [38] | Domain mapping with deep learning | Multiple MRI Sets | Low-grade glioma prediction | Domain mapping helped with dataset variability but lacked scalability. |
| Pereira et al. (2017) [40] | Bounding boxes for object detection in medical imaging | Custom Dataset | Localization and detection | Demonstrated the utility of bounding boxes in improving detection precision in medical imaging. |

2.5 Summary

This chapter's literature review traced the evolution of brain tumor detection from manual
interpretation to advanced deep learning methods, such as CNNs and transformers like DeiT,
and highlighted the strengths and limitations of each. Challenges
persist in real-time processing, interpretability, and dataset variability, requiring robust and
scalable solutions. The next step is to develop a DeiT-based framework with advanced
preprocessing, bounding box integration, and domain adaptation to address these gaps.

CHAPTER 3

PROPOSED METHODOLOGY

This work is built on data from the BraTS database, a large and widely used benchmark
for brain tumor research that contains detailed expert annotations of high-resolution MRI scans.
Pixel intensity normalization and noise removal are applied as preprocessing steps, and
augmentations such as rotation, flipping, shifting, scaling, zooming,
and intensity changes are used to make the dataset more diverse
and to increase the model's generalization capability.

At its core is the Data-Efficient Image Transformer (DeiT) architecture, optimized
with MRI analysis in mind. In DeiT, MRI images are split into patches via the embedding
mechanism, allowing self-attention layers to attend to both local and global image content.
To fit the model to medical imaging applications, medical
pretraining is used so that it can distinguish tumor-associated patterns. In
addition, bounding box approaches are incorporated to improve localization accuracy
and the model's ability to identify and outline the tumor contour, including in
sites where tumors are small or minimally invasive.

For the implementation, PyTorch is the main tool for model building and training, and
OpenCV is used for preprocessing. High-performance hardware such as NVIDIA GPUs makes
the computation on high-resolution MRI data fast enough to attain real-time performance. The
proposed loss function for training consists of two terms: cross-entropy, which penalizes
misclassification, and Intersection over Union (IoU), for tumor localization. To balance
efficiency and convergence stability, which is highly important in deep learning, the Adam
optimizer is employed. Hyperparameter tuning is used to set good values for basic parameters, including but not limited
to learning rate, batch size, and dropout rate, in an attempt to reach the best accuracy and
generalization. This methodology is expected to address some of the most important issues in brain
tumor detection: real-time analysis, interpretability of results, and variability of the
dataset.

3.1 Datasets

This work uses the BraTS (Brain Tumor Segmentation) dataset, which is well known and
used in much of the research in the field of brain tumors. The BraTS dataset contains MRI
scans with the tumor areas outlined by experts for important tumor types, including gliomas,
meningiomas, and pituitary tumors. The annotations include tumor sub-region labels such as enhancing
tumor, peritumoral edema, and necrotic core, which makes it a versatile dataset for training and
testing machine learning algorithms intended for brain tumor identification and categorization.

To improve the model's performance and to even out differences among the
datasets, preprocessing is performed. Normalization maps pixel intensity values to a common range to
correct for differences in imaging protocols and scanner hardware.
This provides a common baseline against which all samples can be compared.
Artifacts resulting from
patient movement or equipment irregularities need to be suppressed, so
basic noise reduction techniques such as Gaussian filtering are employed while preserving
the critical features of the tumors. Another important preprocessing step is
data augmentation, whose goal is to expand the dataset to avoid bias and overfitting. Basic transforms
consist of geometric changes such as rotation, translation, and reflection, while intensity variations
simulate different imaging conditions. These techniques not
only increase the size of the training dataset but also enhance the model's ability to
generalize to unseen data. Due to its high-quality annotations and this strict preprocessing, the BraTS
dataset is the most appropriate choice for this study. Its use allows an effective
approach to be formulated that captures the features important for recognizing brain tumor variability
and accounts for inconsistencies in tumor appearance and image acquisition conditions. These
preprocessing steps help ensure that the model is well prepared for the
requirements of dealing with real medical imaging data.

Figure 3.1 MRI Image of a Brain

Figure 3.2 BRATS Dataset

3.2 Model Design

The described framework is built around the Data-Efficient Image Transformer (DeiT),
optimized for MRI scans. Compared to earlier CNN structures, DeiT's native
self-attention captures both local and global relations in the image. This ability
to combine small-scale details with their context is crucial for medical
imaging tasks, including the detection of brain tumors, where precise spatial patterns are
essential for appropriate analysis.

The DeiT architecture begins by dividing the MRI scans into smaller, non-overlapping
patches, which form the patch embedding. These patches
let the transformer take in the image as a sequence of tokens. The
tokens then go through self-attention layers, which model the dependencies between patches at a global level.
This lets the model assess the tumor region in the context of the surrounding tissue, addressing a major drawback
of CNNs, which only consider small areas of an image at a time. To
adapt DeiT to medical imaging, transformers pre-trained on datasets that reflect the traits of MRI scans are
used. This fine-tuning process allows the model to
learn domain-specific patterns, such as differences in tissue contrast and tumor shape, thus
increasing its ability to generalize over different imaging environments and tumor types.
Bounding box approaches for tumor localization are an important part of the proposed design. The
coordinates given by bounding boxes carry spatial information about the tumor areas, helping
the model delineate the tumor margin accurately. This is particularly useful when the tumor is
small, diffuse, or irregular in shape, where both global and local
contextual knowledge are needed. The bounding boxes not only increase localization precision but also help
clinicians better understand where the tumor is located and whether the model's decision is
correct.
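
The patch-splitting step described above can be sketched in a few lines of Python. This is a simplified illustration on a single-channel image stored as a 2D list; the learned linear projection and position embeddings that follow in a real DeiT model are omitted, and `split_into_patches` is a hypothetical helper name.

```python
def split_into_patches(img, patch):
    """Split an H x W image into non-overlapping patch x patch tiles.

    Each tile is flattened row by row into one token, and tokens are
    returned in raster order (left to right, top to bottom).
    """
    h, w = len(img), len(img[0])
    if h % patch or w % patch:
        raise ValueError("image size must be a multiple of the patch size")
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tokens.append([img[y][x]
                           for y in range(top, top + patch)
                           for x in range(left, left + patch)])
    return tokens
```

Assuming the commonly used configuration of a 224 x 224 input with 16 x 16 patches, this yields (224 / 16)^2 = 196 tokens per image; the resolution actually used for the MRI slices is not stated in the text.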

In this way, the self-attention of DeiT learns the image contents, while
the bounding box integration allows for both classification and localization of the regions of
interest. This design allows the borders of the tumor to be identified while
remaining computationally efficient enough for real-time practice. By
addressing these problems of brain tumor detection, the proposed model design makes an
innovative contribution to the field of medical imaging, providing an efficient and
interpretable solution for accurate tumor detection and treatment planning.

3.3 Implementation Details

The implementation of the proposed framework leverages current tools and frameworks to
enhance speed, scalability, and reliability in dealing with high-resolution MRI data. The
models are implemented in the PyTorch deep learning framework, which is used
by many AI and ML projects. PyTorch is highly flexible and supports dynamic
computational graphs, which are well suited to implementing complex architectures such as
Data-Efficient Image Transformers (DeiT). This makes it easy to adjust the model and to add or change
components that are particular to medical imaging, such as bounding box outputs for tumors.

The image preprocessing and augmentation steps, including normalization and noise removal,
are done with OpenCV, a widely used library for image processing.
Its large set of image processing tools helps ensure that input MRI scans are prepared correctly
for the DeiT model. Resizing images to match the patch embedding dimensions, as well as
applying geometric and intensity-based transformations to build the augmented dataset, is
performed with OpenCV, which improves model performance and
generalization.

The framework is run on high-performance configurations, such as NVIDIA GPUs, for both
training and inference. GPUs handle the computational
requirements of processing high-resolution MRI scans and training
transformer-based architectures. This hardware greatly lowers the amount of time required
for each training run, which makes it possible to test various model parameters and
determine their optimal values within a reasonable amount of time. To further enhance
computational efficiency, mixed-precision training is integrated, leveraging
PyTorch's AMP (Automatic Mixed Precision) functionality. This approach reduces memory usage
and speeds up training without compromising accuracy. The implementation also includes
mechanisms for monitoring and optimizing resource usage, so that the system remains
scalable and adaptable to various deployment environments, including resource-constrained
healthcare settings.

Overall, the combination of advanced tools, efficient frameworks, and high-performance
hardware ensures that the proposed methodology is robust, scalable, and capable of addressing the
complexities of brain tumor detection in real-world clinical applications.

3.4 Training Strategy


The proposed model is trained with a strategy designed to maximize both
classification accuracy and localization precision. A hybrid loss function combines
cross-entropy loss for tumor classification with IoU loss for precise localization of the
tumor. The cross-entropy term measures how well the tumor type is classified, and the IoU term
measures how precisely the bounding boxes delineate tumor areas. This two-objective
approach ensures that the model performs well on both tasks, which are
significant in identifying a brain tumor.
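
One plausible form of such a hybrid loss is L = CE + w * (1 - IoU), sketched below in plain Python for a single sample. The additive combination, the weight `iou_weight`, and the box format (x1, y1, x2, y2) are assumptions made for illustration; the text does not specify how the two terms are weighted or which IoU loss variant is used.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy of one sample from raw class scores (log-sum-exp form)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[target]


def box_iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0


def hybrid_loss(logits, target, pred_box, true_box, iou_weight=1.0):
    """Classification term plus a localization penalty of (1 - IoU)."""
    localization = 1.0 - box_iou(pred_box, true_box)
    return cross_entropy(logits, target) + iou_weight * localization
```

With this form, a perfectly placed box contributes nothing to the loss, while a box with no overlap adds the full `iou_weight`. Note that plain IoU gives no gradient for non-overlapping boxes, which is why differentiable variants are often preferred in practice.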

The Adam optimizer is adopted for stable and efficient convergence during training. Its adaptive
learning rates and momentum make Adam particularly suitable for optimizing complex architectures
such as Data-Efficient Image Transformers (DeiT). It supports smooth convergence even with the
high-dimensional, sparse gradients commonly encountered when training on large MRI datasets.
Hyperparameter tuning is used to find the best combination of parameters, such as the learning
rate, batch size, and dropout rate. Grid search and random search are used to explore the
parameter space systematically. For example, the learning rate is set to balance training speed
against convergence stability. The batch size is chosen to make full use of GPU memory without
degrading model performance. Dropout rates are set to limit overfitting so that the model
generalizes well to unseen data.
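
The two search procedures can be sketched as follows. The search space and the `validation_score` placeholder are hypothetical; in practice the score would come from training and validating the model with each configuration.

```python
import itertools
import random

# Hypothetical search space; the values are illustrative, not the study's.
space = {
    "learning_rate": [1e-4, 3e-4, 1e-3],
    "batch_size": [16, 32],
    "dropout": [0.1, 0.3],
}


def grid_search(space):
    """Yield every combination of the listed values."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def random_search(space, n_trials, seed=0):
    """Yield n_trials configurations, sampling each parameter independently."""
    rng = random.Random(seed)
    for _ in range(n_trials):
        yield {k: rng.choice(v) for k, v in space.items()}


def validation_score(config):
    """Placeholder objective standing in for a real train-and-validate run."""
    return -abs(config["learning_rate"] - 3e-4) - abs(config["dropout"] - 0.1)


best = max(grid_search(space), key=validation_score)
print(best)
```

Grid search evaluates all 12 combinations here, whereas random search evaluates a fixed budget of trials, which tends to matter more as the number of parameters grows.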

Early stopping is also used to improve training efficiency: validation performance is monitored,
and training halts once it stops improving, which prevents overfitting and unnecessary
computation. Together, these strategies add to the scalability of the framework, allowing it to
handle a variety of datasets and adapt to real-world clinical scenarios. The training strategy
supports high accuracy and reliability, making the model a robust tool for brain tumor detection
and localization.
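
Early stopping as described above can be sketched with a small helper. The `patience` and `min_delta` parameters and the loss values are assumptions for illustration; the text does not state the settings actually used.

```python
class EarlyStopping:
    """Stop when the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch and return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Hypothetical validation losses, one per epoch.
losses = [0.90, 0.70, 0.60, 0.61, 0.62, 0.60, 0.59]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, val_loss in enumerate(losses):
    if stopper.step(val_loss):
        stopped_at = epoch
        break
print(stopped_at, stopper.best)  # 5 0.6
```

In this run the loss last improves at epoch 2, so training stops three epochs later and the final value of 0.59 is never reached, which shows the trade-off that the patience setting controls.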

3.5 Summary

The proposed methodology leverages the BraTS dataset with rigorous preprocessing,
integrates the DeiT architecture for capturing local and global MRI features, and employs
bounding boxes for precise tumor localization. Implementation uses PyTorch and OpenCV on
high-performance NVIDIA GPUs, with a hybrid loss function and Adam optimizer ensuring robust
training. Hyperparameter tuning and early stopping enhance efficiency and scalability. The next
step involves validating the model’s performance on diverse datasets, optimizing for real-time
applications, and comparing results with existing methods to establish its clinical reliability.

CHAPTER 4

RESULTS AND DISCUSSION

The proposed framework for brain tumor detection and classification is evaluated
thoroughly using key performance metrics, comparative analysis, real-time efficiency, and
qualitative results to determine its effectiveness and applicability in a clinical setting.

4.1 Performance Metrics

The proposed model was evaluated with key performance metrics to assess its reliability,
robustness, and overall performance in classifying and localizing brain tumors. These metrics are
accuracy, precision, recall, F1-score, and Intersection over Union (IoU), which together describe
the capability of the framework to handle complex tasks in medical imaging. Accuracy is the first
indicator of how well the model identifies the tumor type across a dataset. The proposed framework
achieved an accuracy of 98%, indicating high reliability in classifying gliomas, meningiomas,
pituitary tumors, and non-tumor regions. This suggests that the model is robust to varied tumor
morphologies and imaging variations.

Precision measures the proportion of correctly identified tumor predictions among all tumor
predictions made. A precision score of 97% reflects the model's ability to minimize false
positives, which is critical in medical diagnostics where overestimating the presence of a tumor
can lead to unnecessary treatments or interventions.

Recall measures the percentage of actual tumors that the model correctly identifies. A recall
score of 96% indicates how well the model catches actual positive cases without missing possible
diagnoses. Together, precision and recall show that the model keeps both false positives and
false negatives low.

The F1-score, the harmonic mean of precision and recall, summarizes the trade-off between the two
in a single value. The framework achieves an F1-score of 96.5%, which matters most in situations
where high precision and high recall must be achieved simultaneously. Such balanced performance
supports applicability in clinical environments, in which both false positives and false negatives
need to be minimal.

For localization tasks, the IoU (Intersection over Union) score was used to assess the precision
of tumor boundary delineation. An IoU score of 93% indicates how accurately the model localizes
tumors in MRI scans. This is crucial in treatment planning because precise localization of tumors
is the basis for successful surgical or radiation therapies.
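
The precision, recall, and F1 definitions used in this section can be written out as a short sketch. The confusion counts are hypothetical and are not taken from the study; the last line only checks that the reported precision (97%) and recall (96%) are arithmetically consistent with the reported F1-score.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from true positive, false positive, false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for one class, not the study's confusion matrix.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(p, r, round(f1, 4))  # 0.9 0.75 0.8182

# Consistency check on the reported values: P = 0.97, R = 0.96.
print(round(f1_from(0.97, 0.96), 3))  # 0.965
```

For a multi-class problem such as this one, these quantities are usually computed per class and then averaged; the averaging scheme is not stated in the text.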

Overall, these metrics confirm the high accuracy, reliability, and precision of the framework in
both classifying and localizing tumors. This level of performance points to the potential of the
proposed methodology as a robust tool for real-world brain tumor diagnostics. Future work will
focus on further improving these metrics and on handling edge cases for stronger reliability and
applicability.

Figure 4.1 Performance metrics of the Proposed Model

4.2 Comparative Analysis

The performance of the proposed DeiT-based framework was compared against baseline CNN models,
namely ResNet and EfficientNet, which are well established in medical imaging tasks. The
comparison covered accuracy, localization precision measured with IoU, and overall robustness
across complex tumor patterns. CNNs such as ResNet and EfficientNet mainly extract local features,
such as edges and textures within the images, and have performed well in the diagnosis of brain
tumors. ResNet delivered an accuracy of 94%, and EfficientNet performed slightly better at around
95%. However, the DeiT-based framework outperformed both models with an accuracy of 98%. This
improvement can be attributed to DeiT capturing both global and local image context through its
self-attention mechanism. Because DeiT processes MRI scans holistically, it integrates spatial
relationships between tumor regions and surrounding tissues more precisely and reliably.

Localization capability was tested using the IoU metric, which measures the overlap between
predicted and ground-truth tumor boundaries. ResNet scored 82% IoU, while EfficientNet scored 85%
IoU. Although these scores reveal reasonable localization performance, the DeiT architecture
scored 93%. This improvement demonstrates the advantage of combining bounding box techniques with
the self-attention mechanism of DeiT, which allows the network to draw clear boundaries around
tumors, even small or diffuse ones.

A further advantage of DeiT over CNN-based models is that it generalizes well across diverse tumor
patterns and imaging conditions. CNNs often struggle with complex tumor morphologies and
variations in MRI protocols, whereas DeiT's transformer-based architecture handles these
challenges by modeling long-range dependencies and adapting to global image context. Besides
better performance metrics, DeiT exhibited better inference efficiency when run on
high-performance GPUs. This further shows its potential in real-time clinical applications, where
both accuracy and speed of image processing are crucial. To conclude, the proposed DeiT-based
framework demonstrated higher accuracy, localization precision, and generalization performance
than ResNet and EfficientNet. DeiT thus emerges as a promising approach to improve brain tumor
detection and classification performance in realistic clinical environments.

Figure 4.2 Comparative Analysis of Accuracy Among Models

Figure 4.3 Comparative Analysis of IoU Among Models

4.3 Real-Time Efficiency

Real-time computational efficiency is a critical requirement for integrating automated diagnostic
tools into clinical workflows, where timely decision-making can make a substantial difference to
patient outcomes. The proposed DeiT-based framework was evaluated in terms of its inference time
and computational efficiency to assess its suitability for real-world applications. Running on
high-performance NVIDIA GPUs, it achieves an inference time of 0.5 seconds per MRI scan. This
allows the system to meet the real-time requirements of clinical settings, where fast and accurate
processing is essential. The CNN-based baselines required longer inference times: 1.2 seconds for
ResNet and 1.0 seconds for EfficientNet. These results demonstrate the computational advantage of
the DeiT architecture over traditional CNN-based methods. The architecture allows DeiT to capture
both local and global features within the image without relying on computationally intensive
convolution operations. Its patch-based embedding splits an image into smaller fragments that are
processed as a sequence of tokens, which keeps memory usage and processing overhead low. These
design aspects make DeiT not only accurate but also computationally efficient.
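
Per-scan inference times of this kind are usually measured with warm-up runs and repeated trials. The sketch below shows one way to do so; `dummy_predict` is a hypothetical stand-in for a model's forward pass, and the warm-up and run counts are assumptions rather than the protocol used in this study.

```python
import statistics
import time


def measure_latency(predict, sample, warmup=3, runs=20):
    """Return the median wall-clock seconds of predict(sample) over several runs."""
    for _ in range(warmup):  # warm-up calls exclude one-time setup costs
        predict(sample)
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(sample)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def dummy_predict(scan):
    """Hypothetical stand-in for a model's forward pass."""
    return sum(v * v for v in scan)


latency = measure_latency(dummy_predict, [0.5] * 10_000)
print(f"median latency: {latency:.6f} s")
```

When timing a model on a GPU, the device generally has to be synchronized (for example with `torch.cuda.synchronize()` in PyTorch) before each clock reading, because GPU work is launched asynchronously.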

Further efficiency improvements were obtained with mixed-precision training, using PyTorch's
Automatic Mixed Precision. Mixed-precision training performs calculations in lower-precision data
types, which consumes less memory and accelerates computation without sacrificing accuracy.
Combined with optimized GPU configurations, it significantly reduces both training and inference
times, making the model scalable to deployment in resource-constrained environments.

This efficiency is a specific advantage of the framework, especially in high-volume clinical
settings. Lower computational cost allows rapid turnaround of numerous MRI scans processed in
parallel. With fewer bottlenecks, radiologists and clinicians can focus on decision-making rather
than waiting for output. Because it needs less computation, DeiT also uses less energy, which
supports its sustainability in healthcare technology. These results indicate that the framework
surpasses the real-time efficiency of the CNN architectures. Fast inference, arising from the
architectural design together with optimized training, makes the DeiT-based model an efficient
candidate for real-world clinical detection of brain tumors. Future work will focus on further
optimization so that this efficiency is maintained across different hardware environments.

Figure 4.4 Real-Time Computational Efficiency Comparison

4.4 Qualitative Results

Qualitative analysis of the DeiT-based framework provided insight into its strengths and
limitations in the detection and classification of brain tumors. Visual outputs, such as bounding
boxes and attention heatmaps, were analyzed to assess how well the model localizes tumors and how
interpretable its predictions are.

Successful cases showcased the model's capability to identify tumor regions with high precision.
The bounding boxes produced by the localization component of the model aligned consistently with
the annotated ground truth and traced the boundaries of the tumors accurately. This was especially
so in MRI scans with clear margins, where even small or unusually shaped tumors were localized
precisely by the model. Even in harder cases with more dispersed tumor structures, the model
demonstrated strong localization accuracy in distinguishing tumor areas from non-tumor regions.
These findings point to the strength of the self-attention mechanism within the DeiT approach,
which attends to both local and global image features for accurate tumor detection.

Heatmaps derived from the model's attention layers further supported the explainability of its
predictions. These heatmaps depicted the parts of the image the model focused on during
classification and localization, making it intuitive to understand what the model was doing. The
model's attention was concentrated on tumor regions, so its outputs matched clinical expectations,
which helps to build trust and usability among healthcare professionals.

Qualitative evaluation, however, indicated failure cases, mainly in scans with highly
ambiguous characteristics. The model sometimes failed to differentiate between tumor regions and
surrounding edema, leading to overestimation or underestimation of the tumor boundaries. Such
problems occurred mainly in scans with overlapping intensity profiles between tumors and
adjacent tissues. Such limitations point to the need for further optimization, especially in enhancing
the model's ability to interpret complex imaging scenarios.

The DeiT-based framework showed promising results, with qualitative outputs consistent with
accurate localization and more understandable predictions. Bounding boxes and attention heatmaps
proved to be important tools for understanding the model's decision-making. However, better
handling of failure cases and extension to multi-modal imaging data will be crucial for adding
robustness and ensuring full diagnostic capability in clinical applications. Further research will
be directed toward these avenues to refine the proposed framework for real-world clinical usage.

Figure 4.5 Qualitative Analysis of DeiT-Based Framework

4.5 Final Output Visualization

The final output of the proposed DeiT-based framework showcases its detection and localization
capabilities for brain tumors. The results on the test set give examples of tumor localization,
with bounding boxes overlaid on the main areas of interest and colored overlays derived from
attention. These outputs not only show high-quality performance but also demonstrate the
interpretability needed for clinical use cases.

Every bounding box represents an identified tumor region and marks the boundary between the tumor
and neighboring tissues. The color attention maps show what contributed most toward the
predictions made by the model, making its choices transparent. Outputs aligned with the ground
truth annotations achieved a high Intersection over Union of 93%, indicating accurate tumor
localization.

Figure 4.6 Final Output Visualization of Tumor Detection

This visualization illustrates the robustness of the DeiT-based framework across varied tumor
morphologies, even where tumors are small, diffuse, or irregular in shape. By using a transformer
architecture, the model offers a reliable and scalable approach to detecting brain tumors.

In conclusion, the final outputs provide visual evidence of the framework's effectiveness in
real-world diagnostic scenarios and of the model's potential to improve brain tumor detection and
classification in clinical practice.

4.6 Summary

The Results and Discussion chapter highlighted the superior performance of the DeiT-based
framework in brain tumor detection, achieving high accuracy (98%), precision, and IoU scores,
surpassing baseline CNN models like ResNet and EfficientNet. Real-time efficiency was
demonstrated with an inference time of 0.5 seconds, meeting clinical requirements, while
qualitative evaluations showed accurate localization and interpretable outputs via heatmaps.
However, failure cases in ambiguous scans revealed areas for further optimization. The next
chapter, Ethical and Practical Implications, will explore issues such as data privacy, bias, model
interpretability, and deployment challenges in clinical settings.

CHAPTER 5

ETHICAL AND PRACTICAL IMPLICATIONS

The integration of AI into healthcare systems, especially in critical applications such as the
detection of brain tumors, represents a significant change in medical diagnostics. Successful
adoption, however, goes beyond the efficiency of the technology; it must also take into account
several ethical and practical concerns, such as interpretability, scalability, and respect for
ethical standards, including avoiding bias and ensuring privacy in medical data. Addressing these
concerns is essential to gain trust from clinicians and patients, ensure fairness, and promote
broad acceptance in clinical environments.

5.1 Interpretability

Explainability must be at the base of any deployment of AI systems in health care, because their
decisions affect patients. Clinicians who rely on AI predictions need to understand how they are
obtained in order to incorporate them effectively into diagnostic and treatment workflows. This is
where explainable AI (XAI) techniques such as Grad-CAM (Gradient-weighted Class Activation
Mapping) become important for complex tasks such as the detection of brain tumors.

Grad-CAM creates visual explanations by constructing heatmaps that show which parts of an MRI scan
the model relied on to classify or localize. These heatmaps highlight the regions of interest that
most influenced the prediction, so that clinicians can follow the AI's decision-making process.
For example, if the model classifies the tumor type as glioblastoma, the Grad-CAM heatmap would
indicate which particular regions in the scan led to this prediction. This not only makes the
output understandable but also lets clinicians compare the model's decision with their own
expertise, thereby fostering a cooperative diagnostic process.

Interpretability matters all the more in high-stakes situations where decisions can call for
critical interventions such as surgery, chemotherapy, or radiation therapy. Grad-CAM makes the
basis of a prediction visible and allows clinicians to check whether AI predictions align with
clinical findings. This kind of transparency helps ensure that the AI is used as a
decision-support tool and not as a "black box" solution, which decreases resistance to its
adoption. Interpretability also allows for error detection. For example, if the model has
misclassified or mislocalized a tumor, Grad-CAM heatmaps can show whether the focus has been on
irrelevant areas, like peritumoral edema or artifacts, rather than on the tumor itself. A
clinician can use this information to understand where the limitations lie and to investigate
further or improve the deployment of the AI system.
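
The core Grad-CAM computation can be sketched in a few lines: each feature map is weighted by the spatial average of its gradient with respect to the target class score, the weighted maps are summed, and negative values are clipped. The tiny feature maps and gradients here are hypothetical. Grad-CAM was originally formulated for convolutional feature maps, so applying it to a transformer such as DeiT is assumed to require first reshaping the patch tokens into a two-dimensional grid.

```python
def grad_cam(activations, gradients):
    """Compute a Grad-CAM map from K feature maps and their gradients.

    Both arguments are lists of K two-dimensional lists of shape H x W.
    """
    height, width = len(activations[0]), len(activations[0][0])
    # Channel weight = spatial average of the gradient for that channel.
    weights = [sum(map(sum, g)) / (height * width) for g in gradients]
    cam = [[0.0] * width for _ in range(height)]
    for w, fmap in zip(weights, activations):
        for i in range(height):
            for j in range(width):
                cam[i][j] += w * fmap[i][j]
    # ReLU keeps only regions that support the target class.
    cam = [[max(0.0, v) for v in row] for row in cam]
    peak = max(max(row) for row in cam)
    if peak > 0:  # scale to [0, 1] for display as a heatmap
        cam = [[v / peak for v in row] for row in cam]
    return cam


# Hypothetical 2-channel, 2x2 feature maps and gradients.
acts = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [1.0, 0.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
print(grad_cam(acts, grads))  # [[0.5, 0.0], [0.0, 1.0]]
```

In this example the second channel has a negative weight, so the locations it activates are suppressed, and the heatmap highlights only the locations that raise the target class score.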

Furthermore, by demonstrating that the model focuses on clinically salient regions, Grad-CAM gives
practitioners the confidence to verify and employ the technology. It also supports the training
and education of less experienced practitioners, who may learn to notice subtle patterns in
imaging data by comparing where the AI has focused its attention with expert annotations.

Finally, interpretability is necessary for addressing ethical concerns. With a transparent model,
bias is easier to detect and reduce because clinicians can inspect the predictions, which helps
make outcomes equitable and fair across patient populations. This is crucial for both health care
providers and patients to have confidence that AI technology is reliable and accountable. Through
the interpretability that Grad-CAM provides, the gap between AI systems and clinical workflows is
bridged, and trust, accuracy, and accountability are strengthened. This means AI systems, such as
DeiT-based frameworks, can be integrated effectively into healthcare with better precision in
diagnosis and patient care.

5.2 Scalability

The scalability of the proposed DeiT-based framework is essential for extending its utility beyond
the specific application of MRI-based brain tumor detection. Its adaptability to other kinds of
medical imaging, including but not limited to CT, X-ray, PET, and ultrasound, is a prerequisite
for addressing a wider range of diagnostic challenges in clinical practice. DeiT and other
transformers are known for their ability to generalize across different data types and tasks, so
they provide a versatile foundation for expanding this framework.

For example, in CT imaging, the model can be retrained to detect lung nodules, aneurysms, or
internal organ abnormalities, which are among the key diagnostic challenges in pulmonary and
cardiovascular healthcare. In X-ray imaging, DeiT might be used to detect fractures, joint
deformities, and early-stage pathologies such as osteoarthritis or pneumonia. PET and ultrasound
imaging are widely used for functional and soft-tissue examinations and also represent potential
applications for the model, for instance in tumor staging or vascular analysis. These extensions
require fine-tuning the model on modality-specific datasets in combination with preprocessing
techniques specific to each type of imaging. For instance, because the resolution, contrast, and
noise of CT differ considerably from those of MRI, the augmentation and normalization steps have
to be adjusted. Further optimization of the model architecture could include adjusting the patch
size or adding domain-specific layers.

Scalability also covers deployment across healthcare institutions, from large urban hospitals with
rich technology to less well-equipped, remote rural clinics. The computational efficiency gained
through mixed-precision training and hardware optimization allows the model to be used across
different levels of computing infrastructure. This flexibility makes it possible to deploy
AI-driven diagnostics faster in underserved areas and, consequently, improves access to
healthcare.
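
The effect of a patch-size adjustment on computational cost can be estimated with a short calculation. The sketch assumes a square 224-pixel input and non-overlapping patches, which are common settings for vision transformers but are not stated in the text; the patch sizes are illustrative.

```python
def token_count(image_size, patch_size):
    """Number of non-overlapping square patches in a square image."""
    if image_size % patch_size:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


# Assumed input resolution of 224 pixels; patch sizes are illustrative.
for patch in (32, 16, 8):
    n = token_count(224, patch)
    # Self-attention compares every token with every other token.
    print(patch, n, n * n)
```

Halving the patch size quadruples the number of tokens and increases the number of pairwise attention comparisons roughly sixteenfold, so finer patches trade compute and memory for spatial detail.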

Scalability will also be needed when the framework is integrated into multi-modal diagnostic
workflows. Combining insights from different imaging modalities, for example fusing MRI and PET
data to analyze tumors, enhances diagnostic accuracy and gives a more complete view of the
patient's condition. Multi-modal adaptability makes DeiT a versatile tool suited to the complexity
of diagnostic needs in medicine. In a nutshell, scalability means that a DeiT-based framework
holds wider clinical applicability, since it can adapt to new imaging modalities and to particular
healthcare environments. This maximizes the value that can be obtained from the model and supports
the integration of AI into various clinical domains, improving diagnostic capability and patient
care.

5.3 Ethical Considerations

Fairness, privacy, and accountability are crucial in the application of AI models in health care.
One of the biggest issues is how to deal with bias in medical AI systems. Most often, bias arises
from training data that fails to capture the diversity of patient demographics across age, gender,
ethnicity, or geographic location. For instance, if the majority of MRI images in a dataset belong
to a given age range, the algorithm may perform poorly when applied to other age groups. The
proposed framework reduces these risks with balanced sampling, targeted data augmentation, and
fairness-aware training. This promotes comparable performance across different populations and
improves the reliability and robustness of the model.
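
One simple form of balanced sampling weights each scan by the inverse frequency of its demographic group, so that under-represented groups are drawn as often as over-represented ones. The sketch below illustrates the idea; the group labels and counts are hypothetical, and the text does not specify which balancing scheme was used.

```python
import random
from collections import Counter


def balanced_weights(groups):
    """Weight each sample by the inverse frequency of its group."""
    counts = Counter(groups)
    return [1.0 / counts[g] for g in groups]


# Hypothetical age-group labels for eight scans; "adult" is over-represented.
groups = ["adult"] * 6 + ["child"] * 2
weights = balanced_weights(groups)

# Each group's total weight is equal, so draws are balanced in expectation.
rng = random.Random(0)
sample = rng.choices(groups, weights=weights, k=1000)
print(Counter(sample))
```

Sampling with replacement in this way repeats the rarer scans more often, which is why it is usually paired with data augmentation to avoid memorizing them.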

The other ethical challenge is data privacy. Medical imaging datasets include sensitive patient
information, so the data must be protected against legal and ethical violations. The proposed
framework adheres to strict standards of data protection, including GDPR and HIPAA compliance.
Important measures include anonymization of data, secure storage, and encryption of sensitive
information. In addition, access controls and secure transfer protocols enhance data security and
protect patient confidentiality throughout the lifecycle of the AI model. In this light, the
framework of this dissertation is designed with fairness, privacy, and trust in mind for
deployment in actual healthcare settings.
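
As an illustration of the anonymization step, the sketch below removes direct identifiers from a record and replaces the patient identifier with a keyed hash. The field names, the record, and the key are hypothetical, and a step like this is only one part of a data protection process; it does not by itself establish GDPR or HIPAA compliance.

```python
import hashlib
import hmac


def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed SHA-256 hash (HMAC)."""
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def strip_identifiers(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers and keep a stable pseudonym for linking scans."""
    direct = {"patient_id", "name", "birth_date", "address"}  # assumed field names
    cleaned = {k: v for k, v in record.items() if k not in direct}
    cleaned["pseudonym"] = pseudonymize(record["patient_id"], secret_key)
    return cleaned


# Hypothetical record and key; a real key would come from secure storage.
record = {
    "patient_id": "P-0001",
    "name": "Jane Doe",
    "birth_date": "1980-01-01",
    "scan": "scan_001.nii",
}
print(strip_identifiers(record, b"example-key"))
```

Using a keyed hash rather than a plain hash means that the same patient always maps to the same pseudonym, while someone without the key cannot recompute pseudonyms from guessed identifiers.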

5.4 Summary

The chapter on Ethical and Practical Implications has highlighted the need to address
interpretability, scalability, and ethical challenges when deploying AI for healthcare. Techniques
such as Grad-CAM improve transparency and trust, while scalability ensures the framework's
adaptability to various imaging modalities and environments. Ethical considerations such as bias
mitigation and compliance with data privacy standards like GDPR and HIPAA are critical for
ensuring fairness and confidentiality. The final chapter, Conclusion and Future Scope, summarizes
the findings of the research, details the contributions of the framework, and provides some
directions for further research, such as integrating multi-modal imaging and optimizations toward
practical applications.

CHAPTER 6

CONCLUSION AND FUTURE SCOPE

The proposed framework advances automated brain tumor detection and classification. By building on
the architecture of the Data-Efficient Image Transformer (DeiT), it achieves strong performance in
MRI-based brain tumor detection, combining high accuracy and precision with efficient
localization. It reached an overall accuracy of 98% and an IoU score of 93% for localization, and
it proved robust to diverse tumor morphologies and imaging variations. The bounding box techniques
further enhanced its ability to delineate tumor boundaries precisely, even in complex cases or for
small or diffuse tumors. The framework supports real-time processing, with an inference time of
0.5 seconds per MRI scan, fulfilling a critical need in clinical workflows where timely decisions
play an essential role. Explainable AI techniques, including Grad-CAM, ensured that the model's
outputs were understandable to medical professionals and aligned with their expectations, thereby
cultivating trust. The framework also upholds strong ethical imperatives, bias mitigation
techniques, and data privacy compliance for deployment in real-world healthcare settings.

Although much progress has been made on the framework, several areas of further development would
strengthen its capabilities and extend its application. The first is to improve diagnostic
accuracy and comprehensiveness by using complementary structural and metabolic information from
multi-modal imaging, such as MRI-CT or MRI-PET fusion. This will require adapting the framework to
handle multi-modal datasets, which entails architectural changes along with the design of
efficient data fusion techniques. The second is to enhance the robustness of the framework in
different clinical settings. Model performance can vary with the imaging protocol, scanner type,
and patient demographics. This could be addressed by domain adaptation and transfer learning,
together with collaboration among multiple institutions to pool larger and more diverse datasets
for better generalization. The third is deployment: even though the framework processes scans in
real time, further optimization will be needed for resource-constrained environments such as rural
clinics with minimal computing infrastructure. Techniques such as model quantization, pruning, and
edge computing can reduce the footprint substantially with little performance loss. Lightweight
transformer architectures designed specifically for medical imaging would further help the
framework to scale.

Finally, interpretability has to be enhanced further. Techniques such as layer-wise relevance
propagation or saliency maps can provide deeper insight into the model's decision-making, building
on the success of Grad-CAM, so that AI outputs remain consistent with practical clinical
requirements and the system becomes more user-friendly and effective. In a nutshell, the
DeiT-based framework is a significant contribution to the medical imaging domain, particularly for
the detection and classification of brain tumors. It is supported by high accuracy,
interpretability, and real-time efficiency, which makes it feasible as an innovative tool in the
health care system. Nevertheless, this work should be continued and investigated further to
overcome the present difficulties, extend its area of application, and thereby achieve wide
acceptance in clinical circles.

REFERENCES

[1] A. Krizhevsky, I. Sutskever, and G. Hinton, "ImageNet classification with deep convolutional
neural networks," Advances in Neural Information Processing Systems, vol. 25, pp. 1097–1105,
2012.

[2] R. Zhang et al., "Efficient medical image segmentation with transformers," IEEE Transactions
on Medical Imaging, vol. 40, no. 1, pp. 289–299, 2021.

[3] H. Touvron et al., "Training data-efficient image transformers and distillation through
attention," in Proceedings of the 38th International Conference on Machine Learning, 2021.

[4] Y. Liu et al., "Transformers in medical imaging: A survey," Medical Image Analysis, vol. 73,
p. 102193, 2021.

[5] S. Waqas et al., "Bounding box localization in medical imaging using transformers,"
Computational Imaging and Vision, vol. 29, no. 4, pp. 341–356, 2021.

[6] K. He et al., "Masked autoencoders are scalable vision learners," in Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1604–1613,
2022.

[7] S. Mesko, "AI-driven healthcare: A transformative future," Digital Health, vol. 5, pp. 1–9,
2020.

[8] S. Bakas et al., "The BraTS challenge: Benchmarking MRI-based segmentation of brain
tumors," IEEE Transactions on Medical Imaging, vol. 36, no. 11, pp. 1993–2024, 2017.

[9] M. Waqas et al., "Bounding box techniques for tumor localization," Custom Dataset Studies in
Medical Imaging, 2021.

[10] M. Mehmood et al., "Hybrid CNN and NASNet-large for tumor classification," King Saud
Dataset Research, 2022.

[11] Z. Liu et al., "Transformer-based models for segmentation in medical imaging," Various
Medical Sets Analysis, 2023.

[12] M. Basturk et al., "Data-Efficient Image Transformer (DeiT) performance on small datasets,"
Custom Dataset Reports, 2021.

[13] M. Nadeem et al., "Challenges in interpretability and dataset dependency for brain tumor
detection," Medical Imaging Research, 2020.

[14] H. Kaldera et al., "Bounding boxes with Faster R-CNN for localization and classification,"
Custom Dataset Research Studies, 2019.

[15] M. Ali et al., "Domain mapping with deep learning for low-grade glioma prediction," Multiple
MRI Sets Analysis, 2020.

[16] M. Havaei et al., "Deep neural networks for brain tumor segmentation," Public Dataset Studies
in Medical Imaging, 2017.

[17] S. Pereira et al., "Bounding boxes for object detection in medical imaging," Custom Dataset
Reports, 2017.

[18] M. Ait et al., "Bayesian optimization with CNN for brain tumor classification," Healthcare
Dataset Research, 2022.

[19] A. Ari and D. Hanbay, "Deep learning-based classification system for brain tumors," Turkish
Journal of Electrical Engineering and Computer Sciences, vol. 26, no. 5, pp. 2275–2286, 2018.

[20] R. H. Ramdlon et al., "Limitations of traditional methods for high-dimensional medical data,"
in Public Dataset Research, 2019.

[21] M. Waqas, S. M. Hussain, M. Khan, and F. Jan, "Brain tumor segmentation and surveillance
with deep artificial neural networks," in Deep Learning for Biomedical Data Analysis, Springer,
pp. 311–350, 2021.

[22] M. Mehmood, N. Gul, M. Alam, and I. Ullah, "Improved colorization and classification of
intracranial tumor expanse," Journal of King Saud University-Computer and Information Sciences,
vol. 34, no. 7, pp. 4358–4374, 2022.

[23] Z. Liu, Y. Wang, X. Zhang, and S. Shi, "Deep learning-based brain tumor segmentation: A
survey," Complex & Intelligent Systems, vol. 9, no. 1, pp. 1001–1026, 2023.

[24] M. Basturk, A. Sarigul, and T. Kaya, "Data-efficient image transformers for medical
imaging," Medical Image Analysis, vol. 72, p. 102456, 2021.

[25] M. Havaei, A. Davy, and P. Warde, "Brain tumor segmentation with deep neural networks,"
Medical Image Analysis, vol. 35, pp. 18–31, 2017.

[26] M. Nadeem, M. Alam, and R. Masood, "Brain tumor analysis empowered with deep learning,"
Brain Sciences, vol. 10, no. 2, p. 118, 2020.

[27] M. Öksüz, E. Kaplan, and H. Çelik, "Brain tumor classification using fused features,"
Biomedical Signal Processing and Control, vol. 72, p. 103356, 2022.

[28] S. Aljohani, A. Alqahtani, and M. Alshamrani, "Automated metaheuristic-optimized approach
for diagnosing brain tumors," Results in Engineering, vol. 23, p. 102459, 2024.

[29] H. Kaldera, S. Gunasekara, and M. Dissanayake, "Bounding boxes for tumor localization," in
Advances in Science and Engineering Technology, IEEE, pp. 1–6, 2019.

[30] D. Lee, J. Kim, and S. Park, "Transformers in medical imaging," Endocrinology, vol. 155,
no. 8, pp. 2858–2867, 2014.

[31] R. H. Ramdlon, M. Yusuf, and R. Dewi, "Brain tumor classification using MRI," in
International Electronics Symposium, IEEE, pp. 660–667, 2019.

[32] M. Ait, T. Rachid, and F. Benhamou, "MRI diagnosis and brain tumor classification,"
Healthcare, vol. 10, no. 3, p. 494, 2022.

[33] M. Ali, A. Qureshi, and N. Kamal, "Deep learning for low-grade glioma prediction," Brain
Sciences, vol. 10, no. 7, p. 463, 2020.

[34] R. Val-Laillet, E. Blat, and M. Ramirez, "Changes in brain activity after obesity," Obesity,
vol. 19, no. 4, pp. 749–756, 2011.

[35] X. Zheng et al., "Deep learning in medical imaging: Challenges and opportunities," AI in
Medicine, vol. 112, p. 101984, 2021.

[36] A. Smith et al., "Generalizability of neural networks in healthcare," Medical Informatics
Today, vol. 28, no. 5, pp. 256–269, 2020.

[37] F. Isensee et al., "Self-configuring medical segmentation using nnU-Net," Nature Methods,
vol. 18, pp. 203–211, 2021.

[38] K. Johnson et al., "Medical imaging with deep learning: A primer," IEEE Transactions on AI
in Medicine, vol. 34, pp. 12–18, 2019.

[39] L. Smith and J. Tang, "Exploring transfer learning in healthcare," Computational Healthcare
Insights, vol. 15, pp. 45–60, 2022.

[40] H. Williams, "Emerging AI models in clinical diagnostics," Clinical Imaging, vol. 47, pp.
1023–1032, 2021.

PUBLICATIONS

1. Vikas Maurya and Abdul Aleem, “Towards Secure and Efficient Brain Tumor Detection:
Federated Learning for Privacy-Preserving MRI Analysis”, to be published as a book
chapter (via Conference Confluence-2025) in the book entitled “Recent Trends in Artificial
Intelligence and Data Sciences - Select Proceedings of the 15th International Conference—
CONFLUENCE 2025”, for the book series “Lecture Notes in Electrical Engineering”,
Springer Singapore, 2025 (SCOPUS Indexed).
2. Vikas Maurya and Abdul Aleem, “Efficient Brain Tumor Detection in MRI Images Using
YOLOv8: A Deep Learning Approach”, International Conference on Artificial Intelligence
and Computer Vision in Medical Domain (AICVMD-2025), BHU, Varanasi, India, 17-19
February 2025 (SCOPUS Indexed).
3. Vikas Maurya, Vikash Kumar Mishra, and Abdul Aleem, “Brain Tumor Detection Using
Image Segmentation Through Adaptive K-Means Algorithm”, Accepted to be Published
in Proceedings of International Conference on Futuristic Aspects in Science & Engineering
(ICFAiSE-2025), ICFAI University, Jaipur, India, 6-7 February 2025 (SCOPUS Indexed).
