Unified Deep Ensemble Approach for Brain Tumor
Detection and Classification using CNN YOLO
CH. Anwar Babu, A. Karthik Sai, K. Anitha
Department of Computing Technologies, School of Computing,
SRM Institute of Science and Technology, Kattankulathur

Abstract: Brain tumor detection and classification are important medical diagnostic tasks that support early detection and accurate treatment. This work uses a hybrid deep learning method to improve the accuracy and efficiency of brain tumor detection and tumor-type classification from MRI scans. Two datasets were used: a classification dataset from Kaggle and a detection dataset from Roboflow. For classification, several convolutional neural network architectures were used, such as ResNetV2, EfficientNetB0 to B4, Xception, and NASNet-Large. These models were trained separately and then combined with an ensemble method to enhance robustness and accuracy. For detection, object detection models like YOLOv5, YOLOv8, YOLOv9, SSD, and Faster R-CNN were used to localize tumors in the MRI scans. The models were trained and evaluated on separate train and test splits to provide a valid performance assessment. A web interface was implemented with the Flask framework, coupled with SQLite for secure user authentication through signup and signin modules. When the user uploads an image, it is resized and converted into an array with Keras as preprocessing steps. The preprocessed input is then fed into the trained models to produce the predictions. The multi-model ensemble classifier had the highest accuracy, 99.0%.

Keywords: Deep Learning, Brain Tumor, Classification, Detection, YOLO, CNN, Accuracy, Precision, Ensemble Model, MRI Images.

I. INTRODUCTION
Brain tumors are among the most serious and life-threatening disorders, and they raise major survival and quality-of-life concerns for patients. Early and accurate diagnosis is crucial for timely intervention and better clinical outcomes [1]. Traditionally, detection of brain tumors has relied chiefly on radiologists' visual examination of MRI scans. Though effective, this approach is time-consuming, subjective, and prone to human error, which can result in misdiagnosis or delayed intervention [2]. There is therefore a growing demand for automated diagnostic tools that can provide fast, accurate, and reproducible results in clinics. Deep learning (DL), and more specifically the Convolutional Neural Network (CNN), has transformed medical imaging over the last few years. CNNs have been very effective at learning spatial hierarchies from medical images, which has improved performance on tasks such as image classification and image segmentation [3]. However, reliance on a single architecture may not be effective in capturing the intricate nature of brain tumors, which vary in shape, size, position, and intensity pattern [4]. To overcome these shortcomings, hybrid deep learning models that combine the merits of different networks have been considered a viable solution.

This paper proposes a hybrid deep learning architecture that unifies classification and detection using state-of-the-art classification models such as EfficientNet, ResNetV2, Xception, and NASNet-Large, and detection models such as YOLOv5, YOLOv8, YOLOv9, SSD, and Faster R-CNN [5]. The system is trained on two domain-specific datasets: a classification dataset from Kaggle and a detection dataset from Roboflow. By combining ensemble learning with strong object detection models, the proposed system aims to improve prediction accuracy, recall, and mean average precision (mAP), with the goal of precise tumor classification as well as precise localization [6]. Additionally, the proposed system is served through a web-based framework built with Flask and integrated with SQLite for secure user login, providing real-time use and access in healthcare settings [7]. Apart from speeding up diagnosis, this automated pipeline reduces the reliance on manual expert interpretation, providing an efficient and scalable tool for radiologists and clinicians. In summary, hybrid deep learning models for tumor detection and diagnosis are much needed in transforming medical imaging. The research highlights the potential of ensemble CNNs and real-time detection models to present interpretable yet precise results, thus supporting early diagnosis and enhanced patient care [8].

II. LITERATURE REVIEW
Taher et al. [9] proposed an effective framework to detect brain tumors through numerous deep learning methods. Their
research aimed at automating brain tumor classification with transfer learning and convolutional neural networks. By comparing several architectures, they showed that deep learning models improve accuracy and efficiency significantly over conventional methods. Their contributions included preprocessing techniques that improve diagnostic quality by helping their model find very fine features in the MRI scans.

Reyes and Sánchez [10] investigated how accurate convolutional neural networks are for brain tumor detection and classification from magnetic resonance imaging. The article focused on the significance of architecture selection in influencing model accuracy. The authors compared multiple CNN models and stressed that deeper models such as ResNet and DenseNet are superior to shallow networks in identifying intricate tumor patterns. The article also emphasized the use of high-quality, labeled datasets in training to enhance generalization and minimize overfitting.

Rasheed et al. [11] suggested combining convolutional neural networks and attention mechanisms to improve MRI-based brain tumor classification. Their system used attention layers to direct the model to learn from the most important parts of the image. This combination improved the interpretability of the model and the confidence in its predictions. They demonstrated that attention-augmented CNNs both enhance classification accuracy and improve medical experts' understanding of the decision-making process.

Shah et al. [12] presented an effective approach to brain tumor detection using fine-tuned EfficientNet models. They demonstrated the higher performance of EfficientNet over other CNN architectures, which they attributed to its compound scaling method that trades off network depth, width, and resolution. They also suggested preprocessing methods and training mechanisms that further enhanced the accuracy and stability of the model. EfficientNet was shown experimentally to suit medical image classification because its architecture is lightweight yet comparably strong.

Ullah et al. [13] developed TumorDetNet, a deep learning model capable of both brain tumor detection and classification. Their model combined detection and classification of tumors into one framework and saved computation time while achieving high accuracy. TumorDetNet used transfer learning from pre-trained models and added new layers to enhance feature extraction. The paper concentrated on the advantages of a unified model for joint detection and classification, particularly for real-time diagnosis where both accuracy and speed are most important.

Malla et al. [14] proposed a CNN model for classifying tumors in MRI brain images that uses global average pooling to reduce the number of trainable parameters and avoid overfitting. Their work focused on creating a simple but effective model that could be used in the clinic. By reducing complexity and improving efficiency, they lowered the computational resources required while maintaining high performance.

Anaya-Isaza et al. [15] used neural networks, transfer learning, data augmentation methods, and the cross-transformer network to improve the detection and classification of brain tumors on MRI scans. Their focus was on testing how each strategy contributed to overall model accuracy. They found that transfer learning and data augmentation improved overall performance substantially, while the cross-transformer network left room for further extension because of its strong representation-learning ability.

Banerjee et al. [16] proposed a deep radiomics model for the detection and classification of brain tumors from multi-sequence MRI. Their approach extracted deep features from multiple MRI modalities, combining the abundant tumor-related information. By using the collective knowledge of multiple image sequences, it could deliver more precise and subtle classification outcomes. They also proposed that deep radiomics can contribute significantly to diagnostic accuracy, especially when used in conjunction with CNNs and other deep learning techniques.

III. METHODOLOGY
The proposed system applies a hybrid deep learning method for the detection and classification of brain tumors from uploaded MRI scans. Two specialized datasets are used: a Kaggle classification dataset and a Roboflow detection dataset. For classification, various state-of-the-art pre-trained CNN models are employed, such as ResNetV2 [20], EfficientNetB0 to B4 [19], Xception [18], and NASNet-Large [17]. An ensemble model aggregates the advantages of these networks to enhance prediction accuracy. For detection, the high-performance object detection models YOLOv5, YOLOv8, YOLOv9, SSD, and Faster R-CNN are used to label tumor areas. The models are trained and tested on separate train and test splits so that overfitting can be detected. A web-based UI is implemented using the Flask framework, along with SQLite for user authentication. Input images are resized and converted to arrays by Keras before being fed into the trained models. The system aims to deliver live tumor classification and localization via a secure and user-friendly platform.

Fig. 1. Proposed Architecture

The architecture of the brain tumor detection system (Fig. 1) starts with a dataset of labeled MRI images. The images are processed for quality improvement and noise removal through image processing. Relevant features are then extracted by the deep learning models. The processed data is
used as input in a training phase to create trained models that can classify and detect accurately. Finally, the performance of the system is measured with criteria such as accuracy, precision, recall, and mAP in order to ascertain the strength and stability of the system under real-time clinical use.

a) Collection of Dataset: First, the two primary datasets are loaded and explored, one for classification and one for detection. The Kaggle classification dataset consists of MRI images of brain tumors grouped by tumor type. The Roboflow detection dataset is for object detection and contains annotated images in which the tumors are marked with bounding boxes. The class distribution is checked at this stage for balance. Metadata such as image size, tumor types, and annotation schemes are inspected. This allows missing or corrupted files to be detected and prepares the data for further preprocessing and model training.

b) Pre-Processing:
Image Processing: Preprocessing images is important for improving model precision and maintaining consistent input formats. Through ImageDataGenerator in Keras, transformations such as rescaling pixel values to [0, 1], shear transformation to mimic distortion, zooming for depth change, horizontal flip for generalization, and reshaping to the desired input size are performed. In addition, libraries such as OpenCV and Pillow are used to resize images to normalized sizes, to generate square backgrounds that preserve aspect ratios, and to convert images into tensors. These operations widen the diversity of the training data, reduce overfitting, and ensure that image formats are compatible with the input specifications of the deep learning models.
Data Extraction to Train and Test Set: Once preprocessing is completed, the dataset is split into training and test sets, with the test set used to evaluate model performance. This separation lets the model learn patterns from the training set and then be tested independently on unseen data. The most common split ratios are 80:20 and 70:30. Care is taken to keep the classes balanced in both sets, particularly for classification tasks, so that every tumor category is well represented. For detection, the corresponding annotations are divided along with the images. This segregation enables proper measurement of metrics such as accuracy, precision, and recall, avoids data leakage, and supports sound model verification.

c) Algorithms:
Classification:
ResNetv2: ResNetv2 is employed for classification because its deep residual networks deal effectively with intricate patterns within MRI images [20]. It mitigates vanishing gradient issues, so that deeper layers make valuable contributions to model performance without overfitting.
EfficientNetB0 to B4: EfficientNet models (B0 to B4) are used for classification because they balance model depth, width, and resolution. These models offer high accuracy with fewer parameters, making them effective for brain tumor feature detection [19] without increasing computational cost.
Xception: Xception is used for classification because of its depthwise separable convolutions, which enhance model performance by focusing on efficient feature extraction and learning complex image patterns [18]. It helps in extracting fine-grained tumor features in MRI scans for enhanced diagnostic accuracy.
NASNet-Large: NASNet-Large is employed for classification because its architecture is optimized by means of neural architecture search. The model is capable of identifying complex patterns in MRI images [17], with high accuracy in distinguishing between tumor and non-tumor classes by virtue of its adaptive nature.
Ensemble: The ensemble approach aggregates the predictions of the EfficientNet, Xception, and ResNetv2 models and yields better classification accuracy. By aggregating the output of heterogeneous models, it reduces overfitting and bias and provides more stable and accurate predictions for tumor diagnosis.
Detection:
YOLOv5: YOLOv5 is used for object detection to localize tumor regions in MRI images in real time. Since it is fast and accurate in localizing tumor regions, it is well suited to detecting and classifying tumors in a single pass with fewer false positives.
YOLOv8: YOLOv8 provides improved detection of brain tumors through higher processing speed and accuracy compared to previous versions of YOLO. It delivers fast and accurate localization of tumors for use in clinical cases.
YOLOv9: YOLOv9 is used for detection since it improves on the previous models in handling different sizes and types of tumors. With its highly optimized architecture, it identifies tumors in MRI images with high accuracy and low latency for real-time use.
SSD (Single Shot Multibox Detector): SSD is used for tumor detection, providing efficient and precise object localization in images. Its multi-scale feature maps enable detection across a range of object sizes, making it suitable for identifying tumors of varying sizes in medical imaging.
Faster R-CNN: Faster R-CNN is employed for tumor detection, targeting high accuracy for both localization and classification tasks. Using region proposal networks, it can better detect brain tumors in MRI scans, supporting precision in real-world diagnostic applications.

IV. RESULTS AND DISCUSSION
Accuracy: Accuracy is the ability of the test to distinguish healthy cases and patient cases correctly. It is the proportion of true positives and true negatives among all evaluated instances.

\[ \text{Accuracy} = \frac{TN + TP}{TN + FN + TP + FP} \tag{1} \]

Precision: Precision measures the ratio of instances correctly classified as positive to all instances predicted as positive.

\[ \text{Precision} = \frac{TP}{TP + FP} \tag{2} \]
Recall: Recall refers to a model's ability to retrieve all relevant cases in a data set, that is, the ratio of true positives to all actual positive instances.

\[ \text{Recall} = \frac{TP}{TP + FN} \tag{3} \]

F1-Score: The F1 score is the harmonic mean of precision and recall, and is thus a balanced measure.

\[ \text{F1} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4} \]

mAP: Mean Average Precision (mAP) is used to evaluate object detection performance. It is the average of the AP (Average Precision) across all n classes.

\[ \text{mAP} = \frac{1}{n} \sum_{k=1}^{n} AP_k \tag{5} \]

Table I compares the performance measures Accuracy, Precision, Recall, and F1-Score for each classification algorithm. The Ensemble Model consistently performs best.
Table II compares the detection performance (Precision, Recall, and mAP) of all object detection models. YOLOv9 performs highest in all metrics.
In Figure 2, accuracy is shown in blue, precision in orange, recall in grey, and F1-score in yellow. The Ensemble Model outperforms the others.
In Figure 3, precision is blue, recall is orange, and mAP is grey. YOLOv9 surpasses all other models in detection performance.

Fig. 3. Overall Comparison – Detection Metrics
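To make the metric definitions in Eqs. (1) to (5) concrete, the following is a minimal sketch in plain Python. It assumes a binary setting with hypothetical confusion counts; the function names `classification_metrics` and `mean_average_precision` and all numbers in the example are illustrative and are not taken from the experiments. The paper does not state how the multi-class figures in the tables were averaged, so this sketch does not attempt to reproduce them.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1 from confusion counts (Eqs. 1-4)."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # Harmonic mean of precision and recall; defined as 0 when both are 0.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def mean_average_precision(ap_per_class):
    """Mean of the per-class average precision values (Eq. 5)."""
    if not ap_per_class:
        raise ValueError("ap_per_class must contain at least one value")
    return sum(ap_per_class) / len(ap_per_class)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the formulas.
    metrics = classification_metrics(tp=90, tn=85, fp=15, fn=10)
    for name, value in metrics.items():
        print(f"{name}: {value:.3f}")
    # Hypothetical per-class AP values.
    print(f"mAP: {mean_average_precision([0.9, 0.8, 1.0]):.3f}")
```

Guarding each division against a zero denominator keeps the helpers usable for a class that a model never predicts, which is a design choice of this sketch rather than something the paper specifies.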
TABLE I
PERFORMANCE EVALUATION – CLASSIFICATION MODELS

ML Model         Accuracy   Precision   Recall   F1 Score
ResNet50         0.733      0.712       0.733    0.692
EfficientNetB0   0.433      0.311       0.433    0.309
EfficientNetB1   0.321      0.315       0.321    0.258
EfficientNetB2   0.455      0.312       0.455    0.362
EfficientNetB3   0.481      0.483       0.481    0.476
EfficientNetB4   0.498      0.405       0.498    0.435
Xception         0.986      0.986       0.986    0.986
NASNetMobile     0.979      0.979       0.979    0.979
Ensemble         0.990      0.991       0.990    0.990

TABLE II
PERFORMANCE EVALUATION – DETECTION MODELS

ML Model         Precision   Recall   mAP
YOLOv5           0.953       0.909    0.964
YOLOv8           0.926       0.881    0.943
YOLOv9           0.964       0.939    0.974
SSD              0.224       0.424    0.391
Faster R-CNN     0.211       0.421    0.368

Fig. 2. Overall Comparison – Classification Metrics
Fig. 4. Upload Input Image – Classification
Fig. 5. Predicted Output – Classification
Fig. 6. Upload Another Image – Classification
Fig. 7. Predicted Output – Classification
Fig. 8. Upload Input Image – Detection
Fig. 9. Predicted Output – Detection
Fig. 10. Upload Another Image – Detection
Fig. 11. Output Screen – Detection

V. CONCLUSION
The system successfully demonstrates hybrid deep learning models for accurate brain tumor classification and detection from MRI scans. Using two datasets (Kaggle and Roboflow), classification was carried out with architectures such as ResNet50, EfficientNetB0–B4, Xception, and NASNetMobile, enhanced with an ensemble approach for stability. For detection, object detection networks like YOLOv5, YOLOv8, YOLOv9, SSD, and Faster R-CNN were employed to accurately localize tumor regions. The ensemble model achieved the best classification accuracy of 99.0%. YOLOv9 showed the highest detection performance with a precision of 96.4%. The web-based interface built with Flask supports secure login, real-time uploads, and Keras-powered predictions. Future upgrades may include real-time cloud deployment, deeper ensemble strategies (e.g., blending Xception with NASNet), and integration of Explainable AI for better clinical adoption.
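
As an illustration of the kind of ensemble blending discussed in this work, the following is a minimal sketch of weighted soft voting in plain Python. The paper does not specify the combination rule used by its ensemble, so soft voting is an assumption here. The function names `soft_vote` and `predict_label`, the class labels, the weights, and the probability vectors are all hypothetical and serve only to show the mechanics.

```python
def soft_vote(prob_vectors, weights=None):
    """Blend per-model class-probability vectors by weighted averaging."""
    if not prob_vectors:
        raise ValueError("at least one probability vector is required")
    n_classes = len(prob_vectors[0])
    if any(len(p) != n_classes for p in prob_vectors):
        raise ValueError("all probability vectors must have the same length")
    if weights is None:
        weights = [1.0] * len(prob_vectors)  # plain (unweighted) averaging
    if len(weights) != len(prob_vectors):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return [
        sum(w * p[c] for w, p in zip(weights, prob_vectors)) / total
        for c in range(n_classes)
    ]


def predict_label(prob_vectors, class_names, weights=None):
    """Return the class with the highest blended probability and its score."""
    blended = soft_vote(prob_vectors, weights)
    best = max(range(len(blended)), key=blended.__getitem__)
    return class_names[best], blended[best]


if __name__ == "__main__":
    # Hypothetical softmax outputs of two classifiers for one MRI image.
    model_a = [0.6, 0.3, 0.1]
    model_b = [0.2, 0.7, 0.1]
    labels = ["type_a", "type_b", "type_c"]  # placeholder class names
    print(predict_label([model_a, model_b], labels))
    print(predict_label([model_a, model_b], labels, weights=[3, 1]))
```

With equal weights the two models' disagreement is resolved in favour of the class with the larger average probability, while unequal weights let a stronger model (for instance, one with higher validation accuracy) dominate the blend.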