Deep Learning Predictive Model For Colon Cancer

This research presents a deep learning predictive model using CNN-based classification to analyze imaging data of colon cells for colon cancer diagnosis. The study employs various CNN models, including MobileNetV2, achieving an accuracy of 99.67% with a data loss rate of 1.24. The findings highlight the potential of AI in automating cancer diagnosis, improving early detection, and ultimately enhancing patient survival rates.


(IJACSA) International Journal of Advanced Computer Science and Applications,

Vol. 12, No. 8, 2021

Deep Learning Predictive Model for Colon Cancer Patient using CNN-based Classification

Zarrin Tasnim1, F. M. Javed Mehedi Shamrat3, Md. Masum Billah8
Department of Software Engineering, Daffodil International University, Dhaka, Bangladesh

Sovon Chakraborty2
Department of Computer Science and Engineering, Daffodil International University, Dhaka, Bangladesh

Ali Newaz Chowdhury4, Humaira Alam Nuha5, Sabrina Binte Zahir7
Department of Computer Science and Engineering, Ahsanullah University of Science and Technology, Dhaka, Bangladesh

Asif Karim6
Member, IEEE

Abstract: In recent years, medicine and healthcare have made significant advances with the assistance of computational technology, and new diagnostic techniques have been developed. Cancer is the world's second-largest cause of mortality, claiming the lives of one out of every six individuals. Of the numerous kinds of cancer, colon cancer is the most frequent and lethal. Identifying the illness at an early stage, however, substantially increases the odds of survival. Cancer diagnosis may be automated using Artificial Intelligence (AI), allowing more cases to be evaluated in less time and at a lower cost. In this research, CNN models are employed to analyse imaging data of colon cells. For colon cell image classification, a CNN with max pooling layers, a CNN with average pooling layers, and MobileNetV2 are utilized. To determine the learning rate, the models are trained and evaluated at various epochs. The accuracy of the max pooling and average pooling models is found to be 97.49% and 95.48%, respectively. MobileNetV2 outperforms the other two models with the highest accuracy of 99.67% and a data loss rate of 1.24.

Keywords: Colon cancer; MobileNetV2; max pooling; average pooling; data loss; accuracy

I. INTRODUCTION
Cancer refers to a category of illnesses in which abnormal cells develop within the human body as a result of random mutations. Once formed, these cells divide abnormally and spread throughout the organs. If left untreated, most cancers will eventually kill their victims. Fig. 1A, which shows the 4-tier Human Development Index (HDI) based on the UN's 2019 Human Development Report, illustrates how closely cancer's position as a cause of early death corresponds with national levels of social and economic development.

In rare situations, a person inherits from their parents the faulty gene that causes cancer. Regular checks are required for those who are at risk of hereditary malignancies. Many individuals cannot afford these diagnostic procedures since they are expensive. About 70% of cancer deaths occur in poor and middle-income nations [1]. To address this issue, countries must make significant investments in public health, establish a large number of labs and pathology centres with the requisite technology, and train more people to perform diagnostic operations. Furthermore, the costs of these examinations must be kept within reach of those who are poor. Finding new techniques for diagnosing cancer will give patients a genuine chance of survival.

Fig. 1. (A) The Four-Tiered Human Development Index (HDI) and (B) the 20 World Regions. The Legend Includes the Population Sizes for Each Population. Source: United Nations Development Program/United Nations Procurement Division. Source: World Health Organization (WHO).


Fig. 2. Colon Cancer Polyps.

Most cancers have five stages, according to the Tumor-Node-Metastasis (TNM) classification devised and maintained by the American Joint Committee on Cancer (AJCC): Stage 0, Stage I, Stage II, Stage III, and Stage IV [2]. The four stages of colon cancer are shown in Fig. 2. The approach considers a number of parameters, including the main tumor's size and location, the extent of its dissemination to lymph nodes and other organs, and the existence of any biomarkers that impact cancer spread. The odds of survival vary dramatically between stages. In the case of colon cancer, for example, more than 93% of persons between the ages of 18 and 65 may survive with effective treatment if the disease is discovered at Stage 0; however, survival rates at the later stages are 87%, 74%, and 18%, respectively [3]. The possibility of survival for colon cancer patients drops from 70% at Stage 0 to a terrifying 13% at Stage IV. There is no sure therapy for cancer, so the sooner the disease is detected, the more time physicians have to design a treatment plan and the greater the patient's chance of surviving the condition. Early detection and early treatment are presently the only ways to prevent cancer-related fatalities [4]. However, most of the population lacks access to competent diagnostic facilities, making the fight against this deadly illness even more difficult.

In the field of diagnostics, AI has shown tremendous promise and provided a viable alternative to conventional diagnostic approaches. Currently, diagnosing an illness entails obtaining samples from a patient, executing a series of tests on those samples, putting the findings into an understandable format, and enlisting the help of a skilled expert to make judgments based on those findings. If the samples taken from a patient are digital or have been digitized, machines can evaluate them. The machines can be supplied with a body of data comprising previous judgments on comparable cases, together with instructions on how to detect the disorders that a new patient has. In machine learning, supervised learning refers to making judgments based on information obtained from past experiences. Different forms of biological signals have been classified and predicted using machine learning methods. Machines can now analyze high-dimensional data such as images, multidimensional anatomy images, and video thanks to the advent of Deep Learning (DL) algorithms. DL, a sub-field of ML, comprises learning algorithms inspired by the structure and function of the human brain [3]. DL uses Artificial Neural Networks (ANNs) to improve pattern recognition. Above all, it is clear that AI has given the area of medical diagnostics a new dimension, and it is increasingly replacing old diagnostic procedures as a viable alternative [5-7].

The rest of the paper is organized as follows. Section II provides a summary of the ML approaches utilized in colon cancer diagnosis. Section III provides an overview of the employed dataset, the classification method, the techniques required to build the model, and the criteria by which the performance of the model is measured. Section IV presents the outcome of the model and briefly compares the results at different stages of the model's learning process. Section V compares the proposed method with earlier studies. Finally, Section VI gives a summary of the work described in this article, along with some scope for further research.

II. RELATED WORK
In the past three decades, several supervised learning algorithms have been created, and they are quite good at dealing with biological data. Toraman et al. [8] presented research aimed at classifying the probability of colon cancer using Fourier Transform Infrared (FTIR) spectroscopy signals. The authors collected various statistical characteristics from the signals and then used SVM and ANN to categorize them, yielding a classification accuracy of 95.71% for ANN. Jiao et al. [9] used the Gray-Level Co-occurrence Matrix (GLCM) method to extract eighteen ordinary characteristics, including grayscale mean, grayscale variance, and 16 texture features. On 60 colon tissue images partitioned evenly into two groups, an SVM-based classifier obtained accuracy, F1-score, and recall of 96.67%, 83.33%, and 89.51%, respectively. Rathore et al. [10] developed a feature extraction method that mathematically mimics the geometric properties of colon tissue components. A hybrid feature set is created by combining conventional features such as morphological, texture, SIFT, and elliptic Fourier descriptors. SVM is then applied as a classifier on 174 colon biopsy images, with an accuracy of 98%. Yuan et al. [11] described a DL technique for automatically detecting polyps in colonoscopy videos. The authors utilized AlexNet, a well-known CNN-based architecture, for classification, which resulted in a classification accuracy of 91.47%. In [12], Babu et al. presented an RF-based classification algorithm for predicting the existence of colon cancer based on histological cancer images. First, the RGB images are converted to the HSV plane. Then wavelet decomposition is used for feature selection, obtaining a maximum classification accuracy of 85.4% by varying the degree of image magnification. Mo et al. [13] utilized a Faster R-CNN-based approach to identify colon cancer. The authors utilized a joint approximation optimization, which optimizes classification and regression losses simultaneously. In [14], Urban et al. developed a technique for detecting polyps in colonoscopy images with 96% classification accuracy. The authors hand-labeled 8641 colonoscopy images from 2000 individuals and used them to train a CNN model. They next tested their technique on 20 colonoscopy videos totaling five hours in length. Akbari et al.


developed a CNN-based classification approach with binarized weights in [15] to detect colorectal cancer from colonoscopy videos. The approach was tested using data from the Asu Mayo Test Clinic database and obtained over 90% classification accuracy. Masud et al. [16] proposed a classification framework to distinguish colon tissues (two benign and three malignant) by evaluating their histological images using CNN and Digital Image Processing (DIP) methods. The obtained findings indicate that the proposed framework can detect cancer tissues with an accuracy of up to 96.33%. Garg et al. [17] used and modified existing pre-trained CNN-based models to detect lung and colon cancer using histopathology images and improved augmentation methods. On the LC25000 dataset, eight pre-trained CNN models (VGG16, NASNetMobile, InceptionV3, InceptionResNetV2, ResNet50, Xception, MobileNet, and DenseNet169) are trained. Precision, recall, F1-score, and accuracy are used to evaluate model performance. The findings show that all eight models achieved notable outcomes ranging from 96% to 100% accuracy.

In the proposed study, image data of colon cells obtained from online data sources are tested to detect colon cancer. The study uses the transfer learning model MobileNetV2 together with two CNN models, one with max pooling layers and one with average pooling layers. The image data go through a number of preprocessing steps to give a better classification outcome. The performance of the models is evaluated based on the confusion matrix.

III. METHODOLOGY
Image data of colon cells were used in the proposed method to detect colon cancer. The images are labeled in order to determine which cells are cancerous. The prediction is made using the MobileNetV2 classifier. Fig. 3 illustrates the system's overall flow diagram.

Fig. 3. Proposed Model Processed Diagram.

A. Data Description
The dataset was gathered from Kaggle.com. There are 25,000 images in the dataset. The images are 768 x 768 pixels in resolution and in JPEG format. The dataset has two classes:
1) Colon adenocarcinoma (cancerous).
2) Colon benign tissue (not cancerous).

Of all the images in the dataset, 12,500 images are of colon cancer cells, as shown in Fig. 4(a, b). Fig. 4(c, d) shows samples of the remaining cell images without colon cancer.

B. Environment Setup
TensorFlow and the Keras library were used to carry out this analysis. TensorFlow is a free, open-source Python library for performing large-scale machine learning calculations. It is used extensively in artificial neural networks and serves as the backend of Keras.


Fig. 4. Sample Images of (a, b) Colon Cancer Cells, (c, d) Healthy Colon Cells.

C. Data Preprocessing
Preprocessing is done to make sure the image data are fit to be used to train and test the classifier. Raw data have to be preprocessed according to the needs of the study. The steps are as follows:
• To expand the volume of the dataset, the ImageDataGenerator class in the Keras library is used to create augmented images using the attributes in Table I.
• Images are resized to 224 x 224 pixels.
• LabelBinarizer() is used to assign unique values to each label in categorical features.
• The image data are converted to a NumPy array.

TABLE I. IMAGEDATAGENERATOR ATTRIBUTES

Attribute        Value
Rotation Range   20
Zoom_Range       0.15
Width_Range      0.2
Height_Range     0.2
Shear_Range      0.15
Horizontal_Flip  True
Vertical_Flip    True
Mode             Nearest

D. CNN Classifier
A CNN is a deep learning algorithm that takes an input image and assigns importance to different aspects of the image, allowing it to distinguish one image from another based on its features. In this system, the CNN model has two convolutional layers, each a 2D convolution with ReLU activation. For full connectivity, two dense layers are used, with ReLU activation for the first dense layer and sigmoid activation for the second. Aside from these layers, there are several hidden layers, as well as an input layer. Two pooling variants, Max Pooling 2D and Average Pooling 2D, are implemented [18]. Finally, the MobileNetV2 classifier is also used for the classification of image data.

1) Max pooling layer: A pooling operation that selects the maximum element from the region of the feature map covered by the filter. By decreasing the number of pixels in the output, max pooling lowers the dimensionality of images [19]. Fig. 5 shows the study model based on the max pooling layer.

Fig. 5. Two Convolution Layer with Max Pooling Action.

2) Average pooling layer: A pooling operation that computes the average of the elements in the region of the feature map covered by the filter. Average pooling takes all values into account and passes their average on to the next layer, so that all values contribute to feature mapping and output generation [20]. Fig. 6 shows the study model based on the average pooling layer.

Fig. 6. Two Convolution Layer with Average Pooling Action.

3) MobileNetV2 classifier: The MobileNetV2 model has 32 filters in its initial fully convolutional layer, followed by 19 bottleneck layers. It is used in the classification of images [21]. MobileNetV2 introduces two new kinds of blocks:
i. Downsizing block with stride 2.
ii. Residual block with stride 1.

All blocks are made up of three layers. The first layer is a 1x1 convolution with the ReLU6 activation. The second layer is a depthwise convolution, and the third layer is again a 1x1 convolution but without a non-linearity; unlike the first layer, no ReLU activation is applied after it. The architecture of the model is illustrated in Fig. 7.
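The two pooling operations described in Section III-D can be illustrated with a minimal pure-Python sketch. It assumes non-overlapping 2x2 windows and a small hypothetical 4x4 feature map; the function names are illustrative and are not taken from the study's Keras code.

```python
def pool2d(feature_map, size, reducer):
    """Apply non-overlapping size x size pooling to a 2D list of numbers."""
    rows, cols = len(feature_map), len(feature_map[0])
    pooled = []
    for r in range(0, rows - size + 1, size):
        row_out = []
        for c in range(0, cols - size + 1, size):
            # Collect every value under the current window.
            window = [feature_map[r + i][c + j]
                      for i in range(size) for j in range(size)]
            row_out.append(reducer(window))
        pooled.append(row_out)
    return pooled


def max_pool(feature_map, size=2):
    # Max pooling keeps only the largest value in each window.
    return pool2d(feature_map, size, max)


def avg_pool(feature_map, size=2):
    # Average pooling keeps the mean of all values in each window.
    return pool2d(feature_map, size, lambda w: sum(w) / len(w))


# Hypothetical 4x4 feature map, for illustration only.
feature_map = [
    [1, 3, 2, 0],
    [5, 7, 1, 1],
    [0, 2, 4, 8],
    [6, 0, 2, 2],
]
max_pooled = max_pool(feature_map)
avg_pooled = avg_pool(feature_map)
print(max_pooled)
print(avg_pooled)
```

Both variants halve each spatial dimension here; they differ only in whether a window is summarized by its strongest response or by the mean of all its responses.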


Fig. 7. MobileNetV2 Architecture.

E. Performance Evaluation
After the training and testing process, the performance is evaluated using specificity, recall, precision, accuracy and F1-score, defined in Eq. (1) to (5).

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{1}$$

$$\text{Sensitivity or recall} = \frac{TP}{TP + FN} \tag{2}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{3}$$

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \tag{4}$$

$$\text{F1-score} = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \tag{5}$$

Here, the true positive (TP), true negative (TN), false positive (FP), and false negative (FN) counts are obtained from the confusion matrix in Fig. 8.

Fig. 8. Confusion Matrix.

IV. EXPERIMENTAL RESULT AND ANALYSIS

A. Outcome of Max Pooling Layer
The training set contains 80% of the data from the dataset and the remaining 20% is in the test set. With the max pooling layer, 94.44% accuracy is obtained on the training set and 97.49% on the test set, as shown in Table II.

The accuracy of the max pooling model gradually increases as the number of epochs increases, as shown in Fig. 9. The training set reaches the highest accuracy at epoch 49, whereas the test set has the highest accuracy at epoch 46.

TABLE II. ACCURACY OF OUTCOMES IN MAX POOLING LAYER FOR DIFFERENT EPOCHS

Epoch  Training Loss  Training Accuracy  Test Data Loss  Test Accuracy
1      1.7743         0.5232             0.6825          0.5276
2      0.6733         0.5338             0.6610          0.5528
3      0.6535         0.5603             0.8085          0.5126
4      0.6317         0.6066             0.6061          0.6884
5      0.6292         0.6066             0.5827          0.6482
6      0.5984         0.6583             0.5451          0.7538
7      0.6000         0.6609             0.5580          0.7136
8      0.5875         0.6808             0.5037          0.7789
9      0.5531         0.6781             0.4449          0.8141
10     0.5364         0.7205             0.4406          0.8342
11     0.5450         0.7325             0.3826          0.8241
12     0.5221         0.7364             0.3723          0.8442
13     0.5113         0.7457             0.4109          0.8442
14     0.4927         0.7457             0.3840          0.8291
15     0.4860         0.7616             0.3745          0.8342
16     0.7510         0.7510             0.3551          0.8693
17     0.4967         0.7536             0.3428          0.8392
18     0.4662         0.7656             0.3323          0.8392
19     0.4205         0.7960             0.3235          0.8593
20     0.4362         0.7775             0.2994          0.8995
21     0.4114         0.8106             0.2682          0.8744
22     0.4127         0.8013             0.2726          0.8693
23     0.3863         0.8066             0.3230          0.8191
24     0.3425         0.8464             0.2279          0.8995
25     0.3594         0.8278             0.2297          0.9095
26     0.3644         0.8397             0.2388          0.8894
27     0.3101         0.8742             0.2190          0.9196
28     0.3074         0.8596             0.2089          0.9146
29     0.3329         0.8583             0.2119          0.9146
30     0.3069         0.8570             0.2064          0.9246
31     0.3153         0.8583             0.2890          0.8794
32     0.2862         0.8675             0.1678          0.9497
33     0.2834         0.8768             0.2158          0.8995
34     0.2698         0.8861             0.2034          0.9296
35     0.2902         0.8861             0.1936          0.9347
36     0.2554         0.8940             0.1924          0.9296
37     0.2363         0.8993             0.3039          0.8995
38     0.2246         0.9099             0.1353          0.9397
39     0.2312         0.9086             0.1523          0.9648
40     0.2290         0.8887             0.1172          0.9598
41     0.2334         0.9086             0.1160          0.9497
42     0.2321         0.9020             0.1153          0.9749
43     0.2002         0.9033             0.1717          0.9397
44     0.2028         0.9192             0.0879          0.9749
45     0.2091         0.9192             0.1232          0.9648
46     0.1759         0.9272             0.0756          0.9749
47     0.1881         0.9245             0.0874          0.9648
48     0.1636         0.9311             0.0718          0.9598
49     0.1634         0.9444             0.0968          0.9548
50     0.1898         0.9245             0.0802          0.9648

The data loss of the model on the training and test sets decreases rapidly with the number of epochs, as illustrated in Fig. 10. The lowest test loss (0.0718) is found at epoch 48, and the lowest training loss (0.1634) at epoch 49, only marginally below the 0.1636 of epoch 48.
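The performance measures of Eq. (1) to (5) can be sketched as a small function over the four confusion-matrix counts. The example below feeds it the counts reported for the max pooling model in Section IV-B (TP = 1745, TN = 1217, FP = 27, FN = 11); the function name and dictionary keys are illustrative assumptions, and the computed values may differ in the last digits from the rounded figures quoted in the text.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Compute Eq. (1)-(5) from the four confusion-matrix counts."""
    specificity = tn / (tn + fp)                         # Eq. (1)
    recall = tp / (tp + fn)                              # Eq. (2)
    precision = tp / (tp + fp)                           # Eq. (3)
    accuracy = (tp + tn) / (tp + fp + tn + fn)           # Eq. (4)
    f1 = 2 * precision * recall / (precision + recall)   # Eq. (5)
    return {
        "specificity": specificity,
        "recall": recall,
        "precision": precision,
        "accuracy": accuracy,
        "f1": f1,
    }


# Counts reported for the max pooling model in Section IV-B.
metrics = confusion_metrics(tp=1745, tn=1217, fp=27, fn=11)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```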


Fig. 9. Test Accuracy and Training Accuracy for Max Pooling Layer at Different Epochs.

Fig. 10. Test Loss and Training Loss with Max Pooling Layer at Different Epochs.

1) MSE (Mean Square Error) and AUC: The following MSE and AUC are achieved on the test data set using the max pooling layer:
• MSE (Mean Square Error) of 0.0286 (Fig. 11)
• AUC of 0.9932 (Fig. 12)

Fig. 11. Test Mean-Square Error for Max Pooling Layer at Different Epochs.

Fig. 12. Test AUC with Max Pooling Layer at Different Epochs.

B. Confusion Matrix using Max Pooling Layer
Using 3000 images, the confusion matrix is created for the max pooling layer. The outcome of the matrix is as follows:
• True positives (TP): 1745
• True negatives (TN): 1217
• False positives (FP): 27
• False negatives (FN): 11
• Sensitivity or Recall = 0.993 = 99.3%
• Specificity = 0.9782 = 97.82%
• Precision = 0.983 = 98.3%

C. Outcome of Average Pooling Layer
In the average pooling model, an accuracy of 90.73% on the training data set and 95.48% on the test data set was achieved. The outcomes of all the epochs are recorded in Table III.

TABLE III. ACCURACY OF OUTCOMES IN AVERAGE POOLING LAYER FOR DIFFERENT EPOCHS

Epoch  Training Data Loss  Training Accuracy  Test Data Loss  Test Accuracy
1      2.0488              0.4795             0.6844          0.5075
2      0.6634              0.5152             0.6586          0.6382
3      0.6527              0.5854             0.6430          0.6181
4      0.6385              0.6159             0.6071          0.6935
5      0.6142              0.6477             0.5732          0.6884
6      0.5987              0.6821             0.6038          0.6884
7      0.5900              0.6834             0.5728          0.6985
8      0.5963              0.6609             0.4985          0.7588
9      0.5634              0.7020             0.4718          0.7990
10     0.5741              0.6821             0.4919          0.7789
11     0.5394              0.7192             0.4575          0.8141
12     0.5317              0.7311             0.4708          0.8291
13     0.5251              0.7272             0.3687          0.8744
14     0.5115              0.7444             0.4035          0.8191
15     0.5464              0.6980             0.4161          0.8442
16     0.5239              0.7457             0.3710          0.8593
17     0.4736              0.7470             0.3436          0.8392
18     0.4686              0.7722             0.4097          0.8040
19     0.4716              0.7589             0.2913          0.8693
20     0.4791              0.7656             0.2726          0.8945
21     0.4162              0.7907             0.2840          0.8794
22     0.4244              0.7974             0.2986          0.8794
23     0.4024              0.7881             0.4226          0.7940
24     0.4428              0.7868             0.2846          0.8995
25     0.4118              0.8000             0.2934          0.8643
26     0.4088              0.8119             0.2507          0.8894
27     0.3700              0.8318             0.2775          0.8945
28     0.3424              0.8411             0.2778          0.8794
29     0.3418              0.8490             0.2186          0.8794
30     0.3194              0.8583             0.2455          0.8744
31     0.3585              0.8331             0.3152          0.8543
32     0.3567              0.8503             0.2626          0.8744
33     0.3227              0.8636             0.2387          0.8995
34     0.3196              0.8649             0.2223          0.9095
35     0.3223              0.8596             0.2035          0.9146
36     0.3078              0.8662             0.2163          0.9196
37     0.3071              0.8768             0.2536          0.8794
38     0.3086              0.8781             0.2009          0.9397
39     0.2727              0.8834             0.1955          0.9347
40     0.2848              0.8887             0.2586          0.8794
41     0.3037              0.8728             0.1925          0.9045
42     0.2903              0.8795             0.2803          0.8794
43     0.2860              0.8781             0.1715          0.9347
44     0.3110              0.8768             0.1637          0.9397
45     0.2678              0.8980             0.1365          0.9548
46     0.2254              0.9073             0.1471          0.9397
47     0.2393              0.8954             0.1299          0.9497
48     0.2263              0.8967             0.1527          0.9447
49     0.2442              0.8887             0.1942          0.9146
50     0.2169              0.9046             0.1492          0.9296
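Section IV reports MSE and AUC on the test set for both pooling models. As a minimal sketch of how these two measures are defined, the example below computes them from predicted probabilities in plain Python. The labels and scores are hypothetical values chosen for illustration, not data from the study, and the AUC is computed by the pairwise-ranking definition rather than by any particular library routine.

```python
def mean_squared_error(labels, scores):
    """Mean of squared differences between true labels and predicted scores."""
    return sum((y - s) ** 2 for y, s in zip(labels, scores)) / len(labels)


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = cancerous) and predicted probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]

mse = mean_squared_error(labels, scores)
auc = roc_auc(labels, scores)
print(f"MSE = {mse:.4f}, AUC = {auc:.4f}")
```

A lower MSE means the predicted probabilities sit closer to the true labels, while an AUC near 1 means cancerous samples are almost always scored above benign ones.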


Fig. 13. Test Accuracy and Training Accuracy in Average Pooling Layer for Different Epochs.

As shown in Fig. 13, the accuracy of the average pooling model progressively improves as the number of epochs grows. The highest accuracy for the test set is reached in the 45th epoch and for the training set in the 46th epoch.

The model's data loss on the training and test sets reduces quickly with the number of epochs, as seen in Fig. 14 for the average pooling layer.

Fig. 14. Test and Training Data Loss in Average Pooling Layer for Different Epochs.

D. MSE (Mean Square Error) and AUC
The following MSE and AUC were achieved by applying the test data set to the average pooling layer:
• MSE (Mean Square Error) of 0.0588 (Fig. 15)
• AUC of 0.9753 (Fig. 16)

Fig. 15. Test Mean-Square Error in Average Pooling Layer.

Fig. 16. Test AUC in Average Pooling Layer.

E. Confusion Matrix using Average Pooling Layer
The confusion matrix for the average pooling layer is built using 25,000 images. The confusion matrix produced the following results:
• True positives (TP): 14105
• True negatives (TN): 9851
• False positives (FP): 466
• False negatives (FN): 578
• Sensitivity or Recall = 0.9606 = 96.06%
• Specificity = 0.9548 = 95.48%
• Precision = 0.9702 = 97.02%
• F1-Score = 0.9657 = 96.57%

F. Classification Outcome of MobileNetV2 Model
After loading the MobileNetV2 model, the weights from ImageNet are loaded and the base layers are frozen by setting their trainable parameter to False, so that MobileNetV2's own layers are not updated during training. A custom trainable head is placed on top, and the architecture is trained. The head applies an AveragePooling2D operation with pool size (7, 7), followed by one hidden layer of 128 neurons with the ReLU activation function to extract features. Because deep learning models are prone to overfitting, dropout is used, which randomly deactivates units during training. The Adam optimizer is used to train the model. The process is shown in Fig. 17.

For the back-propagation process, the learning rate is set to 0.01. Binary cross-entropy is used as the loss function. Softmax activation is used in the output layer, chosen as more accurate than other activation functions. Table IV displays the training and test accuracy, as well as the data loss rate.

Fig. 17. Execution of MobileNetV2.
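Section IV-F states that the MobileNetV2 head ends in a softmax output layer trained with binary cross-entropy. A minimal pure-Python sketch of that combination for two classes is given below. The logits are hypothetical, the function names are illustrative, and the loss is written as the element-wise binary cross-entropy averaged over the two outputs, which is an assumption about how the loss is reduced rather than a detail given in the text.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    # Subtracting the maximum keeps exp() numerically stable.
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def binary_cross_entropy(targets, probs, eps=1e-12):
    """Element-wise binary cross-entropy, averaged over the outputs."""
    losses = []
    for t, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)  # guard against log(0)
        losses.append(-(t * math.log(p) + (1 - t) * math.log(1.0 - p)))
    return sum(losses) / len(losses)


# Hypothetical logits for the two classes (cancerous, benign).
logits = [2.0, 0.5]
probs = softmax(logits)

# One-hot targets, as produced by label binarization for two classes.
loss_if_cancerous = binary_cross_entropy([1, 0], probs)
loss_if_benign = binary_cross_entropy([0, 1], probs)
print(probs, loss_if_cancerous, loss_if_benign)
```

Because the two softmax outputs sum to 1, the averaged binary cross-entropy reduces to the negative log-probability assigned to the true class, so a confident correct prediction gives a small loss and a confident wrong one gives a large loss.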


TABLE IV. TRAINING AND TEST ACCURACY WITH DATA LOSS OF MOBILENETV2

Epoch Number  Training Data Loss in %  Training Accuracy in %  Test Data Loss in %  Test Accuracy in %
1             35.95                    97.54                   39.34                83.11
2             7.50                     98.24                   6.61                 98.18
3             2.91                     98.69                   4.44                 98.79
4             4.87                     98.76                   5.46                 98.49
5             3.25                     98.79                   5.78                 98.11
6             4.38                     98.30                   2.66                 99.33
7             3.53                     98.86                   3.11                 98.87
8             3.33                     99.19                   2.76                 99.29
9             2.75                     99.13                   4.05                 98.83
10            2.06                     99.57                   3.69                 98.94
11            1.77                     99.62                   5.17                 98.14
12            1.72                     99.72                   3.73                 98.98
13            1.74                     99.65                   3.20                 98.79
14            1.54                     99.73                   1.60                 99.50
15            1.46                     99.81                   1.70                 99.67
16            2.43                     99.12                   3.34                 99.19
17            2.34                     99.25                   2.85                 98.82
18            2.40                     99.20                   2.47                 99.17
19            2.63                     99.18                   1.99                 99.23
20            3.76                     98.60                   1.24                 99.60

Maximum training accuracy is 99.81%, and the minimum data loss is 1.46%, both at epoch 15 for the training set. The accuracy of the model is consistently high and the data loss consistently low for the training data, as shown in Fig. 18 and 19.

As the model learns over successive epochs, data loss decreases. On the test set, the data loss at epoch 15 is 1.70% and the accuracy is 99.67%. The gradual decrease of data loss and gain of accuracy are illustrated in Fig. 20 and 21, respectively.

Fig. 18. Data Loss Curve for Training Data.

Fig. 19. Accuracy Curve for Training Data.

Fig. 20. Data Loss Curve for Test Data.

Fig. 21. Test Set Accuracy.

The confusion matrix was used to assess results, and the outcome reflects the model's high accuracy on this dataset. The performance calculation is shown in Fig. 22.

Fig. 22. Classification Performance of MobileNetV2.

Table V compares the results of the colon cancer cell classification approaches considered. Image data were utilized for training and testing. MobileNetV2 outperforms the other two models (max pooling and average pooling). Based on the discussion in this section, it can be inferred that the suggested models can perform colon cancer tissue classification with excellent accuracy and reliability.

TABLE V. RESULTS OBTAINED FROM THE METHODS

Model Name       Training Accuracy  Training Data Loss  Test Accuracy  Test Data Loss
Max Pooling      94.44%             0.1634              97.49%         0.0756
Average Pooling  90.73%             0.2254              95.48%         0.1365
MobileNetV2      99.81%             1.46                99.6%          1.24
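The best epochs quoted for MobileNetV2 can be read off Table IV programmatically. The sketch below copies the table's rows into a list of tuples and picks the epoch with the highest test accuracy and the epoch with the lowest test loss; the variable names are illustrative, and the values are the ones printed in Table IV.

```python
# Rows of Table IV: (epoch, train_loss_%, train_acc_%, test_loss_%, test_acc_%)
table_iv = [
    (1, 35.95, 97.54, 39.34, 83.11), (2, 7.50, 98.24, 6.61, 98.18),
    (3, 2.91, 98.69, 4.44, 98.79), (4, 4.87, 98.76, 5.46, 98.49),
    (5, 3.25, 98.79, 5.78, 98.11), (6, 4.38, 98.30, 2.66, 99.33),
    (7, 3.53, 98.86, 3.11, 98.87), (8, 3.33, 99.19, 2.76, 99.29),
    (9, 2.75, 99.13, 4.05, 98.83), (10, 2.06, 99.57, 3.69, 98.94),
    (11, 1.77, 99.62, 5.17, 98.14), (12, 1.72, 99.72, 3.73, 98.98),
    (13, 1.74, 99.65, 3.20, 98.79), (14, 1.54, 99.73, 1.60, 99.50),
    (15, 1.46, 99.81, 1.70, 99.67), (16, 2.43, 99.12, 3.34, 99.19),
    (17, 2.34, 99.25, 2.85, 98.82), (18, 2.40, 99.20, 2.47, 99.17),
    (19, 2.63, 99.18, 1.99, 99.23), (20, 3.76, 98.60, 1.24, 99.60),
]

# Epoch with the highest test accuracy (column index 4).
best_accuracy_row = max(table_iv, key=lambda row: row[4])
# Epoch with the lowest test loss (column index 3).
lowest_loss_row = min(table_iv, key=lambda row: row[3])

print("best test accuracy:", best_accuracy_row[4], "at epoch", best_accuracy_row[0])
print("lowest test loss:", lowest_loss_row[3], "at epoch", lowest_loss_row[0])
```

Run on Table IV, this shows that the 99.67% test accuracy and the 1.24 test data loss quoted together come from different epochs (15 and 20, respectively), which is also why Table V lists 99.6% next to 1.24.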


V. COMPARATIVE ANALYSIS
As previously stated (Section II), many studies have been conducted to predict colon cancer using machine learning, deep learning, and other methods based on various imaging data of cancer cells. Table VI compares several methods utilized by researchers on different datasets with the proposed method. Although earlier research showed high accuracy in predicting colon cancer, the suggested approach achieved higher prediction accuracy.

Though earlier work on the prediction of colon cancer cells has excellent accuracy, it is limited to models built on smaller datasets. In the final prediction stage, the suggested model outperformed the previously described studies. Furthermore, the models in this research are trained and tested on a larger dataset, making them more reliable.

TABLE VI. COMPARISON OF PREVIOUS METHODS AND THE PROPOSED METHOD

Studies | Datasets | Models | Accuracy
Proposed study | 12,500 images of colon cancer cells from Kaggle | Max pooling | 97.49%
Proposed study | 12,500 images of colon cancer cells from Kaggle | Average pooling | 95.48%
Proposed study | 12,500 images of colon cancer cells from Kaggle | MobileNetV2 | 99.67%
Toraman et al. [8] | Samples of 30 colon cancer patients and 40 healthy subjects obtained from the Department of General Surgery of Fırat University | ANN | 95.71%
Urban et al. [14] | ImageNet data set; 8641 colonoscopy images; 1330 colonoscopy images from different patients; colonoscopy first set of 9 videos; augmented dataset; colonoscopy second set of 11 videos | VGG19 | 96.4 ± 0.3%
Masud et al. [16] | LC25000 dataset | Employed CNN | 96.33%
Rathore et al. [22] | 174 colon biopsy images | Hybrid feature space based colon classification (HFS-CC) | 99.18%
Hamida et al. [23] | CRC-5000 dataset | SEGNET | 98.66%
Liang et al. [24] | Hematoxylin and Eosin (H&E) stained human colon tissue histopathology image | MFF-CNN based on shearlet transform | 96%

VI. CONCLUSION
In recent years, machine learning and deep learning have had a significant impact on image processing, the medical industry, and a variety of other applications. The proposed approach takes around a minute to identify colon cancer from the input images. The goal of the study is to make this procedure as easy, quick, and close to real-time as feasible. The dataset utilized for training and testing includes both cancer cells and healthy cells. Augmented images were added to the dataset. In this work, CNN models with max and average pooling layers, as well as a transfer learning MobileNetV2 model, are used to identify colon cancer. It is observed that the CNN-based max pooling and average pooling models achieve high accuracy of 97.49% and 95.48%, respectively, and the MobileNetV2 model achieves an accuracy of 99.67%. In future work, the model can be trained and tested on a more extensive dataset, and it can also be tested on other cancer datasets for classification and prediction. Future work would also involve cooperation with medical researchers in hospitals or clinics that handle colon cancer, which would be beneficial for further application of this work in the medical sector.

REFERENCES
[1] Cancer. Available online: https://www.who.int/news-room/fact-sheets/detail/cancer (accessed on 20 May 2021).
[2] Stages of Cancer | Cancer.Net. Available online: https://www.cancer.net/navigating-cancer-care/diagnosing-cancer/stages-cancer (accessed on 20 May 2021).
[3] Cancer Survival Rates. Available online: https://cancersurvivalrates.com/?type=colon&role=patient (accessed on 20 May 2021).
[4] Sánchez-Peralta, L.F.; Bote-Curiel, L.; Picón, A.; Sánchez-Margallo, F.M.; Pagador, J.B. Deep learning to find colorectal polyps in colonoscopy: A systematic literature review. Artif. Intell. Med. 2020, 108.
[5] F. M. Javed Mehedi Shamrat, P. Ghosh, M. H. Sadek, M. A. Kazi and S. Shultana, "Implementation of Machine Learning Algorithms to Detect the Prognosis Rate of Kidney Disease," 2020 IEEE International Conference for Innovation in Technology (INOCON), Bangluru, India, 2020, pp. 1-7, doi: 10.1109/INOCON50539.2020.9298026.
[6] P. Ghosh, F. M. Javed Mehedi Shamrat, S. Shultana, S. Afrin, A. A. Anjum and A. A. Khan, "Optimization of Prediction Method of Chronic Kidney Disease Using Machine Learning Algorithm," 2020 15th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP), Bangkok, Thailand, 2020, pp. 1-6, doi: 10.1109/iSAI-NLP51646.2020.9376787.
[7] Pronab Ghosh, Sami Azam, Mirjam Jonkman, Asif Karim, F.M. Javed Mehedi Shamrat, Eva Ignatious, Shahana Shultana, Abhijit Reddy Beeravolu, Friso De Boer, "Efficient Prediction of Cardiovascular Disease Using Machine Learning Algorithms with Relief and LASSO Feature Selection Techniques," in IEEE Access, doi: 10.1109/ACCESS.2021.3053759.
[8] Toraman, S.; Girgin, M.; Üstündağ, B.; Türkoğlu, İ. Classification of the likelihood of colon cancer with machine learning techniques using FTIR signals obtained from plasma. Turk. J. Electr. Eng. Comput. Sci. 2019, 27, 1765–1779.
[9] Jiao, Liping & Chen, Qi & Li, Shuyu & Xu, Yan. (2013). Colon Cancer Detection Using Whole Slide Histopathological Images. IFMBE Proceedings. 39. 1283-1286. 10.1007/978-3-642-29305-4_336.
[10] S. Rathore, M. Hussain, and A. Khan, "Automated colon cancer detection using hybrid of novel geometric features and some traditional features," Comput. Biol. Med., 2015.

[11] Yuan, Z.; IzadyYazdanabadi, M.; Mokkapati, D.; Panvalkar, R.; Shin, J.Y.; Tajbakhsh, N.; Gurudu, S.; Liang, J. Automatic polyp detection in colonoscopy videos. Med. Imaging 2017 Image Process. 2017, 10133, 101332K.
[12] Babu, T.; Gupta, D.; Singh, T.; Hameed, S. Colon Cancer Prediction on Different Magnified Colon Biopsy Images. In Proceedings of the 10th International Conference on Advanced Computing (ICoAC), Chennai, India, 13–15 December 2018; pp. 277–280.
[13] Mo, X.; Tao, K.; Wang, Q.; Wang, G. An Efficient Approach for Polyps Detection in Endoscopic Videos Based on Faster R-CNN. In Proceedings of the International Conference on Pattern Recognition (ICPR), Beijing, China, 20–24 August 2018; pp. 3929–3934.
[14] Urban, G.; Tripathi, P.; Alkayali, T.; Mittal, M.; Jalali, F.; Karnes, W.; Baldi, P. Deep Learning Localizes and Identifies Polyps in Real Time With 96% Accuracy in Screening Colonoscopy. Gastroenterology 2018, 155, 1069–1078.e8.
[15] Akbari, M.; Mohrekesh, M.; Rafiei, S.; Reza Soroushmehr, S.M.; Karimi, N.; Samavi, S.; Najarian, K. Classification of Informative Frames in Colonoscopy Videos Using Convolutional Neural Networks with Binarized Weights. In Proceedings of the Annual International Conference IEEE Engineering in Medicine and Biology Society (EMBS), Honolulu, Hawaii, 17–22 July 2018; pp. 65–68.
[16] Masud, M.; Sikder, N.; Nahid, A.-A.; Bairagi, A.K.; AlZain, M.A. A Machine Learning Approach to Diagnosing Lung and Colon Cancer Using a Deep Learning-Based Classification Framework. Sensors 2021, 21, 748. https://doi.org/10.3390/s21030748.
[17] Satvik Garg and Somya Garg. 2020. Prediction of lung and colon cancer through analysis of histopathological images by utilizing Pre-trained CNN models with visualization of class activation and saliency maps. 2020 3rd Artificial Intelligence and Cloud Computing Conference. Association for Computing Machinery, New York, NY, USA, 38–45. DOI: https://doi.org/10.1145/3442536.3442543.
[18] S. Chakraborty, F. M. J. M. Shamrat, M. M. Billah, M. A. Jubair, M. Alauddin and R. Ranjan, "Implementation of Deep Learning Methods to Identify Rotten Fruits," 2021 5th International Conference on Trends in Electronics and Informatics (ICOEI), 2021, pp. 1207-1212, doi: 10.1109/ICOEI51242.2021.9453004.
[19] Akter, S., Shekhar, H. and Akhteruzzaman, S. (2021) Application of Biochemical Tests and Machine Learning Techniques to Diagnose and Evaluate Liver Disease. Advances in Bioscience and Biotechnology, 12, 154-172. doi: 10.4236/abb.2021.126011.
[20] Lee, H., Park, J., & Hwang, J. Y. (2020). Channel attention module with multiscale grid average pooling for breast cancer segmentation in an ultrasound image. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, 67(7), 1344-1353.
[21] Toğaçar, M., Cömert, Z., & Ergen, B. (2021). Intelligent skin cancer detection applying autoencoder, MobileNetV2 and spiking neural networks. Chaos, Solitons & Fractals, 144, 110714.
[22] Rathore, S., Hussain, M., & Khan, A. (2015). Automated colon cancer detection using hybrid of novel geometric features and some traditional features. Computers in Biology and Medicine, 65, 279-296.
[23] Hamida, A. B., Devanne, M., Weber, J., Truntzer, C., Derangère, V., Ghiringhelli, F., ... & Wemmert, C. (2021). Deep learning for colon cancer histopathological images analysis. Computers in Biology and Medicine, 104730.
[24] Liang, M., Ren, Z., Yang, J., Feng, W., & Li, B. (2020). Identification of colon cancer using multi-scale feature fusion convolutional neural network based on shearlet transform. IEEE Access, 8, 208969-208977.
