Data in Brief
Data Article
∗ Corresponding author: T. Khatun.
https://doi.org/10.1016/j.dib.2023.109936
2352-3409/© 2023 The Author(s). Published by Elsevier Inc. This is an open access article under the CC BY license
(http://creativecommons.org/licenses/by/4.0/)
2 T. Khatun, Md.A.S. Nirob and P. Bishshash et al. / Data in Brief 52 (2024) 109936
Specifications Table
Subject: Computer Science
Specific subject area: Image Categorization, Image Detection, Robotic Harvesting, Maturity Analysis
Data format: Raw (JPG)
Type of data: Image
Data collection: In collaboration with an expert from the Ministry of Agriculture, Bangladesh, images were collected from May 2023 to August 2023 in the demonstration areas of three locations in Bangladesh.
Data source location: Location: Bappi Taj Agro Farm demonstration farm in Gazipur, Tipu Sultan Agro Farm in Jhenaidah, and the Daffodil research farm in Gazaria, Munshiganj
Zone: Gazipur, Jhenaidah, Munshiganj
Country: Bangladesh
Data accessibility: Repository name: Mendeley Data
Data identification number: 10.17632/2jpzbx8tm6.1
Direct URL to data: https://data.mendeley.com/datasets/2jpzbx8tm6/1
1. Value of the Data
• Inconsistent manual harvesting practices risk picking fruit that is overripe or underripe. Harvesting dragon fruit prematurely reduces sweetness, flavor, and overall quality, which can dissatisfy customers and reduce demand and sales, resulting in financial losses, increased labor costs, and lower prices for growers. Physical characteristics such as weight, texture, and external peel color are commonly used as non-invasive indicators of dragon fruit ripeness [1]. Using a computer vision approach, this dataset can therefore support the development of an automated harvesting system that analyzes images of different fruit development stages and advises farmers on optimal harvest times, lowering labor requirements and minimizing financial losses.
• Detecting freshness and identifying defects in dragon fruit is essential for maintaining product quality, minimizing waste, avoiding economic losses, and opening avenues for international export by meeting global market requirements for top-tier produce [2]. The dragon fruit image dataset presented in this article can support this effort as training data for computer vision and deep learning models, aiding quality assurance, waste reduction, optimized harvesting, and automated inspection. Applied this way, the dataset can contribute to improved product quality, economic benefits for growers and the agricultural sector, and greater customer satisfaction, underscoring its value for fresh and defective dragon fruit detection.
• This dataset is a valuable resource for researchers developing and testing computer vision, machine learning, and deep learning technologies. Researchers can use it to build automated systems for fruit recognition, harvest efficiency, freshness prediction, and packaging automation. The dataset also encourages interdisciplinary collaboration between computer scientists and experts in other fields, particularly agriculture. Moreover, it has the potential to deliver economic benefits by reducing labor costs and enhancing crop quality, underscoring its relevance to computer science.
2. Background
The compilation of this dataset arose from the need to address challenges in identifying dragon fruit developmental stages in agriculture. The dataset aligns with ongoing efforts in precision agriculture, which aims to improve crop management practices through technological interventions. A further motivation was the lack of comprehensive datasets specific to dragon fruit stages and diseases, which has hindered the development of accurate detection models. We collected 3780 images showing different growth stages and conditions; the dataset serves as a resource for training and validating deep learning algorithms and enables fast and accurate detection of dragon fruit stages and quality. This data article complements a related research publication by giving researchers and practitioners access to the raw data, supporting transparency, reproducibility, and further investigation aimed at optimizing agricultural practices.
3. Data Description
The dataset comprises images depicting different phases of dragon fruit development, encompassing healthy young fruits, ripe fruits, and decayed specimens. The images were taken manually between May and August 2023 at the demonstration farm of Bappy Taz Agro Farm in Gazipur, Tipu Sultan Agro Farm in Jhenaidah, and the Daffodil Research Farm in Gazaria, Munshiganj, with guidance from a domain expert, using the cameras of a Redmi Note 11 Pro Plus and a Samsung S22 smartphone. The resulting images measure 800 × 800 pixels and are stored in JPG format. Each image is labeled according to its stage of maturity and quality, allowing for easy classification and analysis.
While gathering pictures from the dragon fruit orchard, we encountered a few difficulties:
1. The primary challenge was capturing images amid noisy backgrounds and uneven lighting conditions.
2. Dragon fruit growth is very time-sensitive. For the dataset to be accurate and relevant, photos had to be collected at specific growth stages or during particular seasons.
Fig. 1 illustrates the dragon fruit field from where we gathered dataset images.
This paper covers three varieties: Bari Dragon Fruit-1, Connie Mayer Dragon Fruit, and Thai Red Dragon Fruit. Table 1 gives the details of these varieties.
In agricultural science, automation is a game-changer that benefits a nation's agricultural economy in several ways, one of the main benefits being higher quality. Automating tasks such as fruit and vegetable sorting and grading makes a uniform, high-quality final product possible, which is crucial for satisfying customers and global markets that frequently set high quality standards. Manual fruit and vegetable sorting is still common, but it has well-known disadvantages. Because human perception is subjective and affected by factors such as fatigue or personal judgment, manual sorting is prone to mistakes and inconsistencies. It is also time-consuming, especially for large volumes of fruit, which leads to inefficiencies and higher labor costs and makes hand sorting expensive. Intelligent fruit grading systems have been created to address these issues. These systems use computer vision algorithms to classify and evaluate produce automatically against a variety of quality criteria, since computer vision makes it possible to precisely measure and analyze traits including color, texture, size, shape, and flaws.
To enable these advancements, this paper introduces two datasets: the Dragon Fruit Maturity Detection Dataset and the Dragon Fruit Quality Grading Dataset. Each dataset folder is further divided into two subfolders: the original dataset, consisting of images captured directly with a camera, and the augmented dataset, containing images generated from the original dataset using data augmentation software. The Dragon Fruit Maturity Detection Dataset occupies 976 MB, while the Dragon Fruit Quality Grading Dataset occupies 624 MB.

Fig. 1. The dragon fruit field where the dataset images were collected.
The ripeness and quality of dragon fruit are closely linked to characteristics such as color, skin appearance, texture, flavor, size, and shape [1]. Within the Dragon Fruit Maturity Detection Dataset, both the original and augmented datasets are divided into two classes: Mature Dragon Fruit and Immature Dragon Fruit. Similarly, within the Dragon Fruit Quality Grading Dataset, both the original and augmented datasets are divided into two classes: Fresh Dragon Fruit and Defect Dragon Fruit. Each folder contains the corresponding dragon fruit images. The organization of the dataset is presented in Fig. 2.
The progression of dragon fruit growth depends on factors such as variety, cultivation conditions, and climate. Generally, the fruit takes 31 to 41 days on average, roughly one to one and a half months, to reach its full mature size [1]. It is important to harvest the fruit at its optimal stage of maturity to ensure the best quality, flavor, and texture [11]. Table 2 explains each category in the Dragon Fruit Maturity Detection and Quality Grading Datasets.
The dragon fruit dataset holds promise across various applications:
Developing robotic harvesting systems: The dragon fruit dataset can serve as a key resource for building robotic harvesting systems that selectively pick ripe fruit through image analysis. Machine learning models trained on this dataset can learn to locate and recognize ripe dragon fruit against varying backgrounds. Such models enable algorithms that let robots make real-time decisions based on color, texture, and shape, harvesting only ripe fruit while leaving others to mature further. Integrating these models into robotic systems not only streamlines fruit picking but also allows continued learning and adaptation, refining the system's accuracy and efficiency in the dynamic context of fruit harvesting.
Automating quality control processes: The dragon fruit image dataset has considerable potential for automating quality control in packaging facilities. Models trained on the dataset can assess quality parameters such as size, shape, color, and defects, enabling automated inspection of dragon fruit as it moves along the packaging line. Such systems can identify and sort fruit according to predetermined quality standards, ensuring consistency and adherence to quality benchmarks. The dataset also supports continued learning, allowing the system to improve its accuracy over time, increasing efficiency and precision in the packaging process while reducing human intervention.
Table 1
Details about the dragon fruit varieties in the dataset.
The images were captured using the cameras of a Redmi Note 11 Pro Plus and a Samsung S22 smartphone.
The Redmi Note 11 Pro Plus camera uses a 108 MP Samsung ISOCELL HM2 sensor, a relatively large sensor measuring 1/1.52 inches. Its individual pixels are 0.7 μm in size, but they can be combined using a technique called 9-in-1 binning, in which 9 pixels are merged into one larger pixel of 2.1 μm.
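The 2.1 μm figure follows from simple arithmetic: 9-in-1 binning merges a 3 × 3 group of pixels, so the effective pitch is 3 × 0.7 μm = 2.1 μm and the pixel count falls by a factor of nine (108 MP / 9 = 12 MP). The minimal Python sketch below reproduces this calculation; the function name `binned_sensor` is hypothetical, and the 12 MP output is derived here rather than stated in the article.

```python
def binned_sensor(native_megapixels: float, pixel_pitch_um: float, bin_factor: int):
    """Return (output megapixels, effective pixel pitch in micrometres)
    for N x N pixel binning; 9-in-1 binning corresponds to N = 3."""
    pixels_per_bin = bin_factor * bin_factor        # 3 x 3 = 9 pixels merged
    output_mp = native_megapixels / pixels_per_bin  # resolution drops ninefold
    effective_pitch = round(pixel_pitch_um * bin_factor, 3)  # pitch triples
    return output_mp, effective_pitch


print(binned_sensor(108, 0.7, 3))  # (12.0, 2.1)
```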
Table 2
Concise overview of the dragon fruit maturity detection and quality grading dataset.
Dataset | Category | Description
Dragon Fruit Maturity Detection Dataset | Immature Dragon Fruit | Premature dragon fruit, in contrast to its ripe counterpart, is smaller, typically green or light pink, has a firmer texture, a milder and less sweet flavor, and underdeveloped seeds, and may taste slightly sour [1]. Its firmness sets it apart from the softer and sweeter fully ripe dragon fruit. The exact characteristics can vary depending on the dragon fruit variety and its specific stage of ripeness.
Dragon Fruit Quality Grading Dataset | Fresh Dragon Fruit | Depending on the variety, fresh dragon fruit has bright exterior skin in tones of pink, red, or yellow. The skin is typically covered with scales or spikes, giving it an unusual and exotic appearance. The flesh can be white or red and is soft, juicy, and slightly crunchy due to small black seeds [7]. A vivid color, slight softness to the touch, and a pleasant fragrance are indications of ripeness.
Table 3
Statistics of the dragon fruit dataset.
The Samsung S22's camera is equipped with a 50 MP Samsung GN5 sensor and Sony IMX766 sensors, featuring a relatively large 1/1.57-inch sensor size. The individual pixels on this sensor measure 1.0 μm, and the lens has an f/1.8 aperture.
Data augmentation is essential for deep learning models, particularly for visual object recognition. It strengthens models by supplementing the training dataset with new images generated from existing ones, which improves generalization and reduces overfitting. We used a variety of augmentation strategies, including shearing, random rotation, horizontal flipping, width and height shifting, zooming, and brightness modification. These procedures were applied in accordance with accepted best practices to increase the dataset's diversity and robustness.
The augmentation settings include a rotation range of 45 degrees, so the photos appear in a variety of orientations. We also applied a width and height shift range of 0.2, displacing the image content horizontally and vertically. Controlled deformation was introduced with a shear range of 0.2, and a zoom range of 0.2 rescaled the photographs to add further diversity. Enabling horizontal flipping added mirrored versions of the photos to the dataset. We used the 'reflect' fill mode to fill the pixels exposed by these transformations. Finally, to cover a range of lighting conditions, brightness was varied between 0.5 and 1.5. These parameter settings produced a robust and varied dataset that improved the training of the deep learning models.
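The parameters listed above can be collected into a single configuration. The sketch below assumes Keras' `ImageDataGenerator` (TensorFlow 2.x), whose keyword arguments match the parameter names used in the text; the article does not state which library was used, so this is an illustration rather than the authors' code. `AUGMENTATION_PARAMS`, `zoom_interval`, and `build_generator` are hypothetical names, and TensorFlow is imported lazily so the settings can be inspected without it.

```python
# Augmentation settings from the text. Key names follow the keyword arguments
# of tf.keras.preprocessing.image.ImageDataGenerator (assumption: the article
# does not name the library it used).
AUGMENTATION_PARAMS = {
    "rotation_range": 45,            # random rotation within +/-45 degrees
    "width_shift_range": 0.2,        # horizontal shift up to 20% of width
    "height_shift_range": 0.2,       # vertical shift up to 20% of height
    "shear_range": 0.2,              # Keras reads this as a shear angle in degrees
    "zoom_range": 0.2,               # zoom factor drawn from [0.8, 1.2]
    "horizontal_flip": True,         # random mirrored copies
    "fill_mode": "reflect",          # fill exposed pixels by mirroring the image
    "brightness_range": (0.5, 1.5),  # brightness factor between 50% and 150%
}


def zoom_interval(zoom_range: float) -> tuple:
    """A scalar zoom_range z corresponds to the interval [1 - z, 1 + z]."""
    return (round(1 - zoom_range, 6), round(1 + zoom_range, 6))


def build_generator(params=AUGMENTATION_PARAMS):
    """Create a Keras ImageDataGenerator (requires TensorFlow 2.x)."""
    from tensorflow.keras.preprocessing.image import ImageDataGenerator
    return ImageDataGenerator(**params)


print(zoom_interval(AUGMENTATION_PARAMS["zoom_range"]))  # (0.8, 1.2)
```

Under this assumption one detail is worth checking: Keras documents `shear_range` as a shear angle in degrees, so a value of 0.2 produces only a very slight shear, while a float `zoom_range` of 0.2 maps to the 80–120% zoom interval quoted in the next paragraph.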
Within our dataset, a code-driven, automated augmentation procedure produced a total of 10010 augmented images, designed to increase the variety and depth of the dataset. Table 3 presents the augmented images alongside the corresponding original sample images for each category, giving a clear picture of how much the augmentation expanded the dataset.
Adding controlled variation to the training dataset makes the model more adaptable to real-world conditions. Our main methods include shearing for varied viewpoints, horizontal and vertical shifting (up to 20% of width/height), and random rotation (0–45 degrees). Horizontal flipping teaches orientation invariance, while random zooming (80–120%) helps the model handle different scales. A fill mode fills the pixels exposed by these geometric transformations, and brightness (50–150%) and contrast (70–130%) adjustments account for changes in lighting. Together, these strategies help models recognize dragon fruit in a variety of real-world scenarios. Fig. 3 displays augmented dragon fruit images from the dataset, while Table 3 provides the dataset's statistics.
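The shift, flip, brightness, and contrast steps described above are simple enough to sketch directly in NumPy, including the contrast adjustment that the parameter list in the previous paragraph does not cover. The `augment` function below is a hypothetical illustration, not the authors' pipeline: it applies a random horizontal flip, a shift of up to 20% of width and height, a brightness factor in [0.5, 1.5), and a contrast factor in [0.7, 1.3) applied around the image mean. Rotation, shear, and zoom are omitted for brevity. Pixels exposed by the shift are filled with an edge-repeating mirror (NumPy's `symmetric` padding mode), which is assumed here to approximate the fill mode described in the text.

```python
import numpy as np


def augment(image: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Randomly flip, shift, re-brighten and re-contrast an H x W x C uint8 image.

    Hypothetical sketch: rotation, shear and zoom are omitted.
    """
    h, w = image.shape[:2]
    out = image.astype(np.float32)

    # Horizontal flip with probability 0.5 (orientation invariance).
    if rng.random() < 0.5:
        out = out[:, ::-1]

    # Shift by up to 20% of height/width: pad with an edge-repeating mirror
    # (NumPy "symmetric" mode), then crop an H x W window at a random offset.
    max_dy, max_dx = int(0.2 * h), int(0.2 * w)
    padded = np.pad(out, ((max_dy, max_dy), (max_dx, max_dx), (0, 0)), mode="symmetric")
    dy = int(rng.integers(0, 2 * max_dy + 1))
    dx = int(rng.integers(0, 2 * max_dx + 1))
    out = padded[dy:dy + h, dx:dx + w]

    # Brightness: scale every intensity by a factor drawn from [0.5, 1.5).
    out = out * rng.uniform(0.5, 1.5)

    # Contrast: stretch around the image mean by a factor drawn from [0.7, 1.3).
    mean = out.mean()
    out = (out - mean) * rng.uniform(0.7, 1.3) + mean

    # Clip back to the valid 8-bit range.
    return np.clip(out, 0, 255).astype(np.uint8)


rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(800, 800, 3), dtype=np.uint8)
aug = augment(img, rng)
print(aug.shape, aug.dtype)  # (800, 800, 3) uint8
```

Padding once and cropping at a random offset keeps the shift to a single slice, and working in float32 until the final clip avoids 8-bit overflow during the brightness and contrast steps.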
We trained a deep learning model on the dataset, aiming for state-of-the-art results. Validating this model requires a thorough evaluation of its performance on the dataset. A deep learning model comprises interconnected layers of nodes, where each node is a computational unit. Nodes in the input layer receive the data, while nodes in the output layer produce the final outcome. Between them lie the hidden layers, which hold the network's primary computational capacity [9]. Deep learning models have made substantial strides in analyzing visual data, such as classifying images or videos and detecting objects, as well as in processing natural language [10]. Our deep learning workflow follows five steps: data pre-processing, data splitting, model training, performance evaluation on a validation set, and finally testing on a completely separate test set. This approach is crucial to verify that the model produces accurate results and adapts to new data.
Data pre-processing is critical for deep learning because it prepares visual data for model
input, enhances data quality, and influences model performance, generalization, and efficiency. It
ensures that the images are in a suitable format for the computer vision tasks, addresses issues
that can affect model learning and decision-making, and ultimately leads to more accurate and
reliable results in various applications. In this research work, image pre-processing involves a
range of data transformations, including actions such as data labeling, image resizing, image
augmentation, and segmentation.
Data labeling: During the first round of data pre-processing, we scrupulously labeled the
data, properly assigning each image to its corresponding class or category. Labeled data serves
as the foundation for training and refining deep learning models; without precise labels, models
are unable to acquire knowledge and make reliable predictions.
Image resizing: Because images within the dataset may come in different sizes, we resized them to our specifications to obtain a consistent dataset. This reduced computational demands and ensured compatibility during the training of deep learning models.
Image segmentation: As needed, we carried out image cropping to remove undesirable back-
ground elements, thereby improving the dataset’s overall quality.
Data augmentation: Deep learning models require large volumes of data, which improve model performance, reduce overfitting, and enable complex feature extraction [12]. We therefore expanded the dataset using the augmentation techniques outlined in Section 3.2.
Fig. 4 represents the pre-processing steps that we have applied to the dataset.
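The labeling, cropping, and resizing steps above can be sketched with the standard library and NumPy. This sketch assumes that each image's class is given by the name of its parent folder (for example `Mature Dragon Fruit` and `Immature Dragon Fruit`, following the category names used in this article; the exact folder names in the published archive may differ), a hand-chosen crop box, and a 224 × 224 target size, which is a common ResNet50 input size rather than a value stated in the article. All function names are hypothetical.

```python
from pathlib import Path

import numpy as np

IMAGE_SUFFIXES = {".jpg", ".jpeg"}


def label_from_folders(root):
    """Label each image by the name of its parent folder.

    Returns (samples, class_to_index), where samples is a list of
    (path, class_index) pairs and class indices follow sorted folder names.
    """
    root = Path(root)
    classes = sorted(p.name for p in root.iterdir() if p.is_dir())
    class_to_index = {name: i for i, name in enumerate(classes)}
    samples = [
        (path, class_to_index[name])
        for name in classes
        for path in sorted((root / name).iterdir())
        if path.suffix.lower() in IMAGE_SUFFIXES
    ]
    return samples, class_to_index


def crop(image, top, left, height, width):
    """Keep only a bounding box around the fruit, discarding background."""
    return image[top:top + height, left:left + width]


def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of an H x W (x C) array to out_h x out_w."""
    h, w = image.shape[:2]
    rows = np.arange(out_h) * h // out_h  # source row for each output row
    cols = np.arange(out_w) * w // out_w  # source column for each output column
    return image[rows[:, None], cols[None, :]]


# Usage: crop a hypothetical 600 x 600 box and resize it to 224 x 224.
img = np.zeros((800, 800, 3), dtype=np.uint8)
patch = resize_nearest(crop(img, 100, 100, 600, 600), 224, 224)
print(patch.shape)  # (224, 224, 3)
```

Nearest-neighbour resizing keeps the sketch dependency-free; an image library with area or bilinear interpolation would usually give smoother downsampled images.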
The dataset was divided into two distinct sets, a training set and a test set, by randomly selecting 80% of the photos for training and keeping the remaining 20% for testing. No image appears in both the training and test sets. After training on the training data, the model was evaluated on the test set, which served as a robust benchmark of its performance.
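An 80/20 random split with no image shared between the two sets can be sketched as follows. The article does not say how repeated images were excluded, so this hypothetical `split_without_duplicates` helper assumes one simple approach: collapse byte-identical files by their SHA-256 hash before shuffling. It takes (image bytes, label) pairs; for example, 3780 unique images would yield 3024 training and 756 test images.

```python
import hashlib
import random


def split_without_duplicates(samples, train_fraction=0.8, seed=42):
    """Randomly split (image_bytes, label) pairs into train and test lists.

    Byte-identical images (same SHA-256 digest) are collapsed first, keeping
    the first occurrence, so no image can land in both sets.
    """
    unique = {}
    for data, label in samples:
        unique.setdefault(hashlib.sha256(data).hexdigest(), (data, label))
    items = list(unique.values())
    random.Random(seed).shuffle(items)  # seeded for a reproducible split
    n_train = round(len(items) * train_fraction)
    return items[:n_train], items[n_train:]


samples = [(f"img{i}".encode(), i % 2) for i in range(3780)]
train, test = split_without_duplicates(samples)
print(len(train), len(test))  # 3024 756
```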
Fig. 5 outlines the validation procedure used for our deep learning model on the dragon fruit image dataset. The validation covered two tasks: discriminating between mature and immature dragon fruit, and classifying dragon fruit as fresh or defective. This framework was used to check the model's effectiveness and reliability in meeting the specific objectives of our study.
Fig. 5. The working process for assessing dragon fruit ripeness and distinguishing between fresh and defective dragon
fruits.
Residual blocks allow the network to learn complex data representations by integrating shortcut connections that bypass some layers, which also helps limit overfitting. The output of each residual block is passed to the following block. Convolutional blocks perform feature extraction and boost network performance using convolutional layers, batch normalization, and ReLU activation functions.
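The shortcut idea can be illustrated with a simplified residual block in NumPy. This is a sketch under stated simplifications, not ResNet50 itself: the real bottleneck block uses 1 × 1, 3 × 3, and 1 × 1 convolutions with learned batch normalization parameters, whereas here both convolutions are 1 × 1 (a per-pixel matrix product) and batch normalization uses the feature map's own statistics with γ = 1 and β = 0. The function names are hypothetical.

```python
import numpy as np


def conv1x1(x, weight):
    """1 x 1 convolution: the same linear map applied at every pixel."""
    return x @ weight  # (H, W, C_in) @ (C_in, C_out) -> (H, W, C_out)


def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise each channel to zero mean and unit variance, then scale/shift.
    Uses the statistics of x itself, as in training mode."""
    mean = x.mean(axis=(0, 1), keepdims=True)
    var = x.var(axis=(0, 1), keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta


def relu(x):
    """Replace negative values with zero."""
    return np.maximum(x, 0.0)


def residual_block(x, w1, w2):
    """Simplified residual block: conv -> BN -> ReLU -> conv -> BN on the
    residual branch, then add the shortcut (the unchanged input) and apply ReLU."""
    f = relu(batch_norm(conv1x1(x, w1)))
    f = batch_norm(conv1x1(f, w2))
    return relu(f + x)  # shortcut connection


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 4, 8))  # H x W x C input feature map
w1 = rng.normal(size=(8, 2))    # reduce 8 -> 2 channels
w2 = np.zeros((2, 8))           # expand 2 -> 8 channels (zeroed here)
out = residual_block(x, w1, w2)
print(out.shape)                # (4, 4, 8)
```

With the second weight matrix set to zero, the residual branch outputs zero and the block reduces to `relu(x)`: the shortcut lets the signal bypass the block's layers, the property He et al. [13] rely on to ease the optimization of very deep networks.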
In ResNet50, batch normalization is applied after each convolutional layer and before the activation function (ReLU), ensuring that the inputs to subsequent layers are well scaled and centered. ReLU introduces non-linearity into the network by replacing negative values with zeros. In addition, a max pooling layer after the first convolutional layer reduces the spatial resolution by selecting the maximum value in each local region of the feature map, keeping the most important information while reducing computational complexity.
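Max pooling as described can be sketched as a sliding window that keeps the largest activation per channel. The defaults below (3 × 3 window, stride 2) match the pooling layer ResNet50 applies after its first convolution, but this hypothetical `max_pool` uses no padding, so its output sizes differ slightly from the reference architecture.

```python
import numpy as np


def max_pool(x, size=3, stride=2):
    """Max pooling over an H x W x C feature map (no padding).

    Each output value is the maximum of a size x size window, taken
    independently per channel, with windows placed every `stride` pixels.
    """
    h, w, c = x.shape
    out_h = (h - size) // stride + 1
    out_w = (w - size) // stride + 1
    out = np.empty((out_h, out_w, c), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = window.max(axis=(0, 1))
    return out


x = np.arange(16).reshape(4, 4, 1)
print(max_pool(x, size=2, stride=2)[..., 0].tolist())  # [[5, 7], [13, 15]]
```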
The architecture concludes with a global average pooling layer, followed by a fully connected layer and a softmax layer. In place of the stack of fully connected layers used in conventional networks, ResNet employs global average pooling, which collapses the spatial dimensions into a feature vector. The final fully connected layer generates the classification output, and its number of neurons equals the number of classes in the classification task. The softmax layer in ResNet-50 converts the raw outputs into a probability distribution, particularly for multi-class classification tasks. It ensures that the network's output represents the likelihood of the input belonging to each class, making it easy to determine the predicted class and to calculate the loss during training.
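The classification head described above reduces to three operations. In this NumPy sketch, `features` stands for the final feature map (7 × 7 × 2048 in a standard ResNet50 with 224 × 224 inputs) and the weights are hypothetical; with two classes (for example, mature vs. immature) the fully connected layer has two output neurons. The `cross_entropy` function shows how the softmax probabilities feed the training loss.

```python
import numpy as np


def classification_head(features, weight, bias):
    """Global average pooling -> fully connected layer -> softmax.

    features: H x W x C feature map; weight: C x num_classes; bias: num_classes.
    """
    pooled = features.mean(axis=(0, 1))  # one value per channel: a C-dim vector
    logits = pooled @ weight + bias      # one raw score per class
    exp = np.exp(logits - logits.max())  # subtract the max for numerical stability
    return exp / exp.sum()               # probabilities that sum to 1


def cross_entropy(probs, label):
    """Training loss for one example: negative log-probability of the true class."""
    return -np.log(probs[label])


features = np.zeros((7, 7, 2048))    # final ResNet50 map for a 224 x 224 input
weight = np.zeros((2048, 2))         # hypothetical weights for 2 classes
bias = np.array([0.0, np.log(3.0)])  # makes class 1 three times as likely
probs = classification_head(features, weight, bias)
print(probs)                         # approximately [0.25, 0.75]
print(cross_entropy(probs, 1))       # approximately 0.288
```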
Precision: Precision indicates how often the model's positive predictions are correct. It measures the proportion of true positives among all predicted positives.
$$\text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}} \tag{2}$$
Recall: Recall, also known as sensitivity or true positive rate, measures the model's ability to recognize all relevant instances in the dataset. It is the proportion of true positives among all actual positives.
$$\text{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}} \tag{3}$$
F1-Score: The F1-Score is the harmonic mean of precision and recall. It balances the two measures and is particularly helpful when both false positives and false negatives must be taken into account.
$$\text{F1-Score} = 2 \times \frac{\text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}} \tag{4}$$
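Equations (2)–(4) can be computed directly from confusion-matrix counts. This plain-Python sketch also returns accuracy, (TP + TN) / total, the metric quoted for the ResNet50 results below; the function name and the zero-denominator convention (returning 0.0) are assumptions.

```python
def classification_metrics(tp, fp, fn, tn=0):
    """Precision, recall and F1 (Eqs. 2-4) plus accuracy from confusion counts.
    Any ratio with a zero denominator is reported as 0.0."""
    def ratio(num, den):
        return num / den if den else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    f1 = ratio(2 * precision * recall, precision + recall)
    accuracy = ratio(tp + tn, tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


m = classification_metrics(tp=8, fp=2, fn=4, tn=6)
print(m)  # precision 0.8, recall ~0.667, f1 ~0.727, accuracy 0.7
```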
Confusion matrix: The confusion matrix is a crucial tool for assessing classification models, particularly when there are several classes. It shows how closely the model's predictions match the actual class labels for each category, which helps identify the model's strengths and weaknesses across groups. By sorting predictions into true positives, true negatives, false positives, and false negatives, the confusion matrix lets data scientists make well-informed decisions, understand class-specific performance, and pinpoint areas for improvement. Fig. 6 shows the confusion matrices of the ResNet50 model for the dragon fruit maturity detection and quality grading datasets.
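A confusion matrix can be built by counting (actual, predicted) pairs. The sketch below uses the row = actual, column = predicted convention (an assumption; Fig. 6 may be laid out differently) and reads off the TP, FP, FN, and TN counts for one class, which can then be plugged into Eqs. (2)–(4). Class indices such as 0 = Immature and 1 = Mature are hypothetical, as are the function names.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count predictions: rows are actual classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix


def binary_counts(matrix, positive=1):
    """Return (TP, FP, FN, TN) for one class treated as 'positive'."""
    tp = matrix[positive][positive]
    fp = sum(row[positive] for row in matrix) - tp  # predicted positive, actually not
    fn = sum(matrix[positive]) - tp                 # actually positive, predicted not
    total = sum(sum(row) for row in matrix)
    tn = total - tp - fp - fn
    return tp, fp, fn, tn


cm = confusion_matrix([0, 0, 1, 1, 1], [0, 1, 1, 1, 0], num_classes=2)
print(cm)                 # [[1, 1], [1, 2]]
print(binary_counts(cm))  # (2, 1, 1, 1)
```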
These results underline the effectiveness of the ResNet50 architecture in determining the maturity and quality grade of dragon fruit. The model achieved 90% accuracy in distinguishing between immature and mature dragon fruit and 98% accuracy in identifying fresh or damaged dragon fruit, as shown in Table 4. This performance on the held-out test set indicates that the model generalizes well to new data, demonstrating its utility for real-world applications.
In future work, we will investigate more advanced deep learning models on this dataset to identify the most effective approach for real-world applications. We also plan to develop a consumer-oriented mobile app that uses machine learning and AI-based image processing to help users distinguish ripe, fresh, and defective dragon fruit.

Table 4
Classification report for maturity detection and quality grading.
Limitations
Because the dataset contains only dragon fruit images, it cannot be used to classify other fruits.
Data availability
Dragon Fruit Maturity Detection and Quality Grading Dataset (Original data) (Mendeley Data)
Ethics Statement
None of the authors of this article have conducted any research using humans or animals as subjects. The datasets used in this article are publicly accessible, but following the correct citation guidelines is essential.
Acknowledgements
We are very grateful to the domain expert Mohammad Enayet-e-Rabbi, Deputy Director of
Quality Control, Seed Certification Agency, Ministry of Agriculture, Bangladesh for the valuable
feedback and cooperation to accomplish the task.
The authors declare that they have no known competing financial interests or personal rela-
tionships that could have appeared to influence the work reported in this paper.
References
[1] Deep Lata, et al., Maturity determination of red and white pulp dragon fruit, J. Horticult. Sci. 17 (1) (2022) 157–165,
doi:10.24154/jhs.v17i1.1309.
[2] Pallavi U. Patil, et al., Grading and sorting technique of dragon fruits using machine learning algorithms, J. Agric.
Food Res. 4 (2021) 100118, doi:10.1016/j.jafr.2021.100118.
[3] Modern manufacturing techniques of Bari Dragon Fruit-1, Bangladesh Agricultural Research Institute. Available at: https://bari.portal.gov.bd/sites/default/files/files/bari.portal.gov.bd/page/dff6cca4_a440_403a_a7e6_b50f4d3ed2f0/Dragon%20Fruit%20%281%29.pdf (Accessed: 24 November 2023).
[4] Connie Mayer, Dragon Fruit ROOTED Plants, Healthy Harvesters. Available at: https://hhplantnursery.com/products/4-connie-mayer-dragon-fruit-rooted-plants (Accessed: 24 November 2023).
[5] Naruwan Yusamran, Nualsawat Hiransakolwong, DIPDEEP: classification for Thai dragon fruit, Eng. Appl. Sci. Res. 49
(4) (2022) 521–530.
[6] Kristina, How to store dragon fruit, Savory Suitcase (2023). Available at: https://www.savorysuitcase.com/how-to-store-dragon-fruit/ (Accessed: 03 November 2023).
[7] F. Spritzler, Dragon fruit: nutrition, benefits, and how to eat it, Healthline (2023). Available at: https://www.healthline.com/nutrition/dragon-fruit (Accessed: 03 November 2023).
[8] Viccie, How to tell if dragon fruit has gone bad? - check your fruit!, Miss Vickie (2022). Available at: https://missvickie.com/how-to-tell-if-dragon-fruit-has-gone-bad/ (Accessed: 03 November 2023).
[9] Rofia Abada, Abdulhalim Musa Abubakar, Muhammad Tayyab Bilal, An overview on deep leaning application of big
data, Mesopotamian J. Big Data (2022) (2022) 31–35, doi:10.58496/MJBD/2022/004.
[10] Md.Ashiqul Islam, et al., An automated convolutional neural network based approach for paddy leaf disease detection, (IJACSA) Int. J. Adv. Comput. Sci. Appl. 12 (1) (2021) 280–288.
[11] N. Minh Trieu, N.T. Thinh, Quality classification of dragon fruits based on external performance using a convolutional neural network, Appl. Sci. 11 (22) (2021) 10558, doi:10.3390/app112210558.
[12] Tania Khatun, et al., An extensive real-world in field tomato image dataset involving maturity classification and
recognition of fresh and defect tomatoes, Data Brief 51 (2023) 109688, doi:10.1016/j.dib.2023.109688.
[13] Kaiming He, et al., Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June, 2016, pp. 770–778.