Global Transitions Proceedings 2 (2021) 402–407

Contents lists available at ScienceDirect

Global Transitions Proceedings


journal homepage: http://www.keaipublishing.com/en/journals/global-transitions-proceedings/

Crop yield forecasting using data mining


Pallavi Kamath∗, Pallavi Patil, Shrilatha S, Sushma, Sowmya S
Dept. Computer Science and Engineering, Shri Madhwa Vadiraja Institute of Technology and Management (Affiliated to VTU), Udupi 574115, India

Keywords: Accuracy; Agriculture; Crop yield prediction; Data mining; Random forest algorithm

Abstract

India is heavily reliant on agriculture. Organic, economic, and seasonal factors all influence agricultural yield. Estimating agricultural production is a difficult task for our country, particularly given the current population situation. Crop production estimates made far in advance can help farmers plan for things like storage and marketing. Crop production prediction involves a huge amount of data, making it a perfect candidate for data mining methods. Data mining is a method of extracting previously unseen, predictive information from vast databases. Data mining assists in the analysis of future patterns and character, enabling companies to make informed decisions. For a specific region, this research provides a fast inspection of agricultural yield forecasting using the Random Forest approach.

∗ Corresponding author.
E-mail addresses: [email protected] (P. Kamath), [email protected] (S. S).

https://doi.org/10.1016/j.gltp.2021.08.008
Received 28 June 2021; Accepted 5 July 2021
Available online 13 August 2021
2666-285X/© 2021 The Authors. Publishing Services by Elsevier B.V. on behalf of KeAi Communications Co. Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

1. Introduction

India's primary occupation is agriculture, and the country's economy depends heavily on it for rural survival. Farming accounts for roughly 70% of the primary and secondary sectors. As a result, many farmers have begun to employ new technology and methods to improve their farming operations. People, on the other hand, are unaware of the importance of cultivating crops at the appropriate time and place. In this situation, using the multiple elements that influence production to identify crop adaptability and yield can improve crop quality and yield, resulting in higher economic growth and profitability [2]. Crop development is a complex phenomenon governed by agricultural input parameters.

Data mining is a method of extracting previously unseen, predictive information from vast databases. Data mining assists in the analysis of future patterns and character, enabling companies to make informed decisions. The process of analysing, cleaning, and modelling data to generate useful knowledge and conclusions is known as data analysis [6].

These methods are used to convert the customer's raw data into valuable information, and the same approach can be extended to agriculture. Most farmers have relied on their long-term field experience with specific crops to forecast a greater yield in the coming season. Nonetheless, they do not receive a fair price for their crops. This typically occurs because of insufficient irrigation or poor crop selection, but it may also occur when crop yields are lower than expected. Due to a variety of factors, the majority of farmers do not achieve the predicted crop yield. A crop yield data set consists of many components. By studying the soil and atmosphere of a specific area, the optimal crop can be estimated and crop production increased [10]. The main advantage of our research is that farmers will benefit from this forecast when determining which crops are best for their farm based on soil type, pH, and fertilizer [11, 12].

2. Related works

Shailesh Shetty S et al. [1] describe a project that supports farmers in evaluating which crop to grow in a specific area at a specific time and in predicting whether it will be profitable. As a result, the system aids farmers in their decision-making process, allowing them to save time.

Suvidha Jambekar et al. [2] apply regression analysis as a predictive modelling tool for crop production. The regression algorithms applied were Multivariate Adaptive Regression Splines, Multiple Linear Regression, and Random Forest Regression. According to the results, Random Forest Regression may be used to accurately estimate wheat, rice, and maize production.

B. Devika, B. Ananthi et al. [3] observe that agriculture must expand yield production to meet demand and that the government encourages crop yield forecasting toward that end. Using a Tamil Nadu dataset, the study puts the yield prediction capabilities of the regression method to the test.

R. Vidhya et al. [4] observed that the accuracy rate improves when a dataset with more features is used. Compared with other approaches such as decision trees and linear regression, the random forest algorithm is shown to be the superior prediction algorithm. The dataset used incorporates many more variables, resulting in more precise prediction.

Hetal Patel, Dharmendra Patel et al. [5] measured the performance of the classification algorithms Naive Bayes, J48, and Simple Cart. Their comparative analysis of crop prediction employs a large dataset and 10-fold cross validation to give an indication of the predictive abilities of the data mining methods used.
Sangeeta, Shruthi G, et al. [6] examined the performance of several machine learning algorithms: Decision Tree, Polynomial Regression, and Random Forest. According to the approach they proposed, the Random Forest method outperforms the other algorithms in terms of yield prediction. The random forest, polynomial regression, and decision tree models each show performance that changes with the dataset. As a result, they determined that the proposed model is more efficient than the existing model for determining crop yield. The introduction of this scheme will aid in the betterment of the country's agricultural practices. It can also be used to help farmers reduce their losses and increase crop yields in order to increase their resources in agriculture. To support the country's agricultural production, the system can be enhanced by merging it with other fields such as horticulture, sericulture, and others.
B A Harshanand, Swathi Sriram B Srishti, Chaitanya R, Kirubakaran Nithiya Soundari, V Mano Kumar, Varshitha Chennamsetti, Venkateshwaran G, Dr. Pramod Kumar Maurya, Akshay Prassanna S, et al. [7] determined that the accuracy provided by the Decision Tree Model was about 92.66 percent. Maize, with a predicted increase of 3.32 percent, and Sunflower, with a predicted increase of 2.43 percent, are the top crop gainers. Niger is predicted to lose 7.81 percent of its crop, Moong 4.2 percent, and Masoor 2.4 percent. The new approach used to improve efficiency was the Random Forest Ensemble Method, which provided an accuracy of about 97.57 percent, greater than that of the Decision Tree Model. Rape is expected to gain 0.92 percent, while groundnut is expected to gain 0.445 percent. The comparison therefore demonstrates that ensemble approaches are often superior in terms of improving performance and efficiency while still providing high accuracy.

Saksham Garg, Parul Agrawal, Archit Agrawal, Aruvansh Nigam, et al. [8] looked at ML algorithms used to predict harvests and compared them, including by mean absolute error. They converted three variables from the Indian government's official website (temperature, rainfall, and production) into a final dataset. They considered four models (Logistic Regression, XGBoost Classifier, KNN Classifier, and Random Forest Classifier) and calculated the accuracy of each. They found that Random Forest is best.

N. Rohit, M. Vineeth and S. Bhanumathi, et al. [9] propose a system that predicts crop yield using random forest and a deep learning model and suggests the amount of fertilizer that should be used for high yield. Two different datasets are used, one for crop data and one for fertilizer data; 80% of the data is used for training and 20% for testing. The district, area, season, and production factors are used to build a machine learning model that predicts yield. The quantity of fertilizer required is determined from the amounts of phosphorus, potassium, and nitrogen in the soil.

Kunal Teeda Nandini, Vallabhaneni, Dr. T. Sridevi et al. [10] discuss various models and measure their performance on a given dataset, including artificial neural networks, Bayesian networks, cluster models, conventional prediction methods using decision trees, and the ARIMA prediction model. K-Nearest Neighbours and multivariate linear regression prediction models are applied to analyse rainfall and soil behaviour. They found that the KNN classification model is the best because its accuracy on the training data is higher than that of the others.

3. Methodology

The overall architecture of the proposed model using the Random Forest algorithm is described in Fig. 1.

Fig 1. Block diagram of model

The studies in this paper were carried out with PyCharm Community Edition 2021.1.1 x64. The Random Forest classification algorithm is applied to the data collection obtained from the official government website. To ensure accuracy, the datasets are examined. Random Forest is a supervised learning technique for classifying and predicting datasets. It chooses a collection of features at random from the dataset's attributes and constructs a set of decision trees by locating the root nodes and splitting the attributes. Following the creation of the forest, the prediction with the highest number of votes among the projected targets is taken as the classifier's final prediction. Crop yield prediction systems provide for better planning and decision-making to increase production. The proposed system involves a prediction module based on a data mining classification algorithm, namely Random Forest, used to forecast the yield of major crops based on historical data.

3.1. Agricultural dataset

Most of the research papers examined considered variables such as area, temperature, precipitation, and humidity. Some soil agronomical parameters, such as chalky, clay, loamy, sandy, and so on, as well as different seasons, are included. The data for these variables were given as input. Initially, a dataset is collected consisting of attributes such as state name, district name, humidity, temperature, yield, etc. Any crops that will be planted in the region are taken into consideration. The collected dataset is in csv format.

3.2. Pre-processing

A large dataset is needed for a data mining application. The information gathered from different sources is often in raw form. It could include information that is incomplete, obsolete, or inconsistent. Such data should therefore be filtered out in this process, and the information should be normalized. The provided data collection has many 'NA' values, which are filtered in Python.

Robust scaling, which is related to normalization, was also used; it scales the data using the interquartile range instead of the minimum and maximum, which is applicable because the data set contains numeric data. Normalization rescales the data to the range 0 to 1.

3.3. Train and test model

In the pre-processing step the dataset is divided into a training dataset and a testing dataset. This is an important step in creating the model. The training dataset is used to train the model and the testing dataset is used to evaluate it. So, we fit the model with the training data and test it with the testing data.
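The pre-processing and splitting steps of Sections 3.2 and 3.3 can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the rows, field names, and helper functions below are hypothetical, and the paper does not say which library calls were used (in scikit-learn the corresponding utilities are `MinMaxScaler`, `RobustScaler`, and `train_test_split`).

```python
import random
import statistics

# Hypothetical rows standing in for the CSV dataset; None marks an 'NA' value.
rows = [
    {"temperature": 28.0, "humidity": 70.0, "yield": 2.1},
    {"temperature": 31.0, "humidity": None, "yield": 1.8},
    {"temperature": 25.0, "humidity": 82.0, "yield": 2.6},
    {"temperature": 35.0, "humidity": 55.0, "yield": 1.2},
    {"temperature": 30.0, "humidity": 64.0, "yield": 2.0},
    {"temperature": 27.0, "humidity": 75.0, "yield": 2.4},
]

def drop_na(rows):
    """Keep only rows in which every field has a value."""
    return [r for r in rows if all(v is not None for v in r.values())]

def min_max(values):
    """Normalization: rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def robust_scale(values):
    """Robust scaling: centre on the median, divide by the interquartile range."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    med = statistics.median(values)
    return [(v - med) / (q3 - q1) for v in values]

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and hold out test_fraction of them for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

clean = drop_na(rows)                                   # filter the 'NA' rows
temps_01 = min_max([r["temperature"] for r in clean])   # values in [0, 1]
temps_robust = robust_scale([r["temperature"] for r in clean])
train, test = train_test_split(clean, test_fraction=0.2)
print(len(clean), len(train), len(test))
```

Robust scaling is less sensitive to outliers than min-max normalization because a single extreme value moves the minimum or maximum but barely moves the quartiles.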


Table 1
Comparison of different models with respect to accuracy [8]

MODEL                       ACCURACY (in percentage)
Random Forest Classifier    67.80
XGBoost Classifier          63.63
KNN Classifier              43.25
Logistic Regression         25.81

3.4. Classification algorithm

Once data splitting is done, the next process is creating and training the model using scikit-learn. Training a machine learning model requires a machine learning algorithm along with training data from which to learn the pattern. Here we use the Random Forest algorithm, a well-known supervised learning algorithm that works on the bagging technique. The Random Forest algorithm is a combination of a number of decision trees; it is a classification algorithm based on an ensemble classifier. It divides the dataset into training data and testing data, and the training dataset is then used to build the decision trees. The model builds a decision tree by considering the training data and separates out the weaker nodes to get a better model. Each individual training subset generates a decision tree, and together the trees form the random forest. The general idea of the bagging method is that aggregating the outputs of several learned models improves the overall result. The Random Forest algorithm builds multiple decision trees during training. The predictions made by these decision trees are collected, and the final output is the one with the maximum number of votes. Jupyter Notebook is the platform used to create the trained model using the Random Forest algorithm.
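The bagging and voting idea described above can be illustrated with a small self-contained sketch. It is not the authors' implementation (the paper reports using scikit-learn, where the corresponding estimator is `RandomForestClassifier`). To keep the example short, each tree is reduced to a single split (a decision stump), and the toy data, feature meanings, and function names are assumptions made for illustration only.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    """Most common label in a list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y, feature_ids):
    """Find the (feature, threshold) split with the lowest weighted Gini impurity."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] < t]
            right = [lab for row, lab in zip(X, y) if row[f] >= t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # no usable split: always predict the majority class
        return (0, float("-inf"), majority(y), majority(y))
    return best[1:]

def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    """Bagging: each stump sees a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]          # sample with replacement
        feats = rng.sample(range(len(X[0])), n_features)  # random feature subset
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest

def predict(forest, row):
    """Each tree votes; the class with the most votes is the final output."""
    votes = [(lo if row[f] < t else hi) for f, t, lo, hi in forest]
    return majority(votes)

# Hypothetical toy data: [rainfall_mm, temperature_c] -> yield class.
X = [[300, 34], [350, 33], [400, 31], [800, 27], [900, 26], [950, 25]]
y = ["low", "low", "low", "high", "high", "high"]
forest = fit_forest(X, y)

# Held-out points (also hypothetical) used to score the forest.
X_test = [[250, 35], [1000, 24]]
y_test = ["low", "high"]
correct = sum(predict(forest, row) == label for row, label in zip(X_test, y_test))
print(f"accuracy on held-out points: {correct / len(y_test):.2f}")
```

Because every stump is trained on a different bootstrap sample, an individual stump can be wrong on a given input, but the majority vote over all of them is more stable than any single one.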
Algorithm of Random Forest:
Input:
    i) node: a node of the decision tree; if node.attribute = k, the split is done on the kth attribute
    ii) V: the input vector to classify, where Vk is the value of its kth attribute
Output:
    label of V
Classify (node, V):
    If node is a leaf then
        Return the value predicted by node
    Else
        Let k = node.attribute
        If k is categorical then
            Let v = Vk
            Let Cv = child node corresponding to the attribute's value v
            Return Classify (Cv, V)
        Else (k is real valued)
            Let t = node.threshold (split threshold)
            If Vk < t then
                Let Cl = child node corresponding to (< t)
                Return Classify (Cl, V)
            Else
                Let Ch = child node corresponding to (>= t)
                Return Classify (Ch, V)

3.5. Predict yield

The trained model is used to predict the output for new input. We saved the trained model in a file so that the model can make predictions on new input.

In this system we used the pickle format, from within Jupyter, to store the trained machine learning model; pickle stores the object as a binary stream. The model is then evaluated with the testing dataset. This prediction model contains the random forest algorithm, which learns properties from the training data and uses them to make predictions.

3.6. Accuracy

Accuracy is one of the metrics used for evaluating classification models. Accuracy is calculated by dividing the number of correct predictions by the total number of predictions.

$$\text{Accuracy} = \frac{\text{Number of correct predictions}}{\text{Total number of predictions}} \qquad (1)$$

We have achieved 98% accuracy, which means this model is good for predicting yield.

Comparison with different models:

Among the different algorithms considered for predicting the yield, the Random Forest algorithm achieved high accuracy. This is because Random Forest constructs a decision tree for each individual subset of the training dataset, combines the multiple decision trees into a single model, and predicts the yield from the average of the trees' outputs [8]. The analysis of the different algorithms with respect to accuracy is given in Table 1, and the comparison between Random Forest and the deep learning model [9] is shown graphically in Fig. 2.

Fig 2. Comparison between random forest and deep learning [9]

4. Result and discussion

In this paper an effort is made to carry out region-specific crop yield analysis, implemented with the random forest algorithm. The dataset chosen for this project is in .csv format. For training, 80% of the data is used, and the remaining 20% is used for testing. After successful training and testing, the next step is finding the accuracy of the model. We have achieved good accuracy, which means this model is good for predicting yield. We have designed a website consisting of four functional modules, as shown in Fig. 3.

1) Crop Module: This module provides the list of available crops. Selecting a crop gives its detailed description.
2) Soil Module: This module provides the list of available soils. Selecting a soil gives its detailed description.
3) Weather Module: In this module the user can get the live weather forecast by entering the city name. Openweatherapi provides free, open weather data. Using a weather API key, the current or historical weather data can be fetched.


Fig 3. Module description

4) Predict: This module allows the user to select the district name, crop name, soil type, and area. After selecting these values, the user can click the predict button to get the estimated yield [15-17].

A comparative analysis of Random Forest algorithm accuracy [8, 13, 14] is given in Table 2.

Table 2
Comparing the accuracy

Model                         Accuracy
Proposed Model                98%
Saksham Garg et al. (2019)    67.80%
Shriya Sahu et al. (2017)     91.43%

The webpage in Fig. 4 shows the yield (in tons) predicted for the values the user entered.

Fig 4. Prediction module

Data visualization is done by plotting the yield variable against different parameters.

Data visualization is the practice of translating information into a visual context, such as a map or graph, to make it easier for a person to grasp and draw insights from. The main goal of data visualization is to make it easier to identify patterns, trends, and outliers in large data sets. Fig. 8 gives a clear idea of the information present in the dataset [18-20].

Fig 8. Data visualization between yield and area

A pair plot is used to visualize the dataset, as shown in Fig. 5. It is a function of the seaborn library. The figure shows the variation in each plot [21-22].

Fig 5. Pairplot between area, yield and temperature

Jointplot is also a seaborn function; it is used to quickly visualize the relationship between two variables, as shown in Figs. 6 and 7. Fig. 6 gives the relationship between yield and year.

Fig 6. Jointplot of yield vs year

Fig 7. Jointplot of area and year

Fig. 9 shows a graphical comparison between the actual and predicted values of the allocated dataset.

Fig 9. Comparison between actual and predicted values

5. Conclusion

The paper discussed machine learning algorithms for predicting crop yield based on temperature, season, and location. A yield prediction for a specific district can be made by combining precipitation, temperature, and other parameters such as season and location. When all the factors are considered, Random Forest emerges as the best classifier. Using a dataset with more features increases the accuracy rate. Random Forest is the superior prediction algorithm when compared with other techniques, namely multiple linear regression and decision trees. Our dataset contains many more variables, resulting in more accurate predictions. This project helps farmers reduce their losses and increase crop yields, and thereby their resources in agriculture. This will not only help farmers choose the best crop to cultivate in the future season, but it will also help bridge the


technological and agricultural divide. A limitation of our project is that yield is predicted for 100 acres and implemented for 30 districts. The future work of our project is to overcome these limitations.

References

[1] Shailesh Shetty S, Akshatha, Anet P James, Chaitra M Poojary, "Crop analysis and profit prediction using data mining techniques" (IJERT).
[2] Shruthi G Sangeeta, Design and implementation of crop yield prediction model in agriculture, Int. J. Sci. Technol. Res. 8 (01) (January 2020).
[3] B. Devika, B. Ananthi, Analysis of crop yield prediction using data mining technique to predict annual yield of major crops, Int. Res. J. Eng. Technol. (IRJET) 05 (12) (Dec 2018).
[4] R. Vidhya, Pragya Mathur, Shivani Sai Valluri, Crop yield prediction using random forest, Int. J. Adv. Sci. Technol. 29 (9s) (2020) 3084–3086.
[5] Hetal Patel, Dharmendra Patel, "A comparative study on various data mining algorithms with special reference to crop yield prediction." Indian J. Sci. Technol.
[6] Suvidha Jambekar, Shikha Nema, Zia Saquib, "Prediction of Crop Production in India Using Data Mining Techniques", IEEE, 2018. 978-1-5386-5257-2/18/$31.00 ©.
[7] BA Harshanand, Swathi Sriram B Srishti, Chaitanya R, Kirubakaran Nithiya Soundari, V Mano Kumar, Varshitha Chennamsetti, Venkateshwaran G, Dr. Pramod Kumar Maurya, Akshay Prassanna S, "Crop value forecasting using decision tree regressor and models" Eur. J. Mol. Clin. Med.
[8] Saksham Garg, Parul Agrawal, Archit Agrawal, Aruvansh Nigam, "Crop yield prediction using machine learning algorithms" 2019 Fifth International Conference on Image Processing (ICIIP).
[9] N. Rohit, M. Vineeth, S. Bhanumathi, Crop yield prediction and efficient use of fertilizers, International Conference on Communication and Signal Processing, April 4-6, India, 2019.
[10] Kunal Teeda Nandini, Vallabhaneni, T. Sridevi, "Comparative analysis of data mining models for crop yield by using rainfall and soil attributes" Proceedings of the 2nd (ICICCT 2018) IEEE Xplore Compliant.
[11] B. Jabber Potnuru Sai Nishant, Venkat, Bollu Lakshmi Avinash, Pinapa Sai, Crop Yield Prediction based on Indian Agriculture using Machine Learning, INCET, Belgaum, India, Jun 5-7, 2020.
[12] Yogesh Gandge, Sandhya, "A study on various data mining techniques for crop yield prediction" 2017 International Conference on Electrical, Electronics, Communication, Computer and Optimization Techniques (ICEECCOT).
[13] Shriya Sahu, Meenu Chawla, "An efficient analysis of crop yield prediction using hadoop framework based on random forest approach" International Conference on Computing, Communication and Automation (ICCCA2017).
[14] Dr. R. Sujatha, P. Isakki Devi, A Study on Crop Yield Forecasting Using Classification Techniques, IEEE, 2016. 978-1-4673-8437-7/16/$31.00 ©.
[15] S.I. Chu, C.L. Wu, T.N. Nguyen, B.H. Liu, Polynomial computation using unipolar stochastic logic and correlation technique, IEEE Trans. Comput. (2021).
[16] T.N. Nguyen, V.V. Le, S.I. Chu, B.H. Liu, Y.C. Hsu, Secure localization algorithms against localization attacks in wireless sensor networks, Wireless Pers. Commun. (2021) 1–26.
[17] S. Veenadhari, Dr. Bharat Misra, Dr. CD Singh, "Data mining techniques for predicting crop productivity – a review article"
[18] D.N. Tran, T.N. Nguyen, P.C.P. Khanh, D.T. Trana, An iot-based design using accelerometers in animal behavior recognition systems, IEEE Sens. J. (2021).
[19] P. Subramani, G.B. Rajendran, J. Sengupta, R. Pérez de Prado, P.B. Divakarachari, A block bi-diagonalization-based pre-coding for indoor multiple-input-multiple-output-visible light communication system, Energies 13 (13) (2020) 3466.
[20] V. Rajeswari, K. Arunesh, Analysing soil data using data mining classification techniques, Indian J. Sci. Technol. 9 (19) (May 2016), doi:10.17485/ijst/2016/v9i19/93873.
[21] L.J.L. Sujan, V.D. Telagadi, C.G. Raghavendra, B.M.J. Srujan, R.V. Prasad, B.D. Parameshachari, K.L. Hemalatha, Joint reduction of sidelobe and pmepr in multicarrier radar signal, in: Cognitive Informatics and Soft Computing, Springer, Singapore, 2021, pp. 457–464.
[22] Rajendran, G.B., Kumarasamy, U.M., Zarro, C., Divakarachari, P.B. and Ullo, S.L., 2020. Land-use and land-cover classification using a human group-based particle swarm optimization
