Report Second Part
Department of Data Science 2024-25
Smart Crop Prediction based on Soil Fertility
CHAPTER 1
INTRODUCTION
1.1 Background
Agricultural practices heavily rely on traditional knowledge, which may not always align
with current environmental and economic conditions. For centuries, farmers have depended
on methods passed down through generations, such as crop rotation schedules or planting
techniques. While these methods have their merits, they often fail to address modern
challenges like soil degradation, climate variability, and changing market demands.
These limitations underscore the urgent need for modernization in agriculture. Machine
learning (ML) provides a data-driven framework to bridge this gap by integrating diverse data
sources, such as soil composition, weather patterns, and historical yields, into actionable
insights. By leveraging ML techniques, farmers can adopt precision agriculture, optimizing
resource use and maximizing productivity while mitigating environmental risks. Integrating
historical agricultural data, soil composition, and climate factors also makes it possible to
build a predictive model for crop recommendation.
Traditional methods, while effective in the past, often fail to accommodate modern
challenges such as climate change and soil degradation. For instance, a farmer relying on
ancestral knowledge may continue to grow wheat even as soil nitrogen levels decline, leading
to poor yields. By incorporating ML, we can dynamically assess conditions and recommend
alternatives like legumes to replenish soil health.
In many regions, farmers face challenges in selecting the right crop for their land due to
changing climate patterns, soil degradation, and limited knowledge of modern agricultural
practices. A data-driven crop recommendation system can help address these challenges.
For example, in arid regions, traditional rice cultivation might lead to water wastage, whereas
ML models could recommend drought-resistant crops like millets or sorghum.
This project focuses on recommending crops based on static data, such as soil composition
and climatic averages. While real-time dynamic data integration is beyond the scope of this
study, the current system lays the groundwork for future developments. By emphasizing
general suitability over hyper-specialization, it ensures broad applicability across diverse
agricultural contexts. The project aims to:
Set a foundation for integrating dynamic data sources, such as IoT sensors and
satellite imagery, in future iterations.
CHAPTER 2
LITERATURE REVIEW
Algorithms like Decision Trees and K-Nearest Neighbors (KNN) have been employed to predict
crops based on soil properties. Decision Trees in particular provide an interpretable framework
that lets farmers understand why a particular crop is recommended; for example, a tree can reveal
that high nitrogen levels favor crops like wheat or corn.
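The interpretability described above can be illustrated with a minimal sketch. It assumes scikit-learn is available and uses a handful of made-up samples (nitrogen level and soil pH) that are not taken from any dataset in this report; the feature names and crop labels are chosen only for illustration.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy, made-up samples for illustration only: [nitrogen, soil pH]
X = [
    [90, 6.5], [85, 6.8], [95, 6.2], [88, 7.0],  # high-nitrogen soils
    [20, 6.4], [25, 6.9], [15, 6.1], [30, 7.1],  # low-nitrogen soils
]
y = ["wheat"] * 4 + ["legume"] * 4

# A shallow tree keeps the learned rules short enough to read
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X, y)

# Human-readable if/else rules learned by the tree
rules = export_text(tree, feature_names=["nitrogen", "ph"])
print(rules)

# Classify a new, hypothetical soil sample
prediction = tree.predict([[80, 6.5]])[0]
print(prediction)
```

The printed rules show which feature and threshold the tree splits on, which is the kind of explanation a farmer or extension officer could inspect directly.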
Studies using Random Forests and Support Vector Machines (SVMs) have incorporated temperature
and rainfall data, enhancing prediction accuracy. Random Forests are particularly robust in
handling diverse datasets with varied feature importance. This approach is beneficial in areas
with mixed climatic zones, where rainfall distribution impacts crop choice.
Deep learning has been used to analyse large-scale datasets, especially when combining satellite
imagery with traditional agricultural data. Convolutional Neural Networks (CNNs) and Recurrent
Neural Networks (RNNs) have shown promise in predicting both crop type and yield. For instance,
CNNs can identify soil moisture levels from images, enhancing prediction accuracy.
Limited regional adaptation: Current models often fail to generalize across different
geographic areas.
High computational requirements for deep learning models, making them inaccessible
to resource-constrained regions.
CHAPTER 3
METHODOLOGY
The data sourced includes both structured and semi-structured formats. These datasets were
validated for completeness and relevance to ensure high-quality input for the model
development. Regional agricultural databases were particularly valuable in incorporating
localized insights.
Soil properties: pH, nitrogen (N), phosphorus (P), and potassium (K) levels.
Each feature was carefully selected to maximize the relevance of the recommendations. For
instance, high potassium levels in soil often correlate with better yields for crops like bananas
and tomatoes. Similarly, analyzing rainfall thresholds can help determine whether rice or
maize is more suitable. The inclusion of economic viability ensures that the system aligns not
only with environmental factors but also with farmers’ profitability goals.
Handling Missing Values: Missing data were imputed using statistical techniques
such as mean or median imputation. For example, if humidity data were missing for a
region, the average humidity for similar climatic zones was used.
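Normalization: Features were scaled to a uniform range (e.g., 0-1) so that features with
large numeric ranges, such as rainfall, do not dominate scale-sensitive algorithms like
SVMs and KNN. Tree-based models such as Random Forest are largely insensitive to this
scaling.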
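The two preprocessing steps above can be sketched as follows. This is an illustrative example using pandas on made-up records; the column names `zone`, `humidity`, and `rainfall` are assumptions, and the median is used here as one of the two imputation options mentioned in the text.

```python
import pandas as pd

# Made-up records for illustration; column names are assumed, not from the project dataset
df = pd.DataFrame({
    "zone": ["arid", "arid", "arid", "humid", "humid", "humid"],
    "humidity": [30.0, None, 34.0, 80.0, 84.0, None],
    "rainfall": [200.0, 250.0, 300.0, 1200.0, 1000.0, 1100.0],
})

# Impute each missing humidity value with the median of its own climatic zone
zone_median = df.groupby("zone")["humidity"].transform("median")
df["humidity"] = df["humidity"].fillna(zone_median)

# Min-max scaling: x' = (x - min) / (max - min), mapping each feature to [0, 1]
features = ["humidity", "rainfall"]
col_min = df[features].min()
col_max = df[features].max()
scaled = (df[features] - col_min) / (col_max - col_min)

print(df)
print(scaled)
```

In a full pipeline, the minimum and maximum would normally be computed on the training split only and then reused for the test split and for new user inputs, so that no information from unseen data leaks into preprocessing.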
Algorithm Selection: The choice of Random Forest was driven by its balance of
interpretability, robustness, and capacity to handle diverse and non-linear data. While
other algorithms like Support Vector Machines (SVMs) and K-Nearest Neighbors
(KNN) were considered, Random Forest stood out due to its ability to manage high-
dimensional datasets and its resistance to overfitting through ensemble learning.
Additionally, Random Forest provides feature importance scores, which were crucial
for understanding the influence of soil and climatic factors on crop recommendation.
Training and Testing: The dataset was split into 80% for training and 20% for
testing to evaluate the model’s performance on unseen data.
Evaluation Metrics:
1. Accuracy: Percentage of correctly predicted outcomes.
2. Precision and Recall: To assess the model’s reliability in classification tasks.
3. F1-Score: Harmonic mean of Precision and Recall.
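The training and evaluation workflow described above can be sketched with scikit-learn. The example below is not the project's training script: it uses synthetic data from `make_classification` as a stand-in, the feature names are assumed, and the stratified split and macro averaging are illustrative choices that the text does not specify.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the feature names are assumed for illustration
feature_names = ["N", "P", "K", "pH", "temperature", "rainfall"]
X, y = make_classification(
    n_samples=500, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, random_state=42,
)

# 80% training / 20% testing; stratification (an assumption) keeps class ratios similar
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
# Macro averaging gives every class equal weight regardless of its frequency
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="macro", zero_division=0
)

# Impurity-based importance scores; they sum to 1 across all features
importances = dict(zip(feature_names, model.feature_importances_))

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
print(importances)
```

The importance scores are what allow the influence of each soil and climatic factor to be compared, as noted under Algorithm Selection.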
Input Layer: Accepts soil and climatic parameters entered by the user, including
essential data such as pH, NPK values, temperature, and rainfall.
Processing Layer: Implements the trained Random Forest model, which processes
the input data to predict the recommended crop. This layer involves multiple decision
trees working collectively to generate robust and reliable predictions.
Output Layer: Displays the recommended crop along with supplementary
information, such as the ideal planting time, potential yield estimates, and required
soil amendments to enhance productivity. These additional outputs are designed to
assist farmers in making informed decisions and implementing precise agricultural
practices.
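The three layers above can be sketched in plain Python. Everything in this example is hypothetical: the class and function names, the validation ranges, and especially `stub_predict`, which is a placeholder standing in for the trained Random Forest and carries no agronomic meaning.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SoilClimateInput:
    """Input layer: the soil and climatic parameters named in the text."""
    ph: float
    nitrogen: float
    phosphorus: float
    potassium: float
    temperature: float
    rainfall: float

    def validate(self) -> None:
        # Basic sanity checks before the values reach the model
        if not 0.0 <= self.ph <= 14.0:
            raise ValueError("pH must be between 0 and 14")
        for name in ("nitrogen", "phosphorus", "potassium", "rainfall"):
            if getattr(self, name) < 0:
                raise ValueError(f"{name} cannot be negative")

    def as_row(self) -> List[float]:
        return [self.ph, self.nitrogen, self.phosphorus,
                self.potassium, self.temperature, self.rainfall]


def recommend(inputs: SoilClimateInput,
              predict: Callable[[List[float]], str],
              crop_notes: Dict[str, str]) -> Dict[str, str]:
    inputs.validate()                      # input layer
    crop = predict(inputs.as_row())        # processing layer
    notes = crop_notes.get(crop, "no supplementary information available")
    return {"crop": crop, "notes": notes}  # output layer


def stub_predict(row: List[float]) -> str:
    # Placeholder rule standing in for the trained model (index 5 is rainfall)
    return "millet" if row[5] < 500 else "rice"


result = recommend(
    SoilClimateInput(6.5, 40.0, 30.0, 35.0, 31.0, 320.0),
    stub_predict,
    {"millet": "drought-resistant"},
)
print(result)
```

Passing the predictor in as an argument keeps the layers separate, so the stub can later be swapped for the real model without changing the input or output code.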
CHAPTER 4
The Random Forest model demonstrated robust performance with the following metrics:
Accuracy: 92%
Precision: 90%
Recall: 93%
F1-Score: 91%
These results indicate that the model effectively identifies suitable crops for various soil and
climate conditions. However, the performance of the model heavily depends on the quality of
input data.
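As a quick consistency check on the figures above, the F1-score can be recomputed from the reported precision and recall. The sketch assumes the reported values are single aggregate figures; if they were averaged per class, the overall F1 need not equal the harmonic mean of the averages exactly.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported values: precision 90%, recall 93%
f1 = f1_score(0.90, 0.93)
print(f"{f1:.4f}")  # 0.9148
```

The result rounds to 91%, which agrees with the reported F1-score.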
Analysis of Metrics:
Recall: At 93%, the model effectively captures the majority of suitable crops for
specific inputs. The higher recall value highlights the model's reliability in diverse
agricultural scenarios, although borderline cases where soil pH or climatic conditions
deviate from standard thresholds remain challenging.
F1-Score: The F1-score of 91% reflects a balanced trade-off between precision and
recall. This suggests that the model performs consistently well across datasets, though
there is room for improvement in making finer distinctions, particularly for crops with
overlapping feature profiles.
Crops requiring high levels of nutrients were correctly recommended for fertile soils,
showcasing a strong alignment with scientific and agricultural expectations. For
instance, the model accurately identified nitrogen-rich soils as optimal for crops like
wheat and corn, which depend heavily on nitrogen availability for robust growth.
In arid regions, the system reliably suggested drought-resistant crops such as millets
and sorghum, effectively addressing the challenges of water-scarce conditions. This
ensures resource-efficient farming in areas where traditional water-intensive crops
would likely fail.
While the system achieved high accuracy overall, it occasionally struggled in borderline
climatic regions or when specific soil properties, such as micronutrient levels, were
unavailable. Addressing these gaps through enhanced data collection methods and integrating
dynamic data sources, such as IoT-based soil sensors or live weather feeds, would
significantly enhance its predictive capabilities.
4.3 Limitations
Static Datasets: The reliance on static datasets limits real-time adaptability and the
ability to predict changes in environmental conditions. This static nature does not
account for unpredictable factors like sudden weather changes or pest outbreaks,
reducing its real-world applicability in dynamic scenarios.
Data Quality Dependence: Missing or inaccurate input data can adversely affect
predictions, emphasizing the need for robust data collection and preprocessing
methods. For instance, errors in soil pH values or outdated climatic records could lead
to inappropriate crop suggestions.
Real-Time Integration: Lack of integration with IoT devices and live weather feeds
restricts the model's responsiveness to immediate environmental changes. Without
real-time feedback, the model cannot adapt to sudden rainfall or unexpected drought
conditions, which are crucial for modern precision agriculture practices.
CHAPTER 5
5.1 Conclusion
The system was built using a comprehensive dataset that included key factors such as soil
type, climate, and historical crop yield data. Data preprocessing was employed to clean and
transform the raw data into usable features, which were then used to train various predictive
models. After testing and evaluation, the final Random Forest model reached 92% accuracy on
the held-out test data when recommending crops from the given parameters.
Climate plays a crucial role in determining the suitability of crops for a particular
region.
Soil characteristics, including texture and nutrient content, are vital factors for crop
selection.
Despite the success of the developed system, several limitations were identified. For instance,
the model's accuracy may vary across regions and with the quality of the input data.
Additionally, the model requires regular updates to account for changes in environmental
factors such as soil health and climate conditions.
While the project has made significant progress, there are areas where further
development and improvement can be made. The following suggestions outline the
potential future directions:
Incorporation of Advanced Algorithms: Future work could explore other approaches, such as
reinforcement learning or additional ensemble methods beyond Random Forest, to improve
the robustness and reliability of the crop recommendations. These algorithms may
adapt more dynamically to changing conditions over time.
Predictive Maintenance of Crops: The system could also evolve to predict potential
diseases, pest outbreaks, or nutrient deficiencies by analyzing trends in historical data
and environmental factors. This proactive approach would help farmers address issues
before they significantly impact crop yield.
CHAPTER 6
APPENDICES
6.1 Dataset
The dataset used in this project was sourced from publicly available agricultural and
environmental records, ensuring diverse and comprehensive data for accurate predictions.
Key features of the dataset include:
Soil Properties:
Concentrations of nitrogen (N), phosphorus (P), and potassium (K), which are
essential nutrients for crop growth.
Climatic Data:
Crop Characteristics:
The dataset underwent thorough preprocessing to ensure its compatibility with the machine
learning model:
1. Handling Missing Data: Missing entries, such as rainfall data, were imputed using
statistical averages derived from geographically similar climatic zones.
3. Data Cleaning: Erroneous and irrelevant records were removed to improve model
accuracy.
Built using Flask to create a web-based user interface that accepts input
parameters and displays crop recommendations.
3. Key Scripts:
Predict(): Accepts user inputs, preprocesses them using scalers, and predicts
the recommended crop.
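A minimal sketch of how such a prediction endpoint might look in Flask is given below. It is not the project's actual application: the route name, the JSON field names in `FEATURES`, and the `create_app` factory are assumptions, and the three stub classes only imitate the `transform`, `predict`, and `inverse_transform` methods of fitted scikit-learn objects so that the example runs on its own.

```python
from flask import Flask, jsonify, request

# Assumed input field names; the real form fields may differ
FEATURES = ["N", "P", "K", "ph", "temperature", "rainfall"]


def create_app(scaler, model, label_encoder):
    """Build the app around already-fitted, scikit-learn-style objects."""
    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json(silent=True) or {}
        try:
            row = [float(payload[name]) for name in FEATURES]
        except (KeyError, TypeError, ValueError):
            return jsonify(error=f"expected numeric fields: {FEATURES}"), 400
        scaled = scaler.transform([row])    # apply the same scaling used in training
        encoded = model.predict(scaled)     # integer-encoded crop label
        crop = label_encoder.inverse_transform(encoded)[0]
        return jsonify(recommended_crop=str(crop))

    return app


# Hypothetical stand-ins so the sketch is self-contained
class IdentityScaler:
    def transform(self, rows):
        return rows


class StubModel:
    def predict(self, rows):
        return [0 for _ in rows]


class StubEncoder:
    def inverse_transform(self, labels):
        return ["millet" for _ in labels]


app = create_app(IdentityScaler(), StubModel(), StubEncoder())
client = app.test_client()
sample = {"N": 40, "P": 30, "K": 35, "ph": 6.5, "temperature": 31, "rainfall": 320}
ok_response = client.post("/predict", json=sample)
bad_response = client.post("/predict", json={"N": "not a number"})
print(ok_response.get_json(), bad_response.status_code)
```

In a deployed version, the stubs would presumably be replaced by objects restored with `joblib.load` from the files saved during training.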
Source Code:
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
import joblib

# Load dataset
df = pd.read_csv('Updated_Crop_Recommendation_with_Disease_Info.csv')

# Fill missing disease labels only, so numeric feature columns are not overwritten with text
df['Disease'] = df['Disease'].fillna("Unknown Disease")

# Keep only the first crop when several are listed, separated by commas
df['Recommended_Crop'] = df['Recommended_Crop'].str.split(',').str[0].str.strip()

# Assumption: every column other than the two targets is a numeric input feature
X = df.drop(columns=['Recommended_Crop', 'Disease'])
y_crop = df['Recommended_Crop']
y_disease = df['Disease']

# Encode the text labels as integers
label_encoder_crop = LabelEncoder()
y_crop_encoded = label_encoder_crop.fit_transform(y_crop)
label_encoder_disease = LabelEncoder()
y_disease_encoded = label_encoder_disease.fit_transform(y_disease)

# Train models
crop_model = RandomForestClassifier(random_state=42)
crop_model.fit(X, y_crop_encoded)
disease_model = RandomForestClassifier(random_state=42)
disease_model.fit(X, y_disease_encoded)

# Save the trained models and label encoders for later use
joblib.dump(crop_model, 'crop_model.pkl')
joblib.dump(disease_model, 'disease_model.pkl')
joblib.dump(label_encoder_crop, 'label_encoder_crop.pkl')
joblib.dump(label_encoder_disease, 'label_encoder_disease.pkl')
6.4 Output: