Weather Patterns Analysis and Prediction
Presentation
Student Name: Riyan Ahmed
Date of submission: 22.10.24
Project Report
INTRODUCTION
OBJECTIVES:
• Conduct exploratory data analysis (EDA)
• Implement K-Nearest Neighbors (KNN) classification for prediction
• Apply K-Means clustering to group weather patterns
The dataset includes several columns related to
environmental factors, such as Dew Point, Humidity,
Pressure, Temperature, and Visibility. These factors can be
used to understand the atmospheric conditions associated
with different weather patterns.
Tools Used:
• Orange
• Excel
EXPLORATORY DATA ANALYSIS (EDA):
Using Orange, we import the CSV dataset for analysis.
For Questions 1-4, we find the values using the “Feature Statistics” widget:
• Highest temperature: 45°C. Plotting it in a scatter plot locates the exact record, 25 May 2015.
• Average humidity (mean): 36%
• Median visibility: 2
• Most frequently observed wind direction: WNW
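
The same four statistics can be computed outside Orange. The following is a minimal sketch using only the Python standard library. The column names (`temperature`, `humidity`, `visibility`, `wind_dir`) and the three toy rows are hypothetical and are not taken from the project dataset.

```python
from statistics import mean, median, mode

# Hypothetical toy rows; the real dataset's column names and values differ.
rows = [
    {"date": "day-1", "temperature": 30.0, "humidity": 50.0, "visibility": 4.0, "wind_dir": "East"},
    {"date": "day-2", "temperature": 35.0, "humidity": 40.0, "visibility": 2.0, "wind_dir": "West"},
    {"date": "day-3", "temperature": 28.0, "humidity": 60.0, "visibility": 3.0, "wind_dir": "West"},
]

# The row with the highest temperature also gives the date it occurred on.
hottest = max(rows, key=lambda r: r["temperature"])

summary = {
    "max_temperature": hottest["temperature"],
    "max_temperature_date": hottest["date"],
    "mean_humidity": mean(r["humidity"] for r in rows),
    "median_visibility": median(r["visibility"] for r in rows),
    "most_common_wind_dir": mode(r["wind_dir"] for r in rows),
}
print(summary)
```

Taking the whole row with `max(..., key=...)` rather than only the maximum value plays the same role as the scatter plot step: it identifies which record holds the extreme value.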
For Question 5, we have to find which (Weather Condition, Temperature) pair has the highest average temperature.
To do that, we first separate these two columns from the rest using the “Select Columns” widget.
Then we aggregate them using the mean, since the average is required.
Finally, we inspect the aggregated values with a scatter plot and Feature Statistics.
These show that Haze has the highest average temperature, 31.92°C.
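
The aggregation step amounts to grouping temperatures by weather condition and averaging each group. Below is a minimal sketch with the standard library. The helper name `average_temperature_by_condition` and the toy records are illustrative assumptions, not the project's data.

```python
from collections import defaultdict
from statistics import mean

def average_temperature_by_condition(records):
    """Group (condition, temperature) pairs and return the mean per condition."""
    groups = defaultdict(list)
    for condition, temperature in records:
        groups[condition].append(temperature)
    return {condition: mean(temps) for condition, temps in groups.items()}

# Hypothetical toy records, for illustration only.
records = [("Haze", 30.0), ("Haze", 34.0), ("Fog", 12.0), ("Smoke", 25.0), ("Fog", 14.0)]

averages = average_temperature_by_condition(records)
warmest = max(averages, key=averages.get)  # condition with the highest mean
print(warmest, averages[warmest])
```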
Record for 14-Jan-2016
Record for 28-Jan-2017
K-Nearest Neighbors (KNN)
For the KNN section, we use the KNN algorithm to answer the questions.
We were given a record for 6-Jan-2019, which we add to the dataset using Excel.
For Question 6, we use the “Neighbors” widget to find the 3 nearest neighbors using Euclidean distance.
We exclude the categorical columns with the Select Columns widget, then use a Data Table to select the record.
We pass the dataset to the widget as the data and the selected record as the reference point, with the distance metric set to Euclidean and the number of neighbors set to 3.
This returns 3 records.
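
The neighbor search can be sketched in a few lines: compute the Euclidean distance from the reference record to every other record, sort, and keep the first k. The function name `nearest_neighbors` and the two-feature toy records are assumptions made for illustration. The real records have more numeric columns.

```python
import math

def nearest_neighbors(reference, records, k=3):
    """Return the k closest records as (distance, index) pairs, nearest first."""
    distances = [(math.dist(reference, rec), i) for i, rec in enumerate(records)]
    return sorted(distances)[:k]

# Hypothetical toy records with two numeric features each.
records = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0), (10.0, 10.0)]
reference = (1.0, 0.0)

neighbors = nearest_neighbors(reference, records, k=3)
neighbor_indices = [index for _, index in neighbors]
print(neighbor_indices)
```

Note that with plain Euclidean distance, features measured on larger scales (for example, pressure compared with visibility) contribute more to the result unless the columns are normalized first.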
Next, for Question 7, we use the KNN algorithm with K=3 to predict rain presence for the new record.
We import the updated dataset, exclude the categorical columns, and feed it to the kNN widget as data.
Then we import the record as a separate file and feed it to the Predictions widget as data.
We then connect the kNN model to the Predictions widget as the predictor.
The KNN prediction for rain presence is 0.
We can cross-check this against the 3 nearest neighbors, which also have rain presence 0.
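
KNN classification with K=3 is a majority vote among the labels of the three nearest records, which is why the manual cross-check with the neighbors agrees with the model. Below is a minimal sketch. The function `knn_predict`, the two toy features, and the labels are hypothetical, and Orange's kNN widget may differ in details such as tie-breaking or distance weighting.

```python
import math
from collections import Counter

def knn_predict(query, features, labels, k=3):
    """Predict the label of query by majority vote among its k nearest records."""
    order = sorted(range(len(features)), key=lambda i: math.dist(query, features[i]))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: two numeric features per record, label 1 = rain, 0 = no rain.
features = [(20.0, 40.0), (21.0, 42.0), (19.0, 38.0), (25.0, 90.0), (26.0, 95.0)]
labels = [0, 0, 0, 1, 1]

prediction = knn_predict((20.5, 41.0), features, labels, k=3)
print(prediction)
```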
K-Means
We are given K=2 and the records for cluster centers C1 and C2.
To find the distance between the given record and C1, we use the Euclidean distance formula.
Alternatively, we can use the Neighbors widget again, this time without limiting the number of neighbors.
This gives a distance of 6.24.
Similarly, for Question 9, we know the record will be assigned to the nearest cluster center.
Using the distance formula or the Neighbors widget, we find the nearest cluster center, which turns out to be C1.
We see that the given record is closer to C1 than to C2 by a large margin.
Thus, it will be assigned to C1
(record for 1-Jan-2015).
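
The assignment rule can be written directly: compute the Euclidean distance (the square root of the sum of squared feature differences) from the record to each center and pick the smallest. This is a minimal sketch. The function `assign_cluster`, the center coordinates, and the record are hypothetical toy values, not the C1 and C2 given in the project.

```python
import math

def assign_cluster(record, centers):
    """Return the name of the nearest center and the distance to every center."""
    distances = {name: math.dist(record, center) for name, center in centers.items()}
    nearest = min(distances, key=distances.get)
    return nearest, distances

# Hypothetical toy centers and record with two numeric features.
centers = {"C1": (1.0, 2.0), "C2": (8.0, 9.0)}
record = (2.0, 3.0)

nearest, distances = assign_cluster(record, centers)
print(nearest, distances)
```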
For the last question, we cluster the records using the k-Means widget.
Then we use Feature Statistics once again, along with Select Rows, to find the new values of cluster centers C1 and C2.
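
Finding the new center values corresponds to the K-Means update step: each record is assigned to its nearest center, and each center is then replaced by the mean of its members. The sketch below shows a single update under assumed toy data. The function `update_centers` is illustrative, and Orange's k-Means widget iterates until convergence, so its final centers may differ from a single step.

```python
import math
from statistics import mean

def update_centers(records, centers):
    """Perform one K-Means update: assign records, then average each cluster."""
    members = [[] for _ in centers]
    for rec in records:
        nearest = min(range(len(centers)), key=lambda j: math.dist(rec, centers[j]))
        members[nearest].append(rec)
    new_centers = []
    for j, group in enumerate(members):
        if group:
            # Column-wise mean of the records assigned to this cluster.
            new_centers.append(tuple(mean(column) for column in zip(*group)))
        else:
            # An empty cluster keeps its previous center.
            new_centers.append(centers[j])
    return new_centers

# Hypothetical toy records and starting centers (C1, C2).
records = [(1.0, 1.0), (2.0, 3.0), (9.0, 8.0), (11.0, 10.0)]
centers = [(1.0, 2.0), (8.0, 9.0)]

new_centers = update_centers(records, centers)
print(new_centers)
```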
LEARNINGS:
We could predict rain presence without the KNN model, using only the nearest neighbors. This
shows that the prediction can also be worked out by hand.
Simple distance calculations can answer a single question, but doing them manually is not efficient.
Insights:
Seasonal Weather Patterns: Identify clusters of weather conditions and analyze
their seasonal variations.
Environmental Drivers: Understand how environmental factors influence different
weather clusters.
Predictive Weather Modeling: Use KNN to predict weather conditions based on
current factors.
Wind Patterns and Precipitation: Analyze prevailing wind directions and their
relationship to precipitation.
Anomaly Detection: Identify unusual weather events by detecting outliers in the
data.
Data Exploration and Visualization:
Data Table: Orange's Data Table widget provides an easy-to-use interface for
viewing and manipulating the dataset. It allows you to filter data, sort columns,
and quickly spot missing values or inconsistencies.
Scatter Plot: This widget was used to visualize relationships between different
variables. For example, plotting Temperature against Humidity can reveal potential
correlations or patterns.
Machine Learning:
K-Nearest Neighbors: Orange has a dedicated widget for implementing KNN classification. You
can easily experiment with different values of k and evaluate the model's performance using
various metrics.
K-Means Clustering: Orange provides a straightforward implementation of the K-Means
algorithm. You can easily visualize the clusters and explore their characteristics.
CHALLENGES:
1. Data Handling:
Dataset Size and Complexity: The dataset presented challenges due
to its relatively large size, leading to increased processing times and
potential memory limitations.
2. Orange Tool Usage:
Learning Curve: The initial learning curve associated with navigating
the Orange interface and understanding the workflow proved to be
a time-consuming aspect of the project.
Widget Configuration: Configuring the parameters of various
Orange widgets correctly and efficiently required careful
consideration and experimentation.
3. Prediction and Error Analysis:
Model Interpretation: Interpreting the factors that influenced the
model's predictions and understanding the reasons behind specific
errors proved to be complex.
Evaluation Metrics: Selecting appropriate evaluation metrics and
interpreting the results to assess model performance accurately
required careful consideration.
CONCLUSION
OBJECTIVES COMPLETED:
• Conducted exploratory data analysis (EDA),
• Implemented K-Nearest Neighbors (KNN)
classification for prediction
• Applied K-Means clustering to group weather
patterns
In essence, while the analysis of this specific dataset may
have limitations, the techniques and insights gained can
be applied to broader weather-related problems with
significant societal and environmental implications.
Data Science and AI in Weather Analysis:
• Improved Forecasting:
  • Accuracy: AI models, especially deep learning models, can analyze vast amounts of
  historical weather data, including temperature, humidity, wind speed, pressure, and satellite
  imagery, to create highly accurate weather forecasts. This can significantly improve the
  lead time and accuracy of predictions, which is crucial for various sectors like agriculture,
  transportation, and disaster preparedness.
  • Spatial Resolution: AI can generate high-resolution forecasts for specific locations,
  providing more localized and accurate predictions.
• Climate Change Analysis:
  • Trend Identification: Data science techniques can analyze long-term weather patterns to
  identify trends related to climate change, such as rising temperatures, changing
  precipitation patterns, and increasing frequency of extreme weather events.
  • Impact Assessment: AI can help assess the potential impacts of climate change on
  various sectors, such as agriculture, water resources, and human health.
REFERENCES:
• Orange data mining tool: https://orangedatamining.com
• Microsoft Excel
• https://youtube.com
• https://www.geeksforgeeks.org/
Thank You