Rainfall Prediction Using Modified Linear Regression
Submitted by:
John Philip O. Echevarria
Lazan, Rolan
ABSTRACT
Analytics often involves studying historical data to identify potential trends. Weather
condition is the state of the atmosphere at a given time in terms of weather variables such
as rainfall, cloud condition, and temperature. Existing models use data mining techniques
to predict rainfall. The main disadvantage of these systems is that they do not provide an
estimate of the predicted rainfall: they calculate the average of values to characterize the
state of the atmosphere, which does not yield an estimate. This paper presents a modified
linear regression method for predicting rainfall from temperature and wind speed.
INTRODUCTION
Rain and bad weather conditions bring direct consequences to several sectors of
the economy. One of those that suffers the most is precisely the maritime transport sector.
Some shipment products cannot be transported under rain, which brings about a deep
impact in logistics and in business generation. Some Brazilian ports register up to 110
days of rain per year, which means that these places might have their activities suspended
for up to one third of the year (Wilson Sons, 2019).
In the Philippines, shipping has played an important role in the economy of the country
since the Spanish colonial period, and it has improved over the years. As mentioned
earlier, weather affects maritime transportation, in particular the shipment of vulnerable
goods that cannot be unloaded in the ports while it is raining. It is therefore necessary
to predict the weather conditions at ports that handle this type of activity, among others.
Analytics often involves studying historical data to identify potential trends. Weather
condition is the state of the atmosphere at a given time in terms of weather variables such
as rainfall, cloud conditions, and temperature. Existing models use data mining techniques
to predict rainfall. The main disadvantage of these systems is that they do not provide an
estimate of the predicted rainfall: they calculate the average of values to characterize the
state of the atmosphere, which does not yield an estimate (S. Prabakaran et al., 2017).
In this study, a mathematical method called Linear Regression is used to predict the
rainfall in the ports of Metro Manila, which serve as the unloading area for mostly imported
goods. The Linear Regression method is modified to obtain the lowest error percentage by
iterating and adding some percentage of error to the input values. This method provides an
estimate of rainfall from atmospheric parameters such as average temperature and relative
humidity. Linear regression is applied to the data set, and the resulting coefficients are
used to predict the rainfall from the corresponding values of the parameters. The main
advantage of this model is that it estimates rainfall based on the previous correlation
between the different atmospheric parameters, thus giving an estimated value of what the
rainfall could be in a given time period.
In prediction or forecasting studies, the rule of thumb is to have at least 50, and
preferably more than 100, observations (Box and Tiao, 1975). In this context, monthly
rainfall data for the port area from 2000 to 2017 will be used, giving at least 120
observations for predicting the future rainfall condition of the area.
The objectives of this study are to:
(1) Postulate an empirical model that estimates the changes in the rainfall volume of the
port area.
(2) Provide a one-year forecast of rainfall volume.
(3) Strengthen the evidence for the correlation between rainfall and other atmospheric
conditions such as relative humidity, temperature, and wind speed.
For the maritime sector, the results of this study can be used to provide maritime
organizations with the best possible information about the future, to anticipate possible
challenges, and to guide decision making and policy creation for improving administrative
processes and disseminating information effectively with regard to the loading or unloading
of imported or exported products.
For many years the effect of global warming on food security, rainfall and temperature
patterns has received great attention from policy makers and academics.
BACKGROUND
Linear Regression
Linear regression is a method for defining the relation between a dependent variable (Y)
and one or more independent, or explanatory, variables (X). With multiple explanatory
variables, the process is called Multiple Linear Regression (MLR).
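The general equation for linear regression is given as
$$y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_p x_{ip} + \varepsilon_i, \quad i = 1, \dots, n,$$
where $y_i$ denotes the dependent variable (rainfall) for observation $i = 1, 2, \dots, n$,
$x_{i1}, \dots, x_{ip}$ denote the explanatory or independent variables, $\beta_1, \dots, \beta_p$
are their coefficients, $\varepsilon_i$ is the error term, and $\beta_0$ is called the intercept.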
The general linear regression equation used in this system is given as
$$\text{Rainfall} = (\text{AvgTemp} \times \beta_1) + \beta_0,$$
where $\beta_0$ and $\beta_1$ represent the different coefficients.
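For the single-predictor case used in this system (rainfall regressed on average temperature), the least-squares coefficients have a standard closed form. The sketch below states it. The notation is introduced here for illustration and is an assumption, not the paper's own: $x_i$ is AvgTemp and $y_i$ is rainfall in year $i$, bars denote sample means, and hats denote estimates.

```latex
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
```

The fitted line always passes through the point of means $(\bar{x}, \bar{y})$, which is why the intercept follows directly once the slope is known.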
The data necessary for the system to predict rainfall are prev
METHODOLOGY
Historical data for the Port Area covering the years 2000 to 2018 were collected from
PAG-ASA. We use the mean annual rainfall values and a modified linear regression to perform
the prediction of rainfall in our system. The process is explained in the following
steps.
1. The input data set is examined. The training set is obtained for each year from 2000 to
2017.
2. The data set contains the average rainfall covering 2000 to 2018, taken from the input
data sets.
3. Linear regression is applied to the data sets, and the rainfall is forecast using the
fitted coefficients.
4. The error percentage is calculated by subtracting the predicted value from the actual
value and multiplying the result by 100 to get the percentage.
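Step 4 can be sketched as a small helper. This is a minimal illustration, not the system's own code: the function name `percent_error` is hypothetical, and dividing by the actual value (the conventional definition of a percentage error) is an assumption, since the step as written mentions only the subtraction and the multiplication by 100.

```python
def percent_error(actual: float, predicted: float) -> float:
    """Signed percentage error of a prediction.

    Assumption: the difference is normalised by the actual value,
    which the step description above does not state explicitly.
    """
    if actual == 0:
        raise ValueError("percentage error is undefined when actual is 0")
    # Multiply before dividing to keep simple cases exact in floating point.
    return (actual - predicted) * 100 / actual


# Example with made-up numbers: predicting 8.0 when the actual value is 10.0
print(percent_error(10.0, 8.0))  # prints 20.0
```

A positive result means the model under-predicted and a negative result means it over-predicted.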
Year   Average Rainfall   Average Temperature   Average Wind Speed
2000   9.421038251        36.7                  2.87431694
2001   5.019726027        36.5                  3.04109589
2002   7.23890411         35.4                  2.879452055
2003   28.29452055        39.56                 3.235616438
2004   4.22704918         37.8                  4.22704918
2005   4.358356164        38.6                  4.358356164
2006   5.740273973        39.5                  3.252054795
2007   5.417808219        36.5                  2.964383562
2008   6.01010929         36.4                  2.669398907
2009   8.071780822        34.5                  2.868493151
2010   5.375890411        37.5                  2.978082192
2011   9.330642955        37.8                  3.035616438
2012   9.760655738        39.6                  2.786885246
2013   8.424109589        37.6                  2.747945205
2014   5.346575342        40.1                  2.871232877
2015   5.845348837        36.23                 2.816438356
2016   6.011747851        37.5                  3
2017   6.726069364        39                    2.759562842
GET DATA /TYPE=XLSX
/FILE='F:\AVERAGE.xlsx'
/SHEET=name 'Sheet1'
/CELLRANGE=full
/READNAMES=on
/ASSUMEDSTRWIDTH=32767.
EXECUTE.
DATASET NAME DataSet1 WINDOW=FRONT.
CORRELATIONS
/VARIABLES=AverageRainfall AverageTemperature
/PRINT=TWOTAIL NOSIG
/MISSING=PAIRWISE.
Correlations
Notes
Output Created: 12-APR-2019 13:21:09
Comments:
Input
  Active Dataset: DataSet1
  Filter: <none>
  Weight: <none>
  Split File: <none>
  N of Rows in Working Data File: 18
Missing Value Handling
  Definition of Missing: User-defined missing values are treated as missing.
  Cases Used: Statistics for each pair of variables are based on all the cases with valid
  data for that pair.
Syntax
  CORRELATIONS
  /VARIABLES=AverageRainfall AverageTemperature
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.
Resources
  Processor Time: 00:00:00.06
  Elapsed Time: 00:00:00.13
[DataSet1]
Correlations
                                            Average Rainfall   Average Temperature
Average Rainfall      Pearson Correlation   1                  .268
                      Sig. (2-tailed)                          .282
                      N                     18                 18
Average Temperature   Pearson Correlation   .268               1
                      Sig. (2-tailed)       .282
                      N                     18                 18
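The Pearson correlation reported above can be cross-checked outside SPSS. The following is a minimal sketch in plain Python, assuming the annual values tabulated in the Methodology section are the same ones stored in the worksheet. The function name `pearson_r` is introduced here for illustration only.

```python
from math import sqrt

# Annual averages for 2000-2017, copied from the data table in this paper.
rainfall = [9.421038251, 5.019726027, 7.23890411, 28.29452055, 4.22704918,
            4.358356164, 5.740273973, 5.417808219, 6.01010929, 8.071780822,
            5.375890411, 9.330642955, 9.760655738, 8.424109589, 5.346575342,
            5.845348837, 6.011747851, 6.726069364]
temperature = [36.7, 36.5, 35.4, 39.56, 37.8, 38.6, 39.5, 36.5, 36.4,
               34.5, 37.5, 37.8, 39.6, 37.6, 40.1, 36.23, 37.5, 39.0]


def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


r = pearson_r(temperature, rainfall)
print(f"r = {r:.3f}")
```

Under that assumption the result should agree with the .268 in the table. A coefficient of this size indicates only a weak positive linear association between the two variables.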
REGRESSION
/MISSING LISTWISE
/STATISTICS COEFF OUTS R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT AverageRainfall
/METHOD=ENTER AverageTemperature.
Regression
Notes
Output Created: 12-APR-2019 13:21:51
Comments:
Input
  Active Dataset: DataSet1
  Filter: <none>
  Weight: <none>
  Split File: <none>
  N of Rows in Working Data File: 18
Missing Value Handling
  Definition of Missing: User-defined missing values are treated as missing.
  Cases Used: Statistics are based on cases with no missing values for any variable used.
Syntax
  REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF OUTS R ANOVA
  /CRITERIA=PIN(.05) POUT(.10)
  /NOORIGIN
  /DEPENDENT AverageRainfall
  /METHOD=ENTER AverageTemperature.
Resources
  Processor Time: 00:00:00.02
  Elapsed Time: 00:00:00.14
  Memory Required: 2480 bytes
  Additional Memory Required for Residual Plots: 0 bytes
Variables Entered/Removed(a)
Model   Variables Entered        Variables Removed   Method
1       Average Temperature(b)   .                   Enter
a. Dependent Variable: Average Rainfall
b. All requested variables entered.
Model Summary
Model   R         R Square   Adjusted R Square   Std. Error of the Estimate
1       .268(a)   .072       .014                5.359333754942995
a. Predictors: (Constant), Average Temperature
ANOVA(a)
Model            Sum of Squares   df   Mean Square   F       Sig.
1   Regression   35.586           1    35.586        1.239   .282(b)
    Residual     459.559          16   28.722
    Total        495.145          17
a. Dependent Variable: Average Rainfall
b. Predictors: (Constant), Average Temperature
Coefficients(a)
                            Unstandardized Coefficients   Standardized Coefficients
Model                       B          Std. Error         Beta                        t       Sig.
1   (Constant)              -26.897    31.209                                         -.862   .402
    Average Temperature     .923       .829               .268                        1.113   .282
a. Dependent Variable: Average Rainfall
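The coefficients in the table above come from an ordinary least-squares fit of average rainfall on average temperature. As a hedged cross-check, the sketch below refits the line in plain Python, assuming the tabulated annual values from the Methodology section are the data SPSS read. The helper `fit_simple_ols` is a hypothetical name, and this is the standard unmodified regression, not the modified method described in the Introduction.

```python
# Annual averages for 2000-2017, copied from the data table in this paper.
rainfall = [9.421038251, 5.019726027, 7.23890411, 28.29452055, 4.22704918,
            4.358356164, 5.740273973, 5.417808219, 6.01010929, 8.071780822,
            5.375890411, 9.330642955, 9.760655738, 8.424109589, 5.346575342,
            5.845348837, 6.011747851, 6.726069364]
temperature = [36.7, 36.5, 35.4, 39.56, 37.8, 38.6, 39.5, 36.5, 36.4,
               34.5, 37.5, 37.8, 39.6, 37.6, 40.1, 36.23, 37.5, 39.0]


def fit_simple_ols(xs, ys):
    """Least-squares fit of y = intercept + slope * x; also returns R squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares of the fitted line.
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    r_squared = 1 - ss_residual / ss_total
    return slope, intercept, r_squared


slope, intercept, r_squared = fit_simple_ols(temperature, rainfall)
print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r_squared:.3f}")
```

If the assumption about the data holds, the printed values should be close to B = .923, the constant -26.897, and R Square = .072 reported by SPSS.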
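The annual linear regression (prediction) equation is y = 0.923x + (-26.897), where x is the
average temperature and y is the predicted average rainfall. For example, if the temperature
in a given year is 41.6, the predicted rainfall is 11.50; the higher the temperature, the
heavier the predicted rainfall. The relationship is weak, however: R Square is .072 and the
temperature coefficient is not statistically significant at the .05 level (Sig. = .282).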
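The worked example can be written out step by step by substituting x = 41.6 into the prediction equation. Here $\hat{y}$ is notation introduced for the predicted rainfall. Note that 41.6 lies above the highest temperature in the data table (40.1), so this prediction is an extrapolation.

```latex
\begin{aligned}
\hat{y} &= 0.923 \times 41.6 + (-26.897) \\
        &= 38.3968 - 26.897 \\
        &= 11.4998 \approx 11.50
\end{aligned}
```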
CONCLUSION
Rainfall, or the lack of it, is a major factor in many natural disasters such as flash
floods and droughts. To prepare for these calamities, we need to be able to predict rainfall.
The proposed system can be used to estimate the rainfall over a required period so that the
respective authorities can take precautions to prevent the loss of life and property. The
system uses historical data to perform the calculations needed to predict rainfall from the
average temperature and wind speed.