2020 6th International Conference on Advanced Computing & Communication Systems (ICACCS)
Air Quality Prediction Of Data Log
By Machine Learning
Venkat Rao Pasupuleti^1, Uhasri^2, Pavan Kalyan^3, Srikanth^4, Hari Kiran Reddy^5
^1 Assistant Professor, Department of ECE, Lakireddy Bali Reddy College of Engineering, Mylavaram, 521230, India.
^2,3,4,5 B.Tech Students, Department of ECE, Lakireddy Bali Reddy College of Engineering, Mylavaram, 521230, India.
[email protected]

Abstract— An air quality monitoring system measures various air pollutants at various locations in order to maintain good air quality, which is a pressing issue at present. Air is contaminated by the release of dangerous gases into the atmosphere from industries, vehicular emissions, and other sources. Air pollution has reached critical levels, and the pollution level in many major cities has crossed the air quality index value set by the government. It has a major impact on human health. With advances in machine learning, it is now possible to predict pollutant levels from past data. In this paper we introduce a device that continuously senses the present pollutant levels and, with the help of past pollutant data, runs a machine learning algorithm to predict future pollutant data. The sensed data is saved in an Excel sheet for further evaluation. The sensors are used on the Arduino Uno platform to collect the pollutant data.

Keywords—Machine learning; Internet of Things; AQI; Air pollution

I. INTRODUCTION

Air pollution monitoring has gained attention in recent years because air pollution has a major impact on human health as well as on the ecological balance. Besides the effects of toxic emissions on the environment and on health, work productivity and energy efficiency are also affected by air pollution. Since air pollution has many hazardous effects on humans, it should be monitored continuously so that it can be controlled effectively. One way to control air pollution is to know its source and intensity. Usually, it is monitored by the environment ministry of the respective state government, which keeps a record of the pollutant gases in its area. The data presented by the WHO warns about the pollution levels in the country and tells us that it is high time to monitor the air.

Air monitoring means measuring the ambient levels of pollutants in the air. Monitoring has become a major task because air pollution is increasing day by day. Continuous monitoring of air pollution at a place gives us the pollution levels in that area. The information obtained by the device tells us about the source and intensity of the pollutants in that area. Using that information, we can take measures to reduce the pollution level so that we can breathe air of good quality. Air pollution affects not only the ecological balance but also human health. As the levels of these gases in the air increase, they have a major impact on the human body and lead to hazardous effects. The increase of pollutants in the air also affects seasonal rainfall. Hence, continuous monitoring of the air is necessary.

The major air pollutants are ozone (O3), nitrogen dioxide (NO2), carbon monoxide (CO), sulphur dioxide (SO2), and particulate matter (PM). These pollutants cannot be seen or noticed. They are produced by the burning of fossil fuels and wood, by industrial boilers, and by volcanic eruptions. They affect human health and are a major cause of cancer, birth defects, and breathing-related problems.

Air Quality Index: Pollution levels are increasing due to PM2.5, which affects heart function and contributes to lung cancer and other respiratory and breathing problems. Air pollution also causes long-term damage to the liver, kidneys, brain, nerves, and other organs of the human body. The AQI is a piecewise linear function of the pollutant concentration. At the boundary between AQI categories there is a discontinuous jump of one AQI unit from one category to the next. To calculate the AQI from the concentration, the equation below is used.
978-1-7281-5197-7/20/$31.00 ©2020 IEEE 1395
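As a sketch of the AQI calculation described in the Introduction, the piecewise linear form commonly given for the EPA index [1] is I = (I_hi - I_lo) / (C_hi - C_lo) * (C - C_lo) + I_lo, where C is the measured concentration, C_lo and C_hi are the breakpoints that bracket C, and I_lo and I_hi are the index values at those breakpoints. The Python function below applies this form. The breakpoint values are an assumption (the pre-2024 US EPA 24-hour PM2.5 table), and the names aqi_from_concentration and PM25_BREAKPOINTS are illustrative, not taken from the paper.

```python
import math

# ASSUMPTION: pre-2024 US EPA 24-hour PM2.5 breakpoints, in ug/m^3.
# Substitute the table that applies to your region and pollutant.
PM25_BREAKPOINTS = [
    # (C_lo, C_hi, I_lo, I_hi)
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.4, 401, 500),
]


def aqi_from_concentration(conc, breakpoints=PM25_BREAKPOINTS):
    """Return the AQI sub-index for one pollutant concentration."""
    # Truncate to the 0.1 ug/m^3 resolution of the table; the small
    # epsilon guards against floating-point error in conc * 10.
    c = math.floor(conc * 10 + 1e-9) / 10
    for c_lo, c_hi, i_lo, i_hi in breakpoints:
        if c_lo <= c <= c_hi:
            # Linear interpolation inside the matching AQI category.
            return round((i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo)
    raise ValueError(f"concentration {conc} is outside the breakpoint table")


for reading in (12.0, 35.4, 35.5):
    print(reading, "->", aqi_from_concentration(reading))
```

The readings 35.4 and 35.5 fall on either side of a category boundary, which illustrates the one-unit jump in the index between adjacent categories.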
Environmental Protection Agency breakpoint table [1]

II. LITERATURE SURVEY

A smart air quality monitoring system is proposed in [2], which senses the pollutant gases in a particular area and uploads the data to a server so that everybody can be aware of the air pollution. The collected data can be viewed at any time on the corresponding website. This method detects vehicles causing pollution and measures various types of pollutants and their levels in the air. The measured information is shared with the vehicle owner and the traffic control authorities to keep the pollutants created by the vehicles under control.

Air quality monitoring with event-based sensing is presented in [3]. In this work, the authors showed that the technique saves 50% of the sensor energy consumption compared to traditional periodic sensing methods, based on a case study in a city in Spain.

Prediction of air quality values using three binary machine learning algorithms is presented in [4], where error analysis is carried out with GLM, SVM, and Bayes methods. The accuracy of simple machine learning methods is compared in [5], and the variation in accuracy is presented for different data set sizes and data divisions. The air quality data set consists of pollutant data for CO, O3, NO2, SO2, PM10, and PM2.5. For better air quality prediction, the pollutant data must be correlated with meteorological data [Temperature, Wind Speed, Humidity, Wind direction]. The neural network method provides better accuracy than the others. Air pollution prediction is implemented in [6] based on different norm regularization and optimization algorithms as machine learning tools.

For pollution estimation or prediction, linear regression algorithms are suitable, and for forecasting the pollution levels, neural network methods and SVM based methods are preferred [7].

In [8], the air quality index is predicted with machine learning algorithms, using logistic regression to detect the PM2.5 level. Some applications show the current PM2.5 levels, while others show the forecast for a specific day. This framework uses machine learning models to detect and forecast PM2.5 levels based on a data set consisting of the meteorological conditions in a particular city. The data set [9] used for detecting the PM2.5 level consists of Temperature, Wind speed, Dew point, Pressure, and PM2.5 concentration (ug/m^3).

A. Gaps Identified in the Literature

The papers above implement the prediction of PM2.5 only. This project aims to implement prediction of all the pollutants [CO, O3, NO2, SO2, PM2.5, PM10] with the help of meteorological data for better prediction.

III. EXPERIMENTAL ARRANGEMENT OF THE PROPOSED SYSTEM

A. Flow Chart

The proposed technique is represented in the block diagram shown in Fig. 1.

Fig. 1. Flow chart for the proposed approach

The Central Pollution Control Board has built many pollution monitoring stations in heavily polluted areas. We collect the data from those monitoring stations.

B. Implementation of software

Software specifications: the IDE used is Anaconda Python, the operating system must be Windows 7/10, and the coding language is Python.
Fig. 2. Implementation flow of the model

IV. METHODOLOGY

A. Data Set

The Pollutant data:
The pollutant data used to train the system to predict air quality was obtained with attributes such as CO, SO2, and O3.

Meteorological data:
The meteorological parameters used to train the system are Temperature, Wind Speed, Humidity, and Wind Direction.

B. Using Machine Learning Models

Linear regression:
Linear Regression [10] is a supervised machine learning algorithm that performs a regression task. Based on the independent variables, linear regression gives a target prediction value. It is mostly used for finding the relationship among variables and for forecasting. Regression models differ in the kind of relationship they assume between the dependent and the independent variables and in the number of independent variables used.

y = mx + c

In the above expression, y denotes the label (target value) of the data and x denotes the input training data (input parameter). During training, the model finds the values of m and c that give the best-fit line for predicting y from x.

c = intercept
m = slope of line

Once we have the best m and c values, we have the best-fit line. When we finally use the model for prediction, it predicts the value of y for a given input value of x.

Decision Tree:
Decision tree regression [11] is both a non-linear and a non-continuous model. It represents a function that takes a vector of attribute values as input and returns a decision.

A decision tree falls within the supervised learning group. It can be used to solve regression as well as classification problems. A decision tree reaches its decision by performing a sequence of tests.

Random forest:
Random forest is a bagging method, not a boosting method. The trees in a random forest are built in parallel, and there is no interaction among the trees while they are being built. It operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees.

A random forest [12] is a meta-estimator (i.e., it combines the outcome of many predictions) that aggregates many decision trees, with some useful modifications. The number of features that can be split on at each node is limited to a certain percentage of the total (a hyperparameter). Every tree draws a random sample from the original data set when it generates its splits, adding another element of randomness that helps prevent overfitting.

V. RESULTS AND DISCUSSION

We implemented the different machine learning algorithms in Python using Jupyter notebook. The following plot shows that all the features considered for the prediction are correlated and can thus be used to train the model.

Fig. 3. SO2 prediction probability

For the SO2 prediction, we get the following prediction accuracy:

Type of Algorithm           Prediction Probability of SO2
Linear Regression           0.125
Decision Tree               0.8060
Random Forest regression    0.856

Fig. 4. CO prediction probability
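The correlation check mentioned at the start of Section V can be sketched as a Pearson correlation between each input feature and the target pollutant. The readings below are made-up illustrative values, not measurements from the paper, and the helper name pearson is an assumption for this sketch.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up illustrative readings (NOT data from the paper).
features = {
    "temperature": [30.1, 31.4, 29.0, 33.2, 34.0, 28.5],
    "wind_speed": [2.0, 1.5, 3.1, 1.0, 0.8, 3.5],
    "humidity": [60, 58, 66, 52, 50, 70],
}
so2 = [12.0, 13.5, 10.2, 15.8, 16.9, 9.5]

# Correlation of every meteorological feature with the SO2 target.
correlations = {name: pearson(vals, so2) for name, vals in features.items()}
for name, r in correlations.items():
    print(f"{name:12s} r = {r:+.2f}")
```

A coefficient close to +1 or -1 suggests the feature carries information about the target, while a value near 0 suggests it contributes little to a linear model.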
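The best-fit line y = mx + c from Section IV-B can be obtained in closed form by ordinary least squares. The sketch below also computes the coefficient of determination (R^2), on the assumption that the "prediction probability" values reported in the tables are scores of this kind. The paper does not define the metric, so this is an assumption. The data points and the names fit_line and r2_score are illustrative only.

```python
def fit_line(xs, ys):
    """Least-squares slope m and intercept c for y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


def r2_score(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up illustrative points (NOT data from the paper).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

m, c = fit_line(xs, ys)
preds = [m * x + c for x in xs]
print(f"m = {m:.3f}, c = {c:.3f}, R^2 = {r2_score(ys, preds):.3f}")
```

An R^2 near 1 means the line explains most of the variance in y. A value near 0, as in the linear regression rows of the result tables, would mean the line does little better than predicting the mean.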
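The bagging idea behind the random forest of Section IV-B (a bootstrap sample per tree, a random subset of features, and averaged predictions) can be illustrated with a deliberately small pure-Python ensemble. This is a toy sketch under stated assumptions, not the implementation used in the paper: each "tree" is a single-split regression stump, the feature subset is drawn once per tree rather than at every node, the data are made up, and the names fit_stump, fit_forest, max_features, and seed are hypothetical.

```python
import random


def fit_stump(rows, targets, feature_ids):
    """Fit a one-split regression tree using only the given features."""
    best = None  # (squared error, feature, threshold, left mean, right mean)
    for f in feature_ids:
        values = sorted({row[f] for row in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2  # midpoint between neighbouring values
            left = [t for row, t in zip(rows, targets) if row[f] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[f] > threshold]
            l_mean = sum(left) / len(left)
            r_mean = sum(right) / len(right)
            err = sum((t - l_mean) ** 2 for t in left)
            err += sum((t - r_mean) ** 2 for t in right)
            if best is None or err < best[0]:
                best = (err, f, threshold, l_mean, r_mean)
    if best is None:  # no split possible: always predict the sample mean
        mean = sum(targets) / len(targets)
        return (0, float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, row):
    f, threshold, l_mean, r_mean = stump
    return l_mean if row[f] <= threshold else r_mean


def fit_forest(rows, targets, n_trees=25, max_features=1, seed=0):
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        # Bagging: draw a bootstrap sample (with replacement) for each tree.
        idx = [rng.randrange(n) for _ in range(n)]
        # Restrict this tree to a random subset of the features.
        feats = rng.sample(range(n_features), max_features)
        forest.append(
            fit_stump([rows[i] for i in idx], [targets[i] for i in idx], feats)
        )
    return forest


def predict_forest(forest, row):
    # Regression: average the predictions of the individual trees.
    return sum(predict_stump(s, row) for s in forest) / len(forest)


# Made-up illustrative data: [temperature, wind speed] -> pollutant level.
rows = [[25.0, 3.0], [26.0, 2.8], [27.0, 2.5],
        [33.0, 1.0], [34.0, 0.9], [35.0, 0.7]]
targets = [10.0, 11.0, 10.5, 20.0, 21.0, 19.5]

forest = fit_forest(rows, targets)
low = predict_forest(forest, [26.0, 2.8])
high = predict_forest(forest, [34.0, 0.9])
print(f"cool and windy: {low:.2f}, hot and calm: {high:.2f}")
```

Because each tree is fitted independently of the others, the loop in fit_forest could run in parallel, which is the property that distinguishes bagging from boosting.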
For the CO prediction, we get prediction accuracy as follows

Type of Algorithm           Prediction Probability of CO
Linear Regression           0.02
Decision Tree               0.61
Random Forest regression    0.79

Fig. 5. O3 prediction probability

For the O3 prediction, we get prediction accuracy as follows

Type of Algorithm           Prediction Probability of O3
Linear Regression           0.09
Decision Tree               0.62
Random Forest regression    0.79

Fig. 6. NO2 prediction probability

For the NO2 prediction, we get prediction accuracy as follows

Type of Algorithm           Prediction Probability of NO2
Linear Regression           0.1
Decision Tree               0.64
Random Forest regression    0.701

Fig. 7. PM2.5 prediction probability

For the PM2.5 prediction, we get the mean square coefficient as follows

Type of Algorithm           Prediction Probability of PM2.5
Linear Regression           0.02
Decision Tree               0.75
Random Forest regression    0.86

Type of Algorithm           Prediction Probability of PM10
Linear Regression           0.02
Decision Tree               0.61
Random Forest regression    0.79

The fitted curves for the above algorithms are shown in the figures below.

Fig. 8. Linear regression fitted curve for CO.

Fig. 9. Random forest fitted curve for CO.
Fig. 10. Decision tree fitted curve for CO.
VI. CONCLUSION
We predicted the air quality index using different machine learning algorithms: linear regression, Decision Tree, and Random Forest. From the results, we conclude that the Random Forest algorithm gives the best prediction of the air quality index among the three.
References
[1] https://en.wikipedia.org/wiki/Air_quality_index
[2] Kennedy Okokpujie, Etinosa Noma-Osaghae, Odusami Modupe, Samuel
John, and Oluga Oluwatosin, “A SMART AIR POLLUTION
MONITORING SYSTEM,” International Journal of Civil Engineering
and Technology (IJCIET), vol. 9, no. 9, pp. 799–809, Sep. 2018.
[3] C. Santos, J. A. Jiménez, and F. Espinosa, “Effect of Event-Based
Sensing on IOT Node Power Efficiency. Case Study: Air Quality
Monitoring in Smart Cities,” IEEE Access, vol. 7, pp. 132577–132586,
2019.
[4] D. Wei, “Predicting air pollution level in a specific city,” 2014.
[5] Kostandina Veljanovska and Angel Dimoski, “Air Quality Index
Prediction Using Simple Machine Learning Algorithms,” International
Journal of Emerging Trends & Technology in Computer Science, vol. 7,
no. 1, 2018.
[6] D. Zhu, C. Cai, T. Yang, and X. Zhou, “A Machine Learning Approach
for Air Quality Prediction: Model Regularization and Optimization,” Big
Data and Cognitive Computing, vol. 2, no. 1, p. 5, Mar. 2018.
[7] A. Masih, “Machine learning algorithms in air quality modeling,”
Global Journal of Environmental Science and Management, vol. 5, no. 4,
pp. 515–534, 2019.
[8] Aditya C R, Chandana R Deshmukh, Nayana D K, and Praveen Gandhi Vidyavastu, “Detection and Prediction of Air Pollution using Machine Learning Models,” IJETT.
[9] https://archive.ics.uci.edu/ml/datasets/Air+quality
[10] David A. Freedman (2009). Statistical Models: Theory and Practice.
Cambridge University Press. p. 26. A simple regression equation has on
the right hand side an intercept and an explanatory variable with a slope
coefficient.
[11] Rokach, Lior; Maimon, O. (2008). Data mining with decision trees: theory and applications. World Scientific Pub Co Inc. ISBN 978-9812771711.
[12] Breiman, L. (2001). "Random Forests". Machine Learning. 45 (1): 5–32. doi:10.1023/A:1010933404324