International Journal of Advances in Engineering and Management (IJAEM)
Volume 3, Issue 6 June 2021, pp: 1123-1127 www.ijaem.net ISSN: 2395-5252
Airfare Price Prediction using various Machine Learning Models
Ankita Harkude#1, Sarika Namade#2, Saloni Gholapl#3
1,2,3#Department of Information Technology and Computer Science and Technology,
Usha Mittal Institute of Technology, SNDT Women's University,
Juhu-Tara Road, Sir Vitthaldas Vidyavihar, Santacruz(W), Mumbai 400049
Submitted: 01-06-2021 Revised: 14-06-2021 Accepted: 16-06-2021
ABSTRACT—Nowadays air travel is becoming popular in our country, and hence people seek the lowest possible price. Airlines, meanwhile, use various strategies and methods to predict flight prices in a smart fashion, keeping their profit maximized and their revenue high. Because the price changes dynamically, it becomes difficult for the customer to buy a ticket at the minimum cost. This paper aims at implementing machine learning regression methods to predict prices at the desired time, so that the customer can choose a suitable airline according to their budget.
Index Terms—Machine Learning Algorithm, Predictor, Airfare, Random forest, Naive Bayes.

I. INTRODUCTION
Anyone who has booked a flight ticket knows how the price changes dynamically with the time, season, holidays, etc. That is why customers try to book tickets well before the departure date, to avoid the high prices at peak time. It is also why many techniques based on AI and machine learning models are being developed to predict the price at a given time, so that the customer has a clear idea beforehand. In this paper, we predict the flight prices of several airlines between several cities for the years 2018 to 2020, and we make the predictions available through a Flask web app.

II. LITERATURE SURVEY
Nowadays, buying a ticket at the minimum price is not easy for the customer. Many techniques are used to predict the price of an air ticket, and machine learning models and artificial intelligence are the most useful among them. A PLSR (Partial Least Squares Regression) model has been applied to find the lowest-cost time to buy airline tickets with 75.3 percent precision, achieving the best performance among the AI models used. William Groves and Maria Gini [2] used Partial Least Squares Regression (PLSR) to develop a model for predicting the best booking time for airline tickets. The data were collected from travel websites from 22 February 2011 to 23 June 2011. In addition, further data were collected and used to compare the performance of the final model.
Janssen [1] built a prediction model using the Linear Quantile Mixed Regression method for the San Francisco to New York route, with existing daily airfares provided by www.infare.com. The model uses two features: the number of days left until the departure date, and whether the flight date falls on a weekend or a weekday. The model predicts airfares well for dates far from the departure date, but its prediction is not effective for the days close to the departure date.
Wohlfarth [3] proposed an airline ticket purchase-timing optimization model based on a specific preprocessing step that uses data mining techniques (classification and clustering), marked point processes and statistical analysis. The approach was mainly proposed to transform heterogeneous price series data into added-value price series trajectories that can be fed to an unsupervised clustering algorithm. The price trajectories are grouped into clusters with similar pricing behavior, and an optimization model estimates the price change patterns. A tree-based classification algorithm is used to select the best-matching cluster, to which the optimization model is then applied. For a prospective purchase, the model gives the maximum number of days to wait before buying the flight ticket.
The optimal purchase time for particular airlines, time frames and routes, based on nonparametric isotonic regression methods, is recommended by Dominguez-Menchero [5]. For prediction, two kinds of variables are
DOI: 10.35629/5252-030611231127 Impact Factor value 7.429 | ISO 9001: 2008 Certified Journal Page 1123
considered: one is the date of purchase and the other is the fare. This model is very helpful for buying flight tickets: before a ticket for any flight is bought, it gives the customer the most accurate price.

III. DATA COLLECTION
We made a web spider that extracts data from a website and stores it in a CSV file. Various sources, from APIs to customer travel sites, are available for data scraping.

A. Data Collection
Data collection produces two data sets, Data Train and Test Data. The training data combine categorical and numerical features. The test data have the same columns as Data Train, except for the price column. Feature engineering is the most important step in data collection, as it includes cleaning unwanted data and creating specific columns. Since not all features are required, only the following are considered:
• Date of journey
• Time of Departure
• Place of Departure
• Time of Arrival
• Place of Destination/Arrival
• Airway company
• Total Stops
• Price

B. Cleaning and preparing data
The data need to be cleaned and prepared according to the model requirements, after which they are further analysed and their distribution is examined. Unnecessary data such as duplicates and null values are removed. Several statistical methods are used to prepare and clean the data; for example, the departure time and arrival time are split into hour and minute and converted into integers.

C. Analyzing data
After data cleaning and preparation, machine learning models are applied, and some features are calculated on the basis of existing features. The models include Random Forest, Logistic Regression, Gradient Boosting and a combination of these models to increase accuracy. Time also plays an important role, so the time of day can be divided into morning, evening and night.

Fig. 1. Dataset
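The cleaning and time-of-day steps described above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the column names (`Dep_Time`, `Arrival_Time`, `Price`, ...), the toy rows and the hour boundaries chosen for morning, evening and night are all assumptions.

```python
import csv
import io


def split_hhmm(value):
    """Split an 'HH:MM' string into integer (hour, minute)."""
    hour, minute = value.strip().split(":")
    return int(hour), int(minute)


def time_of_day(hour):
    """Bucket an hour into the three periods named in the text.
    The boundaries here are illustrative assumptions, not taken from the paper."""
    if 4 <= hour < 12:
        return "morning"
    if 12 <= hour < 20:
        return "evening"
    return "night"


def clean_rows(rows):
    """Drop rows with missing values and exact duplicates, then derive time features."""
    seen = set()
    cleaned = []
    for row in rows:
        # Skip rows with any empty (null) field.
        if any(v is None or v.strip() == "" for v in row.values()):
            continue
        # Skip exact duplicates of a row that was already kept.
        key = tuple(sorted(row.items()))
        if key in seen:
            continue
        seen.add(key)
        out = dict(row)
        # Split departure and arrival times into integer hour and minute columns.
        out["Dep_hour"], out["Dep_min"] = split_hhmm(row["Dep_Time"])
        out["Arrival_hour"], out["Arrival_min"] = split_hhmm(row["Arrival_Time"])
        out["Dep_period"] = time_of_day(out["Dep_hour"])
        out["Price"] = int(row["Price"])
        cleaned.append(out)
    return cleaned


# Toy CSV with made-up values, standing in for the file written by the spider.
TOY_CSV = """Airline,Source,Destination,Dep_Time,Arrival_Time,Total_Stops,Price
AirX,CityA,CityB,22:20,01:10,non-stop,3900
AirY,CityC,CityA,05:50,13:15,2 stops,7660
AirX,CityA,CityB,22:20,01:10,non-stop,3900
AirZ,CityB,CityC,09:25,,1 stop,13880
"""

rows = list(csv.DictReader(io.StringIO(TOY_CSV)))
data = clean_rows(rows)
for r in data:
    print(r["Airline"], r["Dep_hour"], r["Dep_min"], r["Dep_period"], r["Price"])
```

In pandas, the same steps are commonly written with `DataFrame.drop_duplicates()`, `DataFrame.dropna()` and `Series.str.split(':')`.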
IV. MACHINE LEARNING PERFORMANCE MODELS
To predict flight ticket prices, we apply several machine learning algorithms: Support Vector Machine (SVM), Linear Regression, K-Nearest Neighbours, Decision Tree, Multilayer Perceptron, Gradient Boosting and Random Forest. These models are implemented with the Python library scikit-learn. Their performance is verified using three metrics: R-square, MAE (mean absolute error) and MSE (mean squared error).

A. Linear Regression
Simple linear regression analysis is used to determine the relationship between two continuous variables. One of them is the predictor (input) variable, and the other is the response variable, whose value we have to find. Linear regression does not give a deterministic relation between the two variables but a statistical one: the best-fit line is the one for which the prediction error on the given data is minimum. Two major concepts are used to understand linear regression: the cost function, which measures the error of a candidate line, and gradient descent, which iteratively updates the coefficients to reduce that cost. The equation for linear regression is:
$$\hat{y} = b_0 + b_1 x \qquad (1)$$
We have to choose the values of b0 and b1 so that the error is as small as possible. The error is based on the squared difference between the actual and predicted values; the mean squared error (MSE) is used so that positive and negative differences do not cancel out. Here b0 is the intercept (bias), and the sign of the slope b1 gives the positive or negative relationship between x and y. R-squared, MAE and
MSE are used to measure the accuracy of regression models.

B. Decision Tree
A decision tree builds a regression model by breaking the collected data down into smaller and smaller subsets while, at the same time, an associated tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes; each decision node has at least two branches. The whole training set is considered as the root, and continuous feature values have to be discretized (split at threshold values) before the tree is built. Information gain and the Gini index are essential for choosing the splits; information gain is defined as the change in entropy produced by a split. A regression tree instead uses a squared-error criterion, and the value Y predicted at a leaf is the mean of the training targets that reach it. A minimum number of training examples assigned to each leaf node is used to stop splitting and avoid overfitting the model.

C. K-Nearest Neighbours (KNN)
The k-nearest neighbour algorithm is a supervised ML classification algorithm that can also be used for regression. It is one of the most widely used ML algorithms due to its simplicity. For a new data point, the model picks the k entries in the data set that are closest to it; in classification the point is assigned to the majority class among these neighbours, while in KNN regression the output is the mean of the values of its k nearest neighbours. Several values of k are tried, and the one giving the best result is chosen. Like SVM, KNN is a non-parametric method, because it makes no assumption about the underlying data distribution. KNN keeps all the training data, since they are needed during the testing phase.

D. Random Forest
A random forest creates a large model by aggregating base models (decision trees): it ensembles many less predictive models to produce a better predictive model. To obtain highly uncorrelated decision trees, features are randomly sampled without replacement and passed to the trees, so that the trees are less correlated when the best split is selected. Aggregating uncorrelated trees is the main concept that distinguishes a random forest from a single decision tree. It maintains accuracy and can handle missing values in the data. It is basically a bagging technique, and its hyperparameters are nearly the same as those of a decision tree or a bagging classifier. Overfitting is an error that occurs when a function fits a limited set of data points too closely; because it averages many uncorrelated trees, a random forest is much less prone to overfitting than a single decision tree.

V. EXPERIMENTAL RESULTS
The output of each model is plotted for the selected test dataset, and graphs compare the original values with the predicted results. The predicted fares, which help the customer purchase a flight ticket at the right time, are obtained from the Decision Tree, Random Forest, KNN and Linear Regression algorithms. The table below gives the R-square, MAE and MSE values. Each graph plots the flight fare against the days left until departure: the red line shows the predicted value of the flight ticket, whereas the blue line denotes the actual value. Fig. 2 shows the actual and predicted values against the days remaining until departure for the Random Forest algorithm, and Fig. 3 shows the corresponding plot for K-Nearest Neighbour. Compared with the other algorithms on this dataset, the tree-based models are the most accurate: in the regression analysis, Random Forest and Decision Tree give the highest R-square values (0.68 and 0.67 respectively).
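Three of the Section IV models are small enough to sketch without a library, which makes their mechanics concrete. This is a minimal pure-Python illustration on made-up data, not the scikit-learn implementations used in the paper: `fit_linear_gd` minimizes the MSE cost of Eq. (1) by gradient descent, `best_split` picks the threshold that minimizes the squared error of a one-feature regression-tree split, and `knn_predict` averages the targets of the k nearest training points. The function names, learning rate, epoch count and k are illustrative choices.

```python
def fit_linear_gd(xs, ys, lr=0.01, epochs=5000):
    """Fit y_pred = b0 + b1 * x by gradient descent on the MSE cost."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Error of the current line at every training point.
        errors = [(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(error ** 2).
        grad_b0 = 2.0 * sum(errors) / n
        grad_b1 = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        # Step both parameters against the gradient.
        b0 -= lr * grad_b0
        b1 -= lr * grad_b1
    return b0, b1


def sse(values):
    """Sum of squared deviations from the mean: the impurity of a regression-tree leaf."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Try every threshold on x and keep the one with the lowest total leaf SSE."""
    best_t, best_cost = None, float("inf")
    for t in sorted(set(xs))[:-1]:  # the largest value would leave the right leaf empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost


def knn_predict(train_x, train_y, query, k=3):
    """KNN regression: mean target of the k training points nearest to query."""
    nearest = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - query))[:k]
    return sum(y for _, y in nearest) / len(nearest)


# Made-up toy data: days left until departure and the corresponding fare.
days_left = [1, 2, 3, 4, 5]
fares = [110.0, 100.0, 90.0, 80.0, 70.0]
b0, b1 = fit_linear_gd(days_left, fares)
print(f"b0={b0:.2f}, b1={b1:.2f}")  # a negative b1 means fares fall as days_left grows

# Made-up data with two clear fare levels for the tree and KNN sketches.
xs = [1, 2, 3, 10, 11, 12]
ys = [100.0, 102.0, 98.0, 60.0, 62.0, 58.0]
print(best_split(xs, ys))
print(knn_predict(xs, ys, query=11, k=3))
```

A random forest would grow many trees built from splits like the one `best_split` finds, each on a bootstrap sample of the rows, and average their predictions.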
A. Algorithm Evaluation
ML Algorithm         R-squared   MAE    MSE
Decision Tree        0.67        0.13   0.21
Random Forest        0.68        0.13   0.21
K-NN                 0.65        0.13   0.22
Linear Regression    0.40        0.19   0.29
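The three metrics in the table can be computed as below. This is a minimal sketch on made-up values, not the authors' evaluation script; scikit-learn provides the same quantities as `r2_score`, `mean_absolute_error` and `mean_squared_error` in `sklearn.metrics`.

```python
def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error; squaring stops positive and negative errors cancelling."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up actual and predicted fares, for illustration only.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.0, 7.5, 9.0]
print(f"R-squared={r_squared(actual, predicted):.3f}, "
      f"MAE={mae(actual, predicted):.3f}, MSE={mse(actual, predicted):.3f}")
```

Because R-squared compares the residual error with the variance of the actual values, a model that always predicts the mean scores 0 and a perfect model scores 1; on that scale, the 0.40 of Linear Regression in the table is well below the 0.65 to 0.68 of the other three models.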
Fig. 2. Graphical result for random forest
Fig. 3. Graphical result for K-Nearest Neighbour
VI. ACKNOWLEDGMENT
First and foremost, we would like to express our sincere gratitude to Professor Dr. Sanjay Shitole and Dr. Sanjay Pawar, Heads of the Department of Information Technology and the Department of Computer Science and Technology respectively, and to our guide, Dr. Sanjay Pawar, for his valuable guidance during the Major project phase. Our sincere gratitude also goes to Dr. Sanjay Pawar, Principal (Usha Mittal Institute of Technology), for his valuable encouragement. We would like to give special thanks to our parents and friends for their valuable time and support.

VII. CONCLUSION AND FUTURE SCOPE
We gathered airfare data from the web and showed that it is feasible to predict flight prices based on historical fare data. The experimental results show that ML models are a satisfactory tool for predicting airfare prices. Data collection and feature selection are other important factors in airfare prediction, from which we drew some useful conclusions.

BIBLIOGRAPHY
[1]. T. Janssen, “A linear quantile mixed regression model for prediction of airline ticket prices,” Bachelor Thesis, Radboud University, 2014.
[2]. William Groves and Maria Gini, Department of Computer Science and Engineering, University of Minnesota, USA.
[3]. T. Wohlfarth, S. Clémençon and F. Roueff, “A data mining approach to travel price forecasting,” 10th International Conference on Machine Learning and Applications, Honolulu, 2011.
[4]. Viet Hoang Vu, Quang Tran Minh and Phu H. Phung, “An Airfare Prediction Model for Developing Markets,” IEEE, 2018.
[5]. Dominguez-Menchero, J. Santo, Riviera, “Optimal purchase timing in airline markets,” 2014.
[6]. P. Malighetti, S. Paleari and R. Redondi, “Pricing strategies of low-cost airlines: The Ryanair case study,” Journal of Air Transport Management, vol. 15, no. 4, pp. 195-203, 2009.
[7]. Supriya Rajankar and Neha Sakharkar, “A Survey on Flight Pricing Prediction using Machine Learning,” International Journal of
Engineering Research and Technology, vol. 8, issue 6, June 2019.
[8]. M. Papadakis, “Predicting Airfare Prices,” 2014.
[9]. R. Ren, Y. Yang and S. Yuan, “Prediction of airline ticket price,” Technical Report, Stanford University, 2015.
[10]. Qiqi Ren, “When to book: Predicting Flight Pricing,” Stanford University.