A Machine Learning-Based Solution for Predicting Land Values E-Valuer Land
value predictor
Conference Paper · June 2021
A Machine Learning-Based Solution for Predicting
Land Values
E-Valuer Land value predictor
Bimali Y.M.Y.
Department of Software Engineering, Faculty of Computing, Sri Lanka Institute of Information Technology (SLIIT), Malabe, Sri Lanka.
yashinkabimali@gmail.com

Rodrigo U.S.D.
Department of Software Engineering, Faculty of Computing, Sri Lanka Institute of Information Technology (SLIIT), Malabe, Sri Lanka.
[email protected]

Thenuka Dharmaseelan
Department of Information Technology, Faculty of Computing, Sri Lanka Institute of Information Technology (SLIIT), Malabe, Sri Lanka.
[email protected]

Thayalini K.
Department of Information Technology, Faculty of Computing, Sri Lanka Institute of Information Technology (SLIIT), Malabe, Sri Lanka.
[email protected]

Gamage M.P.A.W.
Department of Information Technology, Faculty of Computing, Sri Lanka Institute of Information Technology (SLIIT), Malabe, Sri Lanka.
[email protected]

Rathnayaka P.B.
Department of Information Technology, Faculty of Computing, Sri Lanka Institute of Information Technology (SLIIT), Malabe, Sri Lanka.
[email protected]

Abstract - Real properties are the most valuable possessions of most ordinary people, so obtaining a proper valuation for them is very important. This paper analyses an innovative solution proposed to facilitate land valuation based on recent sales, prediction of future price, and the effect of proposed development work on the land, so that real-estate customers and owners of real estate companies can benefit and make smarter property-related decisions. This intelligent tool can help people assess the land they are going to buy in terms of its current value and future value. Machine learning and optimization are the main research components of this system. The system uses an LSTM model as well as KNN and MLR models in making predictions. The LSTM model can make current value predictions with an accuracy of over 0.75, and future value predictions with reasonable accuracy. This paper discusses the research methodology we used to identify the most suitable algorithms for our intended purpose.

Keywords- Valuation, AI- Artificial Intelligence, ML- Machine Learning, ANN- Artificial Neural Network, LSTM- Long Short Term Memory, RNN- Recurrent Neural Network, MLR- Multivariate Regression, ARIMA- Auto Regressive Integrated Moving Average, MAE- Mean Absolute Error, MSE- Mean Squared Error, RMSE- Root Mean Squared Error

I. INTRODUCTION

Real properties are the most valuable possessions of most ordinary people. In Sri Lankan culture, most people tend to think that owning real estate is a better investment than keeping the same money in a bank. Therefore, obtaining a proper valuation for these properties is very important.

Land valuation is the process of assessing the characteristics of a given piece of land based on experience and judgment [1]. The value of a land parcel depends on several physical and economic characteristics, which must be considered very carefully in a land valuation procedure [1]. These values can be affected by various social factors too. For example, a crime committed on a piece of land can have a negative effect on its value.

Hence, real estate appraisal is a challenging multidimensional problem that involves estimating many facets of a property, its neighborhood, and its city [2].

Since Sri Lanka lacks a good data platform that gathers all these data, a proper valuation that considers all these factors can take a very long time. The manual process is a slow, time-consuming task that needs to be done by an experienced professional valuer. The valuation approaches used by those professionals are limited by the lack of digital data in Sri Lanka. It is also well known that the valuation process can be highly subjective, depending on the person carrying it out.

Ideally, the systematic process of valuation consists of four stages: physical and legal identification, identification of the property rights to be valued, gathering and analysis of market data, and application of a suitable valuation approach. The major valuation approaches are the Sales Comparison Approach, the Income Approach, and the Cost Approach [3]. Analyzing previous land sale details and the trends in their fluctuations, and using those data to predict the valuation, is called the sales comparison approach [3].

The task of automatically estimating the market value of houses can be seen as a regression problem, where the price (or the price per square meter) is the dependent variable, while the independent variables are the available information that could help to determine the price correctly [2]. When the neighborhood economic value is combined with the effect of neighborhood factors such as walkability, we believe it is
possible to give an accurate, fair prediction of the value of the land.

The influence of technology on the daily life of Sri Lankans has increased immensely. People use traffic data and online shopping more than ever.

Since the manual process is too slow and too dependent on the individual valuer to support a quick, sound decision about the worth of the land and its suitability for the customer's purpose, our attempt is to digitally assist people in property-related decision making by providing accurate predictions of the value and future outlook of the land. The main research problem is to develop an automated system that evaluates the land based on its neighborhood economic value and identifies the possible effects of development work on the value of the land in the future. The requirement for a solution that predicts the current value and the future value came from a domain expert. While reviewing the literature and through supervisor meetings, we identified a further improvement, which is to predict the effect of future development work on a particular land. Sri Lanka is a developing country, and although the rate of development may vary, infrastructure development projects are carried out frequently.

We can never underestimate the duty of a valuation officer, as the estimations are affected by numerous factors particular to the area. But these factors are subject to each valuer's perception and experience. According to Vaz [4], the discretion and the appraisers' subjectivity that characterize traditional real estate valuation are still allowed to take part in the formation of the asset price, even when respecting international standards (EVS, IVS) or appraisal institutions' regulations (TEGOVA, RICS, etc.). For example, an experienced valuer who is familiar with the area may weigh regional and social factors more heavily than physical factors, compared to a fairly new valuer who still adheres to land valuation theory and follows the proven procedure. Therefore, manual valuation can be considered a more sensitive approach.

Our intention is to provide people with a fair, accurate prediction for the land they are going to buy, so that they can decide whether the investment is worthwhile for them. We believe this is an area where improvement is needed, because we can assist people in making decisions related to property, which is most probably the largest investment in many people's lives.

During the AI Asia Summit 2018, the summit panelists Dr. Yasantha Rajakarunanayake, Dr. Rukshan Baduwita, Dr. James Shanahan, and Dr. Chrisantha Fernando agreed that Sri Lanka is behind in terms of AI startups [5], despite the fact that the software industry is a rapidly growing area. According to a survey carried out in 2014 by Karunananda et al. [6], this is due to the lack of popularity, knowledge, experts, requirements, and sponsorship for AI-related software projects [6].

But when analyzing local news, we can see that AI-based applications have become a trend. For example, Dialog has its own AI-powered voice service to support its product service framework.

Research has been conducted to predict Sri Lankan stock prices using Artificial Intelligence and Machine Learning approaches, titled "A recurrent neural network approach in predicting daily stock prices an application to the Sri Lankan stock market" [7] and "Comparison of Support Vector Regression and Artificial Neural Network Models to Forecast daily Colombo Stock Exchange" [8]. According to Li et al. [1], real estate valuation research evaluating the use of GIS technology has been conducted. But there is no information regarding the application of AI technology to real estate value prediction in the Sri Lankan context.

The use of AI for residential value forecasting has been suggested in the literature since the 1990s [9]. Although Sri Lanka lacks an automated land valuation system, many reliable, up-and-running solutions have been implemented in developed countries such as New Zealand, England and Wales, and the USA. With the well-structured digital data infrastructure of those countries, it is clear that they can implement very accurate systems.

Zillow is an online real estate database company that was founded in 2006 by Rich Barton and Lloyd Frink, former Microsoft executives and founders of the Microsoft spin-off Expedia [10]. Zillow.com supports United States of America (USA) and Canadian property listings. Zestimate produces a 12-month estimate for a house based on comparable houses in the neighbourhood. The accuracy of Zestimate depends on the amount of data used, as the underlying approach is a proprietary algorithm based on hedonic regression analysis [11], which analyses several features of the house. The forecasted value is interpolated using a cubic spline to connect to the current value [11].

Trulia is also a product offered in the USA, which offers a range of services for the real estate sector. Its price estimates are based on publicly available information on the home's physical characteristics (e.g. location, number of bedrooms, etc.), property tax information, and recent sales of similar nearby homes.

It involves more community interaction. For example, Trulia Neighbourhoods provides photographs, drone footage, etc. that those who are interested in the neighborhood can refer to. Trulia provides prices using public data that shows the price fluctuation of a house compared to other homes with the same ZIP code.

Quotable Value (QV) provides independent and authoritative information on any home in New Zealand, on or off the market [12]. QV.co.nz and its mobile app QV homeguide are known for providing more accurate values of real estate property and key details that assist people in making instant decisions regarding property. QV, together with CoreLogic, a company which analyzes information assets and data to provide clients with analytics and customized data services, provides a range of reports valuable to the user.

Creating a methodology that would bring more sophisticated information, greater accuracy, and analytical rigor to the United Kingdom (UK) residential property market is the motivation behind HousePrice.ai. Their proprietary model combines multi-disciplinary experience in AI and Big Data to provide the most accurate estimations. HousePrice.ai has the Horizon app, which calculates capital, rental and gross development values for a single property or an entire portfolio [13]. It produces accurate
property valuations for the present and can also offer future predictions. Valuations are based on objective, measurable values, creating a fact-based result as opposed to a subjective one [14]. This tool allows the user to adjust, add and remove factors within the surrounding areas to determine how external changes will affect property prices.

Our intention is to identify ways to use their underlying methodology in a suitable manner in the Sri Lankan context.

II. METHODOLOGY

A. Data Collection

The study focuses on Colombo, which has experienced relatively high infrastructure development.

Primary data have been collected through questionnaires, interviews and personal visits to the land area to learn the present situation of the market. Secondary data were collected mainly through various survey departments, land estate agents, newspaper advertisements, and land sale website contents. The data are useful for assessing the performance of the property as a key to predicting land price.

The cross-sectional data for current price prediction, to be used with non-time-series algorithms, were collected through a questionnaire answered by residents of the Colombo district and by including publicly available data from newspaper and website advertisements. The questionnaire mainly asked for the price of the land, the location of the land, the nearest bus route, and the distance to the nearest bus route, along with the buying price and details of valuation history, yielding above 200 samples. The time series data for current value prediction were collected from a land sale company which had monthly land values from the same area over a period of 10 years, containing above 200 samples.

The dataset used for future value prediction contains land prices for places in the Colombo district from 20012 to 2018, from which the algorithm predicts the future price for 10 years. Its features are state, city, zip code, price, pollution index, hospital distance, tourist score, bank/ATM, distance to school, distance to town, population index, and bid date, with above 500 samples. Each place has 8 or more samples.

B. Design

When a customer visits a land they are willing to buy, they can input the current location through the application. Based on that location, suitable recent sales data are selected. Those data are then analyzed by the AI model to predict the current value. That predicted value is optimized to produce the most accurate current value. The future value is then predicted by two collaborating units: one considers the fluctuation rates of past pricing values and weather effects, while the other calculates the effect of proposed development projects in the area. Together these units generate a report which presents these two types of data, with other relevant data, in a simple way anyone can understand. Machine learning and deep learning algorithms have been tested in each of the components with suitable data.

C. Prediction Models

The prediction system has two main components, namely current value prediction and future value prediction. Each component was tested with several algorithms in order to provide the most accurate prediction to the user. For that, we first need to train the algorithm using a suitable dataset containing the factors and the actual market value changes. The selected models are trained using this past information.

1) Multivariate Linear Regression

MLR is an algorithm used in both the current value prediction and the future value prediction components. Simply put, it assumes that there is a linear relationship between the predicted price and the other contributing factors.

Figure 1. Multivariate linear model

MLR has several advantages over other algorithms. One is the ability to determine the relative influence of one or more predictor variables on the criterion value. Multivariate techniques also provide a powerful test of significance compared to univariate techniques [15]. However, for multivariate techniques to give meaningful results they need a large sample of data; otherwise, the results are meaningless due to high standard errors [15]. Standard errors determine how confident you can be in the results, and you can be more confident in the results from a large sample than from a small one.

The MLR model implementation finds the best fitting line using model coefficients. The process of optimizing the model is to minimize the error of the predicted value.

The MLR algorithm used for the current value prediction component analyzed the following factors during testing: location, distance to the main bus route, accessibility index, and size of the land.

The MLR used for the future value prediction unit analyzed the relationship between land price and other recent factors such as state, city, zip code, price, pollution index, hospital distance, tourist score, bank/ATM, school distance, distance to town, population index, and weather conditions.

2) Random forest regressor

The random forest regressor operates by constructing a multitude of decision trees that fit the observations into groups based on their attribute values, and outputs the mean prediction of the individual trees. As the name suggests, the “decision tree” model builds a reversed tree-like structure, where the “root” is at the top, followed by multiple branches, nodes, and leaves. The end of each branch is a decision leaf, which is the model’s predicted value, given the values of the attributes represented by the path from the root node to the said decision leaf.
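As a rough illustration of the random forest idea described in this section (bootstrap samples, one regression tree per sample, mean of the tree outputs), the following minimal sketch uses only the Python standard library. It is not the implementation used in this study: the feature names (distance to a bus route, land size), the toy prices, and all function names and parameters are hypothetical and chosen only for illustration.

```python
import random
from statistics import mean


def sse(rows):
    """Sum of squared errors of the targets around their mean."""
    m = mean(y for _, y in rows)
    return sum((y - m) ** 2 for _, y in rows)


def build_tree(rows, rng, max_features, depth=0, max_depth=4, min_leaf=2):
    """Grow one regression tree; a leaf is a number, a split is a dict."""
    leaf_value = mean(y for _, y in rows)
    if depth >= max_depth or len(rows) < 2 * min_leaf:
        return leaf_value
    n_features = len(rows[0][0])
    best = None  # (error, feature, threshold, left_rows, right_rows)
    # Only a random subset of the features is considered at each split.
    for f in rng.sample(range(n_features), max_features):
        values = sorted({x[f] for x, _ in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [r for r in rows if r[0][f] <= t]
            right = [r for r in rows if r[0][f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, f, t, left, right)
    if best is None:
        return leaf_value
    _, f, t, left, right = best
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree(left, rng, max_features, depth + 1, max_depth, min_leaf),
        "right": build_tree(right, rng, max_features, depth + 1, max_depth, min_leaf),
    }


def predict_tree(node, x):
    """Follow the path from the root to a decision leaf."""
    while isinstance(node, dict):
        side = "left" if x[node["feature"]] <= node["threshold"] else "right"
        node = node[side]
    return node


def fit_forest(rows, n_trees=25, max_features=1, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        bootstrap = [rng.choice(rows) for _ in rows]
        forest.append(build_tree(bootstrap, rng, max_features))
    return forest


def predict_forest(forest, x):
    """The forest output is the mean prediction of the individual trees."""
    return mean(predict_tree(tree, x) for tree in forest)


# Hypothetical toy records: (distance_to_bus_route_km, size_perches) -> price.
rows = []
for i in range(10):
    size = 10.0 + i
    rows.append(((0.1 + 0.08 * i, size), 900.0))  # close to a bus route
    rows.append(((1.1 + 0.08 * i, size), 300.0))  # far from a bus route

forest = fit_forest(rows)
near = predict_forest(forest, (0.2, 12.0))
far = predict_forest(forest, (1.7, 12.0))
print("near bus route:", near, "far from bus route:", far)
```

In this toy setting the price depends only on the distance feature, so the forest should predict a higher value for the parcel near the bus route than for the distant one. Real data would need more trees, more features, and a held-out test set.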
This model was tested for the current value prediction component with the same features tested with the MLR model.

3) Artificial Neural Networks

The ANN design concept is based on the human brain. The purpose of an ANN is to imitate the human learning process. This model consists mainly of three types of layers, namely the input layer, hidden layers, and the output layer. Each layer has artificial neurons that contribute to adjusting the weights for the input features and attempt to draw conclusions just as the human brain does.

The ANN was also trained for current value prediction with the same dataset used for MLR. Through a simple trial-and-error process, a suitable model was identified and compared with the others.

4) LSTM – Recurrent Neural Network

Considering that time has a direct influence on land prices, time-series algorithms were also tested when selecting the best prediction model for the current price. What makes LSTM different from a typical neural network is that it has feedback connections.

To test this model, a time series dataset with monthly land values from the area over a period of 10 years was used. The dataset had lags of unknown duration; hence, out of the available RNN types, LSTM was the best option.

5) ARIMA model

ARIMA, which stands for Auto Regressive Integrated Moving Average, is the most popular and commonly used statistical method for time series prediction. This model was utilized in both the current value prediction and the future value prediction units.

The procedure to follow with this model is to split the dataset into train and test sets, use the train set to fit the model, and generate a prediction for each element of the test set. A rolling forecast is required, given the dependence on observations in prior time steps for differencing and the AR model. A crude way to perform this rolling forecast is to re-create the ARIMA model after each new observation is received.

6) KNN algorithm

KNN can be used for both classification and regression problems. The algorithm uses ‘feature similarity’ to predict the values of any new data points.

This model was used to predict future values, with categorized data split into training and test sets (40% allocated to test data). The procedure is as follows: for each candidate value of k, initialize the model, fit it on the training data, make predictions on the test set, and calculate and store the RMSE value [16].

III. RESULTS AND DISCUSSION

The results obtained by testing the above models in the two domains of current value prediction and future value prediction are discussed here.

A. Current Value Prediction

This was carried out in two phases: testing time-series algorithms and testing non-time-series algorithms. As mentioned in Section II, the LSTM and ARIMA models were tested with time series data, while the MLR, random forest regressor, and ANN models were tested with cross-sectional data. These models were evaluated in terms of mean absolute error (MAE), mean squared error (MSE) and root mean squared error (RMSE).

Test results for these models can be summarized as follows.

Model                     MAE            MSE                RMSE
MLR                       12578.2076     37057375442        192502.923
Random Forest Regressor   69388.61903    17241415729        131306.572
ANN                       495306.848     351944183254.44    593248.838
LSTM                      12150.774      1834424960         42830.187
ARIMA                     26549.4523     9730956580         98645.6111

Table 1. Test results for current value prediction algorithms

According to the above results, the time series algorithms predicted values with comparatively less error than the others. It can be concluded that the LSTM model outperformed all the other machine learning models in price prediction.

Surprisingly, the ARIMA model displayed lower performance than the LSTM model in this case.

The figure below depicts how the loss metric for the Keras model decreased over time. According to this, the error has become less than 0.01.

Figure 2. LSTM loss function
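The error metrics reported in Table 1 (MAE, MSE, RMSE) can be computed as in the minimal sketch below. The function names and the three sample prices are hypothetical and only illustrate the formulas; the last check uses the LSTM row of Table 1 to show that the reported RMSE is the square root of the reported MSE.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mse(actual, predicted):
    """Mean squared error: average of (actual - predicted)^2."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: square root of the MSE."""
    return math.sqrt(mse(actual, predicted))


# Hypothetical actual and predicted land prices (illustrative values only).
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]

print("MAE :", mae(actual, predicted))
print("MSE :", mse(actual, predicted))
print("RMSE:", rmse(actual, predicted))

# Consistency check on the LSTM row of Table 1: RMSE = sqrt(MSE).
lstm_rmse_from_mse = math.sqrt(1834424960)
print("sqrt of LSTM MSE:", lstm_rmse_from_mse)
```

Because MSE squares each error, a few large misses raise MSE and RMSE much more than MAE, which is one reason the ranking of models can differ between the columns of Table 1.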
Figure 3. LSTM accuracy function

Based on the above functions, we can conclude that the LSTM model predicts the current value with an accuracy of approximately over 0.75.

B. Future value Prediction

Future value prediction was carried out using two approaches. In each, structured data extracted by the team members are fed into the algorithm, which relates these data to the price and outputs specific values for each sector. The output of this algorithm is then compared with the output of the other machine learning algorithms. Finally, a decision is made as to which prediction is the most trustworthy.

The first approach was based on features of the land and historical data of price fluctuation rates. A combination of KNN and MLR was compared against the ARIMA model. When the two were evaluated, the MSE value for multivariate linear regression was 590293123907492.1, and the MSE value for the ARIMA model was 605023251112851.9. Therefore, the combination of MLR and KNN can be seen as a more accurate option for future price prediction.

Figure 4 below shows the results obtained by the ARIMA model.

Figure 4. ARIMA model results

Figure 5. ARIMA model forecasts

Below is the variation of RMSE with the K value, as obtained from the KNN model used for price prediction.

Figure 6. RMSE versus K values

The second approach to predicting future values is based on infrastructural facilities. The prediction is made relative to the future development projects in the area. These include infrastructure facilities such as schools, hospitals, highways, and apartments. The commercial value of land in the future is calculated based on a percentage ratio. Each of these infrastructure facilities is given a specific percentage value, based on studies of the effect of emerging facilities on land value. The dataset collection plays a major role here, as it involves the percentage calculation. Machine learning is used to predict the land value. To achieve these objectives, the machine is trained and tested with the dataset to predict the future commercial value of the land under the effect of different infrastructural additions. For example, if the current land price is Rs. 2000 and a school and a hospital are assumed to be built within five years, with the school increasing the land value by 20% and the hospital by 40%, then the future land value is predicted as 2000 + 20% + 40%. The prediction system mainly uses one algorithm, MLR, to predict the data; the purpose of using two algorithms is to provide the most accurate prediction. To do that, we first need to train the algorithm on a future infrastructure dataset built from the past information collected.

Here, the Score Model component attempts to predict the test data. The trained model supplies the predictive algorithm that the Score Model uses. The Scored Labels column holds the predicted Sales Amount.
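The percentage-based calculation in the Rs. 2000 example of the second approach can be sketched as follows. This is only one reading of that example: it assumes that the percentage uplifts of the individual facilities are simply added together and applied to the current price (no compounding). The function name and the dictionary layout are hypothetical.

```python
def future_land_value(current_value, uplift_percentages):
    """Apply additive percentage uplifts to the current land value.

    Assumption: each facility's percentage is taken relative to the
    current value, and the percentages are summed (not compounded).
    """
    total_uplift = sum(uplift_percentages.values())
    return current_value + current_value * total_uplift / 100


# Example from the text: current price Rs. 2000, a school (+20%) and a
# hospital (+40%) expected within five years.
uplifts = {"school": 20, "hospital": 40}
predicted_value = future_land_value(2000, uplifts)
print(predicted_value)  # 3200.0
```

If the uplifts were instead compounded (2000 x 1.2 x 1.4), the result would be Rs. 3360 rather than Rs. 3200, so the choice between the two readings matters once several facilities are combined.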
For the linear regression, we look at the "Coefficient of Determination". This value tells us about the accuracy of the model and ranges between 0 and 1. If the value is close to 0.8 or 1, the linear regression model is reliable.

Figure 7. Prediction and prediction probability calculation.

Figure 8. MAE, MSE for the given model

When we run our model, we see a coefficient of determination of about 0.9, which implies that our model is reasonably accurate.

IV. CONCLUSION

Based on the observations above, we can conclude that the LSTM model has the least error among the tested models and can achieve an accuracy of around 0.75 in predicting the current value. But there can be tradeoffs, depending on the dataset being used and its sample size. In predicting future values, the combination of KNN and MLR was found to outperform the ARIMA model, with a lower MSE value. Again, as with LSTM, tradeoffs are possible.

Hence, further work on these models is recommended, with different features considered based on different valuation models and with a greater sample size.

ACKNOWLEDGMENT

We would like to express our gratitude to the Faculty of Computing, Sri Lanka Institute of Information Technology (SLIIT) for providing us with a good academic environment and facilities to complete this project. We also want to thank our families, friends, and lecturers for their understanding and support in completing this project.

REFERENCES

[1] Li, L., Prussella, P.G.R.N.I., Gunathilake, M.D.E.K., Munasinghe, D.S. and Karadana, C.A., 2015. Land Valuation Systems using GIS Technology Case of Matara Urban Council Area, Sri Lanka. Bhumi, The Planning Research Journal, 4(2), pp.7–16.

[2] Nadai, M. D., & Lepri, B. (2018). The Economic Value of Neighborhoods: Predicting Real Estate Prices from the Urban Environment. 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA). doi:10.1109/dsaa.2018.00043

[3] Schulz, R. (2003). Valuation of properties and economic models of real estate markets. Erscheinungsort nicht ermittelbar: Verlag nicht ermittelbar.

[4] Vaz, J. (2015). REAL ESTATE APPRAISAL AND SUBJECTIVITY. European Scientific Journal March 2015, ISSN: 1857 – 7881 (e - ISSN 1857- 7431), pp.55, 63.

[5] De Andrado, M. (2018). Aiming for a Smarter Future With the AI Asia Summit 2018 – README. [online] README. Available at: https://www.readme.lk/slasscom-ai-asia-summit-2018-post-event/ [Accessed 20 Feb. 2019].

[6] Karunananda, A., Asanka, P., Fernando, H., Adhikari, T. and Pathirage, I. (2014). State of Artificial Intelligence in Sri Lankan Software Industry. [online] Available at: https://www.researchgate.net/publication/281224224_State_of_Artificial_Intelligence_in_Sri_Lankan_Software_Industry [Accessed 17 Feb. 2019].

[7] A. J. P. Samarawickrama and T. G. I. Fernando, "A recurrent neural network approach in predicting daily stock prices an application to the Sri Lankan stock market," 2017 IEEE International Conference on Industrial and Information Systems (ICIIS), Peradeniya, 2017, pp. 1-6.

[8] Chandrasekara, Vasana & Tilakaratne, Chandima. (2011). Comparison of Support Vector Regression and Artificial Neural Network Models to Forecast daily Colombo Stock Exchange.

[9] Chaphalkar, N.B, & Sayali Sandbhor. (n.d.). Use of Artificial Intelligence in Real Property Valuation. Retrieved from http://www.enggjournals.com/ijet/docs/IJET13-05-03-087.pdf

[10] Zillow. (2019, February 21). Retrieved from https://en.wikipedia.org/wiki/Zillow#Zestimate [Accessed 23 Feb. 2019].

[11] Hagerty, James R. "How Good Are Zillow's Estimates?", The Wall Street Journal, 2007-02-14. Retrieved on 2009-02-25. [Accessed 23 Feb. 2019].

[12] “QV Homeguide App Now Available.” New Zealand Property Investors Federation, 3 Mar. 2015, www.nzpif.org.nz/news/view/56971. [Accessed 24 Feb. 2019].

[13] “ABOUT US.” Houseprice.AI-What's the Fair Price ?, www.houseprice.ai/about. [Accessed 24 Feb. 2019].

[14] “Introducing Houseprice.AI: The Must Have Tool for Every Developer.” Bridging Loans | Development Loans | AvamoreCapital, 29 May 2018, avamorecapital.com/introducing-houseprice-ai-the-must-have-tool-for-every-developer/. [Accessed 24 Feb. 2019].

[15] J. Jackson, “Multivariate Techniques: Advantages and Disadvantages,” The Classroom | Empowering Students in Their College Journey, 10-Jan-2019. [Online]. Available: https://www.theclassroom.com/multivariate-techniques-advantages-disadvantages-8247893.html. [Accessed: 04-Aug-2019].

[16] A. Singh, “A Practical Introduction to K-Nearest Neighbor for Regression,” Analytics Vidhya, 07-May-2019. [Online]. Available: https://www.analyticsvidhya.com/blog/2018/08/k-nearest-neighbor-introduction-regression-python/. [Accessed: 04-Aug-2019].

[17] “How to Create an ARIMA Model for Time Series Forecasting in Python,” Machine Learning Mastery, 26-Apr-2019. [Online]. Available: https://machinelearningmastery.com/arima-for-time-series-forecasting-with-python. [Accessed: 04-Aug-2019].