House Price Predicting Model using Machine Learning

Rahul Chauhan, Graphic Era Hill University, Dehradun, India, [email protected]
Prof. Kamal Ghanshala, Graphic Era University, Dehradun, India, [email protected]
Utkarsh Gupta, Graphic Era Hill University, Dehradun, India, [email protected]

Abstract: Everyone wishes to buy and live in a dream house that suits their lifestyle and provides
facilities according to their needs. The main parameters people consider are the surrounding
locality, the area of the house in square feet, the number of rooms and bathrooms, and the
location; these same parameters are used to predict the house price. This model helps people
select a house that is suitable for their living. Because people are very concerned about their
budgets before making any expensive purchase, the model also helps them choose a house within
their budget so that the purchase does not harm their financial state in the future. The model
predicts the price of a house according to the buyer's requirements. This study implements
several machine learning algorithms: Linear Regression (LR), Gradient Boosting Regression (GBR),
and Random Forest (RF) Regression. Finally, the algorithm that yields the highest accuracy is
selected for predicting the house price.

Keywords: Machine Learning (ML), House Price Prediction, Regression Techniques, ML Algorithm.

I. INTRODUCTION

Buying a house is one of the most important decisions a person makes in life. Everyone dreams of
living in a house priced within their budget. The price of a house depends on a wide variety of
factors, such as its location, its features, and the demand for the property. Predicting house values
is therefore beneficial not only for buyers but also for real estate agents and economic professionals.
A model cannot be trained without data, which is why data is called the heart of a machine learning
model. We therefore give the model information such as the location of the house, the number of
bedrooms and bathrooms, and other amenities so that it can predict the house price accurately. Machine
learning builds a model from previous data and uses it to make predictions on new data. Demand for
housing is increasing daily because the population is rising rapidly, and people who do not know the
actual price of a particular house may lose money. Of the dataset used for our model, 80% is used for
training and the remaining 20% for testing. Many algorithms can be used to predict house prices; here
we use the Linear Regression algorithm to perform the prediction. The overall workflow consists of the
following steps:

1. Collecting data: Gather a large dataset containing information about houses, such as the location of
the house, the number of bedrooms and bathrooms, the year in which the house was built, etc.

2. Data cleaning: Clean the data by removing missing values, outliers, and duplicates. Convert
categorical features into numerical features using techniques such as one-hot encoding, label encoding,
or ordinal encoding.

3. Splitting the data: Split the data into two sets: (i) a training set, which is used to train the
model, and (ii) a testing set, which is used to evaluate the model's performance.

4. Training the model: Train a regression model using the training set. In this step the model learns
to predict house prices based on the features of the house.

5. Evaluating the model: Use the testing set to evaluate the model's performance by calculating metrics
such as mean squared error (MSE), mean absolute error (MAE), and the R-squared value.

6. Deploying the model: Once you are satisfied with the model's performance, deploy it to make
predictions on new data.

7. Monitoring the model: Keep monitoring the model's performance and update it periodically to ensure
that it stays accurate.
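
The evaluation metrics named in step 5 can be made concrete with a short sketch. The code below is an illustration in plain Python, not the authors' implementation; the function names and the four example prices are made up for this example. It also computes RMSE, the square root of MSE.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up house prices, used only to exercise the functions.
actual = [50.0, 80.0, 120.0, 65.0]
predicted = [55.0, 75.0, 110.0, 70.0]

mse = mean_squared_error(actual, predicted)
rmse = math.sqrt(mse)
mae = mean_absolute_error(actual, predicted)
r2 = r_squared(actual, predicted)
print(mse, rmse, mae, r2)
```

Lower MSE, RMSE, and MAE indicate smaller prediction errors, while an R-squared value closer to 1 means the model explains more of the variation in price.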

This work aims to show how machine learning techniques can be applied to solve real-world problems in
the real estate sector, and how accurately predictions can be made using data-driven approaches. The
model also helps buyers and sellers make better decisions and improves the overall efficiency of the
real estate market.

II. LITERATURE SURVEY

Considerable work has already been done on predicting house prices. Different levels of accuracy and
different results have been achieved using different methodologies, techniques, and datasets. Bahia [11]
studied independent real estate market forecasting of house prices using data mining techniques. The
main idea was to construct a model using two types of neural network: a Feed Forward Neural Network
(FFNN) and a Cascade Forward Neural Network (CFNN). The CFNN was observed to give better results than
the FFNN when compared using the MSE performance metric.

Madhuri presented a comparative study of house price prediction using regression techniques such as
Elastic Net, multiple linear, Ridge, and LASSO regression. The study used common parameters of the
house, namely the price and the area in square feet. Tang et al. studied house price prediction based
on an ensemble learning algorithm. Ensemble learning is considered a strong tool for prediction. They
used both a random forest algorithm and an ensemble learning algorithm, and reported that the ensemble
learning algorithm gave better accuracy than the random forest algorithm.

Darshil Shah, Volume 7 Issue 3 (2020) [2], notes that house price prediction models already exist in
the market but carry high risk and rely on outdated datasets. To overcome this, the authors proposed a
new automated system with better prediction. They trained the model with several techniques, including
XGBoost, LightGBM, and Random Forest, and created an RPA that generates more accurate and consistent
results with fewer errors, which helps customers make better decisions.

G. Naga Satish, Volume 8 Issue 9 (2019) [4], used algorithms such as Linear Regression, Lasso
Regression, and Gradient Boosting, together with different predictor variables, and presented the output
as bar graphs. They found that 3-bedroom houses were the most common and 7-bedroom houses the least
common, and on this basis built a model that can predict the value of a house.

P. Durganjali proposed house resale price prediction using classification algorithms. In that paper, the
resale price of a house is predicted using algorithms such as Linear Regression, Decision Tree, K-Means,
and Random Forest. Many factors affect the house price, including physical attributes, location, and
economic factors. RMSE is taken as the performance metric; the algorithms are applied to different
datasets to find the most accurate model, that is, the one that gives the best predictions.

Sifei Lu proposed a hybrid regression technique for house price prediction. The paper examines a
creative feature engineering method for use with a limited dataset and limited data features. The
proposed approach was recently deployed as the key kernel for the Kaggle challenge “House Prices:
Advanced Regression Techniques”. The goal of the paper is to predict reasonable prices for customers
with respect to their budgets and priorities.

Patel and Upadhyay discussed various pruning methods and their features, and evaluated pruning
effectiveness. They also measured accuracy on the glass and diabetes datasets using the WEKA tool,
considering various pruning factors. The ID3 algorithm splits on attributes based on their entropy. The
TDIDT algorithm constructs a set of classification rules through the intermediate representation of a
decision tree. The WEKA interface is used for testing datasets with a variety of open source machine
learning algorithms.

III. METHODOLOGY

In this work, we focus on predicting house prices using machine learning algorithms such as Linear
Regression. The proposed system, “House Price Prediction Using Machine Learning”, predicts the house
price from multiple features. The model is trained on features such as the number of bedrooms, the
number of bathrooms, the area of the house in square feet, and the location of the house. Of the
historical data, 80% is used for training and the remaining 20% for testing. The raw data is stored in
a ‘.csv’ file. We mainly used three libraries to solve this problem: ‘pandas’, ‘numpy’, and ‘sklearn’.
pandas is used to load the ‘.csv’ file into a Jupyter notebook and to clean and manipulate the data.
NumPy is used in the train-test splitting step. sklearn (scikit-learn) is used for the actual analysis;
it contains various built-in functions that help solve the problem.

1. Data Collection- A dataset is a collection of data used for prediction purposes. Datasets can hold
any type of record that is stored in the system. Machine learning projects require a large amount of
data, because without data we cannot train the model. An ideal dataset has either well-labeled fields
and members or a data dictionary that can be used to relabel the data. A good dataset is complete,
reliable, and accurate. A dataset can also be thought of as a container for storing data. We searched
Kaggle for datasets that would suit our project objective and, after reviewing many of them, selected a
house pricing dataset for the city of Bangalore.

2. Data cleaning- Data cleaning is the process of detecting and removing errors in order to increase
the value of the data. It is carried out with the help of data wrangling tools. It involves identifying
and correcting inaccurate records in a record set, table, or database: incomplete information is found
and dirty data is replaced. The data is modified to ensure that it is accurate and correct, which makes
the dataset consistent.

3. Pre-processing of the dataset- This step covers pre-processing the dataset and splitting it into a
training dataset (train.csv) and a test dataset (test.csv). The dataset contained non-numerical features
such as the location of the property, the condition of the property, ventilation, etc.

We converted these non-numerical features into numerical features using the OneHotEncoder and
LabelEncoder classes from the scikit-learn library. The dataset also contained empty cells, which we
replaced with the mean of the column using the SimpleImputer class from the scikit-learn library.
Here, the target feature is the sale price.
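
To make the encoding and imputation steps concrete, the following sketch reimplements the three ideas in plain Python. It is not the authors' code and does not call scikit-learn; it only mirrors what mean imputation, one-hot encoding, and label encoding compute. The function names, the location names, and the area values are hypothetical.

```python
def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def one_hot(categories):
    """Return the sorted distinct levels and one 0/1 indicator row per input value."""
    levels = sorted(set(categories))
    rows = [[1 if c == level else 0 for level in levels] for c in categories]
    return levels, rows


def label_encode(categories):
    """Map each category to the integer index of its level in sorted order."""
    levels = sorted(set(categories))
    index = {level: i for i, level in enumerate(levels)}
    return [index[c] for c in categories]


# Hypothetical rows: one categorical column and one numeric column with a gap.
locations = ["Whitefield", "Indiranagar", "Whitefield", "Hebbal"]
area_sqft = [1200.0, None, 1500.0, 900.0]

filled_area = impute_mean(area_sqft)
levels, encoded = one_hot(locations)
labels = label_encode(locations)

print(filled_area)
print(levels, encoded)
print(labels)
```

One-hot encoding avoids implying an order among locations, at the cost of one column per distinct value; label encoding is more compact but assigns integers whose order carries no real meaning for a feature such as location.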

The dataset is then split into training and test datasets in the ratio 8:2 using the train_test_split
function.
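
The 8:2 split can be illustrated with a small standard-library sketch. This is a stand-in for the idea of shuffling and splitting, not the scikit-learn function itself; the helper name `split_train_test`, the seed, and the 100 placeholder rows are assumptions made for the example.

```python
import random


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and return (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Placeholder data: 100 row identifiers standing in for 100 houses.
rows = list(range(100))
train_rows, test_rows = split_train_test(rows)
print(len(train_rows), len(test_rows))
```

Shuffling before splitting matters when the file is sorted (for example by location or price); otherwise the test set could contain only one kind of house.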

4. Training the model- The data is divided into two parts, training and testing: 80 percent of the data
is used for training the model and the remaining 20 percent for testing. Training the model means
fitting a machine learning algorithm to the training dataset, which consists of sample outputs with the
corresponding sets of input data, as represented in the figure.

5. Testing the model- Once the model is trained, it is tested with the test dataset. The model produces
predictions for the processed dataset, and these are scored to measure how accurate the model is. In
other words, validation/testing is done for the model that has been built. Test datasets are used to
evaluate machine learning programs that have been trained on an initial training dataset.
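
The train-then-test procedure in items 4 and 5 can be sketched end to end with NumPy. Everything here is an assumption for illustration: the data is synthetic (it is not the Bangalore dataset), the coefficients used to generate it are invented, and the fit uses `numpy.linalg.lstsq` rather than whatever estimator the authors used.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (made up): area in sq ft, number of bedrooms, and a noisy price.
n = 200
area = rng.uniform(500, 3000, size=n)
bedrooms = rng.integers(1, 6, size=n)
price = 5.0 + 0.04 * area + 8.0 * bedrooms + rng.normal(0, 5.0, size=n)

# Design matrix; the leading column of ones provides the intercept.
X = np.column_stack([np.ones(n), area, bedrooms])

# 80/20 split on shuffled row indices.
order = rng.permutation(n)
train_idx, test_idx = order[:160], order[160:]

# Training: least-squares fit on the training rows only.
coef, *_ = np.linalg.lstsq(X[train_idx], price[train_idx], rcond=None)

# Testing: predict the held-out rows and compute R-squared on them.
pred = X[test_idx] @ coef
ss_res = np.sum((price[test_idx] - pred) ** 2)
ss_tot = np.sum((price[test_idx] - price[test_idx].mean()) ** 2)
test_r2 = 1.0 - ss_res / ss_tot
print(coef, test_r2)
```

The key point is that the test rows play no part in the fit, so the score reflects how the model behaves on houses it has not seen.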

6. Simple Linear Regression- In this type of regression model, a linear relationship is established
between the target variable, which is the dependent variable (Y), and a single independent variable (X).
The linear relationship between the dependent and independent variables is established by fitting a
regression line between them. The equation of the line is given by:

Y = a + bX (1)

where ‘a’ and ‘b’ are the model parameters, called the regression coefficients. Setting X to 0 gives
Y = ‘a’, which is the Y-intercept of the line; ‘b’ is the slope, which expresses the change in Y per
unit change in X. If ‘b’ is large, a small change in X produces a large change in Y, and vice versa. The
values of ‘a’ and ‘b’ are computed using the Ordinary Least Squares method. The values predicted by the
linear regression model may not always be exact, so an error term is added to equation (1) to account
for the difference between the predicted and observed values:

Y = a + bX + ε (2)
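
As an illustration of the Ordinary Least Squares method for one predictor, the sketch below uses the standard closed-form estimates b = Sxy / Sxx and a = mean(Y) - b * mean(X), where Sxy is the sum of products of deviations and Sxx is the sum of squared deviations of X. The area and price values are made up, and the function name is hypothetical.

```python
def fit_simple_ols(xs, ys):
    """Return (a, b) minimizing the sum of squared residuals of Y = a + bX."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx          # slope
    a = mean_y - b * mean_x  # intercept
    return a, b


# Made-up observations: area in square feet and price.
area = [600.0, 800.0, 1000.0, 1200.0, 1400.0]
price = [32.0, 41.0, 49.0, 62.0, 71.0]

a, b = fit_simple_ols(area, price)

# Residuals are the observed counterparts of the error term in equation (2).
residuals = [y - (a + b * x) for x, y in zip(area, price)]
print(a, b, residuals)
```

With an intercept in the model, the residuals of an OLS fit sum to zero, which is a quick sanity check on the computation.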

Simple linear regression rests on some assumptions, which are as follows:

1. The number of observations must be greater than the number of parameters to be estimated.

2. The regression is valid only over the restricted period covered by the data.

3. The error term has an expected value of 0 and is assumed to be normally distributed.

7. Polynomial Regression- Polynomial regression extends simple linear regression; the model remains
linear in its coefficients, so it can be fitted in the same way as a multiple linear regression. In
linear regression the model fits a straight regression line between the dependent and independent
variables. When there is no linear relationship between the target variable and the predictor variable,
a straight line cannot fit the data, so a curve is fitted instead. This is done by fitting a polynomial
equation of degree n to the nonlinear data, which forms a curvilinear relationship between the dependent
and independent variables. Unlike in simple linear regression, the predictor terms in polynomial
regression (X, X^2, ..., X^n) are not independent of each other. The equation of polynomial regression
is as follows:

Y = a + b_1 X + b_2 X^2 + b_3 X^3 + ... + b_n X^n (3)

The advantages of polynomial regression are as follows:

1. Polynomial regression can give a close estimate of the relationship between the dependent and
independent variables when that relationship is curved.

2. The higher the degree of the polynomial, the more closely it fits the training dataset.

3. A wide range of curves can be fitted by varying the degree of the model.

The disadvantage of polynomial regression is as follows:

1. It is very sensitive to outliers in the dataset, because outliers increase the variance of the
model, and the model then underperforms when it encounters unseen data points.
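
The trade-off between the degree of the polynomial and performance on unseen data can be explored with a short NumPy sketch. The data below is synthetic (a quadratic curve plus random noise, invented for this example), and `numpy.polyfit` is used as one convenient way to fit a polynomial by least squares; it is not claimed to be the authors' method.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic curved data (made up): y is quadratic in x plus noise.
x_train = np.linspace(-1.0, 1.0, 15)
y_train = 2.0 + 1.5 * x_train - 3.0 * x_train**2 + rng.normal(0, 0.3, x_train.size)
x_test = np.linspace(-0.95, 0.95, 50)
y_test = 2.0 + 1.5 * x_test - 3.0 * x_test**2 + rng.normal(0, 0.3, x_test.size)


def mse(y, y_hat):
    """Mean squared error between observed and fitted values."""
    return float(np.mean((y - y_hat) ** 2))


results = {}
for degree in (1, 2, 9):
    coeffs = np.polyfit(x_train, y_train, degree)  # coefficients, highest power first
    train_error = mse(y_train, np.polyval(coeffs, x_train))
    test_error = mse(y_test, np.polyval(coeffs, x_test))
    results[degree] = (train_error, test_error)
    print(degree, train_error, test_error)
```

Training error cannot increase as the degree rises, because each higher-degree model contains the lower-degree ones. Test error usually stops improving once the degree exceeds the true shape of the data, which is why the degree is best chosen using held-out data rather than training fit alone.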

8. Data Analysis- Before giving the data to any model, we have to be sure that all of it is accurate and
ready to use. To do this, we analyzed our dataset based on its features, their characteristics, and the
relations among the features. From the analysis we found
