Machine Learning Project

Analyzing and Predicting Housing Market Trends using Machine Learning

Project Overview 📋

This project uses a dataset of house sale prices in King County, Washington (US), which includes Seattle, from May 2014 to May 2015. The aim is to apply Python, exploratory data analysis (EDA), and machine learning to analyze and predict house selling prices.

It simulates real-world real estate analysis, promoting collaborative problem-solving and the practical application of Python in real estate finance.

Goal 🎯

The goal of this project is to apply our Python and Machine Learning knowledge to:

Analyze and clean the data: Prepare the dataset by performing exploratory data analysis (EDA) and preprocessing steps such as handling missing values, removing outliers, and feature engineering.
Apply different supervised regression machine learning models: Utilize various regression algorithms such as linear regression, decision tree regression, random forest regression, etc., to train models on the cleaned dataset.
Assess the results and choose the best model to deploy: Evaluate the performance of each model using appropriate metrics like Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), etc. Select the most accurate and robust model for deployment in predicting house prices in the King County, Seattle market.

Dataset 💾

Source: Kaggle (King County, Seattle House Market Price and Sources)
Data Timeframe: May 2014 to May 2015
Data Size: 21,613 rows, 21 columns

Dataset Features:

id: A unique identifier for each house.
date: The date when the house was sold.
price: The sale price of the house (prediction target).
bedrooms: Number of bedrooms in the house.
bathrooms: Number of bathrooms in the house; fractional values denote partial bathrooms (for example, 0.5 for a room with a toilet but no shower).
sqft_living: Square footage of the interior living space.
sqft_lot: Square footage of the land space.
floors: Number of floors (levels) in the house.
waterfront: Indicates whether the house has a waterfront view.
view: Rating of the quality of the view from the property (0 to 4).
condition: Overall condition of the house.
grade: Overall grade given to the house based on the King County grading system.
sqft_above: Square footage of the house apart from the basement.
sqft_basement: Square footage of the basement.
yr_built: The year the house was built.
yr_renovated: The year the house was renovated.
zipcode: ZIP code area.
lat: Latitude coordinate.
long: Longitude coordinate.
sqft_living15: Interior living space for the nearest 15 neighbors in 2015.
sqft_lot15: Land space for the nearest 15 neighbors in 2015.
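
As an illustration of the feature engineering these columns allow, here is a minimal sketch on a tiny stand-in frame. The date layout (`YYYYMMDDTHHMMSS`), the convention that `yr_renovated == 0` means "never renovated", and the derived column names (`sale_year`, `house_age`, `was_renovated`) are assumptions for this example, not steps confirmed by the project notebook.

```python
import pandas as pd

# Tiny stand-in for the real data (values are illustrative only).
df = pd.DataFrame({
    "id": [1, 2, 3],
    "date": ["20141013T000000", "20150225T000000", "20140502T000000"],
    "price": [221900.0, 538000.0, 180000.0],
    "yr_built": [1955, 1951, 1933],
    "yr_renovated": [0, 1991, 0],
})

# Assumed date layout: YYYYMMDDTHHMMSS.
df["date"] = pd.to_datetime(df["date"], format="%Y%m%dT%H%M%S")

# Hypothetical derived features.
df["sale_year"] = df["date"].dt.year
df["house_age"] = df["sale_year"] - df["yr_built"]
# Assumption: 0 in yr_renovated marks a house that was never renovated.
df["was_renovated"] = df["yr_renovated"] > 0

print(df[["id", "sale_year", "house_age", "was_renovated"]])
```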

Target column: price

Our primary focus is to understand which features most significantly impact house prices. Additionally, we aim to explore properties valued at $650K and above for more detailed insights.
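
One simple way to approach both questions is sketched below on toy data: rank features by their correlation with `price`, and filter the rows at or above $650K. Pearson correlation is only one possible measure of "impact", and the variable names (`HIGH_VALUE_THRESHOLD`, `high_value`) are hypothetical.

```python
import pandas as pd

# Toy data standing in for the real dataset.
df = pd.DataFrame({
    "price": [300_000, 650_000, 1_200_000, 649_999],
    "sqft_living": [1200, 2400, 4100, 2000],
})

# Linear correlation of each feature with the target, strongest first.
corr_with_price = df.corr()["price"].drop("price").sort_values(ascending=False)

# Subset of properties valued at $650K and above.
HIGH_VALUE_THRESHOLD = 650_000
high_value = df[df["price"] >= HIGH_VALUE_THRESHOLD]
share = len(high_value) / len(df)

print(corr_with_price)
print(f"{len(high_value)} high-value houses ({share:.0%} of the sample)")
```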

Exploratory Data Analysis 🔬

EDA Insights

The price data is not normally distributed: it is right-skewed, with most sales concentrated at lower prices and a long tail of expensive houses.
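
Skewness like this can be quantified, and a log transform is a common way to reduce it. The sketch below uses made-up prices; whether the project applied a log transform is not stated, so treat it as one option rather than the method used here.

```python
import numpy as np
import pandas as pd

# Made-up prices with one very expensive house to mimic a long right tail.
prices = pd.Series(
    [200_000, 250_000, 300_000, 350_000, 400_000, 450_000, 3_000_000],
    dtype=float,
)

raw_skew = prices.skew()            # positive value means right-skewed
log_skew = np.log1p(prices).skew()  # skewness after a log(1 + x) transform

print(f"skew of raw prices: {raw_skew:.2f}")
print(f"skew of log prices: {log_skew:.2f}")
```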

There is a high number of outliers with exceptionally high house prices, which could affect the model's performance if not addressed.
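
A common rule of thumb for flagging such outliers is the 1.5 × IQR fence, sketched below on made-up prices. The rule and the 1.5 multiplier are assumptions for illustration; the README does not say which outlier criterion the project used.

```python
import pandas as pd

# Made-up prices; the last value plays the role of a luxury outlier.
prices = pd.Series(
    [200_000, 250_000, 300_000, 350_000, 400_000, 450_000, 3_000_000],
    dtype=float,
)

q1, q3 = prices.quantile([0.25, 0.75])
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values outside the fences are flagged as outliers.
is_outlier = (prices < lower_fence) | (prices > upper_fence)
outliers = prices[is_outlier]

print(f"fences: [{lower_fence:,.0f}, {upper_fence:,.0f}]")
print(outliers)
```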

`Price` Exploration

We divided the prices into five price_rank bins: (<250k), (250k-500k), (500k-750k), (750k-1M), (>1M).

Most house prices fall within two price ranks: (250k-500k) and (500k-750k).
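
A binning like this can be sketched with `pandas.cut`. The sample prices are made up, and the treatment of boundary values (here a price of exactly 250k falls into the 250k-500k bin) is an assumption, since the README does not specify it.

```python
import pandas as pd

# Made-up sample prices.
prices = pd.Series([180_000, 250_000, 420_000, 510_000, 760_000, 1_500_000])

bins = [0, 250_000, 500_000, 750_000, 1_000_000, float("inf")]
labels = ["<250k", "250k-500k", "500k-750k", "750k-1M", ">1M"]

# right=False makes each bin include its lower edge and exclude its upper edge.
price_rank = pd.cut(prices, bins=bins, labels=labels, right=False)

# Number of houses per rank, in bin order.
print(price_rank.value_counts(sort=False))
```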

Then we looked at boxplots by price_rank:

We observe that most of the outliers are within the price rank of houses above 1M.

Machine Learning Models 🤖

In this project we apply three supervised regression machine learning models.

Train-test Split = 70% Train / 30% Test
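
A 70/30 split and model fitting could look like the sketch below. The data is synthetic, and the choice of linear regression, decision tree, and random forest is an assumption based on the algorithms named in the Goal section; the random seeds and hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in: one feature (think sqft_living) and a noisy price.
rng = np.random.default_rng(0)
X = rng.uniform(500, 5000, size=(200, 1))
y = 150 * X[:, 0] + rng.normal(0, 50_000, size=200)

# 70% train / 30% test, as stated above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=42
)

# Assumed model set; the README does not list the three models explicitly.
models = {
    "linear_regression": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(random_state=42),
    "random_forest": RandomForestRegressor(n_estimators=50, random_state=42),
}

fitted = {name: model.fit(X_train, y_train) for name, model in models.items()}

for name, model in fitted.items():
    print(name, round(model.score(X_test, y_test), 3))  # R2 on the test set
```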

Results 📊

We got the following results:

Conclusion 🔎

Based on the evaluation metrics, the decision tree is the best of the evaluated models for predicting house sale prices in the Seattle (US) market. Here are the key findings:

Decision Tree R2: 0.7366
Decision Tree RMSE: 194996.9105
Decision Tree MSE: 38023795092.7977
Decision Tree MAE: 100868.8284

The decision tree model demonstrates the highest R2 score and the lowest RMSE, MSE, and MAE metrics among the models evaluated. Hence, it is the preferred choice for accurate predictions in this context.
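
For reference, the four metrics can be computed as in the sketch below. The `y_true` and `y_pred` arrays are made-up values, not the project's predictions. RMSE is the square root of MSE, and the reported decision tree figures satisfy that relation up to rounding, which the last lines check.

```python
import math

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Made-up true and predicted prices for illustration.
y_true = np.array([300_000.0, 450_000.0, 600_000.0, 1_000_000.0])
y_pred = np.array([320_000.0, 430_000.0, 650_000.0, 900_000.0])

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = float(np.sqrt(mse))  # RMSE is the square root of MSE
r2 = r2_score(y_true, y_pred)

print(f"R2={r2:.4f}  RMSE={rmse:.2f}  MSE={mse:.2f}  MAE={mae:.2f}")

# Consistency check on the figures reported above.
reported_rmse = 194996.9105
reported_mse = 38023795092.7977
consistent = math.isclose(reported_rmse ** 2, reported_mse, rel_tol=1e-8)
print("reported RMSE and MSE are consistent:", consistent)
```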
