Housing Data Cleaning & Analysis

The notebook cleans a housing data set. It reads a CSV file, inspects and describes the data, counts the null values in each column, drops the unneeded Date column, fills numeric nulls with the column mean and categorical nulls with the column mode, then sorts the rows and resets the index. It ends with a short exploratory analysis: a scatter plot, a pair plot, and a price/area correlation heatmap.

Uploaded by

krish aggarwal (krish)


12/4/23, 7:28 PM Data Cleaning Project 1st Draft.ipynb - Colaboratory

import pandas as pd

Read the CSV File

df = pd.read_csv('/content/Housing_Data_Set (1).csv')

output price area bedrooms bathrooms stories mainroad guestroom basement hotwaterheating airconditioning parking prefar

0 13300000.0 7420 4.0 2 3.0 yes no no no yes 2 y

1 12250000.0 8960 4.0 4 4.0 yes no no no yes 3

2 12250000.0 9960 3.0 2 2.0 yes no yes no no 2 y

3 12215000.0 7500 4.0 2 2.0 yes no yes no yes 3 y

4 11410000.0 7420 4.0 1 2.0 yes yes yes no yes 2

... ... ... ... ... ... ... ... ... ... ... ...

df.head()

price area bedrooms bathrooms stories mainroad guestroom basement hotwaterheating airconditioning parking prefarea

0 13300000.0 7420 4.0 2 3.0 yes no no no yes 2 yes

1 12250000.0 8960 4.0 4 4.0 yes no no no yes 3 no

Getting information about the Dataset

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 545 entries, 0 to 544
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 price 538 non-null float64
1 area 545 non-null int64
2 bedrooms 544 non-null float64
3 bathrooms 545 non-null int64
4 stories 543 non-null float64
5 mainroad 545 non-null object
6 guestroom 543 non-null object
7 basement 545 non-null object
8 hotwaterheating 537 non-null object
9 airconditioning 544 non-null object
10 parking 545 non-null int64
11 prefarea 545 non-null object
12 furnishingstatus 545 non-null object
13 Date 542 non-null object
dtypes: float64(3), int64(3), object(8)
memory usage: 59.7+ KB

Describing the Dataset

df.describe()

https://colab.research.google.com/drive/1HU684UqeRwNdBFPV2X1bDAnDeztx5GVD#scrollTo=W2c02AzKM3kZ&printMode=true 1/7

price area bedrooms bathrooms stories parking

count 5.380000e+02 545.000000 544.000000 545.000000 543.000000 545.000000

mean 4.779255e+06 5150.541284 2.966912 1.286239 2.069982 0.693578

std 1.876768e+06 2170.141023 0.737579 0.502470 4.996187 0.861586

min 1.750000e+06 1650.000000 1.000000 1.000000 1.000000 0.000000

25% 3.438750e+06 3600.000000 2.000000 1.000000 1.000000 0.000000

50% 4.340000e+06 4600.000000 3.000000 1.000000 2.000000 0.000000

75% 5.796000e+06 6360.000000 3.000000 2.000000 2.000000 1.000000

max 1.330000e+07 16200.000000 6.000000 4.000000 110.000000 3.000000
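In the summary above, stories has a maximum of 110 while its 75th percentile is 2, which suggests a data-entry outlier that also inflates the mean and standard deviation. The following is a minimal sketch of flagging such values with the 1.5 x IQR rule. The sample values are hypothetical stand-ins, and the IQR rule is a common heuristic rather than a step the notebook itself performs.

```python
import pandas as pd

# Hypothetical values; 110 mimics the suspicious maximum seen in df.describe().
stories = pd.Series([1.0, 2.0, 2.0, 1.0, 3.0, 4.0, 2.0, 110.0], name="stories")

# Interquartile range: spread of the middle 50% of the values.
q1 = stories.quantile(0.25)
q3 = stories.quantile(0.75)
iqr = q3 - q1

# Anything above q3 + 1.5 * IQR is flagged as a candidate outlier.
upper = q3 + 1.5 * iqr
outliers = stories[stories > upper]
print(outliers)
```

Flagged rows are only candidates; whether to correct, drop, or keep them depends on what the raw data source says.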

Finding the Null Values in the Dataset

df.isnull().sum()

price 7
area 0
bedrooms 1
bathrooms 0
stories 2
mainroad 0
guestroom 2
basement 0
hotwaterheating 8
airconditioning 1
parking 0
prefarea 0
furnishingstatus 0
Date 3
dtype: int64
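Raw null counts are easier to judge next to the share of rows they represent. The sketch below builds such a report on a small hypothetical frame (toy and missing_report are illustrative names, not part of the notebook).

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the housing data.
toy = pd.DataFrame({
    "price": [13300000.0, np.nan, 12250000.0, 12215000.0],
    "area": [7420, 8960, 9960, 7500],
    "guestroom": ["no", None, "no", "yes"],
})

def missing_report(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the count and percentage of nulls per column, worst first."""
    counts = frame.isnull().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(2),
    })
    return report.sort_values("missing", ascending=False)

report = missing_report(toy)
print(report)
```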

Dropping the unnecessary columns in the Dataset

df = df.drop(columns = 'Date')

df.head()

price area bedrooms bathrooms stories mainroad guestroom basement hotw

0 13300000.0 7420 4.0 2 3.0 yes no no

1 12250000.0 8960 4.0 4 4.0 yes no no

2 12250000.0 9960 3.0 2 2.0 yes no yes

3 12215000.0 7500 4.0 2 2.0 yes no yes

Fill the null values in the Data using the column mean

# Assign the filled column back; inplace=True on a column selection may not update df in newer pandas.
df['price'] = df['price'].fillna(df['price'].mean())

df.isnull().sum()

price 0
area 0
bedrooms 1
bathrooms 0
stories 2
mainroad 0
guestroom 2
basement 0
hotwaterheating 8
airconditioning 1
parking 0
prefarea 0
furnishingstatus 0
dtype: int64

df['price'] = df['price'].fillna(df['price'].mean())

price area bedrooms bathrooms stories mainroad guestroom basement ho

0 13300000.0 7420 4.0 2 3.0 yes no no

1 12250000.0 8960 4.0 4 4.0 yes no no

2 12250000.0 9960 3.0 2 2.0 yes no yes

3 12215000.0 7500 4.0 2 2.0 yes no yes

4 11410000.0 7420 4.0 1 2.0 yes yes yes

... ... ... ... ... ... ... ... ...

540 1820000.0 3000 2.0 1 1.0 yes no yes

541 1767150.0 2400 3.0 1 1.0 no no no

542 1750000.0 3620 2.0 1 1.0 yes no no

543 1750000.0 2910 3.0 1 1.0 no no no

544 1750000.0 3850 3.0 1 2.0 yes no no

df['bedrooms'] = df['bedrooms'].fillna(df['bedrooms'].mean())

df.isnull().sum()

price 0
area 0
bedrooms 0
bathrooms 0
stories 2
mainroad 0
guestroom 2
basement 0
hotwaterheating 8
airconditioning 1
parking 0
prefarea 0
furnishingstatus 0
dtype: int64

df['stories'] = df['stories'].fillna(df['stories'].mean())

df.isnull().sum()

price 0
area 0
bedrooms 0
bathrooms 0
stories 0
mainroad 0
guestroom 2
basement 0
hotwaterheating 8
airconditioning 1
parking 0
prefarea 0
furnishingstatus 0
dtype: int64
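The three mean fills above follow the same pattern, so they can be done in one step. This is a sketch on a hypothetical toy frame, assuming every listed column is numeric; it relies on DataFrame.fillna accepting a Series of per-column fill values.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one gap in each numeric column.
toy = pd.DataFrame({
    "price": [4000000.0, np.nan, 6000000.0],
    "bedrooms": [3.0, 2.0, np.nan],
    "stories": [1.0, np.nan, 2.0],
})

numeric_cols = ["price", "bedrooms", "stories"]

# mean() returns one value per column; fillna matches them to columns by name.
# The result is assigned back instead of relying on inplace=True.
toy[numeric_cols] = toy[numeric_cols].fillna(toy[numeric_cols].mean())
print(toy)
```

When a column contains extreme values, swapping mean() for median() in the same line gives a fill value that is less sensitive to them.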

Fill the null values by getting the Mode of the Column

df['guestroom'] = df['guestroom'].fillna(df['guestroom'].mode().iloc[0])

df.isnull().sum()

price 0
area 0
bedrooms 0
bathrooms 0
stories 0
mainroad 0

guestroom 0
basement 0
hotwaterheating 8
airconditioning 1
parking 0
prefarea 0
furnishingstatus 0
dtype: int64

df['hotwaterheating'] = df['hotwaterheating'].fillna(df['hotwaterheating'].mode().iloc[0])

df.isnull().sum()

price 0
area 0
bedrooms 0
bathrooms 0
stories 0
mainroad 0
guestroom 0
basement 0
hotwaterheating 0
airconditioning 1
parking 0
prefarea 0
furnishingstatus 0
dtype: int64

df['airconditioning'] = df['airconditioning'].fillna(df['airconditioning'].mode().iloc[0])

df.isnull().sum()

price 0
area 0
bedrooms 0
bathrooms 0
stories 0
mainroad 0
guestroom 0
basement 0
hotwaterheating 0
airconditioning 0
parking 0
prefarea 0
furnishingstatus 0
dtype: int64
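The mode fills for the yes/no columns can likewise be written as a loop. The sketch below uses a hypothetical toy frame with the same column names; the values are made up for illustration.

```python
import pandas as pd

# Hypothetical toy frame with one gap in each categorical column.
toy = pd.DataFrame({
    "guestroom": ["no", None, "no", "yes"],
    "hotwaterheating": ["no", "no", None, "no"],
    "airconditioning": ["yes", "yes", "no", None],
})

for col in ["guestroom", "hotwaterheating", "airconditioning"]:
    # mode() ignores nulls and returns a Series (ties yield several values);
    # iloc[0] takes the first, as the notebook does.
    most_common = toy[col].mode().iloc[0]
    toy[col] = toy[col].fillna(most_common)

print(toy)
```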

Sorting the Dataset according to the "furnishingstatus" column

df = df.sort_values('furnishingstatus')

df.head()

price area bedrooms bathrooms stories mainroad guestroom basement ho

0 13300000.0 7420 4.0 2 3.0 yes no no

365 3703000.0 5450 2.0 1 1.0 yes no no

124 5950000.0 6525 3.0 2 4.0 yes no no

362 3710000.0 4050 2.0 1 1.0 yes no no

Resetting the index of the Dataset

df = df.reset_index()


index price area bedrooms bathrooms stories mainroad guestroom basement hotwaterheating airconditioning parking

0 0 13300000.0 7420 4.0 2 3.0 yes no no no yes 2

1 365 3703000.0 5450 2.0 1 1.0 yes no no no no 0

2 124 5950000.0 6525 3.0 2 4.0 yes no no no no 1

3 362 3710000.0 4050 2.0 1 1.0 yes no no no no 0

4 128 5873000.0 5500 3.0 1 3.0 yes yes no no yes 1

... ... ... ... ... ... ... ... ... ... ... ... ...

540 405 3465000.0 3060 3.0 1 1.0 yes no no no no 0

541 406 3465000.0 5320 2.0 1 1.0 yes no no no no 1

542 408 3430000.0 4000 2.0 1 1.0 yes no no no no 0

543 410 3430000.0 3850 3.0 1 1.0 yes no no no no 0

544 544 1750000.0 3850 3.0 1 2.0 yes no no no no 0

df = df.drop(columns = 'index')

df.head()

price area bedrooms bathrooms stories mainroad guestroom basement hotwaterheating airconditioning parking prefarea

0 13300000.0 7420 4.0 2 3.0 yes no no no yes 2 yes

1 3703000.0 5450 2.0 1 1.0 yes no no no no 0 no

2 5950000.0 6525 3.0 2 4.0 yes no no no no 1 no

3 3710000.0 4050 2.0 1 1.0 yes no no no no 0 no
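Resetting the index and then dropping the resulting index column can be combined: reset_index(drop=True) discards the old labels instead of turning them into a column. A minimal sketch on a hypothetical toy frame (the furnishingstatus values are assumed for illustration):

```python
import pandas as pd

# Hypothetical toy frame; the furnishingstatus values are illustrative.
toy = pd.DataFrame({
    "price": [13300000.0, 3703000.0, 5950000.0],
    "furnishingstatus": ["semi-furnished", "furnished", "unfurnished"],
})

# Sort, then renumber rows 0..n-1 without keeping the old index as a column.
tidy = toy.sort_values("furnishingstatus").reset_index(drop=True)
print(tidy)
```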

Checking if the Dataset is clean or not

df.isnull().sum()

price 0
area 0
bedrooms 0
bathrooms 0
stories 0
mainroad 0
guestroom 0
basement 0
hotwaterheating 0
airconditioning 0
parking 0
prefarea 0
furnishingstatus 0
dtype: int64
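Reading the counts by eye works for 13 columns; a small helper can make the same check explicit. This is a sketch with hypothetical names (null_columns, clean, dirty), not part of the notebook.

```python
import numpy as np
import pandas as pd

def null_columns(frame: pd.DataFrame) -> dict:
    """Return {column: null count} for columns that still contain nulls."""
    counts = frame.isnull().sum()
    return counts[counts > 0].to_dict()

# Hypothetical frames: one fully filled, one with a remaining gap.
clean = pd.DataFrame({"price": [1750000.0, 1820000.0], "mainroad": ["yes", "no"]})
dirty = pd.DataFrame({"price": [1750000.0, np.nan], "mainroad": ["yes", "no"]})

print(null_columns(clean))  # empty dict when nothing is missing
print(null_columns(dirty))  # lists only the columns that need attention
```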

EDA (Exploratory Data Analysis) of the Dataset

import matplotlib.pyplot as plt

plt.scatter(df["price"], df["area"])


<matplotlib.collections.PathCollection at 0x7a4a6102f010>
import seaborn as sns

sns.pairplot(df)


<seaborn.axisgrid.PairGrid at 0x7a4a6109f880>

Getting the Correlation of the Data and plotting it on the Heatmap

correlation_matrix = df[['price', 'area']].corr()

sns.set(style="darkgrid")

sns.heatmap(correlation_matrix, annot=True, cmap='magma', fmt=".2f", linewidths=.5)

<Axes: >

sns.lmplot(x='price', y='area', data=df)

<seaborn.axisgrid.FacetGrid at 0x7a4a5c071e40>
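The heatmap above visualizes a 2 x 2 Pearson correlation matrix. The sketch below shows, on hypothetical numbers, what that matrix contains and cross-checks the off-diagonal value against NumPy; plotting is left out so the example runs without a display.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: larger areas tend to have higher prices.
toy = pd.DataFrame({
    "area": [3000, 4000, 5000, 6000],
    "price": [3.0e6, 4.5e6, 4.0e6, 6.0e6],
})

# DataFrame.corr() defaults to the Pearson coefficient.
corr = toy[["price", "area"]].corr()
r = corr.loc["price", "area"]

# The same coefficient computed directly with NumPy, as a cross-check.
r_check = np.corrcoef(toy["price"], toy["area"])[0, 1]

print(corr)
```

The matrix is symmetric with ones on the diagonal, so for two columns the single off-diagonal value carries all the information.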
