

PrOGRAM1.Ipynb - Colab

The document outlines a Python program that analyzes the California Housing dataset using pandas, seaborn, and matplotlib. It includes steps for creating histograms and box plots for numerical features, as well as identifying outliers using the IQR method. The program reports the number of outliers for various features in the dataset.

Uploaded by

snehagnn

14/05/2025, 12:35 PrOGRAM1.ipynb - Colab


PROGRAM 1

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_california_housing

# Step 1: Load the California Housing dataset


data = fetch_california_housing(as_frame=True)
housing_df = data.frame

# Step 2: Create histograms for numerical features


numerical_features = housing_df.select_dtypes(include=[np.number]).columns

# Plot histograms
plt.figure(figsize=(15, 10))
for i, feature in enumerate(numerical_features):
    plt.subplot(3, 3, i + 1)
    sns.histplot(housing_df[feature], kde=True, bins=30, color='blue')
    plt.title(f'Distribution of {feature}')
plt.tight_layout()
plt.show()

# Step 3: Generate box plots for numerical features


plt.figure(figsize=(15, 10))
for i, feature in enumerate(numerical_features):
    plt.subplot(3, 3, i + 1)
    sns.boxplot(x=housing_df[feature], color='orange')
    plt.title(f'Box Plot of {feature}')
plt.tight_layout()
plt.show()

# Step 4: Identify outliers using the IQR method


print("Outliers Detection:")
outliers_summary = {}
for feature in numerical_features:
    Q1 = housing_df[feature].quantile(0.25)
    Q3 = housing_df[feature].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    outliers = housing_df[(housing_df[feature] < lower_bound) | (housing_df[feature] > upper_bound)]
    outliers_summary[feature] = len(outliers)
    print(f"{feature}: {len(outliers)} outliers")

# Optional: Print a summary of the dataset


print("\nDataset Summary:")
print(housing_df.describe())
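
The IQR rule used in Step 4 can be checked by hand on a small series. The sketch below is not part of the original notebook: the helper name iqr_outlier_mask, the parameter k, and the toy values are assumptions made for illustration. It applies the same fences (Q1 - 1.5 * IQR and Q3 + 1.5 * IQR) to nine numbers so the arithmetic is easy to follow.

```python
import pandas as pd


def iqr_outlier_mask(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return a boolean mask that is True where a value lies outside the IQR fences.

    Hypothetical helper; it mirrors the loop body of Step 4 for a single column.
    """
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower_bound = q1 - k * iqr
    upper_bound = q3 + k * iqr
    return (series < lower_bound) | (series > upper_bound)


# Toy data (an assumption, not taken from the dataset).
# With pandas' default linear interpolation: Q1 = 3, Q3 = 7, IQR = 4,
# so the fences are -3 and 13 and only the value 100 falls outside them.
toy = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 100])
mask = iqr_outlier_mask(toy)
print(toy[mask].tolist())  # prints [100]
```

In the notebook, the same helper could be applied to one column at a time, for example iqr_outlier_mask(housing_df['MedInc']).sum().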

https://colab.research.google.com/drive/1R2-255RkgVho0lrXDadCR3c4J113DP8B#scrollTo=clcq4dvE7kHI&printMode=true

Outliers Detection:
MedInc: 681 outliers
HouseAge: 0 outliers
AveRooms: 511 outliers
AveBedrms: 1424 outliers
Population: 1196 outliers
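
Raw counts like those printed above are easier to compare when shown next to the share of rows they represent. The following is a minimal sketch, not the notebook's own code: the function name summarize_iqr_outliers and the demo frame are assumptions. It computes the same IQR fences for every numeric column at once and reports both the count and the fraction of rows flagged.

```python
import pandas as pd


def summarize_iqr_outliers(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Count IQR outliers per numeric column and report their share of rows.

    Hypothetical helper; a vectorized take on the per-feature loop in Step 4.
    """
    numeric = df.select_dtypes(include="number")
    q1 = numeric.quantile(0.25)
    q3 = numeric.quantile(0.75)
    iqr = q3 - q1
    # Comparing a DataFrame with a Series aligns the Series index on the columns,
    # so each column is tested against its own fences.
    outside = (numeric < q1 - k * iqr) | (numeric > q3 + k * iqr)
    return pd.DataFrame({"count": outside.sum(), "share": outside.mean()})


# Small demo frame (an assumption) so the sketch runs without downloading data.
demo = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6, 7, 8, 100],
    "b": [10, 11, 12, 13, 14, 15, 16, 17, 18],
})
summary = summarize_iqr_outliers(demo)
print(summary)
```

Calling summarize_iqr_outliers(housing_df) in the notebook should give the same counts as the loop, since both use the same quantiles and the same 1.5 * IQR fences.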