This project demonstrates a simple workflow for performing exploratory data analysis (EDA) on an open dataset. We'll download the Palmer Penguins dataset, explore it, create visualizations, and finally generate an HTML report.
mkdir -p data
wget https://raw.githubusercontent.com/allisonhorst/palmerpenguins/master/inst/extdata/penguins.csv -O data/penguins.csv
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
pip install pandas matplotlib seaborn jupyter jupyterlab ydata-profiling
Start Jupyter Lab:
jupyter lab
In a notebook (notebooks/01_exploration.ipynb):
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Load data
df = pd.read_csv("../data/penguins.csv")
# Quick overview
print(df.head())
df.info()  # info() prints its summary directly and returns None
print(df.describe())
# Simple visualization
sns.pairplot(df.dropna(), hue="species")
plt.show()
Use ydata-profiling (formerly pandas-profiling) to automatically create an EDA report:
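from pathlib import Path
from ydata_profiling import ProfileReport
# Make sure the output folder exists (paths are relative to notebooks/)
Path("../reports").mkdir(parents=True, exist_ok=True)
profile = ProfileReport(df, title="Penguins EDA Report", explorative=True)
profile.to_file("../reports/penguins_report.html")
The report will be saved at: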
reports/penguins_report.html
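
The pair plot above calls `df.dropna()`, so it can be worth checking how many rows that discards before trusting the picture. Below is a minimal sketch of two such checks: missing values per column and numeric means per species. The helper names `missing_summary` and `species_means` are hypothetical, and the `demo` frame holds invented toy values that only borrow column names from penguins.csv; in the notebook you would pass the real `df` instead.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column, worst first."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(1),
    })
    return summary.sort_values("missing", ascending=False)


def species_means(df: pd.DataFrame) -> pd.DataFrame:
    """Mean of every numeric column, grouped by species (NaN values are skipped)."""
    return df.groupby("species").mean(numeric_only=True)


# Toy stand-in for the real file (invented values, same column names).
# In the notebook, use instead: df = pd.read_csv("../data/penguins.csv")
demo = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo"],
    "bill_length_mm": [39.0, None, 46.0, 48.0],
    "body_mass_g": [3700.0, 3800.0, 5000.0, None],
})

print(missing_summary(demo))
print(species_means(demo))
print(f"Rows kept by dropna(): {len(demo.dropna())} of {len(demo)}")
```

If `dropna()` removes a large share of rows, dropping only the columns a given plot needs (for example `df.dropna(subset=[...])`) may keep more data.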