This repository contains two assignments focused on applying unsupervised learning techniques and performing exploratory data analysis (EDA) on the Houston Weather Dataset (HWD). The goal is to extract insights from weather patterns in Houston using clustering, outlier detection, and visualization techniques.
- Data cleaning & preprocessing
- Summary statistics for all attributes
- Distribution plots and trend visualizations
- Pairwise relationships (e.g., humidity vs temperature)
- Prepared dataset for clustering & outlier tasks
- Histograms & Boxplots
- Correlation heatmaps
- Line graphs for temporal weather patterns
- Scatter plots with class overlays
- Applied unsupervised learning algorithms to real-world data
- Evaluated clustering quality and interpretability
- Designed and implemented custom outlier detection logic
- Conducted a complete exploratory data analysis pipeline
- Applied with k=3
- Evaluated using:
- Purity Score (with actual class labels)
- SSE (Sum of Squared Errors)
- Boxplots of each cluster
- Cluster Centroids and summaries
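
The two evaluation metrics listed above can be sketched in a few lines of plain Python. This is a minimal illustration, not the repository's actual code: the function names `purity_score` and `sse` and the toy inputs are assumptions made for this example. Purity is taken here as the fraction of points whose class label matches the majority label of their cluster, and SSE as the sum of squared Euclidean distances from each point to its cluster mean.

```python
from collections import Counter


def purity_score(cluster_ids, class_labels):
    """Fraction of points whose class equals the majority class of their cluster."""
    by_cluster = {}
    for cid, label in zip(cluster_ids, class_labels):
        by_cluster.setdefault(cid, []).append(label)
    # For each cluster, count how many members carry its most common label.
    majority_total = sum(
        Counter(labels).most_common(1)[0][1] for labels in by_cluster.values()
    )
    return majority_total / len(cluster_ids)


def sse(points, cluster_ids):
    """Sum of squared Euclidean distances from each point to its cluster mean."""
    groups = {}
    for point, cid in zip(points, cluster_ids):
        groups.setdefault(cid, []).append(point)
    total = 0.0
    for members in groups.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum((m[d] - centroid[d]) ** 2 for m in members for d in range(dim))
    return total


# Hypothetical toy data: three clusters, made-up class labels.
toy_clusters = [0, 0, 0, 1, 1, 2]
toy_labels = ["hot", "hot", "mild", "mild", "mild", "cold"]
toy_purity = purity_score(toy_clusters, toy_labels)

# Two points in one cluster; the centroid is (1.0, 0.0).
toy_sse = sse([(0.0, 0.0), (2.0, 0.0)], [0, 0])
```

Higher purity and lower SSE both suggest tighter, more label-consistent clusters, but SSE always decreases as the number of clusters grows, so it is most useful when comparing runs with the same k.
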
- Hyperparameter tuning to yield 2–15 clusters and <20% noise
- Comparison with K-Means using:
- Purity Score
- Cluster shapes
- Noise points
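
The tuning target above (2 to 15 clusters and under 20% noise) can be checked mechanically for any candidate label vector. The sketch below is an assumption-laden illustration: the README does not name the clustering algorithm or show its code, so the helper names `summarize_labels` and `meets_constraints` are hypothetical, and noise points are assumed to carry the label `-1`, which is the convention scikit-learn's DBSCAN uses.

```python
def summarize_labels(labels, noise_label=-1):
    """Return (number of clusters, fraction of points labelled as noise)."""
    n_noise = sum(1 for label in labels if label == noise_label)
    n_clusters = len({label for label in labels if label != noise_label})
    return n_clusters, n_noise / len(labels)


def meets_constraints(labels, min_clusters=2, max_clusters=15, max_noise=0.20):
    """True if the labelling has 2 to 15 clusters and strictly less than 20% noise."""
    n_clusters, noise_fraction = summarize_labels(labels)
    return min_clusters <= n_clusters <= max_clusters and noise_fraction < max_noise


# Hypothetical label vectors, as a clustering run might produce them.
good_run = [0, 0, 1, 1, -1, 0, 1, 2, 2, 2]   # 3 clusters, 10% noise
bad_run = [0, 0, -1, -1, -1]                 # 1 cluster, 60% noise

good_ok = meets_constraints(good_run)
bad_ok = meets_constraints(bad_run)
```

In a tuning loop, a check like this could be applied to the labels from each hyperparameter setting, keeping only the settings that pass before comparing purity against K-Means.
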
- Refined subset: RHOUSTONW (data from 2021)
- Distance-Based Outlier Detection
- Density-Based Outlier Detection
- Implemented multivariate distance & density scoring functions
- Applied each technique with 3 different hyperparameter settings
- Generated Outlier Likelihood Score (OLS) for every instance
- Ranked the dataset:
- Identified Top 3 Outliers
- Identified 1 Most Normal record
- Compared detection techniques and interpretations
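
As a rough illustration of the distance-based side of this workflow, the sketch below scores each record by the distance to its k-th nearest neighbour, rescales the scores to the range 0 to 1, and ranks the records. The repository's actual scoring functions are not shown here, so the function names, the choice of k-th-neighbour distance, the min-max rescaling, and the toy points are all assumptions made for this example. In practice the attributes would likely need to be standardized first so that no single attribute dominates the distance.

```python
import math


def knn_distance_scores(points, k=2):
    """Raw score per point: Euclidean distance to its k-th nearest neighbour."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def normalize(scores):
    """Min-max rescale to [0, 1]; all zeros if every score is identical."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]


def rank_by_ols(points, k=2):
    """Return (OLS per point, indices sorted from most to least outlying)."""
    ols = normalize(knn_distance_scores(points, k))
    order = sorted(range(len(points)), key=lambda i: ols[i], reverse=True)
    return ols, order


# Hypothetical toy records: four close together, one far away.
toy_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
ols, order = rank_by_ols(toy_points, k=2)

top_three_outliers = order[:3]   # indices with the highest OLS
most_normal = order[-1]          # index with the lowest OLS
```

Re-running `rank_by_ols` with different values of `k` is one way to mirror the "3 different hyperparameter settings" step and see how stable the top-ranked records are.
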