PuckInsights: NHL Data Analysis and Cloud Deployment

PuckInsights is an end-to-end data science project that explores historical NHL and ice hockey statistics through practical exploratory and statistical analysis. The project starts in an interactive Google Colab notebook and evolves into a cloud-deployed, containerized analytics service hosted on Azure.

Project Roadmap

This project is documented in a Medium article series covering:
Descriptive Statistics: measures of central tendency and dispersion
Correlation and Regression: linear vs monotonic trends, residual diagnostics, and modeling
Distributions and Patterns: fitting and evaluating probabilistic models
Dockerization & Cloud Deployment: building and deploying a full pipeline on Azure Container Apps

Dataset Overview

The dataset was collected from public NHL sources and then cleaned.

| Property | Value |
|---|---|
| Rows | 12,250 |
| Columns | 23 |
| Memory Usage | 5.67 MB |
| Bytes per Row | ~485.3 |
| Year Range | 1963 to 2022 |

The dataset includes aggregated and per-season metrics for players and goalies, allowing for rich EDA, correlation analysis, and modeling exercises.
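
As a quick sanity check on the table above, the bytes-per-row figure can be derived from the other two numbers. The sketch below assumes the memory usage is reported in units of 1024 × 1024 bytes; that assumption is what reproduces the ~485.3 value (using 1,000,000 bytes per MB would not). The function name `bytes_per_row` is only illustrative.

```python
# Figures copied from the Dataset Overview table.
ROWS = 12_250
MEMORY_MB = 5.67  # assumption: 1 MB = 1024 * 1024 bytes


def bytes_per_row(memory_mb: float, rows: int) -> float:
    """Average in-memory footprint of a single row, in bytes."""
    return memory_mb * 1024 * 1024 / rows


avg = bytes_per_row(MEMORY_MB, ROWS)
print(f"{avg:.1f} bytes per row")  # prints "485.3 bytes per row"
```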

Current Focus: Exploratory & Statistical Analysis

We're currently deep-diving into:
Pre-correlation diagnostics: distinguishing linear from monotonic relationships using Pearson and Spearman coefficients
Residual analysis: checking for patterns, nonlinearity, or heteroscedasticity
Quadratic model comparison: identifying nonlinear trends not captured by linear models
Visual diagnostics: heatmaps, scatterplots, residual plots, and QQ-plots
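
The first three diagnostics above can be illustrated with a small, dependency-free sketch. It is not the notebook's code: the data is synthetic (a perfectly monotonic but curved relationship), and the helper names `pearson`, `ranks`, `spearman`, `polyfit`, and `sse` are made up for this example. In a notebook, library routines such as `scipy.stats.pearsonr`, `scipy.stats.spearmanr`, and `numpy.polyfit` would be the more likely choice.

```python
import math


def pearson(x, y):
    """Pearson coefficient: strength of a linear relationship."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman coefficient: Pearson applied to ranks (monotonic trend)."""
    return pearson(ranks(x), ranks(y))


def polyfit(x, y, degree):
    """Least-squares polynomial coefficients, lowest power first."""
    m = degree + 1
    # Augmented normal equations [X^T X | X^T y].
    a = [
        [sum(xi ** (r + c) for xi in x) for c in range(m)]
        + [sum(yi * xi ** r for xi, yi in zip(x, y))]
        for r in range(m)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(m):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * p for v, p in zip(a[r], a[col])]
    return [a[i][m] / a[i][i] for i in range(m)]


def sse(x, y, coeffs):
    """Sum of squared residuals for a fitted polynomial."""
    return sum(
        (yi - sum(c * xi ** k for k, c in enumerate(coeffs))) ** 2
        for xi, yi in zip(x, y)
    )


# Synthetic data: monotonic but not linear.
x = list(range(1, 11))
y = [v ** 2 for v in x]

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
sse_linear = sse(x, y, polyfit(x, y, 1))
sse_quadratic = sse(x, y, polyfit(x, y, 2))

print(f"Pearson={r_pearson:.3f}  Spearman={r_spearman:.3f}")
print(f"SSE linear={sse_linear:.1f}  SSE quadratic={sse_quadratic:.1f}")
```

On this toy data Pearson comes out around 0.97 while Spearman is 1 up to floating-point rounding, which is the kind of gap that hints at a monotonic but nonlinear relationship. The linear fit leaves a large, U-shaped residual pattern, while the quadratic fit drives the residual sum of squares to essentially zero.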

Coming Soon: Deployment

After the notebook analysis is complete:
The codebase will be modularized and refactored
The application will be containerized using Docker
The container will be deployed as a stateless app on Azure Container Apps
The service will be exposed via an HTTP API and integrated with basic visualization tools
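
Since the deployment work has not started yet, the following is only a rough sketch of what a stateless HTTP endpoint could look like, using nothing but the Python standard library. The `/summary` route, the `values` query parameter, port 8000, and the names `summarize`, `SummaryHandler`, and `make_server` are all assumptions made for illustration, not the project's actual API.

```python
import json
import statistics
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse


def summarize(values):
    """Pure function: descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }


class SummaryHandler(BaseHTTPRequestHandler):
    """Serves GET /summary?values=1,2,3 (hypothetical route)."""

    def do_GET(self):
        url = urlparse(self.path)
        raw = parse_qs(url.query).get("values", [""])[0]
        try:
            if url.path != "/summary":
                raise LookupError("unknown path")
            values = [float(v) for v in raw.split(",")]
            status, body = 200, summarize(values)
        except LookupError:
            status, body = 404, {"error": "not found"}
        except ValueError:
            status, body = 400, {"error": "values must be comma-separated numbers"}
        payload = json.dumps(body).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def make_server(port=8000):
    """Build the server without starting it; call .serve_forever() to run."""
    return HTTPServer(("0.0.0.0", port), SummaryHandler)
```

Calling `make_server().serve_forever()` would start the service locally. Every response is computed from the request alone and nothing is stored between requests, which is the property that lets a container platform add or remove replicas freely.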