Azure Olympics Trend & Comparative Analysis
Exploring 120+ years of athlete & medal
data with Azure + Python + Power BI
Problem Statement
The Olympics hold over 120 years
of history with 70K+ athlete
records, but the data is scattered
and unstructured.
This makes it hard to see clear
trends in participation, gender,
and performance, or to make
comparisons across countries
and sports.
Tech Stack
Built the pipeline using Azure
Data Factory, Delta Lake, and
Databricks.
Leveraged Python libraries like
Pandas, Seaborn, and Matplotlib
for analysis, and Power BI for
dashboards.
Moved the data through successive
storage formats: CSV → Parquet → Delta.
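The format progression could be sketched roughly as below. This is a minimal illustration, not the project's actual notebook: the function name convert_csv_to_delta and all paths are hypothetical, and it assumes a Spark session with Delta Lake support (as provided on Databricks) is passed in.

```python
def convert_csv_to_delta(spark, csv_path, parquet_path, delta_path):
    """Read raw CSV, persist it as Parquet, then rewrite it as a Delta table.

    `spark` is assumed to be an existing SparkSession with Delta Lake
    available; every path argument is a hypothetical placeholder.
    """
    # Stage 1: raw CSV with a header row; let Spark infer column types.
    raw = spark.read.csv(csv_path, header=True, inferSchema=True)

    # Stage 2: columnar Parquet copy of the raw data.
    raw.write.mode("overwrite").parquet(parquet_path)

    # Stage 3: Delta table built from the Parquet copy.
    staged = spark.read.parquet(parquet_path)
    staged.write.format("delta").mode("overwrite").save(delta_path)

    return delta_path
```

In a Databricks notebook the predefined `spark` session would be passed as the first argument.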
Data Analysis Highlights
Prepared the data for exploratory
analysis by removing duplicates
and handling missing values.
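A cleaning step of this kind might look like the following Pandas sketch. The sample rows, the column names (Name, Sex, Age, Height, Medal), and the choice of median imputation are assumptions made for illustration; the original project may have handled missing values differently.

```python
import pandas as pd

# Hypothetical sample rows; column names are assumed, not taken from the project.
raw = pd.DataFrame({
    "Name":   ["A", "A", "B", "C"],
    "Sex":    ["F", "F", "M", "M"],
    "Age":    [23.0, 23.0, None, 31.0],
    "Height": [170.0, 170.0, 182.0, None],
    "Medal":  ["Gold", "Gold", None, None],
})

def clean(df):
    """Drop exact duplicate rows and fill missing values."""
    out = df.drop_duplicates().copy()
    # A missing medal is treated as "did not win a medal" (assumption).
    out["Medal"] = out["Medal"].fillna("No Medal")
    # Numeric gaps are filled with the column median (one option among several).
    for col in ("Age", "Height"):
        out[col] = out[col].fillna(out[col].median())
    return out.reset_index(drop=True)

cleaned = clean(raw)
print(cleaned)
```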
Studied age, height, and weight
distributions, gender evolution,
and medal patterns.
Visualized these traits with box and
violin plots and identified extremes
such as the youngest, oldest,
tallest, and heaviest Olympians.
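Finding the extremes can be sketched with Pandas idxmin and idxmax. The data and column names below are invented for illustration, and the helper name extremes is hypothetical.

```python
import pandas as pd

# Invented sample data; the real dataset's columns are assumed to be similar.
athletes = pd.DataFrame({
    "Name":   ["A", "B", "C", "D"],
    "Age":    [14, 58, 25, 30],
    "Height": [150, 175, 210, 180],
    "Weight": [45, 80, 95, 160],
})

def extremes(df):
    """Return the name of the athlete at each extreme."""
    return {
        "youngest": df.loc[df["Age"].idxmin(), "Name"],
        "oldest":   df.loc[df["Age"].idxmax(), "Name"],
        "tallest":  df.loc[df["Height"].idxmax(), "Name"],
        "heaviest": df.loc[df["Weight"].idxmax(), "Name"],
    }

result = extremes(athletes)
print(result)
```

The same frame could feed seaborn.boxplot or seaborn.violinplot to draw the distributions.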
Insights & Trends
Participation has grown steadily since
1896, with the USA, Russia, and China
leading the medal counts.
Women’s participation rose
significantly after the 1970s.
Heatmaps highlighted shifting
dominance across decades,
showing evolving global
competitiveness in the Olympics.
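The table behind such a heatmap could be built as in this sketch: medals are counted per team and decade. The sample rows and the column names (Team, Year, Medal) are assumptions, and medals_by_decade is a hypothetical helper name.

```python
import pandas as pd

# Invented sample rows; a missing Medal means no medal was won.
medals = pd.DataFrame({
    "Team":  ["USA", "USA", "China", "Russia", "China", "USA"],
    "Year":  [1984, 1988, 1992, 1996, 2008, 2008],
    "Medal": ["Gold", "Silver", "Gold", "Bronze", "Gold", None],
})

def medals_by_decade(df):
    """Count medals per team per decade (rows: teams, columns: decades)."""
    won = df.dropna(subset=["Medal"]).copy()
    # Integer-divide the year to bucket it into its decade, e.g. 1988 -> 1980.
    won["Decade"] = (won["Year"] // 10) * 10
    return pd.crosstab(won["Team"], won["Decade"])

table = medals_by_decade(medals)
print(table)
```

Passing a table like this to seaborn.heatmap would give a team-by-decade view of how medal dominance shifts.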
BI Dashboard
Designed Power BI dashboards
with KPIs for total medals and
athlete counts.
Integrated filters for country,
year, and sport, enabling
comparative analysis of gender
trends, decade-wise
participation, and medal
distribution through interactive
visualizations.
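The KPI logic behind the dashboard can be approximated in Pandas for checking numbers outside Power BI. This is only an analogue of the dashboard measures, not the Power BI implementation; the column names, sample rows, and the kpis helper are hypothetical.

```python
import pandas as pd

# Invented sample records; column names are assumptions.
records = pd.DataFrame({
    "Name":  ["A", "A", "B", "C", "D"],
    "Team":  ["USA", "USA", "USA", "China", "China"],
    "Year":  [2008, 2012, 2012, 2012, 2012],
    "Sport": ["Swimming", "Swimming", "Athletics", "Diving", "Diving"],
    "Medal": ["Gold", "Gold", None, "Silver", None],
})

def kpis(df, country=None, year=None, sport=None):
    """Compute total medals and distinct athletes under optional filters."""
    mask = pd.Series(True, index=df.index)
    if country is not None:
        mask &= df["Team"] == country
    if year is not None:
        mask &= df["Year"] == year
    if sport is not None:
        mask &= df["Sport"] == sport
    view = df[mask]
    return {
        "total_medals": int(view["Medal"].notna().sum()),
        "athletes": int(view["Name"].nunique()),
    }

print(kpis(records))
print(kpis(records, country="USA"))
```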
Outcome
Delivered an end-to-end pipeline
from raw data to insights and
dashboards.
Showcased the integration of
data engineering, analysis, and
visualization to uncover hidden
Olympic stories.
Published the results through an
interactive dashboard and a GitHub
repository for broader reach.