This project automates the scraping of NBA 2024–25 season box scores from Basketball Reference and stores the raw data in a Google Cloud Storage bucket. The data is then transformed using dbt Core and loaded into Google BigQuery. From there, it is connected to Power BI to build interactive reports and dashboards.
The entire workflow is orchestrated using Apache Airflow, containerized with Docker, and deployed on Google Cloud Platform resources provisioned via Terraform.
- Docker
- Terraform
- Power BI
- GCP Account
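The two command-line prerequisites can be checked before starting. The sketch below assumes Docker and Terraform are installed under their usual command names, `docker` and `terraform`. The helper `missing_tools` is a hypothetical name used only for illustration. Power BI and the GCP account cannot be checked this way.

```shell
# Print the names of any given commands that are not found on PATH,
# separated by spaces (prints an empty line when nothing is missing).
missing_tools() {
  missing=""
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      missing="$missing $tool"
    fi
  done
  # Drop the leading space left by the concatenation above.
  printf '%s\n' "${missing# }"
}

# Assumption: docker and terraform are the CLI names for the tools listed above.
result="$(missing_tools docker terraform)"
if [ -n "$result" ]; then
  echo "Missing tools: $result"
else
  echo "docker and terraform are both available"
fi
```

Running this before the setup steps catches a missing install before a `terraform` or `docker compose` command fails.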
Fill in the variables in the following files and rename them accordingly:
- var.tf.example
- dev.env
- gcreds.json.example (not recommended if the project is shared with others)
Run the Terraform commands from inside the terraform directory. These steps create the GCS and BigQuery infrastructure.

    terraform init
    terraform apply
Run the Docker commands from inside the airflow directory. These steps set up the Airflow orchestration inside Docker.

    docker compose build
    docker compose up airflow-init
    docker compose up -d
Trigger the DAG
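One way to trigger the DAG from a terminal is to run the Airflow CLI command `airflow dags trigger` inside a running container. This is a sketch, not necessarily how this project is meant to be run. The service name `airflow-scheduler` and the DAG id `nba_box_scores` are assumptions, not names taken from this repository, so replace them with the values from your own compose file and DAG definition. The function only prints the command unless `DRY_RUN=0` is set.

```shell
# trigger_dag SERVICE DAG_ID
# Prints (or, with DRY_RUN=0, runs) the command that triggers a DAG
# through the Airflow CLI inside a running compose service.
trigger_dag() {
  service="$1"
  dag_id="$2"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Dry run (default): show the command without contacting Docker.
    echo "docker compose exec $service airflow dags trigger $dag_id"
  else
    docker compose exec "$service" airflow dags trigger "$dag_id"
  fi
}

# Hypothetical service name and DAG id; substitute your own.
trigger_dag "airflow-scheduler" "nba_box_scores"
```

Run it from the airflow directory so that `docker compose` picks up the project's compose file. The DAG can also be triggered from the Airflow web UI.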
An interactive Power BI report is available through this link. Several measures were written with complex DAX expressions to power some of the more interesting visuals. Snapshots of the dashboard are shown below.