Chicago Taxi data ingestion project

This project builds a data pipeline for loading the Chicago taxi trips dataset into BigQuery for subsequent analysis.

Project Steps:

Terraform: create a Cloud Storage (GCS) bucket in GCP and a dataset in BigQuery.
Airflow: a pipeline that loads the data into the bucket and then creates an external table in BigQuery.
dbt: create models for subsequent analysis.

Terraform docs

Before Running Terraform

You need to:

configure a service account in GCP
install the Google Cloud SDK
authenticate with GCP
install Terraform
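You can confirm the prerequisites above with a small preflight check before running the scripts. The sketch below is hypothetical: check_tools and check_credentials are not part of the repository. It also assumes that the service-account key is exposed through GOOGLE_APPLICATION_CREDENTIALS. The Google Terraform provider picks up that variable through Application Default Credentials.

```shell
# Hypothetical preflight helpers (not part of the repository).

# Prints each missing command to stderr; returns non-zero if any is missing.
check_tools() {
  local missing=0
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Returns non-zero unless GOOGLE_APPLICATION_CREDENTIALS names a readable file
# (assumes the service-account key is passed to Terraform this way).
check_credentials() {
  if [ -z "${GOOGLE_APPLICATION_CREDENTIALS:-}" ] || [ ! -r "$GOOGLE_APPLICATION_CREDENTIALS" ]; then
    echo "GOOGLE_APPLICATION_CREDENTIALS is unset or unreadable" >&2
    return 1
  fi
}

# Usage:
#   export GOOGLE_APPLICATION_CREDENTIALS=<path/to/key>.json
#   gcloud auth activate-service-account --key-file="$GOOGLE_APPLICATION_CREDENTIALS"
#   check_tools gcloud terraform && check_credentials
```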

To create the infrastructure, run the following script:

bash run_terraform.sh

Review the planned changes, then type yes to confirm.

To destroy the created infrastructure, run:

bash destroy_terraform.sh
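The contents of run_terraform.sh and destroy_terraform.sh are not shown here. The following is only a rough sketch of what the two scripts could amount to, and the real scripts may differ. It assumes that the configuration lives in the repository's terraform folder and that you are running Terraform 0.14 or later, which provides the -chdir option. The tf_up and tf_down functions are hypothetical names.

```shell
# Hypothetical stand-ins for run_terraform.sh / destroy_terraform.sh.
# TF_DIR defaults to ./terraform (an assumption about the layout).
TF_DIR="${TF_DIR:-terraform}"

tf_up() {
  # init downloads the providers; apply prints the plan and waits for "yes".
  terraform -chdir="$TF_DIR" init &&
    terraform -chdir="$TF_DIR" apply
}

tf_down() {
  # destroy lists the resources to be removed and also waits for "yes".
  terraform -chdir="$TF_DIR" destroy
}

# Usage: tf_up    # review the plan, then type yes
```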

Airflow docs

Before Running the Docker Container with Airflow

Place your Google service-account key in the ~/.google/credentials/ directory on the machine that runs Airflow (local or VM):

cd ~ && mkdir -p ~/.google/credentials/
mv <path/to/your/service-account-authkey>.json ~/.google/credentials/google_credentials.json
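If you prefer to copy the key rather than move it, you could use a helper that also restricts the copy to the current user. The destination path and file name match the commands above. The install_gcp_key function is hypothetical and not part of the repository.

```shell
# Hypothetical helper: copies a service-account key to the path expected by
# the Airflow setup and makes it readable only by the current user.
install_gcp_key() {
  local src="$1"
  local dest_dir="$HOME/.google/credentials"
  if [ ! -f "$src" ]; then
    echo "key file not found: $src" >&2
    return 1
  fi
  mkdir -p "$dest_dir" &&
    cp "$src" "$dest_dir/google_credentials.json" &&
    chmod 600 "$dest_dir/google_credentials.json"
}

# Usage: install_gcp_key <path/to/your/service-account-authkey>.json
```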

Before running the container, remember to update:

the GCP_PROJECT_ID and GCP_GCS_BUCKET variable values in the .env file
the DOWNLOAD_START_DATE, DOWNLOAD_END_DATE, BIGQUERY_DATASET, TABLE_ID variable values in dag__data_ingestion.py
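You can edit the .env values by hand. Alternatively, the hypothetical set_env_var helper sketched below updates an existing KEY=VALUE line or appends a new one. It only covers .env. The DAG variables are Python code in dag__data_ingestion.py and still need to be edited there. The values my-project and my-bucket in the usage lines are placeholders.

```shell
# Hypothetical helper: sets KEY=VALUE in an env file (default .env),
# replacing an existing KEY line or appending one. VALUE is passed through
# sed, so it must not contain "|", "&", "\" or newlines.
set_env_var() {
  local key="$1" value="$2" file="${3:-.env}"
  if grep -q "^${key}=" "$file" 2>/dev/null; then
    # -i.bak works with both GNU and BSD sed; the backup is then removed.
    sed -i.bak "s|^${key}=.*|${key}=${value}|" "$file" && rm -f "$file.bak"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Usage (placeholder values):
#   set_env_var GCP_PROJECT_ID my-project .env
#   set_env_var GCP_GCS_BUCKET my-bucket .env
```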

Execution:

Run the following command to build the image, initialize Airflow, and start all services:

bash run_airflow.sh

Log in to the Airflow web UI at localhost:8080 with the default credentials admin/admin and run the DAG named dag__data_ingestion.

To shut down all Airflow services, run:

bash shutdown_airflow.sh
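While the services are up, you can also start dag__data_ingestion from the command line instead of the web UI. The sketch below makes two assumptions about this project's setup: that it uses Docker Compose v2 (docker compose), and that the service is named airflow-webserver, as in the official Airflow docker-compose.yaml. The trigger_dag function is hypothetical. The commands airflow dags unpause and airflow dags trigger are standard Airflow 2 CLI commands.

```shell
# Hypothetical CLI alternative to the web UI. AIRFLOW_SERVICE is an assumed
# compose service name; use the one from the compose file run_airflow.sh starts.
AIRFLOW_SERVICE="${AIRFLOW_SERVICE:-airflow-webserver}"

trigger_dag() {
  local dag_id="${1:-dag__data_ingestion}"
  # DAGs can be created paused, so unpause first, then queue a run.
  # -T disables TTY allocation so this also works from non-interactive scripts.
  docker compose exec -T "$AIRFLOW_SERVICE" airflow dags unpause "$dag_id" &&
    docker compose exec -T "$AIRFLOW_SERVICE" airflow dags trigger "$dag_id"
}

# Usage (from the directory containing the compose file): trigger_dag
```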

dbt docs

Before running the models, install dbt-core together with the BigQuery adapter (dbt-bigquery), or set up dbt Cloud. For more details, refer to the official documentation.

Commands to run dbt models:

dbt seed
dbt build
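You can wrap the two commands so that a broken BigQuery connection fails early. In the hypothetical run_dbt helper below, DBT_PROJECT_DIR defaults to the repository's dbt folder, which is assumed to contain dbt_project.yml. The helper also adds dbt debug as a connection check before seeding and building.

```shell
# Hypothetical wrapper around the commands above.
# DBT_PROJECT_DIR defaults to ./dbt (assumed to hold dbt_project.yml).
DBT_PROJECT_DIR="${DBT_PROJECT_DIR:-dbt}"

run_dbt() {
  # debug checks profiles.yml and the warehouse connection;
  # seed loads the CSV seeds; build runs models, tests, snapshots and seeds.
  dbt debug --project-dir "$DBT_PROJECT_DIR" &&
    dbt seed --project-dir "$DBT_PROJECT_DIR" &&
    dbt build --project-dir "$DBT_PROJECT_DIR"
}

# Usage: run_dbt
```

dbt build also runs seeds, so the separate dbt seed step mainly makes the order given above explicit.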

Models overview:
