This project implements a modern ELT data pipeline using dbt, Snowflake, and Apache Airflow. The pipeline extracts data from Snowflake's TPCH sample dataset, transforms it using dbt models, and orchestrates the workflow with Airflow.
This pipeline demonstrates a typical ELT architecture used in modern analytics engineering:
- Raw data source: Data comes from Snowflake's snowflake_sample_data.tpch_sf1 dataset.
- Staging layer: dbt models clean and standardize raw source tables.
- Intermediate transformations: Business logic and joins are applied to prepare analytics-ready tables.
- Data marts / fact tables: Aggregated datasets optimized for analytics and reporting.
- Orchestration: Airflow schedules and executes dbt runs.
The result is a clean analytics-ready fact table (fact_orders) containing order-level metrics.
Below is the transformation lineage generated by dbt.
The pipeline follows a layered transformation approach:
Source Tables (Snowflake TPCH)
│
▼
Staging Models
(stg_tpch_orders, stg_tpch_line_items)
│
▼
Intermediate Models
(int_order_items, int_order_items_summary)
│
▼
Fact Table
(fact_orders)
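dbt derives its run order from the references between models, so each layer is built only after the models it selects from. The sketch below illustrates that ordering with Python's standard `graphlib`. The exact edges between models are an assumption read off the diagram above, not taken from the project's SQL.

```python
from graphlib import TopologicalSorter

# Model -> upstream models it selects from. These edges are an assumption
# based on the layer diagram, not on the project's actual ref() calls.
UPSTREAM = {
    "stg_tpch_orders": set(),
    "stg_tpch_line_items": set(),
    "int_order_items": {"stg_tpch_orders", "stg_tpch_line_items"},
    "int_order_items_summary": {"int_order_items"},
    "fact_orders": {"stg_tpch_orders", "int_order_items_summary"},
}


def build_order(upstream):
    """Return one valid build order in which every model follows its parents."""
    return list(TopologicalSorter(upstream).static_order())


if __name__ == "__main__":
    for position, model in enumerate(build_order(UPSTREAM), start=1):
        print(position, model)
```

Several orders can be valid (the two staging models may come in either order), but fact_orders always comes last because every other model is one of its ancestors.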
The pipeline is orchestrated using Apache Airflow, which schedules and runs the dbt workflow.
The DAG performs:
- dbt dependency installation
- dbt model execution
- dbt tests
This ensures transformations and validations are executed automatically.
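As a rough illustration of the sequence above, the following sketch runs the three dbt commands in order and stops at the first failure. It is not the project's DAG, which uses Astronomer Cosmos to integrate dbt with Airflow. The function name `run_dbt_steps` and the injectable `runner` parameter are hypothetical and exist only for this example.

```python
import subprocess

# The three steps listed above, in the order the DAG performs them.
DBT_STEPS = (
    ("dbt", "deps"),
    ("dbt", "run"),
    ("dbt", "test"),
)


def run_dbt_steps(project_dir, runner=subprocess.run):
    """Run each dbt step in order and stop at the first failure.

    `runner` is injectable so the sequence can be exercised without dbt
    installed. Returns the commands that completed successfully.
    """
    completed = []
    for step in DBT_STEPS:
        # check=True raises CalledProcessError on a non-zero exit code,
        # so tests never run against models that failed to build.
        runner(list(step), cwd=project_dir, check=True)
        completed.append(" ".join(step))
    return completed
```

Calling `run_dbt_steps("dbt_project")` from the repository root would execute the steps against the dbt project folder, assuming dbt is installed and a Snowflake profile is configured.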
| Tool | Purpose |
|---|---|
| Snowflake | Cloud data warehouse |
| dbt | Data transformation and modeling |
| Apache Airflow 3 | Workflow orchestration |
| Astronomer Cosmos | Integrates dbt with Airflow |
| Python | Airflow DAG definition |
| uv | Python dependency and virtual environment management |
| Docker | Containerized environment |
project-root
│
├── dags/
│   └── dbt_dag.py
│
├── dbt_project/
│   ├── models/
│   │   ├── staging/
│   │   └── marts/
│   │
│   ├── macros/
│   └── tests/
│
├── media/
│   ├── airflow_dag.png
│   └── dbt_lineage.png
│
├── Dockerfile
├── requirements.txt
└── README.md
The pipeline uses Snowflake's TPCH sample dataset:
- orders
- lineitem
These tables act as raw source data.
The staging models standardize column naming and structure.
Example:
stg_tpch_orders
stg_tpch_line_items
Key operations:
- column renaming
- surrogate key generation
- source tests (not null, uniqueness)
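A common way to generate a surrogate key is to hash the natural-key columns into a single deterministic value. The sketch below shows that idea in plain Python. It is not the project's macro, and the placeholder string and the example inputs (an order key and a line number) are assumptions made for illustration.

```python
import hashlib

# Placeholder for missing values; an assumed choice that keeps None and
# the empty string from producing the same key.
NULL_PLACEHOLDER = "_null_"


def surrogate_key(*parts):
    """Hash several natural-key columns into one deterministic key."""
    normalized = [NULL_PLACEHOLDER if p is None else str(p) for p in parts]
    # Join with a separator so (1, 23) and (12, 3) hash differently.
    joined = "-".join(normalized)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Hypothetical line item: order key 1, line number 23.
    print(surrogate_key(1, 23))
```

The same inputs always produce the same 32-character key, which is what lets a uniqueness test on the key detect duplicated source rows.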
Intermediate models combine and enrich staging tables.
Example:
int_order_items
int_order_items_summary
Key operations:
- joins between orders and line items
- calculation of discount metrics
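The two operations above can be sketched with an in-memory SQLite database standing in for Snowflake. Table contents, column names, and the convention of storing discounts as negative amounts are assumptions for illustration. The sign convention is inferred from the project's test that discount values cannot be positive.

```python
import sqlite3

# Tiny stand-ins for the staging models. Column names and values are
# illustrative assumptions, not the project's schema.
SETUP = """
create table stg_tpch_orders (order_key integer, order_date text);
create table stg_tpch_line_items (
    order_key integer, line_number integer,
    extended_price real, discount_percentage real
);
insert into stg_tpch_orders values (1, '1996-01-02'), (2, '1996-12-01');
insert into stg_tpch_line_items values
    (1, 1, 100.0, 0.5),
    (1, 2, 200.0, 0.25),
    (2, 1, 40.0, 0.5);
"""

# Join each line item to its order and derive a per-item discount amount.
ORDER_ITEMS = """
select
    o.order_key,
    o.order_date,
    li.line_number,
    li.extended_price,
    -1 * li.extended_price * li.discount_percentage as item_discount_amount
from stg_tpch_orders as o
join stg_tpch_line_items as li on li.order_key = o.order_key
order by o.order_key, li.line_number
"""


def order_items():
    """Return one row per line item, enriched with its order's date."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return conn.execute(ORDER_ITEMS).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in order_items():
        print(row)
```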
The final analytics model:
fact_orders
This table includes:
- order information
- aggregated item sales
- discount calculations
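To make the order-level grain concrete, the sketch below rolls item rows up to one row per order, again using in-memory SQLite in place of Snowflake. The input table layout and the output column names are assumptions, not the project's actual model definition.

```python
import sqlite3

# Item-level rows as an intermediate model might produce them
# (assumed columns and values).
SETUP = """
create table int_order_items (
    order_key integer, order_date text,
    extended_price real, item_discount_amount real
);
insert into int_order_items values
    (1, '1996-01-02', 100.0, -50.0),
    (1, '1996-01-02', 200.0, -50.0),
    (2, '1996-12-01', 40.0, -20.0);
"""

# Collapse item rows into one row per order with summed metrics.
FACT_ORDERS = """
select
    order_key,
    order_date,
    count(*) as item_count,
    sum(extended_price) as gross_item_sales_amount,
    sum(item_discount_amount) as item_discount_amount
from int_order_items
group by order_key, order_date
order by order_key
"""


def fact_orders():
    """Return one aggregated row per order."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return conn.execute(FACT_ORDERS).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in fact_orders():
        print(row)
```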
The pipeline includes both generic and singular dbt tests.
Generic tests:
- unique
- not_null
- relationships
- accepted_values
Custom SQL tests validate business logic:
- discount values cannot be positive
- order dates must be within valid ranges
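A singular dbt test is a SQL query that selects the rows violating a rule, and the test passes when the query returns no rows. The sketch below imitates that behavior against in-memory SQLite. The table contents, column names, and the date bounds (1990-01-01 through today) are assumptions chosen for illustration.

```python
import sqlite3

# Sample fact rows: one valid, two deliberately invalid (assumed data).
SETUP = """
create table fact_orders (
    order_key integer, order_date text, item_discount_amount real
);
insert into fact_orders values
    (1, '1996-01-02', -100.0),
    (2, '1996-12-01', 15.0),   -- positive discount: violates the first rule
    (3, '1980-05-05', -20.0);  -- date before the assumed lower bound
"""

# Each check selects the offending rows, mirroring a singular test.
CHECKS = {
    "discount_not_positive": """
        select order_key from fact_orders
        where item_discount_amount > 0
    """,
    "order_date_in_range": """
        select order_key from fact_orders
        where order_date < '1990-01-01' or order_date > date('now')
    """,
}


def failing_rows():
    """Map each check name to the order keys that violate it."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SETUP)
        return {
            name: [row[0] for row in conn.execute(sql)]
            for name, sql in CHECKS.items()
        }
    finally:
        conn.close()


if __name__ == "__main__":
    for name, keys in failing_rows().items():
        status = "PASS" if not keys else f"FAIL {keys}"
        print(name, status)
```

An empty list for a check corresponds to a passing test. Any returned key identifies a row that would make the test fail.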
git clone https://github.com/maithtruong/sales_pipeline.git
cd sales_pipeline
This project uses uv for environment management.
uv venv
source .venv/bin/activate
Install dependencies:
uv pip install -r requirements.txt
Run the following SQL to create required resources:
- warehouse
- database
- role
- schema
(see SQL script in the project documentation)
Update profiles.yml with your Snowflake credentials.
Example configuration:
warehouse: dbt_wh
database: dbt_db
schema: dbt_schema
role: dbt_role
Install dependencies:
dbt deps
Run models:
dbt run
Execute tests:
dbt test
Build containers and start Airflow:
docker compose up --build
Open the Airflow UI:
http://localhost:8080
Trigger the dbt_dag.
This implementation includes several modern improvements:
| Change | Description |
|---|---|
| uv instead of pip/venv | Faster dependency management |
| Latest dbt version | Updated syntax and compatibility |
| Airflow 3 | New scheduler and runtime improvements |
This project demonstrates:
- building a modern ELT pipeline
- dbt modeling best practices
- data testing and validation
- workflow orchestration with Airflow
- integration between Airflow and dbt
This project is based on the following tutorial:
https://www.youtube.com/watch?v=OLXkGB7krGo
Many thanks to the author for the excellent guide on building an ELT pipeline with dbt, Snowflake, and Airflow.
This implementation follows the tutorial while introducing some updates, including:
- uv-based Python environment management
- latest dbt version
- Apache Airflow 3

