"Es ist mir wurst." Germans 🇩🇪
It's a German expression meaning "It doesn't matter to me", literally "This is sausage to me". 🌭 (yeah, there's no sausage emoji, so a hotdog will have to do)
Casual in attitude, serious about DWH architecture.
This is a monorepo of a DWH for personal data analytics and AI. Everything in it is expected to run 100% locally.
A core idea here is being tool-agnostic: any tooling in the modern data stack is abstracted and materialized in places like the folder structure. Open-source tooling is prioritized.
The philosophy behind this can be found in Data Biz.
Run the following command to spin up the database, pull the AI model, and launch the AI Agent interface:
```bash
make up
```

This will:

- Start Postgres (Docker).
- Pull the required LLM (`qwen2.5:3b`).
- Launch the Streamlit web interface at `http://localhost:8501`.
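Under the hood, `make up` bundles a few steps. The sketch below is only an approximation; it assumes Docker Compose under `docker/`, Ollama for the model pull, and a hypothetical `app.py` Streamlit entry point. The actual targets live in the Makefile.

```bash
# Rough, hypothetical equivalent of `make up`; the real targets live in the Makefile.
docker compose -f docker/docker-compose.yml up -d  # start Postgres (compose file path is an assumption)
ollama pull qwen2.5:3b                             # fetch the local LLM (assumes Ollama)
streamlit run app.py --server.port 8501            # launch the AI Agent UI (entry point is an assumption)
```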
You can connect to the local PostgreSQL instance with:
- Host: `localhost`
- Port: `5432`
- User / Password: `jimwurst_user` / `jimwurst_password`
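As a quick smoke test, you can connect with `psql`. The database name below is a placeholder; use whatever is configured in `docker/` (e.g. the `.env` file):

```bash
# Connect to the local warehouse; <database> is a placeholder for the configured DB name
PGPASSWORD=jimwurst_password psql -h localhost -p 5432 -U jimwurst_user -d <database>
```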
The following schemas are initialized by default:
`marts`, `intermediate`, `staging`, and `s_<app_name>` (ODS).
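To confirm the schemas exist after startup, you can list them from `psql` (same placeholder caveat for the database name):

```bash
# List all schemas in the warehouse; <database> is a placeholder
PGPASSWORD=jimwurst_password psql -h localhost -p 5432 -U jimwurst_user -d <database> -c "\dn"
```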
This project uses uv for fast, reliable Python dependency management.
Install uv:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

After installation, restart your shell or run:

```bash
source $HOME/.local/bin/env
```

Create a virtual environment:

```bash
uv venv
```

Install dependencies:

```bash
uv pip install -r requirements.txt
```

Add a new dependency:

```bash
uv pip install <package>
uv pip freeze > requirements.txt
```

Sync dependencies (ensure exact match with requirements.txt):
```bash
uv pip sync requirements.txt
```

The default tech stack:

- Containerization: Docker
- CI/CD: GitHub Actions
- Job Orchestration: Python / Makefile
- DWH: Postgres
- Package Manager: uv
- Data Ingestion: Python / SQL
- Data Transformation: dbt Core
- Data Activation: Metabase
For larger-scale data operations, the following tools can be integrated:
- Job Orchestration: Apache Airflow
- Data Ingestion: Airbyte
Each application follows a strict modular structure using snake_case. Tooling is materialized through the folder structure:
```
.
├── .github/                  # GitHub Actions workflows and CI config
├── apps/                     # Tool-specific configurations and deployments
│   ├── data_ingestion/       # Ingestion tools
│   │   └── airbyte/
│   ├── data_transformation/  # Transformation tools
│   │   └── dbt/              # Central dbt project
│   ├── data_activation/      # BI & activation tools
│   │   └── metabase/
│   └── job_orchestration/    # Orchestration tools
│       └── airflow/
├── docker/                   # Local orchestration (Docker Compose, .env)
├── docs/                     # Documentation, diagrams, and architecture RFCs
├── prompts/                  # AI system prompts and LLM context files
└── utils/                    # Shared internal packages (Python utils, custom operators)
```
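For example, a new ingestion source would get its own snake_case folder under `apps/data_ingestion/` and a matching `s_<app_name>` ODS schema in Postgres. The name below is purely hypothetical:

```bash
# Hypothetical example: onboarding a new ingestion source called "some_app"
mkdir -p apps/data_ingestion/some_app   # one snake_case folder per tool/source
# Its raw data would then land in the s_some_app (ODS) schema in the warehouse
```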