jimwurst

"Es ist mir wurst." Germans 🇩🇪

A German expression meaning "It doesn't matter to me", literally translated as "This is sausage to me". 🌭 (yeah, there is no sausage emoji, so a hotdog will have to do)

Casual in attitude, serious about DWH architecture.

Table of Contents

  • Tenet
  • Getting Started
  • Python Environment Setup
  • Abstraction & Tooling
  • Folder Structure

Tenet

This is a monolithic repository of a DWH for personal data analytics and AI. Everything in this repo is expected to run 100% locally.

A core idea here is being tool agnostic: any tooling in the modern data stack is abstracted away and materialized in places like the folder structure. Open-source tooling is prioritized.

The philosophy behind this can be found in Data Biz.

Getting Started

1. Prerequisites

  • Docker must be installed and running.
  • Ollama must be installed.
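
As a quick sanity check before launching anything:

docker info        # fails if the Docker daemon is not running
ollama --version   # fails if Ollama is not installed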

2. Launch Everything

Run the following command to spin up the database, pull the AI model, and launch the AI Agent interface:

make up

This will:

  1. Start Postgres (Docker).
  2. Pull the required LLM (qwen2.5:3b).
  3. Launch the Streamlit web interface at http://localhost:8501.
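
For reference, a minimal sketch of what such a make up target could look like, assuming a Compose file under docker/ and a Streamlit app somewhere in apps/ (the exact file names and paths are assumptions, not this repo's actual Makefile):

up:
	docker compose -f docker/docker-compose.yml up -d      # start Postgres in Docker (assumed compose path)
	ollama pull qwen2.5:3b                                 # pull the required LLM
	streamlit run apps/ai_agent/app.py --server.port 8501  # launch the web UI (hypothetical app path)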

3. Connectivity & Schemas

You can connect to the local PostgreSQL instance with:

  • Host: localhost | Port: 5432 | User/Pass: jimwurst_user/jimwurst_password

The following schemas are initialized by default:

  • marts, intermediate, staging, and s_<app_name> (ODS).
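
As a quick connectivity check with psql (the database name jimwurst below is an assumption; use whatever database the Docker setup creates):

PGPASSWORD=jimwurst_password psql -h localhost -p 5432 -U jimwurst_user -d jimwurst -c '\dn'   # lists the initialized schemas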

Python Environment Setup

This project uses uv for fast, reliable Python dependency management.

Installing uv

curl -LsSf https://astral.sh/uv/install.sh | sh

After installation, restart your shell or run:

source $HOME/.local/bin/env

Managing Dependencies

Create a virtual environment:

uv venv
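
By default uv venv creates the environment in .venv; activate it before installing anything into it:

source .venv/bin/activate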

Install dependencies:

uv pip install -r requirements.txt

Add a new dependency:

uv pip install <package>
uv pip freeze > requirements.txt

Sync dependencies (ensure exact match with requirements.txt):

uv pip sync requirements.txt

Abstraction & Tooling

Data Ops

Data Engineering

Scalability (Optional)

For larger-scale data operations, the following tools can be integrated:

🏗 Folder Structure

Each application follows a strict modular structure using snake_case naming. Tooling is materialized through the folder structure:

.
├── .github/                 # GitHub Actions workflows and CI config
├── apps/                    # Tool-specific configurations and deployments
│   ├── data_ingestion/      # Ingestion tools
│   │   └── airbyte/
│   ├── data_transformation/ # Transformation tools
│   │   └── dbt/             # Central dbt project
│   ├── data_activation/     # BI & activation tools
│   │   └── metabase/
│   └── job_orchestration/   # Orchestration tools
│       └── airflow/
├── docker/                  # Local orchestration (Docker Compose, .env)
├── docs/                    # Documentation, diagrams, and architecture RFCs
├── prompts/                 # AI system prompts and LLM context files
└── utils/                   # Shared internal packages (Python utils, custom operators)
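
To illustrate the tool-agnostic idea, swapping in another ingestion tool only means adding a sibling folder under its category (the tool name below is a placeholder, in the same style as <package> above):

mkdir -p apps/data_ingestion/<new_tool_name>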
