This project is a modular, privacy-first AI coding assistant designed for construction project management and general coding tasks. It builds on the local-first AI philosophy, running entirely on your machine with no cloud APIs or external dependencies. The agent can read, list, and edit files, and perform essential math operations for sizing and calculations in construction workflows.
- Privacy-First Architecture: 100% local execution with no cloud dependencies; your construction data never leaves your machine
- Digital Twin Memory: Each workspace maintains an intelligent context summary, giving the AI long-term understanding of your projects without loading full conversation history
- Construction-Focused: Built-in mathematical tools specifically designed for engineering calculations (stability analysis, material sizing, etc.)
- True Multi-Workspace: Isolated project contexts with independent memory, so you can switch between construction projects seamlessly
- Extensible by Design: Registry-based tool system lets you add domain-specific tools without modifying core agent code
- Dual Interface: Choose between lightweight CLI or modern web UI depending on your workflow
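As a rough illustration of the workspace memory idea described above, the sketch below keeps a short summary plus a small window of recent messages per workspace instead of the full conversation. The `Workspace` class, its fields, and the `keep_last` parameter are hypothetical names chosen for this example; they are not taken from the project's code, and the summarization step itself is assumed to happen elsewhere.

```python
from dataclasses import dataclass, field


@dataclass
class Workspace:
    """Hypothetical per-project context: a rolling summary plus recent messages."""
    name: str
    summary: str = ""
    recent: list[str] = field(default_factory=list)

    def add_message(self, text: str, keep_last: int = 4) -> None:
        # Keep only the newest messages verbatim; older ones are assumed to
        # have been folded into `summary` by a separate summarization step.
        self.recent.append(text)
        self.recent = self.recent[-keep_last:]

    def build_context(self) -> str:
        # The prompt context is the summary plus the short recent window,
        # rather than the full conversation history.
        parts = [f"Workspace: {self.name}"]
        if self.summary:
            parts.append(f"Summary: {self.summary}")
        parts.extend(self.recent)
        return "\n".join(parts)


# Two workspaces hold independent state (example data, not real projects).
workspaces = {
    "bridge": Workspace("bridge", summary="Sizing steel beams for a footbridge."),
    "garage": Workspace("garage"),
}
workspaces["bridge"].add_message("user: what span did we settle on?")
```

Calling `workspaces["bridge"].build_context()` yields the bridge summary and its recent messages only; nothing from the `garage` workspace leaks in.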
- Ollama: Download from ollama.com to run local LLMs (e.g., qwen3:4b)
- qwen3:4b model: Efficient 4B parameter model (auto-downloaded by Ollama)
- uv package manager: For Python dependency management (installation guide)
- Python 3.12+: Managed automatically by uv
- RAM: 4GB+ recommended
- Storage: ~3GB for model files
- OS: Linux, macOS, or Windows
- Internet: Only needed for initial setup/model download
- Local-First AI: No data leaves your machine; complete privacy and zero API costs
- Modular Design: Tools are externalized in tools.py and loaded via a registry for easy extension and review
- Async Handling: The agent and main loop are fully async, supporting streaming responses for fast, responsive interaction
- Streamlit UI: Modern chat interface with sticky input, chat bubbles, sidebar controls, and history management
- Long-Term History: All chat history is stored in a local SQLite database (chat_history.sqlite) for persistence across sessions, managed via history.py
- Configurable System Prompt: Easily change the assistant's behavior via the .env file
- Logging: All interactions are logged for traceability and debugging
- No Manual Dependency Installation: Uses uv's inline dependencies in script headers
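A registry-based tool system of the kind mentioned above could look roughly like the following. This is a minimal sketch, not the contents of tools.py: `TOOL_REGISTRY`, `register_tool`, and `call_tool` are hypothetical names, and only a few of the math tools are shown.

```python
import math
from typing import Callable

# Hypothetical registry: maps a tool name to the function that implements it.
TOOL_REGISTRY: dict[str, Callable[..., float]] = {}


def register_tool(name: str):
    """Decorator that adds a function to the registry under `name`."""
    def decorator(func):
        TOOL_REGISTRY[name] = func
        return func
    return decorator


@register_tool("add")
def add(a: float, b: float) -> float:
    return a + b


@register_tool("divide")
def divide(a: float, b: float) -> float:
    if b == 0:
        raise ValueError("division by zero")
    return a / b


@register_tool("sqrt")
def sqrt(x: float) -> float:
    return math.sqrt(x)


def call_tool(name: str, **kwargs) -> float:
    """Look up a tool by name and call it with keyword arguments."""
    if name not in TOOL_REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    return TOOL_REGISTRY[name](**kwargs)
```

With this shape, adding a domain-specific tool means writing one decorated function; the dispatch code in `call_tool` stays unchanged.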
Create a .env file in the project root to configure the assistant. The following variables are supported:
- ENDPOINT: The URL of your local or remote Ollama server (e.g., http://localhost:11434)
- MODEL: The default model to use (e.g., qwen3:4b, qwen3:30b)
- SYSTEM_PROMPT: The system prompt that controls the assistant's behavior and tone
Example .env:
ENDPOINT=http://localhost:11434
MODEL=qwen3:30b
SYSTEM_PROMPT=You are a helpful coding assistant operating in a terminal environment. Output only plain text without markdown formatting, as your responses appear directly in the terminal. Be concise but thorough, providing clear and practical advice with a friendly tone. Don't use any asterisk characters in your responses.
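For readers curious how such a file might be consumed, here is a small standard-library sketch that reads the three variables. It is an assumption about one possible approach, not the project's actual loader: `parse_env`, `load_config`, and the `DEFAULTS` values for MODEL and SYSTEM_PROMPT are hypothetical (only the default endpoint comes from this README).

```python
import os

# Fallback values; MODEL and SYSTEM_PROMPT defaults are assumptions for this sketch.
DEFAULTS = {
    "ENDPOINT": "http://localhost:11434",
    "MODEL": "qwen3:4b",
    "SYSTEM_PROMPT": "You are a helpful coding assistant.",
}


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_config(path: str = ".env") -> dict[str, str]:
    """Merge defaults, the .env file (if present), and real environment variables."""
    config = dict(DEFAULTS)
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            config.update(parse_env(fh.read()))
    # Variables set in the process environment take precedence over the file.
    for key in DEFAULTS:
        if key in os.environ:
            config[key] = os.environ[key]
    return config
```

The precedence order here (environment over file over defaults) is a design choice of the sketch; a dedicated library such as python-dotenv would be a reasonable alternative.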
You can use either uv (recommended for speed and reproducibility) or standard Python venv with requirements.txt.
With uv:
uv venv .venv
source .venv/bin/activate
uv pip install -r requirements.txt
With standard venv:
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
Note: Always activate your virtual environment before running Streamlit or the agent.
- Install uv
  - Linux/macOS: curl -LsSf https://astral.sh/uv/install.sh | sh
  - Windows: powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
- Install Ollama
  - Linux/macOS: curl -fsSL https://ollama.com/install.sh | sh
  - Windows: Download from ollama.com
- Ensure Ollama server is running and pull the model
  - Make sure the Ollama server is running on the port specified in your .env file (default: http://localhost:11434).
  - You can check if Ollama is running with:
    curl http://localhost:11434/api/tags
  - If not running, start it with:
    ollama serve
  - Then check if the model you want to use (e.g., qwen3:4b or qwen3:30b) is already available:
    ollama list | grep qwen3:4b
    ollama list | grep qwen3:8b
    ollama list | grep qwen3:30b
    ollama list | grep gpt-oss:20b
  - If the model is not listed, pull it with:
    ollama pull qwen3:4b
    # or
    ollama pull qwen3:8b
    # or
    ollama pull qwen3:30b
    # or
    ollama pull gpt-oss:20b
- Run the agent in CLI
  uv run main.py
- Or you can run the modern chat interface with:
  uv run streamlit_app.py

This project is based on and inspired by:
- single-file-ai-agent-tutorial (starting point for this codebase)
- Dave Ebbelaar's implementation
- Francis Beeson's implementation
- Thorsten Ball's tutorial
- File Operations: Read, list, and edit files directly from the chat interface
- Math Tools: Add, subtract, multiply, divide, sqrt, power for construction calculations
- Persistent History: SQLite database stores all conversations for long-term retention across sessions
- Context Summaries: Automatic workspace summarization for efficient memory management
- Async Architecture: Fully async agent and streaming responses for real-time interaction
- Modern Streamlit UI: Chat bubbles, sticky input bar, sidebar navigation, and workspace management
- Workspace Operations: Create, switch, rename, and delete workspaces with independent contexts
- Configurable Behavior: Customize system prompt, model, and endpoint via .env file
- Comprehensive Logging: All interactions logged for traceability and debugging
- Zero Setup Friction: Uses uv for dependency management with inline script headers
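To make the persistent-history and per-workspace features above more concrete, the following sketch stores and retrieves messages with Python's built-in sqlite3 module. The table layout and the function names (`open_history`, `save_message`, `load_history`) are assumptions made for illustration; the real schema in history.py may differ.

```python
import sqlite3


def open_history(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a history database; the schema here is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               workspace TEXT NOT NULL,
               role TEXT NOT NULL,
               content TEXT NOT NULL
           )"""
    )
    return conn


def save_message(conn: sqlite3.Connection, workspace: str, role: str, content: str) -> None:
    # "with conn" commits on success and rolls back on error.
    with conn:
        conn.execute(
            "INSERT INTO messages (workspace, role, content) VALUES (?, ?, ?)",
            (workspace, role, content),
        )


def load_history(conn: sqlite3.Connection, workspace: str, limit: int = 50) -> list[tuple[str, str]]:
    """Return up to `limit` of the newest messages for one workspace, oldest first."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE workspace = ? ORDER BY id DESC LIMIT ?",
        (workspace, limit),
    ).fetchall()
    return list(reversed(rows))
```

Passing a file path such as chat_history.sqlite instead of the in-memory default would keep the messages across sessions; filtering by the `workspace` column is what keeps each project's history separate in this sketch.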
Apache License 2.0. See LICENSE for details.