2 unstable releases
| 0.2.0 | May 14, 2026 |
|---|---|
| 0.1.0 | May 14, 2026 |
knowledge-db
Self-organizing Markdown knowledge store for AI agents.
Agents write facts with ## field headers. Rust parses, indexes, and auto-organizes them into a directory tree. No SQL. No embeddings. No schema. The filesystem is the database.
knowledge append pitfalls '## tool
fastapi
## severity
high
## source
Uvicorn timeout causes 504 on slow async endpoints
## fix
Set timeout_keep_alive=300'
knowledge search pitfalls 'tool:fastapi timeout'
knowledge read pitfalls
The Rust binary stores it as pitfalls/fastapi/high/uvicorn-timeout-causes-504-on.md and auto-splits directories when they grow too large. The agent never sees the filesystem; it just reads and searches the logical category.
Install
From crates.io
cargo install knowledge-db
From source
git clone https://github.com/workswithagents/knowledge-db
cd knowledge-db
cargo build --release
# Binary at ./target/release/knowledge
Python fallback
If the Rust binary isn't available, use the Python fallback script. It's feature-complete and drop-in compatible:
# Make executable and add to PATH, or run directly:
python3 knowledge-py append pitfalls '## tool\nfastapi\n...'
python3 knowledge-py read pitfalls
python3 knowledge-py search pitfalls 'fastapi timeout'
python3 knowledge-py stats
The Python fallback uses the same store directory and file format, so agents built on it can switch to the Rust binary later without migrating data.
Categories (logical files)
| Category | What goes in it |
|---|---|
| pitfalls | Bugs, gotchas, lessons learned |
| fixes | Solutions, workarounds, patches |
| workflows | Procedures, patterns, recipes |
| facts | General observations, preferences |
| user | User profile data, preferences |
You can create arbitrary categories — just use the name. Categories are logical: an agent reads pitfalls and gets all pitfalls assembled as one document, regardless of how many files or directories are behind it.
Field format
Every entry uses ## field headers followed by values:
## tool
fastapi
## severity
high
## source
What happened, what was observed
## fix
How to fix or work around it
Common fields by category
Pitfalls:
| Field | Meaning |
|---|---|
| tool | Framework/library involved |
| severity | high / medium / low |
| source | What happened |
| fix | How to resolve it |
Fixes:
| Field | Meaning |
|---|---|
| tool | Framework/library |
| problem | What was fixed |
| solution | How it was fixed |
Facts:
| Field | Meaning |
|---|---|
| topic | Subject area |
| detail | The fact itself |
Workflows:
| Field | Meaning |
|---|---|
| task | What this workflow does |
| steps | Ordered procedure |
Rules
- A line starting with ## begins a new field
- Everything until the next ## line belongs to the current field
- Keys are lowercased; values preserve original casing
- Duplicate field names: last one wins
- Lines before the first ## heading are ignored
- Empty fields (heading with no content) are skipped
- Values can be multi-line
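The rules above can be sketched in a few lines of Python. This is an illustrative reimplementation, not the crate's parser.rs: the function name parse_entry is hypothetical, and it assumes that any line beginning with ## counts as a heading.

```python
def parse_entry(text: str) -> dict:
    """Parse '## field' headers into a {key: value} dict (illustrative sketch)."""
    fields = {}
    key = None   # current field name; None until the first heading is seen
    buf = []     # lines collected for the current field

    def flush() -> None:
        if key is not None:
            value = "\n".join(buf).strip()
            if value:                 # empty fields are skipped
                fields[key] = value   # duplicate names: last one wins

    for line in text.splitlines():
        if line.startswith("##"):
            flush()
            key = line[2:].strip().lower()   # keys are lowercased
            buf = []
        elif key is not None:                # lines before the first heading are ignored
            buf.append(line)                 # values can span multiple lines
    flush()
    return fields
```

Values keep their original casing because only the key is lowercased.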
Commands
append — Write an entry
knowledge append <category> '<markdown content>'
# From stdin:
printf '## tool\ndocker\n\n## source\nContainer leak\n' | knowledge append pitfalls
# From file:
knowledge append pitfalls "$(cat entry.md)"
Returns the relative path where the entry was stored.
The fields tool, severity, domain, and type affect directory nesting: entries with those fields are organized into subdirectories (e.g. pitfalls/fastapi/high/timeout-bug.md).
read — Assemble a category
knowledge read <category>
knowledge read pitfalls
Walks the directory tree and assembles all .md files into a single output. Entries are separated by ---.
Sub-path read for scoped access:
knowledge read pitfalls fastapi
# Only reads pitfalls/fastapi/...
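A minimal Python sketch of what a read amounts to, assuming the store is a plain directory of .md files. The function name assemble, the sorted traversal order, and the exact separator layout are assumptions made for illustration; only the tree walk and the --- separator come from the description above.

```python
from pathlib import Path

def assemble(store: Path, category: str, subpath: str = "") -> str:
    """Walk a category (or a sub-path of it) and join every .md file with ---."""
    root = store / category / subpath
    files = sorted(root.rglob("*.md"))   # assumed order: sorted by path
    return "\n---\n".join(f.read_text(encoding="utf-8").strip() for f in files)
```

Because the walk is recursive, the result does not depend on how deeply the files are nested, which is why a category reads the same before and after directories are reorganized.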
search — Find entries
knowledge search <category> '<query>'
# Plain tokens — content match:
knowledge search pitfalls 'fastapi timeout'
# Field filters:
knowledge search pitfalls 'tool:fastapi'
# Mixed — field filter + content:
knowledge search pitfalls 'tool:fastapi timeout'
# Severity filter:
knowledge search pitfalls 'severity:high'
Syntax:
- word matches content (case-insensitive)
- field:value matches an exact field value (case-insensitive)
- Multiple tokens are combined with AND logic

Results are ranked by match count and returned as assembled markdown.
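The query semantics can be illustrated with a small Python sketch. The names parse_query and score are hypothetical, and the way the match count is computed here (one point per satisfied field filter plus the number of occurrences of each word) is an assumption; the source only states that results are ranked by match count.

```python
def parse_query(query: str):
    """Split a query into field filters and plain content words."""
    filters, words = {}, []
    for token in query.lower().split():
        field, sep, value = token.partition(":")
        if sep and field and value:
            filters[field] = value
        else:
            words.append(token)
    return filters, words

def score(fields: dict, query: str) -> int:
    """Return 0 if any token fails (AND logic), else a match count for ranking."""
    filters, words = parse_query(query)
    for field, value in filters.items():
        if fields.get(field, "").lower() != value:   # exact, case-insensitive
            return 0
    content = " ".join(fields.values()).lower()
    counts = [content.count(word) for word in words]
    if any(c == 0 for c in counts):                  # every word must appear
        return 0
    return len(filters) + sum(counts)
```

Entries with a score of 0 would be dropped and the rest sorted by descending score.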
stats — Store statistics
knowledge stats # All categories
knowledge stats pitfalls # Single category breakdown
Shows entry counts, field distribution, and top field values per category.
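As a rough illustration of the numbers involved, the following Python sketch computes the same three figures from already-parsed entries. The function name category_stats and the shape of the returned dict are assumptions; the real output format of knowledge stats is not shown here.

```python
from collections import Counter

def category_stats(entries: list, top: int = 3) -> dict:
    """Summarize parsed entries: entry count, field distribution, top values."""
    field_counts = Counter(key for entry in entries for key in entry)
    top_values = {
        field: Counter(e[field] for e in entries if field in e).most_common(top)
        for field in field_counts
    }
    return {
        "entries": len(entries),
        "fields": dict(field_counts),
        "top_values": top_values,
    }
```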
dedup — Find duplicates
knowledge dedup --dry-run # Show what would merge
knowledge dedup --threshold 0.75 # Custom similarity (default 0.85)
knowledge dedup # Execute merge
Finds entries with high word-overlap similarity. Merges duplicates by adding ## merged_from to the survivor.
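One common way to measure word overlap is Jaccard similarity over word sets, sketched below in Python. Whether the crate uses exactly this measure is an assumption; the function names are hypothetical, and only the 0.85 default threshold comes from the command help above.

```python
import re

def word_overlap(a: str, b: str) -> float:
    """Jaccard similarity of the two texts' lowercase word sets."""
    words_a = set(re.findall(r"\w+", a.lower()))
    words_b = set(re.findall(r"\w+", b.lower()))
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)

def is_duplicate(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two entries as duplicates when their overlap reaches the threshold."""
    return word_overlap(a, b) >= threshold
```

Under this measure, lowering the threshold with --threshold 0.75 would merge entries that share fewer words, so a dry run first is a sensible precaution.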
prune — Remove old entries
knowledge prune --dry-run # Show what would delete
knowledge prune --days 30 # Remove entries older than 30 days
knowledge prune --days 90 # Default: 90 days
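To preview what an age-based prune might select without touching the store, a Python sketch like the following can help. It assumes that an entry's age is its file modification time, which the source does not confirm, and the function name stale_entries is hypothetical.

```python
import time
from pathlib import Path
from typing import Optional

def stale_entries(store: Path, days: int = 90, now: Optional[float] = None) -> list:
    """List .md files whose modification time is older than the cutoff."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400   # 86400 seconds per day
    return sorted(p for p in store.rglob("*.md") if p.stat().st_mtime < cutoff)
```

The function only lists candidates, mirroring the intent of --dry-run.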
mount — Virtual filesystem
knowledge mount /tmp/knowledge-fs
knowledge unmount /tmp/knowledge-fs
Creates a read-optimized directory tree:
- pitfalls.md: assembled markdown of all pitfalls
- pitfalls/tool/severity/entry.md: individual entry stubs
Useful for tools that expect a filesystem (grep, editors, backup scripts).
daemon (optional, requires watch feature)
cargo install knowledge-db --features watch
knowledge daemon --mount /tmp/knowledge-fs
Watches the store directory and auto-rebuilds the virtual filesystem on changes. Useful for dashboards and live editors.
Agent setup guide
For AI agent tool definitions
If you're building an AI agent that needs persistent knowledge, expose these three tools:
import subprocess
from typing import Optional

def run(args: list[str]) -> str:
    """Run the knowledge CLI without a shell and return its stdout."""
    return subprocess.run(args, capture_output=True, text=True, check=True).stdout

# Tool: knowledge_write
def knowledge_write(category: str, content: str) -> str:
    """Append a markdown entry to the knowledge store."""
    return run(["knowledge", "append", category, content])

# Tool: knowledge_read
def knowledge_read(category: str, path: Optional[str] = None) -> str:
    """Read all entries in a category, optionally scoped to a sub-path."""
    args = ["knowledge", "read", category]
    if path:
        args.append(path)
    return run(args)

# Tool: knowledge_search
def knowledge_search(category: str, query: str) -> str:
    """Search entries by tokens and field filters."""
    return run(["knowledge", "search", category, query])
System prompt instructions
Include this in your agent's system prompt:
## Knowledge store usage
You have a persistent knowledge store at ~/.hermes/knowledge/.
Use it to remember facts, pitfalls, fixes, and workflows across sessions.
When to write:
- You discover a bug, tricky edge case, or gotcha → knowledge_write("pitfalls", ...)
- You find a solution or workaround → knowledge_write("fixes", ...)
- You learn a multi-step procedure → knowledge_write("workflows", ...)
- You observe a fact or user preference → knowledge_write("facts", ...)
Entry format uses ## field headers:
## tool\nfastapi\n\n## severity\nhigh\n\n## source\nDescription\n\n## fix\nSolution
Always search the store before troubleshooting: check if this problem has been seen before.
Hermes agent integration
If you use Hermes, the knowledge tool is built-in. Configure your agent to use knowledge-db as the backend:
# config.yaml
knowledge:
  backend: knowledge-db
  store_path: ~/.hermes/knowledge
The Hermes knowledge tool maps directly to knowledge-db commands — action=write maps to append, action=search maps to search, action=read maps to read.
Standalone agent integration
For agents not using Hermes, wrap the CLI:
# Write
echo '## tool
docker
## severity
medium
## source
Container memory leak under load
## fix
Set --memory limit in docker run' | knowledge append pitfalls
# Search
knowledge search pitfalls 'docker memory leak'
# Read
knowledge read pitfalls
Testing your integration
# 1. Write a test entry
knowledge append test "$(printf '## topic\ntesting\n\n## detail\nIntegration works')"
# 2. Read it back
knowledge read test
# 3. Search it
knowledge search test 'integration'
# 4. Check stats
knowledge stats test
# 5. Clean up (delete the test category directory)
rm -rf ~/.hermes/knowledge/test
How it works
Entry storage
Each append does this:
- Parse ## field headers from the markdown
- Resolve directory path from key fields (tool, severity, domain, type)
- Generate filename slug from title → source → fix → first long-enough value
- Acquire file lock, write .md file, update flat-text index, release lock
- Check if parent directory exceeds 50 files → auto-split if needed
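The path resolution and slug steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the crate's logic: the five-word slug limit is inferred from the example path uvicorn-timeout-causes-504-on.md, and min_len, the "entry" fallback name, and the function names are all hypothetical.

```python
import re

KEY_FIELDS = ("tool", "severity", "domain", "type")   # fields that create directories
SLUG_SOURCES = ("title", "source", "fix")             # preferred slug sources, in order

def slugify(text: str, max_words: int = 5) -> str:
    """Lowercase, keep alphanumeric words, join the first few with hyphens."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(words[:max_words])

def entry_path(category: str, fields: dict, min_len: int = 8) -> str:
    """Build category/<key field values>/<slug>.md for a parsed entry."""
    dirs = [slugify(fields[k]) for k in KEY_FIELDS if k in fields]
    text = next((fields[k] for k in SLUG_SOURCES if k in fields), None)
    if text is None:
        # Fall back to the first value that is long enough to make a useful name.
        text = next((v for v in fields.values() if len(v) >= min_len), "entry")
    return "/".join([category, *dirs, slugify(text) + ".md"])
```

With the FastAPI entry from the introduction, this yields pitfalls/fastapi/high/uvicorn-timeout-causes-504-on.md.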
Auto-split engine
When a directory hits 50+ .md files, the engine:
- Scans all entries in the directory
- Picks the best field to split on (cardinality ratio 0.05–0.3 — produces 5–50 subdirectories)
- Creates subdirectories per distinct field value
- Moves files into their subdirectories
- Rebuilds the index
The agent never notices. knowledge read pitfalls works identically before and after splitting.
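The field selection step can be sketched in Python using the cardinality ratio (distinct values divided by entries in the directory). How the engine ranks fields that fall inside the 0.05 to 0.3 window is not documented here, so the tie-break below (widest coverage, then alphabetical order) is an assumption, as is the function name pick_split_field.

```python
from typing import Optional

def pick_split_field(entries: list, lo: float = 0.05, hi: float = 0.3) -> Optional[str]:
    """Pick a field whose distinct-value ratio falls inside [lo, hi], or None."""
    total = len(entries)
    if total == 0:
        return None
    candidates = []
    for field in sorted({key for entry in entries for key in entry}):
        values = [entry[field] for entry in entries if field in entry]
        ratio = len(set(values)) / total
        if lo <= ratio <= hi:
            candidates.append((len(values), field))
    if not candidates:
        return None
    # Assumed ranking: the field present in the most entries wins,
    # and ties go to the alphabetically first field name.
    best_coverage = max(coverage for coverage, _ in candidates)
    return min(field for coverage, field in candidates if coverage == best_coverage)
```

A field such as source, which is unique per entry, has a ratio near 1.0 and is rejected, since splitting on it would create one directory per file.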
Index
A flat-text index file at .index maps relative paths to field:value pairs. Rebuilt on startup if missing. Format:
pitfalls/fastapi/high/uvicorn-timeout.md → tool: fastapi, severity: high, source: Uvicorn timeout causes 504
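Reading and writing a line in that format could look like the Python sketch below. It is based only on the single example line: the function names are hypothetical, and the sketch assumes values contain no ", " sequence, since the example does not show how such values are escaped.

```python
def format_index_line(path: str, fields: dict) -> str:
    """Render one index line: path, an arrow, then field: value pairs."""
    pairs = ", ".join(f"{key}: {value}" for key, value in fields.items())
    return f"{path} → {pairs}"

def parse_index_line(line: str):
    """Split an index line back into (path, fields)."""
    path, _, rest = line.partition(" → ")
    fields = {}
    for pair in rest.split(", "):
        key, sep, value = pair.partition(": ")
        if sep:
            fields[key] = value
    return path, fields
```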
Concurrency
File locking via flock (fs2 crate). Multiple agents writing simultaneously won't corrupt each other's entries. Lock contention returns an error — agents should retry.
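Since agents are expected to retry on lock contention, a small wrapper with exponential backoff is one way to do it. The Python sketch below assumes the CLI signals contention with a non-zero exit status, which surfaces as subprocess.CalledProcessError when the command is run with check=True; the function name and the default delays are hypothetical.

```python
import random
import subprocess
import time
from typing import Callable

def append_with_retry(op: Callable[[], str], attempts: int = 5,
                      base_delay: float = 0.05, sleep=time.sleep) -> str:
    """Call op(), retrying with exponential backoff and jitter on CLI failure."""
    for attempt in range(attempts):
        try:
            return op()
        except subprocess.CalledProcessError:
            if attempt == attempts - 1:
                raise   # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt + random.uniform(0, base_delay))
    raise ValueError("attempts must be at least 1")
```

A caller would pass a function that invokes the CLI, for example a lambda wrapping subprocess.run(["knowledge", "append", "pitfalls", content], capture_output=True, text=True, check=True).stdout. The jitter keeps several agents from retrying in lockstep.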
Architecture
knowledge append → markdown file → directory tree
knowledge search → flat-text index → assembled markdown
knowledge read → tree walk → assembled markdown
| Component | Lines | Purpose |
|---|---|---|
| parser.rs | 127 | ## field → value extraction |
| store.rs | 882 | Index, append, search, read, dedup, prune, split |
| main.rs | 493 | CLI, stats, mount, daemon |
| knowledge-py | 180 | Standalone Python fallback |
- Binary size: ~1.4 MB release build
- Dependencies: slug, fs2, serde, clap, chrono (+ notify for watch feature)
- Runtime: zero runtime deps, no daemon required, no SQL, no embeddings
Why not SQLite?
At 500 files, rg beats SQLite. At 5,000 files, the auto-split engine keeps directories small. The filesystem IS the index — path components encode structure, grep handles search, ls handles listing.
Benefits:
- Git-friendly: the store is just markdown files — version control works naturally
- No migration: add a new field, old entries just lack it; no ALTER TABLE
- Little lock contention: locking is per file, not database-wide
- Transparent: ls, grep, cat all work directly on the store
- Zero setup: no daemon, no connection string, no migrations
If you need JOINs, aggregates, or complex queries — add a SQLite read cache. But you probably don't.
Development
# Clone
git clone https://github.com/workswithagents/knowledge-db
cd knowledge-db
# Build
cargo build
# Run tests
cargo test
# Release build
cargo build --release
# Run locally
cargo run -- append test "$(printf '## topic\ntesting\n\n## detail\nhello')"
# With watch feature
cargo run --features watch -- daemon
Publishing to crates.io
cargo publish
License
MIT