2 unstable releases

0.2.0	May 14, 2026
0.1.0	May 14, 2026


knowledge-db

Self-organizing Markdown knowledge store for AI agents.

Agents write facts with ## field headers. Rust parses, indexes, and auto-organizes them into a directory tree. No SQL. No embeddings. No schema. The filesystem is the database.

knowledge append pitfalls '## tool
fastapi

## severity
high

## source
Uvicorn timeout causes 504 on slow async endpoints

## fix
Set timeout_keep_alive=300'

knowledge search pitfalls 'tool:fastapi timeout'
knowledge read pitfalls

The Rust binary stores it as pitfalls/fastapi/high/uvicorn-timeout-causes-504-on.md and auto-splits directories when they grow too large. The agent never sees the filesystem; it just reads and searches the logical category.

Install

From crates.io

cargo install knowledge-db

From source

git clone https://github.com/workswithagents/knowledge-db
cd knowledge-db
cargo build --release
# Binary at ./target/release/knowledge

Python fallback

If the Rust binary isn't available, use the Python fallback script. It's feature-complete and drop-in compatible:

# Make executable and add to PATH, or run directly.
# $'...' (bash/zsh) turns \n into real newlines; plain single quotes keep it literal.
python3 knowledge-py append pitfalls $'## tool\nfastapi\n...'
python3 knowledge-py read pitfalls
python3 knowledge-py search pitfalls 'fastapi timeout'
python3 knowledge-py stats

The Python fallback uses the same store directory and file format, so agents built on it can switch to the Rust binary later without migrating data.

Categories (logical files)

Category	What goes in it
`pitfalls`	Bugs, gotchas, lessons learned
`fixes`	Solutions, workarounds, patches
`workflows`	Procedures, patterns, recipes
`facts`	General observations, preferences
`user`	User profile data, preferences

You can create arbitrary categories — just use the name. Categories are logical: an agent reads pitfalls and gets all pitfalls assembled as one document, regardless of how many files or directories are behind it.

Field format

Every entry uses ## field headers followed by values:

## tool
fastapi

## severity
high

## source
What happened, what was observed

## fix
How to fix or work around it

Common fields by category

Pitfalls:

Field	Meaning
`tool`	Framework/library involved
`severity`	high / medium / low
`source`	What happened
`fix`	How to resolve it

Fixes:

Field	Meaning
`tool`	Framework/library
`problem`	What was fixed
`solution`	How it was fixed

Facts:

Field	Meaning
`topic`	Subject area
`detail`	The fact itself

Workflows:

Field	Meaning
`task`	What this workflow does
`steps`	Ordered procedure

Rules

Any line starting with ## begins a new field
Everything until the next ## line belongs to the current field
Keys are lowercased; values preserve original casing
Duplicate field names: last one wins
Lines before the first ## heading are ignored
Empty fields (heading with no content) are skipped
Values can be multi-line
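
The rules above can be sketched in a few lines of Python. This is an illustrative re-implementation, not the code in parser.rs; the function name parse_fields is hypothetical, and treating only lines that begin with "## " (with a space) as headers is an assumption.

```python
from typing import Dict, List, Optional


def parse_fields(markdown: str) -> Dict[str, str]:
    """Parse '## field' headers into a {key: value} dict (illustrative only)."""
    fields: Dict[str, str] = {}
    key: Optional[str] = None
    buffer: List[str] = []

    def flush() -> None:
        # Store the finished field; empty values are skipped, duplicates overwrite.
        if key is not None:
            value = "\n".join(buffer).strip()
            if value:
                fields[key] = value

    for line in markdown.splitlines():
        if line.startswith("## "):
            flush()
            key = line[3:].strip().lower()  # keys are lowercased
            buffer = []
        elif key is not None:
            buffer.append(line)  # values keep their casing and may span lines
        # lines before the first header fall through and are ignored

    flush()
    return fields
```

Calling parse_fields on an entry with a repeated tool header keeps only the last non-empty value, which matches the "last one wins" rule.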

Commands

`append` — Write an entry

knowledge append <category> '<markdown content>'

# From stdin:
printf '## tool\ndocker\n\n## source\nContainer leak\n' | knowledge append pitfalls

# From file:
knowledge append pitfalls "$(cat entry.md)"

Returns the relative path where the entry was stored.

The fields tool, severity, domain, and type affect directory nesting: entries with those fields are organized into subdirectories (e.g. pitfalls/fastapi/high/timeout-bug.md).

`read` — Assemble a category

knowledge read <category>
knowledge read pitfalls

Walks the directory tree and assembles all .md files into a single output, with entries separated by ---.

Sub-path read for scoped access:

knowledge read pitfalls fastapi
# Only reads pitfalls/fastapi/...
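
As a rough picture of what read does, the sketch below walks a category directory and joins the entries. The function name assemble, the sorted traversal order, and the exact separator spacing are assumptions made for illustration; the real binary may order and format its output differently.

```python
from pathlib import Path
from typing import Optional


def assemble(store: Path, category: str, sub_path: Optional[str] = None) -> str:
    """Join every .md file under a category (or sub-path) into one document."""
    root = store / category
    if sub_path:
        root = root / sub_path  # scoped read, e.g. pitfalls/fastapi
    # sorted() keeps the output order stable between runs
    entries = [p.read_text(encoding="utf-8").strip() for p in sorted(root.rglob("*.md"))]
    return "\n\n---\n\n".join(entries)
```

Because the walk is recursive, the result is the same whether the entries sit in one flat directory or in nested subdirectories.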

`search` — Find entries

knowledge search <category> '<query>'

# Plain tokens — content match:
knowledge search pitfalls 'fastapi timeout'

# Field filters:
knowledge search pitfalls 'tool:fastapi'

# Mixed — field filter + content:
knowledge search pitfalls 'tool:fastapi timeout'

# Severity filter:
knowledge search pitfalls 'severity:high'

Syntax:

word — matches content (case-insensitive)
field:value — exact field value match (case-insensitive)
Multiple tokens combined with AND logic

Results ranked by match count, returned as assembled markdown.
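
The query semantics can be sketched as follows. This is a minimal model, assuming entries are already parsed into dicts and that "match count" means one point per satisfied field filter plus the number of occurrences of each plain token; the names score and search are hypothetical and the real ranking may count differently.

```python
from typing import Dict, List


def score(entry: Dict[str, str], query: str) -> int:
    """Return a match count, or 0 if any token fails (AND logic)."""
    text = "\n".join(entry.values()).lower()
    total = 0
    for token in query.lower().split():
        field, sep, value = token.partition(":")
        if sep and field and value:
            # field:value filter: exact, case-insensitive match on that field
            if entry.get(field, "").strip().lower() != value:
                return 0
            total += 1
        else:
            hits = text.count(token)  # plain token: case-insensitive content match
            if hits == 0:
                return 0
            total += hits
    return total


def search(entries: List[Dict[str, str]], query: str) -> List[Dict[str, str]]:
    """Keep matching entries, highest match count first."""
    scored = [(score(e, query), e) for e in entries]
    matched = [(s, e) for s, e in scored if s > 0]
    matched.sort(key=lambda pair: pair[0], reverse=True)
    return [e for _, e in matched]
```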

`stats` — Store statistics

knowledge stats              # All categories
knowledge stats pitfalls     # Single category breakdown

Shows entry counts, field distribution, and top field values per category.

`dedup` — Find duplicates

knowledge dedup --dry-run                # Show what would merge
knowledge dedup --threshold 0.75         # Custom similarity (default 0.85)
knowledge dedup                          # Execute merge

Finds entries with high word-overlap similarity. Merges duplicates by adding ## merged_from to the survivor.
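
One common way to measure word overlap is Jaccard similarity (shared words divided by all distinct words). Whether dedup uses exactly this formula is an assumption; the sketch below only shows how a threshold such as 0.85 or 0.75 would behave under it, and the function names are hypothetical.

```python
import re
from typing import Set


def words(text: str) -> Set[str]:
    """Lowercased alphanumeric tokens of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a: str, b: str) -> float:
    """Jaccard word overlap: shared words / all distinct words."""
    wa, wb = words(a), words(b)
    if not wa and not wb:
        return 1.0  # two empty texts are treated as identical
    return len(wa & wb) / len(wa | wb)


def is_duplicate(a: str, b: str, threshold: float = 0.85) -> bool:
    return similarity(a, b) >= threshold
```

Under this measure, two entries sharing four of five distinct words score 0.8: below the default threshold, but a duplicate at --threshold 0.75.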

`prune` — Remove old entries

knowledge prune --dry-run                # Show what would delete
knowledge prune --days 30                # Remove entries older than 30 days
knowledge prune --days 90                # Default: 90 days

`mount` — Virtual filesystem

knowledge mount /tmp/knowledge-fs
knowledge unmount /tmp/knowledge-fs

Creates a read-optimized directory tree:

pitfalls.md — assembled markdown of all pitfalls
pitfalls/tool/severity/entry.md — individual entry stubs

Useful for tools that expect a filesystem (grep, editors, backup scripts).

`daemon` (optional, requires `watch` feature)

cargo install knowledge-db --features watch
knowledge daemon --mount /tmp/knowledge-fs

Watches the store directory and auto-rebuilds the virtual filesystem on changes. Useful for dashboards and live editors.

Agent setup guide

For AI agent tool definitions

If you're building an AI agent that needs persistent knowledge, expose these three tools:

import subprocess
from typing import Optional

def run(*args: str) -> str:
    """Run the knowledge CLI and return its stdout (raises on a non-zero exit)."""
    # Passing a list avoids the shell, so no quoting of content is needed.
    result = subprocess.run(["knowledge", *args], capture_output=True, text=True, check=True)
    return result.stdout

# Tool: knowledge_write
def knowledge_write(category: str, content: str) -> str:
    """Append a markdown entry to the knowledge store."""
    return run("append", category, content)

# Tool: knowledge_read
def knowledge_read(category: str, path: Optional[str] = None) -> str:
    """Read all entries in a category, optionally scoped to a sub-path."""
    if path:
        return run("read", category, path)
    return run("read", category)

# Tool: knowledge_search
def knowledge_search(category: str, query: str) -> str:
    """Search entries by tokens and field filters."""
    return run("search", category, query)

System prompt instructions

Include this in your agent's system prompt:

## Knowledge store usage

You have a persistent knowledge store at ~/.hermes/knowledge/.
Use it to remember facts, pitfalls, fixes, and workflows across sessions.

When to write:
- You discover a bug, tricky edge case, or gotcha → knowledge_write("pitfalls", ...)
- You find a solution or workaround → knowledge_write("fixes", ...)
- You learn a multi-step procedure → knowledge_write("workflows", ...)
- You observe a fact or user preference → knowledge_write("facts", ...)

Entry format uses ## field headers:
  ## tool\nfastapi\n\n## severity\nhigh\n\n## source\nDescription\n\n## fix\nSolution

Always search the store before troubleshooting: check if this problem has been seen before.

Hermes agent integration

If you use Hermes, the knowledge tool is built-in. Configure your agent to use knowledge-db as the backend:

# config.yaml
knowledge:
  backend: knowledge-db
  store_path: ~/.hermes/knowledge

The Hermes knowledge tool maps directly to knowledge-db commands — action=write maps to append, action=search maps to search, action=read maps to read.

Standalone agent integration

For agents not using Hermes, wrap the CLI:

# Write
echo '## tool
docker
## severity
medium
## source
Container memory leak under load
## fix
Set --memory limit in docker run' | knowledge append pitfalls

# Search
knowledge search pitfalls 'docker memory leak'

# Read
knowledge read pitfalls

Testing your integration

# 1. Write a test entry ($'...' makes bash/zsh expand \n into real newlines)
knowledge append test $'## topic\ntesting\n\n## detail\nIntegration works'

# 2. Read it back
knowledge read test

# 3. Search it
knowledge search test 'integration'

# 4. Check stats
knowledge stats test

# 5. Clean up (delete the test category directory)
rm -rf ~/.hermes/knowledge/test

How it works

Entry storage

Each append does this:

Parse ## field headers from the markdown
Resolve directory path from key fields (tool, severity, domain, type)
Generate filename slug from title → source → fix → first long-enough value
Acquire file lock, write .md file, update flat-text index, release lock
Check if parent directory exceeds 50 files → auto-split if needed
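
Steps 2 and 3 can be illustrated with a short sketch. The five-word slug limit is inferred from the example path earlier in this README, and the eight-character cutoff for "long enough" is a guess; both are assumptions, as are the names slugify and entry_path. Locking, indexing, and splitting are left out.

```python
import re
from typing import Dict

KEY_FIELDS = ("tool", "severity", "domain", "type")  # fields that nest directories
SLUG_SOURCES = ("title", "source", "fix")            # slug preference order


def slugify(text: str, max_words: int = 5) -> str:
    """Lowercase, keep alphanumerics, join the first few words with hyphens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return "-".join(tokens[:max_words])


def entry_path(category: str, fields: Dict[str, str]) -> str:
    """Build a relative path such as pitfalls/fastapi/high/<slug>.md."""
    parts = [category]
    parts += [slugify(fields[k]) for k in KEY_FIELDS if k in fields]
    basis = next((fields[k] for k in SLUG_SOURCES if k in fields), None)
    if basis is None:
        # Fall back to the first value of at least 8 characters (assumed cutoff).
        basis = next((v for v in fields.values() if len(v) >= 8), "entry")
    return "/".join(parts) + "/" + slugify(basis) + ".md"
```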

Auto-split engine

When a directory hits 50+ .md files, the engine:

Scans all entries in the directory
Picks the best field to split on (cardinality ratio 0.05–0.3 — produces 5–50 subdirectories)
Creates subdirectories per distinct field value
Moves files into their subdirectories
Rebuilds the index

The agent never notices. knowledge read pitfalls works identically before and after splitting.
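
The field-selection step might look like the sketch below. The cardinality ratio is taken to be distinct values divided by entry count, and "best" is assumed to mean the ratio closest to the middle of the 0.05–0.3 window; the engine's actual tie-breaking is not documented here, and pick_split_field is a hypothetical name.

```python
from typing import Dict, List, Optional, Set


def pick_split_field(entries: List[Dict[str, str]],
                     low: float = 0.05, high: float = 0.3) -> Optional[str]:
    """Pick a field whose distinct-value ratio falls within [low, high]."""
    if not entries:
        return None
    values: Dict[str, Set[str]] = {}
    for entry in entries:
        for key, value in entry.items():
            values.setdefault(key, set()).add(value.strip().lower())
    target = (low + high) / 2
    best: Optional[str] = None
    best_distance: Optional[float] = None
    for key, distinct in values.items():
        ratio = len(distinct) / len(entries)
        if low <= ratio <= high:
            distance = abs(ratio - target)
            if best_distance is None or distance < best_distance:
                best, best_distance = key, distance
    return best  # None means no field is a sensible split key
```

A field that is unique per entry (ratio near 1.0) is rejected because splitting on it would create one directory per file.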

Index

A flat-text index file at .index maps relative paths to field:value pairs. Rebuilt on startup if missing. Format:

pitfalls/fastapi/high/uvicorn-timeout.md → tool: fastapi, severity: high, source: Uvicorn timeout causes 504
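
For tools that want to read the index directly, a line in the format shown above could be parsed roughly like this. The sketch assumes " → " separates the path from the fields and that values never contain ", " or ": "; how the real index escapes such values is not specified here. parse_index_line is a hypothetical helper.

```python
from typing import Dict, Tuple


def parse_index_line(line: str) -> Tuple[str, Dict[str, str]]:
    """Split 'path → key: value, key: value' into (path, fields)."""
    path, _, rest = line.partition(" → ")
    fields: Dict[str, str] = {}
    for pair in rest.split(", "):
        key, sep, value = pair.partition(": ")
        if sep:  # ignore fragments that are not key: value pairs
            fields[key.strip()] = value.strip()
    return path.strip(), fields
```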

Concurrency

File locking via flock (fs2 crate). Multiple agents writing simultaneously won't corrupt each other's entries. Lock contention returns an error — agents should retry.
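
A retry wrapper on the agent side could be as small as the sketch below. How the CLI reports lock contention (exit code, message) is not specified here, so the helper retries on any exception raised by the callable; in practice you would probably narrow that to the error your wrapper raises, such as subprocess.CalledProcessError. The name with_retry and the delay values are illustrative.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retry(action: Callable[[], T], attempts: int = 5, base_delay: float = 0.05) -> T:
    """Call action(); on failure, back off exponentially with jitter and retry."""
    for attempt in range(attempts):
        try:
            return action()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Jitter keeps several agents from retrying in lockstep.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    raise RuntimeError("attempts must be at least 1")
```

An agent would wrap its write call, for example with_retry(lambda: knowledge_write("pitfalls", entry)), where knowledge_write is whatever function shells out to knowledge append.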

Architecture

knowledge append → markdown file → directory tree
knowledge search → flat-text index → assembled markdown
knowledge read   → tree walk → assembled markdown

Component	Lines	Purpose
`parser.rs`	127	`## field` → value extraction
`store.rs`	882	Index, append, search, read, dedup, prune, split
`main.rs`	493	CLI, stats, mount, daemon
`knowledge-py`	180	Standalone Python fallback

Binary size: ~1.4 MB release build
Dependencies: slug, fs2, serde, clap, chrono (+ notify for watch feature)
Runtime: zero runtime deps, no daemon required, no SQL, no embeddings

Why not SQLite?

At 500 files, rg beats SQLite. At 5,000 files, the auto-split engine keeps directories small. The filesystem IS the index — path components encode structure, grep handles search, ls handles listing.

Benefits:

Git-friendly: the store is just markdown files — version control works naturally
No migration: add a new field, old entries just lack it; no ALTER TABLE
No global lock: locking is per file, not database-level, so writers only contend when they touch the same file
Transparent: ls, grep, cat all work directly on the store
Zero setup: no daemon, no connection string, no migrations

If you need JOINs, aggregates, or complex queries — add a SQLite read cache. But you probably don't.

Development

# Clone
git clone https://github.com/workswithagents/knowledge-db
cd knowledge-db

# Build
cargo build

# Run tests
cargo test

# Release build
cargo build --release

# Run locally ($'...' makes bash/zsh expand \n into real newlines)
cargo run -- append test $'## topic\ntesting\n\n## detail\nhello'

# With watch feature
cargo run --features watch -- daemon

Publishing to crates.io

cargo publish

License

MIT
