GrayMatter

Three lines of code to give your AI agents persistent memory and cut token usage by 90%.

One binary. Drop it in. Run it. No Docker, no databases, no config files, no cloud accounts, no bullshit.

General-purpose MCP server. Zero vendor lock-in.
Works with Claude Code, Cursor, Codex, OpenCode, Antigravity — and any MCP-compatible client.
Also a plain Go library if you don't use MCP.

Free. Offline. No account required.

ctx := context.Background()
mem := graymatter.New(".graymatter")
mem.Remember(ctx, "agent", "user prefers bullet points, hates long intros")
facts, _ := mem.Recall(ctx, "agent", "how should I format this response?")
// ["user prefers bullet points, hates long intros"]

Why

Every AI agent is stateless by default. Each run re-injects the full conversation history — and that history grows linearly. Two prompts in and you've already burned half of your daily quota.

That's not just a memory problem. That's a money and performance problem.

Mem0, Zep, Supermemory solve this — but they're Python/TypeScript-only and require a running server. The Go ecosystem has no production-ready, embeddable, zero-dependency memory layer for agents.

That gap is GrayMatter.

~90% reduction in context tokens at 100 sessions, versus full-history injection (see Token efficiency).
Context quality improves over time as consolidation surfaces only what matters.
No Docker. No Redis. No API key required for storage.

Drop it in once. It auto-connects to Claude Code, Cursor, Codex, OpenCode, Antigravity — any MCP-compatible client picks it up automatically.

Observability

You can't improve what you can't see.

graymatter tui opens a live terminal dashboard with everything your agent memory is doing — no extra setup required.

What you get at a glance:

Facts — total stored, distributed across agents
Memory cost — KB on disk (text + embeddings), not tokens
Recalls — cumulative access count across all sessions
Health — percentage of facts above relevance threshold (weight > 0.5)
Token cost (30d) — real spend breakdown by model, with cache hit rate
Agent activity — facts vs recalls per agent, side by side
Weight distribution — how consolidated your memory is over time
Activity timeline — facts created per day, last 30 days
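The Health figure in the list above is a simple ratio. A minimal sketch of how such a number could be computed, assuming each fact carries a float weight and the threshold is strict (`> 0.5`) as stated; the function name and sample weights are hypothetical, not GrayMatter's API:

```go
package main

import "fmt"

// healthPercent returns the share of weights strictly above threshold,
// as a percentage in [0, 100]. An empty store reports 0.
func healthPercent(weights []float64, threshold float64) float64 {
	if len(weights) == 0 {
		return 0
	}
	above := 0
	for _, w := range weights {
		if w > threshold {
			above++
		}
	}
	return 100 * float64(above) / float64(len(weights))
}

func main() {
	weights := []float64{0.9, 0.6, 0.5, 0.1} // hypothetical fact weights
	fmt.Printf("health: %.0f%%\n", healthPercent(weights, 0.5)) // prints "health: 50%"
}
```

A fact sitting exactly at 0.5 does not count as healthy under this reading.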

The dashboard auto-refreshes every 5 seconds. Press 1–4 to switch tabs, r to force refresh, q to quit.

Install

Binary (recommended):

# Linux (x86_64)
curl -sSL -o graymatter.tar.gz https://github.com/angelnicolasc/graymatter/releases/download/v0.5.1/graymatter_0.5.1_linux_amd64.tar.gz
tar -xzf graymatter.tar.gz
sudo mv graymatter /usr/local/bin/

# Linux (ARM64)
curl -sSL -o graymatter.tar.gz https://github.com/angelnicolasc/graymatter/releases/download/v0.5.1/graymatter_0.5.1_linux_arm64.tar.gz
tar -xzf graymatter.tar.gz
sudo mv graymatter /usr/local/bin/

# macOS (Apple Silicon)
curl -sSL -o graymatter.tar.gz https://github.com/angelnicolasc/graymatter/releases/download/v0.5.1/graymatter_0.5.1_darwin_arm64.tar.gz
tar -xzf graymatter.tar.gz
sudo mv graymatter /usr/local/bin/

# Windows (PowerShell)
iwr https://github.com/angelnicolasc/graymatter/releases/download/v0.5.1/graymatter_0.5.1_windows_amd64.zip -OutFile graymatter.zip
Expand-Archive graymatter.zip -DestinationPath .\graymatter_cli

Go install:

go install github.com/angelnicolasc/graymatter/cmd/graymatter@latest

Library:

go get github.com/angelnicolasc/graymatter

MCP clients (drop-in)

graymatter init

One command auto-wires GrayMatter into every supported client at once. Existing entries from other MCP servers are merged, not overwritten — safe to run in any repo.

Client	Config file auto-wired	Scope
Claude Code	`.mcp.json`	project
Cursor	`.cursor/mcp.json`	project
Codex (OpenAI)	`~/.codex/config.toml`	home
OpenCode	`opencode.jsonc`	project
Antigravity (Google)	`mcp_config.json`	project (opt-in: `--with-antigravity`)
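To illustrate what "merged, not overwritten" can mean for the JSON-based configs, here is a minimal sketch that adds one server entry and leaves everything else in the file alone. It assumes the `mcpServers` layout shown under Global install; `mergeServer`, `hasServer`, and `serverEntry` are hypothetical names, and this is not the code `graymatter init` actually runs (the Codex TOML file would need a different encoder):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// serverEntry mirrors the command/args shape used in the JSON config.
type serverEntry struct {
	Command string   `json:"command"`
	Args    []string `json:"args"`
}

// mergeServer adds or replaces one entry under "mcpServers" and keeps every
// other top-level key and every other server untouched.
func mergeServer(existing []byte, name string, entry serverEntry) ([]byte, error) {
	root := map[string]json.RawMessage{}
	if len(existing) > 0 {
		if err := json.Unmarshal(existing, &root); err != nil {
			return nil, fmt.Errorf("parse existing config: %w", err)
		}
	}
	servers := map[string]json.RawMessage{}
	if raw, ok := root["mcpServers"]; ok {
		if err := json.Unmarshal(raw, &servers); err != nil {
			return nil, fmt.Errorf("parse mcpServers: %w", err)
		}
	}
	if servers == nil { // "mcpServers": null in the existing file
		servers = map[string]json.RawMessage{}
	}
	encoded, err := json.Marshal(entry)
	if err != nil {
		return nil, err
	}
	servers[name] = encoded
	merged, err := json.Marshal(servers)
	if err != nil {
		return nil, err
	}
	root["mcpServers"] = merged
	return json.MarshalIndent(root, "", "  ")
}

// hasServer reports whether config contains an entry called name.
func hasServer(config []byte, name string) bool {
	var root struct {
		Servers map[string]json.RawMessage `json:"mcpServers"`
	}
	if err := json.Unmarshal(config, &root); err != nil {
		return false
	}
	_, ok := root.Servers[name]
	return ok
}

func main() {
	existing := []byte(`{"mcpServers":{"other":{"command":"other-server","args":[]}}}`)
	out, err := mergeServer(existing, "graymatter", serverEntry{Command: "graymatter", Args: []string{"mcp", "serve"}})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Decoding into `json.RawMessage` keeps unknown fields byte-for-byte, so entries written by other tools survive the round trip.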

Narrow down what gets wired:

graymatter init --only claudecode,cursor     # whitelist
graymatter init --skip-codex --skip-opencode # blacklist
graymatter init --with-antigravity           # include opt-in clients

Then restart your editor (or toggle the MCP server off/on in its settings). Five tools become available:

Tool	What it does
`memory_search`	Recall facts for a query
`memory_add`	Store a new fact
`checkpoint_save`	Snapshot current session
`checkpoint_resume`	Restore last checkpoint
`memory_reflect`	Add / update / forget / link memories (agent self-edit)

Agents using these tools should read docs/AGENTS.md — when to store vs. checkpoint, query patterns, anti-patterns, and the exact per-tool parameter names (heads-up: memory_reflect uses agent, the other four use agent_id).

Any other MCP-compatible client

GrayMatter speaks plain MCP. If your client isn't on the table above, point it at the binary:

graymatter mcp serve              # stdio transport
graymatter mcp serve --http :8080 # HTTP transport

The schema is identical to every other MCP server — command + args: ["mcp", "serve"]. No proprietary glue.

Global install (all projects)

If you'd rather not run graymatter init in every repo, drop the same JSON into the editor's global config — ~/.cursor/mcp.json for Cursor, ~/.claude/mcp.json for Claude Code:

{
  "mcpServers": {
    "graymatter": {
      "command": "graymatter",
      "args": ["mcp", "serve"]
    }
  }
}

graymatter must be on PATH. The init command handles this automatically on Windows via the User PATH registry; on macOS / Linux the recommended install path /usr/local/bin is already on PATH.

How memories get stored

There are four ways a fact ends up in the store. You don't have to pick one — they compose:

Path	Who calls it	When to use
`mem.Remember(ctx, agent, text)`	Your code, explicitly	You already know the exact string worth keeping.
`mem.RememberExtracted(ctx, agent, llmResponse)`	Your code, on raw LLM output	You want GrayMatter to pull atomic facts out of a full response for you (LLM-assisted; falls back to storing the raw text if no API key is set).
`memory_reflect` (MCP tool)	The LLM itself, mid-session	Claude Code / Cursor agents self-curate: add, update, forget, or link memories when they notice a contradiction, finish a task, or learn a preference.
`Consolidate` (async, on by default)	Background goroutine	Summarises, decays, and prunes over time. Runs automatically after writes once `ConsolidateThreshold` is hit.

Forgetting a single Remember call is not fatal. memory_reflect lets the agent fix its own memory as it works, and Consolidate curates the store over time. That's why long interactive sessions in Claude Code Desktop and Cursor are a sweet spot for GrayMatter — not only 24/7 autonomous agents. The LLM maintains its own memory through MCP.

Library usage

Three functions cover 95% of use cases. All methods accept context.Context as the first argument so timeouts and cancellation propagate end-to-end — no wrappers needed.

import (
    "context"
    "log"

    "github.com/angelnicolasc/graymatter"
)

ctx := context.Background()

// Open (or create) a memory store in the given directory.
mem := graymatter.New(".graymatter")
defer mem.Close()

// Always check health in production — New() never panics, but it may degrade
// to no-op mode if the data dir is unwritable or bbolt fails to open.
if !mem.Healthy() {
    log.Fatalf("graymatter: %v", mem.Status().InitError)
}

// Store an observation.
mem.Remember(ctx, "sales-closer", "Maria didn't reply Wednesday. Third touchpoint due Friday.")

// Retrieve relevant context for a query.
facts, _ := mem.Recall(ctx, "sales-closer", "follow up Maria")
// ["Maria didn't reply Wednesday. Third touchpoint due Friday."]

Context propagates everywhere — timeouts and traces work as expected:

ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
defer cancel()

if err := mem.Remember(ctx, "agent", "observation"); err != nil {
    log.Printf("remember: %v", err)
}
results, err := mem.Recall(ctx, "agent", "query")
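To see what deadline propagation looks like from the caller's side without a real store, here is a self-contained sketch. `slowStore` and `rememberWithin` are hypothetical stand-ins that only mimic the `Remember(ctx, agent, text)` shape; they say nothing about how GrayMatter handles cancellation internally:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// slowStore is a hypothetical stand-in for a memory store whose writes take
// a fixed amount of time. It is not part of GrayMatter.
type slowStore struct {
	latency time.Duration
}

// Remember waits for the simulated write to finish, or returns early with
// ctx.Err() if the context is cancelled or its deadline passes first.
func (s slowStore) Remember(ctx context.Context, agent, text string) error {
	select {
	case <-time.After(s.latency):
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// rememberWithin applies a timeout to a single write.
func rememberWithin(s slowStore, timeout time.Duration, agent, text string) error {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()
	return s.Remember(ctx, agent, text)
}

func main() {
	store := slowStore{latency: time.Second}
	err := rememberWithin(store, 20*time.Millisecond, "agent", "observation")
	if errors.Is(err, context.DeadlineExceeded) {
		fmt.Println("write abandoned: deadline exceeded")
	}
}
```

Checking with `errors.Is` rather than `==` keeps the test valid if the error is wrapped on the way up.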

Full agent pattern

ctx := context.Background()
mem := graymatter.New(project.Root + "/.graymatter")
defer mem.Close()
if !mem.Healthy() {
    log.Fatalf("graymatter: %v", mem.Status().InitError)
}

// 1. Recall before calling the LLM.
memCtx, _ := mem.Recall(ctx, skill.Name, task.Description)

messages := []anthropic.MessageParam{
    {Role: "system", Content: skill.Identity + "\n\n## Memory\n" + strings.Join(memCtx, "\n")},
    {Role: "user",   Content: task.Description},
}

// 2. Call your LLM.
response, _ := client.Messages.New(ctx, anthropic.MessageNewParams{...})

// 3a. If you already have a clean string worth keeping, store it directly.
mem.Remember(ctx, skill.Name, "Maria prefers Slack over email; replies within 2h.")

// 3b. Or let GrayMatter pull atomic facts out of the raw response for you.
//     responseText is the text content you extract from response above.
//     Uses ANTHROPIC_API_KEY if set; otherwise stores the raw text as a single fact.
mem.RememberExtracted(ctx, skill.Name, responseText)

Inside Claude Code / Cursor you don't need either call: the LLM uses the memory_reflect MCP tool to self-curate. See MCP clients (drop-in) above.
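For library users, step 1 of the pattern (folding recalled facts into the system prompt) can be pulled into a small helper. This is a sketch of one possible variation, not part of the library: `buildSystemPrompt` is a hypothetical name, and it uses one bullet per fact instead of the plain `strings.Join` shown above.

```go
package main

import (
	"fmt"
	"strings"
)

// buildSystemPrompt appends recalled facts to the agent identity under a
// "## Memory" heading, one bullet per fact. With no facts, the identity is
// returned unchanged so the prompt carries no empty section.
func buildSystemPrompt(identity string, facts []string) string {
	if len(facts) == 0 {
		return identity
	}
	var b strings.Builder
	b.WriteString(identity)
	b.WriteString("\n\n## Memory\n")
	for _, f := range facts {
		b.WriteString("- ")
		b.WriteString(strings.TrimSpace(f))
		b.WriteString("\n")
	}
	return b.String()
}

func main() {
	facts := []string{"Maria prefers Slack over email; replies within 2h."}
	fmt.Print(buildSystemPrompt("You are a sales assistant.", facts))
}
```

Skipping the heading when recall returns nothing avoids spending tokens on an empty section during an agent's first run.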

Config

mem, err := graymatter.NewWithConfig(graymatter.Config{
    DataDir:          ".graymatter",
    TopK:             8,
    EmbeddingMode:    graymatter.EmbeddingAuto,  // Ollama → OpenAI → Anthropic → keyword
    OllamaURL:        "http://localhost:11434",
    OllamaModel:      "nomic-embed-text",
    AnthropicAPIKey:  os.Getenv("ANTHROPIC_API_KEY"),
    OpenAIAPIKey:     os.Getenv("OPENAI_API_KEY"),
    DecayHalfLife:    30 * 24 * time.Hour,        // 30 days
    AsyncConsolidate: true,
})
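To build intuition for `DecayHalfLife`, here is a sketch of textbook exponential half-life decay. The README does not spell out GrayMatter's exact decay formula, so the function below is an assumption for illustration only, and `decayedWeight` is a hypothetical name:

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// decayedWeight halves the weight once per elapsed half-life:
// weight * 0.5^(age/halfLife). A non-positive half-life or age leaves the
// weight unchanged.
func decayedWeight(weight float64, age, halfLife time.Duration) float64 {
	if halfLife <= 0 || age <= 0 {
		return weight
	}
	return weight * math.Pow(0.5, float64(age)/float64(halfLife))
}

func main() {
	halfLife := 30 * 24 * time.Hour // same value as the config above
	for _, days := range []int{0, 30, 60, 90} {
		age := time.Duration(days) * 24 * time.Hour
		fmt.Printf("day %2d: %.3f\n", days, decayedWeight(1.0, age, halfLife))
	}
}
```

Under this assumption a fact that is never recalled drops to 0.5 after 30 days, 0.25 after 60, and 0.125 after 90.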

CLI

graymatter init                                    # create .graymatter/ + .mcp.json
graymatter remember "agent" "text to remember"    # store a fact
graymatter remember --shared "text"               # store in shared namespace (all agents)
graymatter recall   "agent" "query"               # print context
graymatter recall   --all "agent" "query"         # merge agent + shared memory
graymatter checkpoint list    "agent"             # show saved checkpoints
graymatter checkpoint resume  "agent"             # print latest checkpoint as JSON
graymatter mcp serve                              # start MCP server (Claude Code / Cursor)
graymatter mcp serve --http :8080                 # HTTP transport
graymatter export --format obsidian --out ~/vault # dump to Obsidian vault
graymatter tui                                    # 4-view terminal UI
graymatter run agent.md [--background]            # run a SKILL.md agent file
graymatter sessions list                          # list managed agent sessions
graymatter plugin install manifest.json           # install a plugin
graymatter server --addr :8080                    # REST API server

Global flags: --dir (data dir), --quiet, --json

Memory lifecycle

Recall(agent, task)          ← hybrid: vector + keyword + recency → top-8 facts
    ↓
Inject into system prompt    ← your 3 lines of code
    ↓
Agent runs
    ↓
Remember(agent, observation) ← store key facts during/after run
    ↓
Consolidate() [async]        ← summarise + decay + prune (LLM optional)

Consolidation is the only "smart" step. Everything else is deterministic. Without consolidation, GrayMatter still works — it just doesn't compress over time.

Consolidation auto-enables when ANTHROPIC_API_KEY is set. To use Ollama:

cfg := graymatter.DefaultConfig()
cfg.ConsolidateLLM = "ollama"
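Back at the first step of the lifecycle, Recall combines vector, keyword, and recency signals, and the test suite mentions RRF fusion. A minimal sketch of reciprocal rank fusion over three ranked lists follows. The constant `k = 60`, the equal weighting of the lists, and the name `fuseRRF` are assumptions; GrayMatter's actual parameters may differ:

```go
package main

import (
	"fmt"
	"sort"
)

// fuseRRF merges several ranked lists of fact IDs with reciprocal rank fusion:
// each appearance at 1-based rank r contributes 1/(k+r) to that ID's score.
// It returns at most topK IDs, best first; ties break by ID for determinism.
func fuseRRF(rankings [][]string, k float64, topK int) []string {
	scores := map[string]float64{}
	for _, ranking := range rankings {
		for i, id := range ranking {
			scores[id] += 1.0 / (k + float64(i+1))
		}
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(a, b int) bool {
		if scores[ids[a]] != scores[ids[b]] {
			return scores[ids[a]] > scores[ids[b]]
		}
		return ids[a] < ids[b]
	})
	if topK >= 0 && len(ids) > topK {
		ids = ids[:topK]
	}
	return ids
}

func main() {
	vector := []string{"f1", "f2", "f3"}  // hypothetical semantic ranking
	keyword := []string{"f2", "f4", "f1"} // hypothetical keyword ranking
	recency := []string{"f4", "f2", "f5"} // hypothetical newest-first ranking
	fmt.Println(fuseRRF([][]string{vector, keyword, recency}, 60, 8)) // prints "[f2 f4 f1 f3 f5]"
}
```

RRF only looks at ranks, so the three signals do not need to share a score scale before they are combined.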

Token efficiency

Numbers produced by go run ./benchmarks/token_count — real Recall calls, keyword embedder, no LLM required:

Sessions	Full injection	GrayMatter	Reduction
1	~80 tokens	~80 tokens	0%
10	~630 tokens	~550 tokens	12%
30	~1,880 tokens	~550 tokens	71%
100	~6,960 tokens	~670 tokens	90%

Each "session" = one paragraph-length agent observation (~60 words). GrayMatter always injects only the top-8 most relevant observations for the query. With vector embeddings, recall precision improves while the reduction ratios stay similar.
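The Reduction column follows directly from the two token counts. A small sketch of that arithmetic, using the table's rounded figures as input; because those figures are approximate, the computed values can differ from the table by about a point. `reductionPercent` is a hypothetical helper, not part of the benchmark:

```go
package main

import "fmt"

// reductionPercent returns how much smaller the injected context is than the
// full history, as a percentage. A non-positive full size yields 0.
func reductionPercent(fullTokens, injectedTokens int) float64 {
	if fullTokens <= 0 {
		return 0
	}
	return 100 * float64(fullTokens-injectedTokens) / float64(fullTokens)
}

func main() {
	rows := []struct{ sessions, full, injected int }{
		{1, 80, 80}, {10, 630, 550}, {30, 1880, 550}, {100, 6960, 670},
	}
	for _, r := range rows {
		fmt.Printf("%3d sessions: %.1f%%\n", r.sessions, reductionPercent(r.full, r.injected))
	}
}
```

The injected size stays nearly flat once more than eight observations exist, which is why the reduction keeps growing with session count.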

Reproduce locally:

go run ./benchmarks/token_count

Storage

Layer	Tech	What it holds
KV store	bbolt (pure Go, ACID)	Sessions, checkpoints, facts, metadata, KG
Vector index	chromem-go (pure Go)	Semantic embeddings, hybrid retrieval
Export	Markdown files	Human-readable, git-friendly, Obsidian-compatible

Single file: .graymatter/gray.db
Single folder: .graymatter/vectors/

No migrations. No schema versions. Append-only with decay-based eviction.

Embeddings

GrayMatter degrades gracefully. It works without any embedding model.

Mode	When
Ollama (default)	Machine has Ollama running with `nomic-embed-text`
OpenAI	`OPENAI_API_KEY` set, Ollama not available
Anthropic	`ANTHROPIC_API_KEY` set, Ollama and OpenAI not available
Keyword-only	No embedding available — TF-IDF + recency, zero deps

Auto-detection order in EmbeddingAuto mode: Ollama → OpenAI → Anthropic → keyword.

# Pull the embedding model once (Ollama):
ollama pull nomic-embed-text

# Or set an API key (OpenAI or Anthropic):
export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
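The fallback chain described above (Ollama, then OpenAI, then Anthropic, then keyword) can be sketched as a plain switch. The `probe` struct and `pickEmbedder` are hypothetical names for illustration; how GrayMatter actually detects a running Ollama instance is not covered here:

```go
package main

import "fmt"

// probe describes what is available on the machine. The field names are
// hypothetical; they stand in for whatever checks GrayMatter performs.
type probe struct {
	OllamaReachable bool
	OpenAIKey       string
	AnthropicKey    string
}

// pickEmbedder walks the documented fallback chain and returns the first
// mode whose precondition holds, ending at keyword-only.
func pickEmbedder(p probe) string {
	switch {
	case p.OllamaReachable:
		return "ollama"
	case p.OpenAIKey != "":
		return "openai"
	case p.AnthropicKey != "":
		return "anthropic"
	default:
		return "keyword"
	}
}

func main() {
	both := probe{OpenAIKey: "sk-example", AnthropicKey: "sk-ant-example"}
	fmt.Println(pickEmbedder(both))    // prints "openai"
	fmt.Println(pickEmbedder(probe{})) // prints "keyword"
}
```

Because the last case has no precondition, the chain always resolves to something usable, which matches the "degrades gracefully" claim.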

Testing

The full test suite requires no LLM and no network — every test uses t.TempDir() with a keyword embedder or injected stubs. Runs clean on Linux, macOS, and Windows in CI.

# Core library
go test -count=1 -timeout=120s ./pkg/memory/...

# CLI / server / plugins
cd cmd/graymatter && go test -count=1 -timeout=120s ./internal/...

Package	Tests	What's covered
`pkg/memory`	42 unit tests + 3 fuzz targets	Store lifecycle, hybrid recall, RRF fusion, decay math, semaphore, concurrent writes, vector paths, dimension guard
`internal/harness`	21	Agent file parsing, retry/backoff, session recovery
`internal/kg`	21	Graph CRUD, entity extraction, weight decay, Obsidian export
`internal/server`	11	All REST endpoints, concurrent remember/recall, cancelled-context requests
`internal/plugin`	10	Install, list, remove, E2E echo plugin binary

Fuzz targets (pkg/memory): FuzzTokenize, FuzzUnmarshalFact, FuzzKeywordScore — each with a seeded corpus so they run deterministically in CI and can be extended with go test -fuzz.

Core library coverage: 73.5% (CI gate: ≥ 70%). Measured without mocks — real bbolt + chromem-go instances in a temp directory.

Token-reduction benchmark (also zero deps):

go run ./benchmarks/token_count

Build from source

git clone https://github.com/angelnicolasc/graymatter
cd graymatter
CGO_ENABLED=0 go build -ldflags="-s -w -X main.version=dev" -o graymatter ./cmd/graymatter

Output: single static binary, ~10 MB, no runtime dependencies.

Metrics & APM hooks

The REST server (graymatter server) exposes a /metrics endpoint powered by Go's standard expvar package — zero extra dependencies.

GET /metrics

{
  "requests_total":     {"remember": 120, "recall": 340, "healthz": 5},
  "request_latency_us": {"remember": 4200, "recall": 1800},
  "facts_total":        {"stored": 120},
  "recall_total":       {"served": 340}
}
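For readers unfamiliar with expvar, here is a minimal sketch of publishing a per-endpoint counter and serving it at /metrics. It assumes nothing about GrayMatter's internals: `countRequest` and `requestCount` are hypothetical helpers, and the stock `expvar.Handler()` also emits `cmdline` and `memstats`, which the sample payload above does not show.

```go
package main

import (
	"expvar"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// requestsTotal is published under the same name as in the sample payload.
var requestsTotal = expvar.NewMap("requests_total")

// countRequest increments the per-endpoint counter.
func countRequest(endpoint string) {
	requestsTotal.Add(endpoint, 1)
}

// requestCount reads a counter back; unknown endpoints report 0.
func requestCount(endpoint string) int64 {
	if v, ok := requestsTotal.Get(endpoint).(*expvar.Int); ok {
		return v.Value()
	}
	return 0
}

func main() {
	countRequest("remember")
	countRequest("recall")
	countRequest("recall")

	// expvar.Handler serves every published variable as one JSON object.
	mux := http.NewServeMux()
	mux.Handle("/metrics", expvar.Handler())

	rec := httptest.NewRecorder()
	mux.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/metrics", nil))
	fmt.Println("status:", rec.Code)
	fmt.Println("recall:", requestCount("recall"))
}
```

`expvar.Map.Add` is safe for concurrent use, so handlers can call `countRequest` without extra locking.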

For library users, memory.StoreConfig exposes hooks for APM integration:

store, err := memory.Open(memory.StoreConfig{
    DataDir:       ".graymatter",
    DecayHalfLife: 30 * 24 * time.Hour,

    // Called after every Recall with agent ID, query, result count, and latency.
    OnRecall: func(agentID, query string, n int, d time.Duration) {
        metrics.RecordHistogram("graymatter.recall.latency", d.Seconds())
    },

    // Called after every successful Put with agent ID, fact ID, and latency.
    OnPut: func(agentID, factID string, d time.Duration) {
        metrics.Increment("graymatter.facts.stored")
    },

    // Called when a vector upsert fails after the bbolt write succeeded.
    // The fact is durably queued and retried on the next reconcile tick.
    OnVectorIndexError: func(agentID, factID string, err error) {
        log.Printf("vector index lag: agent=%s fact=%s err=%v", agentID, factID, err)
    },

    // How often to drain the pending-vector queue (default 30s, 0 disables).
    VectorReconcileInterval: 30 * time.Second,

    // Routes internal log events to any standard logger.
    Logger: slog.NewLogLogger(slog.Default().Handler(), slog.LevelDebug),

    // Swap the vector backend entirely — bring your own Qdrant, pgvector, etc.
    VectorBackend: myQdrantAdapter,
})
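If no metrics client is wired in yet, a hook can feed a small in-process collector instead. The sketch below assumes only the `OnRecall` parameter list shown above (`agentID, query string, n int, d time.Duration`); `recallStats` is a hypothetical type, not something the library ships:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// recallStats accumulates recall counts and latency. It is a hypothetical
// in-process collector for use before a real metrics client exists.
type recallStats struct {
	mu    sync.Mutex
	calls int
	total time.Duration
	max   time.Duration
}

// Observe has the same parameter list as the OnRecall hook, so the method
// value stats.Observe can be assigned to that field directly.
func (s *recallStats) Observe(agentID, query string, n int, d time.Duration) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.calls++
	s.total += d
	if d > s.max {
		s.max = d
	}
}

// Mean returns the average recall latency, or 0 before the first call.
func (s *recallStats) Mean() time.Duration {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.calls == 0 {
		return 0
	}
	return s.total / time.Duration(s.calls)
}

func main() {
	stats := &recallStats{}
	var onRecall func(agentID, query string, n int, d time.Duration) = stats.Observe
	onRecall("agent", "query", 8, 2*time.Millisecond)
	onRecall("agent", "query", 3, 4*time.Millisecond)
	fmt.Println(stats.Mean()) // prints "3ms"
}
```

The mutex matters because hooks may fire from concurrent Recall calls.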

What GrayMatter is NOT

Not tied to any vendor. It's an MCP server + Go library — not a Claude-Code-only or Cursor-only tool.
Not a framework. Not an agent runner. Not a replacement for your existing tooling.
Not a hosted service. Not a SaaS. Not a cloud product.
Not a knowledge base UI. Not Notion. Not Obsidian.
Not trying to win the enterprise memory market.

It is exactly one thing: the missing stateful layer for Go CLI agents, packaged as a library you import in three lines.

Roadmap

GrayMatter — v0.5.1 — April 2026
