Augmented Memory Protocol (AMP)

A specification for how persistent memory gets injected into LLM calls.

AMP defines the context assembly pipeline — the layer between memory retrieval and prompt construction. It standardizes how external memory enters the model's context, regardless of how that memory is stored or fetched.

The Problem

LLMs have a context window — the tokens they can see in a single call. Memory systems extend this with a context space — the full persistent memory available for retrieval and injection. But there is no standard for how context moves from the space into the window.

Today, every memory system invents its own injection approach:

Some append plain text to the last user message
Some prepend to the system prompt
Some use tool calls to let the model pull memory on demand
Some use a multi-turn loop where the model iteratively queries memory

These approaches have different tradeoffs in latency, quality, and cost. AMP standardizes the interface so memory providers and LLM clients can interoperate.

Where AMP Fits

┌─────────────────────────────────────────────────┐
│                  LLM Provider                    │
│              (Anthropic, OpenAI, etc.)           │
└──────────────────────┬──────────────────────────┘
                       │
              ┌────────▼────────┐
              │   AMP Layer     │  ← context assembly,
              │                 │    gating, budgeting,
              │  (this spec)    │    injection, provenance
              └────────┬────────┘
                       │
              ┌────────▼────────┐
              │  Memory Source   │  ← MCP servers, REST APIs,
              │                 │    vector DBs, knowledge graphs,
              │ (any protocol)  │    local files, etc.
              └─────────────────┘

AMP operates above data fetching protocols like MCP and below the LLM API call. It is transport-agnostic and storage-agnostic.

Core Concepts

Context Window

The number of tokens an LLM can process in a single call. This is a model constraint (e.g., 200K tokens for Claude, 128K for GPT-4).

Context Space

The total persistent memory available for retrieval and injection. This is a system property, not a model property. A context space might contain millions of chunks spanning years of conversations, documents, and knowledge — far exceeding any model's context window.

Context Assembly

The process of selecting, ranking, budgeting, and formatting memory from the context space for injection into the context window. This is what AMP standardizes.

What AMP Specifies

1. Injection Strategies

How memory enters the prompt. AMP defines three standard strategies:

Append — memory appended to the last user message as formatted text
System — memory injected into the system prompt
Tool — memory delivered via tool call results in a multi-turn loop

Each strategy has defined behavior, tradeoffs, and compatibility notes.

2. Quality Gating

When to skip injection entirely. AMP defines a gating pipeline:

Gate 1: Rule-based — skip for greetings, commands, trivial queries (0ms)
Gate 2: Semantic routing — embedding similarity to determine if memory is relevant (0ms marginal)
Gate 3: Post-retrieval — relevance threshold on retrieved results

3. Token Budgeting

How to fit context space content into the context window:

Maximum injection budget (absolute tokens or percentage of window)
Chunk prioritization (relevance score, recency, diversity)
Truncation strategy (per-chunk limit, total limit)

4. Provenance

How injected content is attributed:

Source identifier (which memory system provided it)
Retrieval metadata (score, timestamp, session origin)
Injection metadata (strategy used, budget consumed)

5. Multi-turn Retrieval

The iterative approach where the model queries memory via tool calls:

Tool schema for memory search
Maximum round limit
Termination conditions
Fallback to single-shot injection

6. Session Exclusion

Preventing the current conversation from being injected as context:

Session identification
Recency filters
Deduplication rules

AMP vs MCP

AMP and MCP (Model Context Protocol) are complementary specifications that operate at different layers. They do not compete.

	MCP	AMP
Purpose	How clients discover and fetch data from servers	How fetched data gets assembled and injected into LLM prompts
Scope	Server discovery, resources, tools, prompts, sampling	Context assembly, gating, budgeting, injection, provenance
Layer	Between LLM client and data/tool servers	Between memory retrieval and the LLM API call
Transport	JSON-RPC 2.0 over stdio/HTTP	Transport-agnostic (works with any retrieval mechanism)
Question answered	"How do I get memory data?"	"How does memory data enter the prompt?"

MCP gets data to the client but does not specify what happens next. AMP picks up where MCP stops — defining how retrieved memory is selected, ranked, budgeted, formatted, and placed into the LLM request.

An AMP host can use MCP to fetch memory from servers, or it can use REST APIs, local vector databases, or any other retrieval mechanism. The two protocols compose naturally:

LLM Client → AMP Host → MCP Server → Vector DB
                 ↓
            LLM Provider

For the full technical comparison, see Appendix A of the specification.

Specification

See spec/ for the full specification.

Reference Implementation

Memoryport implements AMP across its proxy, MCP server, and hosted API.

Status

AMP is in early development. The specification is not yet stable.

License

This specification is licensed under Apache-2.0.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
spec		spec
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Augmented Memory Protocol (AMP)

The Problem

Where AMP Fits

Core Concepts

Context Window

Context Space

Context Assembly

What AMP Specifies

1. Injection Strategies

2. Quality Gating

3. Token Budgeting

4. Provenance

5. Multi-turn Retrieval

6. Session Exclusion

AMP vs MCP

Specification

Reference Implementation

Status

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

Augmented Memory Protocol (AMP)

The Problem

Where AMP Fits

Core Concepts

Context Window

Context Space

Context Assembly

What AMP Specifies

1. Injection Strategies

2. Quality Gating

3. Token Budgeting

4. Provenance

5. Multi-turn Retrieval

6. Session Exclusion

AMP vs MCP

Specification

Reference Implementation

Status

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Packages