t8/amp-spec

Augmented Memory Protocol (AMP)

A specification for how persistent memory gets injected into LLM calls.

AMP defines the context assembly pipeline — the layer between memory retrieval and prompt construction. It standardizes how external memory enters the model's context, regardless of how that memory is stored or fetched.

The Problem

LLMs have a context window — the tokens they can see in a single call. Memory systems extend this with a context space — the full persistent memory available for retrieval and injection. But there is no standard for how context moves from the space into the window.

Today, every memory system invents its own injection approach:

  • Some append plain text to the last user message
  • Some prepend to the system prompt
  • Some use tool calls to let the model pull memory on demand
  • Some use a multi-turn loop where the model iteratively queries memory

These approaches have different tradeoffs in latency, quality, and cost. AMP standardizes the interface so memory providers and LLM clients can interoperate.

Where AMP Fits

┌─────────────────────────────────────────────────┐
│                  LLM Provider                    │
│              (Anthropic, OpenAI, etc.)           │
└──────────────────────┬──────────────────────────┘
                       │
              ┌────────▼────────┐
              │   AMP Layer     │  ← context assembly,
              │                 │    gating, budgeting,
              │  (this spec)    │    injection, provenance
              └────────┬────────┘
                       │
              ┌────────▼────────┐
              │  Memory Source  │  ← MCP servers, REST APIs,
              │                 │    vector DBs, knowledge graphs,
              │ (any protocol)  │    local files, etc.
              └─────────────────┘

AMP operates above data-fetching protocols like MCP and below the LLM API call. It is transport-agnostic and storage-agnostic.

Core Concepts

Context Window

The maximum number of tokens an LLM can process in a single call. This is a model constraint (e.g., 200K tokens for Claude, 128K for GPT-4 Turbo).

Context Space

The total persistent memory available for retrieval and injection. This is a system property, not a model property. A context space might contain millions of chunks spanning years of conversations, documents, and knowledge — far exceeding any model's context window.

Context Assembly

The process of selecting, ranking, budgeting, and formatting memory from the context space for injection into the context window. This is what AMP standardizes.

What AMP Specifies

1. Injection Strategies

How memory enters the prompt. AMP defines three standard strategies:

  • Append — memory appended to the last user message as formatted text
  • System — memory injected into the system prompt
  • Tool — memory delivered via tool call results in a multi-turn loop

Each strategy has defined behavior, tradeoffs, and compatibility notes.
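A minimal Python sketch of the three strategies follows. It assumes chat messages are dicts with `role` and `content` keys, wraps memory in a hypothetical `<memory>` tag, and shapes the tool result like an OpenAI Chat Completions `tool` message. None of these names or formats come from the AMP spec.

```python
from typing import Any

# Assumed message shape: dicts with "role" and "content" (not defined by AMP).
Message = dict[str, Any]

def format_memory(chunks: list[str]) -> str:
    """Render retrieved chunks as one delimited text block (hypothetical <memory> tag)."""
    body = "\n".join(f"- {c}" for c in chunks)
    return f"<memory>\n{body}\n</memory>"

def inject_append(messages: list[Message], chunks: list[str]) -> list[Message]:
    """Append strategy: add memory to the end of the last user message."""
    out = [dict(m) for m in messages]  # copy each message so the caller's history is untouched
    for m in reversed(out):
        if m["role"] == "user":
            m["content"] = f'{m["content"]}\n\n{format_memory(chunks)}'
            break
    return out

def inject_system(system: str, chunks: list[str]) -> str:
    """System strategy: add memory to the system prompt."""
    block = format_memory(chunks)
    return f"{system}\n\n{block}" if system else block

def tool_result(call_id: str, chunks: list[str]) -> Message:
    """Tool strategy: return memory as the result of a model-issued search call."""
    return {"role": "tool", "tool_call_id": call_id, "content": format_memory(chunks)}
```

The append strategy copies the message list rather than mutating it, so the stored conversation history does not accumulate injected memory from turn to turn.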

2. Quality Gating

When to skip injection entirely. AMP defines a gating pipeline:

  • Gate 1: Rule-based — skip injection for greetings, commands, and trivial queries (~0 ms)
  • Gate 2: Semantic routing — use embedding similarity to decide whether memory is likely relevant (~0 ms marginal)
  • Gate 3: Post-retrieval — apply a relevance threshold to the retrieved results
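The three gates can be chained so that each cheap check runs before a more expensive one. The sketch below is illustrative only: the trivial-query regex, the idea of comparing the query embedding against a set of "needs memory" route embeddings, and both thresholds (0.5 and 0.7) are assumptions, not values from the spec.

```python
import math
import re

# Hypothetical Gate 1 rule: bare greetings/acknowledgements and slash commands.
TRIVIAL = re.compile(r"^\s*(hi|hello|hey|thanks|thank you|ok|okay|/\w+)\W*\s*$", re.IGNORECASE)

def gate_rules(query: str) -> bool:
    """Gate 1: return False (skip memory) for greetings, commands, and trivial queries."""
    return not TRIVIAL.match(query)

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def gate_semantic(query_vec: list[float], route_vecs: list[list[float]], threshold: float = 0.5) -> bool:
    """Gate 2: continue only if the query embedding is close to some 'needs memory' route."""
    return max((cosine(query_vec, r) for r in route_vecs), default=0.0) >= threshold

def gate_post_retrieval(results: list[tuple[str, float]], min_score: float = 0.7) -> list[tuple[str, float]]:
    """Gate 3: keep results at or above the threshold; an empty list means skip injection."""
    return [(text, score) for text, score in results if score >= min_score]
```

A host would call `gate_rules` first, then `gate_semantic` with the query embedding it already needs for retrieval, and finally `gate_post_retrieval` after the search.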

3. Token Budgeting

How to fit context space content into the context window:

  • Maximum injection budget (absolute tokens or percentage of window)
  • Chunk prioritization (relevance score, recency, diversity)
  • Truncation strategy (per-chunk limit, total limit)
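A greedy budgeting pass is sketched below. It assumes the budget is the smaller of an absolute cap and a fraction of the window (for a 200K-token window at 10%, that is 20,000 tokens), ranks chunks by relevance with recency as the tiebreak, and uses a whitespace word count as a stand-in for a real tokenizer. The defaults (10% of the window, 200 tokens per chunk) are hypothetical, and diversity-based reranking is omitted.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Chunk:
    text: str
    score: float      # retrieval relevance
    timestamp: float  # unix seconds; newer wins ties on score

def count_tokens(text: str) -> int:
    # Placeholder: whitespace word count. A real host would use the target model's tokenizer.
    return len(text.split())

def truncate(text: str, max_tokens: int) -> str:
    return " ".join(text.split()[:max_tokens])

def fit_budget(chunks: list[Chunk], window: int, max_tokens: Optional[int] = None,
               max_fraction: float = 0.1, per_chunk: int = 200) -> tuple[list[str], int]:
    """Select chunks by relevance (then recency) until the injection budget is spent."""
    budget = int(window * max_fraction)            # percentage-of-window budget
    if max_tokens is not None:
        budget = min(budget, max_tokens)           # absolute cap, if given
    ranked = sorted(chunks, key=lambda c: (c.score, c.timestamp), reverse=True)
    selected: list[str] = []
    used = 0
    for c in ranked:
        text = truncate(c.text, per_chunk)         # per-chunk limit
        cost = count_tokens(text)
        if used + cost > budget:                   # total limit
            continue
        selected.append(text)
        used += cost
    return selected, used
```

The loop uses `continue` rather than `break` when a chunk does not fit, so a shorter, lower-ranked chunk can still use the remaining budget.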

4. Provenance

How injected content is attributed:

  • Source identifier (which memory system provided it)
  • Retrieval metadata (score, timestamp, session origin)
  • Injection metadata (strategy used, budget consumed)
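One way to carry the three kinds of provenance with each injected chunk is a small record serialized into a header line. The field names, the `[provenance ...]` header format, and the choice of JSON are assumptions for illustration, not part of the spec.

```python
import json
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class Provenance:
    # Field names are illustrative, not taken from the AMP spec.
    source_id: str     # source identifier: which memory system provided the chunk
    score: float       # retrieval metadata: relevance score
    timestamp: str     # retrieval metadata: when the memory was recorded (ISO 8601)
    session_id: str    # retrieval metadata: conversation the memory came from
    strategy: str      # injection metadata: "append", "system", or "tool"
    tokens_used: int   # injection metadata: budget consumed by this chunk

def attribute(text: str, p: Provenance) -> str:
    """Prefix a chunk with a compact, machine-readable attribution header."""
    header = json.dumps(asdict(p), sort_keys=True)
    return f"[provenance {header}]\n{text}"
```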

5. Multi-turn Retrieval

The iterative approach where the model queries memory via tool calls:

  • Tool schema for memory search
  • Maximum round limit
  • Termination conditions
  • Fallback to single-shot injection
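The loop below is a sketch with the model abstracted as a callable that returns either a search query (standing in for a tool call) or `None` (done). The `memory_search` tool name and schema are hypothetical. The termination rules (no tool call, a repeated query, or the round limit) and falling back to single-shot injection when nothing was retrieved are assumptions about how the listed requirements might be met.

```python
from typing import Callable, Optional

# Hypothetical tool definition; the input is described with JSON Schema.
MEMORY_SEARCH_TOOL = {
    "name": "memory_search",
    "description": "Search persistent memory for context relevant to the conversation.",
    "input_schema": {
        "type": "object",
        "properties": {"query": {"type": "string"}},
        "required": ["query"],
    },
}

def retrieval_loop(
    model: Callable[[list[str]], Optional[str]],  # returns a search query, or None when done
    search: Callable[[str], list[str]],
    single_shot: Callable[[], list[str]],
    max_rounds: int = 3,
) -> list[str]:
    """Let the model query memory iteratively; fall back to single-shot injection."""
    gathered: list[str] = []
    seen: set[str] = set()
    for _ in range(max_rounds):                # maximum round limit
        query = model(gathered)
        if query is None or query in seen:     # terminate: model is done or repeating itself
            break
        seen.add(query)
        gathered.extend(search(query))
    return gathered or single_shot()           # fallback when nothing was retrieved
```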

6. Session Exclusion

Preventing the current conversation from being injected as context:

  • Session identification
  • Recency filters
  • Deduplication rules
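Session exclusion can be expressed as a filter over retrieved memories. This sketch assumes each memory carries a `session_id` and a creation time, uses a hypothetical five-minute recency window as a backstop in case the live conversation was indexed under a different ID, and deduplicates on a SHA-256 hash of case- and whitespace-normalized text.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class Memory:
    text: str
    session_id: str
    created_at: float  # unix seconds

def exclude_current_session(memories: list[Memory], current_session: str,
                            now: float, min_age_s: float = 300.0) -> list[Memory]:
    """Drop memories from the live session, very recent ones, and duplicates."""
    seen: set[str] = set()
    kept: list[Memory] = []
    for m in memories:
        if m.session_id == current_session:     # session identification
            continue
        if now - m.created_at < min_age_s:      # recency filter (hypothetical 5-minute window)
            continue
        normalized = " ".join(m.text.lower().split())
        digest = hashlib.sha256(normalized.encode()).hexdigest()
        if digest in seen:                      # deduplication on normalized text
            continue
        seen.add(digest)
        kept.append(m)
    return kept
```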

AMP vs MCP

AMP and MCP (Model Context Protocol) are complementary specifications that operate at different layers. They do not compete.

| | MCP | AMP |
|---|---|---|
| Purpose | How clients discover and fetch data from servers | How fetched data gets assembled and injected into LLM prompts |
| Scope | Server discovery, resources, tools, prompts, sampling | Context assembly, gating, budgeting, injection, provenance |
| Layer | Between LLM client and data/tool servers | Between memory retrieval and the LLM API call |
| Transport | JSON-RPC 2.0 over stdio/HTTP | Transport-agnostic (works with any retrieval mechanism) |
| Question answered | "How do I get memory data?" | "How does memory data enter the prompt?" |

MCP gets data to the client but does not specify what happens next. AMP picks up where MCP stops — defining how retrieved memory is selected, ranked, budgeted, formatted, and placed into the LLM request.

An AMP host can use MCP to fetch memory from servers, or it can use REST APIs, local vector databases, or any other retrieval mechanism. The two protocols compose naturally:

LLM Client → AMP Host → MCP Server → Vector DB
                 ↓
            LLM Provider
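To illustrate the transport-agnostic layering, the sketch below has the AMP host depend only on a small `MemorySource` protocol. The `search(query, k)` signature and the keyword-overlap scoring are hypothetical. An MCP-backed source could implement the same method by calling a server tool, but no MCP SDK calls are shown here.

```python
from typing import Protocol

class MemorySource(Protocol):
    """Anything that returns ranked (text, score) pairs for a query."""
    def search(self, query: str, k: int) -> list[tuple[str, float]]: ...

class InMemorySource:
    """Toy keyword-overlap backend standing in for a vector DB, REST API, or MCP server."""
    def __init__(self, docs: list[str]):
        self.docs = docs

    def search(self, query: str, k: int) -> list[tuple[str, float]]:
        q = set(query.lower().split())
        scored = [(d, len(q & set(d.lower().split())) / max(len(q), 1)) for d in self.docs]
        hits = [s for s in scored if s[1] > 0]
        return sorted(hits, key=lambda s: s[1], reverse=True)[:k]

class AmpHost:
    """Assembles context from any MemorySource; how results are fetched is not its concern."""
    def __init__(self, sources: list[MemorySource]):
        self.sources = sources

    def gather(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        hits = [h for src in self.sources for h in src.search(query, k)]
        return sorted(hits, key=lambda h: h[1], reverse=True)[:k]
```

Merging by raw score assumes the sources score on comparable scales. Real backends (cosine similarity, BM25, and so on) generally do not, so a host would likely need to normalize scores per source before merging.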

For the full technical comparison, see Appendix A of the specification.

Specification

See spec/ for the full specification.

Reference Implementation

Memoryport implements AMP across its proxy, MCP server, and hosted API.

Status

AMP is in early development. The specification is not yet stable.

License

This specification is licensed under Apache-2.0.
