Summarization middleware for automatic and tool-based conversation compaction.

This module provides two middleware classes:

- SummarizationMiddleware — automatically compacts the conversation when token
  usage exceeds a configurable threshold. Older messages are summarized via an
  LLM call, and the full history is offloaded to a backend for later retrieval.
- SummarizationToolMiddleware — exposes a compact_conversation tool that lets
  the agent (or a human-in-the-loop approval flow) trigger compaction on
  demand. It composes with a SummarizationMiddleware instance and reuses its
  summarization engine.
Example:

    from deepagents import create_deep_agent
    from deepagents.middleware.summarization import (
        SummarizationMiddleware,
        SummarizationToolMiddleware,
    )
    from deepagents.backends import FilesystemBackend

    backend = FilesystemBackend(root_dir="/data")
    summ = SummarizationMiddleware(
        model="gpt-4o-mini",
        backend=backend,
        trigger=("fraction", 0.85),
        keep=("fraction", 0.10),
    )
    tool_mw = SummarizationToolMiddleware(summ)
    agent = create_deep_agent(middleware=[summ, tool_mw])
Offloaded messages are stored as markdown at /conversation_history/{thread_id}.md.
Each summarization event appends a new section to this file, creating a running log of all evicted messages.
Append text to a system message.
Compute default summarization settings based on model profile.
Create a SummarizationMiddleware with model-aware defaults.
Computes trigger, keep, and truncation settings from the model's profile (or uses fixed-token fallbacks) and returns a configured middleware.
Protocol for pluggable memory backends. Backends can store files in different locations (agent state, the filesystem, a database, etc.) while exposing a single, uniform interface for file operations.
All file data is represented as dicts with the following structure:

    {
        "content": list[str],   # Lines of text content
        "created_at": str,      # ISO format timestamp
        "modified_at": str,     # ISO format timestamp
    }
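A small constructor for that shape, as a hedged sketch (the `make_file_data` helper is hypothetical; only the dict structure comes from the protocol above):

```python
from datetime import datetime, timezone


def make_file_data(content: list[str]) -> dict:
    """Build a file-data dict in the backend protocol's shape."""
    now = datetime.now(timezone.utc).isoformat()
    return {
        "content": content,      # lines of text
        "created_at": now,       # ISO timestamps; equal on creation
        "modified_at": now,
    }
```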
Represents a summarization event.
Settings for truncating large tool-call arguments in older messages.
This is a lightweight, pre-summarization optimization that fires at a lower
token threshold than full conversation compaction. When triggered, only the
args values on AIMessage.tool_calls in messages before the keep window
are shortened — recent messages are left intact.
Typical large arguments include write_file content, edit_file patches,
and verbose execute outputs.
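The truncation pass described above might look roughly like this. It is a sketch under assumptions: messages are taken to carry a `tool_calls` list of dicts with an `"args"` mapping (as on LangChain's AIMessage), and the window size and character limit are illustrative parameters, not the middleware's actual settings.

```python
def truncate_old_tool_args(messages, keep_window: int, max_arg_chars: int = 500):
    """Shorten oversized string args on tool calls before the keep window.

    Messages inside the keep window (the newest `keep_window` entries) are
    left untouched; only older messages have their tool-call args shortened.
    """
    cutoff = len(messages) - keep_window
    for msg in messages[:max(cutoff, 0)]:
        for call in getattr(msg, "tool_calls", None) or []:
            for key, value in call.get("args", {}).items():
                if isinstance(value, str) and len(value) > max_arg_chars:
                    call["args"][key] = value[:max_arg_chars] + " …[truncated]"
    return messages
```

Truncating only `args` values preserves the tool-call structure (names, IDs, ordering), so the model can still see what was called and with roughly what input.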
State for the summarization middleware.
Extends AgentState with a private field for tracking summarization events.
Default settings computed from model profile.
Middleware that provides a compact_conversation tool for manual compaction.
This middleware composes with a SummarizationMiddleware instance, reusing
its summarization engine (model, backend, trigger thresholds) to let the
agent proactively compact its own context window.
The tool and auto-summarization share the same _summarization_event state
key, so they interoperate correctly.