A memory-based, continual-learning framework that helps LLM agents improve from experience without updating model weights.
Planner-Executor Architecture • Case-Based Reasoning • MCP Tooling • Memory-Augmented Learning
*Figure: Memento vs. baselines on GAIA validation and test sets; ablation study of Memento across benchmarks.*

*Figure: Continual learning curves across memory designs; Memento's accuracy improvement on OOD datasets.*
- [2025.08.27] Thanks for your interest in our work! We'll release our CBR code next week and our Parametric Memory code next month, and we'll keep posting updates as development continues.
- [2025.08.27] We added a new Crawler MCP in `server/ai_crawler.py` for web crawling and query-aware content compression to reduce token cost.
- [2025.08.26] We added the SerpAPI (https://serpapi.com/search-api) MCP tool to help you avoid the search Docker and speed up development.
- No LLM weight updates. Memento reframes continual learning as memory-based online reinforcement learning over a memory-augmented MDP. A neural case-selection policy guides actions; experiences are stored and reused via efficient Read/Write operations.
- Two-stage planner-executor loop. A CBR-driven Planner decomposes tasks and retrieves relevant cases; an Executor runs each subtask as an MCP client, orchestrating tools and writing back outcomes.
- Comprehensive tool ecosystem. Built-in support for web search, document processing, code execution, image/video analysis, and more through a unified MCP interface.
- Strong benchmark performance. Achieves competitive results across GAIA, DeepResearcher, SimpleQA, and HLE benchmarks.
Learn from experiences, not gradients. Memento logs successful and failed trajectories into a Case Bank and retrieves them by value to steer planning and execution, enabling low-cost, transferable, online continual learning.
- Meta-Planner: Breaks down high-level queries into executable subtasks using GPT-4.1
- Executor: Executes individual subtasks using o3 or other models via MCP tools
- Case Memory: Stores final-step tuples (s_T, a_T, r_T) for experience replay
- MCP Tool Layer: Unified interface for external tools and services
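The Case Memory's Read/Write cycle can be pictured with a small sketch (the class names and the keyword-overlap similarity below are illustrative assumptions, not the repository's implementation):

```python
from dataclasses import dataclass, field

@dataclass
class Case:
    state: str     # final-step state s_T (e.g. the task description)
    action: str    # final-step action a_T (e.g. the plan that was run)
    reward: float  # final-step reward r_T (1.0 success, 0.0 failure)

@dataclass
class CaseBank:
    cases: list = field(default_factory=list)

    def write(self, state, action, reward):
        # Write: log successful and failed trajectories alike
        self.cases.append(Case(state, action, reward))

    def read(self, query, k=4):
        # Read: rank cases by keyword overlap with the query,
        # breaking ties in favour of higher-reward cases
        words = set(query.lower().split())
        def score(c):
            return (len(words & set(c.state.lower().split())), c.reward)
        return sorted(self.cases, key=score, reverse=True)[:k]

bank = CaseBank()
bank.write("find the population of Paris", "search('Paris population')", 1.0)
bank.write("find the population of Paris", "guess from memory", 0.0)
best = bank.read("population of Lyon", k=1)[0]  # retrieves the rewarded case
```

At retrieval time the Planner conditions on the returned cases; the default `k=4` here mirrors the retrieval sweet spot reported in the key insights below.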
- Web Research: Live search and controlled crawling via SearxNG
- Document Processing: Multi-format support (PDF, Office, images, audio, video)
- Code Execution: Sandboxed Python workspace with security controls
- Data Analysis: Excel processing, mathematical computations
- Media Analysis: Image captioning, video narration, audio transcription
- Python 3.10+
- OpenAI API key (or compatible API endpoint)
- SearxNG instance for web search
# Clone the repository
git clone https://github.com/Agent-on-the-Fly/Memento
cd Memento

# Create and activate conda environment
conda create -n Memento python=3.11 -y
conda activate Memento
# Navigate to client directory
cd Memento/client
# Create environment file
touch .env

After creating the `.env` file, configure the following API keys and service endpoints:
# OPENAI API
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_BASE_URL=https://api.openai.com/v1 # or your custom endpoint
#===========================================
# Tools & Services API
#===========================================
# Chunkr API (https://chunkr.ai/)
CHUNKR_API_KEY=your_chunkr_api_key_here
# Jina API
JINA_API_KEY=your_jina_api_key_here
# ASSEMBLYAI API
ASSEMBLYAI_API_KEY=your_assemblyai_api_key_here

Note: Replace `your_*_api_key_here` with your actual API keys. Some services are optional depending on which tools you plan to use.
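Reading this configuration in Python typically looks like the following (a sketch, not the repository's actual loading code; `python-dotenv`'s `load_dotenv()` is one common way to populate `os.environ` from the `.env` file):

```python
import os

# Placeholder default so the snippet runs even without a configured key
os.environ.setdefault("OPENAI_API_KEY", "your_openai_api_key_here")

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]  # required
OPENAI_BASE_URL = os.environ.get("OPENAI_BASE_URL", "https://api.openai.com/v1")
CHUNKR_API_KEY = os.environ.get("CHUNKR_API_KEY")          # optional
JINA_API_KEY = os.environ.get("JINA_API_KEY")              # optional
ASSEMBLYAI_API_KEY = os.environ.get("ASSEMBLYAI_API_KEY")  # optional
```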
# Web crawling and search capabilities
pip install -U crawl4ai
crawl4ai-setup
crawl4ai-doctor
playwright install

Install the remaining Python dependencies:

pip install -r requirements.txt

For web search capabilities, set up SearxNG: follow https://github.com/searxng/searxng-docker/ to set up the Docker container and use our settings.
# In a new terminal
cd ./Memento/searxng-docker
docker compose up -d

Run the agent:

python client/agent.py

- Planner Model: Defaults to `gpt-4.1` for task decomposition
- Executor Model: Defaults to `o3` for task execution
- Custom Models: Support for any OpenAI-compatible API
- Search: Configure SearxNG instance URL
- Code Execution: Customize import whitelist and security settings
- Document Processing: Set cache directories and processing limits
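The import whitelist for code execution can be enforced by statically scanning submitted code before running it; a minimal sketch (the whitelist contents and `check_imports` are illustrative assumptions, not the shipped security layer):

```python
import ast

ALLOWED_IMPORTS = {"math", "json", "re", "statistics"}  # illustrative whitelist

def check_imports(code: str) -> list:
    """Return the names of imported modules that are not whitelisted."""
    violations = []
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            names = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module.split(".")[0]] if node.module else []
        else:
            continue
        violations.extend(n for n in names if n not in ALLOWED_IMPORTS)
    return violations

violations = check_imports("import os\nimport math")  # only `os` is flagged
```

Because the scan runs over the AST rather than raw text, aliased imports (`import os as o`) are caught as well.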
- GAIA: 87.88% (Val, Pass@3 Top-1) and 79.40% (Test)
- DeepResearcher: 66.6% F1 / 80.4% PM, with +4.7 to +9.6 absolute gains on OOD datasets
- SimpleQA: 95.0%
- HLE: 24.4% PM (close to GPT-5 at 25.32%)
- Small, high-quality memory works best: Retrieval K=4 yields peak F1/PM
- Planning + CBR consistently improves performance
- Concise, structured planning outperforms verbose deliberation
Memento/
├── client/                # Main agent implementation
│   └── agent.py           # Hierarchical client with planner-executor
├── server/                # MCP tool servers
│   ├── code_agent.py      # Code execution and workspace management
│   ├── search_tool.py     # Web search via SearxNG
│   ├── documents_tool.py  # Multi-format document processing
│   ├── image_tool.py      # Image analysis and captioning
│   ├── video_tool.py      # Video processing and narration
│   ├── excel_tool.py      # Spreadsheet processing
│   ├── math_tool.py       # Mathematical computations
│   └── craw_page.py       # Web page crawling
└── interpreters/          # Code execution backends
    ├── docker_interpreter.py
    ├── e2b_interpreter.py
    ├── internal_python_interpreter.py
    └── subprocess_interpreter.py
- Create a new FastMCP server in the `server/` directory
- Implement your tool functions with proper error handling
- Register the tool with the MCP protocol
- Update the client's server list in `agent.py`
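For the first step, the MCP Python SDK's FastMCP class keeps the boilerplate small. A skeleton (the server name and `word_count` tool are illustrative; requires the `mcp` package):

```python
from mcp.server.fastmcp import FastMCP

# Illustrative server; real servers live under server/
mcp = FastMCP("word_tools")

@mcp.tool()
def word_count(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())

if __name__ == "__main__":
    mcp.run()  # serve over stdio so the agent client can connect
```

The client can then launch this script alongside the existing servers listed in `agent.py`.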
Extend the `interpreters/` module to add new execution backends:

from interpreters.base import BaseInterpreter

class CustomInterpreter(BaseInterpreter):
    async def execute(self, code: str) -> str:
        # Your custom execution logic
        pass

- Add Case Bank Reasoning: Implement memory-based case retrieval and reasoning system
- Add User Personal Memory Mechanism: Implement user-preference search
- Refine Tools & Add More Tools: Enhance existing tools and expand the tool ecosystem
- Test More New Benchmarks: Evaluate performance on additional benchmark datasets
- Long-horizon tasks: GAIA Level-3 remains challenging due to compounding errors
- Frontier knowledge: HLE performance limited by tooling alone
- Open-source coverage: Limited executor validation in fully open pipelines
- Some parts of the code in the toolkits and interpreters are adapted from Camel-AI.
If Memento helps your work, please cite:
@techreport{Memento2025,
title = {Memento: Fine-tuning LLM Agents without Fine-tuning LLMs},
author = {Huichi Zhou and Yihang Chen and Siyuan Guo and Xue Yan and
Kin Hei Lee and Zihan Wang and Ka Yiu Lee and Guchun Zhang and
Kun Shao and Linyi Yang and Jun Wang},
year = {2025},
github = {https://github.com/Agent-on-the-Fly/Memento}
}

We welcome contributions! Please see our contributing guidelines for:
- Bug reports and feature requests
- Code contributions and pull requests
- Documentation improvements
- Tool and interpreter extensions
Thanks to the open-source community and contributors who made this project possible.