This repo is the official code implementation of MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents
[arXiv] | [Project Site]
Zijian Zhou^, Ao Qu^, Zhaoxuan Wu, Sunghwan Kim, Alok Prakash, Daniela Rus, Jinhua Zhao, Bryan Kian Hsiang Low, and Paul Pu Liang
⭐️ Please star this repository if you find it helpful!
Abstract: Modern language agents must operate over long-horizon, multi-turn interactions, where they retrieve external information, adapt to observations, and answer interdependent queries. Yet, most LLM systems rely on full-context prompting, appending all past turns regardless of their relevance. This leads to unbounded memory growth, increased computational costs, and degraded reasoning performance on out-of-distribution input lengths. We introduce MEM1, an end-to-end reinforcement learning framework that enables agents to operate with constant memory across long multi-turn tasks. At each turn, MEM1 updates a compact shared internal state that jointly supports memory consolidation and reasoning. This state integrates prior memory with new observations from the environment while strategically discarding irrelevant or redundant information. To support training in more realistic and compositional settings, we propose a simple yet effective and scalable approach to constructing multi-turn environments by composing existing datasets into arbitrarily complex task sequences. Experiments across three domains, including internal retrieval QA, open-domain web QA, and multi-turn web shopping, show that MEM1-7B improves performance by 3.5x while reducing memory usage by 3.7x compared to Qwen2.5-14B-Instruct on a 16-objective multi-hop QA task, and generalizes beyond the training horizon. Our results demonstrate the promise of reasoning-driven memory consolidation as a scalable alternative to existing solutions for training long-horizon interactive agents, where both efficiency and performance are optimized.
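The constant-memory loop described in the abstract can be illustrated with a small, self-contained sketch. This is not the code in this repository: `compose_task`, `run_episode`, `make_toy_policy`, and `toy_env` are hypothetical names, and the toy policy truncates text to a fixed character budget as a crude stand-in for the learned consolidation step. It only shows the shape of the idea, namely that the context at each turn is the consolidated state plus the newest observation, so it stays bounded as the number of turns grows.

```python
from typing import Callable, List, Tuple

# Hypothetical signatures, for illustration only.
# A policy step maps (internal_state, observation) -> (new_internal_state, action).
PolicyStep = Callable[[str, str], Tuple[str, str]]
# An environment maps an action (e.g. a search query) to an observation.
Environment = Callable[[str], str]


def compose_task(questions: List[str]) -> str:
    """Join single-objective questions into one multi-objective task string."""
    return " ; ".join(f"Q{i + 1}: {q}" for i, q in enumerate(questions))


def run_episode(
    task: str, policy_step: PolicyStep, env: Environment, max_turns: int
) -> Tuple[str, List[int]]:
    """Run one episode; return the final answer and the context size per turn."""
    state = ""          # consolidated memory, rewritten (not appended to) each turn
    observation = task  # assumption: the task is delivered as the first observation
    context_sizes: List[int] = []
    for _ in range(max_turns):
        # The policy sees only the current state and the newest observation,
        # never the full history of past turns.
        context_sizes.append(len(state) + len(observation))
        state, action = policy_step(state, observation)
        if action.startswith("answer:"):
            return action, context_sizes
        observation = env(action)
    return "", context_sizes  # no answer within the turn limit


def make_toy_policy(budget: int = 64, turns_before_answer: int = 3) -> PolicyStep:
    """Toy stand-in for a trained model: keeps at most `budget` characters."""
    calls = {"n": 0}

    def step(state: str, observation: str) -> Tuple[str, str]:
        calls["n"] += 1
        # Crude "consolidation": merge, then keep only the most recent characters.
        new_state = f"{state} {observation}".strip()[-budget:]
        if calls["n"] >= turns_before_answer:
            return new_state, "answer: stub"
        return new_state, f"search: query {calls['n']}"

    return step


def toy_env(action: str) -> str:
    """Toy retriever: echoes the action inside a fake retrieved passage."""
    return f"retrieved text for [{action}]"


if __name__ == "__main__":
    task = compose_task(["Who wrote X?", "When was Y founded?"])
    answer, sizes = run_episode(task, make_toy_policy(), toy_env, max_turns=10)
    print(answer, sizes)
```

In this sketch the per-turn context size is capped by the state budget plus the length of one observation, whereas full-context prompting would grow with every turn. The actual MEM1 model learns what to keep through reinforcement learning rather than truncating by position.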
- (Oct 2025) MEM1 has been accepted at NeurIPS 2025 Workshop MTI-LLM for oral presentation! (2 out of 230 submissions)
- (Aug 2025) MEM1 has been accepted at COLM 2025 Workshop RAM2 for oral presentation!
# Install requirements. This may take a few minutes to complete.
conda create -n mem1 python=3.9
conda activate mem1
# install torch [or you can skip this step and let vllm install the correct version for you]
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu121
# install vllm
pip3 install vllm==0.6.3 # or you can install 0.5.4, 0.4.2 and 0.3.1
# install other dependencies
pip install -r requirements.txt
# install verl locally
cd Mem1/train
pip install -e .
# install flash attn
pip3 install flash-attn --no-build-isolation
If you would like to call a local retriever as the search engine, you can install the environment as follows. (We recommend using a separate environment.)
conda create -n retriever python=3.10
conda activate retriever
# we recommend installing torch with conda for faiss-gpu
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install transformers datasets pyserini
## install the GPU version of faiss to guarantee efficient RL rollout
conda install -c pytorch -c nvidia faiss-gpu=1.8.0
## API function
pip install uvicorn fastapi
- Download the necessary data:
cd setup/
python download.py
- Pre-process the multi-objective QA dataset (change the batch size to vary the number of objectives):
cd Mem1/
bash gen_data/scripts/data_process_multi.sh --batch_size 2
- Launch a local retrieval server:
# First remember to change the `file_path` field of `Mem1/train/retrieval_launch.sh` to the path of your RAG files
cd Mem1/train/
bash retrieval_launch.sh
- Train the Mem1 model:
cd Mem1/train/
bash train_ppo.sh
- To evaluate the MEM1 model, first serve the model:
cd Mem1/inference
# Change the variables in start_vllm.sh
bash start_vllm.sh
- Wait for the model to be served. Next, serve the retriever as well. Follow Step 3 (launching a local retrieval server) for how to do this.
- Collect the trajectories generated by Mem1 by running:
cd Mem1
python inference/generate_rollout.py --model Mem-Lab/Qwen2.5-7B-RL-RAG-Q2-EM-Release --use_mem1 --data_file train/data/nq_hotpotqa_train_multi_2/test.parquet
- With the collected trajectories, evaluate the performance:
cd Mem1
python inference/eval.py --eval_file_path [PATH_TO_TRAJ_FILE]
- wandb: https://api.wandb.ai/links/Mem1/vl5osiui
- 🤗 HF Checkpoint: https://huggingface.co/Mem-Lab/Qwen2.5-7B-RL-RAG-Q2-EM-Release
Please use the following bibtex citation format for your reference.
@misc{2025mem1learningsynergizememory,
title = {MEM1: Learning to Synergize Memory and Reasoning for Efficient Long-Horizon Agents},
author = {Zhou, Zijian and Qu, Ao and Wu, Zhaoxuan and Kim, Sunghwan and Prakash, Alok and Rus, Daniela and Zhao, Jinhua and Low, Bryan Kian Hsiang and Liang, Paul Pu},
year = {2025},
archivePrefix= {arXiv},
primaryClass = {cs.CL},
url = {https://arxiv.org/abs/2506.15841},
}
The codebase has referred to the following repositories: