Beyond the context window — bounded only by budget and patience.
What “unlimited” means (and doesn’t):
This tool is not “infinite tokens in a single prompt.” It keeps the corpus external and uses iterative probing + recursion so you can work over context far larger than the model window.
Hard limits: budget, latency, recursion/iteration caps, rate limits, and model capability.
Recursive Language Model (RLM)-style, context-unlimited reasoning for coding agents (Claude, Cursor, Codex, OpenCode, MCP).
| Mode | API Keys | Automation | Best For |
|---|---|---|---|
| Server-side (MCP) | Required | Full | Production workloads, cross-agent portability |
| Local (Agent Skills) | Zero | Semi | High-speed repo-wide extraction in Cursor/Claude Code |
- Server-side: Configure as an MCP server.
- Local: Use Integrated Skills (Cursor/Claude Code).
context-unlimited treats long prompts as an external environment and lets an LLM programmatically probe → recurse → synthesize over context far beyond the model window.
Paper: Recursive Language Models (Zhang, Kraska, Khattab, 2025).
arXiv: https://arxiv.org/abs/2512.24601 | HTML: https://arxiv.org/html/2512.24601v1
Why this repo exists (vs one-shot long-context prompting)
RLMs avoid forcing the entire context through the model window at once. Instead, the model writes code to selectively inspect the context and can recursively call itself on targeted snippets.
This repo exposes that workflow as a reusable MCP tool: solve.
When you should use RLM
Use solve when:
- baseline one-shot truncates or becomes unreliable
- you need multi-step investigation across many files/logs
- you want evidence-backed extraction (paths + excerpts + hashes)
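An evidence-backed record of the kind described above can be as simple as a path, a line-exact excerpt, and a content hash so the excerpt can be re-verified later. A minimal sketch (the `EvidenceRecord` shape below is illustrative, not the tool's actual output schema):

```python
import hashlib
from dataclasses import dataclass


@dataclass
class EvidenceRecord:
    # Illustrative shape: path + excerpt + hash, mirroring "evidence-backed extraction".
    path: str
    start_line: int
    excerpt: str
    sha256: str  # hash of the whole file, so the excerpt can be re-checked


def make_evidence(path: str, text: str, start_line: int, num_lines: int) -> EvidenceRecord:
    """Cut an excerpt out of a file's text and pair it with a content hash."""
    lines = text.splitlines()
    excerpt = "\n".join(lines[start_line - 1 : start_line - 1 + num_lines])
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return EvidenceRecord(path=path, start_line=start_line, excerpt=excerpt, sha256=digest)
```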
When you should NOT use RLM
Don’t use it for:
- small, trivial queries that fit comfortably in the context window
- tasks where grep/ripgrep is obviously sufficient
- situations where multiple recursive API calls would be cost-prohibitive
If you build on this work, cite the RLM paper:
@article{zhang2025rlm,
title={Recursive Language Models},
author={Zhang, Alex L. and Kraska, Tim and Khattab, Omar},
journal={arXiv preprint arXiv:2512.24601},
year={2025}
}

For a streamlined experience using your existing IDE plan (Claude Code, Cursor, etc.):
- Skills Integration: We provide pre-configured skills and commands for major agents.
- See Claude Code Integration
- See Local Probing Toolbox for a "mechanical truth" extraction guide.
- Persistent REPL: You can run `python integrations/claude_code/rlm_runner.py` to maintain a stateful probing session directly in your terminal.
1. Prerequisites
- Python 3.12+
- uv (recommended) or pip
- Docker (recommended for safe sandboxing)
2. Installation
We recommend using uv for automatic dependency and virtualenv management:
# Sync dependencies (RLM is automatically installed from upstream)
uv sync
# Build Docker Sandbox (recommended for safety)
docker build -t rlm-sandbox -f Dockerfile.rlm-sandbox .

Note: The project uses hatchling as the build backend for proper package discovery.
RLM method (in 30 seconds)
RLMs treat your repo/logs as an external environment. The model writes code to:
- ingest context (files/globs/text)
- probe specific subsets (regex/AST/search)
- recursively call itself on small chunks
- synthesize a final structured answer
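The four steps above can be sketched as a plain-Python loop. Everything here is illustrative: `llm` stands in for any model call, and the probing and recursion are deliberately naive compared to the real server.

```python
import re
from typing import Callable


def rlm_solve(query: str, corpus: dict[str, str], llm: Callable[[str], str],
              max_depth: int = 2) -> str:
    """Illustrative probe -> recurse -> synthesize loop over an external corpus."""
    # 1. Ingest: the corpus stays outside the model window (here, a dict of files).
    # 2. Probe: select only files whose content matches terms from the query.
    terms = [t for t in re.findall(r"\w+", query.lower()) if len(t) > 3]
    relevant = {p: s for p, s in corpus.items()
                if any(t in s.lower() for t in terms)}
    # 3. Recurse: large files are split and handled by a depth-capped sub-call;
    #    small ones go straight to the (cheaper) model.
    partials = []
    for path, text in relevant.items():
        if max_depth > 0 and len(text) > 2000:
            mid = len(text) // 2
            halves = {path + "#a": text[:mid], path + "#b": text[mid:]}
            partials.append(rlm_solve(query, halves, llm, max_depth - 1))
        else:
            partials.append(llm(f"Answer '{query}' using {path}:\n{text}"))
    # 4. Synthesize: combine the partial answers into one final response.
    return llm("Synthesize a final answer from:\n" + "\n".join(partials))
```

Note the `max_depth` cap: it is one of the hard limits (recursion/iteration caps) mentioned at the top of this README.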
To add RLM to Cursor as an MCP server:
- Open Cursor Settings -> Models -> MCP.
- Click + Add New MCP Server.
- Set Name to `rlm` and Type to `command`.
- Use the following Command (replace with your absolute path):
uv --directory /path/to/context-unlimited run context_unlimited/server.py
Example mcp.json Configuration
If you prefer editing your mcp.json directly (usually found at `~/.cursor/mcp.json`), here is the canonical configuration:
{
"mcpServers": {
"rlm": {
"command": "uv",
"args": [
"--directory",
"/path/to/context-unlimited",
"run",
"context_unlimited/server.py"
],
"env": {
"OPENAI_API_KEY": "sk-...",
"OPENROUTER_API_KEY": "sk-or-v1-...",
"RLM_DEFAULT_RECURSION_MODEL": "openai/gpt-4o-mini"
}
}
}
}
Handling API Keys in Cursor
Cursor does not automatically inherit environment variables from your shell. Use one of these methods:
- Method A (.env file - Recommended): Create a
.envfile in the project root with your keys:Note: OpenRouter usesOPENAI_API_KEY=sk-... OPENROUTER_API_KEY=sk-...
Authorization: Bearer <OPENROUTER_API_KEY>for its API (see OpenRouter Authentication). - Method B (Direct JSON): Add keys directly to the
envobject in yourmcp.json(found via Cursor settings). - Method C (Terminal Launch): Launch Cursor from your terminal using
cursor .after exporting keys in your shell.
[!IMPORTANT] Common Failure (Environment Variables): Some MCP clients (including Claude Code and Cursor) may occasionally fail to pass variables from the `env` block to the subprocess. If you see "API Key not found" errors, we recommend using Method A or Method C.
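Because env-block passing can fail, a server-side fallback that reads a project-root `.env` (Method A) is a robust safety net. A stdlib-only sketch (real projects often use `python-dotenv` instead; the parsing here is intentionally minimal):

```python
import os
from pathlib import Path


def load_dotenv_fallback(path: str = ".env") -> None:
    """Populate os.environ from KEY=VALUE lines, without overriding existing vars."""
    p = Path(path)
    if not p.exists():
        return
    for line in p.read_text().splitlines():
        line = line.strip()
        # Skip blanks, comments, and lines that are not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip().strip('"'))


def require_key(name: str) -> str:
    """Fail fast with a pointer to the setup methods if a key is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} not found; see Methods A-C above")
    return value
```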
Claude Code
claude mcp add --scope project --env OPENAI_API_KEY=sk-... rlm -- python -u context_unlimited/server.py

Codex
Add to ~/.codex/config.toml:
[context_unlimited.rlm]
command = "python"
args = ["-u", "/path/to/context-unlimited/context_unlimited/server.py"]
cwd = "/path/to/context-unlimited"
[context_unlimited.rlm.env]
OPENAI_API_KEY = "..."

context-unlimited supports any OpenAI-compatible endpoint. Configure via provider_preset or an explicit provider object in the tool arguments. By default, ollama_local is used for a "no-secrets" quickstart experience.
| Provider | Preset | Default Model | Required Env Var |
|---|---|---|---|
| OpenAI | `openai` | `gpt-4o-mini` | `OPENAI_API_KEY` |
| OpenRouter | `openrouter` | `google/gemini-2.0-flash-lite` | `OPENROUTER_API_KEY` |
| Ollama | `ollama_local` (default) | `qwen-2.5-coder-32b` | None (set `environment: local`) |
| vLLM | `vllm_local` | `qwen-2.5-coder-32b` | `VLLM_API_KEY` (if required) |
| LiteLLM | `litellm_proxy` | `qwen-2.5-coder-32b` | Proxy-specific |
Tip
OpenRouter "Bench" Keys: If you are using OpenRouter for benchmarks, we recommend creating a "restricted" API key with a low spending limit (e.g., $1.00) to prevent unexpected costs. See the OpenRouter Keys Dashboard.
Caution
Cost Warning: RLM works by making multiple recursive calls to the LLM. Using large models like gpt-4o or claude-sonnet-4.5 can lead to significant API costs. Always monitor your usage and consider using a cheaper recursion model (e.g., gpt-4o-mini) via RLM_DEFAULT_RECURSION_MODEL.
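Because recursive calls multiply, a rough back-of-envelope estimate before a large run is worth the ten seconds. A sketch (the call counts and per-million-token prices you plug in are your own assumptions; check your provider's current pricing):

```python
def estimate_cost(root_calls: int, recursive_calls_per_root: int,
                  avg_tokens_per_call: int,
                  root_price_per_mtok: float, sub_price_per_mtok: float) -> float:
    """Rough upper bound on API cost for one RLM run, in dollars.

    Assumes every call (root and recursive) moves roughly the same number
    of tokens, which is a simplification.
    """
    root_tokens = root_calls * avg_tokens_per_call
    sub_tokens = root_calls * recursive_calls_per_root * avg_tokens_per_call
    return (root_tokens * root_price_per_mtok
            + sub_tokens * sub_price_per_mtok) / 1_000_000
```

For example, one root call at $5/Mtok plus ten 20k-token sub-calls at $0.30/Mtok lands around $0.16, which shows why routing recursion to a cheap model matters.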
Choosing & Overriding Models
You can configure global defaults via environment variables or override them per-request.
- `RLM_DEFAULT_MODEL`: The primary model to use if none is specified.
- `RLM_DEFAULT_RECURSION_MODEL`: The model used for recursive sub-calls (we strongly recommend a cheaper model here).
Use the model_name argument in your prompt. RLM also supports an other_model_name for the recursive "sub-probes".
Example: Use Claude on OpenRouter
"Analyze the auth flow using RLM with `model_name='anthropic/claude-3.5-sonnet'`"
Example: Use a cheaper model for recursion
"Find the bug using RLM with `model_name='gpt-4o'` and `other_model_name='gpt-4o-mini'`"
{
"query": "Analyze the project structure",
"globs": ["**/*.py"],
"provider_preset": "openai",
"model_name": "gpt-4o"
}

{
"query": "Deep investigation of the codebase",
"provider_preset": "openrouter",
"model_name": "anthropic/claude-sonnet-4.5"
}

{
"query": "Analyze this file",
"provider_preset": "ollama_local",
"model_name": "qwen2.5-coder-32b",
"environment": "local"
}

[!NOTE] Disclosure: These specific model combinations are provided as general recommendations based on current pricing and performance (January 2026). They have not been formally tested in every environment; your results may vary.
| Strategy | Primary Model (`model_name`) | Recursive Model (`other_model_name`) | Cost Tier |
|---|---|---|---|
| Balanced Power | `anthropic/claude-3.5-sonnet` | `openai/gpt-4o-mini` | Moderate |
| High Intelligence | `anthropic/claude-sonnet-4.5` | `anthropic/claude-3.5-sonnet` | High |
| Ultra Budget | `deepseek/deepseek-r1` | `google/gemini-2.0-flash-lite` | Very Low |
[!WARNING] 404 Model Not Found: If you receive a 404 error, ensure the model ID is correct for your provider. For OpenRouter, some models have `:free` suffixes or might be temporarily unavailable. Check the OpenRouter Models List for the latest IDs.
Example: Balanced Power Setup (mcp.json)
{
"env": {
"RLM_DEFAULT_MODEL": "openai/gpt-5.2-codex",
"RLM_DEFAULT_RECURSION_MODEL": "deepseek/deepseek-r1-0528:free"
}
}

context-unlimited is designed for high-reliability agentic integration. It supports MCP Structured Outputs, allowing IDEs and CLI agents (like Cursor, Claude Code, and Codex) to consume machine-readable data directly via the structuredContent field.
- Reliability: Eliminates "JSON in prose" parsing errors by using a dedicated data channel.
- Contractability: Clients can generate type-safe APIs from the server's `outputSchema`. All results conform to the `rlm.solve.result.v1` schema.
- Operational efficiency: Structured outputs can avoid re-injecting large tool results into the agent's conversation history when the client supports it.
When calling the solve tool, the server returns a CallToolResult that contains:
- Human-readable text in `content` (for the user).
- Structured JSON in `meta.structured_content` (for the agent).
Example TypeScript consumption:
const result = await client.callTool({ name: "solve", arguments: { query: "..." } });
// Standard MCP structured result
const structured = (result._meta as any).structured_content;
const answerJson = structured.answer_json; // Canonical parsed value

Note: For clients that do not yet fully support the MCP structuredContent spec, the server also embeds the JSON in the human-readable text block as a fallback.
See examples/code_mode/client.ts for a full implementation.
Compare a "one-shot" call vs RLM's recursive probing:
uv run python bench/bench_tokens.py \
--query "Find all provider presets and required env vars. Return JSON." \
--globs "context_unlimited/**/*.py" "schemas/*.json" \
--provider_preset openrouter \
--model <your-model-id>

Benchmarking with Ollama (Context Limits)
Ollama's default context (4096 tokens) can make comparisons unfair. Use a custom model file:
ollama create qwen2.5-coder-32k -f Modelfile.qwen3-32k
- Run the benchmark with `--model qwen2.5-coder-32k`.
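Before blaming the model, it is worth confirming the corpus actually exceeds the default 4096-token window. A quick sanity check (the 4-characters-per-token ratio is a rough heuristic for English/code, not an exact tokenizer):

```python
def approx_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text and code."""
    return max(1, len(text) // 4)


def fits_in_window(texts: list[str], window_tokens: int = 4096) -> bool:
    """True if the combined corpus likely fits in the given context window."""
    return sum(approx_tokens(t) for t in texts) <= window_tokens
```

If `fits_in_window` returns True, a one-shot baseline is a fair comparison; if not, the extended-context model file above (or RLM's probing) is needed.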
Architecture Diagram
flowchart LR
A[Agent framework<br/>Cursor / Claude Code / Codex] -->|MCP stdio| B[context-unlimited server]
B --> C[Ingest: files/text/globs]
C --> D[RLM loop<br/>probe → recurse → synthesize]
D --> E[Provider<br/>OpenAI / OpenRouter / Ollama / vLLM]
E --> D --> B --> A
- context_unlimited/server.py: FastMCP server entrypoint.
- context_unlimited/ingest.py: Secure file ingestion with repo boundaries.
- context_unlimited/validate.py: JSON schema validation.
- schemas/: JSON schemas for request/result.
Disclaimer & User Responsibility
[!IMPORTANT] By using this software, you acknowledge and agree to the following:
- Financial Responsibility: RLM works by making multiple recursive calls to LLM APIs. You are solely responsible for all costs incurred on your API accounts. We highly recommend setting usage limits on your provider dashboards.
- Verification Mandate: AI-generated outputs (especially code and technical analysis) can contain errors, omissions, or "hallucinations." You must manually verify all outputs before relying on them or applying them to production systems.
- Execution Risk: RLM executes LLM-generated code to probe your context. While we provide a Docker sandbox, no sandbox is 100% secure. You assume all risk associated with the execution of AI-generated code on your infrastructure.
- Data Privacy: Do not input sensitive personal data, trade secrets, or highly confidential information unless you have verified the privacy policy of your chosen LLM provider.
- No Liability: This software is provided "as is," without warranty of any kind. The authors and contributors are not liable for any damages, financial losses, or security breaches resulting from the use of this tool.
Security Policy & Sandboxing
- Execution Environment: Default execution uses a Docker sandbox. This is the primary defense against malicious or buggy AI-generated code. Always prefer Docker for untrusted queries.
- Local Execution: Running with
environment: localexecutes code directly on your host machine. Use this only for trusted local development on non-sensitive codebases. - API Keys: Keys should be scoped to least privilege. Never pass keys directly in tool arguments; use environment variables or a
.envfile.
- A generic shell executor: RLM is optimized for code search and reasoning, not general-purpose automation.
- A replacement for grep: For simple searches, use grep. RLM is for complex "Where is the logic for X and how does it relate to Y" queries.
- 100% Secure Local Execution: While we provide sandboxing, LLM-generated code is inherently risky. Always prefer the Docker environment for untrusted queries.
To ensure stability, we pin the MCP SDK and other critical dependencies in pyproject.toml. If you encounter issues with newer MCP versions, stick to:
mcp>=1.25.0
anthropic>=0.76.0