context-unlimited

Beyond the context window — bounded only by budget and patience.

What “unlimited” means (and doesn’t):
This tool is not “infinite tokens in a single prompt.” It keeps the corpus external and uses iterative probing + recursion so you can work over context far larger than the model window.
Hard limits: budget, latency, recursion/iteration caps, rate limits, and model capability.

Recursive Language Model (RLM) style context-unlimited reasoning for coding agents (Claude, Cursor, Codex, OpenCode, MCP).

Integration Options

| Mode | API Keys | Automation | Best For |
| --- | --- | --- | --- |
| Server-side (MCP) | Required | Full | Production workloads, cross-agent portability |
| Local (Agent Skills) | Zero | Semi | High-speed repo-wide extraction in Cursor/Claude Code |



context-unlimited treats long prompts as an external environment and lets an LLM programmatically probe → recurse → synthesize over context far beyond the model window.

Paper: Recursive Language Models (Zhang, Kraska, Khattab, 2025).
arXiv: https://arxiv.org/abs/2512.24601 | HTML: https://arxiv.org/html/2512.24601v1


Why this repo exists (vs one-shot long-context prompting)

RLMs avoid forcing the entire context through the model window at once. Instead, the model writes code to selectively inspect the context and can recursively call itself on targeted snippets.

This repo exposes that workflow as a reusable MCP tool: solve.
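
For orientation, here is a minimal sketch of calling solve programmatically over stdio with the MCP Python SDK (the mcp package); the launch command mirrors the Cursor configuration below, the path is a placeholder, and the arguments follow the JSON usage examples later in this README.

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

async def main() -> None:
    # Launch the server the same way the Cursor config below does (adjust the path).
    server = StdioServerParameters(
        command="uv",
        args=["--directory", "/path/to/context-unlimited", "run", "context_unlimited/server.py"],
    )
    async with stdio_client(server) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # One RLM investigation; argument names follow the usage examples below.
            result = await session.call_tool(
                "solve",
                {"query": "Find all provider presets and required env vars.", "globs": ["**/*.py"]},
            )
            print(result.content)

asyncio.run(main())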

When you should use RLM

Use solve when:

  • baseline one-shot truncates or becomes unreliable
  • you need multi-step investigation across many files/logs
  • you want evidence-backed extraction (paths + excerpts + hashes)

When you should NOT use RLM

Don’t use it for:

  • small, trivial queries that fit comfortably in the context window
  • tasks where grep/ripgrep is obviously sufficient
  • situations where multiple recursive API calls would be cost-prohibitive

Citation

If you build on this work, cite the RLM paper:

@article{zhang2025rlm,
  title={Recursive Language Models},
  author={Zhang, Alex L. and Kraska, Tim and Khattab, Omar},
  journal={arXiv preprint arXiv:2512.24601},
  year={2025}
}

Integrated Skills (Local Runtime)

For a streamlined experience using your existing IDE plan (Claude Code, Cursor, etc.):

  1. Skills Integration: We provide pre-configured skills and commands for major agents.
  2. Persistent REPL: You can run python integrations/claude_code/rlm_runner.py to maintain a stateful probing session directly in your terminal.

MCP Server Setup (Server-side)

1. Prerequisites
  • Python 3.12+
  • uv (recommended) or pip
  • Docker (recommended for safe sandboxing)
2. Installation

We recommend using uv for automatic dependency and virtualenv management:

# Sync dependencies (RLM is automatically installed from upstream)
uv sync

# Build Docker Sandbox (recommended for safety)
docker build -t rlm-sandbox -f Dockerfile.rlm-sandbox .

Note: The project uses hatchling as the build backend for proper package discovery.

RLM method (in 30 seconds)

RLMs treat your repo/logs as an external environment. The model writes code to:

  1. ingest context (files/globs/text)
  2. probe specific subsets (regex/AST/search)
  3. recursively call itself on small chunks
  4. synthesize a final structured answer

See: https://arxiv.org/abs/2512.24601
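
As a rough illustration only (not this repo's implementation), the loop can be sketched with the OpenAI SDK, a regex probe, and one recursive call per matching file:

import glob
import re
from pathlib import Path
from openai import OpenAI  # requires OPENAI_API_KEY

client = OpenAI()
MODEL = "gpt-4o-mini"  # keep recursive calls on a cheap model

def ask(prompt: str) -> str:
    resp = client.chat.completions.create(model=MODEL, messages=[{"role": "user", "content": prompt}])
    return resp.choices[0].message.content

# 1. ingest: load matching files, but keep the corpus outside the prompt
corpus = {p: Path(p).read_text(errors="ignore") for p in glob.glob("context_unlimited/**/*.py", recursive=True)}

# 2. probe: cheap, deterministic narrowing (a regex here) before spending any tokens
hits = {p: text for p, text in corpus.items() if re.search(r"provider_preset", text)}

# 3. recurse: ask the model about each small chunk independently
partials = [ask(f"How does {p} use provider_preset?\n\n{text[:4000]}") for p, text in hits.items()]

# 4. synthesize: one final call over the (small) partial answers
print(ask("Merge these notes into a single JSON summary:\n\n" + "\n---\n".join(partials)))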

MCP Integration (Cursor / Claude Code / Codex) — RLM Tool

1. Cursor (Recommended)

To add RLM to Cursor as an MCP server:

  1. Open Cursor Settings -> Models -> MCP.
  2. Click + Add New MCP Server.
  3. Set Name to rlm and Type to command.
  4. Use the following Command (replace with your absolute path):
    uv --directory /path/to/context-unlimited run context_unlimited/server.py

Example mcp.json Configuration

If you prefer editing your mcp.json directly (the global configuration file is usually at ~/.cursor/mcp.json), here is the canonical configuration:

{
  "mcpServers": {
    "rlm": {
      "command": "uv",
      "args": [
        "--directory",
        "/path/to/context-unlimited",
        "run",
        "context_unlimited/server.py"
      ],
      "env": {
        "OPENAI_API_KEY": "sk-...",
        "OPENROUTER_API_KEY": "sk-or-v1-...",
        "RLM_DEFAULT_RECURSION_MODEL": "openai/gpt-4o-mini"
      }
    }
  }
}

Handling API Keys in Cursor

Cursor does not automatically inherit environment variables from your shell. Use one of these methods:

  • Method A (.env file - Recommended): Create a .env file in the project root with your keys:
    OPENAI_API_KEY=sk-...
    OPENROUTER_API_KEY=sk-...
    Note: OpenRouter uses Authorization: Bearer <OPENROUTER_API_KEY> for its API (see OpenRouter Authentication).
  • Method B (Direct JSON): Add keys directly to the env object in your mcp.json (found via Cursor settings).
  • Method C (Terminal Launch): Launch Cursor from your terminal using cursor . after exporting keys in your shell.

[!IMPORTANT] Common Failure (Environment Variables): Some MCP clients (including Claude Code and Cursor) may occasionally fail to pass variables from the env block to the subprocess. If you see "API Key not found" errors, we recommend using Method A or Method C.
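
If you hit the "API Key not found" failure above, a quick sanity check is to confirm the keys are visible from the shell or working directory the server will use; a minimal sketch assuming python-dotenv is installed (Method A's .env is what gets loaded):

import os
from dotenv import load_dotenv

load_dotenv()  # picks up .env from the current working directory (Method A)
for key in ("OPENAI_API_KEY", "OPENROUTER_API_KEY"):
    print(f"{key}: {'set' if os.environ.get(key) else 'MISSING'}")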

2. Other Clients

Claude Code
claude mcp add --scope project --env OPENAI_API_KEY=sk-... rlm -- python -u context_unlimited/server.py

Codex

Add to ~/.codex/config.toml:

[context_unlimited.rlm]
command = "python"
args = ["-u", "/path/to/context-unlimited/context_unlimited/server.py"]
cwd = "/path/to/context-unlimited"
[context_unlimited.rlm.env]
OPENAI_API_KEY = "..."

Provider Configuration

context-unlimited supports any OpenAI-compatible endpoint. Configure via provider_preset or an explicit provider object in the tool arguments. By default, ollama_local is used for a "no-secrets" quickstart experience.

| Provider | Preset | Default Model | Required Env Var |
| --- | --- | --- | --- |
| OpenAI | openai | gpt-4o-mini | OPENAI_API_KEY |
| OpenRouter | openrouter | google/gemini-2.0-flash-lite | OPENROUTER_API_KEY |
| Ollama | ollama_local (Default) | qwen-2.5-coder-32b | None (set environment: local) |
| vLLM | vllm_local | qwen-2.5-coder-32b | VLLM_API_KEY (if required) |
| LiteLLM | litellm_proxy | qwen-2.5-coder-32b | Proxy-specific |

Tip

OpenRouter "Bench" Keys: If you are using OpenRouter for benchmarks, we recommend creating a "restricted" API key with a low spending limit (e.g., $1.00) to prevent unexpected costs. See the OpenRouter Keys Dashboard.

Caution

Cost Warning: RLM works by making multiple recursive calls to the LLM. Using large models like gpt-4o or claude-sonnet-4.5 can lead to significant API costs. Always monitor your usage and consider using a cheaper recursion model (e.g., gpt-4o-mini) via RLM_DEFAULT_RECURSION_MODEL.

Choosing & Overriding Models

You can configure global defaults via environment variables or override them per-request.

Global Defaults (via .env)

  • RLM_DEFAULT_MODEL: The primary model to use if none is specified.
  • RLM_DEFAULT_RECURSION_MODEL: The model used for recursive sub-calls (highly recommended to use a cheaper model here).

Per-Request Overrides

Use the model_name argument in your prompt. RLM also supports an other_model_name argument for the recursive "sub-probes".

Example: Use Claude on OpenRouter

"Analyze the auth flow using RLM with model_name='anthropic/claude-3.5-sonnet'"

Example: Use a cheaper model for recursion

"Find the bug using RLM with model_name='gpt-4o' and other_model_name='gpt-4o-mini'"

Usage Examples (JSON)

OpenAI
{
  "query": "Analyze the project structure",
  "globs": ["**/*.py"],
  "provider_preset": "openai",
  "model_name": "gpt-4o"
}

OpenRouter
{
  "query": "Deep investigation of the codebase",
  "provider_preset": "openrouter",
  "model_name": "anthropic/claude-sonnet-4.5"
}

Local Models (Ollama)
{
  "query": "Analyze this file",
  "provider_preset": "ollama_local",
  "model_name": "qwen2.5-coder-32b",
  "environment": "local"
}

Cost-Efficient OpenRouter Strategies

[!NOTE] Disclosure: These specific model combinations are provided as general recommendations based on current pricing and performance (January 2026). They have not been formally tested in every environment; your results may vary.

| Strategy | Primary Model (model_name) | Recursive Model (other_model_name) | Cost Tier |
| --- | --- | --- | --- |
| Balanced Power | anthropic/claude-3.5-sonnet | openai/gpt-4o-mini | Moderate |
| High Intelligence | anthropic/claude-sonnet-4.5 | anthropic/claude-3.5-sonnet | High |
| Ultra Budget | deepseek/deepseek-r1 | google/gemini-2.0-flash-lite | Very Low |

[!WARNING] 404 Model Not Found: If you receive a 404 error, ensure the model ID is correct for your provider. For OpenRouter, some models have :free suffixes or might be temporarily unavailable. Check the OpenRouter Models List for the latest IDs.

Example: Balanced Power Setup (mcp.json)

{
  "env": {
    "RLM_DEFAULT_MODEL": "openai/gpt-5.2-codex",
    "RLM_DEFAULT_RECURSION_MODEL": "deepseek/deepseek-r1-0528:free"
  }
}

Code Mode (Structured Outputs) — RLM Results as typed data

context-unlimited is designed for high-reliability agentic integration. It supports MCP Structured Outputs, allowing IDEs and CLI agents (like Cursor, Claude Code, and Codex) to consume machine-readable data directly via the structuredContent field.

Benefits

  • Reliability: Eliminates "JSON in prose" parsing errors by using a dedicated data channel.
  • Typed contracts: Clients can generate type-safe APIs from the server's outputSchema. All results conform to the rlm.solve.result.v1 schema.
  • Operational efficiency: structured outputs can avoid re-injecting large tool results into the agent’s conversation history when the client supports it.

Consuming Structured Output

When calling the solve tool, the server returns a CallToolResult that contains:

  1. Human-readable text in content (for the user).
  2. Structured JSON in meta.structured_content (for the agent).

Example TypeScript consumption:

const result = await client.callTool({ name: "solve", arguments: { query: "..." } });
// Standard MCP structured result
const structured = (result._meta as any).structured_content;
const answerJson = structured.answer_json; // Canonical parsed value

Note: For clients that do not yet fully support the MCP structuredContent spec, the server also embeds the JSON in the human-readable text block as a fallback.

See examples/code_mode/client.ts for a full implementation.

Benchmarking

Token benchmark (30 seconds)

Compare a "one-shot" call vs RLM's recursive probing:

uv run python bench/bench_tokens.py \
  --query "Find all provider presets and required env vars. Return JSON." \
  --globs "context_unlimited/**/*.py" "schemas/*.json" \
  --provider_preset openrouter \
  --model <your-model-id>

Benchmarking with Ollama (Context Limits)

Ollama's default context (4096 tokens) can make comparisons unfair. Use a custom model file:

  1. ollama create qwen2.5-coder-32k -f Modelfile.qwen3-32k
  2. Run benchmark with --model qwen2.5-coder-32k.

Architecture & Security

Architecture Diagram
flowchart LR
  A[Agent framework<br/>Cursor / Claude Code / Codex] -->|MCP stdio| B[context-unlimited server]
  B --> C[Ingest: files/text/globs]
  C --> D[RLM loop<br/>probe → recurse → synthesize]
  D --> E[Provider<br/>OpenAI / OpenRouter / Ollama / vLLM]
  E --> D --> B --> A
  • context_unlimited/server.py: FastMCP server entrypoint.
  • context_unlimited/ingest.py: Secure file ingestion with repo boundaries.
  • context_unlimited/validate.py: JSON schema validation.
  • schemas/: JSON schemas for request/result.
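
The ingestion step is what keeps probing inside the repository; a simplified, illustrative sketch of the kind of boundary check context_unlimited/ingest.py is described as performing (not the actual implementation):

from pathlib import Path

def resolve_inside_repo(repo_root: str, candidate: str) -> Path:
    """Resolve a requested path and refuse anything that escapes the repo root."""
    root = Path(repo_root).resolve()
    target = (root / candidate).resolve()
    if target != root and root not in target.parents:
        raise PermissionError(f"{candidate!r} is outside the repository boundary")
    return target

# Example: 'context_unlimited/server.py' is allowed; '../../etc/passwd' raises PermissionError.
print(resolve_inside_repo(".", "context_unlimited/server.py"))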

Disclaimer & User Responsibility

[!IMPORTANT] By using this software, you acknowledge and agree to the following:

  1. Financial Responsibility: RLM works by making multiple recursive calls to LLM APIs. You are solely responsible for all costs incurred on your API accounts. We highly recommend setting usage limits on your provider dashboards.
  2. Verification Mandate: AI-generated outputs (especially code and technical analysis) can contain errors, omissions, or "hallucinations." You must manually verify all outputs before relying on them or applying them to production systems.
  3. Execution Risk: RLM executes LLM-generated code to probe your context. While we provide a Docker sandbox, no sandbox is 100% secure. You assume all risk associated with the execution of AI-generated code on your infrastructure.
  4. Data Privacy: Do not input sensitive personal data, trade secrets, or highly confidential information unless you have verified the privacy policy of your chosen LLM provider.
  5. No Liability: This software is provided "as is," without warranty of any kind. The authors and contributors are not liable for any damages, financial losses, or security breaches resulting from the use of this tool.

Security Policy & Sandboxing
  • Execution Environment: Default execution uses a Docker sandbox. This is the primary defense against malicious or buggy AI-generated code. Always prefer Docker for untrusted queries.
  • Local Execution: Running with environment: local executes code directly on your host machine. Use this only for trusted local development on non-sensitive codebases.
  • API Keys: Keys should be scoped to least privilege. Never pass keys directly in tool arguments; use environment variables or a .env file.

What this is NOT

  • A generic shell executor: RLM is optimized for code search and reasoning, not general-purpose automation.
  • A replacement for grep: For simple searches, use grep. RLM is for complex "Where is the logic for X and how does it relate to Y" queries.
  • 100% Secure Local Execution: While we provide sandboxing, LLM-generated code is inherently risky. Always prefer the Docker environment for untrusted queries.

Version Pinning

To ensure stability, we pin the MCP SDK and other critical dependencies in pyproject.toml. If you encounter issues with newer MCP versions, stick to:

  • mcp>=1.25.0
  • anthropic>=0.76.0

🔎 Keywords

RLM, Recursive Language Models, long context, MCP, Model Context Protocol, REPL, agentic coding, structured outputs, code mode, OpenRouter, Ollama, vLLM.
