Autocache is a smart proxy server that automatically injects `cache_control` fields into Anthropic Claude API requests, reducing costs by up to 90% and latency by up to 85% while providing detailed ROI analytics via response headers.
Modern AI agent platforms like n8n, Flowise, and Make.com, and even popular frameworks like LangChain and LlamaIndex, don't support Anthropic's prompt caching, despite users building increasingly complex agents with:
- Large system prompts (1,000-5,000+ tokens)
- 10+ tool definitions (5,000-15,000+ tokens)
- Repeated agent interactions (same context, different queries)
When you build a complex agent in n8n with a detailed system prompt and multiple tools, every API call re-sends the full context, costing roughly 10x more than necessary. For example (see the sketch below):
- Without caching: a 15,000-token agent costs about $0.045 per request
- With caching: the same agent costs about $0.0045 per request after the first call (90% savings)
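A quick sanity check of those numbers, as a minimal sketch assuming Claude Sonnet input pricing of $3 per million tokens and Anthropic's published cache multipliers (1.25x for cache writes, 0.1x for cache reads); the figures are illustrative, not output from Autocache:

```python
# Illustrative cost estimate for a 15,000-token agent prompt (assumed Sonnet pricing).
INPUT_PRICE_PER_MTOK = 3.00      # $ per million input tokens (assumption)
CACHE_WRITE_MULTIPLIER = 1.25    # cache writes cost 25% more than normal input
CACHE_READ_MULTIPLIER = 0.10     # cache reads cost 10% of normal input

prompt_tokens = 15_000

uncached = prompt_tokens / 1e6 * INPUT_PRICE_PER_MTOK
first_call = prompt_tokens / 1e6 * INPUT_PRICE_PER_MTOK * CACHE_WRITE_MULTIPLIER
cached_call = prompt_tokens / 1e6 * INPUT_PRICE_PER_MTOK * CACHE_READ_MULTIPLIER

print(f"Uncached request:       ${uncached:.4f}")    # ~$0.0450
print(f"First request (write):  ${first_call:.4f}")  # ~$0.0563
print(f"Cached request (read):  ${cached_call:.4f}") # ~$0.0045 (90% savings)
```

In practice only the stable prefix (system prompt, tools, repeated context) is cached, so real savings depend on how much of each request is actually repeated.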
The AI community has been requesting this feature:
- n8n GitHub Issue #13231 - "Anthropic model not caching system prompt"
- Flowise Issue #4289 - "Support for Anthropic Prompt Caching"
- n8n Community Request - Multiple requests for caching support
- LangChain Issue #26701 - Implementation difficulties
Autocache works as a transparent proxy that automatically analyzes your requests and injects `cache_control` fields at optimal breakpoints, with no code changes required. Just point your existing n8n/Flowise/Make.com workflows to Autocache instead of directly to Anthropic's API.
Result: the same agents, up to 90% lower costs and 85% lower latency, automatically.
Several tools offer prompt caching support, but Autocache is unique in combining a zero-config transparent proxy with intelligent ROI analytics:
Solution | Type | Auto-Injection | Intelligence | ROI Analytics | Drop-in for n8n/Flowise |
---|---|---|---|---|---|
Autocache | Proxy | Fully automatic | Token analysis + ROI scoring | Response headers | Yes |
LiteLLM | Proxy | Rule-based | No | Yes | |
langchain-smart-cache | Library | Fully automatic | Priority-based | Statistics | LangChain only |
anthropic-cost-tracker | Library | Unclear | Unknown | Dashboard | Python only |
OpenRouter | Service | No | No | Yes | |
AWS Bedrock | Cloud | ML-based | Yes | AWS only | AWS only |
Autocache is the only solution that combines:
- Transparent Proxy - Works with any tool (n8n, Flowise, Make.com) without code changes
- Intelligent Analysis - Automatic token counting, ROI scoring, and optimal breakpoint placement
- Real-time ROI - Cost savings and break-even analysis in every response header
- Self-Hosted - No external dependencies or cloud vendor lock-in
- Zero Config - Works out of the box with multiple strategies (conservative/moderate/aggressive)
Other solutions require configuration (LiteLLM), impose framework lock-in (langchain-smart-cache), or don't provide transparent proxy functionality for agent builders.
- Drop-in Replacement: Simply change your API URL and get automatic caching
- ROI Analytics: Detailed cost savings and break-even analysis via headers
- Smart Caching: Intelligent placement of cache breakpoints using multiple strategies
- High Performance: Supports both streaming and non-streaming requests
- Configurable: Multiple caching strategies and customizable thresholds
- Docker Ready: Easy deployment with Docker and docker-compose
- Comprehensive Logging: Detailed request/response logging with structured output
The fastest way to start using Autocache is with the published Docker image from GitHub Container Registry:
1. Run the container:
# Option A: With API key in environment variable
docker run -d -p 8080:8080 \
-e ANTHROPIC_API_KEY=sk-ant-... \
--name autocache \
ghcr.io/montevive/autocache:latest
# Option B: Without API key (pass it per-request in headers)
docker run -d -p 8080:8080 \
--name autocache \
ghcr.io/montevive/autocache:latest
2. Verify it's running:
curl http://localhost:8080/health
# {"status":"healthy","version":"1.0.1","strategy":"moderate"}
3. Test with a request:
# If using Option A (env var), no header needed:
curl http://localhost:8080/v1/messages \
-H "Content-Type: application/json" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "claude-3-5-haiku-20241022",
"max_tokens": 50,
"messages": [{"role": "user", "content": "Hello!"}]
}'
# If using Option B (no env var), pass API key in header:
curl http://localhost:8080/v1/messages \
-H "Content-Type: application/json" \
-H "x-api-key: sk-ant-..." \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "claude-3-5-haiku-20241022",
"max_tokens": 50,
"messages": [{"role": "user", "content": "Hello!"}]
}'
4. Check the cache metadata in response headers:
X-Autocache-Injected: true
X-Autocache-Cache-Ratio: 0.750
X-Autocache-ROI-Percent: 85.2
X-Autocache-Savings-100req: $1.75
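As a hedged sketch (assuming the proxy is running locally on port 8080 with the API key set in its environment, and the header names shown above), you can read the same metadata programmatically:

```python
# Minimal sketch: send a request through the proxy and print the Autocache headers.
import requests

resp = requests.post(
    "http://localhost:8080/v1/messages",
    headers={
        "Content-Type": "application/json",
        "anthropic-version": "2023-06-01",
    },
    json={
        "model": "claude-3-5-haiku-20241022",
        "max_tokens": 50,
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)

# Print only the proxy's cache metadata headers.
for name, value in resp.headers.items():
    if name.lower().startswith("x-autocache-"):
        print(f"{name}: {value}")
```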
Available tags:
- `latest` - Latest stable release (recommended)
- `v1.0.1` - Specific version tag
- `1.0.1`, `1.0`, `1` - Semantic version aliases

Image details:
- Registry: `ghcr.io/montevive/autocache`
- Architectures: `linux/amd64`, `linux/arm64`
- Size: ~29 MB (optimized Alpine-based image)
- Source: https://github.com/montevive/autocache
- For production deployments with docker-compose, see the Quick Start section below
- For configuration options, see Configuration
- For n8n integration, see our n8n setup guide
- Clone and configure:
git clone <repository-url>
cd autocache
cp .env.example .env
# Edit .env with your ANTHROPIC_API_KEY (optional - can pass in headers instead)
- Start the proxy:
docker-compose up -d
- Use in your application:
# Change your API base URL from:
# https://api.anthropic.com
# To:
# http://localhost:8080
- Build and run:
go mod download
go build -o autocache ./cmd/autocache
# Option 1: Set API key via environment (optional)
ANTHROPIC_API_KEY=sk-ant-... ./autocache
# Option 2: Run without API key (pass it in request headers)
./autocache
- Configure your client:
# Python example - API key passed in headers
import anthropic
client = anthropic.Anthropic(
api_key="sk-ant-...", # This will be forwarded to Anthropic
base_url="http://localhost:8080" # Point to autocache
)
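A hedged usage sketch with that client: the system prompt below is placeholder padding, but any request whose stable prefix clears the model's minimum cacheable size should get `cache_control` injected by the proxy.

```python
# Continuing the client configured above: a large, stable system prompt that Autocache can cache.
# The prompt text is a placeholder; in a real agent it would be your full system prompt and
# tool definitions. Padding keeps the prefix above the minimum cacheable size
# (assumption based on the thresholds documented below: ~2048 tokens for Haiku models).
system_prompt = "You are a support agent for ExampleCorp. " * 300

message = client.messages.create(
    model="claude-3-5-haiku-20241022",
    max_tokens=100,
    system=system_prompt,
    messages=[{"role": "user", "content": "Summarize our refund policy in one sentence."}],
)
print(message.content[0].text)
```

Because the call goes through the proxy, the ROI headers described below apply to its response as well.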
Variable | Default | Description |
---|---|---|
`PORT` | `8080` | Server port |
`ANTHROPIC_API_KEY` | - | Your Anthropic API key (optional if passed in request headers) |
`CACHE_STRATEGY` | `moderate` | Caching strategy: `conservative` / `moderate` / `aggressive` |
`LOG_LEVEL` | `info` | Log level: `debug` / `info` / `warn` / `error` |
`MAX_CACHE_BREAKPOINTS` | `4` | Maximum cache breakpoints (1-4) |
`TOKEN_MULTIPLIER` | `1.0` | Token threshold multiplier |
The Anthropic API key can be provided in three ways (in order of precedence):
1. Request headers (recommended for multi-tenant scenarios): `Authorization: Bearer sk-ant-...` or `x-api-key: sk-ant-...`
2. Environment variable: `ANTHROPIC_API_KEY=sk-ant-... ./autocache`
3. `.env` file: `ANTHROPIC_API_KEY=sk-ant-...`
Tip: For multi-user environments (like n8n with multiple API keys), pass the key in request headers and leave the environment variable unset.
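For instance, a hedged multi-tenant sketch: each tenant's client carries its own Anthropic key, which the proxy forwards upstream (the keys and tenant names are placeholders):

```python
# Sketch: two tenants share one Autocache instance; each client sends its own key,
# and the proxy forwards it to Anthropic. Keys shown are placeholders.
import anthropic

tenant_keys = {
    "tenant-a": "sk-ant-tenant-a-...",
    "tenant-b": "sk-ant-tenant-b-...",
}

clients = {
    name: anthropic.Anthropic(api_key=key, base_url="http://localhost:8080")
    for name, key in tenant_keys.items()
}

reply = clients["tenant-a"].messages.create(
    model="claude-3-5-haiku-20241022",
    max_tokens=50,
    messages=[{"role": "user", "content": "Ping"}],
)
print(reply.content[0].text)
```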
Conservative:
- Focus: System prompts and tools only
- Breakpoints: Maximum 2
- Best For: Cost-sensitive applications with predictable content

Moderate (default):
- Focus: System, tools, and large content blocks
- Breakpoints: Maximum 3
- Best For: Most applications balancing savings and efficiency

Aggressive:
- Focus: Maximum caching coverage
- Breakpoints: All 4 available
- Best For: High-volume applications with repeated content
Autocache provides detailed ROI metrics via response headers:
Header | Description |
---|---|
`X-Autocache-Injected` | Whether caching was applied (`true`/`false`) |
`X-Autocache-Cache-Ratio` | Fraction of tokens cached (0.0-1.0) |
`X-Autocache-ROI-Percent` | Percentage savings at scale |
`X-Autocache-ROI-BreakEven` | Requests needed to break even |
`X-Autocache-Savings-100req` | Total savings after 100 requests |
X-Autocache-Injected: true
X-Autocache-Total-Tokens: 5120
X-Autocache-Cached-Tokens: 4096
X-Autocache-Cache-Ratio: 0.800
X-Autocache-ROI-FirstCost: $0.024
X-Autocache-ROI-Savings: $0.0184
X-Autocache-ROI-BreakEven: 2
X-Autocache-ROI-Percent: 85.2
X-Autocache-Breakpoints: system:2048:1h,tools:1024:1h,content:1024:5m
X-Autocache-Savings-100req: $1.75
POST /v1/messages
Drop-in replacement for Anthropic's /v1/messages endpoint with automatic cache injection.
GET /health
Returns server health and configuration status.
GET /metrics
Returns supported models, strategies, and cache limits.
GET /savings
Returns comprehensive ROI analytics and caching statistics:
Response includes:
- Recent Requests: Full history of recent requests with cache metadata
- Aggregated Stats:
- Total requests processed
- Requests with cache applied
- Total tokens processed and cached
- Average cache ratio
- Projected savings after 10 and 100 requests
- Debug Info:
- Breakpoints by type (system, tools, content)
- Average tokens per breakpoint type
- Configuration: Current cache strategy and history size
Example usage:
curl http://localhost:8080/savings | jq '.aggregated_stats'
Example response:
{
"aggregated_stats": {
"total_requests": 25,
"requests_with_cache": 20,
"total_tokens_processed": 125000,
"total_tokens_cached": 95000,
"average_cache_ratio": 0.76,
"total_savings_after_10_reqs": "$1.85",
"total_savings_after_100_reqs": "$18.50"
},
"debug_info": {
"breakpoints_by_type": {
"system": 15,
"tools": 12,
"content": 8
},
"average_tokens_by_type": {
"system": 2048,
"tools": 1536,
"content": 1200
}
}
}
Use cases:
- Monitor cache effectiveness over time
- Debug cache injection decisions
- Track actual cost savings
- Analyze which content types benefit most from caching
Add these headers to skip cache injection:
X-Autocache-Bypass: true
# or
X-Autocache-Disable: true
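A hedged sketch of opting a single request out of caching by sending one of those headers through the proxy (assumes the local setup from the Quick Start; the prompt is a placeholder):

```python
# Sketch: the same request as before, but asking the proxy to skip cache injection.
import requests

resp = requests.post(
    "http://localhost:8080/v1/messages",
    headers={
        "Content-Type": "application/json",
        "anthropic-version": "2023-06-01",
        "X-Autocache-Bypass": "true",  # ask Autocache to forward the request without injection
    },
    json={
        "model": "claude-3-5-haiku-20241022",
        "max_tokens": 50,
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)
# Expect no injection for this request (header reports false or is absent).
print(resp.headers.get("X-Autocache-Injected"))
```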
# Aggressive caching with debug logging
CACHE_STRATEGY=aggressive LOG_LEVEL=debug ./autocache
# Conservative caching with higher thresholds
CACHE_STRATEGY=conservative TOKEN_MULTIPLIER=1.5 ./autocache
# docker-compose.prod.yml
version: '3.8'
services:
autocache:
image: autocache:latest
environment:
- LOG_JSON=true
- LOG_LEVEL=info
- CACHE_STRATEGY=aggressive
ports:
- "8080:8080"
restart: unless-stopped
- Request Analysis: Analyzes incoming Anthropic API requests
- Token Counting: Uses approximated tokenization to identify cacheable content
- Smart Injection: Places cache_control fields at optimal breakpoints:
- System prompts (1h TTL)
- Tool definitions (1h TTL)
- Large content blocks (5m TTL)
- ROI Calculation: Computes cost savings and break-even analysis
- Request Forwarding: Sends enhanced request to Anthropic API
- Response Enhancement: Adds ROI metadata to response headers
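For illustration, a hedged sketch of what an enhanced request body could look like after injection, following Anthropic's documented cache_control format; which blocks get a breakpoint, and with what TTL, depends on your strategy and content:

```python
# Sketch of a request body after injection (shapes follow Anthropic's prompt-caching docs;
# the tool and prompt contents are placeholders, not Autocache output).
enhanced_request = {
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "tools": [
        {
            "name": "search_docs",  # placeholder tool definition
            "description": "Search internal documentation",
            "input_schema": {"type": "object", "properties": {"query": {"type": "string"}}},
            "cache_control": {"type": "ephemeral"},  # breakpoint on the last tool
        },
    ],
    "system": [
        {
            "type": "text",
            "text": "...large, stable system prompt...",
            "cache_control": {"type": "ephemeral"},  # breakpoint on the system prompt
        },
    ],
    "messages": [
        {"role": "user", "content": "What changed in the last release?"},
    ],
}
```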
- ✅ System messages
- ✅ Tool definitions
- ✅ Text content blocks
- ✅ Message content
- ❌ Images (not cacheable per Anthropic limits)
- Most models: 1024 tokens minimum
- Haiku models: 2048 tokens minimum
- Breakpoint limit: 4 per request
- 5 minutes: Dynamic content, frequent changes
- 1 hour: Stable content (system prompts, tools)
Request: 8,000 tokens (6,000 cached system prompt + 2,000 user question)
Cost without caching: $0.024 per request
Cost with caching:
- First request: $0.027 (includes cache write)
- Subsequent requests: $0.0066 (90% savings)
- Break-even: 2 requests
- Savings after 100 requests: $1.62
Request: 12,000 tokens (10,000 cached codebase + 2,000 review request)
Cost without caching: $0.036 per request
Cost with caching:
- First request: $0.045 (includes cache write)
- Subsequent requests: $0.009 (75% savings)
- Break-even: 1 request
- Savings after 100 requests: $2.61
# Debug mode for detailed cache decisions
LOG_LEVEL=debug ./autocache
# JSON logging for production
LOG_JSON=true LOG_LEVEL=info ./autocache
Key structured log fields:
- `cache_injected`: Whether caching was applied
- `cache_ratio`: Percentage of tokens cached
- `breakpoints`: Number of cache breakpoints used
- `roi_percent`: Percentage savings achieved
# Check proxy health
curl http://localhost:8080/health
# Get metrics and configuration
curl http://localhost:8080/metrics
# Get comprehensive savings analytics
curl http://localhost:8080/savings | jq .
# Monitor aggregated statistics
curl http://localhost:8080/savings | jq '.aggregated_stats'
# Check breakpoint distribution
curl http://localhost:8080/savings | jq '.debug_info.breakpoints_by_type'
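The same statistics can feed a small monitoring script; a hedged sketch assuming the /savings response shape shown above:

```python
# Sketch: poll the /savings endpoint and print a one-line summary.
# Assumes the aggregated_stats shape from the example response above.
import requests

stats = requests.get("http://localhost:8080/savings", timeout=10).json().get("aggregated_stats", {})

print(
    f"requests={stats.get('total_requests', 0)} "
    f"cached={stats.get('requests_with_cache', 0)} "
    f"avg_cache_ratio={stats.get('average_cache_ratio', 0):.2f} "
    f"savings_100req={stats.get('total_savings_after_100_reqs', 'n/a')}"
)
```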
No caching applied:
- Check token counts meet minimums (1024/2048)
- Verify content is cacheable (not images)
- Review cache strategy configuration

High break-even point:
- Content may be too small for effective caching
- Consider a more conservative strategy
- Check the token multiplier setting

API key errors:
- Ensure `ANTHROPIC_API_KEY` is set or passed in headers
- Verify API key format: `sk-ant-...`
Run with debug logging for more detail:
LOG_LEVEL=debug ./autocache
This provides detailed information about:
- Token counting decisions
- Cache breakpoint placement
- ROI calculations
- Request/response processing
Autocache follows Go best practices with a clean, modular architecture:
autocache/
├── cmd/
│   └── autocache/       # Application entry point
├── internal/
│   ├── types/           # Shared data models
│   ├── config/          # Configuration management
│   ├── tokenizer/       # Token counting (heuristic, offline, API-based)
│   ├── pricing/         # Cost calculations and ROI
│   ├── client/          # Anthropic API client
│   ├── cache/           # Cache injection logic
│   └── server/          # HTTP handlers and routing
└── test_fixtures.go     # Shared test utilities
- Server: HTTP request handler with streaming support
- Cache Injector: Intelligent cache breakpoint placement with ROI scoring
- Tokenizer: Multiple implementations (heuristic, offline tokenizer, real API)
- Pricing Calculator: ROI and cost-benefit analysis
- API Client: Anthropic API communication with header management
For detailed architecture documentation, see CLAUDE.md.
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass:
go test ./...
- Submit a pull request
MIT License - see LICENSE file for details.
- Email: [email protected]
- Issues: GitHub Issues
- Documentation: GitHub Wiki
Autocache - Maximize your Anthropic API efficiency with intelligent caching