Autocache is a smart proxy server that automatically injects `cache_control` fields into Anthropic Claude API requests, reducing costs by up to 90% and latency by up to 85% while providing detailed ROI analytics via response headers.
Modern AI agent platforms like n8n, Flowise, and Make.com, and even popular frameworks like LangChain and LlamaIndex, don't support Anthropic's prompt caching, despite users building increasingly complex agents with:
- Large system prompts (1,000-5,000+ tokens)
- 10+ tool definitions (5,000-15,000+ tokens)
- Repeated agent interactions (same context, different queries)
When you build a complex agent in n8n with a detailed system prompt and multiple tools, every API call re-sends the full context, costing roughly 10x more than necessary. For example (see the sketch below):
- Without caching: a 15,000-token agent costs about $0.045 per request
- With caching: the same agent costs about $0.0045 per request after the first call (90% savings)
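A quick sanity check of those numbers, as a minimal sketch assuming Claude Sonnet input pricing of $3 per million tokens and Anthropic's published cache multipliers (1.25x for cache writes, 0.1x for cache reads); the figures are illustrative, not output from Autocache:

```python
# Illustrative cost estimate for a 15,000-token agent prompt (assumed Sonnet pricing).
INPUT_PRICE_PER_MTOK = 3.00      # $ per million input tokens (assumption)
CACHE_WRITE_MULTIPLIER = 1.25    # cache writes cost 25% more than normal input
CACHE_READ_MULTIPLIER = 0.10     # cache reads cost 10% of normal input

prompt_tokens = 15_000

uncached = prompt_tokens / 1e6 * INPUT_PRICE_PER_MTOK
first_call = prompt_tokens / 1e6 * INPUT_PRICE_PER_MTOK * CACHE_WRITE_MULTIPLIER
cached_call = prompt_tokens / 1e6 * INPUT_PRICE_PER_MTOK * CACHE_READ_MULTIPLIER

print(f"Uncached request:       ${uncached:.4f}")    # ~$0.0450
print(f"First request (write):  ${first_call:.4f}")  # ~$0.0563
print(f"Cached request (read):  ${cached_call:.4f}") # ~$0.0045 (90% savings)
```

In practice only the stable prefix (system prompt, tools, repeated context) is cached, so real savings depend on how much of each request is actually repeated.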
The AI community has been requesting this feature:
- n8n GitHub Issue #13231 - "Anthropic model not caching system prompt"
- Flowise Issue #4289 - "Support for Anthropic Prompt Caching"
- n8n Community Request - Multiple requests for caching support
- LangChain Issue #26701 - Implementation difficulties
Autocache works as a transparent proxy that automatically analyzes your requests and injects `cache_control` fields at optimal breakpoints, with no code changes required. Just point your existing n8n/Flowise/Make.com workflows to Autocache instead of directly to Anthropic's API.
Result: the same agents, up to 90% lower costs and 85% lower latency, automatically.
Several tools offer prompt caching support, but Autocache is unique in combining a zero-config transparent proxy with intelligent ROI analytics:
Solution | Type | Auto-Injection | Intelligence | ROI Analytics | Drop-in for n8n/Flowise |
---|---|---|---|---|---|
Autocache | Proxy | Fully automatic | Token analysis + ROI scoring | Response headers | Yes |
LiteLLM | Proxy | Rule-based | No | Yes | |
langchain-smart-cache | Library | Fully automatic | Priority-based | Statistics | LangChain only |
anthropic-cost-tracker | Library | Unclear | Unknown | Dashboard | Python only |
OpenRouter | Service | No | No | Yes | |
AWS Bedrock | Cloud | ML-based | Yes | AWS only | AWS only |
Autocache is the only solution that combines:
- Transparent Proxy - Works with any tool (n8n, Flowise, Make.com) without code changes
- Intelligent Analysis - Automatic token counting, ROI scoring, and optimal breakpoint placement
- Real-time ROI - Cost savings and break-even analysis in every response header
- Self-Hosted - No external dependencies or cloud vendor lock-in
- Zero Config - Works out of the box with multiple strategies (conservative/moderate/aggressive)
Other solutions require configuration (LiteLLM), impose framework lock-in (langchain-smart-cache), or don't provide transparent proxy functionality for agent builders.
- Drop-in Replacement: Simply change your API URL and get automatic caching
- ROI Analytics: Detailed cost savings and break-even analysis via headers
- Smart Caching: Intelligent placement of cache breakpoints using multiple strategies
- High Performance: Supports both streaming and non-streaming requests
- Configurable: Multiple caching strategies and customizable thresholds
- Docker Ready: Easy deployment with Docker and docker-compose
- Comprehensive Logging: Detailed request/response logging with structured output
The fastest way to start using Autocache is with the published Docker image from GitHub Container Registry:
1. Run the container:
# Option A: With API key in environment variable
docker run -d -p 8080:8080 \
-e ANTHROPIC_API_KEY=sk-ant-... \
--name autocache \
ghcr.io/montevive/autocache:latest
# Option B: Without API key (pass it per-request in headers)
docker run -d -p 8080:8080 \
--name autocache \
ghcr.io/montevive/autocache:latest
2. Verify it's running:
curl http://localhost:8080/health
# {"status":"healthy","version":"1.0.1","strategy":"moderate"}
3. Test with a request:
# If using Option A (env var), no header needed:
curl http://localhost:8080/v1/messages \
-H "Content-Type: application/json" \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "claude-3-5-haiku-20241022",
"max_tokens": 50,
"messages": [{"role": "user", "content": "Hello!"}]
}'
# If using Option B (no env var), pass API key in header:
curl http://localhost:8080/v1/messages \
-H "Content-Type: application/json" \
-H "x-api-key: sk-ant-..." \
-H "anthropic-version: 2023-06-01" \
-d '{
"model": "claude-3-5-haiku-20241022",
"max_tokens": 50,
"messages": [{"role": "user", "content": "Hello!"}]
}'
4. Check the cache metadata in response headers:
X-Autocache-Injected: true
X-Autocache-Cache-Ratio: 0.750
X-Autocache-ROI-Percent: 85.2
X-Autocache-Savings-100req: $1.75
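As a hedged sketch (assuming the proxy is running locally on port 8080 with the API key set in its environment, and the header names shown above), you can read the same metadata programmatically:

```python
# Minimal sketch: send a request through the proxy and print the Autocache headers.
import requests

resp = requests.post(
    "http://localhost:8080/v1/messages",
    headers={
        "Content-Type": "application/json",
        "anthropic-version": "2023-06-01",
    },
    json={
        "model": "claude-3-5-haiku-20241022",
        "max_tokens": 50,
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)

# Print only the proxy's cache metadata headers.
for name, value in resp.headers.items():
    if name.lower().startswith("x-autocache-"):
        print(f"{name}: {value}")
```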
Available tags:
- `latest` - Latest stable release (recommended)
- `v1.0.1` - Specific version tag
- `1.0.1`, `1.0`, `1` - Semantic version aliases

Image details:
- Registry: `ghcr.io/montevive/autocache`
- Architectures: `linux/amd64`, `linux/arm64`
- Size: ~29 MB (optimized Alpine-based image)
- Source: https://github.com/montevive/autocache
- For production deployments with docker-compose, see the Quick Start section below
- For configuration options, see Configuration
- For n8n integration, see our n8n setup guide
- Clone and configure:
git clone <repository-url>
cd autocache
cp .env.example .env
# Edit .env with your ANTHROPIC_API_KEY (optional - can pass in headers instead)
- Start the proxy:
docker-compose up -d
- Use in your application:
# Change your API base URL from:
# https://api.anthropic.com
# To:
# http://localhost:8080
- Build and run:
go mod download
go build -o autocache ./cmd/autocache
# Option 1: Set API key via environment (optional)
ANTHROPIC_API_KEY=sk-ant-... ./autocache
# Option 2: Run without API key (pass it in request headers)
./autocache
- Configure your client:
# Python example - API key passed in headers
import anthropic
client = anthropic.Anthropic(
api_key="sk-ant-...", # This will be forwarded to Anthropic
base_url="http://localhost:8080" # Point to autocache
)
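A hedged usage sketch with that client: the system prompt below is placeholder padding, but any request whose stable prefix clears the model's minimum cacheable size should get `cache_control` injected by the proxy.

```python
# Continuing the client configured above: a large, stable system prompt that Autocache can cache.
# The prompt text is a placeholder; in a real agent it would be your full system prompt and
# tool definitions. Padding keeps the prefix above the minimum cacheable size
# (assumption based on the thresholds documented below: ~2048 tokens for Haiku models).
system_prompt = "You are a support agent for ExampleCorp. " * 300

message = client.messages.create(
    model="claude-3-5-haiku-20241022",
    max_tokens=100,
    system=system_prompt,
    messages=[{"role": "user", "content": "Summarize our refund policy in one sentence."}],
)
print(message.content[0].text)
```

Because the call goes through the proxy, the ROI headers described below apply to its response as well.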
Variable | Default | Description |
---|---|---|
`PORT` | `8080` | Server port |
`ANTHROPIC_API_KEY` | - | Your Anthropic API key (optional if passed in request headers) |
`CACHE_STRATEGY` | `moderate` | Caching strategy: `conservative` / `moderate` / `aggressive` |
`LOG_LEVEL` | `info` | Log level: `debug` / `info` / `warn` / `error` |
`MAX_CACHE_BREAKPOINTS` | `4` | Maximum cache breakpoints (1-4) |
`TOKEN_MULTIPLIER` | `1.0` | Token threshold multiplier |
The Anthropic API key can be provided in three ways (in order of precedence):
1. Request headers (recommended for multi-tenant scenarios): `Authorization: Bearer sk-ant-...` or `x-api-key: sk-ant-...`
2. Environment variable: `ANTHROPIC_API_KEY=sk-ant-... ./autocache`
3. `.env` file: `ANTHROPIC_API_KEY=sk-ant-...`
Tip: For multi-user environments (like n8n with multiple API keys), pass the key in request headers and leave the environment variable unset.
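For instance, a hedged multi-tenant sketch: each tenant's client carries its own Anthropic key, which the proxy forwards upstream (the keys and tenant names are placeholders):

```python
# Sketch: two tenants share one Autocache instance; each client sends its own key,
# and the proxy forwards it to Anthropic. Keys shown are placeholders.
import anthropic

tenant_keys = {
    "tenant-a": "sk-ant-tenant-a-...",
    "tenant-b": "sk-ant-tenant-b-...",
}

clients = {
    name: anthropic.Anthropic(api_key=key, base_url="http://localhost:8080")
    for name, key in tenant_keys.items()
}

reply = clients["tenant-a"].messages.create(
    model="claude-3-5-haiku-20241022",
    max_tokens=50,
    messages=[{"role": "user", "content": "Ping"}],
)
print(reply.content[0].text)
```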
Conservative:
- Focus: System prompts and tools only
- Breakpoints: Maximum 2
- Best For: Cost-sensitive applications with predictable content

Moderate (default):
- Focus: System, tools, and large content blocks
- Breakpoints: Maximum 3
- Best For: Most applications balancing savings and efficiency

Aggressive:
- Focus: Maximum caching coverage
- Breakpoints: All 4 available
- Best For: High-volume applications with repeated content
Autocache provides detailed ROI metrics via response headers:
Header | Description |
---|---|
`X-Autocache-Injected` | Whether caching was applied (`true`/`false`) |
`X-Autocache-Cache-Ratio` | Fraction of tokens cached (0.0-1.0) |
`X-Autocache-ROI-Percent` | Percentage savings at scale |
`X-Autocache-ROI-BreakEven` | Requests needed to break even |
`X-Autocache-Savings-100req` | Total savings after 100 requests |
X-Autocache-Injected: true
X-Autocache-Total-Tokens: 5120
X-Autocache-Cached-Tokens: 4096
X-Autocache-Cache-Ratio: 0.800
X-Autocache-ROI-FirstCost: $0.024
X-Autocache-ROI-Savings: $0.0184
X-Autocache-ROI-BreakEven: 2
X-Autocache-ROI-Percent: 85.2
X-Autocache-Breakpoints: system:2048:1h,tools:1024:1h,content:1024:5m
X-Autocache-Savings-100req: $1.75
POST /v1/messages
Drop-in replacement for Anthropic's /v1/messages endpoint with automatic cache injection.
GET /health
Returns server health and configuration status.
GET /metrics
Returns supported models, strategies, and cache limits.
GET /savings
Returns comprehensive ROI analytics and caching statistics:
Response includes:
- Recent Requests: Full history of recent requests with cache metadata
- Aggregated Stats:
- Total requests processed
- Requests with cache applied
- Total tokens processed and cached
- Average cache ratio
- Projected savings after 10 and 100 requests
- Debug Info:
- Breakpoints by type (system, tools, content)
- Average tokens per breakpoint type
- Configuration: Current cache strategy and history size
Example usage:
curl http://localhost:8080/savings | jq '.aggregated_stats'
Example response:
{
"aggregated_stats": {
"total_requests": 25,
"requests_with_cache": 20,
"total_tokens_processed": 125000,
"total_tokens_cached": 95000,
"average_cache_ratio": 0.76,
"total_savings_after_10_reqs": "$1.85",
"total_savings_after_100_reqs": "$18.50"
},
"debug_info": {
"breakpoints_by_type": {
"system": 15,
"tools": 12,
"content": 8
},
"average_tokens_by_type": {
"system": 2048,
"tools": 1536,
"content": 1200
}
}
}
Use cases:
- Monitor cache effectiveness over time
- Debug cache injection decisions
- Track actual cost savings
- Analyze which content types benefit most from caching
Add these headers to skip cache injection:
X-Autocache-Bypass: true
# or
X-Autocache-Disable: true
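A hedged sketch of opting a single request out of caching by sending one of those headers through the proxy (assumes the local setup from the Quick Start; the prompt is a placeholder):

```python
# Sketch: the same request as before, but asking the proxy to skip cache injection.
import requests

resp = requests.post(
    "http://localhost:8080/v1/messages",
    headers={
        "Content-Type": "application/json",
        "anthropic-version": "2023-06-01",
        "X-Autocache-Bypass": "true",  # ask Autocache to forward the request without injection
    },
    json={
        "model": "claude-3-5-haiku-20241022",
        "max_tokens": 50,
        "messages": [{"role": "user", "content": "Hello!"}],
    },
    timeout=60,
)
# Expect no injection for this request (header reports false or is absent).
print(resp.headers.get("X-Autocache-Injected"))
```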
# Aggressive caching with debug logging
CACHE_STRATEGY=aggressive LOG_LEVEL=debug ./autocache
# Conservative caching with higher thresholds
CACHE_STRATEGY=conservative TOKEN_MULTIPLIER=1.5 ./autocache
# docker-compose.prod.yml
version: '3.8'
services:
autocache:
image: autocache:latest
environment:
- LOG_JSON=true
- LOG_LEVEL=info
- CACHE_STRATEGY=aggressive
ports:
- "8080:8080"
restart: unless-stopped
- Request Analysis: Analyzes incoming Anthropic API requests
- Token Counting: Uses approximated tokenization to identify cacheable content
- Smart Injection: Places cache_control fields at optimal breakpoints:
- System prompts (1h TTL)
- Tool definitions (1h TTL)
- Large content blocks (5m TTL)
- ROI Calculation: Computes cost savings and break-even analysis
- Request Forwarding: Sends enhanced request to Anthropic API
- Response Enhancement: Adds ROI metadata to response headers
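For illustration, a hedged sketch of what an enhanced request body could look like after injection, following Anthropic's documented cache_control format; which blocks get a breakpoint, and with what TTL, depends on your strategy and content:

```python
# Sketch of a request body after injection (shapes follow Anthropic's prompt-caching docs;
# the tool and prompt contents are placeholders, not Autocache output).
enhanced_request = {
    "model": "claude-3-5-sonnet-20241022",
    "max_tokens": 1024,
    "tools": [
        {
            "name": "search_docs",  # placeholder tool definition
            "description": "Search internal documentation",
            "input_schema": {"type": "object", "properties": {"query": {"type": "string"}}},
            "cache_control": {"type": "ephemeral"},  # breakpoint on the last tool
        },
    ],
    "system": [
        {
            "type": "text",
            "text": "...large, stable system prompt...",
            "cache_control": {"type": "ephemeral"},  # breakpoint on the system prompt
        },
    ],
    "messages": [
        {"role": "user", "content": "What changed in the last release?"},
    ],
}
```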
- ✅ System messages
- ✅ Tool definitions
- ✅ Text content blocks
- ✅ Message content
- ❌ Images (not cacheable per Anthropic limits)
- Most models: 1024 tokens minimum
- Haiku models: 2048 tokens minimum
- Breakpoint limit: 4 per request
- 5 minutes: Dynamic content, frequent changes
- 1 hour: Stable content (system prompts, tools)
Request: 8,000 tokens (6,000 cached system prompt + 2,000 user question)
Cost without caching: $0.024 per request
Cost with caching:
- First request: $0.027 (includes cache write)
- Subsequent requests: $0.0066 (90% savings)
- Break-even: 2 requests
- Savings after 100 requests: $1.62
Request: 12,000 tokens (10,000 cached codebase + 2,000 review request)
Cost without caching: $0.036 per request
Cost with caching:
- First request: $0.045 (includes cache write)
- Subsequent requests: $0.009 (75% savings)
- Break-even: 1 request
- Savings after 100 requests: $2.61
# Debug mode for detailed cache decisions
LOG_LEVEL=debug ./autocache
# JSON logging for production
LOG_JSON=true LOG_LEVEL=info ./autocache
Key structured log fields:
- `cache_injected`: Whether caching was applied
- `cache_ratio`: Percentage of tokens cached
- `breakpoints`: Number of cache breakpoints used
- `roi_percent`: Percentage savings achieved
# Check proxy health
curl http://localhost:8080/health
# Get metrics and configuration
curl http://localhost:8080/metrics
# Get comprehensive savings analytics
curl http://localhost:8080/savings | jq .
# Monitor aggregated statistics
curl http://localhost:8080/savings | jq '.aggregated_stats'
# Check breakpoint distribution
curl http://localhost:8080/savings | jq '.debug_info.breakpoints_by_type'
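The same statistics can feed a small monitoring script; a hedged sketch assuming the /savings response shape shown above:

```python
# Sketch: poll the /savings endpoint and print a one-line summary.
# Assumes the aggregated_stats shape from the example response above.
import requests

stats = requests.get("http://localhost:8080/savings", timeout=10).json().get("aggregated_stats", {})

print(
    f"requests={stats.get('total_requests', 0)} "
    f"cached={stats.get('requests_with_cache', 0)} "
    f"avg_cache_ratio={stats.get('average_cache_ratio', 0):.2f} "
    f"savings_100req={stats.get('total_savings_after_100_reqs', 'n/a')}"
)
```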
No caching applied:
- Check token counts meet minimums (1024/2048)
- Verify content is cacheable (not images)
- Review cache strategy configuration

High break-even point:
- Content may be too small for effective caching
- Consider a more conservative strategy
- Check the token multiplier setting

API key errors:
- Ensure `ANTHROPIC_API_KEY` is set or passed in headers
- Verify API key format: `sk-ant-...`
Run with debug logging for more detail:
LOG_LEVEL=debug ./autocache
This provides detailed information about:
- Token counting decisions
- Cache breakpoint placement
- ROI calculations
- Request/response processing
Autocache follows Go best practices with a clean, modular architecture:
autocache/
├── cmd/
│   └── autocache/       # Application entry point
├── internal/
│   ├── types/           # Shared data models
│   ├── config/          # Configuration management
│   ├── tokenizer/       # Token counting (heuristic, offline, API-based)
│   ├── pricing/         # Cost calculations and ROI
│   ├── client/          # Anthropic API client
│   ├── cache/           # Cache injection logic
│   └── server/          # HTTP handlers and routing
└── test_fixtures.go     # Shared test utilities
- Server: HTTP request handler with streaming support
- Cache Injector: Intelligent cache breakpoint placement with ROI scoring
- Tokenizer: Multiple implementations (heuristic, offline tokenizer, real API)
- Pricing Calculator: ROI and cost-benefit analysis
- API Client: Anthropic API communication with header management
For detailed architecture documentation, see CLAUDE.md.
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass:
go test ./...
- Submit a pull request
MIT License - see LICENSE file for details.
- Email: [email protected]
- Issues: GitHub Issues
- Documentation: GitHub Wiki
Autocache - Maximize your Anthropic API efficiency with intelligent caching