A tool that crawls documentation websites and generates LLM-friendly summaries in llm.txt format.
- 🕷️ Smart Crawling: Discovers URLs via sitemaps and crawls politely
- 🤖 AI-Powered: Uses Cohere for intelligent summarization
- 📏 Size Management: Keeps outputs within configurable size limits
- 🚫 Respectful: Honors robots.txt and implements crawl delays
- 🔄 API & CLI: Both REST API and command-line interfaces
- ⚡ Fast: Async processing with concurrent job handling
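The crawler's internals aren't shown in this README, but the politeness behavior above (honoring robots.txt, delaying between requests) can be sketched with the standard library. The function name and defaults below are illustrative, not the tool's actual API:

```python
import time
import urllib.robotparser
from urllib.parse import urljoin, urlparse

def fetch_politely(urls, user_agent="llm-txt-generator/0.1.0", delay=1.0):
    """Yield only URLs that robots.txt allows, pausing between requests."""
    parsers = {}  # one robots.txt parser per site root
    for url in urls:
        root = "{0.scheme}://{0.netloc}".format(urlparse(url))
        if root not in parsers:
            rp = urllib.robotparser.RobotFileParser()
            rp.set_url(urljoin(root, "/robots.txt"))
            try:
                rp.read()
            except OSError:
                rp.allow_all = True  # robots.txt unreachable: assume allowed
            parsers[root] = rp
        if parsers[root].can_fetch(user_agent, url):
            yield url  # a real crawler would fetch and parse the page here
            time.sleep(delay)  # REQUEST_DELAY between requests
```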
```bash
# Clone the repository
git clone https://github.com/sshtomar/llm-txt.git
cd llm-txt

# Install dependencies
make install

# Copy environment file and add your Cohere API key
cp .env.example .env

# Edit .env and add your COHERE_API_KEY
```

```bash
# Generate llm.txt for a documentation site
llm-txt generate --url https://docs.example.com

# Generate both regular and full versions
llm-txt generate --url https://docs.example.com --full

# Customize crawl parameters
llm-txt generate \
  --url https://docs.example.com \
  --max-pages 50 \
  --max-depth 2 \
  --max-kb 300 \
  --output my-docs.txt
```

```bash
# Start the API server
make dev

# Create a generation job
curl -X POST "http://localhost:8000/v1/generations" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.example.com",
    "max_pages": 100,
    "max_depth": 3,
    "full_version": false
  }'

# Check job status
curl "http://localhost:8000/v1/generations/{job_id}"

# Download results
curl "http://localhost:8000/v1/generations/{job_id}/download/llm.txt"
```

Set these environment variables in `.env`:
```bash
# Required
COHERE_API_KEY=your_api_key_here

# Optional (with defaults)
MAX_PAGES=100
MAX_DEPTH=3
MAX_KB=500
REQUEST_DELAY=1.0
USER_AGENT=llm-txt-generator/0.1.0
```

```bash
# Install development dependencies
make install

# Run tests
make test

# Type checking
make typecheck

# Format code
make fmt

# Start development server
make dev
```

The application includes a secure payment system that:
- ✅ Verifies webhooks from Dodo Payments with HMAC signatures
- ✅ Stores entitlements persistently in DynamoDB
- ✅ Uses one-time tokens (not static secrets) for upgrade success
- ✅ Checks database for active Pro status (not just cookies)
- ✅ Collects email before checkout to associate payments with users
- ✅ Prevents cookie-based entitlement bypass attacks
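Webhook verification of this kind typically computes an HMAC over the raw request body and compares it in constant time. The sketch below assumes an HMAC-SHA256 hex signature with a `sha256=` prefix; check Dodo Payments' documentation for the exact header name and encoding:

```python
import hashlib
import hmac

def verify_webhook(raw_body: bytes, signature_header: str, secret: str) -> bool:
    """Return True only if the signature matches the request body.

    The header format ("sha256=<hex digest>") is an assumption for
    illustration, not Dodo Payments' documented scheme.
    """
    expected = hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()
    received = signature_header.removeprefix("sha256=")
    # compare_digest avoids timing side channels on the comparison
    return hmac.compare_digest(expected, received)
```

Note that the check must run against the raw body bytes: re-serializing the parsed JSON can produce different bytes than the payload that was signed.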
Quick Setup:

1. Set up the DynamoDB tables:

   ```bash
   ./scripts/setup-dynamodb-tables.sh us-east-1
   ```

2. Configure environment variables (see `.env.example`):

   ```bash
   DODO_WEBHOOK_SECRET=your-webhook-secret
   ENTITLEMENT_SECRET=$(openssl rand -hex 32)
   ENTITLEMENT_ALLOW_UNVERIFIED=false  # true for dev only
   AWS_REGION=us-east-1
   ENTITLEMENTS_TABLE=llmxt-entitlements
   PAYMENTS_TABLE=llmxt-payments
   ```

3. Configure the Dodo webhook:
   - URL: `https://your-domain.com/api/webhooks/dodo`
   - Events: `payment.succeeded`, `payment.failed`, `subscription.cancelled`

4. Install dependencies:

   ```bash
   cd web_codex && npm install
   ```
📘 Full setup guide: See PAYMENT_SETUP.md for detailed instructions, security checklist, and troubleshooting.
Dev testing: Set ENTITLEMENT_ALLOW_UNVERIFIED=true in .env.local and visit /upgrade/success to unlock Pro locally without payment.
The Next.js frontend (web_codex) supports sign-in with Google and GitHub via NextAuth.
- Create OAuth apps:
  - GitHub: callback URL `http://localhost:3000/api/auth/callback/github`
  - Google: authorized redirect URI `http://localhost:3000/api/auth/callback/google`
- Create `web_codex/.env.local` (or set the equivalent environment variables) with:
  - `NEXTAUTH_URL=http://localhost:3000`
  - `NEXTAUTH_SECRET=<random 32+ char string>`
  - `GITHUB_ID`, `GITHUB_SECRET`
  - `GOOGLE_CLIENT_ID`, `GOOGLE_CLIENT_SECRET`
- Start the frontend: `cd web_codex && npm install && npm run dev`
- Use the Sign in button in the header. The Generate action requires authentication.
```
/api       # FastAPI REST API endpoints
/worker    # Background job processing
/crawler   # Web crawling and content extraction
/composer  # LLM-powered content generation
/cli       # Command-line interface
```
MIT License - see LICENSE file for details.
Expose llm.txt generation to MCP-capable IDEs/tools (Cursor, Claude Code, Codex) via a thin MCP server that calls this repo’s FastAPI API.
- Install: `make install` (adds the `llm-txt-mcp` console script)
- Env vars (optional):
  - `LLM_TXT_API_BASE_URL` (default: `http://localhost:8000`)
  - `LLM_TXT_API_TOKEN` (optional Bearer token)
  - `LLM_TXT_API_TIMEOUT` (default: `180` seconds)
- Cursor: Settings → MCP Servers → add a server with the command `llm-txt-mcp` and set env vars.
- Claude Desktop (`mcp.json` example):
```json
{
  "mcpServers": {
    "llm-txt": {
      "command": "llm-txt-mcp",
      "env": {
        "LLM_TXT_API_BASE_URL": "http://localhost:8000"
      }
    }
  }
}
```
Sample config files you can copy:

- `configs/mcp/claude.mcp.json`
- `configs/mcp/cursor.mcp.json`
- `configs/mcp/codex.mcp.json`
- Project-level config file added: `.codex/mcp.json`
  - Uses an absolute command pointing at your venv: `/Users/explorer/llm-txt/venv/bin/llm-txt-mcp`. Edit it if your path differs.
- Start the backend: `source venv/bin/activate && make dev`
- In Codex, the MCP server `llm-txt` should appear with the tool `generate_llms_txt`.
  - Example request (from a Codex prompt): "Use tool llm-txt.generate_llms_txt with url=https://docs.example.com, full=true, max_pages=50."

Available tools:

- `generate_llms_txt(url, full?, max_pages?, max_depth?, max_kb?, respect_robots?, language?, wait_seconds?)`
  - Starts a job (`POST /v1/generations`), polls status, downloads results, and returns content and metadata.
- `get_generation_status(job_id)`
- `cancel_generation(job_id)`
Backend remains unchanged; this MCP server is just a translation layer.
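The create → poll → download flow such a translation layer performs might look like the sketch below. The response field names (`job_id`, `status`, and the `completed`/`failed` terminal states) are assumptions inferred from the curl examples earlier in this README, and `opener` is injectable so the flow can be exercised without a running server:

```python
import json
import time
import urllib.request

API_BASE_URL = "http://localhost:8000"  # LLM_TXT_API_BASE_URL

def generate_llms_txt(url, max_pages=100, max_depth=3,
                      poll_interval=2.0, opener=urllib.request.urlopen):
    """Create a generation job, poll until it finishes, download llm.txt."""
    body = json.dumps({"url": url, "max_pages": max_pages,
                       "max_depth": max_depth}).encode()
    req = urllib.request.Request(f"{API_BASE_URL}/v1/generations", data=body,
                                 headers={"Content-Type": "application/json"})
    # Field name "job_id" assumed from the {job_id} placeholder above
    job_id = json.loads(opener(req).read())["job_id"]

    while True:
        status = json.loads(opener(f"{API_BASE_URL}/v1/generations/{job_id}").read())
        if status.get("status") in ("completed", "failed"):
            break
        time.sleep(poll_interval)

    if status["status"] != "completed":
        raise RuntimeError(f"generation job {job_id} failed")
    return opener(f"{API_BASE_URL}/v1/generations/{job_id}/download/llm.txt").read().decode()
```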