Check if a document meets the requirements specified in another document using AI-powered analysis.
SpecTater is a Flask web application that uses AWS Bedrock (Claude or Amazon Nova models) to automatically evaluate whether submission documents (like Certificates of Insurance) meet requirements specified in policy documents. The application returns detailed, structured JSON results with pass/fail analysis for each requirement.
- Intelligent Document Analysis - Uses LLM to compare submissions against policy requirements
- Multiple Format Support - Accepts PDF, TXT, and Markdown files
- AWS Textract Integration - Automatically processes PDFs with tables using OCR
- Structured JSON Output - Returns detailed pass/fail for each requirement
- Session-Based Caching - Reuses policy documents across submissions
- IP Subnet Restriction - Optional access control by IP range
- CLI Testing Tools - Fast iteration with markdown files, wildcard support
- Customizable UI - Configure title, introduction, and static requirements
```bash
# Clone repository
git clone https://github.com/dirkpetersen/spectater.git
cd spectater

# Install dependencies
pip install -r requirements.txt

# Configure environment (copy and edit)
cp .env.default .env
# Edit .env to set your AWS credentials and MODEL_ID
```

Key settings in .env:
```bash
# AWS Bedrock Model (use inference profile for Sonnet 4)
MODEL_ID='us.anthropic.claude-sonnet-4-5-20250929-v1:0'

# Application Settings
FLASK_DEBUG=True
FLASK_PORT=5000

# Optional: IP Subnet Restriction
# SUBNET_ONLY='192.168.0.0/16'   # Allow only this subnet
# SUBNET_ONLY='127.0.0.1/32'     # Localhost only (default)

# Optional: UI Customization
# TITLE='My Custom Title'
# INTRODUCTION='intro.txt'       # Show intro text from file
# REQUIREMENTS='requirements.md' # Use static requirements (hides upload)
```

Run the application:

```bash
python app.py
# Access at http://localhost:5000
```
- Upload Policy Document (PDF, TXT, or MD)
  - Contains the requirements to check against
  - Cached per session for reuse
  - Or use `REQUIREMENTS=file.md` in .env for static requirements
- Upload Submission Document (PDF or MD)
  - The document to be evaluated
  - PDFs with tables automatically use Textract OCR
  - Auto-submits and shows progress messages
- View Results
  - Status Box: GREEN (pass) or RED (fail)
  - Detailed Table: Pass/fail for each requirement with color coding
  - Raw JSON: Complete structured output with model ID
Fast iteration without web browser:
```bash
# Basic usage
./testeval.py --spec requirements.md --submit document.md

# With debug output (shows JSON parsing, model info)
./testeval.py --spec requirements.md --submit document.md --debug

# Batch testing with wildcards (runs a separate evaluation for each)
./testeval.py --spec requirements.md --submit 'cache/*/submission_*.md' --debug
```

Convert PDFs to markdown with `pdf2md.py`:

```bash
# Convert PDF to markdown
./pdf2md.py certificate.pdf certificate.md

# Auto-generate output filename
./pdf2md.py certificate.pdf
```

The LLM returns structured JSON:
```json
{
  "summary": {
    "statement": "Overall compliance assessment",
    "totalChecks": 10,
    "passed": 8,
    "failed": 2
  },
  "requirements": [
    {
      "requirement": "General Liability Limit",
      "policyRequirement": "$2,000,000 per occurrence",
      "submissionValue": "$1,000,000 per occurrence",
      "pass": false,
      "notes": "Does not meet minimum requirement"
    }
  ]
}
```

Status Determination:

- Automatically checks all `pass` fields in the requirements array
- RED if any requirement has `pass: false`
- GREEN if all requirements have `pass: true`
- Validates summary counts match actual requirements (shows a warning if inconsistent)
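A minimal sketch of this status logic (function and variable names are illustrative, not the app's actual code):

```python
# Sketch: derive the overall status box color from the parsed LLM JSON.
def overall_status(result: dict) -> str:
    reqs = result.get("requirements", [])
    summary = result.get("summary", {})

    # RED if any single requirement failed; GREEN only if all passed.
    status = "GREEN" if reqs and all(r.get("pass") for r in reqs) else "RED"

    # Cross-check the model's summary counts against the array itself.
    passed = sum(1 for r in reqs if r.get("pass"))
    if summary.get("totalChecks") != len(reqs) or summary.get("passed") != passed:
        print("Warning: summary counts do not match the requirements array")

    return status
```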
When a PDF contains tables, the app automatically:
- Creates temporary S3 bucket (if needed)
- Uploads PDF to S3
- Runs Textract OCR analysis
- Converts results to markdown
- Cleans up S3 objects
- Deletes bucket on app shutdown
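A condensed sketch of that round trip with boto3 (simplified: the real bucket name includes account ID and region, and error handling is omitted):

```python
import boto3

s3 = boto3.client("s3")
textract = boto3.client("textract")
bucket = "spectater-textract-example"  # illustrative; see the naming note below

# Outside us-east-1, create_bucket also needs a CreateBucketConfiguration.
s3.create_bucket(Bucket=bucket)                                # temporary bucket
s3.upload_file("certificate.pdf", bucket, "certificate.pdf")   # upload PDF

# Synchronous table analysis; multi-page PDFs would need the async
# start_document_analysis / get_document_analysis pair instead.
resp = textract.analyze_document(
    Document={"S3Object": {"Bucket": bucket, "Name": "certificate.pdf"}},
    FeatureTypes=["TABLES"],
)
# resp["Blocks"] (PAGE / LINE / TABLE / CELL) is then converted to markdown.

s3.delete_object(Bucket=bucket, Key="certificate.pdf")         # clean up objects
s3.delete_bucket(Bucket=bucket)                                # delete on shutdown
```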
- Session-based: Flask sessions (not cookies) track users
- Cache Structure: `cache/{session_id}/policy.md` and `cache/{session_id}/submission_{hash}.md`
- Debug Mode: Submission documents cached only when `FLASK_DEBUG=True`
- Hash-based: Submission cache uses filename hash for consistent lookups
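For illustration, the hash-based cache path could be built like this (the hash algorithm and truncation are assumptions; only the directory layout comes from the README):

```python
import hashlib
from pathlib import Path

# Sketch: per-session cache path for a submission (layout per the README).
def submission_cache_path(session_id: str, filename: str) -> Path:
    file_hash = hashlib.sha256(filename.encode()).hexdigest()[:16]  # assumed scheme
    return Path("cache") / session_id / f"submission_{file_hash}.md"

print(submission_cache_path("a1b2c3", "certificate.pdf"))
# cache/a1b2c3/submission_<hash>.md
```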
- Automatically calculates max_tokens as 2/3 of input bytes
- Minimum 5000 tokens to prevent JSON truncation
- Uses `MAX_TOKENS` from .env as floor if configured
- Extended 1M token context for Claude Sonnet 4 (when tokens > 200,000)
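A minimal sketch of that calculation (names are illustrative):

```python
import os

# Sketch: dynamic max_tokens per the rules above.
def calc_max_tokens(input_bytes: int) -> int:
    tokens = (input_bytes * 2) // 3                     # 2/3 of input bytes
    tokens = max(tokens, 5000)                          # floor against JSON truncation
    env_floor = int(os.environ.get("MAX_TOKENS", "0"))  # optional .env floor
    return max(tokens, env_floor)

print(calc_max_tokens(12_000))  # -> 8000
```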
Restrict access by IP address:
```bash
SUBNET_ONLY='10.10.0.0/16'    # Allow 10.10.x.x
SUBNET_ONLY='192.168.1.0/24'  # Allow 192.168.1.x
SUBNET_ONLY='127.0.0.1/32'    # Localhost only (default)
SUBNET_ONLY='0.0.0.0/0'       # Allow all (no restriction)
```

Unauthorized users see: "You are not authorized to access this application from your location. You may need to connect via VPN with split tunneling DISABLED."
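The check itself maps cleanly onto Python's standard ipaddress module; a sketch (the app's actual implementation may differ):

```python
import ipaddress

# Sketch: allow a request only when the client IP is inside SUBNET_ONLY.
def ip_allowed(client_ip: str, subnet: str = "127.0.0.1/32") -> bool:
    return ipaddress.ip_address(client_ip) in ipaddress.ip_network(subnet)

print(ip_allowed("10.10.4.2", "10.10.0.0/16"))    # True
print(ip_allowed("192.168.1.5", "10.10.0.0/16"))  # False
```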
Create a customized experience:
```bash
# Custom title
TITLE='OSU Insurance Compliance Checker'

# Show introduction text
INTRODUCTION='welcome.txt'

# Use static requirements (hides policy upload field)
REQUIREMENTS='osu-requirements.md'
```

Implementation details:

- Policy Files: PDF, TXT, MD → Always cached
- Submission Files: PDF (with Textract for tables), MD → Cached in debug mode
- Character Limiting: Applied after markdown conversion (configurable)
- Temperature: Hardcoded to 0 (deterministic output)
- Cache Control: Disabled (`cache_control: {"type": "ephemeral"}`)
- Prompt Template: `analysis-prompt.md` (uses string replacement, not .format())
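String replacement sidesteps the KeyError that str.format() would raise on the literal JSON braces in the prompt template. An illustrative sketch (the placeholder tokens are assumptions, not the template's actual markers):

```python
# Sketch: fill analysis-prompt.md with .replace(), not .format().
policy_md = "# Requirements\n..."
submission_md = "# Certificate\n..."

with open("analysis-prompt.md") as f:
    template = f.read()

# .format() would choke on braces like {"summary": ...}; .replace() does not.
prompt = template.replace("{POLICY}", policy_md).replace("{SUBMISSION}", submission_md)
```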
When processing documents, users see rotating status messages:
- Uploading documents...
- Reading PDF files...
- Detecting tables and complex elements...
- Processing tables with OCR if needed...
- Analyzing submission against requirements...
- (14 messages total, changing every 3 seconds)
If you see "Expecting ',' delimiter" errors:
- Increase MAX_TOKENS in .env (or remove to use dynamic calculation)
- Try a model with higher output capacity (Sonnet vs Haiku)
- Check debug output to see where JSON truncates
Claude Sonnet 4 requires inference profile ARN:
```bash
# Use this (with inference profile)
MODEL_ID='us.anthropic.claude-sonnet-4-5-20250929-v1:0'

# Not this (direct model ID won't work)
MODEL_ID='anthropic.claude-sonnet-4-5-20250929-v1:0'
```

If summary counts don't match requirements:

- Review `analysis-prompt.md` for clarity
- Check if the policy document has ambiguous requirements
- Enable debug mode to see actual vs summary counts
When FLASK_DEBUG=True:
- Submission documents cached for faster re-testing
- Console shows LLM response preview
- JSON parsing details printed
- Summary count validation logged
- Use `testeval.py` with markdown files for rapid prompt iteration
- Use wildcard patterns to batch-test multiple submissions
- Enable the `--debug` flag to see the complete JSON parsing flow
- Check cache files in `cache/{session_id}/` to verify extraction
MIT License - see LICENSE file
- Python 3.12+
- AWS account with Bedrock access
- AWS credentials configured (via ~/.aws/credentials or environment variables)
- For Textract: S3 and Textract permissions
The application requires the following AWS IAM permissions:
For basic PDF analysis without Textract:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel"
      ],
      "Resource": [
        "arn:aws:bedrock:*::foundation-model/anthropic.claude-*",
        "arn:aws:bedrock:*::foundation-model/amazon.nova-*"
      ]
    }
  ]
}
```

For complete functionality including table extraction from PDFs:
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel"
      ],
      "Resource": [
        "arn:aws:bedrock:*::foundation-model/anthropic.claude-*",
        "arn:aws:bedrock:*::foundation-model/amazon.nova-*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:CreateBucket",
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:DeleteBucket",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::spectater-textract-*",
        "arn:aws:s3:::spectater-textract-*/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "textract:AnalyzeDocument",
        "textract:GetDocumentAnalysis"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "sts:GetCallerIdentity"
      ],
      "Resource": "*"
    }
  ]
}
```

Note: The S3 bucket is created automatically with the naming pattern `{APP_NAME}-textract-{account_id}-{region}` and is cleaned up when the application shuts down.
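For reference, that name can be reproduced with STS; a sketch (assuming boto3 and APP_NAME=spectater):

```python
import boto3

# Sketch: rebuild the bucket name {APP_NAME}-textract-{account_id}-{region}.
session = boto3.session.Session()
account_id = session.client("sts").get_caller_identity()["Account"]
region = session.region_name or "us-east-1"

print(f"spectater-textract-{account_id}-{region}")
# e.g. spectater-textract-123456789012-us-west-2
```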