Context-Aware Learning and Intelligent Command Orchestrator
AI-powered browser automation with GPT reasoning, visual intelligence, and distributed architecture.
Tags: ai-automation browser-automation gpt-4 playwright web-scraping ocr intelligent-agents python celery redis
# Install dependencies
pip install -r requirements/base.requirements.txt
# Set API key
export OPENAI_API_KEY=sk-your-key
# Launch interactive shell
./bin/calico
- AI-Powered: GPT-4 integration for intelligent task planning and execution
- Visual Intelligence: OCR support for CAPTCHAs, images, and complex layouts using Tesseract and Google Cloud Vision
- Interactive CLI: Active shell with tab completion and real-time monitoring
- Anti-Detection: Enhanced browser fingerprinting evasion using Patchright
- Session Management: Organized storage for screenshots, logs, and training data
- Distributed Architecture: Scalable design with Celery workers and Redis for session isolation
- MCP Integration: Browser control via Model Context Protocol (MCP) WebSocket service
Calico supports two browser control methods:
- Python Server (default): Direct Playwright integration with built-in stealth features
- MCP Server (alternative): Model Context Protocol WebSocket service for remote browser control
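As a rough illustration of how a caller might choose between these two methods, the sketch below reads the `MCP_WS_URL` and `PLAYWRIGHT_HEADLESS` environment variables documented in this README. The selection rule (use the MCP server only when `MCP_WS_URL` is set) and the names `BrowserBackend` and `choose_backend` are assumptions made for illustration; they are not taken from Calico's code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class BrowserBackend:
    """Hypothetical description of the chosen browser control method."""
    kind: str                      # "python" (default) or "mcp"
    headless: bool = True          # only meaningful for the Python server
    ws_url: Optional[str] = None   # only set for the MCP server


def choose_backend(env: Mapping[str, str] = os.environ) -> BrowserBackend:
    """Pick a backend from environment variables (assumed rule, not Calico's)."""
    ws_url = env.get("MCP_WS_URL", "").strip()
    if ws_url:
        # Reject anything that is not a WebSocket URL before trying to connect.
        scheme = urlparse(ws_url).scheme
        if scheme not in ("ws", "wss"):
            raise ValueError(f"MCP_WS_URL must be a ws:// or wss:// URL, got {ws_url!r}")
        return BrowserBackend(kind="mcp", ws_url=ws_url)

    # Default: Python server with direct Playwright integration.
    raw = env.get("PLAYWRIGHT_HEADLESS", "true").strip().lower()
    return BrowserBackend(kind="python", headless=raw in ("1", "true", "yes"))


if __name__ == "__main__":
    print(choose_backend({}))
    print(choose_backend({"MCP_WS_URL": "ws://localhost:7001"}))
```

Passing the environment as a plain mapping keeps the function easy to exercise without touching the real process environment.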
Key environment variables:
# AI Services
OPENAI_API_KEY=sk-... # Required: GPT-4 API key
GOOGLE_APPLICATION_CREDENTIALS=./keys/ # Optional: Google Vision OCR
# Browser Automation (Python Server - default)
PLAYWRIGHT_HEADLESS=true # Browser display mode
# Browser Automation (MCP Server - alternative)
MCP_WS_URL=ws://localhost:7001 # MCP WebSocket service URL
# Session Storage
SESSION_STORAGE_DIR=./sessions # Base directory for session data
Session storage structure:
./sessions/{session-uuid}/
├── metadata.json
├── photos/
├── logs/
└── data/
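The layout above could be created with a few lines of standard-library Python. This is a minimal sketch, not Calico's implementation: the function name `create_session_dir` is hypothetical, and the fields written to `metadata.json` are assumed, since the real schema is not described here.

```python
import json
import os
import tempfile
import uuid
from datetime import datetime, timezone
from pathlib import Path

# Subfolders taken from the session storage structure shown above.
SUBDIRS = ("photos", "logs", "data")


def create_session_dir(base_dir=None):
    """Create {base}/{session-uuid}/ with metadata.json, photos/, logs/, data/."""
    base = Path(base_dir or os.environ.get("SESSION_STORAGE_DIR", "./sessions"))
    session_id = str(uuid.uuid4())
    root = base / session_id
    for name in SUBDIRS:
        (root / name).mkdir(parents=True, exist_ok=True)

    # Assumed metadata fields; the actual metadata.json schema is not documented.
    metadata = {
        "session_id": session_id,
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    (root / "metadata.json").write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return root


if __name__ == "__main__":
    # Use a temporary directory so the demo leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        session_root = create_session_dir(tmp)
        print(sorted(p.name for p in session_root.iterdir()))
        # → ['data', 'logs', 'metadata.json', 'photos']
```

Because each session gets its own UUID-named directory, concurrent sessions cannot overwrite each other's screenshots or logs.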
- Calico Core: Main automation engine with GPT-4 reasoning
- Browser Layer: Playwright/Patchright for stealth automation (default: Python server, alternative: MCP WebSocket server)
- Vision Layer: Multi-provider OCR (Tesseract, Google Cloud Vision)
- Task Queue: Celery with Redis for distributed processing
- Session Storage: UUID-based isolation for screenshots, logs, and data
MIT License - see LICENSE file for details.