A comprehensive Bun-based CLI and Web application for OCR processing using multiple AI models.
- Dual Mode Operation: Works as both a CLI tool and a web application
- Multiple AI Models: Support for Mistral AI OCR and Gemini Pro 2.5
- PDF Processing: Converts PDF files to images and performs OCR
- Real-time Progress: WebSocket-based progress updates in web mode
- Persistent Output: Web mode saves results to permanent folders
- Modern UI: React frontend with Material-UI components
- Bun runtime installed
- Install dependencies:
  bun install
- Set up API keys:
  cp .env.example .env # Edit .env with your API keys
- Build the application:
  bun run build:full
- Run in web mode:
  bun run dev # Opens http://localhost:3001 automatically
# Basic usage
bun run dev --pdf input.pdf --model mistral-ai-ocr --api-key YOUR_KEY --output ./output
# With Gemini model and custom prompt
bun run dev --pdf input.pdf --model gemini-pro-2.5 --api-key YOUR_KEY --prompt "Extract all text with formatting" --output ./output
# Extract images separately
bun run dev --pdf input.pdf --model mistral-ai-ocr --api-key YOUR_KEY --get-images --output ./output
# Start web server (opens browser automatically)
bun run dev
# Or build and run production version
bun run build:full
bun run start
The web interface provides:
- Model selection (Mistral AI OCR, Gemini Pro 2.5)
- API key management (stored in browser)
- Custom prompt input
- Drag-and-drop PDF upload
- Real-time progress updates
- Side-by-side PDF viewer and markdown results
- Copy markdown button
- Open output folder button
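The real-time progress updates listed above could be consumed on the client roughly as follows. This is a minimal sketch only: the `ProgressMessage` shape and the `parseProgress` and `toPercent` helpers are hypothetical, since the payload the server actually sends is not documented here.

```typescript
// Hypothetical message shape: the real WebSocket payload is not documented in this README.
type ProgressMessage = {
  type: "progress";
  page: number;       // 1-based index of the page just processed
  totalPages: number; // total number of pages in the PDF
};

// Parse a raw WebSocket frame; return null for anything that is not a valid progress message.
function parseProgress(raw: string): ProgressMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof data !== "object" || data === null) return null;

  const { type, page, totalPages } = data as Record<string, unknown>;
  if (type !== "progress") return null;
  if (typeof page !== "number" || typeof totalPages !== "number") return null;
  if (totalPages <= 0) return null;

  return { type: "progress", page, totalPages };
}

// Convert a progress message to a whole-number percentage, clamped to 0..100.
function toPercent(msg: ProgressMessage): number {
  const pct = Math.round((msg.page / msg.totalPages) * 100);
  return Math.min(100, Math.max(0, pct));
}
```

In a browser, a handler such as `socket.onmessage = (e) => { const m = parseProgress(e.data); if (m) setPercent(toPercent(m)); }` would feed a progress bar, where `setPercent` stands in for whatever state setter the UI uses.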
bun-ai-ocr-tester/
├── public/ # Built React app and static assets
│ ├── index.html # Main HTML file
│ ├── index.js # Bundled React app
│ └── pdf.worker.min.js # PDF.js worker
├── src/
│ ├── index.ts # Main entry point (CLI/Web router)
│ ├── cli.ts # CLI mode handler
│ ├── server/ # Web server components
│ │ ├── index.ts # Express server setup
│ │ └── ws.ts # WebSocket handler
│ ├── models/ # OCR model implementations
│ │ ├── modelInterface.ts
│ │ ├── mistralOCR.ts
│ │ └── geminiOCR.ts
│ ├── utils/ # Utility functions
│ │ ├── apiKeyManager.ts
│ │ ├── outputManager.ts
│ │ ├── pdfProcessor.ts
│ │ └── progressLogger.ts
│ ├── web/ # React frontend source
│ │ ├── src/
│ │ │ ├── App.tsx
│ │ │ ├── components/
│ │ │ ├── hooks/
│ │ │ └── services/
│ │ └── public/
│ └── types/ # TypeScript definitions
├── ocr-outputs/ # Web mode output directory
├── package.json
├── tsconfig.json
└── README.md
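The layout above suggests that `src/models/` holds one shared interface plus one implementation per model. A sketch of how that could look is below. The names `OCRRequest`, `OCRModel`, `StubModel`, and `getModel` are assumptions for illustration, and the stub returns a placeholder string instead of calling any API.

```typescript
// Hypothetical shapes: the actual contents of src/models/modelInterface.ts are not shown here.
interface OCRRequest {
  pageImages: Uint8Array[]; // one encoded image per PDF page
  apiKey: string;
  prompt?: string;          // optional custom prompt
}

interface OCRModel {
  readonly id: string;
  run(request: OCRRequest): Promise<string>; // resolves to markdown
}

// Stand-in implementation: a real model would call the provider's API here.
class StubModel implements OCRModel {
  readonly id: string;

  constructor(id: string) {
    this.id = id;
  }

  async run(request: OCRRequest): Promise<string> {
    return `<!-- ${this.id}: ${request.pageImages.length} page(s) -->`;
  }
}

// Registry keyed by the model names used on the command line.
const registry = new Map<string, OCRModel>();
for (const id of ["mistral-ai-ocr", "gemini-pro-2.5"]) {
  registry.set(id, new StubModel(id));
}

// Look up a model by name, failing loudly on unknown names.
function getModel(id: string): OCRModel {
  const model = registry.get(id);
  if (!model) {
    throw new Error(`Unknown model "${id}". Known models: ${[...registry.keys()].join(", ")}`);
  }
  return model;
}
```

Keeping both models behind one interface means the CLI and the web server can share the same lookup and only the implementations differ.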
- Visit https://console.mistral.ai/
- Create an account and get your API key
- Use model: mistral-ai-ocr
- Visit https://makersuite.google.com/app/apikey
- Create an account and get your API key
- Use model: gemini-pro-2.5
- MISTRAL_API_KEY: Default API key for Mistral AI
- GEMINI_API_KEY: Default API key for Gemini Pro
- PORT: Custom port for web server (default: 3001)
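As an illustration of how these variables might be read, here is a small sketch. `AppConfig` and `loadConfig` are hypothetical names, and the port range check is an added assumption; only the variable names and the 3001 default come from this README.

```typescript
interface AppConfig {
  mistralApiKey: string | undefined;
  geminiApiKey: string | undefined;
  port: number;
}

// Build the config from an environment-like record (for example process.env).
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const port = Number.parseInt(env.PORT ?? "", 10);
  return {
    // Treat empty strings the same as unset variables.
    mistralApiKey: env.MISTRAL_API_KEY || undefined,
    geminiApiKey: env.GEMINI_API_KEY || undefined,
    // Fall back to the documented default when PORT is unset or not a valid port number.
    port: Number.isInteger(port) && port > 0 && port < 65536 ? port : 3001,
  };
}
```

Calling `loadConfig(process.env)` at startup would pick up values from `.env`, which Bun loads automatically. Taking the record as a parameter keeps the function easy to test.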
- bun run build:full - Clean and full build
- bun run build:frontend - Build React app only
- bun run build:backend - Build backend only
- bun run clean - Remove build artifacts
# Install dependencies
bun install
# Start in development mode (web)
bun run dev
# Test CLI mode
bun run dev --pdf sample.pdf --model mistral-ai-ocr --api-key YOUR_KEY
# Build for production
bun run build:full
- Process PDFs with command-line arguments
- Support for both Mistral and Gemini models
- Configurable output directory
- Optional image extraction
- Console progress logging
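The command-line flags used throughout this README could be parsed along these lines. This is a sketch built on `parseArgs` from `node:util`, not the project's actual `cli.ts`. `CliOptions` and `parseCliArgs` are hypothetical names, and the `./output` default is an assumption.

```typescript
import { parseArgs } from "node:util";

const MODELS = ["mistral-ai-ocr", "gemini-pro-2.5"] as const;
type ModelId = (typeof MODELS)[number];

interface CliOptions {
  pdf: string;
  model: ModelId;
  apiKey: string | undefined; // may instead come from the environment
  prompt: string | undefined;
  getImages: boolean;
  output: string;
}

function isModelId(value: string): value is ModelId {
  return (MODELS as readonly string[]).includes(value);
}

// Parse the flags shown in the usage examples; throws on unknown flags or missing values.
function parseCliArgs(argv: string[]): CliOptions {
  const { values } = parseArgs({
    args: argv,
    options: {
      pdf: { type: "string" },
      model: { type: "string" },
      "api-key": { type: "string" },
      prompt: { type: "string" },
      "get-images": { type: "boolean" },
      output: { type: "string" },
    },
    strict: true,
  });

  const pdf = values.pdf;
  if (!pdf) throw new Error("--pdf is required");

  const model = values.model;
  if (!model || !isModelId(model)) {
    throw new Error(`--model must be one of: ${MODELS.join(", ")}`);
  }

  return {
    pdf,
    model,
    apiKey: values["api-key"],
    prompt: values.prompt,
    getImages: values["get-images"] ?? false,
    output: values.output ?? "./output", // assumed default, not confirmed by the README
  };
}
```

For example, `parseCliArgs(process.argv.slice(2))` would turn `--pdf input.pdf --model mistral-ai-ocr --get-images` into a typed options object.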
- Modern React interface with Material-UI
- Real-time WebSocket progress updates
- Persistent browser settings (API keys, prompts)
- Side-by-side PDF viewer and results
- Permanent output folders with "Open Folder" feature
- Copy markdown to clipboard
- API keys stored locally (browser localStorage or environment)
- Path validation for file serving
- Input sanitization
- CORS protection
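Path validation for file serving usually means refusing any request that resolves outside the served directory. A minimal sketch of that check is below. `resolveInside` is a hypothetical helper, not the project's actual code, and it does not follow symlinks.

```typescript
import * as path from "node:path";

// Resolve a requested path against baseDir; return null if the result would escape baseDir.
function resolveInside(baseDir: string, requested: string): string | null {
  const base = path.resolve(baseDir);
  const target = path.resolve(base, requested);
  const rel = path.relative(base, target);

  // A relative path that climbs upward, or is absolute (another drive on Windows),
  // points outside the base directory.
  const escapes =
    rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);

  return escapes ? null : target;
}
```

With `ocr-outputs` as the base directory, a request for `job1/result.md` resolves normally, while `../.env` or `/etc/passwd` yields `null` and can be answered with a 403.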
MIT