Bun AI OCR Tester

A Bun-based CLI and web application that runs OCR on PDF files using multiple AI models.

Features

Dual Mode Operation: Works as both a CLI tool and a web application
Multiple AI Models: Support for Mistral AI OCR and Gemini Pro 2.5
PDF Processing: Converts PDF files to images and performs OCR
Real-time Progress: WebSocket-based progress updates in web mode
Persistent Output: Web mode saves results to permanent folders
Modern UI: React frontend with Material-UI components

Prerequisites

Bun runtime installed

Quick Start

Install dependencies:
```
bun install
```

Set up API keys:

```
cp .env.example .env
# Edit .env with your API keys
```

Build the application:
```
bun run build:full
```

Run in web mode:

```
bun run dev
# Opens http://localhost:3001 automatically
```

Usage

CLI Mode

```
# Basic usage
bun run dev --pdf input.pdf --model mistral-ai-ocr --api-key YOUR_KEY --output ./output

# With Gemini model and custom prompt
bun run dev --pdf input.pdf --model gemini-pro-2.5 --api-key YOUR_KEY --prompt "Extract all text with formatting" --output ./output

# Extract images separately
bun run dev --pdf input.pdf --model mistral-ai-ocr --api-key YOUR_KEY --get-images --output ./output
```
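The flags used in the commands above could be parsed along these lines. This is a minimal sketch, not the project's actual `cli.ts`: it assumes the standard `parseArgs` helper from `node:util` (available in Bun), and the `CliOptions` shape and `parseCliArgs` name are hypothetical.

```typescript
import { parseArgs } from "node:util";

const MODEL_IDS = ["mistral-ai-ocr", "gemini-pro-2.5"] as const;
type ModelId = (typeof MODEL_IDS)[number];

// Hypothetical option shape; field names mirror the flags in the commands above.
interface CliOptions {
  pdf: string;
  model: ModelId;
  apiKey?: string;
  prompt?: string;
  getImages: boolean;
  output?: string;
}

function isModelId(value: string): value is ModelId {
  return (MODEL_IDS as readonly string[]).includes(value);
}

// Parses an argv slice (without the runtime and script entries).
function parseCliArgs(argv: string[]): CliOptions {
  const { values } = parseArgs({
    args: argv,
    options: {
      pdf: { type: "string" },
      model: { type: "string" },
      "api-key": { type: "string" },
      prompt: { type: "string" },
      "get-images": { type: "boolean" },
      output: { type: "string" },
    },
  });

  if (!values.pdf) throw new Error("--pdf <file> is required");
  if (!values.model || !isModelId(values.model)) {
    throw new Error(`--model must be one of: ${MODEL_IDS.join(", ")}`);
  }

  return {
    pdf: values.pdf,
    model: values.model,
    apiKey: values["api-key"],
    prompt: values.prompt,
    getImages: values["get-images"] ?? false,
    output: values.output,
  };
}
```

Rejecting an unknown `--model` value before any file is read keeps a typo in the model name from surfacing later as an API error.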

Web Mode

```
# Start web server (opens browser automatically)
bun run dev

# Or build and run production version
bun run build:full
bun run start
```

The web interface provides:

Model selection (Mistral AI OCR, Gemini Pro 2.5)
API key management (stored in browser)
Custom prompt input
Drag-and-drop PDF upload
Real-time progress updates
Side-by-side PDF viewer and markdown results
Copy markdown button
Open output folder button

Project Structure

```
bun-ai-ocr-tester/
├── public/                 # Built React app and static assets
│   ├── index.html          # Main HTML file
│   ├── index.js            # Bundled React app
│   └── pdf.worker.min.js   # PDF.js worker
├── src/
│   ├── index.ts           # Main entry point (CLI/Web router)
│   ├── cli.ts             # CLI mode handler
│   ├── server/            # Web server components
│   │   ├── index.ts       # Express server setup
│   │   └── ws.ts          # WebSocket handler
│   ├── models/            # OCR model implementations
│   │   ├── modelInterface.ts
│   │   ├── mistralOCR.ts
│   │   └── geminiOCR.ts
│   ├── utils/             # Utility functions
│   │   ├── apiKeyManager.ts
│   │   ├── outputManager.ts
│   │   ├── pdfProcessor.ts
│   │   └── progressLogger.ts
│   ├── web/               # React frontend source
│   │   ├── src/
│   │   │   ├── App.tsx
│   │   │   ├── components/
│   │   │   ├── hooks/
│   │   │   └── services/
│   │   └── public/
│   └── types/             # TypeScript definitions
├── ocr-outputs/           # Web mode output directory
├── package.json
├── tsconfig.json
└── README.md
```
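The tree lists `src/index.ts` as the CLI/Web router. One plausible way to make that choice is sketched below. It assumes that the presence of `--pdf` selects CLI mode, which matches the usage examples but is not confirmed by the source; `detectMode` is a hypothetical name.

```typescript
type Mode = "cli" | "web";

// Assumption: passing --pdf (as a separate flag or as --pdf=<file>) selects CLI mode;
// with no --pdf flag the web server is started instead.
function detectMode(argv: string[]): Mode {
  const hasPdfFlag = argv.some((arg) => arg === "--pdf" || arg.startsWith("--pdf="));
  return hasPdfFlag ? "cli" : "web";
}

// Typical call site in an entry point: detectMode(process.argv.slice(2))
```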

API Keys

Mistral AI

Visit https://console.mistral.ai/
Create an account and get your API key
Use model: mistral-ai-ocr

Gemini Pro

Visit https://makersuite.google.com/app/apikey
Create an account and get your API key
Use model: gemini-pro-2.5

Environment Variables

MISTRAL_API_KEY: Default API key for Mistral AI
GEMINI_API_KEY: Default API key for Gemini Pro
PORT: Custom port for web server (default: 3001)
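Since the environment variables act as defaults, an explicit `--api-key` would be expected to take precedence over them. The sketch below illustrates that lookup order and the `PORT` fallback to 3001. The function names are hypothetical, and the handling of an invalid `PORT` value (falling back to the default) is an assumption.

```typescript
type Env = Record<string, string | undefined>;
type ModelId = "mistral-ai-ocr" | "gemini-pro-2.5";

// Which environment variable supplies the default key for each model.
const KEY_VARIABLE: Record<ModelId, string> = {
  "mistral-ai-ocr": "MISTRAL_API_KEY",
  "gemini-pro-2.5": "GEMINI_API_KEY",
};

// An explicit key wins; otherwise fall back to the model's environment variable.
function resolveApiKey(model: ModelId, explicitKey: string | undefined, env: Env): string {
  const variable = KEY_VARIABLE[model];
  const key = explicitKey || env[variable];
  if (!key) throw new Error(`No API key: pass --api-key or set ${variable}`);
  return key;
}

// Assumption: a missing or invalid PORT falls back to the documented default, 3001.
function resolvePort(env: Env): number {
  const port = Number(env["PORT"]);
  return Number.isInteger(port) && port > 0 && port <= 65535 ? port : 3001;
}

// Typical call site: resolvePort(process.env)
```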

Build Scripts

bun run build:full - Clean and full build
bun run build:frontend - Build React app only
bun run build:backend - Build backend only
bun run clean - Remove build artifacts

Development

```
# Install dependencies
bun install

# Start in development mode (web)
bun run dev

# Test CLI mode
bun run dev --pdf sample.pdf --model mistral-ai-ocr --api-key YOUR_KEY

# Build for production
bun run build:full
```

Features in Detail

CLI Mode

Process PDFs with command-line arguments
Support for both Mistral and Gemini models
Configurable output directory
Optional image extraction
Console progress logging

Web Mode

Modern React interface with Material-UI
Real-time WebSocket progress updates
Persistent browser settings (API keys, prompts)
Side-by-side PDF viewer and results
Permanent output folders with "Open Folder" feature
Copy markdown to clipboard
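To illustrate the real-time progress updates, the sketch below shows one possible JSON message format that a server could send over the WebSocket and a client could validate on receipt. The message shape and the helper names are hypothetical; the project's actual wire format is not documented here.

```typescript
// Hypothetical message shapes for progress updates sent as JSON text frames.
type ProgressMessage =
  | { type: "progress"; page: number; totalPages: number }
  | { type: "done"; outputDir: string }
  | { type: "error"; message: string };

function encodeProgress(msg: ProgressMessage): string {
  return JSON.stringify(msg);
}

// Returns null for anything that is not a well-formed progress message.
function decodeProgress(raw: string): ProgressMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;

  const { type, page, totalPages, outputDir, message } = data as Record<string, unknown>;
  if (type === "progress" && typeof page === "number" && typeof totalPages === "number") {
    return { type, page, totalPages };
  }
  if (type === "done" && typeof outputDir === "string") return { type, outputDir };
  if (type === "error" && typeof message === "string") return { type, message };
  return null;
}

// Percentage for a progress bar, clamped to the range 0..100.
function percentComplete(page: number, totalPages: number): number {
  if (totalPages <= 0) return 0;
  return Math.min(100, Math.max(0, Math.round((page / totalPages) * 100)));
}
```

Validating incoming frames on the client keeps a malformed or unexpected message from breaking the progress display.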

Security

API keys stored locally (browser localStorage or environment)
Path validation for file serving
Input sanitization
CORS protection
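Path validation for file serving usually means refusing any requested path that resolves outside the directory being served. The sketch below shows that check using `node:path`. It is an illustration of the general technique rather than the project's code, and `resolveInside` is a hypothetical name.

```typescript
import { resolve, sep } from "node:path";

// Resolves `requested` against `rootDir` and returns the absolute path only if it
// stays inside the root; returns null for traversal attempts such as "../secret".
function resolveInside(rootDir: string, requested: string): string | null {
  const root = resolve(rootDir);
  const target = resolve(root, requested);
  const insideRoot = target === root || target.startsWith(root + sep);
  return insideRoot ? target : null;
}
```

Comparing against `root + sep` rather than `root` alone prevents a sibling directory such as `ocr-outputs-private` from passing the prefix check. This check does not resolve symbolic links, so a server that follows symlinks would need an additional step.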

License

MIT
