hatton/ai-ocr-tests

Bun AI OCR Tester

A Bun-based CLI and web application that runs OCR on PDF files using multiple AI models.

Features

  • Dual Mode Operation: Works as both a CLI tool and a web application
  • Multiple AI Models: Supports Mistral AI OCR and Gemini 2.5 Pro (model ID gemini-pro-2.5)
  • PDF Processing: Converts PDF files to images and performs OCR
  • Real-time Progress: WebSocket-based progress updates in web mode
  • Persistent Output: Web mode saves results to permanent folders
  • Modern UI: React frontend with Material-UI components

Prerequisites

  • Bun runtime installed

Quick Start

  1. Install dependencies:

    bun install
  2. Set up API keys:

    cp .env.example .env
    # Edit .env with your API keys
  3. Build the application:

    bun run build:full
  4. Run in web mode:

    bun run dev
    # Opens http://localhost:3001 automatically

Usage

CLI Mode

# Basic usage
bun run dev --pdf input.pdf --model mistral-ai-ocr --api-key YOUR_KEY --output ./output

# With Gemini model and custom prompt
bun run dev --pdf input.pdf --model gemini-pro-2.5 --api-key YOUR_KEY --prompt "Extract all text with formatting" --output ./output

# Extract images separately
bun run dev --pdf input.pdf --model mistral-ai-ocr --api-key YOUR_KEY --get-images --output ./output
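
The flags used in the commands above could be parsed along the following lines. This is a minimal sketch, not the project's actual src/cli.ts: the CliOptions type and the parseCliArgs function are hypothetical names, and two behaviors are assumptions: that the presence of --pdf is what selects CLI mode, and that --model is required once --pdf is given. It relies only on parseArgs from node:util, which Bun supports.

```typescript
import { parseArgs } from "node:util";

// Hypothetical option shape; the flag names come from the usage examples above.
interface CliOptions {
  pdf: string;
  model: string;
  apiKey: string | undefined;
  prompt: string | undefined;
  getImages: boolean;
  output: string | undefined;
}

// Returns the parsed options when --pdf is present, or null to signal web mode.
// Assumption: --pdf selects CLI mode; the real entry point may decide differently.
function parseCliArgs(argv: string[]): CliOptions | null {
  const { values } = parseArgs({
    args: argv,
    options: {
      pdf: { type: "string" },
      model: { type: "string" },
      "api-key": { type: "string" },
      prompt: { type: "string" },
      "get-images": { type: "boolean" },
      output: { type: "string" },
    },
    strict: true, // unknown flags and stray positionals throw
  });

  if (values.pdf === undefined) return null;

  // Assumption: a model must be named explicitly in CLI mode.
  if (values.model === undefined) {
    throw new Error("--model is required when --pdf is given");
  }

  return {
    pdf: values.pdf,
    model: values.model,
    apiKey: values["api-key"],
    prompt: values.prompt,
    getImages: values["get-images"] ?? false,
    output: values.output,
  };
}
```

An entry point could call parseCliArgs(process.argv.slice(2)) and start the web server whenever the result is null.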

Web Mode

# Start web server (opens browser automatically)
bun run dev

# Or build and run production version
bun run build:full
bun run start

The web interface provides:

  • Model selection (Mistral AI OCR, Gemini Pro 2.5)
  • API key management (stored in browser)
  • Custom prompt input
  • Drag-and-drop PDF upload
  • Real-time progress updates
  • Side-by-side PDF viewer and markdown results
  • Copy markdown button
  • Open output folder button
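
To illustrate how the real-time progress updates might be consumed on the client, here is a small sketch of a message parser. The message shape (ProgressMessage) and both function names are hypothetical; the actual protocol in src/server/ws.ts is not documented here and may differ.

```typescript
// Hypothetical message shape, for illustration only.
type ProgressMessage =
  | { type: "progress"; page: number; totalPages: number }
  | { type: "done"; markdown: string }
  | { type: "error"; message: string };

// Parse one raw WebSocket text frame; returns null for anything unrecognized.
function parseProgressMessage(raw: string): ProgressMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null; // not JSON
  }
  if (typeof data !== "object" || data === null) return null;
  const fields = data as Record<string, unknown>;

  const page = fields["page"];
  const totalPages = fields["totalPages"];
  const markdown = fields["markdown"];
  const message = fields["message"];

  switch (fields["type"]) {
    case "progress":
      return typeof page === "number" && typeof totalPages === "number"
        ? { type: "progress", page, totalPages }
        : null;
    case "done":
      return typeof markdown === "string" ? { type: "done", markdown } : null;
    case "error":
      return typeof message === "string" ? { type: "error", message } : null;
    default:
      return null;
  }
}

// Turn a parsed message into a status line for the UI.
function describeProgress(msg: ProgressMessage): string {
  switch (msg.type) {
    case "progress":
      return `Page ${msg.page} of ${msg.totalPages}`;
    case "done":
      return "OCR complete";
    case "error":
      return `Failed: ${msg.message}`;
  }
}
```

In the browser, a WebSocket onmessage handler could pass String(event.data) to parseProgressMessage and ignore null results, so a malformed frame never breaks the UI.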

Project Structure

bun-ai-ocr-tester/
├── public/                 # Built React app and static assets
│   ├── index.html          # Main HTML file
│   ├── index.js            # Bundled React app
│   └── pdf.worker.min.js   # PDF.js worker
├── src/
│   ├── index.ts           # Main entry point (CLI/Web router)
│   ├── cli.ts             # CLI mode handler
│   ├── server/            # Web server components
│   │   ├── index.ts       # Express server setup
│   │   └── ws.ts          # WebSocket handler
│   ├── models/            # OCR model implementations
│   │   ├── modelInterface.ts
│   │   ├── mistralOCR.ts
│   │   └── geminiOCR.ts
│   ├── utils/             # Utility functions
│   │   ├── apiKeyManager.ts
│   │   ├── outputManager.ts
│   │   ├── pdfProcessor.ts
│   │   └── progressLogger.ts
│   ├── web/               # React frontend source
│   │   ├── src/
│   │   │   ├── App.tsx
│   │   │   ├── components/
│   │   │   ├── hooks/
│   │   │   └── services/
│   │   └── public/
│   └── types/             # TypeScript definitions
├── ocr-outputs/           # Web mode output directory
├── package.json
├── tsconfig.json
└── README.md
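
The models/ directory suggests that each OCR backend implements a shared interface. The sketch below shows one way such an arrangement could look. Every name in it (OcrRequest, OcrModel, ModelRegistry, the "stub-model" ID) is hypothetical and is not taken from modelInterface.ts.

```typescript
// Hypothetical request and model types, for illustration only.
interface OcrRequest {
  pageImages: Uint8Array[]; // one rendered image per PDF page
  apiKey: string;
  prompt?: string;
}

interface OcrModel {
  readonly id: string;
  // Resolves to the recognized text as markdown.
  run(request: OcrRequest): Promise<string>;
}

// Maps model IDs (e.g. the value passed to --model) to implementations.
class ModelRegistry {
  private readonly models = new Map<string, OcrModel>();

  register(model: OcrModel): void {
    if (this.models.has(model.id)) {
      throw new Error(`Model already registered: ${model.id}`);
    }
    this.models.set(model.id, model);
  }

  get(id: string): OcrModel {
    const model = this.models.get(id);
    if (model === undefined) {
      const known = [...this.models.keys()].join(", ");
      throw new Error(`Unknown model "${id}". Known models: ${known}`);
    }
    return model;
  }
}

// A stand-in model that makes no network calls; useful for wiring tests.
const stubModel: OcrModel = {
  id: "stub-model",
  async run(request) {
    return `# Stub output\n\n${request.pageImages.length} page(s) received.`;
  },
};

const registry = new ModelRegistry();
registry.register(stubModel);
```

With this shape, the CLI and the web server can share one lookup (registry.get(modelId)) and stay unaware of which provider sits behind the ID.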

API Keys

Mistral AI

  1. Visit https://console.mistral.ai/
  2. Create an account and get your API key
  3. Use model: mistral-ai-ocr

Gemini Pro

  1. Visit https://makersuite.google.com/app/apikey
  2. Create an account and get your API key
  3. Use model: gemini-pro-2.5

Environment Variables

  • MISTRAL_API_KEY: Default API key for Mistral AI
  • GEMINI_API_KEY: Default API key for Gemini Pro
  • PORT: Custom port for web server (default: 3001)
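
As a rough illustration of how these variables could be resolved, the sketch below treats the environment values as defaults. It assumes that an explicitly supplied key (for example from --api-key) takes precedence over the environment; that precedence and the function names resolveApiKey and resolvePort are assumptions, not the project's documented behavior.

```typescript
type ModelId = "mistral-ai-ocr" | "gemini-pro-2.5";
type Env = Record<string, string | undefined>;

// Environment variable holding the default key for each model.
const KEY_VARS: Record<ModelId, string> = {
  "mistral-ai-ocr": "MISTRAL_API_KEY",
  "gemini-pro-2.5": "GEMINI_API_KEY",
};

// Assumption: an explicit key wins; the environment variable is the fallback.
function resolveApiKey(model: ModelId, explicitKey: string | undefined, env: Env): string {
  const key = explicitKey ?? env[KEY_VARS[model]];
  if (!key) {
    throw new Error(`No API key for ${model}: pass --api-key or set ${KEY_VARS[model]}`);
  }
  return key;
}

// Falls back to 3001 when PORT is unset or empty; rejects non-port values.
function resolvePort(env: Env): number {
  const raw = env["PORT"];
  if (raw === undefined || raw === "") return 3001;
  const port = Number(raw);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${raw}`);
  }
  return port;
}
```

Passing the environment in as a parameter (for example resolvePort(process.env)) keeps both functions easy to test without touching the real process environment.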

Build Scripts

  • bun run build:full - Clean, then run a full build
  • bun run build:frontend - Build React app only
  • bun run build:backend - Build backend only
  • bun run clean - Remove build artifacts

Development

# Install dependencies
bun install

# Start in development mode (web)
bun run dev

# Test CLI mode
bun run dev --pdf sample.pdf --model mistral-ai-ocr --api-key YOUR_KEY

# Build for production
bun run build:full

Features in Detail

CLI Mode

  • Process PDFs with command-line arguments
  • Support for both Mistral and Gemini models
  • Configurable output directory
  • Optional image extraction
  • Console progress logging

Web Mode

  • Modern React interface with Material-UI
  • Real-time WebSocket progress updates
  • Persistent browser settings (API keys, prompts)
  • Side-by-side PDF viewer and results
  • Permanent output folders with "Open Folder" feature
  • Copy markdown to clipboard

Security

  • API keys stored locally (browser localStorage or environment variables)
  • Path validation for file serving
  • Input sanitization
  • CORS protection
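
Path validation for file serving usually means refusing any request that resolves outside the served directory. The following is a generic sketch of that technique, not the project's own code; resolveInsideRoot is a hypothetical helper name, and the check does not follow symlinks.

```typescript
import * as path from "node:path";

// Resolve `requested` relative to `rootDir`.
// Returns the absolute path, or null if it would escape the root.
function resolveInsideRoot(rootDir: string, requested: string): string | null {
  const root = path.resolve(rootDir);
  const target = path.resolve(root, requested);

  // A relative path that starts with ".." (or is absolute, as can happen
  // across Windows drives) means the target lies outside the root.
  const rel = path.relative(root, target);
  const escapes =
    rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);

  return escapes ? null : target;
}
```

A file-serving route could call resolveInsideRoot with the ocr-outputs directory as the root and respond with an error whenever it returns null.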

License

MIT
