🔍 Client-Side OCR with ONNX Runtime

Extract text from images directly in your browser - no server required! Now with RapidOCR and PPU PaddleOCR integration for 100+ languages!

Live Demo | NPM Package | Documentation | API Reference | Troubleshooting

A high-performance, privacy-focused OCR solution that runs entirely in the browser using ONNX Runtime with both RapidOCR and PPU PaddleOCR models. Process text from images and PDF documents without sending data to any server - everything happens locally on your device. Supporting 100+ languages with state-of-the-art accuracy!

📸 Screenshots

Main Interface	Preprocessing Options	Performance Metrics

🚀 Why Choose Client-Side OCR?

🔒 Complete Privacy & Security

Unlike cloud-based OCR services (Google Vision, AWS Textract, Azure OCR), your sensitive documents never leave your device. Perfect for:

📄 Legal documents & contracts
💳 Financial statements & invoices
🏥 Medical records
🆔 Personal IDs & passports
🔐 Confidential business documents

💰 Zero Costs, Unlimited Usage

No API fees: Save thousands compared to cloud OCR services
No rate limits: Process unlimited documents
No subscriptions: One-time integration, lifetime usage
No surprises: Predictable performance, no service outages

⚡ Superior Performance

Fast results: No network round trip; typical processing takes 300-1500ms
Offline capable: Works without internet after initial load
GPU acceleration: Uses WebGL for faster processing
Batch optimization: Process multiple regions efficiently

🎯 How It's Different

Feature	Client-Side OCR	Cloud OCR (Google/AWS)	Tesseract.js
Privacy	✅ 100% local	❌ Data sent to servers	✅ Local
Cost	✅ Free forever	❌ Pay per request	✅ Free
Languages	✅ 100+ built-in	✅ Many	⚠️ Manual setup
Performance	✅ Fast (ONNX)	⚠️ Network dependent	❌ Slow
Accuracy	✅ State-of-art	✅ High	⚠️ Good
Setup	✅ Simple npm install	❌ Complex API setup	⚠️ Large models
Preprocessing	✅ Built-in OpenCV	⚠️ Limited	❌ Basic
Model Size	✅ 15-30MB total	N/A	❌ 60MB+ per language
Offline	✅ Full support	❌ Requires internet	✅ Supported

🎨 Advanced Features Not Found Elsewhere

🖼️ Smart Preprocessing: Built-in OpenCV.js for image enhancement
🔄 Auto-rotation: Detects and corrects upside-down text
📊 Confidence scores: Get reliability metrics for each word
🔤 Word segmentation: Separate text into individual words
📱 Mobile optimized: Responsive design with camera capture
🚀 Progressive Web App: Install as native app on any device
🎯 Multiple Model Support: Choose between RapidOCR and PPU models

🎯 Real-World Use Cases

Perfect for Applications That Need:

📱 Document Scanner Apps: Build mobile/web document scanners
🏢 Enterprise Document Processing: Process sensitive documents securely
🏥 Healthcare Systems: Extract text from medical records privately
🏛️ Government Portals: Handle citizen documents without data leaks
📚 Education Platforms: Convert handwritten notes to digital text
💼 Business Card Readers: Extract contact information instantly
🧾 Receipt/Invoice Processing: Automate expense tracking
📖 Digital Libraries: Make scanned books searchable

✨ Core Features

🚀 100% Client-Side: All OCR processing happens in the browser - no data leaves your device
🎯 High Accuracy: Uses state-of-the-art RapidOCR and PPU PaddleOCR v4/v5 models
🌍 100+ Languages: Support for major world languages including Chinese, English, Japanese, Korean, Arabic, Hindi, Tamil, and more
📱 PWA Support: Works offline after initial load with service worker caching
🖼️ Image Preprocessing: Built-in OpenCV.js for auto-enhancement, denoising, deskewing
🔄 Auto-Rotation: Automatically detects and corrects upside-down text
📄 PDF Support: Extract text from PDFs page-by-page with detailed results
🎨 Modern UI: Beautiful, responsive interface built with React & Mantine UI
📦 Smart Caching: Models cached locally for instant subsequent use
🔧 Developer Friendly: Simple API, TypeScript support, React components
📊 Performance Monitoring: Real-time metrics and processing insights

👨‍💻 About the Author

Sivasubramanian Ramanathan

I created this module while experimenting and learning about extracting data from unstructured documents. What started as a curiosity about client-side OCR capabilities evolved into this comprehensive library that brings powerful text recognition to the browser.

Technology Stack

Frontend: React 19 + TypeScript + Vite
UI Framework: Mantine UI v8
OCR Engine: ONNX Runtime Web
Models: RapidOCR + PPU PaddleOCR (PP-OCRv4/v5)
Processing: RapidOCR techniques (CTC decoding, DB postprocessing)
PWA: Vite PWA Plugin + Workbox

Attribution & Credits

This project builds upon the excellent work of:

🏆 RapidOCR

Repository: https://github.com/RapidAI/RapidOCR
Advanced OCR implementation with multi-language support
Processing techniques and model hosting
Licensed under Apache License 2.0

🏆 PaddleOCR

Repository: https://github.com/PaddlePaddle/PaddleOCR
The state-of-the-art OCR models used in this application
Licensed under Apache License 2.0

🔥 OnnxOCR

Repository: https://github.com/jingsongliujing/OnnxOCR
ONNX model conversion and inference implementation reference
Provided the ONNX models and dictionary files

🚀 ppu-paddle-ocr

Repository: https://github.com/PT-Perkasa-Pilar-Utama/ppu-paddle-ocr
TypeScript implementation reference
Deskew algorithm implementation inspiration

🚀 Demo

Try the live demo: https://siva-sub.github.io/client-ocr/

💡 Quick Comparison

// ❌ Cloud OCR (Privacy Risk + Costs)
const cloudResponse = await fetch('https://api.service.com/ocr', {
  method: 'POST',
  body: formData, // Your sensitive data leaves your device!
  headers: { 'API-Key': 'sk-xxxxx' } // Costs money per request
});

// ❌ Tesseract.js (Slow + Large)
const worker = await Tesseract.createWorker('eng'); // 60MB+ download
const { data } = await worker.recognize(image); // Slow processing

// ✅ Client-Side OCR (Private + Fast + Free)
import { createOCREngine } from 'client-side-ocr';
const ocr = createOCREngine({ language: 'en' }); // 15MB total
await ocr.initialize(); // One-time setup
const result = await ocr.processImage(imageData); // Fast, local, private!

📦 Installation

Install from NPM

npm install client-side-ocr

yarn add client-side-ocr

pnpm add client-side-ocr

For Development

# Clone the repository
git clone https://github.com/siva-sub/client-ocr.git
cd client-ocr

# Install dependencies
npm install

# Run development server
npm run dev

# Build for production
npm run build

Quick Start

As a Library

import { createOCREngine } from 'client-side-ocr';

// Initialize the OCR engine with language selection
const ocr = createOCREngine({
  language: 'en', // or 'ch', 'fr', 'de', 'ja', 'ko', etc.
  modelVersion: 'PP-OCRv4' // or 'PP-OCRv5'
});
await ocr.initialize();

// Process an image with advanced options
const result = await ocr.processImage(imageFile, {
  enableWordSegmentation: true,
  returnConfidence: true
});
console.log(result.text);
console.log(result.confidence);
console.log(result.wordBoxes); // Word-level bounding boxes

React Component

import { RapidOCRInterface } from 'client-side-ocr/react';

function App() {
  return (
    <RapidOCRInterface 
      defaultLanguage="en"
      modelVersion="PP-OCRv4"
      onResult={(result) => console.log(result)}
    />
  );
}

Via CDN

<script type="module">
  import { createOCREngine } from 'https://unpkg.com/client-side-ocr@latest/dist/index.mjs';
  
  const ocr = createOCREngine();
  await ocr.initialize();
</script>

Documentation

📚 Comprehensive Guides

Usage Guide - Complete usage documentation with examples
API Reference - Detailed API documentation
Model Documentation - Information about available OCR models
Troubleshooting Guide - Common issues and solutions

API Overview

import { createRapidOCREngine } from 'client-side-ocr';

// Create RapidOCR engine
const ocr = createRapidOCREngine({
  language: 'en', // 'ch', 'fr', 'de', 'ja', 'ko', 'ru', 'pt', 'es', 'it', 'id', 'vi', 'fa', 'ka'
  modelVersion: 'PP-OCRv4', // or 'PP-OCRv5'
  modelType: 'mobile' // or 'server'
});

// Initialize with automatic model download
await ocr.initialize();

// Process image with RapidOCR techniques
const result = await ocr.processImage(file, {
  enableTextClassification: true,  // 180° rotation detection
  enableWordSegmentation: true,     // Word-level boxes
  preprocessConfig: {
    detectImageNetNorm: true,       // ImageNet normalization for detection
    recStandardNorm: true           // Standard normalization for recognition
  },
  postprocessConfig: {
    unclipRatio: 2.0,              // Text region expansion
    boxThresh: 0.7                  // Box confidence threshold
  }
});

// Access enhanced results
console.log(result.text);           // Extracted text
console.log(result.confidence);     // Overall confidence
console.log(result.lines);          // Text lines with individual confidence
console.log(result.wordBoxes);      // Word-level segmentation
console.log(result.angle);          // Detected text angle (0° or 180°)
console.log(result.processingTime); // Processing time breakdown by stage

For detailed API documentation, see API Reference.
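The unclipRatio option controls how far each detected text region is grown before it is passed to recognition. In DB-style postprocessing the offset distance is usually computed as area × ratio / perimeter. The sketch below applies that formula to an axis-aligned box only. The `Box` type and `unclipBox` helper are hypothetical names for illustration and are not part of this library's API; a real implementation would offset arbitrary polygons.

```typescript
// Hypothetical helper for illustration; not part of client-side-ocr.
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Grow a box outward on every side by distance = area * ratio / perimeter.
function unclipBox(box: Box, unclipRatio: number): Box {
  const area = box.width * box.height;
  const perimeter = 2 * (box.width + box.height);
  if (perimeter === 0) {
    return { ...box }; // Degenerate box: nothing to expand.
  }
  const distance = (area * unclipRatio) / perimeter;
  return {
    x: box.x - distance,
    y: box.y - distance,
    width: box.width + 2 * distance,
    height: box.height + 2 * distance,
  };
}

// A 30x10 region with ratio 2.0 grows by 7.5 pixels on each side.
const expanded = unclipBox({ x: 10, y: 20, width: 30, height: 10 }, 2.0);
console.log(expanded);
```

A larger ratio captures more context around tight detections at the cost of pulling in neighbouring text, which is why it is exposed as a tunable option.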

Model Support

The library supports both RapidOCR and PPU PaddleOCR models with multi-language capabilities. The table below lists the 14 primary language codes and which model family covers each:

Supported Languages (100+)

Language	Code	RapidOCR	PPU Models	Notes
Chinese	ch	✅	✅	Simplified & Traditional
English	en	✅	✅	Full support
French	fr	✅	❌	RapidOCR only
German	de	✅	❌	RapidOCR only
Japanese	ja	✅	✅	Hiragana, Katakana, Kanji
Korean	ko	✅	✅	Hangul support
Russian	ru	✅	❌	Cyrillic script
Portuguese	pt	✅	❌	Brazilian & European
Spanish	es	✅	❌	Latin American & European
Italian	it	✅	❌	RapidOCR only
Indonesian	id	✅	❌	RapidOCR only
Vietnamese	vi	✅	❌	With tone marks
Persian	fa	✅	❌	Right-to-left support
Kannada	ka	✅	❌	Indic script support

Model Specifications

Model Component	Size	Purpose	Features
Detection	4-5MB	Text region detection	DB algorithm with unclip expansion
Recognition	8-17MB	Text recognition	CTC decoding with embedded dictionary
Classification	0.5MB	Text angle detection	0° and 180° rotation correction

Model Architecture

Detection Models: Uses DB (Differentiable Binarization) algorithm with:
- ImageNet normalization (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
- Dynamic resolution adjustment (multiples of 32)
- Unclip ratio for text region expansion
Recognition Models: Features include:
- CTC (Connectionist Temporal Classification) decoding
- Embedded dictionaries in model metadata
- Dynamic width calculation based on aspect ratio
- Standard normalization ((pixel/255 - 0.5) / 0.5)
- PPU models: Red channel only for grayscale, 0-based dictionary indexing
Classification Models: Text orientation detection:
- Detects 0° and 180° rotations
- Batch processing with aspect ratio sorting
- Automatic rotation correction
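
The two normalization schemes and the multiple-of-32 sizing listed above can be sketched as follows. This is a minimal illustration that assumes interleaved RGBA input (as returned by a canvas `ImageData`) and a planar CHW float output. The function and constant names are assumptions and do not come from the library.

```typescript
// Illustrative helpers; the names are assumptions, not the library's API.
const IMAGENET_MEAN = [0.485, 0.456, 0.406];
const IMAGENET_STD = [0.229, 0.224, 0.225];
const STANDARD_MEAN = [0.5, 0.5, 0.5];
const STANDARD_STD = [0.5, 0.5, 0.5];

// Snap a side length to the nearest multiple of 32, never below 32.
function snapTo32(side: number): number {
  return Math.max(32, Math.round(side / 32) * 32);
}

// Convert interleaved RGBA bytes into a planar CHW float tensor,
// applying (pixel / 255 - mean) / std per channel. Alpha is dropped.
function normalizeToCHW(
  rgba: Uint8ClampedArray,
  width: number,
  height: number,
  mean: number[],
  std: number[],
): Float32Array {
  const plane = width * height;
  const out = new Float32Array(3 * plane);
  for (let i = 0; i < plane; i++) {
    for (let c = 0; c < 3; c++) {
      const scaled = rgba[i * 4 + c] / 255;
      out[c * plane + i] = (scaled - mean[c]) / std[c];
    }
  }
  return out;
}

// One pixel with R=255, G=0, B=128, normalized both ways.
const pixel = new Uint8ClampedArray([255, 0, 128, 255]);
const detInput = normalizeToCHW(pixel, 1, 1, IMAGENET_MEAN, IMAGENET_STD);
const recInput = normalizeToCHW(pixel, 1, 1, STANDARD_MEAN, STANDARD_STD);
console.log(snapTo32(100), detInput, recInput);
```

With the standard scheme, pixel values map to the range [-1, 1]; the ImageNet scheme uses per-channel statistics, so the output range differs by channel.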

Model Sources

RapidOCR Models: Hosted on RapidOCR's ModelScope repository
PPU Models: Downloaded from PPU PaddleOCR repository with special preprocessing

Architecture

graph TD
    A[Image Upload] --> B[Language Selection]
    B --> C[Model Download Check]
    C -->|Not Cached| D[Download Models]
    C -->|Cached| E[Detection Preprocessing]
    D --> E
    E --> F[ONNX Detection Worker]
    F --> G[Text Classification]
    G -->|180° Detected| H[Rotate Image]
    G -->|Normal| I[Recognition Preprocessing]
    H --> I
    I --> J[ONNX Recognition Worker]
    J --> K[CTC Decoding]
    K --> L[Word Segmentation]
    L --> M[Final Output]
    
    subgraph Processing Pipeline
        E -->|ImageNet/Standard Norm| F
        I -->|Model-specific Norm| J
        K -->|Dictionary| L
    end
    
    subgraph Model Management
        C
        D
    end
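
The "Rotate Image" step in the diagram amounts to a 180° flip of the pixel buffer. A minimal sketch, assuming interleaved RGBA data; `rotate180` is a hypothetical helper and the library may instead rotate via a canvas or OpenCV.js:

```typescript
// Hypothetical helper for illustration; not part of client-side-ocr.
// A 180° rotation maps pixel index i to (pixelCount - 1 - i).
function rotate180(
  rgba: Uint8ClampedArray,
  width: number,
  height: number,
): Uint8ClampedArray {
  const pixelCount = width * height;
  const out = new Uint8ClampedArray(pixelCount * 4);
  for (let i = 0; i < pixelCount; i++) {
    const src = i * 4;
    const dst = (pixelCount - 1 - i) * 4;
    // Copy the four channel bytes of one pixel to its mirrored position.
    out.set(rgba.subarray(src, src + 4), dst);
  }
  return out;
}

// A 2x1 image: one red pixel followed by one blue pixel.
const twoPixels = new Uint8ClampedArray([255, 0, 0, 255, 0, 0, 255, 255]);
const flipped = rotate180(twoPixels, 2, 1);
console.log(Array.from(flipped));
```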

Performance

Processing Speed

Average processing time: 300-1500ms (depending on image size, language, and device)
Batch processing optimization for multiple text regions
Aspect ratio sorting for efficient recognition batching
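
Aspect ratio sorting helps because crops in one recognition batch are typically padded to the widest member, so grouping similar shapes reduces wasted padding. A minimal sketch of the idea; the `Crop` type, the `batchByAspectRatio` helper, and the batch size are assumptions for illustration:

```typescript
// Hypothetical types and helper for illustration; not the library's API.
interface Crop {
  id: number; // Original position, kept so results can be restored to reading order.
  width: number;
  height: number;
}

// Sort crops by width/height ratio, then cut the sorted list into batches.
function batchByAspectRatio(crops: Crop[], batchSize: number): Crop[][] {
  if (batchSize < 1) {
    throw new Error("batchSize must be at least 1");
  }
  const sorted = [...crops].sort(
    (a, b) => a.width / a.height - b.width / b.height,
  );
  const batches: Crop[][] = [];
  for (let i = 0; i < sorted.length; i += batchSize) {
    batches.push(sorted.slice(i, i + batchSize));
  }
  return batches;
}

const crops: Crop[] = [
  { id: 0, width: 320, height: 32 }, // ratio 10
  { id: 1, width: 64, height: 32 },  // ratio 2
  { id: 2, width: 160, height: 32 }, // ratio 5
  { id: 3, width: 32, height: 32 },  // ratio 1
];
const batches = batchByAspectRatio(crops, 2);
console.log(batches.map((batch) => batch.map((crop) => crop.id)));
```

After recognition, the `id` field lets the caller put the decoded lines back into their original order.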

Optimizations

WebGL backend for GPU acceleration when available
Web Workers for non-blocking parallel processing
Automatic model caching with SHA-256 verification
Smart preprocessing pipeline selection based on model type
Efficient memory management with typed arrays
Width limiting for PPU models to prevent memory issues

Advanced Features

Word-level segmentation: Separates Chinese characters from English/numbers
Confidence scoring: Per-character and per-line confidence metrics
Rotation detection: Automatic 180° text correction
Dynamic resolution: Adaptive image resizing for optimal accuracy
Stack overflow prevention: Safe handling of large documents

Browser Support

Chrome/Edge 90+ (recommended)
Firefox 89+
Safari 15+
Requires WebAssembly and Web Workers support
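
Before initializing the engine, an application can check for the two required capabilities. A minimal sketch using standard feature detection; `missingFeatures` and `RuntimeGlobals` are hypothetical names and are not provided by the library:

```typescript
// Hypothetical helper for illustration; not part of client-side-ocr.
interface RuntimeGlobals {
  WebAssembly?: unknown;
  Worker?: unknown;
}

// Return the names of required capabilities that the environment lacks.
function missingFeatures(env: RuntimeGlobals): string[] {
  const missing: string[] = [];
  // WebAssembly is exposed as a namespace object.
  if (typeof env.WebAssembly !== "object" || env.WebAssembly === null) {
    missing.push("WebAssembly");
  }
  // Worker is exposed as a constructor function.
  if (typeof env.Worker !== "function") {
    missing.push("Web Workers");
  }
  return missing;
}

// In a browser, pass the global object; the double cast only satisfies the type checker.
const unsupported = missingFeatures(globalThis as unknown as RuntimeGlobals);
if (unsupported.length > 0) {
  console.warn("OCR cannot run here, missing: " + unsupported.join(", "));
}
```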

Development

Project Structure

client-ocr/
├── src/
│   ├── core/          # OCR engine and services
│   ├── workers/       # Web Workers for processing
│   ├── ui/            # React components
│   └── types/         # TypeScript definitions
├── public/
│   └── models/        # ONNX models and dictionaries
├── docs/              # Documentation
├── screenshots/       # Application screenshots
└── .github/
    └── workflows/     # GitHub Actions for deployment

Key Components

RapidOCREngine: Main OCR orchestrator with multi-language support
PPUModelHandler: Special handling for PPU PaddleOCR models
DetPreProcess: Detection preprocessing with model-specific normalization
RecPreProcess: Recognition preprocessing with dynamic width calculation
ClsPreProcess: Classification preprocessing for rotation detection
CTCLabelDecode: CTC decoding with word segmentation
DBPostProcess: DB postprocessing with unclip expansion
ModelDownloader: Automatic model fetching from multiple sources
MetaONNXLoader: Extract embedded dictionaries from models
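
To make the CTCLabelDecode step concrete, here is a minimal sketch of greedy CTC decoding: take the best class at each time step, collapse consecutive repeats, and drop the blank. It assumes the blank sits at index 0 and that the dictionary array includes a placeholder at that index. The function name and signature are illustrative and do not reflect the component's actual interface, and index conventions differ between model families.

```typescript
// Hypothetical helper for illustration; not the library's CTCLabelDecode.
interface DecodedLine {
  text: string;
  confidence: number; // Mean probability of the emitted characters.
}

function ctcGreedyDecode(
  probs: number[][], // One row of class probabilities per time step.
  dictionary: string[], // dictionary[0] is a placeholder for the blank.
  blankIndex = 0,
): DecodedLine {
  const chars: string[] = [];
  const scores: number[] = [];
  let previous = -1;
  for (const step of probs) {
    // Argmax over the classes at this time step.
    let best = 0;
    for (let k = 1; k < step.length; k++) {
      if (step[k] > step[best]) best = k;
    }
    // Emit only when the class is not blank and differs from the previous step.
    if (best !== blankIndex && best !== previous) {
      chars.push(dictionary[best] ?? "");
      scores.push(step[best]);
    }
    previous = best;
  }
  const confidence =
    scores.length > 0 ? scores.reduce((a, b) => a + b, 0) / scores.length : 0;
  return { text: chars.join(""), confidence };
}

// Best classes per step are a, a, blank, a, b, b, which collapses to "aab".
const decoded = ctcGreedyDecode(
  [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7],
    [0.1, 0.1, 0.8],
  ],
  ["", "a", "b"],
);
console.log(decoded.text, decoded.confidence);
```

The blank between the second and fourth steps is what allows the repeated "a" to survive the collapse.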

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

MIT License - see LICENSE file for details

🙏 Acknowledgments

Special thanks to:

The RapidAI team for RapidOCR and model hosting
The PaddlePaddle team for creating PaddleOCR
The OnnxOCR project for ONNX conversion tools
The ppu-paddle-ocr team for TypeScript implementation reference
The open-source community for making this possible

🚀 What's New in v2.0

RapidOCR Integration: Complete integration with RapidOCR processing pipeline
PPU Model Support: Added support for PPU PaddleOCR models with special preprocessing
100+ Language Support: Extended language support beyond the original 14
Advanced Processing: CTC decoding, DB postprocessing, and word segmentation
Model Auto-Download: Automatic model fetching with progress tracking
Embedded Dictionaries: Models now include character dictionaries in metadata
Improved Accuracy: Better preprocessing with proper normalization techniques
Batch Optimization: Aspect ratio sorting for efficient batch processing
Stack Overflow Prevention: Safe handling of large documents without memory issues
Enhanced UI: Modern interface with tabs for OCR, preprocessing, and performance

Made with ❤️ by Sivasubramanian Ramanathan
