siva-sub/client-ocr

🔍 Client-Side OCR with ONNX Runtime

Extract text from images directly in your browser - no server required! Now with RapidOCR and PPU PaddleOCR integration for 100+ languages!

Live Demo | NPM Package | Documentation | API Reference | Troubleshooting


A high-performance, privacy-focused OCR solution that runs entirely in the browser using ONNX Runtime with both RapidOCR and PPU PaddleOCR models. Process text from images and PDF documents without sending data to any server - everything happens locally on your device. Supporting 100+ languages with state-of-the-art accuracy!

📸 Screenshots

(Screenshots: Main Interface, Preprocessing Options, Performance Metrics)

🚀 Why Choose Client-Side OCR?

🔒 Complete Privacy & Security

Unlike cloud-based OCR services (Google Vision, AWS Textract, Azure OCR), your sensitive documents never leave your device. Perfect for:

  • 📄 Legal documents & contracts
  • 💳 Financial statements & invoices
  • 🏥 Medical records
  • 🆔 Personal IDs & passports
  • 🔐 Confidential business documents

💰 Zero Costs, Unlimited Usage

  • No API fees: Save thousands compared to cloud OCR services
  • No rate limits: Process unlimited documents
  • No subscriptions: One-time integration, lifetime usage
  • No surprises: Predictable performance, no service outages

Superior Performance

  • Low latency: No network round trip; processing averages 300-1500ms depending on image size, language, and device
  • Offline capable: Works without internet after initial load
  • GPU acceleration: Uses WebGL for faster processing
  • Batch optimization: Process multiple regions efficiently

🎯 How It's Different

| Feature | Client-Side OCR | Cloud OCR (Google/AWS) | Tesseract.js |
| --- | --- | --- | --- |
| Privacy | ✅ 100% local | ❌ Data sent to servers | ✅ Local |
| Cost | ✅ Free forever | ❌ Pay per request | ✅ Free |
| Languages | ✅ 100+ built-in | ✅ Many | ⚠️ Manual setup |
| Performance | ✅ Fast (ONNX) | ⚠️ Network dependent | ❌ Slow |
| Accuracy | ✅ State-of-art | ✅ High | ⚠️ Good |
| Setup | ✅ Simple npm install | ❌ Complex API setup | ⚠️ Large models |
| Preprocessing | ✅ Built-in OpenCV | ⚠️ Limited | ❌ Basic |
| Model Size | ✅ 15-30MB total | N/A | ❌ 60MB+ per language |
| Offline | ✅ Full support | ❌ Requires internet | ✅ Supported |

🎨 Advanced Features Not Found Elsewhere

  • 🖼️ Smart Preprocessing: Built-in OpenCV.js for image enhancement
  • 🔄 Auto-rotation: Detects and corrects upside-down text
  • 📊 Confidence scores: Get reliability metrics for each word
  • 🔤 Word segmentation: Separate text into individual words
  • 📱 Mobile optimized: Responsive design with camera capture
  • 🚀 Progressive Web App: Install as native app on any device
  • 🎯 Multiple Model Support: Choose between RapidOCR and PPU models

🎯 Real-World Use Cases

Perfect for Applications That Need:

  • 📱 Document Scanner Apps: Build mobile/web document scanners
  • 🏢 Enterprise Document Processing: Process sensitive documents securely
  • 🏥 Healthcare Systems: Extract text from medical records privately
  • 🏛️ Government Portals: Handle citizen documents without data leaks
  • 📚 Education Platforms: Convert handwritten notes to digital text
  • 💼 Business Card Readers: Extract contact information instantly
  • 🧾 Receipt/Invoice Processing: Automate expense tracking
  • 📖 Digital Libraries: Make scanned books searchable

✨ Core Features

  • 🚀 100% Client-Side: All OCR processing happens in the browser - no data leaves your device
  • 🎯 High Accuracy: Uses state-of-the-art RapidOCR and PPU PaddleOCR v4/v5 models
  • 🌍 100+ Languages: Support for major world languages including Chinese, English, Japanese, Korean, Arabic, Hindi, Tamil, and more
  • 📱 PWA Support: Works offline after initial load with service worker caching
  • 🖼️ Image Preprocessing: Built-in OpenCV.js for auto-enhancement, denoising, deskewing
  • 🔄 Auto-Rotation: Automatically detects and corrects upside-down text
  • 📄 PDF Support: Extract text from PDFs page-by-page with detailed results
  • 🎨 Modern UI: Beautiful, responsive interface built with React & Mantine UI
  • 📦 Smart Caching: Models cached locally for instant subsequent use
  • 🔧 Developer Friendly: Simple API, TypeScript support, React components
  • 📊 Performance Monitoring: Real-time metrics and processing insights

👨‍💻 About the Author

Sivasubramanian Ramanathan

I created this module while experimenting and learning about extracting data from unstructured documents. What started as a curiosity about client-side OCR capabilities evolved into this comprehensive library that brings powerful text recognition to the browser.


Technology Stack

  • Frontend: React 19 + TypeScript + Vite
  • UI Framework: Mantine UI v8
  • OCR Engine: ONNX Runtime Web
  • Models: RapidOCR + PPU PaddleOCR (PP-OCRv4/v5)
  • Processing: RapidOCR techniques (CTC decoding, DB postprocessing)
  • PWA: Vite PWA Plugin + Workbox

Attribution & Credits

This project builds upon the excellent work of:

🏆 RapidOCR

  • Repository: https://github.com/RapidAI/RapidOCR
  • Advanced OCR implementation with multi-language support
  • Processing techniques and model hosting
  • Licensed under Apache License 2.0

🏆 PaddleOCR

🔥 OnnxOCR

🚀 ppu-paddle-ocr

🚀 Demo

Try the live demo: https://siva-sub.github.io/client-ocr/

💡 Quick Comparison

// ❌ Cloud OCR (Privacy Risk + Costs)
const response = await fetch('https://api.service.com/ocr', {
  method: 'POST',
  body: formData, // Your sensitive data leaves your device!
  headers: { 'API-Key': 'sk-xxxxx' } // Costs money per request
});

// ❌ Tesseract.js (Slow + Large)
const worker = await Tesseract.createWorker('eng'); // 60MB+ download
const { data } = await worker.recognize(image); // Slow processing

// ✅ Client-Side OCR (Private + Fast + Free)
import { RapidOCREngine } from 'client-side-ocr';
const ocr = new RapidOCREngine({ lang: 'en' }); // 15MB total
await ocr.initialize(); // One-time setup
const result = await ocr.process(imageData); // Fast, local, private!

📦 Installation

Install from NPM

npm install client-side-ocr
# or
yarn add client-side-ocr
# or
pnpm add client-side-ocr

For Development

# Clone the repository
git clone https://github.com/siva-sub/client-ocr.git
cd client-ocr

# Install dependencies
npm install

# Run development server
npm run dev

# Build for production
npm run build

Quick Start

As a Library

import { createOCREngine } from 'client-side-ocr';

// Initialize the OCR engine with language selection
const ocr = createOCREngine({
  language: 'en', // or 'ch', 'fr', 'de', 'ja', 'ko', etc.
  modelVersion: 'PP-OCRv4' // or 'PP-OCRv5'
});
await ocr.initialize();

// Process an image with advanced options
const result = await ocr.processImage(imageFile, {
  enableWordSegmentation: true,
  returnConfidence: true
});
console.log(result.text);
console.log(result.confidence);
console.log(result.wordBoxes); // Word-level bounding boxes

React Component

import { RapidOCRInterface } from 'client-side-ocr/react';

function App() {
  return (
    <RapidOCRInterface 
      defaultLanguage="en"
      modelVersion="PP-OCRv4"
      onResult={(result) => console.log(result)}
    />
  );
}

Via CDN

<script type="module">
  import { createOCREngine } from 'https://unpkg.com/client-side-ocr@latest/dist/index.mjs';
  
  const ocr = createOCREngine();
  await ocr.initialize();
</script>

Documentation

📚 Comprehensive Guides

API Overview

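// Assumed to be exported from the package root, like createOCREngine above
import { createRapidOCREngine } from 'client-side-ocr';

// Create RapidOCR engine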
const ocr = createRapidOCREngine({
  language: 'en', // 'ch', 'fr', 'de', 'ja', 'ko', 'ru', 'pt', 'es', 'it', 'id', 'vi', 'fa', 'ka'
  modelVersion: 'PP-OCRv4', // or 'PP-OCRv5'
  modelType: 'mobile' // or 'server'
});

// Initialize with automatic model download
await ocr.initialize();

// Process image with RapidOCR techniques
const result = await ocr.processImage(file, {
  enableTextClassification: true,  // 180° rotation detection
  enableWordSegmentation: true,     // Word-level boxes
  preprocessConfig: {
    detectImageNetNorm: true,       // ImageNet normalization for detection
    recStandardNorm: true           // Standard normalization for recognition
  },
  postprocessConfig: {
    unclipRatio: 2.0,              // Text region expansion
    boxThresh: 0.7                  // Box confidence threshold
  }
});

// Access enhanced results
console.log(result.text);           // Extracted text
console.log(result.confidence);     // Overall confidence
console.log(result.lines);          // Text lines with individual confidence
console.log(result.wordBoxes);      // Word-level segmentation
console.log(result.angle);          // Detected text angle (0° or 180°)
console.log(result.processingTime); // Processing time breakdown by stage

For detailed API documentation, see API Reference.
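
The `unclipRatio` option above controls how far each detected text region is expanded before recognition. As a rough illustration, DB-style postprocessing commonly computes the expansion distance as area × unclipRatio / perimeter. The sketch below applies that rule to an axis-aligned rectangle only; real implementations offset arbitrary polygons, and the names `Box`, `unclipDistance`, and `expandBox` are hypothetical and not part of this library's API.

```typescript
// Hypothetical axis-aligned box; the library's own box type may differ.
interface Box {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Expansion distance used by DB-style postprocessing:
// distance = area * unclipRatio / perimeter
function unclipDistance(box: Box, unclipRatio: number): number {
  const area = box.width * box.height;
  const perimeter = 2 * (box.width + box.height);
  return perimeter === 0 ? 0 : (area * unclipRatio) / perimeter;
}

// Grow the box by that distance on every side.
function expandBox(box: Box, unclipRatio: number = 2.0): Box {
  const d = unclipDistance(box, unclipRatio);
  return {
    x: box.x - d,
    y: box.y - d,
    width: box.width + 2 * d,
    height: box.height + 2 * d,
  };
}
```

For a 40×10 box with a ratio of 2.0, the area is 400 and the perimeter is 100, so the box grows by 8 pixels on each side to 56×26. A larger ratio keeps more of the character edges at the cost of looser crops.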

Model Support

The library supports both RapidOCR and PPU PaddleOCR models with multi-language capabilities:

Supported Languages (100+)

| Language | Code | Notes |
| --- | --- | --- |
| Chinese | ch | Simplified & Traditional |
| English | en | Full support |
| French | fr | RapidOCR only |
| German | de | RapidOCR only |
| Japanese | ja | Hiragana, Katakana, Kanji |
| Korean | ko | Hangul support |
| Russian | ru | Cyrillic script |
| Portuguese | pt | Brazilian & European |
| Spanish | es | Latin American & European |
| Italian | it | RapidOCR only |
| Indonesian | id | RapidOCR only |
| Vietnamese | vi | With tone marks |
| Persian | fa | Right-to-left support |
| Kannada | ka | Indic script support |

Model Specifications

| Model Component | Size | Purpose | Features |
| --- | --- | --- | --- |
| Detection | 4-5MB | Text region detection | DB algorithm with unclip expansion |
| Recognition | 8-17MB | Text recognition | CTC decoding with embedded dictionary |
| Classification | 0.5MB | Text angle detection | 0° and 180° rotation correction |

Model Architecture

  • Detection Models: Uses DB (Differentiable Binarization) algorithm with:

    • ImageNet normalization (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    • Dynamic resolution adjustment (multiples of 32)
    • Unclip ratio for text region expansion
  • Recognition Models: Features include:

    • CTC (Connectionist Temporal Classification) decoding
    • Embedded dictionaries in model metadata
    • Dynamic width calculation based on aspect ratio
    • Standard normalization ((pixel/255 - 0.5) / 0.5)
    • PPU models: Red channel only for grayscale, 0-based dictionary indexing
  • Classification Models: Text orientation detection:

    • Detects 0° and 180° rotations
    • Batch processing with aspect ratio sorting
    • Automatic rotation correction
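
The two normalization formulas and the multiple-of-32 sizing listed above can be sketched as plain functions. This is a minimal illustration, not the library's code: it assumes RGBA input as produced by a canvas, writes a planar (channel-first) `Float32Array`, and leaves channel order untouched. The function names are hypothetical.

```typescript
const IMAGENET_MEAN = [0.485, 0.456, 0.406];
const IMAGENET_STD = [0.229, 0.224, 0.225];

// Convert interleaved RGBA bytes to planar CHW floats: (pixel/255 - mean) / std.
function normalize(
  rgba: Uint8ClampedArray,
  width: number,
  height: number,
  mean: number[],
  std: number[]
): Float32Array {
  const plane = width * height;
  const out = new Float32Array(3 * plane);
  for (let i = 0; i < plane; i++) {
    for (let c = 0; c < 3; c++) {
      // Alpha (every 4th byte) is skipped.
      out[c * plane + i] = (rgba[i * 4 + c] / 255 - mean[c]) / std[c];
    }
  }
  return out;
}

// Detection-style normalization with ImageNet statistics.
function normalizeImageNet(rgba: Uint8ClampedArray, width: number, height: number): Float32Array {
  return normalize(rgba, width, height, IMAGENET_MEAN, IMAGENET_STD);
}

// Recognition-style normalization: (pixel/255 - 0.5) / 0.5, giving values in [-1, 1].
function normalizeStandard(rgba: Uint8ClampedArray, width: number, height: number): Float32Array {
  return normalize(rgba, width, height, [0.5, 0.5, 0.5], [0.5, 0.5, 0.5]);
}

// Snap a side length to the nearest multiple of 32, never below 32.
function snapTo32(length: number): number {
  return Math.max(32, Math.round(length / 32) * 32);
}
```

With these definitions, a side length of 1000 snaps to 992 and a side length of 10 snaps to 32.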

Model Sources

  • RapidOCR Models: Hosted on RapidOCR's ModelScope repository
  • PPU Models: Downloaded from PPU PaddleOCR repository with special preprocessing

Architecture

graph TD
    A[Image Upload] --> B[Language Selection]
    B --> C[Model Download Check]
    C -->|Not Cached| D[Download Models]
    C -->|Cached| E[Detection Preprocessing]
    D --> E
    E --> F[ONNX Detection Worker]
    F --> G[Text Classification]
    G -->|180° Detected| H[Rotate Image]
    G -->|Normal| I[Recognition Preprocessing]
    H --> I
    I --> J[ONNX Recognition Worker]
    J --> K[CTC Decoding]
    K --> L[Word Segmentation]
    L --> M[Final Output]
    
    subgraph Processing Pipeline
        E -->|ImageNet/Standard Norm| F
        I -->|Model-specific Norm| J
        K -->|Dictionary| L
    end
    
    subgraph Model Management
        C
        D
    end
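
To make the two branches in the diagram concrete, the sketch below lists the steps a single image passes through, given whether the models are already cached and whether the classifier reports 180° rotation. It only mirrors the node labels of the diagram; `tracePipeline` is a hypothetical helper, not a function exported by the library.

```typescript
// Return the ordered diagram nodes visited for one image.
function tracePipeline(modelsCached: boolean, rotated180: boolean): string[] {
  const steps: string[] = ['Image Upload', 'Language Selection', 'Model Download Check'];
  if (!modelsCached) {
    steps.push('Download Models'); // only on a cache miss
  }
  steps.push('Detection Preprocessing', 'ONNX Detection Worker', 'Text Classification');
  if (rotated180) {
    steps.push('Rotate Image'); // only when upside-down text is detected
  }
  steps.push(
    'Recognition Preprocessing',
    'ONNX Recognition Worker',
    'CTC Decoding',
    'Word Segmentation',
    'Final Output'
  );
  return steps;
}
```

A cached, upright image visits 11 steps; a first run on an upside-down image adds the download and rotation steps for 13.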

Performance

Processing Speed

  • Average processing time: 300-1500ms (depending on image size, language, and device)
  • Batch processing optimization for multiple text regions
  • Aspect ratio sorting for efficient recognition batching
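
Aspect ratio sorting helps because every crop in a recognition batch is typically padded to the width of the widest one, so grouping crops of similar shape wastes less padding. The following is a minimal sketch of that idea under this assumption; `Region` and `batchByAspectRatio` are hypothetical names, and the library's actual batching may differ.

```typescript
interface Region {
  width: number;
  height: number;
}

interface Batch {
  indices: number[]; // positions in the original array, so results can be restored to input order
  maxRatio: number;  // widest width/height ratio in the batch, which sets the padded width
}

function batchByAspectRatio(regions: Region[], batchSize: number): Batch[] {
  if (batchSize < 1) {
    throw new RangeError('batchSize must be at least 1');
  }
  const ratio = (r: Region): number => r.width / r.height;
  // Sort indices rather than regions so the original order stays recoverable.
  const order = regions
    .map((_, i) => i)
    .sort((a, b) => ratio(regions[a]) - ratio(regions[b]));

  const batches: Batch[] = [];
  for (let start = 0; start < order.length; start += batchSize) {
    const indices = order.slice(start, start + batchSize);
    const widest = regions[indices[indices.length - 1]];
    batches.push({ indices, maxRatio: ratio(widest) });
  }
  return batches;
}
```

For three regions with ratios 10, 2, and 5 and a batch size of 2, the first batch holds the ratio-2 and ratio-5 crops and the second holds the ratio-10 crop alone.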

Optimizations

  • WebGL backend for GPU acceleration when available
  • Web Workers for non-blocking parallel processing
  • Automatic model caching with SHA256 verification
  • Smart preprocessing pipeline selection based on model type
  • Efficient memory management with typed arrays
  • Width limiting for PPU models to prevent memory issues
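
Width limiting can be pictured as a clamp on the recognition input width. The sketch below scales a crop to a fixed height while preserving aspect ratio, then caps the width so one very long text line cannot allocate an oversized tensor. The target height of 48 and the cap of 1280 are illustrative assumptions, not values confirmed by this README, and both function names are hypothetical.

```typescript
// Width of the recognition input for a crop, capped at maxWidth.
function recognitionWidth(
  cropWidth: number,
  cropHeight: number,
  targetHeight: number = 48, // assumed model input height
  maxWidth: number = 1280    // assumed cap
): number {
  if (cropWidth <= 0 || cropHeight <= 0) {
    throw new RangeError('crop dimensions must be positive');
  }
  const scaled = Math.ceil(targetHeight * (cropWidth / cropHeight));
  return Math.min(Math.max(scaled, 1), maxWidth);
}

// Bytes needed for one float32 CHW tensor of the given size.
function tensorBytes(width: number, height: number = 48, channels: number = 3): number {
  return width * height * channels * 4;
}
```

A 200×20 crop maps to a width of 480, while a 4000×20 crop would scale to 9600 and is capped at 1280, which keeps a single input at 737,280 bytes under these assumptions.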

Advanced Features

  • Word-level segmentation: Separates Chinese characters from English/numbers
  • Confidence scoring: Per-character and per-line confidence metrics
  • Rotation detection: Automatic 180° text correction
  • Dynamic resolution: Adaptive image resizing for optimal accuracy
  • Stack overflow prevention: Safe handling of large documents

Browser Support

  • Chrome/Edge 90+ (recommended)
  • Firefox 89+
  • Safari 15+
  • Requires WebAssembly and Web Workers support
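
Because WebAssembly and Web Workers are hard requirements, an application may want to check for them before loading any models. The sketch below is one way to do that; the global scope is passed in as a parameter so the check can be exercised outside a browser. `missingFeatures` and `RuntimeGlobals` are hypothetical names, not part of the library.

```typescript
// The subset of global scope this check looks at.
interface RuntimeGlobals {
  WebAssembly?: unknown;
  Worker?: unknown;
}

// Return the names of required features that the given scope lacks.
function missingFeatures(scope: RuntimeGlobals): string[] {
  const missing: string[] = [];
  if (typeof scope.WebAssembly !== 'object' || scope.WebAssembly === null) {
    missing.push('WebAssembly');
  }
  if (typeof scope.Worker !== 'function') {
    missing.push('Web Workers');
  }
  return missing;
}

// In a browser one would typically call: missingFeatures(window)
```

An empty result means both features are present; otherwise the returned names can be shown to the user in place of a failed initialization.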

Development

Project Structure

client-ocr/
├── src/
│   ├── core/           # OCR engine and services
│   ├── workers/        # Web Workers for processing
│   ├── ui/            # React components
│   └── types/         # TypeScript definitions
├── public/
│   └── models/        # ONNX models and dictionaries
├── docs/              # Documentation
├── screenshots/       # Application screenshots
└── .github/
    └── workflows/     # GitHub Actions for deployment

Key Components

  • RapidOCREngine: Main OCR orchestrator with multi-language support
  • PPUModelHandler: Special handling for PPU PaddleOCR models
  • DetPreProcess: Detection preprocessing with model-specific normalization
  • RecPreProcess: Recognition preprocessing with dynamic width calculation
  • ClsPreProcess: Classification preprocessing for rotation detection
  • CTCLabelDecode: CTC decoding with word segmentation
  • DBPostProcess: DB postprocessing with unclip expansion
  • ModelDownloader: Automatic model fetching from multiple sources
  • MetaONNXLoader: Extract embedded dictionaries from models
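
To illustrate what a component like CTCLabelDecode does, here is a minimal greedy CTC decoder. It is a sketch under stated assumptions, not the library's implementation: the input is a per-timestep array of class probabilities, the `dictionary` array is indexed directly by class and holds a placeholder at the blank index, and line confidence is taken as the mean probability of the emitted characters.

```typescript
interface DecodedLine {
  text: string;
  confidence: number;
}

function ctcGreedyDecode(
  probs: number[][],      // probs[t][k]: probability of class k at timestep t
  dictionary: string[],   // dictionary[k]: character for class k (blank slot unused)
  blankIndex: number = 0  // assumed position of the CTC blank class
): DecodedLine {
  let text = '';
  let previous = -1;
  const scores: number[] = [];

  for (const step of probs) {
    // Pick the most likely class at this timestep.
    let best = 0;
    for (let k = 1; k < step.length; k++) {
      if (step[k] > step[best]) best = k;
    }
    // Emit only when the class is not blank and differs from the previous timestep.
    if (best !== blankIndex && best !== previous) {
      text += dictionary[best];
      scores.push(step[best]);
    }
    previous = best;
  }

  const total = scores.reduce((sum, s) => sum + s, 0);
  return { text, confidence: scores.length > 0 ? total / scores.length : 0 };
}
```

Repeated predictions collapse into one character unless a blank separates them, which is how a sequence such as `a a _ a b b` decodes to `aab`. Where the blank sits and whether the dictionary is offset by one depends on the model family, so `blankIndex` and the dictionary layout would need adjusting per model.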

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

MIT License - see LICENSE file for details

🙏 Acknowledgments

Special thanks to:

  • The RapidAI team for RapidOCR and model hosting
  • The PaddlePaddle team for creating PaddleOCR
  • The OnnxOCR project for ONNX conversion tools
  • The ppu-paddle-ocr team for TypeScript implementation reference
  • The open-source community for making this possible

🚀 What's New in v2.0

  • RapidOCR Integration: Complete integration with RapidOCR processing pipeline
  • PPU Model Support: Added support for PPU PaddleOCR models with special preprocessing
  • 100+ Language Support: Extended language support beyond the original 14
  • Advanced Processing: CTC decoding, DB postprocessing, and word segmentation
  • Model Auto-Download: Automatic model fetching with progress tracking
  • Embedded Dictionaries: Models now include character dictionaries in metadata
  • Improved Accuracy: Better preprocessing with proper normalization techniques
  • Batch Optimization: Aspect ratio sorting for efficient batch processing
  • Stack Overflow Prevention: Safe handling of large documents without memory issues
  • Enhanced UI: Modern interface with tabs for OCR, preprocessing, and performance

Made with ❤️ by Sivasubramanian Ramanathan
