Extract text from images directly in your browser - no server required! Now with RapidOCR and PPU PaddleOCR integration for 100+ languages!
A high-performance, privacy-focused OCR solution that runs entirely in the browser using ONNX Runtime with both RapidOCR and PPU PaddleOCR models. Process text from images and PDF documents without sending data to any server - everything happens locally on your device. Supporting 100+ languages with state-of-the-art accuracy!
Unlike cloud-based OCR services (Google Vision, AWS Textract, Azure OCR), your sensitive documents never leave your device. Perfect for:
- 📄 Legal documents & contracts
- 💳 Financial statements & invoices
- 🏥 Medical records
- 🆔 Personal IDs & passports
- 🔐 Confidential business documents
- No API fees: Save thousands compared to cloud OCR services
- No rate limits: Process unlimited documents
- No subscriptions: One-time integration, lifetime usage
- No surprises: Predictable performance, no service outages
- Instant results: No network latency (avg 300-1500ms)
- Offline capable: Works without internet after initial load
- GPU acceleration: Uses WebGL for faster processing
- Batch optimization: Process multiple regions efficiently
| Feature | Client-Side OCR | Cloud OCR (Google/AWS) | Tesseract.js |
|---|---|---|---|
| Privacy | ✅ 100% local | ❌ Data sent to servers | ✅ Local |
| Cost | ✅ Free forever | ❌ Pay per request | ✅ Free |
| Languages | ✅ 100+ built-in | ✅ Many | |
| Performance | ✅ Fast (ONNX) | ❌ Slow | |
| Accuracy | ✅ State-of-the-art | ✅ High | |
| Setup | ✅ Simple npm install | ❌ Complex API setup | |
| Preprocessing | ✅ Built-in OpenCV | ❌ Basic | |
| Model Size | ✅ 15-30MB total | N/A | ❌ 60MB+ per language |
| Offline | ✅ Full support | ❌ Requires internet | ✅ Supported |
- 🖼️ Smart Preprocessing: Built-in OpenCV.js for image enhancement
- 🔄 Auto-rotation: Detects and corrects upside-down text
- 📊 Confidence scores: Get reliability metrics for each word
- 🔤 Word segmentation: Separate text into individual words
- 📱 Mobile optimized: Responsive design with camera capture
- 🚀 Progressive Web App: Install as native app on any device
- 🎯 Multiple Model Support: Choose between RapidOCR and PPU models
- 📱 Document Scanner Apps: Build mobile/web document scanners
- 🏢 Enterprise Document Processing: Process sensitive documents securely
- 🏥 Healthcare Systems: Extract text from medical records privately
- 🏛️ Government Portals: Handle citizen documents without data leaks
- 📚 Education Platforms: Convert handwritten notes to digital text
- 💼 Business Card Readers: Extract contact information instantly
- 🧾 Receipt/Invoice Processing: Automate expense tracking
- 📖 Digital Libraries: Make scanned books searchable
- 🚀 100% Client-Side: All OCR processing happens in the browser - no data leaves your device
- 🎯 High Accuracy: Uses state-of-the-art RapidOCR and PPU PaddleOCR v4/v5 models
- 🌍 100+ Languages: Support for major world languages including Chinese, English, Japanese, Korean, Arabic, Hindi, Tamil, and more
- 📱 PWA Support: Works offline after initial load with service worker caching
- 🖼️ Image Preprocessing: Built-in OpenCV.js for auto-enhancement, denoising, deskewing
- 🔄 Auto-Rotation: Automatically detects and corrects upside-down text
- 📄 PDF Support: Extract text from PDFs page-by-page with detailed results
- 🎨 Modern UI: Beautiful, responsive interface built with React & Mantine UI
- 📦 Smart Caching: Models cached locally for instant subsequent use
- 🔧 Developer Friendly: Simple API, TypeScript support, React components
- 📊 Performance Monitoring: Real-time metrics and processing insights
Sivasubramanian Ramanathan
I created this module while experimenting and learning about extracting data from unstructured documents. What started as a curiosity about client-side OCR capabilities evolved into this comprehensive library that brings powerful text recognition to the browser.
- Frontend: React 19 + TypeScript + Vite
- UI Framework: Mantine UI v8
- OCR Engine: ONNX Runtime Web
- Models: RapidOCR + PPU PaddleOCR (PP-OCRv4/v5)
- Processing: RapidOCR techniques (CTC decoding, DB postprocessing)
- PWA: Vite PWA Plugin + Workbox
This project builds upon the excellent work of:
- RapidOCR
  - Repository: https://github.com/RapidAI/RapidOCR
  - Advanced OCR implementation with multi-language support
  - Processing techniques and model hosting
  - Licensed under Apache License 2.0
- PaddleOCR
  - Repository: https://github.com/PaddlePaddle/PaddleOCR
  - The state-of-the-art OCR models used in this application
  - Licensed under Apache License 2.0
- OnnxOCR
  - Repository: https://github.com/jingsongliujing/OnnxOCR
  - ONNX model conversion and inference implementation reference
  - Provided the ONNX models and dictionary files
- ppu-paddle-ocr
  - Repository: https://github.com/PT-Perkasa-Pilar-Utama/ppu-paddle-ocr
  - TypeScript implementation reference
  - Deskew algorithm implementation inspiration
Try the live demo: https://siva-sub.github.io/client-ocr/
// ❌ Cloud OCR (Privacy Risk + Costs)
const result = await fetch('https://api.service.com/ocr', {
  method: 'POST',
  body: formData, // Your sensitive data leaves your device!
  headers: { 'API-Key': 'sk-xxxxx' } // Costs money per request
});

// ❌ Tesseract.js (Slow + Large)
const worker = await Tesseract.createWorker('eng'); // 60MB+ download
const { data } = await worker.recognize(image); // Slow processing

// ✅ Client-Side OCR (Private + Fast + Free)
import { RapidOCREngine } from 'client-side-ocr';
const ocr = new RapidOCREngine({ lang: 'en' }); // 15MB total
await ocr.initialize(); // One-time setup
const result = await ocr.process(imageData); // Fast, local, private!

# Clone the repository
git clone https://github.com/siva-sub/client-ocr.git
cd client-ocr
# Install dependencies
npm install
# Run development server
npm run dev
# Build for production
npm run build

import { createOCREngine } from 'client-side-ocr';

// Initialize the OCR engine with language selection
const ocr = createOCREngine({
  language: 'en', // or 'ch', 'fr', 'de', 'ja', 'ko', etc.
  modelVersion: 'PP-OCRv4' // or 'PP-OCRv5'
});
await ocr.initialize();

// Process an image with advanced options
const result = await ocr.processImage(imageFile, {
  enableWordSegmentation: true,
  returnConfidence: true
});
console.log(result.text);
console.log(result.confidence);
console.log(result.wordBoxes); // Word-level bounding boxes

import { RapidOCRInterface } from 'client-side-ocr/react';
function App() {
  return (
    <RapidOCRInterface
      defaultLanguage="en"
      modelVersion="PP-OCRv4"
      onResult={(result) => console.log(result)}
    />
  );
}

<script type="module">
  import { createOCREngine } from 'https://unpkg.com/client-side-ocr@latest/dist/index.mjs';
  const ocr = createOCREngine();
  await ocr.initialize();
</script>

- Usage Guide - Complete usage documentation with examples
- API Reference - Detailed API documentation
- Model Documentation - Information about available OCR models
- Troubleshooting Guide - Common issues and solutions
// Assumption: createRapidOCREngine is exported from the package root, like createOCREngine above
import { createRapidOCREngine } from 'client-side-ocr';

// Create RapidOCR engine
const ocr = createRapidOCREngine({
  language: 'en', // 'ch', 'fr', 'de', 'ja', 'ko', 'ru', 'pt', 'es', 'it', 'id', 'vi', 'fa', 'ka'
  modelVersion: 'PP-OCRv4', // or 'PP-OCRv5'
  modelType: 'mobile' // or 'server'
});

// Initialize with automatic model download
await ocr.initialize();

// Process image with RapidOCR techniques
const result = await ocr.processImage(file, {
  enableTextClassification: true, // 180° rotation detection
  enableWordSegmentation: true, // Word-level boxes
  preprocessConfig: {
    detectImageNetNorm: true, // ImageNet normalization for detection
    recStandardNorm: true // Standard normalization for recognition
  },
  postprocessConfig: {
    unclipRatio: 2.0, // Text region expansion
    boxThresh: 0.7 // Box confidence threshold
  }
});

// Access enhanced results
console.log(result.text); // Extracted text
console.log(result.confidence); // Overall confidence
console.log(result.lines); // Text lines with individual confidence
console.log(result.wordBoxes); // Word-level segmentation
console.log(result.angle); // Detected text angle (0° or 180°)
console.log(result.processingTime); // Processing time breakdown by stage

For detailed API documentation, see API Reference.
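The `unclipRatio` option controls how far each detected text region is grown before it is cropped for recognition. In DB-style postprocessing the offset is commonly computed as the polygon area times the ratio, divided by the perimeter. The sketch below shows only that arithmetic. The names `polygonArea`, `polygonPerimeter`, and `unclipDistance` are hypothetical helpers written for this illustration and are not part of this library's API.

```typescript
type Point = { x: number; y: number };

// Shoelace formula: absolute area of a simple polygon.
function polygonArea(points: Point[]): number {
  let twiceArea = 0;
  points.forEach((a, i) => {
    const b = points[(i + 1) % points.length] ?? a;
    twiceArea += a.x * b.y - b.x * a.y;
  });
  return Math.abs(twiceArea) / 2;
}

// Sum of the edge lengths, including the closing edge.
function polygonPerimeter(points: Point[]): number {
  let total = 0;
  points.forEach((a, i) => {
    const b = points[(i + 1) % points.length] ?? a;
    total += Math.hypot(b.x - a.x, b.y - a.y);
  });
  return total;
}

// Hypothetical helper: how many pixels to push each edge outward.
function unclipDistance(points: Point[], unclipRatio: number): number {
  const perimeter = polygonPerimeter(points);
  if (perimeter === 0) return 0; // degenerate box, nothing to expand
  return (polygonArea(points) * unclipRatio) / perimeter;
}

// A 100x20 text box with the unclipRatio used in the configuration above.
const box: Point[] = [
  { x: 0, y: 0 },
  { x: 100, y: 0 },
  { x: 100, y: 20 },
  { x: 0, y: 20 },
];
const offset = unclipDistance(box, 2.0);
console.log(offset.toFixed(2));
```

A larger ratio grows the box further, which helps when detection clips ascenders or descenders but can also pull in neighbouring lines.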
The library supports both RapidOCR and PPU PaddleOCR models with multi-language capabilities:
| Language | Code | RapidOCR | PPU Models | Notes |
|---|---|---|---|---|
| Chinese | ch | ✅ | ✅ | Simplified & Traditional |
| English | en | ✅ | ✅ | Full support |
| French | fr | ✅ | ❌ | RapidOCR only |
| German | de | ✅ | ❌ | RapidOCR only |
| Japanese | ja | ✅ | ✅ | Hiragana, Katakana, Kanji |
| Korean | ko | ✅ | ✅ | Hangul support |
| Russian | ru | ✅ | ❌ | Cyrillic script |
| Portuguese | pt | ✅ | ❌ | Brazilian & European |
| Spanish | es | ✅ | ❌ | Latin American & European |
| Italian | it | ✅ | ❌ | RapidOCR only |
| Indonesian | id | ✅ | ❌ | RapidOCR only |
| Vietnamese | vi | ✅ | ❌ | With tone marks |
| Persian | fa | ✅ | ❌ | Right-to-left support |
| Kannada | ka | ✅ | ❌ | Indic script support |
| Model Component | Size | Purpose | Features |
|---|---|---|---|
| Detection | 4-5MB | Text region detection | DB algorithm with unclip expansion |
| Recognition | 8-17MB | Text recognition | CTC decoding with embedded dictionary |
| Classification | 0.5MB | Text angle detection | 0° and 180° rotation correction |
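The recognition row mentions CTC decoding with a dictionary. As a rough illustration of the idea, here is a minimal greedy CTC decoder: take the most likely class at each time step, collapse consecutive repeats, and drop the blank symbol. It assumes the dictionary array already contains a placeholder at the blank index and that the model output is a flat array of per-step probabilities. `ctcGreedyDecode` and `DecodedLine` are hypothetical names, and the library's own `CTCLabelDecode` may differ in detail.

```typescript
interface DecodedLine {
  text: string;
  confidence: number; // mean probability of the kept characters
}

// probs is a flattened [timeSteps x numClasses] matrix of probabilities.
function ctcGreedyDecode(
  probs: Float32Array,
  timeSteps: number,
  numClasses: number,
  dictionary: string[],
  blankIndex = 0, // assumption: blank is class 0
): DecodedLine {
  let text = '';
  let confidenceSum = 0;
  let kept = 0;
  let previous = -1;

  for (let t = 0; t < timeSteps; t++) {
    // Argmax over the classes at this time step.
    let best = 0;
    let bestProb = -Infinity;
    for (let c = 0; c < numClasses; c++) {
      const p = probs[t * numClasses + c] ?? -Infinity;
      if (p > bestProb) {
        bestProb = p;
        best = c;
      }
    }
    // Keep a character only if it is not blank and not a repeat of the previous step.
    if (best !== blankIndex && best !== previous) {
      text += dictionary[best] ?? '';
      confidenceSum += bestProb;
      kept++;
    }
    previous = best;
  }

  return { text, confidence: kept > 0 ? confidenceSum / kept : 0 };
}

// Toy example: classes are [blank, 'a', 'b']; the argmax path is a, a, blank, a, b.
const toyProbs = new Float32Array([
  0.1, 0.8, 0.1,
  0.1, 0.7, 0.2,
  0.9, 0.05, 0.05,
  0.2, 0.6, 0.2,
  0.1, 0.1, 0.8,
]);
const decoded = ctcGreedyDecode(toyProbs, 5, 3, ['', 'a', 'b']);
console.log(decoded.text, decoded.confidence.toFixed(3));
```

The blank between the second and third `a` steps is what lets a genuinely doubled letter survive the repeat-collapsing rule.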
- Detection Models: Uses DB (Differentiable Binarization) algorithm with:
  - ImageNet normalization (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
  - Dynamic resolution adjustment (multiples of 32)
  - Unclip ratio for text region expansion
- Recognition Models: Features include:
  - CTC (Connectionist Temporal Classification) decoding
  - Embedded dictionaries in model metadata
  - Dynamic width calculation based on aspect ratio
  - Standard normalization ((pixel/255 - 0.5) / 0.5)
  - PPU models: Red channel only for grayscale, 0-based dictionary indexing
- Classification Models: Text orientation detection:
  - Detects 0° and 180° rotations
  - Batch processing with aspect ratio sorting
  - Automatic rotation correction
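The two normalization schemes and the multiple-of-32 sizing listed above can be sketched as follows. This is an illustration under stated assumptions, not the library's `DetPreProcess` or `RecPreProcess` code: the input is taken to be RGBA bytes (as from a canvas), the output is a planar CHW `Float32Array`, and channel order (RGB vs BGR) is left out because it depends on the model. The function names are hypothetical.

```typescript
const IMAGENET_MEAN: readonly number[] = [0.485, 0.456, 0.406];
const IMAGENET_STD: readonly number[] = [0.229, 0.224, 0.225];
const STANDARD_MEAN: readonly number[] = [0.5, 0.5, 0.5];
const STANDARD_STD: readonly number[] = [0.5, 0.5, 0.5];

// Snap a side length to the nearest multiple of 32, with a floor of 32.
function roundToMultipleOf32(size: number): number {
  return Math.max(32, Math.round(size / 32) * 32);
}

// Convert interleaved RGBA bytes into a normalized planar (CHW) float tensor.
function normalizeToCHW(
  rgba: Uint8ClampedArray,
  width: number,
  height: number,
  mean: readonly number[],
  std: readonly number[],
): Float32Array {
  const plane = width * height;
  const out = new Float32Array(3 * plane);
  for (let i = 0; i < plane; i++) {
    for (let c = 0; c < 3; c++) {
      const value = (rgba[i * 4 + c] ?? 0) / 255; // scale byte to [0, 1]
      out[c * plane + i] = (value - (mean[c] ?? 0)) / (std[c] ?? 1);
    }
  }
  return out;
}

// One pixel, normalized both ways.
const pixel = new Uint8ClampedArray([255, 0, 128, 255]);
const forDetection = normalizeToCHW(pixel, 1, 1, IMAGENET_MEAN, IMAGENET_STD);
const forRecognition = normalizeToCHW(pixel, 1, 1, STANDARD_MEAN, STANDARD_STD);
console.log(Array.from(forDetection), Array.from(forRecognition));
console.log(roundToMultipleOf32(100), roundToMultipleOf32(10));
```

With the standard scheme every value lands in [-1, 1], which matches the `(pixel/255 - 0.5) / 0.5` formula in the list.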
- RapidOCR Models: Hosted on RapidOCR's ModelScope repository
- PPU Models: Downloaded from PPU PaddleOCR repository with special preprocessing
graph TD
A[Image Upload] --> B[Language Selection]
B --> C[Model Download Check]
C -->|Not Cached| D[Download Models]
C -->|Cached| E[Detection Preprocessing]
D --> E
E --> F[ONNX Detection Worker]
F --> G[Text Classification]
G -->|180° Detected| H[Rotate Image]
G -->|Normal| I[Recognition Preprocessing]
H --> I
I --> J[ONNX Recognition Worker]
J --> K[CTC Decoding]
K --> L[Word Segmentation]
L --> M[Final Output]
subgraph Processing Pipeline
E -->|ImageNet/Standard Norm| F
I -->|Model-specific Norm| J
K -->|Dictionary| L
end
subgraph Model Management
C
D
end
- Average processing time: 300-1500ms (depending on image size, language, and device)
- Batch processing optimization for multiple text regions
- Aspect ratio sorting for efficient recognition batching
- WebGL backend for GPU acceleration when available
- Web Workers for non-blocking parallel processing
- Automatic model caching with SHA256 verification
- Smart preprocessing pipeline selection based on model type
- Efficient memory management with typed arrays
- Width limiting for PPU models to prevent memory issues
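Aspect ratio sorting, mentioned in the list above, groups crops of similar shape so that each batch needs little padding. A minimal sketch of the idea follows. `batchByAspectRatio` and its types are hypothetical names for illustration only. The original indices are kept so that recognition results can be put back in reading order afterwards.

```typescript
interface CropSize {
  width: number;
  height: number;
}

interface Batch<T> {
  indices: number[]; // positions of the crops in the original array
  crops: T[];
  maxRatio: number; // widest width/height ratio in the batch, useful for padding
}

function batchByAspectRatio<T extends CropSize>(crops: readonly T[], batchSize: number): Batch<T>[] {
  if (batchSize < 1) throw new RangeError('batchSize must be at least 1');

  // Sort a copy by width/height while remembering where each crop came from.
  const order = crops
    .map((crop, index) => ({ crop, index, ratio: crop.width / crop.height }))
    .sort((a, b) => a.ratio - b.ratio);

  const batches: Batch<T>[] = [];
  for (let start = 0; start < order.length; start += batchSize) {
    const slice = order.slice(start, start + batchSize);
    batches.push({
      indices: slice.map((entry) => entry.index),
      crops: slice.map((entry) => entry.crop),
      maxRatio: Math.max(...slice.map((entry) => entry.ratio)),
    });
  }
  return batches;
}

const sampleCrops: CropSize[] = [
  { width: 320, height: 32 },
  { width: 64, height: 32 },
  { width: 160, height: 32 },
  { width: 96, height: 32 },
];
const batches = batchByAspectRatio(sampleCrops, 2);
console.log(batches.map((b) => b.indices));
```

Without sorting, the first batch here would pair a ratio-10 crop with a ratio-2 crop, and most of the narrow crop's tensor would be padding.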
- Word-level segmentation: Separates Chinese characters from English/numbers
- Confidence scoring: Per-character and per-line confidence metrics
- Rotation detection: Automatic 180° text correction
- Dynamic resolution: Adaptive image resizing for optimal accuracy
- Stack overflow prevention: Safe handling of large documents
- Chrome/Edge 90+ (recommended)
- Firefox 89+
- Safari 15+
- Requires WebAssembly and Web Workers support
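Since the library needs WebAssembly and Web Workers, an application may want to check for both before loading any models. The following is a minimal feature-detection sketch. `checkOcrSupport` is a hypothetical helper, not something the package is known to export, and it takes the global scope as a parameter so it can be exercised outside a browser.

```typescript
interface ScopeLike {
  WebAssembly?: unknown;
  Worker?: unknown;
}

interface SupportReport {
  webAssembly: boolean;
  webWorkers: boolean;
  supported: boolean;
}

function checkOcrSupport(scope: ScopeLike): SupportReport {
  // WebAssembly is exposed as a namespace object, Worker as a constructor.
  const webAssembly = typeof scope.WebAssembly === 'object' && scope.WebAssembly !== null;
  const webWorkers = typeof scope.Worker === 'function';
  return { webAssembly, webWorkers, supported: webAssembly && webWorkers };
}

// In a browser one would pass the real global, e.g. checkOcrSupport(window).
// Here two stand-in scopes show both outcomes.
const modernScope: ScopeLike = { WebAssembly: {}, Worker: function FakeWorker() {} };
const legacyScope: ScopeLike = { WebAssembly: {} };
console.log(checkOcrSupport(modernScope), checkOcrSupport(legacyScope));
```

Checking up front lets the UI show a clear "unsupported browser" message instead of failing partway through model initialization.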
client-ocr/
├── src/
│   ├── core/           # OCR engine and services
│   ├── workers/        # Web Workers for processing
│   ├── ui/             # React components
│   └── types/          # TypeScript definitions
├── public/
│   └── models/         # ONNX models and dictionaries
├── docs/               # Documentation
├── screenshots/        # Application screenshots
└── .github/
    └── workflows/      # GitHub Actions for deployment
- RapidOCREngine: Main OCR orchestrator with multi-language support
- PPUModelHandler: Special handling for PPU PaddleOCR models
- DetPreProcess: Detection preprocessing with model-specific normalization
- RecPreProcess: Recognition preprocessing with dynamic width calculation
- ClsPreProcess: Classification preprocessing for rotation detection
- CTCLabelDecode: CTC decoding with word segmentation
- DBPostProcess: DB postprocessing with unclip expansion
- ModelDownloader: Automatic model fetching from multiple sources
- MetaONNXLoader: Extract embedded dictionaries from models
Contributions are welcome! Please feel free to submit a Pull Request.
MIT License - see LICENSE file for details
Special thanks to:
- The RapidAI team for RapidOCR and model hosting
- The PaddlePaddle team for creating PaddleOCR
- The OnnxOCR project for ONNX conversion tools
- The ppu-paddle-ocr team for TypeScript implementation reference
- The open-source community for making this possible
- RapidOCR Integration: Complete integration with RapidOCR processing pipeline
- PPU Model Support: Added support for PPU PaddleOCR models with special preprocessing
- 100+ Language Support: Extended language support beyond the original 14
- Advanced Processing: CTC decoding, DB postprocessing, and word segmentation
- Model Auto-Download: Automatic model fetching with progress tracking
- Embedded Dictionaries: Models now include character dictionaries in metadata
- Improved Accuracy: Better preprocessing with proper normalization techniques
- Batch Optimization: Aspect ratio sorting for efficient batch processing
- Stack Overflow Prevention: Safe handling of large documents without memory issues
- Enhanced UI: Modern interface with tabs for OCR, preprocessing, and performance
Made with ❤️ by Sivasubramanian Ramanathan