PDF OCR Tool

Turn scanned PDFs into searchable PDFs. Works with any PDF that contains images or scanned pages.

What you need

Python 3.6+
Tesseract OCR:
- Windows: Download from https://github.com/UB-Mannheim/tesseract/wiki
- macOS: brew install tesseract
- Linux: sudo apt-get install tesseract-ocr
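
After installing, it can help to confirm that the Tesseract executable is reachable before running the tool. The following is a minimal sketch, not part of this repository: the helper names `find_tesseract` and `tesseract_version` are hypothetical, and it assumes the executable is named `tesseract` and is on your PATH.

```python
import shutil
import subprocess
from typing import Optional


def find_tesseract() -> Optional[str]:
    """Return the full path of the tesseract executable, or None if it is not on PATH."""
    return shutil.which("tesseract")


def tesseract_version(executable: str) -> str:
    """Return the first line printed by `tesseract --version`."""
    result = subprocess.run(
        [executable, "--version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
        check=True,
    )
    # Some Tesseract builds print the version banner to stderr instead of stdout.
    output = result.stdout or result.stderr
    lines = output.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("Tesseract not found on PATH; install it using the steps above.")
    else:
        print(f"Found {path}: {tesseract_version(path)}")
```

If the Windows installer did not add Tesseract to PATH, this check reports it as missing even though it is installed.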

Setup

pip install -r requirements.txt

How to use

Basic:

python ocr_pdf.py your_file.pdf

Creates your_file_searchable.pdf
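
The naming rule above can be sketched as follows. This only illustrates the `_searchable` suffix convention; the function name `default_output_path` is hypothetical, and writing the output next to the input file is an assumption rather than confirmed behavior of `ocr_pdf.py`.

```python
from pathlib import Path


def default_output_path(input_pdf: str) -> Path:
    """Derive the output name by adding `_searchable` before the extension.

    Assumption: the output is placed in the same directory as the input.
    """
    src = Path(input_pdf)
    return src.with_name(f"{src.stem}_searchable{src.suffix}")


if __name__ == "__main__":
    print(default_output_path("your_file.pdf"))
```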

With GUI:

python ocr_gui.py

Multiple files:

python ocr_pdf.py file1.pdf file2.pdf file3.pdf

Different language (German example):

python ocr_pdf.py document.pdf -l deu

Custom output location:

python ocr_pdf.py scan.pdf -o /path/to/output.pdf

Languages

Common language codes:

eng - English
deu - German
fra - French
spa - Spanish
ita - Italian
por - Portuguese
chi_sim - Chinese (Simplified)
jpn - Japanese
kor - Korean

To recognize several languages in one document, join the codes with +, for example: eng+deu+fra
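
If you build the language argument from a script, a small helper can join the codes for you. This is a sketch under assumptions: `combine_languages` is a hypothetical name, and it only trims, de-duplicates, and joins codes. It does not check that the matching Tesseract language data is installed.

```python
from typing import Iterable, List


def combine_languages(codes: Iterable[str]) -> str:
    """Join language codes with '+', dropping duplicates while keeping order."""
    unique: List[str] = []
    for code in codes:
        code = code.strip()
        if not code:
            raise ValueError("empty language code")
        if code not in unique:
            unique.append(code)
    if not unique:
        raise ValueError("at least one language code is required")
    return "+".join(unique)


if __name__ == "__main__":
    # The result can be passed to the -l option shown earlier.
    print(combine_languages(["eng", "deu", "fra"]))
```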

File sizes

The tool automatically optimizes output size.

To disable optimization: python ocr_pdf.py file.pdf --no-optimize
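
To drive the tool from another Python script, the options shown in this README can be assembled into a command line. The sketch below uses only the flags documented above (`-l`, `-o`, `--no-optimize`); the function name `build_command` is hypothetical, and it assumes these flags can be combined in one call and that `ocr_pdf.py` is in the current directory.

```python
import subprocess
import sys
from typing import List, Optional


def build_command(
    pdf: str,
    lang: Optional[str] = None,
    output: Optional[str] = None,
    optimize: bool = True,
) -> List[str]:
    """Build the argument list for one ocr_pdf.py run."""
    cmd = [sys.executable, "ocr_pdf.py", pdf]
    if lang:
        cmd += ["-l", lang]        # language code(s), e.g. "deu" or "eng+deu"
    if output:
        cmd += ["-o", output]      # custom output location
    if not optimize:
        cmd.append("--no-optimize")  # skip output size optimization
    return cmd


def run_ocr(pdf: str, **options) -> int:
    """Run the tool on one file and return its exit code."""
    return subprocess.run(build_command(pdf, **options)).returncode


if __name__ == "__main__":
    print(build_command("scan.pdf", lang="deu", output="out.pdf", optimize=False))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.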
