calfa-co/hye-tesseract

Open OCR Model for Armenian

This open OCR model is designed to recognize text in Classical, Western, and Eastern Armenian. It has been trained on a diverse range of documents, including noisy texts, historical fonts, and Armenian newspapers. The model is optimized for use with Tesseract-ocr (refer to the documentation for installation instructions).

To learn more, see the blog announcement.

The model itself performs neither layout analysis nor post-processing; it relies on Tesseract's default layout analysis engine.

How to use

Place the model file in the Tesseract models directory (on a Linux system, this directory is typically located at /usr/share/tessdata).
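
Before running OCR, it can help to confirm that the model file is where Tesseract will look for it. The sketch below is only an illustration: the helper name `model_installed` is hypothetical, and it assumes the file is named `hye-calfa-n.traineddata`, following Tesseract's usual `<lang>.traineddata` convention for the name passed to `-l`.

```python
from pathlib import Path

# Assumption: Tesseract resolves "-l hye-calfa-n" to "hye-calfa-n.traineddata".
MODEL_LANG = "hye-calfa-n"


def model_installed(tessdata_dir, lang=MODEL_LANG):
    """Return True if <lang>.traineddata exists in the given tessdata directory."""
    return (Path(tessdata_dir) / f"{lang}.traineddata").is_file()


if __name__ == "__main__":
    # /usr/share/tessdata is the typical Linux location; adjust for your system.
    print(model_installed("/usr/share/tessdata"))
```

If this prints False, check whether your Tesseract installation uses a different models directory.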

From the command line:

tesseract input_image output_file -l hye-calfa-n --dpi 300

Or using the Python wrapper pytesseract:

import pytesseract
from PIL import Image

# Load the image and convert it to RGB
image = Image.open('path/to/image/file').convert("RGB")
# Run OCR with the Armenian model at 300 DPI
text = pytesseract.image_to_string(image, lang='hye-calfa-n', config='--dpi 300')
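
If pytesseract is not available, the same command-line call can be driven from Python's standard library. This is a minimal sketch, not part of the project: `build_command` and `run_ocr` are hypothetical helper names, and it assumes the `tesseract` executable is on your PATH and writes its plain-text result to `<output_base>.txt`, which is Tesseract's default behaviour.

```python
import subprocess


def build_command(input_image, output_base, lang="hye-calfa-n", dpi=300):
    """Build the argument list equivalent to the CLI call shown above."""
    return ["tesseract", str(input_image), str(output_base),
            "-l", lang, "--dpi", str(dpi)]


def run_ocr(input_image, output_base):
    """Run Tesseract and return the recognized text.

    Assumes the tesseract executable is on PATH and the model is installed.
    """
    subprocess.run(build_command(input_image, output_base), check=True)
    # By default Tesseract appends ".txt" to the output base name.
    with open(f"{output_base}.txt", encoding="utf-8") as fh:
        return fh.read()
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.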