Open OCR Model for Armenian

This open OCR model is designed to recognize text in Classical, Western, and Eastern Armenian. It has been trained on a diverse range of documents, including noisy texts, historical fonts, and Armenian newspapers. The model is optimized for use with Tesseract-ocr (refer to the documentation for installation instructions).

To learn more, see the blog announcement.

The model itself performs no layout analysis or post-processing; it relies on Tesseract's default layout analysis engine.

How to use

Place the model file (hye-calfa-n.traineddata) in the Tesseract models directory (on Linux, typically /usr/share/tessdata).
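The copy step can also be scripted. The following is a minimal sketch, not part of the original instructions: `install_model` is a hypothetical helper name, and the default target directory is the Linux location mentioned above, which may differ on your system.

```python
import shutil
from pathlib import Path


def install_model(model_file, tessdata_dir="/usr/share/tessdata"):
    """Copy a .traineddata file into a tessdata directory and return the new path.

    install_model is a hypothetical helper; the default directory is the
    typical Linux location and is an assumption for other systems.
    """
    src = Path(model_file)
    dest_dir = Path(tessdata_dir)
    # Tesseract only picks up files with the .traineddata extension
    if src.suffix != ".traineddata":
        raise ValueError(f"expected a .traineddata file, got {src.name}")
    if not dest_dir.is_dir():
        raise FileNotFoundError(f"tessdata directory not found: {dest_dir}")
    dest = dest_dir / src.name
    shutil.copy2(src, dest)
    return dest
```

Writing to /usr/share/tessdata usually requires administrator rights. After copying, `tesseract --list-langs` should include `hye-calfa-n` if the directory is the one your Tesseract installation reads from.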

From the command line:

tesseract input_image output_file -l hye-calfa-n --dpi 300

Or, using the Python wrapper pytesseract:

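import pytesseract
from PIL import Image

# Load the page image and normalize it to RGB
image = Image.open('path/to/image/file').convert("RGB")
# Run OCR with the Armenian model, declaring the image resolution as 300 DPI
text = pytesseract.image_to_string(image, lang='hye-calfa-n', config='--dpi 300')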
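Because layout analysis is left to Tesseract, it can help to choose a page segmentation mode explicitly for unusual layouts such as a single column or a single line. The sketch below wraps the call above in a small helper. The names `build_config` and `ocr_image` are hypothetical, and `--psm` is a standard Tesseract option that the original instructions do not mention, so treat its use here as an assumption about what may suit your documents.

```python
def build_config(dpi=300, psm=None):
    """Build a Tesseract config string; psm is left to Tesseract's default if None."""
    parts = [f"--dpi {dpi}"]
    if psm is not None:
        # Tesseract's page segmentation modes are numbered 0 to 13
        if not 0 <= psm <= 13:
            raise ValueError("psm must be between 0 and 13")
        parts.append(f"--psm {psm}")
    return " ".join(parts)


def ocr_image(path, lang="hye-calfa-n", dpi=300, psm=None):
    """Run OCR on one image file and return the recognized text."""
    # Imported here so build_config stays usable without Tesseract installed
    import pytesseract
    from PIL import Image

    with Image.open(path) as img:
        return pytesseract.image_to_string(
            img.convert("RGB"), lang=lang, config=build_config(dpi, psm)
        )
```

For example, `ocr_image('path/to/image/file', psm=6)` asks Tesseract to treat the page as a single uniform block of text, while omitting `psm` keeps the default behavior described above.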
