This open OCR model is designed to recognize text in Classical, Western, and Eastern Armenian. It has been trained on a diverse range of documents, including noisy texts, historical fonts, and Armenian newspapers. The model is optimized for use with Tesseract-ocr (refer to the documentation for installation instructions).
For more information, see the blog announcement.
This model relies on Tesseract's default layout analysis engine and does not perform layout analysis or post-processing.
Place the model file in the Tesseract models directory (on Linux, this is typically /usr/share/tessdata).
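The placement step can also be scripted. The following is a minimal sketch, assuming the model file is named hye-calfa-n.traineddata (following Tesseract's usual `<lang>.traineddata` naming) and that the target directory is writable. The helper name `install_model` is hypothetical and not part of Tesseract or pytesseract.

```python
import shutil
from pathlib import Path


def install_model(model_file, tessdata_dir="/usr/share/tessdata"):
    """Copy a .traineddata file into the tessdata directory (hypothetical helper)."""
    src = Path(model_file)
    dest_dir = Path(tessdata_dir)
    if src.suffix != ".traineddata":
        raise ValueError(f"expected a .traineddata file, got {src.name}")
    if not dest_dir.is_dir():
        raise FileNotFoundError(f"tessdata directory not found: {dest_dir}")
    # Tesseract resolves `-l NAME` to NAME.traineddata, so the file name is kept unchanged.
    return Path(shutil.copy2(src, dest_dir / src.name))
```

Writing to /usr/share/tessdata usually requires root privileges. If that is not an option, Tesseract's `--tessdata-dir` option can point it at a different directory that contains the model.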
In the CLI:
tesseract input_image output_file -l hye-calfa-n --dpi 300
Or, using the Python wrapper pytesseract:
import pytesseract
from PIL import Image
image = Image.open('path/to/image/file').convert("RGB")
text = pytesseract.image_to_string(image, lang='hye-calfa-n', config='--dpi 300')
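To process a whole folder of scans, the same CLI call can be wrapped in a small loop. This is a sketch under stated assumptions, not a recommended pipeline: the function names `build_command` and `ocr_folder` are hypothetical, the images are assumed to be PNG files, and the `tesseract` binary is assumed to be on the PATH with the model already installed.

```python
import subprocess
from pathlib import Path


def build_command(image_path, output_base, lang="hye-calfa-n", dpi=300):
    """Return the tesseract CLI invocation as an argument list."""
    return ["tesseract", str(image_path), str(output_base),
            "-l", lang, "--dpi", str(dpi)]


def ocr_folder(folder, pattern="*.png"):
    """Run tesseract on every matching image and return the expected .txt paths."""
    outputs = []
    for image_path in sorted(Path(folder).glob(pattern)):
        # Tesseract appends ".txt" to the output base on its own.
        output_base = image_path.with_suffix("")
        subprocess.run(build_command(image_path, output_base), check=True)
        outputs.append(image_path.with_suffix(".txt"))
    return outputs
```

Because `check=True` is set, a failing Tesseract run raises `subprocess.CalledProcessError` and stops the batch instead of silently skipping the page.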