s0wa48 · 2026-02-27T18:45:56Z

Summary

Users who passed languages=["gr"] for Greek-language PDFs got incorrect OCR output because "gr" (the ISO 3166-1 alpha-2 country code for Greece) was not mapped to the Tesseract language code "ell".
"el" (the ISO 639-1 language code for Modern Greek) was likewise unmapped.
This fix adds both "gr" and "el" to TESSERACT_LANGUAGES_AND_CODES as aliases for "ell", the Tesseract code for Modern Greek.
With the aliases in place, passing either languages=["gr"] or languages=["el"] selects Greek OCR.
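
The alias lookup described above can be sketched roughly as follows. This is a minimal illustration, not the library's actual code: `LANGUAGE_ALIASES` is a hypothetical stand-in for TESSERACT_LANGUAGES_AND_CODES (whose real contents are much larger), and `to_tesseract_codes` is a helper name made up for this example.

```python
# Hypothetical stand-in for TESSERACT_LANGUAGES_AND_CODES; only the two
# aliases discussed in this PR are shown.
LANGUAGE_ALIASES = {
    "el": "ell",  # ISO 639-1 language code for Modern Greek
    "gr": "ell",  # ISO 3166-1 alpha-2 country code for Greece, accepted as an alias
}


def to_tesseract_codes(languages):
    """Map user-supplied codes to a '+'-joined Tesseract language string.

    Codes without an alias pass through unchanged (an assumption made for
    this sketch); duplicates are dropped while preserving order.
    """
    codes = []
    for lang in languages:
        key = lang.strip().lower()
        code = LANGUAGE_ALIASES.get(key, key)
        if code not in codes:
            codes.append(code)
    return "+".join(codes)


print(to_tesseract_codes(["gr"]))
print(to_tesseract_codes(["el", "eng"]))
```

Under these assumptions both "gr" and "el" resolve to "ell", so a request that mixes them does not pass the same Tesseract language twice.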

This PR was auto-generated by Gittensor bot using Claude AI to fix a reported issue.

fix: add 'el' and 'gr' as Greek language code aliases for Tesseract OCR

348e8f4

s0wa48 force-pushed the fix/issue-2939-text-extraction-issue-greek-l branch from d24b157 to 348e8f4 · February 27, 2026 18:46