Paperless-ngx uses an OCR engine that does not handle languages such as Chinese and Korean particularly well, and it seems to perform especially badly when multiple languages are present in the same document.
Multiple languages in the same document are extremely common in Hong Kong.
Could doclytics act as a bridge for using an LLM to do the OCR, either instead of the built-in engine or by overwriting its output?
For example, the recently released minicpm-v model is capable of OCR in multiple languages.
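
To make the idea concrete, here is a rough sketch of what the OCR step could look like. It assumes minicpm-v is served by a local Ollama instance on its default port and that a page has already been rendered to an image. The helper names `build_ocr_request` and `ocr_page` and the prompt wording are made up for illustration and are not part of doclytics or Paperless-ngx.

```python
import base64
import json
import urllib.request

# Assumption: a local Ollama instance on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_ocr_request(image_bytes: bytes, model: str = "minicpm-v") -> dict:
    """Build the JSON body for Ollama's generate endpoint (hypothetical helper)."""
    return {
        "model": model,
        # Illustrative prompt; the wording is an assumption, not a tested recipe.
        "prompt": "Transcribe all text in this image exactly as written, "
                  "keeping each language as it appears.",
        # Ollama expects images as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        # Ask for a single JSON reply instead of a stream of chunks.
        "stream": False,
    }


def ocr_page(image_bytes: bytes, url: str = OLLAMA_URL) -> str:
    """Send one page image to the model and return the transcribed text."""
    body = json.dumps(build_ocr_request(image_bytes)).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling `ocr_page(open("page1.png", "rb").read())` would return the model's transcription of that page. Writing the text back to Paperless-ngx is left out here, since how doclytics would do that is exactly the open question.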
Thanks!