Using doclytics to update the "Content" of a document #97

@hermanmak

Paperless-ngx uses an OCR engine that handles languages such as Chinese and Korean poorly, and it seems to perform especially badly when multiple languages appear in the same document.

Documents that mix multiple languages are extremely common in Hong Kong.

Could doclytics act as a bridge so that an LLM performs the OCR, either instead of the built-in engine or by overwriting its output?
For example, the recently released minicpm-v model is capable of OCR in multiple languages.
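
To make the request concrete, here is a rough Python sketch of the flow, not doclytics code. It assumes the model is served by a local Ollama instance (its `/api/generate` endpoint accepts base64 images for vision models) and that the Paperless-ngx REST API accepts a `PATCH` on a document's `content` field for your version and permissions, which should be verified. The URLs, the prompt, and the helper names `build_ocr_request`, `build_content_patch`, and `send` are all hypothetical.

```python
import base64
import json
import urllib.request

# Assumed addresses; adjust to your own setup.
OLLAMA_URL = "http://localhost:11434/api/generate"
PAPERLESS_URL = "http://localhost:8000"


def build_ocr_request(image_bytes: bytes, model: str = "minicpm-v") -> dict:
    """Build an Ollama generate payload asking a vision model to transcribe a page image."""
    return {
        "model": model,
        "prompt": "Transcribe all text in this image exactly, keeping the original languages.",
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for one JSON response instead of a stream
    }


def build_content_patch(doc_id: int, text: str, token: str) -> urllib.request.Request:
    """Build a PATCH request that would overwrite the document's "Content" field.

    Assumption: the Paperless-ngx instance allows writing `content` via the API.
    """
    body = json.dumps({"content": text}).encode("utf-8")
    return urllib.request.Request(
        f"{PAPERLESS_URL}/api/documents/{doc_id}/",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
        },
    )


def send(request: urllib.request.Request) -> dict:
    """Send a prepared request and decode the JSON reply (performs network I/O)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

In use, the payload from `build_ocr_request` would be posted to `OLLAMA_URL`, the `response` field of the reply taken as the new text, and the result passed to `build_content_patch` and `send`. A multi-page document would need one image per page, with the transcriptions joined before the update.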

Thanks!
