Multimodal large language models (MLLMs) have shown impressive capabilities across various domains, excelling at processing and understanding information from multiple modalities. Despite this rapid progress, insufficient OCR ability still hinders MLLMs in text-related tasks. In this paper, we present Ocean-OCR, a 3B MLLM that achieves state-of-the-art performance across various OCR scenarios while retaining comparable understanding ability on general tasks. We employ a Native Resolution ViT to enable variable-resolution input and leverage a substantial collection of high-quality OCR data to enhance model performance.
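The idea behind variable-resolution input is that an image is split into fixed-size patches at its original resolution, so images of different sizes simply produce visual-token sequences of different lengths instead of being resized to a fixed shape. Below is a minimal sketch of this patchification step (the patch size and function name are illustrative, not the model's actual configuration):

import torch

def patchify(image: torch.Tensor, patch_size: int = 14) -> torch.Tensor:
    """Split a (C, H, W) image into a variable-length sequence of patch tokens.

    H and W only need to be multiples of patch_size; the image is never
    resized to a fixed resolution, so larger images simply yield more tokens.
    """
    c, h, w = image.shape
    assert h % patch_size == 0 and w % patch_size == 0, "pad to a multiple of patch_size first"
    # (C, H, W) -> (C, H/p, W/p, p, p)
    patches = image.unfold(1, patch_size, patch_size).unfold(2, patch_size, patch_size)
    # -> (H/p * W/p, C * p * p): one flattened token per patch
    return patches.permute(1, 2, 0, 3, 4).reshape(-1, c * patch_size * patch_size)

print(patchify(torch.rand(3, 448, 448)).shape)  # torch.Size([1024, 588])
print(patchify(torch.rand(3, 224, 672)).shape)  # torch.Size([768, 588])

A 448x448 image yields 1024 patch tokens while a 224x672 image yields 768, so the ViT consumes each image at its native aspect ratio and resolution.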
To evaluate the model in practical OCR scenarios, we construct comprehensive evaluation datasets covering three tasks: (1) document extraction; (2) scene text recognition; (3) handwritten text recognition. The evaluation metrics can be computed with the following commands.
(1) Document Extraction
English Document:
python eval.py --checkpoint_path ocean_ocr_checkpoint_path --eval_type document_en
Chinese Document:
python eval.py --checkpoint_path ocean_ocr_checkpoint_path --eval_type document_zh
(2) Scene Text Recognition
python eval.py --checkpoint_path ocean_ocr_checkpoint_path --eval_type scene_text_rec
(3) Handwritten Recognition
Handwritten in English:
python eval.py --checkpoint_path ocean_ocr_checkpoint_path --eval_type handwritten_en
Handwritten in Chinese:
python eval.py --checkpoint_path ocean_ocr_checkpoint_path --eval_type handwritten_zh
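The metrics themselves are computed inside eval.py. As a rough illustration of the kind of score OCR benchmarks commonly report, the snippet below computes a normalized edit distance between a predicted and a reference transcription (a generic sketch for intuition, not necessarily the exact metric implemented in eval.py):

def edit_distance(pred: str, ref: str) -> int:
    """Levenshtein distance via single-row dynamic programming."""
    m, n = len(pred), len(ref)
    dp = list(range(n + 1))
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                            # delete pred[i-1]
                        dp[j - 1] + 1,                        # insert ref[j-1]
                        prev + (pred[i - 1] != ref[j - 1]))   # substitute
            prev = cur
    return dp[n]

def normalized_edit_distance(pred: str, ref: str) -> float:
    """Edit distance divided by the longer string's length, in [0, 1]."""
    if not pred and not ref:
        return 0.0
    return edit_distance(pred, ref) / max(len(pred), len(ref))

print(normalized_edit_distance("0cean-OCR", "Ocean-OCR"))  # 0.111...

Lower is better; 0.0 means the prediction matches the reference exactly.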
If you find our model, code, or paper helpful, please give us a ⭐ and a citation 📝. Thank you!
@article{chen2025ocean,
  title={Ocean-OCR: Towards General OCR Application via a Vision-Language Model},
  author={Chen, Song and Guo, Xinyu and Li, Yadong and Zhang, Tao and Lin, Mingan and Kuang, Dongdong and Zhang, Youwei and Ming, Lingfeng and Zhang, Fengyu and Wang, Yuran and others},
  journal={arXiv preprint arXiv:2501.15558},
  year={2025}
}