A curated list of resources for the Document Understanding (DU) topic
Updated Jun 2, 2023
📚 Process PDFs, Word documents and more with spaCy
Document Layout Analysis resources and repos for development with PdfPig.
Document Layout Analysis
Page to PAGE Layout Analysis Tool
Detectron2 for Document Layout Analysis
ICDAR 2019: MaskRCNN on PubLayNet datasets. Paragraph detection, table detection, figure detection,...
Tools for extracting figures, tables, text, ... from a PDF document.
Proof of concept of training a simple Region Classifier using PdfPig and ML.NET (LightGBM). The objective is to classify each text block in a PDF document page as title, text, list, table, or image.
Trained Detectron2 object detection models for document layout analysis based on PubLayNet dataset
BoundaryNet - A Semi-Automatic Layout Annotation Tool
Simple docker deployment of document layout analysis using detectron2
GloSAT Historical Measurement Table Dataset
document layout analysis results
Awesome historical newspaper analysis tools and literature
Proof of concept of a simple SVM Region Classifier using PdfPig and Accord.Net. The objective is to classify each text block in a PDF document page as title, text, list, table, or image.
Project for Deep Learning and its application
An end-to-end deep learning approach to extract information from shipping records
DocuParse is a high-performance tool for converting PDF documents into clean, structured Markdown files. Designed for speed and accuracy, it extracts and formats content while minimizing errors like hallucinations and repetitions.