-
docling Public
Forked from docling-project/docling
Docling bundles PDF document conversion to JSON and Markdown in an easy, self-contained package.
Python MIT License Updated Sep 2, 2024 -
bonito Public
Forked from BatsResearch/bonito
A lightweight library for generating synthetic instruction tuning datasets for your data without GPT.
Python BSD 3-Clause "New" or "Revised" License Updated Mar 4, 2024 -
Awesome-Table-Recognition Public
Forked from cv-small-snails/Awesome-Table-Recognition
A curated list of resources dedicated to table recognition
Updated Jan 28, 2024 -
DocumentLayoutAnalysis Public
Forked from BobLd/DocumentLayoutAnalysis
Document Layout Analysis resources repos for development with PdfPig.
C# Updated Oct 1, 2023 -
PLIX Public
PLIX (Pipeline for Information Extraction) is a Python package and command line tool for information extraction from (PDF) documents.
-
CRASS-data-set Public
Forked from apergo-ai/CRASS-data-set
The data for the CRASS-benchmark. See: https://www.crass.ai for further information.
Jupyter Notebook Apache License 2.0 Updated Nov 1, 2022 -
docquery Public
Forked from impira/docquery
An easy way to extract information from documents
Python MIT License Updated Oct 5, 2022 -
ocrd_segment Public
Forked from OCR-D/ocrd_segment
OCR-D-compliant page segmentation
Python MIT License Updated Jul 22, 2022 -
doc-hcii2022-slides Public
Slides to our HCII 2022 talk on "Putting users in the loop: How User Research Can Guide AI Development for a Consumer-Oriented Self-service Portal". Imported from https://git.informatik.uni-leipzig…
Updated Jul 12, 2022 -
pdfix_sdk_example_python Public
Forked from pdfix/pdfix_sdk_example_python
PDFix SDK samples for Python. PDF manipulation, content extraction, conversion, accessibility and more...
Python Updated Feb 11, 2022 -
awesome-data-labeling Public
Forked from HumanSignal/awesome-data-labeling
A curated list of awesome data labeling tools
Updated Oct 8, 2021 -
layout-parser Public
Forked from Layout-Parser/layout-parser
A Python Library for Document Layout Understanding
Python Apache License 2.0 Updated Sep 15, 2021 -
pdfix_sdk_example_cpp Public
Forked from pdfix/pdfix_sdk_example_cpp
Make PDF Files Accessible, Extract Data from PDF, Convert PDF to HTML, Fill-in PDF Form, Stamp PDF and more...
C++ Updated Sep 15, 2021 -
BIG-bench-1 Public
Forked from apergo-ai/BIG-bench-1
Beyond the Imitation Game collaborative benchmark for enormous language models
Jupyter Notebook Apache License 2.0 Updated Aug 25, 2021 -
GastCluster Public
A set of bash scripts to spread number crunching jobs across several machines and collect the results back into a single file
Apache License 2.0 Updated Jan 28, 2021 -
SciTSR Public
Forked from Academic-Hammer/SciTSR
Table structure recognition dataset of the paper: Complicated Table Structure Recognition
Python MIT License Updated Jul 7, 2020