Benchmarking Open Source LLMs for Text-to-Table Generation

Python 4 Updated Aug 21, 2025

🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based

HTML 329 39 Updated Oct 13, 2023

A Repo For Document AI

Python 3,107 184 Updated Dec 19, 2025

Document Layout Analysis resources repos for development with PdfPig.

C# 627 68 Updated Oct 1, 2023

Label Studio is a multi-type data labeling and annotation tool with standardized output format

TypeScript 25,894 3,268 Updated Dec 19, 2025

A curated list of resources for Document Understanding (DU) topic

1,486 166 Updated Jun 2, 2023

The data for the CRASS-benchmark

Jupyter Notebook 16 2 Updated Oct 24, 2022

Table structure recognition dataset of the paper: Complicated Table Structure Recognition

Python 378 59 Updated Jul 7, 2020

Master repository which includes most other OCR-D repositories as submodules

Makefile 72 19 Updated Jul 4, 2025

Collection of OCR-related python tools and wrappers from @OCR-D

Python 132 33 Updated Dec 19, 2025

Extracts raw text from web archives (WARCs).

Java 7 Updated Feb 24, 2024

A Unified Toolkit for Deep Learning Based Document Image Analysis

Python 5,622 516 Updated Aug 15, 2024