Transforms PDF, Documents and Images into Enriched Structured Data
Updated Dec 3, 2023 - JavaScript
Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams
Extracts internal monitoring data from application logs for collection in a time-series database
A library for audio and music analysis
The Apache Tika toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF).
Provides functions to read and write from/to an object or array using a simple string notation
Visual Novels resource browser
Extract files from any kind of container format
Node.js module for extracting text from html, pdf, doc, docx, xls, xlsx, csv, pptx, png, jpg, gif, rtf and more!
Tika-Python is a Python binding to the Apache Tika™ REST services allowing Tika to be called natively in the Python community.
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
🦜⛏️ Did you say you like data?
A C++ static library offering a clean and simple interface to the 7-zip shared libraries.
A program to extract files from the RPA archive format.
Stanford Open Information Extraction made simple!
File Injector is a script that allows you to store any file in an image using steganography
DataTool is a program that lets you extract models, maps, and files from Overwatch.
PHP URI Template (RFC 6570) supports both URI expansion & extraction
.NET text extraction & export framework