Stars
A curated list of open source tools used in analytics platforms and the data engineering ecosystem
The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
💫 Toolkit to help you get started with Spec-Driven Development
Code that processes many kinds of an author's content into an MCP server
Exercises for the book Artificial Intelligence: A Modern Approach
A reactive notebook for Python — run reproducible experiments, query with SQL, execute as a script, deploy as an app, and version with git. Stored as pure Python. All in a modern, AI-native editor.
A single interface to use and evaluate different agent frameworks
⚡ Cloud-native, AI-powered, document processing pipelines on AWS.
A Python framework for multi-modal document understanding with Amazon Bedrock
ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.
Get up and running with OpenAI gpt-oss, DeepSeek-R1, Gemma 3 and other models.
Toolkit for linearizing PDFs for LLM datasets/training
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
Implementation of Nougat: Neural Optical Understanding for Academic Documents
A Comprehensive Toolkit for High-Quality PDF Content Extraction
Turn any PDF or image document into structured data for your AI. A powerful, lightweight OCR toolkit that bridges the gap between images/PDFs and LLMs. Supports 100+ languages.
Background job processing library for Elixir focused on simplicity
Automate code & data workflows with interactive Elixir notebooks
A terminal workspace with batteries included
Demo Extension
A ULauncher/Albert extension that supports currency, units and date time conversion, as well as a calculator that supports complex numbers and functions.
Querybook is a Big Data Querying UI, combining collocated table metadata and a simple notebook interface.
A native Rust library for Delta Lake, with bindings into Python
Container runtimes on macOS (and Linux) with minimal setup
</> htmx - high power tools for HTML
Nuke a whole AWS account and delete all its resources.
A curated list of Machine Learning libraries and resources for the Elixir programming language.
Elixir library for time series forecasting, inspired by Facebook's Prophet and NeuralProphet