Stars
DSPy: The framework for programming—not prompting—language models
Beancount: Double-Entry Accounting from Text Files.
Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.
Python bindings to libpostal for fast international address parsing/normalization
Pelias is a modular open-source geocoder using Elasticsearch.
Label Studio is a multi-type data labeling and annotation tool with a standardized output format
Get an intuitive sense for the ROC curve and other binary classification metrics with an interactive visualization.
Table Transformer (TATR) is a deep learning model for extracting tables from unstructured documents (PDFs and images). This is also the official repository for the PubTables-1M dataset and GriTS ev…
OCR, layout analysis, reading order, and table recognition in 90+ languages
No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents
UniTable: Towards a Unified Table Foundation Model
OCR Annotations from Amazon Textract for Industry Documents Library
Automated CI toolchain to produce precompiled opencv-python, opencv-python-headless, opencv-contrib-python and opencv-contrib-python-headless packages.
We write your reusable computer vision tools. 💜
Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.
A Unified Toolkit for Deep Learning Based Document Image Analysis
The Query Builder component for React
Layered, depth-first reading—start with summaries, tap to explore details, and gain clarity on complex topics.
Analyze how a Git repo grows over time
A lightning-fast search engine API bringing AI-powered hybrid search to your sites and applications.
Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation
Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.
Open-source scientific and technical publishing system built on Pandoc.
A GitHub Action for exporting dbt docs to a Notion database
🔅 Shapash: User-friendly Explainability and Interpretability to Develop Reliable and Transparent Machine Learning Models
🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy
Optimal binning: monotonic binning with constraints. Supports batch and stream optimal binning, scorecard modelling, and counterfactual explanations.
Fit interpretable models. Explain blackbox machine learning.