mi-chainlink

A powerful, flexible framework for entity resolution and record linkage that uses DuckDB as its database engine. It builds upon the work of Who Owns Chicago by the Mansueto Institute for Urban Innovation, including the work of Kevin Bryson, Ana (Anita) Restrepo Lachman, Caitlin P., Joaquin Pinto, and Divij Sinha.

This package enables you to load data from various sources, clean and standardize entity names and addresses, and create links between entities based on exact and fuzzy matching techniques.

Source: https://github.com/mansueto-institute/mi-chainlink

Documentation: https://mansueto-institute.github.io/mi-chainlink/

Issues: https://github.com/mansueto-institute/mi-chainlink/issues

Overview

This framework helps you solve the entity resolution problem by:

  1. Loading data from multiple sources into a DuckDB database
  2. Cleaning and standardizing entity names and addresses
  3. Creating exact matches between entities based on names and addresses
  4. Generating fuzzy matches using TF-IDF similarity
  5. Exporting the resulting linked data for further analysis
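
To make steps 2 through 4 concrete, here is a minimal standard-library sketch of name cleaning, exact matching, and TF-IDF fuzzy matching. It is not the package's implementation: the function names (`clean_name`, `char_ngrams`, `tfidf_vectors`, `cosine`), the cleaning rules, the choice of character trigrams, and the sample names are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def clean_name(name: str) -> str:
    """Uppercase, replace punctuation with spaces, collapse whitespace.

    Illustrative rules only; the package's own cleaning may differ.
    """
    name = re.sub(r"[^A-Z0-9 ]", " ", name.upper())
    return re.sub(r"\s+", " ", name).strip()


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Split a padded string into overlapping character n-grams."""
    padded = f" {text} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def tfidf_vectors(names: list[str]) -> list[dict[str, float]]:
    """Build L2-normalized TF-IDF vectors over character trigrams."""
    grams = [Counter(char_ngrams(name)) for name in names]
    # Document frequency: in how many names each trigram appears.
    doc_freq = Counter(g for counts in grams for g in counts)
    total = len(names)
    vectors = []
    for counts in grams:
        # Smoothed inverse document frequency, so rare trigrams weigh more.
        weights = {
            g: tf * (math.log((1 + total) / (1 + doc_freq[g])) + 1.0)
            for g, tf in counts.items()
        }
        # Fall back to 1.0 so an empty name does not divide by zero.
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({g: w / norm for g, w in weights.items()})
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two L2-normalized sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(g, 0.0) for g, w in a.items())


# Hypothetical entity names, invented for this example.
raw = [
    "Acme Holdings, LLC",
    "ACME HOLDINGS LLC",
    "Acme Holding L.L.C.",
    "Lakeview Properties Inc.",
]
cleaned = [clean_name(r) for r in raw]
vectors = tfidf_vectors(cleaned)

# Exact match: identical strings after cleaning.
print("exact match:", cleaned[0] == cleaned[1])

# Fuzzy match: compare the first name against the others by similarity.
for i in (2, 3):
    print(f"{cleaned[0]!r} vs {cleaned[i]!r}: {cosine(vectors[0], vectors[i]):.3f}")
```

In this sketch the first two names become identical after cleaning, so they link as an exact match, while the third differs slightly and is only recoverable through the similarity score. A real pipeline would also apply a similarity threshold, which is left out here because the source does not state one.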

The system is designed to be configurable through YAML files and supports incremental updates to an existing database.

Installation

The package is available on PyPI. You can install it with pip or uv:

pip install mi-chainlink
uv add mi-chainlink

Usage

Command Line Interface

# Run interactive session
chainlink

# Run with path to config yaml
chainlink path/to/config.yaml

Python Interface

from chainlink import chainlink

# Call signature (the type annotations are for reference, not literal call syntax):
# chainlink(
#     config: dict,  # dict with config details
#     config_path: str | Path = DIR / "configs/config.yaml",  # path to store the dict after processing
# )
