alexZeakis/pyTokenJoin


pyTokenJoin

Overview

TokenJoin is an efficient method for solving the Fuzzy Set Similarity Join problem. It relies only on tokens and their defined utilities, avoiding pairwise comparisons between elements. The method has been submitted to the International Conference on Very Large Data Bases (VLDB). This repository contains the Python source code. More information about the original method can be found here.

Installation

You can easily install pytokenjoin from PyPI using pip:

pip install pytokenjoin

More on PyPI.

Usage

There are two ways to use TokenJoin:

  • When using a threshold δ, e.g. δ=0.7
  • When requesting top-k results, e.g. k=100.
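The difference between the two modes can be sketched in plain Python. This is only an illustration of the selection semantics, not the pytokenjoin API and not the TokenJoin algorithm: the function names `select_by_threshold` and `select_top_k` and the sample scores are hypothetical, and the pairs are assumed to be already scored.

```python
import heapq

# Hypothetical, pre-scored candidate pairs: (left_id, right_id, similarity).
scored_pairs = [
    (0, 1, 0.50),
    (0, 3, 0.95),
    (1, 2, 0.72),
    (2, 3, 0.10),
]

def select_by_threshold(pairs, delta):
    """Threshold mode: keep every pair whose similarity is at least delta."""
    return [p for p in pairs if p[2] >= delta]

def select_top_k(pairs, k):
    """Top-k mode: keep the k pairs with the highest similarity."""
    return heapq.nlargest(k, pairs, key=lambda p: p[2])

threshold_result = select_by_threshold(scored_pairs, delta=0.7)
top_k_result = select_top_k(scored_pairs, k=3)
```

In threshold mode the result size depends on the data, while in top-k mode the result size is fixed and the effective threshold is whatever the k-th best score turns out to be.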

Two similarity functions are supported: Jaccard and Edit Similarity.
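For readers unfamiliar with the two measures, here is a minimal sketch of how they are commonly defined on a single pair of elements. The function names are hypothetical and are not taken from pytokenjoin, and the edit similarity normalization (one minus the Levenshtein distance divided by the longer string's length) is an assumption based on the usual textbook definition; the library's exact definition may differ.

```python
def jaccard_similarity(x, y):
    """Jaccard similarity of two token collections: |x ∩ y| / |x ∪ y|."""
    x, y = set(x), set(y)
    union = x | y
    # Two empty sets are treated as identical by convention.
    return len(x & y) / len(union) if union else 1.0

def edit_distance(s, t):
    """Levenshtein distance via dynamic programming over two rows."""
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def edit_similarity(s, t):
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - edit_distance(s, t) / longest

jaccard_example = jaccard_similarity({"a", "b"}, {"b", "c"})
edit_example = edit_similarity("kitten", "sitting")
```

Jaccard compares elements as sets of tokens, whereas edit similarity compares them as character strings, so the choice depends on how the elements in your collections are represented.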

More information on how to use the functions can be found in this Jupyter notebook.
