Consider replacing compound match strength with the highest match strength for a group of voters #1550

@marekhorst

Description

The affiliation matching algorithm is driven by the concept of matchers and voters. Each matcher consists of multiple voters, each with an assigned match strength.

When multiple voters vote for a match, we recalculate the match strength for the given affiliation and organization pair using the following formula:

matchStrength = (matchStrength + voter.getMatchStrength()) - matchStrength * voter.getMatchStrength(); // bear in mind: the strengths are less than one
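To make the effect of this formula concrete, here is a minimal sketch that applies it to a list of voter strengths. The class name, the method name and the starting value of 0 are assumptions made for illustration only and are not taken from the affmatching code. Algebraically, a + b - a * b equals 1 - (1 - a) * (1 - b), so every additional voter can only push the result closer to 1.

```java
import java.util.List;

// Hypothetical illustration class, not part of the affmatching module.
public class CompoundStrengthSketch {

    // Applies the formula quoted above once per voter strength.
    // Assumption: the running strength starts at 0 before the first vote.
    static double compound(List<Double> voterStrengths) {
        double matchStrength = 0.0;
        for (double strength : voterStrengths) {
            matchStrength = (matchStrength + strength) - matchStrength * strength;
        }
        return matchStrength;
    }

    public static void main(String[] args) {
        // Two voters with strengths 0.9 and 0.8 yield roughly 0.98.
        System.out.println(compound(List.of(0.9, 0.8)));
        // A third voter with strength 0.7 raises it to roughly 0.994.
        System.out.println(compound(List.of(0.9, 0.8, 0.7)));
    }
}
```

With strengths that are already high, two or three agreeing voters are enough to bring the compound value very close to 1.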

After analyzing the affmatching outcome many times, we noticed that whenever a false positive was reported, the final match strength of the matched affiliation and organization pair was almost always extremely high. This makes it difficult to define a reasonable match strength threshold below which matches would be excluded from export back to the graph.

Currently some voters defined for a matcher are similar (e.g. strict matching and Levenshtein distance matching) in that they work on the same part of the affiliation organization name. Building a compound match strength for such a pair may artificially increase the match strength.

We should consider addressing this match strength "inflation" by reducing the final match strength for a given pair whenever multiple voters vote for a match, e.g. by picking the highest match strength among the voters claiming the match is valid.
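A minimal sketch of the "pick the highest" alternative could look as follows. The class and method names are hypothetical, and the input is assumed to hold only the strengths of voters that voted for the match. This is one possible reading of the proposal, not an agreed design.

```java
import java.util.List;

// Hypothetical illustration class, not part of the affmatching module.
public class HighestStrengthSketch {

    // Returns the strength of the strongest voter that voted for the match.
    // Assumption: an empty list means no voter voted, which gives 0.
    static double highest(List<Double> voterStrengths) {
        double best = 0.0;
        for (double strength : voterStrengths) {
            best = Math.max(best, strength);
        }
        return best;
    }

    public static void main(String[] args) {
        // Two similar voters (e.g. strict and Levenshtein) with strengths 0.9 and 0.85.
        // The compound formula gives 0.9 + 0.85 - 0.9 * 0.85, roughly 0.985;
        // picking the highest keeps the result at 0.9.
        System.out.println(highest(List.of(0.9, 0.85)));
    }
}
```

With this variant the final strength never exceeds that of the single most confident voter, so a threshold placed between the voter strengths keeps its meaning regardless of how many similar voters agree.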
