Description
The affiliation matching algorithm is built around the concepts of matchers and voters. Each matcher consists of multiple voters, and each voter has an assigned match strength.
When several voters vote for a match, we recalculate the match strength for the given affiliation and organization pair using the following formula:
Line 39 in 871021e
    matchStrength = (matchStrength + voter.getMatchStrength()) - matchStrength * voter.getMatchStrength(); // bear in mind: the strengths are less that one
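To illustrate how quickly this update rule saturates, here is a minimal sketch that applies it to a list of voter strengths. The class name `CompoundStrengthSketch`, the method `combine`, and the starting value of 0.0 are assumptions made for illustration only; they are not taken from the affmatching code base. Note that the update is algebraically the same as `1 - (1 - matchStrength) * (1 - voterStrength)`, so every additional agreeing voter pushes the result closer to one.

```java
// Hypothetical illustration, not the project's actual class.
final class CompoundStrengthSketch {

    // Applies the update rule from the issue once per voter that voted for the match.
    // Assumption: the accumulated strength starts at 0.0 and every strength is in [0, 1).
    static double combine(double... voterStrengths) {
        double matchStrength = 0.0;
        for (double voterStrength : voterStrengths) {
            matchStrength = (matchStrength + voterStrength) - matchStrength * voterStrength;
        }
        return matchStrength;
    }

    public static void main(String[] args) {
        // Two agreeing voters with strengths 0.9 and 0.8 yield roughly 0.98.
        System.out.println(combine(0.9, 0.8));
        // Three moderate voters at 0.7 each already exceed 0.97.
        System.out.println(combine(0.7, 0.7, 0.7));
    }
}
```

With this rule the combined value is never lower than the strongest individual voter, which is consistent with the high final strengths described in this issue.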
Repeated analysis of affmatching results showed that, whenever a false positive was reported, the final match strength of the matched affiliation and organization pair was almost always extremely high. This makes it difficult to define a reasonable match strength threshold below which matches could be excluded from export back to the graph.
Some of the voters currently defined for a matcher are similar (e.g. strict matching and Levenshtein distance matching): they operate on the same part of the affiliation's organization name, so building a compound match strength for that pair may artificially increase the match strength.
We should consider addressing this match strength "inflation" by reducing the final match strength for a given pair whenever multiple voters vote for a match, e.g. by taking the highest match strength among the voters that claim the match is valid.
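The "highest voter wins" option could look roughly like the sketch below. The class `MaxStrengthSketch` and the method `combineByMax` are hypothetical names, and the input is assumed to contain only the strengths of voters that actually voted for the match; this is one possible shape of the proposal, not an agreed design.

```java
// Hypothetical illustration of the proposed alternative, not existing project code.
final class MaxStrengthSketch {

    // Returns the strength of the strongest voter that voted for the match.
    // Assumption: strengths are in [0, 1); an empty input means no voter matched.
    static double combineByMax(double... votingStrengths) {
        double best = 0.0;
        for (double strength : votingStrengths) {
            best = Math.max(best, strength);
        }
        return best;
    }

    public static void main(String[] args) {
        // Two similar voters at 0.9 and 0.8 no longer reinforce each other: the result is 0.9.
        System.out.println(combineByMax(0.9, 0.8));
    }
}
```

Under this scheme, adding a second voter that inspects the same part of the name cannot raise the score above what the strongest single voter already reported, which should leave more room for a meaningful threshold. The trade-off is that agreement between genuinely independent voters is no longer rewarded.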