Dear Antonin Delpeuch,
I read your paper "OpenTapioca: Lightweight Entity Linking for Wikidata" https://arxiv.org/abs/1904.09131.
I like the paper a lot.
I looked at the code https://github.com/wetneb/opentapioca.
I would like to ask you a question about the paper and the implementation.
In the paper on page 6 the last term in the equation for s(e, e') is
(1 - beta)^2 |l(e) intersection l(e')| / (|l(e)| |l(e')|).
As far as I understand this term is implemented here:
https://github.com/wetneb/opentapioca/blob/master/opentapioca/similarities.py#L67
My question is: why is it not proba += (1-beta)*(1-beta)*(len_common/(len(edges_a)*len(edges_b))) ?
In the paper |l(e) intersection l(e')| is not squared but in the implementation len_common is squared.
As far as I understand len_common should not be squared because the term
(1 - beta)^2 |l(e) intersection l(e')| / (|l(e)| |l(e')|)
is the probability of reaching the same vertex v with one hop from e and one hop from e' if v does not belong to l(e) and l(e').
Here (1 - beta)^2 is the probability of staying on neither e nor e',
|l(e) intersection l(e')| / |l(e)| is the probability of reaching from e some vertex v in the intersection, and
1 / |l(e')| is the probability of reaching that same vertex v from e'.
So it seems that the formula in the paper is correct but the implementation does not match it. Is that right?
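To make the argument concrete, here is a minimal sketch that computes the last term in two ways: from the closed form in the paper, and by enumerating every pair of hops and summing the probability of the pairs that land on the same vertex. The function names and the example neighbour lists are hypothetical and are not taken from the repository. I assume each neighbour list contains no duplicates and that each hop picks a neighbour uniformly at random.

```python
from fractions import Fraction
from itertools import product


def last_term(edges_a, edges_b, beta):
    """Closed form: (1 - beta)^2 * |l(e) ∩ l(e')| / (|l(e)| * |l(e')|)."""
    if not edges_a or not edges_b:
        return Fraction(0)
    len_common = len(set(edges_a) & set(edges_b))
    return (1 - beta) ** 2 * Fraction(len_common, len(edges_a) * len(edges_b))


def last_term_by_enumeration(edges_a, edges_b, beta):
    """Sum the probability of every hop pair (u from e, v from e') with u == v."""
    if not edges_a or not edges_b:
        return Fraction(0)
    # Probability that both walks move and pick one specific pair (u, v).
    pair_proba = (1 - beta) ** 2 * Fraction(1, len(edges_a) * len(edges_b))
    return sum(pair_proba for u, v in product(edges_a, edges_b) if u == v)


def last_term_squared(edges_a, edges_b, beta):
    """Variant with len_common squared, as described in this issue (assumption)."""
    if not edges_a or not edges_b:
        return Fraction(0)
    len_common = len(set(edges_a) & set(edges_b))
    return (1 - beta) ** 2 * Fraction(len_common ** 2, len(edges_a) * len(edges_b))


# Hypothetical example: two items sharing two neighbours.
edges_a = ["Q1", "Q2", "Q3"]
edges_b = ["Q2", "Q3", "Q4", "Q5"]
beta = Fraction(1, 2)

closed = last_term(edges_a, edges_b, beta)
enumerated = last_term_by_enumeration(edges_a, edges_b, beta)
squared = last_term_squared(edges_a, edges_b, beta)
```

For this example the closed form and the enumeration agree, while the squared variant is larger by a factor of len_common (here 2). The two variants coincide only when the intersection has at most one element.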
Thank you.
Regards,
Petar Mitankin
Software developer
Sirma AI, trading as Ontotext, http://www.ontotext.com/