Description
This problem concerns the C core, but for now I am going to give examples using the Mathematica interface (because I'm the most fluent in it and because I know precisely how it maps to C). I will add R or Python examples if requested.
Edge-betweenness-based community detection works by repeatedly removing the edge with the highest edge betweenness. This way it constructs a "dendrogram", i.e. a series of possible clusterings. It then selects the clustering with the highest modularity.
Betweenness calculations are based on the concept of graph distance. When the algorithm is given weights, it interprets them as the "lengths" (distances) of the edges. Conceptually, vertices connected by a "short" / "low-weight" edge are more tightly coupled than those connected by a "long" / "high-weight" edge.
Modularity calculations, however, use weights the opposite way: it is high weight values (rather than low ones) that indicate tighter coupling.
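For reference, this is the standard weighted (Newman) modularity, which I assume is what igraph_modularity() computes with its default resolution of 1. Here $A_{ij}$ is the weight of the edge between $i$ and $j$ (0 if there is none), $k_i = \sum_j A_{ij}$ is the strength of vertex $i$, $m$ is the total edge weight, and $\delta(c_i, c_j)$ is 1 when $i$ and $j$ are in the same community. In the per-community form, $L_c$ is the total weight of the edges inside community $c$ and $d_c = \sum_{i \in c} k_i$. Because $A_{ij}$ enters with a positive sign, heavy edges inside a community raise $Q$:

$$
Q = \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j)
  = \sum_{c} \left[ \frac{L_c}{m} - \left( \frac{d_c}{2m} \right)^2 \right]
$$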
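The problem with igraph_community_edge_betweenness() is that it uses both interpretations simultaneously: weights act as lengths while edges are being removed, but as coupling strengths when the dendrogram is cut at maximum modularity.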
Let's take the following example graph (Mathematica syntax):
```mathematica
g = Graph[{1 <-> 2, 2 <-> 3, 3 <-> 4, 4 <-> 1},
  EdgeWeight -> {5, 1, 1, 1}, VertexLabels -> "Name",
  EdgeLabels -> "EdgeWeight"]
```
[Figure: plot of g, with the edge weights written over the edges.]
It is clear that the edge with the highest betweenness is 3 <-> 4. That's because the shortest path between 1 and 2 is 1 -> 4 -> 3 -> 2, with a total distance of 3, rather than the direct edge 1 -> 2 with a distance of 5.
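To make the "weights are lengths" reading concrete, here is a small Dijkstra sketch in plain Python. It is my own illustration, not igraph code; the names `dijkstra` and `path_to` are hypothetical helpers, and the graph is encoded as a dict from undirected edges to weights.

```python
import heapq


def dijkstra(edges, source):
    """Shortest-path distances from `source`, treating weights as lengths.

    `edges` maps undirected edges (u, v) -> weight. Returns (dist, prev),
    where prev[v] is the predecessor of v on one shortest path.
    """
    adj = {}
    for (u, v), w in edges.items():
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    dist, prev = {source: 0}, {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, u was already settled
        for v, w in adj[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, v))
    return dist, prev


def path_to(prev, source, target):
    """Walk the predecessor links back from target to source."""
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# The example graph g, as edge -> weight
g = {(1, 2): 5, (2, 3): 1, (3, 4): 1, (4, 1): 1}
dist, prev = dijkstra(g, 1)
print(dist[2], path_to(prev, 1, 2))  # 3 [1, 4, 3, 2]
```

The direct edge of length 5 loses to the three unit-length edges, so every shortest path between the "heavy" pair 1, 2 runs through 3 <-> 4.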
Once that edge is removed, the edge with the highest betweenness is 1 <-> 2, as it sits in the "middle" of what is now the path 3 <-> 2 <-> 1 <-> 4.
It is reasonable to expect {1,4} and {2,3} as the communities, since the 1 <-> 2 coupling was less tight than all the others.
The function, however, returns a single "community" containing all four vertices.
Here are the edges in the order of their removal, and the betweenness of each one at that stage:
```mathematica
In[35]:= cl["RemovedEdges"]
Out[35]= {3 <-> 4, 1 <-> 2, 2 <-> 3, 4 <-> 1}

In[36]:= cl["EdgeBetweenness"]
Out[36]= {4., 4., 1., 1.}
```
These are exactly as expected.
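As an independent check of the removal order and betweenness values, here is a brute-force re-implementation of the removal loop in plain Python. It is a sketch, not igraph's implementation: `edge_betweenness` and `removal_order` are hypothetical names, betweenness is computed by enumerating all simple paths (fine for 4 vertices, hopeless for real graphs), and ties are broken by taking the first edge in insertion order, which I am assuming matches igraph's choice here.

```python
from itertools import combinations


def edge_betweenness(edges, vertices):
    """Brute-force weighted edge betweenness of an undirected graph.

    `edges` maps (u, v) -> weight, and weights are treated as lengths.
    Every unordered vertex pair contributes 1, split evenly among all of
    its minimum-length paths.
    """
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    def key(u, v):
        # each edge is stored in one orientation only
        return (u, v) if (u, v) in edges else (v, u)

    def paths(path, t):
        # yield every simple path that extends `path` to reach t
        if path[-1] == t:
            yield path
            return
        for nxt in adj[path[-1]]:
            if nxt not in path:
                yield from paths(path + [nxt], t)

    eb = {e: 0.0 for e in edges}
    for s, t in combinations(sorted(vertices), 2):
        found = [(sum(edges[key(u, v)] for u, v in zip(p, p[1:])), p)
                 for p in paths([s], t)]
        if not found:
            continue  # s and t are in different components
        best = min(length for length, _ in found)
        shortest = [p for length, p in found if length == best]
        for p in shortest:
            for u, v in zip(p, p[1:]):
                eb[key(u, v)] += 1 / len(shortest)
    return eb


def removal_order(edges, vertices):
    """Repeatedly delete the highest-betweenness edge, recomputing each time.

    Returns the removed edges and their betweenness at removal time.
    """
    remaining = dict(edges)
    removed, values = [], []
    while remaining:
        eb = edge_betweenness(remaining, vertices)
        top = max(eb, key=eb.get)  # max() keeps the first maximal key
        removed.append(top)
        values.append(eb[top])
        del remaining[top]
    return removed, values


g = {(1, 2): 5, (2, 3): 1, (3, 4): 1, (4, 1): 1}
print(removal_order(g, {1, 2, 3, 4}))
# ([(3, 4), (1, 2), (2, 3), (4, 1)], [4.0, 4.0, 1.0, 1.0])
```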
The following are the modularities for each stage of the partitioning, in reverse order (i.e. starting from the all-singletons partition, where every edge has been removed):
```mathematica
In[20]:= cl["Modularity"]
Out[20]= {-0.3125, -0.28125, -0.25, 0.}
```
We can verify that they indeed correspond to the following partitionings:
```mathematica
In[34]:= IGModularity[g, #] & /@
  {
    {{1, 2, 3, 4}},
    {{1, 4}, {2, 3}},
    {{1, 4}, {2}, {3}},
    {{1}, {4}, {2}, {3}}
  }
Out[34]= {0., -0.25, -0.28125, -0.3125}
```
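The same four values can be reproduced outside of Mathematica with a few lines of plain Python. This is a sketch of the standard per-community weighted modularity formula (resolution 1), under the assumption that this is what igraph_modularity() evaluates; `modularity` is a hypothetical helper name, not igraph API.

```python
def modularity(edges, partition):
    """Weighted modularity, summed per community c:
    Q = sum_c [ L_c / m - (d_c / (2 m))**2 ]
    where L_c is the total weight of edges inside c, d_c the total strength
    of c's vertices and m the total edge weight. Heavier internal edges
    increase Q, i.e. weight is read as coupling strength here."""
    m = sum(edges.values())
    strength = {}
    for (u, v), w in edges.items():
        strength[u] = strength.get(u, 0) + w
        strength[v] = strength.get(v, 0) + w
    q = 0.0
    for block in partition:
        members = set(block)
        inside = sum(w for (u, v), w in edges.items()
                     if u in members and v in members)
        d = sum(strength[x] for x in members)
        q += inside / m - (d / (2 * m)) ** 2
    return q


g = {(1, 2): 5, (2, 3): 1, (3, 4): 1, (4, 1): 1}
stages = [[[1, 2, 3, 4]], [[1, 4], [2, 3]], [[1, 4], [2], [3]], [[1], [4], [2], [3]]]
print([modularity(g, p) for p in stages])
# [0.0, -0.25, -0.28125, -0.3125]
```

With m = 8 and strengths 6, 6, 2, 2, the heavy 1 <-> 2 edge makes vertices 1 and 2 expensive to separate, which is why every split the dendrogram offers scores below zero.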
Notice that all of them except the single-community value are negative: splitting the graph never improves modularity. This is because IGModularity (i.e. igraph_modularity()) considers 1 <-> 2 to be very tightly coupled, given its weight of 5. In fact, it would prefer the following partitioning:
```mathematica
In[40]:= IGModularity[g, {{1, 2}, {3, 4}}]
Out[40]= 0.125
```
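To confirm that {{1, 2}, {3, 4}} is not just better but the best partition under the "weight = strength" reading, here is an exhaustive search over all 15 partitions of the four vertices. It is a plain-Python sketch with hypothetical helper names (`set_partitions`, `modularity_pairwise`), and it evaluates modularity directly from the pairwise definition $\frac{1}{2m}\sum_{i,j}(A_{ij} - k_i k_j / 2m)\,\delta(c_i, c_j)$, assumed to match igraph_modularity() at resolution 1.

```python
from itertools import product


def set_partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in set_partitions(rest):
        for i in range(len(smaller)):  # add `first` to an existing block
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        yield [[first]] + smaller  # or give `first` a block of its own


def modularity_pairwise(edges, partition):
    """Q = 1/(2m) * sum over ordered pairs (i, j) in the same community
    of (A_ij - k_i k_j / (2m)), straight from the definition."""
    A = {}
    for (u, v), w in edges.items():
        A[u, v] = A[v, u] = w  # symmetric weighted adjacency
    k = {}
    for (u, _), w in A.items():
        k[u] = k.get(u, 0) + w  # vertex strength
    two_m = sum(k.values())
    q = 0.0
    for block in partition:
        for i, j in product(block, repeat=2):
            q += A.get((i, j), 0) - k[i] * k[j] / two_m
    return q / two_m


g = {(1, 2): 5, (2, 3): 1, (3, 4): 1, (4, 1): 1}
best = max(set_partitions([1, 2, 3, 4]), key=lambda p: modularity_pairwise(g, p))
print(best, modularity_pairwise(g, best))  # [[1, 2], [3, 4]] 0.125
```

So the modularity step and the betweenness step disagree about which pair is "close": the maximum-modularity partition keeps 1 and 2 together, while edge betweenness treats 1 <-> 2 as the weakest link.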
I hope it's clear why I think that the current behaviour of igraph_community_edge_betweenness() is not quite right. But figuring out what it should actually do is not a trivial question.
Am I misunderstanding something here?