Questionable behaviour of edge betweenness based community detection with weights #1040

@szhorvat

This problem concerns the C core, but for now I am going to give examples using the Mathematica interface (because I'm the most fluent in it and because I know precisely how it maps to C). I will add R or Python examples if requested.

Edge betweenness based community detection works by repeatedly removing the edge with the highest edge betweenness. This way it constructs a "dendrogram", i.e. a series of possible clusterings. Then it selects the one with the highest modularity.

Betweenness calculations are based on the concept of graph distance. When the algorithm is given weights, it interprets each weight as the "length"/"distance" of that edge. Conceptually, vertices connected by a "short" / "low weight" edge are more tightly coupled than those connected by a "long" / "high weight" edge.

But modularity calculations use weights the opposite way. It is high weight values (instead of low ones) that indicate tighter coupling.
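
For reference, this is the standard weighted modularity definition for undirected graphs. I am assuming it is what igraph_modularity() computes here (leaving aside any resolution parameter); the symbols $w_{ij}$, $s_i$, $m$ and $c_i$ are my own notation, not taken from the igraph source. A large $w_{ij}$ inside a community raises $Q$, which is why a high weight reads as a tight coupling.

```latex
Q = \frac{1}{2m} \sum_{i,j} \left( w_{ij} - \frac{s_i s_j}{2m} \right) \delta(c_i, c_j),
\qquad
s_i = \sum_j w_{ij},
\qquad
m = \frac{1}{2} \sum_{i,j} w_{ij}
```

Here $w_{ij}$ is the weight of the edge between vertices $i$ and $j$ (0 if there is none), $s_i$ is the strength of vertex $i$, $m$ is the total edge weight, and $\delta(c_i, c_j)$ is 1 when $i$ and $j$ are in the same community and 0 otherwise.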

The problem with igraph_community_edge_betweenness() is that it uses both "interpretations" simultaneously.

Let's take the following example graph (Mathematica syntax):

g = Graph[{1 <-> 2, 2 <-> 3, 3 <-> 4, 4 <-> 1}, 
  EdgeWeight -> {5, 1, 1, 1}, VertexLabels -> "Name", 
  EdgeLabels -> "EdgeWeight"]

image

Edge weights are written over the edges.

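The edge with the highest betweenness is 3 <-> 4. That's because the shortest path between 1 and 2 is 1 -> 4 -> 3 -> 2 with a total distance of 3, instead of the distance of 5 along 1 <-> 2. As a result, 3 <-> 4 lies on four of the six shortest paths (those between 1 and 2, 1 and 3, 2 and 4, and 3 and 4), while 1 <-> 2 lies on none.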

Once that is removed, the highest betweenness edge is 1 <-> 2 as it's in the "middle" of a graph like this:

image

It is reasonable to expect {1,4} and {2,3} as the communities, since the 1 <-> 2 coupling was less tight than all the others.


The function actually finds a single "community":

image

Here are the edges in the order of their removal, and the betweenness of each one at that stage:

In[35]:= cl["RemovedEdges"]
Out[35]= {3 <-> 4, 1 <-> 2, 2 <-> 3, 4 <-> 1}

In[36]:= cl["EdgeBetweenness"]
Out[36]= {4., 4., 1., 1.}

These are as expected.
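
For anyone who wants to check this outside Mathematica, the removal sequence can be reproduced with a small brute-force Python sketch. This is an illustration only, not igraph's implementation: the helper names (`simple_paths`, `edge_betweenness`, `removal_order`) are made up for this example, weights are treated as edge lengths, all simple paths are enumerated (fine for four vertices, hopeless for real graphs), and ties are broken by dictionary order, which need not match what igraph does.

```python
from itertools import combinations


def simple_paths(adj, src, dst, path=None):
    """Yield every simple path from src to dst as a list of vertices."""
    path = [src] if path is None else path
    if src == dst:
        yield path
        return
    for nxt in adj[src]:
        if nxt not in path:
            yield from simple_paths(adj, nxt, dst, path + [nxt])


def edge_betweenness(weights):
    """Edge betweenness with each weight read as an edge LENGTH.

    `weights` maps frozenset({u, v}) -> length (hypothetical layout).
    """
    adj = {}
    for edge in weights:
        u, v = tuple(edge)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    def length(p):
        return sum(weights[frozenset(pair)] for pair in zip(p, p[1:]))

    score = {edge: 0.0 for edge in weights}
    for s, t in combinations(sorted(adj), 2):
        paths = list(simple_paths(adj, s, t))
        if not paths:
            continue  # s and t are in different components
        best = min(length(p) for p in paths)
        # Exact comparison is safe here because the weights are integers.
        shortest = [p for p in paths if length(p) == best]
        for p in shortest:
            for pair in zip(p, p[1:]):
                # Split the credit equally among tied shortest paths.
                score[frozenset(pair)] += 1.0 / len(shortest)
    return score


def removal_order(weights):
    """Repeatedly remove the edge with the highest betweenness."""
    weights = dict(weights)
    removed = []
    while weights:
        score = edge_betweenness(weights)
        top = max(score, key=score.get)  # first maximum wins on ties
        removed.append((tuple(sorted(top)), score[top]))
        del weights[top]
    return removed


example = {
    frozenset({1, 2}): 5,
    frozenset({2, 3}): 1,
    frozenset({3, 4}): 1,
    frozenset({4, 1}): 1,
}

for edge, score in removal_order(example):
    print(edge, score)
```

The first two removals are 3 <-> 4 and then 1 <-> 2, each with betweenness 4, followed by the two remaining edges with betweenness 1 each. The order of the last two is a tie and depends only on the tie-breaking rule.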

The following are the modularities for each stage of the partitioning, in reverse order:

In[20]:= cl["Modularity"]
Out[20]= {-0.3125, -0.28125, -0.25, 0.}

We can verify that they indeed correspond to the following partitionings:

In[34]:= IGModularity[g, #] & /@
 {
  {{1, 2, 3, 4}},
  {{1, 4}, {2, 3}},
  {{1, 4}, {2}, {3}},
  {{1}, {4}, {2}, {3}}
  }

Out[34]= {0., -0.25, -0.28125, -0.3125}

Notice that every partitioning in the dendrogram other than the trivial single-community one has negative modularity, so the modularity never improves on 0. This is because IGModularity (i.e. igraph_modularity()) considers 1 <-> 2 to be very tightly coupled due to its weight of 5. In fact, it would prefer the following partitioning:

In[40]:= IGModularity[g, {{1, 2}, {3, 4}}]
Out[40]= 0.125
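
The same numbers can be checked by hand with a short Python sketch of weighted modularity, where weights are read as connection strengths. I am assuming the standard per-community form (internal weight over total weight, minus the squared share of total strength) and no resolution parameter; the function name `modularity` and the data layout are made up for this example and are not igraph's API.

```python
def modularity(weights, communities):
    """Weighted modularity; weights are read as connection STRENGTHS.

    `weights` maps (u, v) -> weight, `communities` is a list of vertex sets.
    """
    total = sum(weights.values())  # m, the total edge weight
    member = {v: i for i, c in enumerate(communities) for v in c}
    internal = [0.0] * len(communities)  # edge weight inside each community
    strength = [0.0] * len(communities)  # summed vertex strengths per community
    for (u, v), w in weights.items():
        strength[member[u]] += w
        strength[member[v]] += w
        if member[u] == member[v]:
            internal[member[u]] += w
    return sum(
        i / total - (s / (2 * total)) ** 2
        for i, s in zip(internal, strength)
    )


example = {(1, 2): 5, (2, 3): 1, (3, 4): 1, (4, 1): 1}

partitionings = [
    [{1, 2, 3, 4}],
    [{1, 4}, {2, 3}],
    [{1, 4}, {2}, {3}],
    [{1}, {4}, {2}, {3}],
    [{1, 2}, {3, 4}],
]

for parts in partitionings:
    print(parts, modularity(example, parts))
```

For {{1, 4}, {2, 3}}, each community has internal weight 1 and total strength 8, giving 2 × (1/8 − (8/16)²) = −0.25. For {{1, 2}, {3, 4}}, the weight-5 edge is internal, giving (5/8 − (12/16)²) + (1/8 − (4/16)²) = 0.125.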

I hope it's clear why I think that the current behaviour of igraph_community_edge_betweenness() is not quite right. But figuring out what it should actually do is not a trivial question.

Am I misunderstanding something here?
