Questionable behaviour of edge betweenness based community detection with weights

This problem concerns the C core, but for now I am going to give examples using the Mathematica interface (because I'm the most fluent in it and because I know precisely how it maps to C). I will add R or Python examples if requested.

Edge betweenness based community detection works by repeatedly cutting the edge with the highest edge betweenness. This way it constructs a "dendrogram", i.e. a series of possible clusterings. Then it selects the clustering with the highest modularity.

Betweenness calculations are based on the concept of graph distance. When the algorithm is given weights, it interprets each weight as the "length"/"distance" of that edge. Conceptually, vertices connected by a "short" / "low weight" edge are more tightly coupled than those connected by a "long" / "high weight" edge.

But modularity calculations use weights the opposite way.  It is high weight values (instead of low ones) that indicate tighter coupling.
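
For reference, the usual textbook definition of weighted modularity is written out below. It is stated here as an assumption about what `igraph_modularity()` computes, not as a quote from the igraph source; the symbols $A_{ij}$, $k_i$, $m$ and $c_i$ are introduced only for this illustration.

$$
Q = \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j)
$$

Here $A_{ij}$ is the weight of the edge between vertices $i$ and $j$ (0 if there is none), $k_i = \sum_j A_{ij}$ is the strength of vertex $i$, $m$ is the total edge weight, and $\delta(c_i, c_j)$ is 1 when both vertices are in the same community. A large $A_{ij}$ inside a community raises $Q$, which is why a high weight reads as tight coupling in this formula.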

The problem with `igraph_community_edge_betweenness()` is that it simultaneously uses both "interpretations".

Let's take the following example graph (Mathematica syntax):

```
g = Graph[{1 <-> 2, 2 <-> 3, 3 <-> 4, 4 <-> 1}, 
  EdgeWeight -> {5, 1, 1, 1}, VertexLabels -> "Name", 
  EdgeLabels -> "EdgeWeight"]
```

![image](https://user-images.githubusercontent.com/1212871/32459768-fe5ea566-c330-11e7-8322-f54fda5abaf0.png)

Edge weights are written over the edges.

It is clear that the edge with the highest betweenness is `3 <-> 4`. That's because the shortest path between `1` and `2` is `1 -> 4 -> 3 -> 2` with a total distance of 3, instead of the distance of 5 along the direct edge `1 <-> 2`.

Once that is removed, the highest betweenness edge is `1 <-> 2` as it's in the "middle" of a graph like this:

![image](https://user-images.githubusercontent.com/1212871/32460092-e4f6a956-c331-11e7-99f6-168124d6c0db.png)

It is reasonable to expect `{1,4}` and `{2,3}` as the communities, since the `1 <-> 2` coupling was less tight than all the others.

----

The function actually finds a single "community":

![image](https://user-images.githubusercontent.com/1212871/32459803-17982386-c331-11e7-956d-e9d53ccb096f.png)

Here are the edges in the order of their removal, and the betweenness of each one at that stage:

```
In[35]:= cl["RemovedEdges"]
Out[35]= {3 <-> 4, 1 <-> 2, 2 <-> 3, 4 <-> 1}

In[36]:= cl["EdgeBetweenness"]
Out[36]= {4., 4., 1., 1.}
```

This is the removal order we expected.
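
The removal order above can be cross-checked without igraph. The following is a minimal pure-Python sketch that treats weights as lengths and repeatedly removes the edge with the highest edge betweenness. The function names (`dijkstra`, `edge_betweenness`, `removal_order`) are made up for this illustration and are not part of igraph or IGraph/M. Ties are broken by taking the first edge in insertion order, which is an assumption; igraph's own tie-breaking may differ.

```python
import heapq
from itertools import combinations

# Example graph from this issue. Weights are read as edge LENGTHS here.
# Edges are stored as sorted vertex pairs, so 4 <-> 1 appears as (1, 4).
EDGES = {(1, 2): 5.0, (2, 3): 1.0, (3, 4): 1.0, (1, 4): 1.0}
NODES = [1, 2, 3, 4]


def dijkstra(adj, source):
    """Return (dist, sigma): shortest distances from `source` and the
    number of distinct shortest paths to each reachable vertex.
    Assumes strictly positive weights that compare exactly (integers here)."""
    dist = {source: 0.0}
    sigma = {source: 1}
    done = set()
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue  # stale heap entry
        done.add(u)
        for v, w in adj[u]:
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                sigma[v] = sigma[u]
                heapq.heappush(heap, (nd, v))
            elif nd == dist[v] and v not in done:
                sigma[v] += sigma[u]
    return dist, sigma


def edge_betweenness(nodes, edges):
    """Undirected weighted edge betweenness, counting each vertex pair once."""
    adj = {n: [] for n in nodes}
    for (u, v), w in edges.items():
        adj[u].append((v, w))
        adj[v].append((u, w))
    sp = {n: dijkstra(adj, n) for n in nodes}
    score = {e: 0.0 for e in edges}
    for s, t in combinations(nodes, 2):
        dist_s, sigma_s = sp[s]
        dist_t, sigma_t = sp[t]
        if t not in dist_s:
            continue  # s and t are disconnected
        for (u, v), w in edges.items():
            # The edge lies on a shortest s-t path in at most one direction.
            for a, b in ((u, v), (v, u)):
                if a in dist_s and b in dist_t and dist_s[a] + w + dist_t[b] == dist_s[t]:
                    score[(u, v)] += sigma_s[a] * sigma_t[b] / sigma_s[t]
    return score


def removal_order(nodes, edges):
    """Repeatedly remove the edge with the highest betweenness."""
    remaining = dict(edges)
    order = []
    while remaining:
        score = edge_betweenness(nodes, remaining)
        best = max(remaining, key=lambda e: score[e])  # first maximum wins ties
        order.append((best, score[best]))
        del remaining[best]
    return order


print(removal_order(NODES, EDGES))
```

For the example graph this prints `[((3, 4), 4.0), ((1, 2), 4.0), ((2, 3), 1.0), ((1, 4), 1.0)]`, the same edges and betweenness values as in the output above.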

The following are the modularities for each stage of the partitioning, in *reverse* order, i.e. starting from the fully split graph and ending with the single-community graph:

```
In[20]:= cl["Modularity"]
Out[20]= {-0.3125, -0.28125, -0.25, 0.}
```

We can verify that they indeed correspond to the following partitionings:

```
In[34]:= IGModularity[g, #] & /@
 {
  {{1, 2, 3, 4}},
  {{1, 4}, {2, 3}},
  {{1, 4}, {2}, {3}},
  {{1}, {4}, {2}, {3}}
  }

Out[34]= {0., -0.25, -0.28125, -0.3125}
```

Notice that every split has a negative modularity, lower than the 0 of the unsplit graph, so the modularity never improves. This is because `IGModularity` (i.e. `igraph_modularity()`) considers `1 <-> 2` to be very tightly coupled with its weight of 5. In fact it would prefer the following partitioning:

```
In[40]:= IGModularity[g, {{1, 2}, {3, 4}}]
Out[40]= 0.125
```
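
The modularity values can also be recomputed by hand. The sketch below assumes the standard weighted modularity, i.e. for each community the fraction of total edge weight inside it minus the squared fraction of total vertex strength it holds. `weighted_modularity` is a hypothetical helper written for this issue, not an igraph function, and it assumes an undirected graph without self-loops.

```python
# Example graph from this issue. Weights are read as coupling STRENGTHS here.
EDGES = {(1, 2): 5.0, (2, 3): 1.0, (3, 4): 1.0, (1, 4): 1.0}


def weighted_modularity(edges, communities):
    """Weighted modularity of a partition given as a list of vertex lists."""
    m = sum(edges.values())  # total edge weight
    strength = {}            # weighted degree of each vertex
    for (u, v), w in edges.items():
        strength[u] = strength.get(u, 0.0) + w
        strength[v] = strength.get(v, 0.0) + w
    q = 0.0
    for community in communities:
        members = set(community)
        # Weight of edges with both endpoints inside the community.
        internal = sum(w for (u, v), w in edges.items()
                       if u in members and v in members)
        total = sum(strength.get(n, 0.0) for n in members)
        q += internal / m - (total / (2 * m)) ** 2
    return q


partitions = [
    [[1, 2, 3, 4]],
    [[1, 4], [2, 3]],
    [[1, 4], [2], [3]],
    [[1], [4], [2], [3]],
    [[1, 2], [3, 4]],
]
for p in partitions:
    print(p, weighted_modularity(EDGES, p))
```

Under these assumptions the first four partitions give 0, -0.25, -0.28125 and -0.3125, and `{{1, 2}, {3, 4}}` gives 0.125, matching the `IGModularity` results above.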

I hope it's clear why I think that the current behaviour of `igraph_community_edge_betweenness()` is not quite right. But figuring out what it should actually do is not a trivial question.

Am I misunderstanding something here?


