CLUSTERING SOCIAL NETWORK GRAPHS
Introduction
A social network is a nonrandom collection of entities in a network, with at least one relationship between them
Social networks contain communities: groups of entities that are connected by many edges
E.g. groups of friends at school, researchers interested in the same topic, etc.
Communities can be identified by clustering
Disadvantages of Standard Clustering Algorithms
Absence of a proper distance measure
Sub-communities will not be identified
Possibility of nodes from different clusters getting combined
Possibility of wrong clustering in both K-Means and hierarchical clustering
Solving the problem: "Betweenness"
Goal: find the edges that are least likely to be inside a community
The betweenness of an edge (a, b) is the number of pairs of nodes x and y such that the edge (a, b) lies on the shortest path between x and y
If there are several shortest paths between x and y, the edge is credited with the fraction of those paths that pass through it
A large betweenness suggests that the edge runs between two different communities
The Girvan-Newman Algorithm
Used for calculating the number of shortest paths going through each edge
Visits each node X once and computes the number of shortest paths from X to each of the other nodes that go through each of the edges
STEPS
1. Perform a breadth-first search (BFS) of the graph, starting at the node X
2. Label each node by the number of shortest paths that reach it from the root: the root gets label 1, and every other node Y gets the sum of the labels of its parents in the BFS DAG
3. Calculate for each edge e the sum over all nodes Y of the fraction of shortest paths from the root X to Y that go through e
4. Repeat for all nodes as the root
5. Complete the credit calculation: sum the credit of each edge over all roots and divide by 2
The Girvan-Newman Algorithm contd...
Step 1 - Perform a breadth-first search (BFS) of the graph, starting at the node X
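A minimal Python sketch of Step 1, under stated assumptions: the 7-node graph `GRAPH` and the helper names `bfs_levels` and `dag_edges` are hypothetical and not taken from the slides. It assigns a BFS level to every node and keeps only the edges that go exactly one level down (the DAG edges); edges between nodes on the same level can never lie on a shortest path from the root.

```python
from collections import deque

# Hypothetical example graph (not from the slides), as an adjacency dict.
GRAPH = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B"],
    "D": ["B", "E", "F", "G"],
    "E": ["D", "F"],
    "F": ["D", "E", "G"],
    "G": ["D", "F"],
}

def bfs_levels(graph, root):
    """Return the BFS level (distance from root) of every reachable node."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in level:          # first time we see nbr
                level[nbr] = level[node] + 1
                queue.append(nbr)
    return level

def dag_edges(graph, level):
    """Edges (parent, child) going exactly one level down.

    Assumes a connected graph, so every node has a level.
    """
    return sorted((u, v) for u in graph for v in graph[u]
                  if level[v] == level[u] + 1)

level = bfs_levels(GRAPH, "E")
print(level)
print(dag_edges(GRAPH, level))
```

With root E, the edge D-F joins two nodes on the same level, so it does not appear among the DAG edges.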
The Girvan-Newman Algorithm contd...
Step 2 - Label each node by the number of shortest paths that reach it from the root: the root gets label 1, and every other node Y gets the sum of the labels of its parents in the BFS DAG
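The labelling in Step 2 can be sketched in Python as follows. The graph `GRAPH` and the function name `shortest_path_counts` are assumptions made for illustration only. The label of a node is accumulated from its parents while the BFS runs, which works because every parent is dequeued before any of its children.

```python
from collections import deque

# Hypothetical example graph (not from the slides), as an adjacency dict.
GRAPH = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B"],
    "D": ["B", "E", "F", "G"],
    "E": ["D", "F"],
    "F": ["D", "E", "G"],
    "G": ["D", "F"],
}

def shortest_path_counts(graph, root):
    """Return (level, paths): BFS level and number of shortest paths from root."""
    level = {root: 0}
    paths = {root: 1}                     # the root is labelled 1
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in level:          # first visit: nbr is one level below node
                level[nbr] = level[node] + 1
                paths[nbr] = 0
                queue.append(nbr)
            if level[nbr] == level[node] + 1:
                paths[nbr] += paths[node]  # node is a DAG parent of nbr
    return level, paths

level, paths = shortest_path_counts(GRAPH, "E")
print(paths)
```

With root E, node G is reached through both D and F, so its label is the sum of their labels.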
The Girvan-Newman Algorithm contd...
Step 3 - Calculate for each edge e the sum over all nodes Y of the fraction of shortest paths from the root X to Y that go through e
The rules for the calculation are as follows:
1. Each leaf in the DAG (a leaf is a node with no DAG edges to nodes at levels below) gets a credit of 1.
2. Each node that is not a leaf gets a credit equal to 1 plus the sum of the credits of the DAG edges from that node to the level below.
3. A DAG edge e entering node Z from the level above is given a share of the credit of Z proportional to the fraction of shortest paths from the root to Z that go through e.
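The three credit rules above can be sketched in Python for a single root. This is an illustrative sketch, not a definitive implementation: the graph `GRAPH` and the function name `edge_credits_from_root` are hypothetical. Nodes are processed in reverse BFS order, so the credit of a node is complete before it is shared among the edges entering it from the level above.

```python
from collections import deque

# Hypothetical example graph (not from the slides), as an adjacency dict.
GRAPH = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B"],
    "D": ["B", "E", "F", "G"],
    "E": ["D", "F"],
    "F": ["D", "E", "G"],
    "G": ["D", "F"],
}

def edge_credits_from_root(graph, root):
    """Credit of every DAG edge for one root, following rules 1 to 3."""
    level, paths, order = {root: 0}, {root: 1}, []
    queue = deque([root])
    while queue:                               # Steps 1 and 2: BFS with path counts
        node = queue.popleft()
        order.append(node)
        for nbr in graph[node]:
            if nbr not in level:
                level[nbr] = level[node] + 1
                paths[nbr] = 0
                queue.append(nbr)
            if level[nbr] == level[node] + 1:
                paths[nbr] += paths[node]

    node_credit = {n: 1.0 for n in order}      # rules 1 and 2: every node starts at 1
    edge_credit = {}
    for z in reversed(order):                  # deepest nodes first
        for parent in graph[z]:
            if level[parent] == level[z] - 1:  # DAG edge entering z from above
                # rule 3: share proportional to the shortest paths through parent
                share = node_credit[z] * paths[parent] / paths[z]
                edge_credit[frozenset((parent, z))] = share
                node_credit[parent] += share   # rule 2: parent collects edge credit
    return edge_credit

for edge, credit in edge_credits_from_root(GRAPH, "E").items():
    print(sorted(edge), credit)
```

Edges are stored as `frozenset` pairs so that (a, b) and (b, a) refer to the same undirected edge.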
The Girvan-Newman Algorithm contd...
Steps 4 & 5 - Repeat for all nodes and complete the credit calculation
The credit of each edge is summed over all roots. Since each shortest path will have been discovered twice (once when each of its endpoints is the root), we must divide the credit for each edge by 2.
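Putting the steps together, a sketch of the full betweenness calculation might look like the following. The graph `GRAPH` and the function name `edge_betweenness` are assumptions for illustration. Each node is used as the root once, the edge credits are summed, and the totals are halved.

```python
from collections import deque

# Hypothetical example graph (not from the slides), as an adjacency dict.
GRAPH = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B"],
    "D": ["B", "E", "F", "G"],
    "E": ["D", "F"],
    "F": ["D", "E", "G"],
    "G": ["D", "F"],
}

def edge_betweenness(graph):
    """Betweenness of every edge: credits summed over all roots, divided by 2."""
    total = {}
    for root in graph:                          # Step 4: every node is the root once
        level, paths, order = {root: 0}, {root: 1}, []
        queue = deque([root])
        while queue:                            # Steps 1 and 2
            node = queue.popleft()
            order.append(node)
            for nbr in graph[node]:
                if nbr not in level:
                    level[nbr] = level[node] + 1
                    paths[nbr] = 0
                    queue.append(nbr)
                if level[nbr] == level[node] + 1:
                    paths[nbr] += paths[node]
        credit = {n: 1.0 for n in order}
        for z in reversed(order):               # Step 3: credits, bottom-up
            for parent in graph[z]:
                if level[parent] == level[z] - 1:
                    share = credit[z] * paths[parent] / paths[z]
                    credit[parent] += share
                    edge = frozenset((parent, z))
                    total[edge] = total.get(edge, 0.0) + share
    # Step 5: each shortest path was counted once from each endpoint
    return {edge: value / 2 for edge, value in total.items()}

betweenness = edge_betweenness(GRAPH)
for edge in sorted(betweenness, key=betweenness.get, reverse=True):
    print(sorted(edge), betweenness[edge])
```

In this example graph the edge B-D is the only link between {A, B, C} and {D, E, F, G}, so it receives the highest betweenness.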
Girvan-Newman Algorithm & Betweenness
Remove the edges with the highest credit value (betweenness) first
Stop when the graph has broken into separate components, so that every individual is assigned to a cluster
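A possible Python sketch of the edge-removal loop is shown below. Several things here are assumptions rather than part of the slides: the graph `GRAPH`, the function names, the stopping rule (a target number of communities `k`), and the choice to recompute betweenness after every removal, as in the original Girvan-Newman procedure. Ties between edges are broken arbitrarily.

```python
from collections import deque

# Hypothetical example graph (not from the slides), as an adjacency dict.
GRAPH = {
    "A": ["B", "C"], "B": ["A", "C", "D"], "C": ["A", "B"],
    "D": ["B", "E", "F", "G"], "E": ["D", "F"],
    "F": ["D", "E", "G"], "G": ["D", "F"],
}

def edge_betweenness(graph):
    """Edge credits summed over all roots and divided by 2."""
    total = {}
    for root in graph:
        level, paths, order = {root: 0}, {root: 1}, []
        queue = deque([root])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nbr in graph[node]:
                if nbr not in level:
                    level[nbr] = level[node] + 1
                    paths[nbr] = 0
                    queue.append(nbr)
                if level[nbr] == level[node] + 1:
                    paths[nbr] += paths[node]
        credit = {n: 1.0 for n in order}
        for z in reversed(order):
            for parent in graph[z]:
                if level[parent] == level[z] - 1:
                    share = credit[z] * paths[parent] / paths[z]
                    credit[parent] += share
                    edge = frozenset((parent, z))
                    total[edge] = total.get(edge, 0.0) + share
    return {edge: value / 2 for edge, value in total.items()}

def components(graph):
    """Connected components, each as a sorted list of nodes."""
    seen, comps = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        comp, stack = [], [start]
        while stack:
            node = stack.pop()
            comp.append(node)
            for nbr in graph[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        comps.append(sorted(comp))
    return sorted(comps)

def girvan_newman(graph, k):
    """Remove highest-betweenness edges until there are k components."""
    g = {node: set(nbrs) for node, nbrs in graph.items()}   # work on a copy
    while len(components(g)) < k:
        scores = edge_betweenness(g)
        if not scores:                 # no edges left to remove
            break
        a, b = max(scores, key=scores.get)
        g[a].discard(b)
        g[b].discard(a)
    return components(g)

print(girvan_newman(GRAPH, 2))
```

Asking for two communities in this example removes only the edge B-D, which separates {A, B, C} from {D, E, F, G}.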
The Girvan-Newman Algorithm: Disadvantage
A node cannot belong to two different communities at the same time
Certain nodes may be removed from one community because they are associated with another community
THANK YOU