Social Network Graph Mining

The document summarizes key concepts related to mining social network graphs. It defines a social network as a collection of entities with relationships between them, and notes they can be modeled as graphs. It discusses properties of social networks like non-randomness and locality. Various types of networks are described like telephone, email, and collaboration networks. It also covers clustering in social networks to identify communities and defines the Girvan-Newman algorithm for detecting communities based on betweenness centrality.

Mining Social Network Graphs

Debapriyo Majumdar
Data Mining – Fall 2014
Indian Statistical Institute Kolkata

November 13, 17, 2014

Social Network

No introduction required

Really?

We still need to understand a few properties

disclaimer: the brand logos are used here entirely for educational purposes
Social Network
§ A collection of entities
– Typically people, but could be something else too
§ At least one relationship between entities of the network
– For example: friends
– Sometimes boolean: two people are either friends or they are not
– May have a degree
– Discrete degree: friends, family, acquaintances, or none
– Real-valued degree: the fraction of the average day that two people spend talking to each other
§ An assumption of nonrandomness or locality
– Hard to formalize
– Intuition: relationships tend to cluster
– If entity A is related to both B and C, then the probability that B and C are related is higher than average (random)
Social Network as a Graph

[Figure: a graph with boolean (friends) relationship on nodes A to G; edges AB, AC, BC, BD, DE, DF, DG, EF, FG]

§ Check for the non-randomness criterion
§ In a random graph (V,E) of 7 nodes and 9 edges, if XY is an edge and YZ is an edge, what is the probability that XZ is an edge?
– For a large random graph, it would be close to |E|/C(|V|,2) = 9/21 ≈ 0.43
– Small graph: XY and YZ are already edges, so compute within the rest
– So the probability is (|E|−2)/(C(|V|,2)−2) = 7/19 ≈ 0.37
§ Now let’s compute this probability for this graph in particular

Example courtesy: Leskovec, Rajaraman and Ullman
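
The two random-graph baselines above can be checked with a few lines of Python. This is only a sketch; the function name `random_edge_probability` and its parameters are made up for illustration.

```python
from fractions import Fraction
from math import comb

def random_edge_probability(num_nodes, num_edges, known_edges=0):
    """Chance that a given node pair is an edge in a random graph,
    after setting aside `known_edges` pairs already known to be edges."""
    possible_pairs = comb(num_nodes, 2)
    return Fraction(num_edges - known_edges, possible_pairs - known_edges)

# Large-graph estimate: |E| / C(|V|, 2) with 7 nodes and 9 edges.
large_graph_estimate = random_edge_probability(7, 9)
# Small-graph estimate: XY and YZ are already edges, so exclude those two pairs.
small_graph_estimate = random_edge_probability(7, 9, known_edges=2)

print(float(large_graph_estimate), float(small_graph_estimate))
```

Using exact fractions keeps 9/21 and 7/19 free of rounding until the final print.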

Social Network as a Graph

[Figure: the graph with boolean (friends) relationship on nodes A to G, overlaid with the question "Does it have the locality property?"]
§ For each X, check each possible pair YZ of neighbors of X, and check if YZ is an edge or not
§ Example: if X = A, the only pair is YZ = BC, and it is an edge

X       YZ pairs                  Yes/Total
A       BC                        1/1
B       AC, AD, CD                1/3
C       AB                        1/1
D       BE, BG, BF, EF, EG, FG    2/6
E       DF                        1/1
F       DE, DG, EG                2/3
G       DF                        1/1
Total                             9/16 ≈ 0.56
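
The table can be reproduced programmatically. The sketch below assumes the edge list AB, AC, BC, BD, DE, DF, DG, EF, FG, which is read off the neighbor pairs in the table; the helper names are illustrative only.

```python
from itertools import combinations

# Edge list inferred from the table (an assumption; the figure is not reproduced here).
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]

def build_adjacency(edges):
    """Map each node to the set of its neighbors in an undirected graph."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency

def locality_counts(edges):
    """For every node X and every pair (Y, Z) of its neighbors,
    count how many pairs exist and how many of them are edges."""
    adjacency = build_adjacency(edges)
    yes = total = 0
    for neighbors in adjacency.values():
        for y, z in combinations(sorted(neighbors), 2):
            total += 1
            if z in adjacency[y]:
                yes += 1
    return yes, total

yes, total = locality_counts(EDGES)
print(yes, total, yes / total)
```

The resulting fraction is well above the random baseline of 7/19, which is the sense in which this graph shows locality.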
Types of Social (or Professional) Networks

[Figure: the example graph on nodes A to G]

§ Of course, the “social network”. But also several other types
§ Telephone network
§ Nodes are phone numbers
§ AB is an edge if A and B talked over the phone within the last week, or month, or ever
§ Edges could be weighted by the number of phone calls made, or the total time of conversation
Types of Social (or Professional) Networks

[Figure: the example graph on nodes A to G]

§ Email network: nodes are email addresses
§ AB is an edge if A and B sent mails to each other within the last week, or month, or ever
– One-directional edges would allow spammers to have edges
§ Edges could be weighted
§ Other networks: collaboration network – nodes are authors of papers, with an edge if two authors have jointly written a paper
§ Also networks exhibiting the locality property
Clustering of Social Network Graphs
§ Locality property → there are clusters
§ Clusters are communities
– People of the same institute, or company
– People in a photography club
– Set of people with “something in common” between them
§ Need to define a distance between points (nodes)
§ In graphs with weighted edges, different distances exist
§ For graphs with “friends” or “not friends” relationship
– Distance is 0 (friends) or 1 (not friends)
– Or 1 (friends) and infinity (not friends)
– Both of these violate the triangle inequality: if AB and BC are edges but AC is not, the distance from A to C exceeds the sum of the distances from A to B and from B to C
– Fix triangle inequality: distance = 1 (friends) and 1.5 or 2 (not friends), or the length of the shortest path
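
A small check of these candidate distances on the example graph, as a sketch. The edge list is the one inferred for the running example, and all function names are illustrative assumptions.

```python
from collections import deque
from itertools import permutations

EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]

def build_adjacency(edges):
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency

def shortest_path_lengths(adjacency, source):
    """BFS distances (number of edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

def violates_triangle_inequality(distance, nodes):
    """True if d(a, c) > d(a, b) + d(b, c) for some three distinct nodes."""
    return any(distance(a, c) > distance(a, b) + distance(b, c)
               for a, b, c in permutations(nodes, 3))

adjacency = build_adjacency(EDGES)
nodes = sorted(adjacency)
all_pairs = {n: shortest_path_lengths(adjacency, n) for n in nodes}

def zero_one(a, b):          # 0 for friends, 1 for non-friends
    return 0 if b in adjacency[a] else 1

def one_or_one_half(a, b):   # 1 for friends, 1.5 for non-friends
    return 1 if b in adjacency[a] else 1.5

def path_length(a, b):       # length of the shortest path
    return all_pairs[a][b]

print(violates_triangle_inequality(zero_one, nodes),
      violates_triangle_inequality(one_or_one_half, nodes),
      violates_triangle_inequality(path_length, nodes))
```

For instance, under the 0/1 distance the triple A, B, D fails: A to D costs 1, but A to B plus B to D costs 0.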
Traditional Clustering
[Figure: the example graph on nodes A to G]

§ Intuitively, two communities
§ Traditional clustering depends on the distance
– Likely to put two nodes with small distance in the same cluster
– Social network graphs would have cross-community edges
– Severe merging of communities likely
§ May join B and D (and hence the two communities) with a probability that is not so low
Betweenness of an Edge
[Figure: the example graph on nodes A to G]

§ Betweenness of an edge AB: number of pairs of nodes (X,Y) such that AB lies on the shortest path between X and Y
– There can be more than one shortest path between X and Y
– Credit AB with the fraction of those paths which include the edge AB
§ High score of betweenness means?
– The edge runs “between” two communities
§ Betweenness gives a better measure
– Edges such as BD get a higher score than edges such as AB
§ Not a distance measure, may not satisfy triangle inequality. Doesn’t matter!
The Girvan – Newman Algorithm
§ Step 1 – BFS: Start at a node X, perform a BFS with X as root
§ Observe: level of node Y = length of shortest path from X to Y
§ Edges between levels are called “DAG” edges
– Each DAG edge is part of at least one shortest path from X
§ Step 2 – Labeling: Label each node Y by the number of shortest paths from X to Y

[Figure: Calculate betweenness of edges. BFS rooted at E with node labels. Root: E (1); Level 1: D (1), F (1); Level 2: B (1), G (2); Level 3: A (1), C (1)]
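
Steps 1 and 2 can be sketched as a single BFS pass that records each node's level and its shortest-path count. The edge list is the one inferred for the running example, and `bfs_levels_and_labels` is an illustrative name, not something defined in the slides.

```python
from collections import deque

EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]

def build_adjacency(edges):
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency

def bfs_levels_and_labels(adjacency, root):
    """Return (level, label): level[Y] is the BFS depth of Y,
    label[Y] is the number of shortest paths from root to Y."""
    level = {root: 0}
    label = {root: 1}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in level:            # first time this node is reached
                level[neighbor] = level[node] + 1
                label[neighbor] = 0
                queue.append(neighbor)
            if level[neighbor] == level[node] + 1:   # (node, neighbor) is a DAG edge
                label[neighbor] += label[node]
    return level, label

level, label = bfs_levels_and_labels(build_adjacency(EDGES), "E")
print(level)
print(label)
```

A node's label is complete by the time it leaves the queue, because all of its parents sit one level higher and are dequeued first.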
The Girvan – Newman Algorithm
Step 3 – credit sharing:
§ Each leaf node gets credit 1
§ Each non-leaf node gets 1 + sum(credits of the DAG edges to the level below)
§ Credit of DAG edges: Let Y_i (i = 1, …, k) be the parents of Z, and p_i = label(Y_i)
      credit(Y_i, Z) = credit(Z) × p_i / (p_1 + … + p_k)
§ Intuition: a DAG edge Y_i Z gets the share of the credit of Z proportional to the number of shortest paths from X to Z going through Y_i Z
Finally: Repeat Steps 1, 2 and 3 with each node as root. For each edge, betweenness = sum of credits obtained in all iterations / 2

[Figure: Calculate betweenness of edges. Credits for the BFS rooted at E. Node credits: A = 1, C = 1, G = 1, B = 3, D = 4.5, F = 1.5. Edge credits: BA = 1, BC = 1, DB = 3, DG = 0.5, FG = 0.5, ED = 4.5, EF = 1.5]
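
The three steps and the final summation can be put together as follows. This is a minimal sketch under the assumption that the example graph has edges AB, AC, BC, BD, DE, DF, DG, EF, FG; the function names are illustrative.

```python
from collections import deque

EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]

def build_adjacency(edges):
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency

def edge_credits_from_root(adjacency, root):
    """Steps 1 to 3 for one root; returns {frozenset({parent, child}): credit}."""
    level, label, order = {root: 0}, {root: 1}, []
    queue = deque([root])
    while queue:                                   # Steps 1 and 2: BFS with labels
        node = queue.popleft()
        order.append(node)
        for neighbor in adjacency[node]:
            if neighbor not in level:
                level[neighbor] = level[node] + 1
                label[neighbor] = 0
                queue.append(neighbor)
            if level[neighbor] == level[node] + 1:
                label[neighbor] += label[node]
    node_credit = {node: 1.0 for node in order}    # every node starts with credit 1
    edge_credit = {}
    for child in reversed(order):                  # Step 3: deepest nodes first
        parents = [p for p in adjacency[child] if level[p] == level[child] - 1]
        total_label = sum(label[p] for p in parents)
        for parent in parents:
            share = node_credit[child] * label[parent] / total_label
            edge_credit[frozenset((parent, child))] = share
            node_credit[parent] += share
    return edge_credit

def edge_betweenness(adjacency):
    """Sum the credits over all roots and halve, since each pair is seen twice."""
    totals = {}
    for root in adjacency:
        for edge, credit in edge_credits_from_root(adjacency, root).items():
            totals[edge] = totals.get(edge, 0.0) + credit
    return {edge: total / 2 for edge, total in totals.items()}

adjacency = build_adjacency(EDGES)
credits_from_e = edge_credits_from_root(adjacency, "E")
betweenness = edge_betweenness(adjacency)
for edge, score in sorted(betweenness.items(), key=lambda item: -item[1]):
    print("".join(sorted(edge)), score)
```

Edges are keyed by `frozenset` so that (B, D) and (D, B) accumulate into the same entry regardless of which root discovered them.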
Computation in practice
§ Complexity: n nodes, e edges
– BFS starting at each node: O(e)
– Do it for n nodes
– Total: O(ne) time
– Very expensive
§ Method in practice
– Choose a random subset W of the nodes
– Compute credit of each edge starting at each node in W
– Sum and compute betweenness
– A reasonable approximation
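
A sketch of the sampling idea, again on the example graph with its inferred edge list. The rescaling by n/|W| is an assumption made here so that the estimate is comparable to the exact value; the slides only say to sum the credits. All names are illustrative.

```python
import random
from collections import deque

EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]

def build_adjacency(edges):
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency

def edge_credits_from_root(adjacency, root):
    """BFS, labeling and credit sharing for a single root."""
    level, label, order = {root: 0}, {root: 1}, []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in adjacency[node]:
            if neighbor not in level:
                level[neighbor] = level[node] + 1
                label[neighbor] = 0
                queue.append(neighbor)
            if level[neighbor] == level[node] + 1:
                label[neighbor] += label[node]
    node_credit = {node: 1.0 for node in order}
    edge_credit = {}
    for child in reversed(order):
        parents = [p for p in adjacency[child] if level[p] == level[child] - 1]
        total_label = sum(label[p] for p in parents)
        for parent in parents:
            share = node_credit[child] * label[parent] / total_label
            edge_credit[frozenset((parent, child))] = share
            node_credit[parent] += share
    return edge_credit

def approximate_betweenness(adjacency, sample_size, seed=0):
    """Estimate betweenness from BFS runs rooted at a random subset W of the nodes."""
    nodes = sorted(adjacency)
    roots = random.Random(seed).sample(nodes, sample_size)
    totals = {}
    for root in roots:
        for edge, credit in edge_credits_from_root(adjacency, root).items():
            totals[edge] = totals.get(edge, 0.0) + credit
    scale = len(nodes) / sample_size   # assumption: rescale to the full node count
    return {edge: scale * total / 2 for edge, total in totals.items()}

adjacency = build_adjacency(EDGES)
estimate = approximate_betweenness(adjacency, sample_size=3)
top_edge = max(estimate, key=estimate.get)
print("".join(sorted(top_edge)), estimate[top_edge])
```

With `sample_size` equal to the number of nodes the estimate coincides with the exact betweenness; with fewer roots it costs O(|W|·e) instead of O(ne).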
Finding Communities using Betweenness
Method 1:
§ Keep adding edges (among existing ones) starting from lowest betweenness
§ Gradually join small components to build large connected components
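
Method 1 can be sketched with a union-find structure. The betweenness values below are the ones the credit-sharing procedure yields for the example graph (assuming edges AB, AC, BC, BD, DE, DF, DG, EF, FG); they are hard-coded to keep the sketch short, and the `threshold` parameter and function names are illustrative.

```python
# Betweenness of each edge of the example graph, hard-coded for brevity.
BETWEENNESS = {
    ("A", "B"): 5.0, ("A", "C"): 1.0, ("B", "C"): 5.0, ("B", "D"): 12.0,
    ("D", "E"): 4.5, ("D", "F"): 4.0, ("D", "G"): 4.5,
    ("E", "F"): 1.5, ("F", "G"): 1.5,
}

def find(parent, node):
    """Return the representative of node's component (with path halving)."""
    while parent[node] != node:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node

def communities_by_adding(betweenness, threshold):
    """Add edges in increasing order of betweenness, stopping at the threshold,
    and return the connected components that have formed."""
    parent = {node: node for edge in betweenness for node in edge}
    for (u, v), score in sorted(betweenness.items(), key=lambda item: item[1]):
        if score >= threshold:
            break
        parent[find(parent, u)] = find(parent, v)   # merge the two components
    groups = {}
    for node in parent:
        groups.setdefault(find(parent, node), set()).add(node)
    return {frozenset(group) for group in groups.values()}

for community in communities_by_adding(BETWEENNESS, threshold=12.0):
    print(sorted(community))
```

With the threshold at 12 every edge except BD is added, which leaves the two intuitive communities; lowering it to 4.5 also detaches B.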

Finding Communities using Betweenness
Method 2:
§ Start from all existing edges. The graph may look like one big component.
§ Keep removing edges starting from highest betweenness
§ Gradually split large components to arrive at communities

At some point, removing the edge with highest betweenness would split the graph into separate components
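
Method 2 can be sketched in the same spirit. The betweenness values are those of the example graph, hard-coded for brevity, and are not recomputed after each removal, which follows the description above. The `target_components` parameter and the function names are assumptions for illustration.

```python
from collections import deque

# Betweenness of each edge of the example graph, hard-coded for brevity.
BETWEENNESS = {
    ("A", "B"): 5.0, ("A", "C"): 1.0, ("B", "C"): 5.0, ("B", "D"): 12.0,
    ("D", "E"): 4.5, ("D", "F"): 4.0, ("D", "G"): 4.5,
    ("E", "F"): 1.5, ("F", "G"): 1.5,
}

def connected_components(nodes, edges):
    """Connected components of an undirected graph, found by repeated BFS."""
    adjacency = {node: set() for node in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        seen |= component
        components.append(frozenset(component))
    return components

def split_by_removing(betweenness, target_components=2):
    """Remove edges from highest betweenness down until the graph
    has at least target_components connected components."""
    nodes = sorted({node for edge in betweenness for node in edge})
    remaining = sorted(betweenness, key=betweenness.get, reverse=True)
    removed = []
    components = connected_components(nodes, remaining)
    while remaining and len(components) < target_components:
        removed.append(remaining.pop(0))          # drop the highest-scoring edge
        components = connected_components(nodes, remaining)
    return removed, set(components)

removed, communities = split_by_removing(BETWEENNESS)
print(removed, [sorted(c) for c in communities])
```

On this graph a single removal (edge BD) is enough to separate the two communities, whereas Method 1 has to add eight edges to reach the same clustering.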
Finding Communities using Betweenness
§ For a fixed threshold of betweenness, both methods would ultimately produce the same clustering
§ However, a suitable threshold is not known beforehand
§ Method 1 vs Method 2
– Method 2 is likely to take fewer operations. Why?
– Inter-community edges are fewer than intra-community edges
Triangles in Social Network Graph
§ Number of triangles in a social network graph is expected to be much larger than in a random graph of the same size
– The locality property
§ Counting the number of triangles indicates
– How much the graph looks like a social network
– Age of community
• A new community forms
• Members bring in their like-minded friends
• Such new members are expected to eventually connect to other members directly
Triangle Counting Algorithm
Graph (V, E); |V| = n, |E| = m
§ Step 1: Compute degree of each node
– Examine each edge
– Add 1 to the degree of each of its two nodes
– Takes O(m) time
§ Step 2: A hash table (v_i, v_j) → 1
– So that, given two nodes, we can determine if they have an edge between them
– Construction takes O(m) time
– Each query takes expected O(1) time, with a proper hash function
§ Step 3: An index v → list of nodes adjacent to v
– Construction takes O(m) time, querying takes O(1) time
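
The three preparatory structures map directly onto Python's built-in `dict` and `set`. A minimal sketch, using the inferred edge list of the example graph; `build_indexes` and `has_edge` are illustrative names.

```python
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]

def build_indexes(edges):
    """One pass over the edges builds all three structures in O(m) time."""
    degree = {}       # Step 1: node -> degree
    edge_set = set()  # Step 2: hash table of unordered node pairs
    adjacency = {}    # Step 3: node -> list of adjacent nodes
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        edge_set.add(frozenset((u, v)))
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
    return degree, edge_set, adjacency

def has_edge(edge_set, u, v):
    """Expected O(1) membership test for the edge between u and v."""
    return frozenset((u, v)) in edge_set

degree, edge_set, adjacency = build_indexes(EDGES)
print(degree["D"], has_edge(edge_set, "B", "D"), has_edge(edge_set, "A", "D"))
```

Storing each edge as a `frozenset` makes the lookup independent of the order in which the two endpoints are given.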
Counting Heavy Hitter Triangles
§ Heavy hitter node: a node with degree ≥ √m
§ Note: there are at most 2√m heavy hitter nodes
– More than 2√m such nodes → total degree > 2m, but the total degree is exactly 2m since |E| = m
§ Heavy hitter triangle: triangle whose 3 nodes are all heavy hitters
§ Number of possible heavy hitter triangles: at most C(2√m, 3) = O(m^(3/2))
§ For each possible triangle, use hash table (step 2) to check if all three edges exist
§ Takes O(m^(3/2)) time
Counting other Triangles
§ Consider an ordering of nodes: v_i << v_j if
– Either degree(v_i) < degree(v_j), or
– degree(v_i) = degree(v_j) and i < j
§ For each edge (v_i, v_j)
– If both nodes are heavy hitters, skip (already done)
– Suppose v_i is not a heavy hitter
– Find nodes w_1, w_2, …, w_k which are adjacent to v_i (using node → adjacent nodes index, step 3) [Takes O(k) time]
– For each w_l, l = 1, …, k, check if edge v_j w_l exists, in O(1) time, total O(k) time
– Count the triangle {v_i, v_j, w_l} if and only if
• Edge v_j w_l exists
• Also v_i << w_l
– Total time for each edge (v_i, v_j) is O(√m), since k < √m
– There are m edges, so the total time is O(m^(3/2))
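
The two phases can be combined into one function, sketched below. Two choices here are assumptions of the sketch rather than statements from the slides: ties in the << ordering are broken by node name instead of index, and a triangle is counted only when v_i << v_j << w_l, so that each triangle is counted exactly once (at the edge joining its two lowest-ordered nodes). The graph is assumed to have no self-loops or duplicate edges.

```python
from itertools import combinations

EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]

def count_triangles(edges):
    """Count triangles in O(m^(3/2)) time using the heavy hitter split."""
    m = len(edges)
    degree, adjacency, edge_set = {}, {}, set()
    for u, v in edges:                       # Steps 1 to 3: degrees and indexes
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        adjacency.setdefault(u, []).append(v)
        adjacency.setdefault(v, []).append(u)
        edge_set.add(frozenset((u, v)))

    def is_heavy(node):
        return degree[node] ** 2 >= m        # degree >= sqrt(m), without floats

    def rank(node):
        return (degree[node], node)          # the << ordering, ties broken by name

    def has_edge(u, v):
        return frozenset((u, v)) in edge_set

    count = 0
    heavy = [node for node in degree if is_heavy(node)]
    for a, b, c in combinations(heavy, 3):   # heavy hitter triangles
        if has_edge(a, b) and has_edge(a, c) and has_edge(b, c):
            count += 1
    for u, v in edges:                       # all other triangles
        if is_heavy(u) and is_heavy(v):
            continue
        vi, vj = sorted((u, v), key=rank)    # vi is lower-ordered, hence not heavy
        for w in adjacency[vi]:              # fewer than sqrt(m) neighbors
            if rank(vj) < rank(w) and has_edge(vj, w):
                count += 1
    return count

print(count_triangles(EDGES))
```

In the example graph B, D and F are heavy hitters (m = 9, so the threshold is degree 3) but do not form a triangle, so every triangle is found in the second phase.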
Optimality
Worst case scenario
§ If G is a complete graph on n nodes, then m = C(n, 2)
§ Number of triangles = C(n, 3) = Θ(m^(3/2))
§ Cannot even enumerate all triangles in less than O(m^(3/2)) time
§ Hence it is the lower bound for computing all triangles
If G is sparse
§ Consider a complete graph G’ with n nodes, m edges
§ Note that m = C(n, 2) = O(n^2)
§ Construct G from G’ by adding a chain of length n^2
§ The number of triangles remains the same, O(m^(3/2))
§ The number of edges remains of the same order, O(m)
§ G is quite sparse, lowering the edge to node ratio
§ Still cannot compute the triangles in less than O(m^(3/2)) time
Directed Graphs in (Social) Networks
§ Set of nodes V and directed edges (arcs) u → v
§ The web: pages link to other pages
§ Persons made calls to other persons
§ Twitter, Google+: people follow other people
§ All undirected graphs can be considered as directed
– Think of each edge as bidirectional
Paths and Neighborhoods
§ Path of length k: a sequence of nodes v_0, v_1, …, v_k from v_0 to v_k so that v_i → v_{i+1} is an arc for i = 0, …, k − 1
§ Neighborhood N(v,d) of radius d for a node v: set of all nodes w such that there is a path from v to w of length ≤ d
§ For a set of nodes V, N(V,d) := {w | there is a path of length ≤ d from some v in V to w}
§ Neighborhood profile of a node v: sequence of sizes of its neighborhoods of radius d = 1, 2, …; that is |N(v,1)|, |N(v,2)|, |N(v,3)|, …
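
These definitions translate into a level-by-level expansion. The sketch below treats the example graph (inferred edge list) as directed by making every edge bidirectional, and it truncates the profile once the neighborhood stops growing, which is a choice made here since the definition itself is an unbounded sequence. Function names are illustrative.

```python
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]

def bidirectional_successors(edges):
    """Treat each undirected edge as two arcs, one in each direction."""
    successors = {}
    for u, v in edges:
        successors.setdefault(u, set()).add(v)
        successors.setdefault(v, set()).add(u)
    return successors

def neighborhood(successors, start, radius):
    """N(start, radius): nodes reachable from start by a path of length <= radius.
    The start node itself is included (path of length 0)."""
    reached = {start}
    frontier = {start}
    for _ in range(radius):
        frontier = {w for v in frontier for w in successors.get(v, ())
                    if w not in reached}
        if not frontier:
            break
        reached |= frontier
    return reached

def neighborhood_profile(successors, start):
    """Sizes |N(start, 1)|, |N(start, 2)|, ... until the size stops growing."""
    sizes = []
    radius = 1
    while True:
        size = len(neighborhood(successors, start, radius))
        if sizes and size == sizes[-1]:
            break
        sizes.append(size)
        radius += 1
    return sizes

successors = bidirectional_successors(EDGES)
print(neighborhood_profile(successors, "A"), neighborhood_profile(successors, "B"))
```

Recomputing each neighborhood from scratch keeps the code close to the definition; a single BFS that records the count per level would give the same profile with less work.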
Neighborhood Profile
[Figure: the example graph on nodes A to G]

Neighborhood profile of B      Neighborhood profile of A
|N(B,1)| = 4                   |N(A,1)| = 3
|N(B,2)| = 7                   |N(A,2)| = 4
                               |N(A,3)| = 7

Each neighborhood includes the node itself (a path of length 0).
Diameter of a Graph
§ Diameter of a graph G(V,E): the smallest integer d such that for any two nodes v, w in V, there is a path of length at most d from v to w
– Only makes sense for strongly connected graphs
– Can reach any node from any node
§ The web graph: not strongly connected
– But there is a large strongly connected component
§ The six degrees of separation conjecture
– The diameter of the graph of the people in the world is six
Diameter and Neighborhood Profile
§ Neighborhood profile of a node v: |N(v,1)|, |N(v,2)|, |N(v,3)|, …, with |N(v,k)| = |V| for some k
§ Denote the smallest such k as d(v)
§ If G is a complete graph, d(v) = 1
§ Diameter of G is max_v d(v)
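
Since d(v) is the largest BFS distance from v, the diameter can be sketched as one BFS per node. The function names and the convention of returning `None` for graphs that are not strongly connected are assumptions of this sketch.

```python
from collections import deque

EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]

def bidirectional_successors(edges):
    successors = {}
    for u, v in edges:
        successors.setdefault(u, set()).add(v)
        successors.setdefault(v, set()).add(u)
    return successors

def d_of(successors, start, num_nodes):
    """d(start): smallest k with |N(start, k)| = |V|,
    or None if some node cannot be reached from start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in successors.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    if len(dist) < num_nodes:
        return None
    return max(dist.values())

def diameter(successors):
    """max over v of d(v); None if the graph is not strongly connected."""
    nodes = set(successors) | {w for targets in successors.values() for w in targets}
    worst = 0
    for node in nodes:
        d = d_of(successors, node, len(nodes))
        if d is None:
            return None
        worst = max(worst, d)
    return worst

print(diameter(bidirectional_successors(EDGES)))
```

This takes O(ne) time overall, the same cost as the exact betweenness computation.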
Reference
§ Mining of Massive Datasets, by Leskovec, Rajaraman
and Ullman, Chapter 10
