Social Network Analysis
Some content from Lada Adamic
and Eytan Adar
Vocabulary Lesson
Relational Tie
Actor parentOf
Person supervisorOf
Group reallyHates (+/-)
Event …
… Dyad
Relation: collection of ties of a specific type
(every parentOf tie)
Vocabulary Lesson
If A likes B and B likes C then A likes C (transitivity)
If A likes B and C likes B then A likes C
…
Triad
Vocabulary Lesson
Social
Network
One mode
Vocabulary Lesson
Social
Network
Two mode
Vocabulary Lesson
Ego-Centered Network
(egonet, neighborhood)
[Figure: network diagram with a node labeled "ego"]
Describing Networks
• Geodesic
– shortest_path(n,m)
• Diameter
– max(geodesic(n,m)) over all pairs of actors n, m in the graph
• Density / Sparsity
– Number of existing edges / number of all possible edges
– Degeneracy (the smallest number k such that every subgraph
has a vertex of degree k or less)
• Related to arboricity (the minimum number of forests needed
to cover every edge)
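The measures on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: it assumes an undirected, connected graph stored as a dict of neighbor sets, and the function names and the `toy` graph are hypothetical.

```python
from collections import deque

def geodesic_lengths(adj, source):
    """Shortest-path length (in edges) from source to every reachable node, via BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(adj):
    """Longest geodesic over all pairs of nodes (assumes a connected graph)."""
    return max(max(geodesic_lengths(adj, n).values()) for n in adj)

def density(adj):
    """Existing edges divided by the n(n-1)/2 possible undirected edges."""
    n = len(adj)
    edges = sum(len(nbrs) for nbrs in adj.values()) // 2
    return edges / (n * (n - 1) / 2)

def degeneracy(adj):
    """Repeatedly remove a minimum-degree vertex; the largest degree
    seen at removal time is the degeneracy."""
    remaining = {u: set(nbrs) for u, nbrs in adj.items()}
    k = 0
    while remaining:
        u = min(remaining, key=lambda x: len(remaining[x]))
        k = max(k, len(remaining[u]))
        for v in remaining[u]:
            remaining[v].discard(u)
        del remaining[u]
    return k

# Hypothetical example: a triangle a-b-c with a tail c-d-e.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
print(geodesic_lengths(toy, "a")["e"], diameter(toy), density(toy), degeneracy(toy))
```

The all-pairs BFS is fine for small graphs; for large networks a graph library would be the more practical choice.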
Degeneracy in the Real World
From https://arxiv.org/pdf/1006.5440
Random Network Graph Models
• Two classic examples:
– Erdős–Rényi
• G(n,M): randomly draw M edges between n nodes
• G(n,p): randomly draw edges between n nodes, each
with probability p.
– These models don’t really model the real world, in
that they don’t show:
• Small world phenomenon
• Power laws
• Sparsity
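The two Erdős–Rényi variants can be sketched as follows. This is an illustrative sketch only: the function names `gnm` and `gnp`, the integer node labels, and the `seed` parameter are assumptions made for the example, and enumerating every node pair is only practical for small n.

```python
import itertools
import random

def gnm(n, M, seed=None):
    """G(n, M): choose M distinct edges uniformly at random among n nodes."""
    rng = random.Random(seed)
    all_pairs = list(itertools.combinations(range(n), 2))
    return rng.sample(all_pairs, M)

def gnp(n, p, seed=None):
    """G(n, p): include each possible edge independently with probability p."""
    rng = random.Random(seed)
    return [pair for pair in itertools.combinations(range(n), 2)
            if rng.random() < p]

edges_m = gnm(10, 7, seed=1)    # always exactly 7 edges
edges_p = gnp(10, 0.3, seed=1)  # edge count varies from draw to draw
print(len(edges_m), len(edges_p))
```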
Small world experiment
[Figure: map with labels MA and NE]
Milgram’s experiment (1960s):
• Given a target individual and a particular property, pass the message to a
person you correspond with who is “closest” to the target.
• “Six degrees of separation”
Two more examples of power laws
[Figure: distribution of users among web sites, with sites ranked by popularity]
Power Laws (Scale-Free Networks)
• Power-law
– A scale-free network is a network whose degree
distribution follows a power law, at least
asymptotically.
– That is, the fraction P(k) of nodes in the network
having k connections to other nodes goes, for large
values of k, as
$P(k) \sim k^{-\gamma}$
– Typically the exponent γ is in the range from 2 to 3.
– Many networks have been reported to be scale-free.
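The fraction P(k) of nodes with k connections can be tabulated directly from a graph. The sketch below assumes an undirected graph stored as a dict of neighbor sets; `degree_distribution` and the `star` example are hypothetical names used only for illustration.

```python
from collections import Counter

def degree_distribution(adj):
    """Return {k: P(k)}, the fraction of nodes that have exactly k neighbors."""
    n = len(adj)
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: c / n for k, c in sorted(counts.items())}

# Hypothetical example: a star with one hub and four leaves.
star = {
    "hub": {"a", "b", "c", "d"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub"},
    "d": {"hub"},
}
print(degree_distribution(star))
```

Plotting P(k) against k on log-log axes is a common first check for power-law behavior, since a power law appears there as a roughly straight line for large k.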
Barabási & Albert (BA)
Random Graph Model
• Very simple algorithm to implement
– start with an initial set of m_0 fully connected nodes
• e.g. m_0 = 3
– now add new vertices one by one, each one with exactly m
edges
– each new edge connects to an existing vertex in
proportion to the number of edges that vertex already has
→ preferential attachment
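The steps above can be sketched as follows. This is a simplified illustration under stated assumptions: the function name and parameters are hypothetical, nodes are labeled 0..n-1, and repeated targets are redrawn so that each new vertex gets m distinct neighbors. That redraw is a convenience of this sketch, not something the slide specifies.

```python
import random

def barabasi_albert(n, m, m0=3, seed=None):
    """Return an edge list for a BA-style graph on nodes 0..n-1."""
    if not (2 <= m0 <= n and 1 <= m <= m0):
        raise ValueError("need 2 <= m0 <= n and 1 <= m <= m0")
    rng = random.Random(seed)
    # Initial set of m0 fully connected nodes.
    edges = [(i, j) for i in range(m0) for j in range(i + 1, m0)]
    # Each node appears in `stubs` once per incident edge, so a uniform
    # draw from `stubs` picks a node in proportion to its degree.
    stubs = [node for edge in edges for node in edge]
    for new in range(m0, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))  # preferential attachment
        for t in sorted(targets):
            edges.append((new, t))
            stubs.extend((new, t))
    return edges

ba_edges = barabasi_albert(50, 2, m0=3, seed=0)
print(len(ba_edges))
```

Keeping one list entry per edge endpoint avoids recomputing degrees at every step, at the cost of memory proportional to the number of edges.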
Properties of a BA graph
• The degree distribution is scale free with exponent 3:
$P(k) = 2m^2 / k^3$
• The graph is connected
– Every new vertex is born with a link or several links. It
then connects to m ‘older’ vertices
– Probability p_i of connecting to node i:
$p_i = k_i / \sum_j k_j$
• k_i is the degree of node i, and the sum runs over all existing nodes j
• The older get richer
– Nodes accumulate links as time goes on, which gives
older nodes an advantage since newer nodes are going
to attach preferentially – and older nodes have a higher
degree to tempt them with than some new kid on the
block
Common Tasks
• Measuring “importance”
– Centrality, prestige
• Diffusion modeling
– Epidemiological
• Clustering
– Clustering coefficients
• Structure analysis
– Subgraph isomorphisms, etc.
• Visualization/Privacy/etc.
Centrality Measures
• Degree centrality
– Edges per node (the more, the more important
the node)
• Closeness centrality
– How close the node is to every other node
• Betweenness centrality
– How many shortest paths go through the node
(communication metaphor)
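The three measures can be sketched in plain Python as below. The sketch assumes an undirected, connected graph with at least two nodes, stored as a dict of neighbor sets. It uses unnormalized degree, closeness defined as (n - 1) divided by the sum of geodesic distances, and betweenness counted over unordered pairs. These are common conventions chosen for the example, not ones stated on the slide, and all names are hypothetical.

```python
from collections import deque

def bfs_counts(adj, source):
    """Return (dist, sigma): geodesic lengths from source and the
    number of distinct geodesics from source to each node."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma

def degree_centrality(adj):
    """Number of edges per node."""
    return {v: len(adj[v]) for v in adj}

def closeness_centrality(adj):
    """(n - 1) / (sum of geodesic distances from the node to all others)."""
    n = len(adj)
    return {v: (n - 1) / sum(bfs_counts(adj, v)[0].values()) for v in adj}

def betweenness_centrality(adj):
    """For each node v, sum over pairs {s, t} of the fraction of
    s-t geodesics that pass through v."""
    info = {v: bfs_counts(adj, v) for v in adj}
    nodes = list(adj)
    score = {v: 0.0 for v in adj}
    for i, s in enumerate(nodes):
        dist_s, sigma_s = info[s]
        for t in nodes[i + 1:]:
            for v in nodes:
                if v == s or v == t:
                    continue
                dist_v, sigma_v = info[v]
                # v lies on an s-t geodesic iff the distances add up exactly.
                if dist_s[v] + dist_v[t] == dist_s[t]:
                    score[v] += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return score

# Hypothetical example: the path a - b - c - d.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(degree_centrality(path))
print(closeness_centrality(path))
print(betweenness_centrality(path))
```

The pairwise betweenness loop is cubic in the number of nodes, which is acceptable for a teaching example; Brandes' algorithm is the usual choice for larger graphs.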
Common Tasks
• Measuring “importance”
– Centrality, prestige (incoming links)
• Diffusion modeling
– Epidemiological
• Clustering
– Clustering coefficients
• Structure analysis
– Subgraph isomorphisms, etc.
• Visualization/Privacy/etc.
Epidemiological
• Viruses
– Biological, computational
– STDs, needle sharing, etc.
– Mark Handcock at UW
• Blog networks
– Applying SIR models (Info Diffusion Through Blogspace, Gruhl et
al.)
• Induce transmission graph, cascade models, simulation
– Link prediction (Tracking Information Epidemics in Blogspace,
Adar et al.)
• Find repeated “likely” infections
– Outbreak detection (Cost-effective Outbreak Detection in
Networks, Leskovec et al.)
• Submodularity
[Figure: example network with nodes labeled Domingo, Carlos, Alejandro, Eduardo, Frank, Hal, Karl, Bob, Ike, Gill, Lanny, Mike, John, Xavier, Utrecht, Norm, Russ, Quint, Wendle, Ozzie, Ted, Sam, Vern, Paul]
Global Clustering Coefficient
• The global clustering coefficient C is defined as:
$C = \frac{3 \times \text{number of triangles}}{\text{number of connected triplets}}$
• In this formula, a connected triplet is defined to
be a connected subgraph consisting of three
vertices and two edges. Thus, each triangle
forms three connected triplets, explaining the
factor of three in the formula.
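As a rough illustration of the definition, the sketch below counts triplets by their center vertex: a vertex of degree k is the center of k(k-1)/2 triplets, and a triplet is closed when its two end vertices are also linked. It assumes an undirected graph stored as a dict of neighbor sets; the function name and the `kite` graph are hypothetical.

```python
from itertools import combinations

def global_clustering(adj):
    """3 x (number of triangles) / (number of connected triplets)."""
    closed = 0    # closed triplets; each triangle contributes three of these
    triplets = 0  # all connected triplets, counted by their center vertex
    for v, nbrs in adj.items():
        k = len(nbrs)
        triplets += k * (k - 1) // 2
        closed += sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return closed / triplets if triplets else 0.0

# Hypothetical example: a triangle a-b-c with a tail c-d.
kite = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(global_clustering(kite))
```

Here there is one triangle and five connected triplets, so the ratio is 3/5.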
Local Clustering Coefficient
• The local clustering coefficient of a vertex (node) in
a graph quantifies how close its neighbors are to
being a clique (i.e., complete graph).
• The number of possible connections among the
neighbors of a node i of degree k_i is
k_i(k_i-1)/2.
• The local clustering coefficient C_i of node i is
defined as:
$C_i = \frac{2 e_i}{k_i (k_i - 1)}$
where e_i (a symbol introduced here) is the number of edges that
actually exist between the neighbors of i.
• We will discuss later how to compute these values.
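In the meantime, a minimal sketch of the local coefficient: count the links that exist among the neighbors of a node and divide by the number of possible links. It assumes an undirected graph stored as a dict of neighbor sets, and it returns 0.0 for nodes with fewer than two neighbors, which is a convention chosen for this example. The names are hypothetical.

```python
from itertools import combinations

def local_clustering(adj, i):
    """Fraction of possible links among the neighbors of i that actually exist."""
    nbrs = adj[i]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention: undefined cases are reported as 0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return links / (k * (k - 1) / 2)

# Hypothetical example: a triangle a-b-c with a tail c-d.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
for node in graph:
    print(node, local_clustering(graph, node))
```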
Common Tasks
• Measuring “importance”
– Centrality, prestige (incoming links)
• Diffusion modeling
– Epidemiological
• Clustering
– Blockmodeling, Girvan-Newman
• Structure analysis
– Subgraph isomorphisms, etc.
• Visualization/Privacy/etc.
Common Tasks
• Measuring “importance”
– Centrality, prestige (incoming links)
• Diffusion modeling
– Epidemiological
• Clustering
– Clustering coefficients
• Structure analysis
– Motifs, isomorphisms, etc.
• Visualization/Privacy/etc.
Privacy
• Emerging interest in anonymizing networks
– Lars Backstrom (WWW’07) demonstrated one of
the first attacks
• How to remove labels while preserving graph
properties?
– While ensuring that labels cannot be reapplied