Algorithm Foundations of Data Science

Lecture 3: Graph and Patterns

MING GAO

DaSE@ECNU
(for course related communications)
[email protected]

Mar. 28, 2018

Outline

1 Graph
Motivations
Patterns

2 Graph Concepts
Graph types
Properties
Graph Modeling

3 Network Generation

MING GAO (DaSE@ECNU) Algorithm Foundations of Data Science Mar. 28, 2018 2 / 47
Graph Motivations

Graphs - why should we care?

Networks in real world

“YahooWeb graph”: 1B vertices (Web sites), 6B edges (HTTP links)
Facebook, Twitter, etc.: more than 1B users
Food web: vertices (species), edges (food-chain links)
Power-grid: vertices (plants or consumers), edges (power lines)
Airline route: vertices (airports), edges (flights)
Adoption: users purchase products, adopt services, etc.
Motivation questions
Questions
What do real graphs look like?
What properties of vertices, edges are important to model?
What local and global properties are important to measure?
Are graphs helpful to understand the real world?
Social influence
Recommendation
Information propagation
Human behaviors
Is a subgraph “normal” (water army, i.e., paid posters; fraud detection; spam filtering; etc.)?
How to generate realistic graphs?
How to get a “good” sample of a network?
Models for complex networks

Steven H. S. proposed models for complex networks in Nature (2001):
Regular network: every node has exactly the same number of edges.
Random network: start with a set of n isolated vertices and add edges between randomly chosen pairs, one at a time.
Scale-free network: the network grows by attaching new nodes to existing nodes at random, with probability proportional to the degree of the target node; richly connected nodes therefore tend to get richer, which leads to the formation of hubs and a skewed, heavy-tailed degree distribution (Matthew effect or Pareto’s law).

Are real graphs random?

Looks random - right?

What does the Internet look like? Does it follow any rules?

Diameter: would you like to guess?

In- and out-degree distributions: if the average degree is 2, what is the most probable degree?
Other (surprising) patterns?

Graph Patterns

Power-law I

Internet topology

Out-degree distribution is plotted in log-log scale.

It forms a line with slope ≈ −2.15, i.e., $\text{freq} \propto \text{deg}^{-2.15}$.
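As a rough illustration of reading an exponent off a log-log plot, the sketch below fits a least-squares line to (log degree, log frequency) pairs. The helper names `degree_frequencies` and `loglog_slope` are hypothetical, and the check uses synthetic data built to follow $\text{freq} \propto \text{deg}^{-2.15}$ exactly; on real, noisy degree data a straight-line fit is only a quick heuristic.

```python
import math
from collections import Counter

def degree_frequencies(degrees):
    """Map each degree k > 0 to the number of vertices with degree k."""
    return Counter(d for d in degrees if d > 0)  # log(0) is undefined, so skip k = 0

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    cov = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    var = sum((a - mx) ** 2 for a in lx)
    return cov / var

# Synthetic data that follows freq = C * deg^(-2.15) exactly.
degs = list(range(1, 50))
freqs = [1e6 * k ** -2.15 for k in degs]
print(round(loglog_slope(degs, freqs), 2))  # -2.15
```

For a real graph, one would build `f = degree_frequencies(degree_list)` and call `loglog_slope(list(f), list(f.values()))`.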

What is the power-law?

Due to the Matthew effect (also described as Pareto’s law, “rich get richer”, or the 80/20 principle), power laws (Zipf’s law) appear in many settings:
80% of Italy’s land was owned by 20% of the population.
The richest 20% receive 82.70% of income.
Bible: word rank vs. frequency (log-log)
Web: hit count vs. volume
Files: count vs. size
Publications: citations vs. count
Business
80% of a company’s profits come from 20% of its customers.
80% of a company’s complaints come from 20% of its customers.
80% of a company’s profits come from 20% of the time its staff spend.
80% of a company’s sales are made by 20% of its sales staff.

Power-law II

Rank of out-degrees

Vertices are ranked in decreasing order of out-degree, and out-degree is plotted against rank in log-log scale.
It forms a line with slope ≈ −0.74, i.e., $\text{deg} \propto \text{rank}^{-0.74}$.

Power-law III

Rank of eigenvalues

The top 20 eigenvalues of the adjacency matrix are ranked in decreasing order and plotted against rank in log-log scale.
They form a line with slope ≈ −0.48, i.e., $\text{eigenvalue} \propto \text{rank}^{-0.48}$.

Power-law IV

Hop plot

How many neighbors lie within $1, 2, \cdots, h$ hops ($\sum_{i=1}^{h} \text{avg}_i$)?
The number of vertex pairs within $h$ hops is plotted against $h$ in log-log scale. It forms a line with slope ≈ 2.83, i.e., $\text{pairs} \propto h^{2.83}$.
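The hop plot can be computed with one breadth-first search per vertex. In this sketch (function names hypothetical), `hop_plot` counts ordered pairs of distinct vertices within $h$ hops; dividing by the number of vertices gives the average number of neighbors within $h$ hops, i.e., $\sum_{i=1}^{h} \text{avg}_i$. Counting ordered rather than unordered pairs is an assumption here; the two differ by a factor of 2, which does not change the log-log slope.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distance from source to every reachable vertex (unweighted BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def hop_plot(adj, max_h):
    """pairs[h - 1] = number of ordered pairs (u, v), u != v, at distance <= h."""
    at_distance = [0] * (max_h + 1)
    for s in adj:
        for d in bfs_distances(adj, s).values():
            if 0 < d <= max_h:
                at_distance[d] += 1
    pairs, running = [], 0
    for h in range(1, max_h + 1):
        running += at_distance[h]  # cumulative count: "within h hops"
        pairs.append(running)
    return pairs

# Path graph 0-1-2-3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(hop_plot(path, 3))  # [6, 10, 12]
```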

Power-law V

Counting of triangles

X-axis: # of triangles a vertex participates in

Y-axis: count of such vertices
In log-log scale, the plot is almost linear.

Triangle law
How to count # triangles?

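Naive algorithm: 3-way join ($O(n^3)$).

$\#\text{triangles} = \frac{1}{6} \sum_{i=1}^{n} \lambda_i^3$. Why? $\sum_{i} \lambda_i^3 = \operatorname{tr}(A^3)$ counts closed walks of length 3, and each triangle yields 6 of them (3 starting vertices × 2 directions).

Because the eigenvalues are skewed, only the top few (largest in magnitude) are needed, and they can be computed with the Lanczos algorithm.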
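The trace identity behind the eigenvalue formula can be checked directly on a small graph. This sketch (hypothetical helper names) compares a brute-force count over all vertex triples with $\operatorname{tr}(A^3)/6$, computed here by exact integer matrix multiplication rather than from eigenvalues; a large-scale version would instead approximate $\sum_i \lambda_i^3$ from the top eigenvalues, which is not shown.

```python
from itertools import combinations

def triangles_naive(adj_sets):
    """Brute force over all vertex triples (the O(n^3) approach)."""
    return sum(
        1
        for a, b, c in combinations(sorted(adj_sets), 3)
        if b in adj_sets[a] and c in adj_sets[a] and c in adj_sets[b]
    )

def triangles_trace(A):
    """tr(A^3) / 6; tr(A^3) also equals the sum of the cubed eigenvalues of A."""
    n = len(A)
    A2 = [[sum(A[i][k] * A[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    trace_A3 = sum(A2[i][k] * A[k][i] for i in range(n) for k in range(n))
    return trace_A3 // 6  # each triangle gives 6 closed walks of length 3

# Two triangles sharing the edge 1-2.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]
n = 4
adj_sets = {v: set() for v in range(n)}
A = [[0] * n for _ in range(n)]
for u, v in edges:
    adj_sets[u].add(v)
    adj_sets[v].add(u)
    A[u][v] = A[v][u] = 1
print(triangles_naive(adj_sets), triangles_trace(A))  # 2 2
```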

Erdös number

Small world - six degrees of separation

The world looks “small” when you think of how short a chain of friends it takes to get from you to almost anyone else. In the 1960s, Stanley Milgram and his colleagues ran an experiment:
296 randomly chosen “starters” were asked to forward a letter to a “target” person, a stockbroker in a Boston suburb.
Jure Leskovec also observed about six degrees of separation in the Microsoft Instant Messenger network.

Shrinking diameter

Citation and patent networks

For the citation network, the authors collected citations among physics papers:

11 years of data
29,555 papers
352,807 citations
For each month, a graph of all citations up to that month is built.
The diameter of each monthly graph is plotted over time; it shrinks as the network grows.
Temporal evolution of graphs

Question

Let $N(t)$ and $E(t)$ be the numbers of nodes and edges at time $t$, respectively.

Suppose that $N(t+1) = 2N(t)$. What is your guess for $E(t+1)$?
It more than doubles, obeying $E(t) \propto N(t)^{\alpha}$ for all $t$, where $1 < \alpha < 2$.
For a tree (clique), $\alpha = 1$ ($\alpha = 2$).
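A short worked version of the doubling question, assuming the relation $E \propto N^{\alpha}$ holds exactly: if $N$ doubles, $E$ is multiplied by $2^{\alpha}$, which lies strictly between 2 and 4 when $1 < \alpha < 2$. The function names are hypothetical.

```python
import math

def densification_exponent(n1, e1, n2, e2):
    """Exponent alpha in E ∝ N^alpha, estimated from two snapshots (N1, E1), (N2, E2)."""
    return math.log(e2 / e1) / math.log(n2 / n1)

def edges_after_doubling(e, alpha):
    """If N doubles and E ∝ N^alpha, then E is multiplied by 2^alpha."""
    return e * 2 ** alpha

for alpha in (1.0, 1.5, 2.0):  # tree-like, in between, clique-like
    print(alpha, edges_after_doubling(1000, alpha))
```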

Dunbar’s number
Why do primates have unusually big brains?

Social group size (and much social behaviour as well) correlates with relative neocortex volume.
Our relationships form a hierarchically inclusive series of circles of increasing size but decreasing intensity.
150 is the limit on the number of reciprocated relationships.
Graph Concepts Graph types

Graph types
Undirected graph

An undirected graph on 4 vertices

Degree: # edges incident to the vertex
Degree 0 vertex: isolated vertex

Directed graph

A directed graph on 4 vertices

In-degree: # incoming edges to the vertex
Out-degree: # outgoing edges from the vertex
Degree: in-degree + out-degree
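A minimal sketch of these degree definitions for a directed graph given as an edge list (the function name and the `nodes` parameter are illustrative). Isolated vertices never appear in an edge list, so they must be supplied separately to get degree 0.

```python
from collections import Counter

def degrees(directed_edges, nodes=()):
    """Return (in-degree, out-degree, degree) for a directed edge list.
    `nodes` lets isolated vertices appear with degree 0."""
    out_deg = Counter(u for u, _ in directed_edges)
    in_deg = Counter(v for _, v in directed_edges)
    all_nodes = set(nodes) | set(out_deg) | set(in_deg)
    degree = {x: in_deg[x] + out_deg[x] for x in all_nodes}  # Counter returns 0 if absent
    return in_deg, out_deg, degree

edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
in_deg, out_deg, deg = degrees(edges, nodes=["d"])
print(in_deg["c"], out_deg["c"], deg["c"], deg["d"])  # 2 1 3 0
```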

Graph types cont.

Signed graph

A signed graph on 3 vertices

Positive-degree: # edges associated
with positive labels
Negative-degree: # edges associated
with negative labels

Bipartite graph

Users interact on social platforms

Reply network
Retweet network
Adoption network

Graph Concepts Properties

Paths
Path
A path is a sequence of nodes in which each consecutive pair is connected by an edge.
A simple path does not repeat nodes.
The length of a path is the number of edges in it (one less than the number of nodes).

Cycle

A cycle is a path with at least three edges in which the first and last nodes are the same.
Every edge in the 1970 Arpanet belongs to a cycle, and this was by design. Why?
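One way to see the Arpanet design choice: an edge lies on a cycle exactly when its endpoints stay connected after the edge is removed, so if every edge is on a cycle, no single link failure can disconnect the network. The sketch below (hypothetical helper names) tests each edge of an undirected graph this way. It runs one search per edge, which is fine for illustration; linear-time bridge-finding algorithms exist for large graphs.

```python
def reachable_without(adj, start, goal, edge):
    """DFS from start to goal in an undirected graph, ignoring one edge (both directions)."""
    blocked = {edge, edge[::-1]}
    stack, seen = [start], {start}
    while stack:
        u = stack.pop()
        if u == goal:
            return True
        for v in adj[u]:
            if (u, v) not in blocked and v not in seen:
                seen.add(v)
                stack.append(v)
    return False

def edges_not_on_cycle(adj):
    """Edges whose removal disconnects their endpoints, i.e., edges on no cycle."""
    return [
        (u, v)
        for u in adj
        for v in adj[u]
        if u < v and not reachable_without(adj, u, v, (u, v))
    ]

# Triangle 0-1-2 plus a pendant edge 2-3: only 2-3 lies on no cycle.
g = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
print(edges_not_on_cycle(g))  # [(2, 3)]
```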

Connectivity

Connected component

A connected component is a subset of nodes such that:
every node in the subset has a path to every other; and
the subset is not part of some larger set with the property that every node can reach every other.
A graph is connected if there is a path between every pair of nodes, i.e., the whole graph is a single connected component.
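Connected components can be found with breadth-first search: each search started from an unvisited node reaches exactly the nodes of that node's component and stops only when the component cannot be extended, which is the maximality condition in the definition. The sketch assumes an adjacency-list dictionary; the function names are illustrative.

```python
from collections import deque

def connected_components(adj):
    """Label the connected components of an undirected graph with BFS."""
    seen, components = set(), []
    for s in adj:
        if s in seen:
            continue
        seen.add(s)
        comp, queue = [s], deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    comp.append(v)
                    queue.append(v)
        components.append(sorted(comp))
    return components

def is_connected(adj):
    return len(connected_components(adj)) <= 1

g = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4], 6: []}
print(connected_components(g))  # [[1, 2, 3], [4, 5], [6]]
```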

Strongly connected component

A directed graph is strongly connected if there is a path from every node to every other node.
The edges of each path must be traversed in the forward direction.
An undirected graph can be treated as a directed graph with edges in both directions. Thus a connected component of an undirected graph is also an SCC.
In a strongly connected component with at least two nodes, every node has both followers and followees inside the component.
SCCs can be treated as super-nodes: contracting each SCC into a single node yields a directed acyclic graph.
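A sketch of one standard way to compute SCCs, Kosaraju's algorithm: a depth-first pass records finishing order, then searches on the reversed graph, started in decreasing finishing order, each collect one SCC. The iterative DFS avoids Python's recursion limit; the input format (a dict of out-neighbor lists) and the function name are assumptions.

```python
def strongly_connected_components(adj):
    """Kosaraju's algorithm on a dict {node: [out-neighbors]}; returns a list of SCCs."""
    nodes = set(adj) | {v for vs in adj.values() for v in vs}

    # Pass 1: iterative DFS, appending each node once all its out-edges are explored.
    order, seen = [], set()
    for s in nodes:
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, iter(adj.get(s, ())))]
        while stack:
            u, it = stack[-1]
            for v in it:
                if v not in seen:
                    seen.add(v)
                    stack.append((v, iter(adj.get(v, ()))))
                    break
            else:  # iterator exhausted: u is finished
                stack.pop()
                order.append(u)

    # Pass 2: search the reversed graph in decreasing finishing time.
    radj = {u: [] for u in nodes}
    for u, vs in adj.items():
        for v in vs:
            radj[v].append(u)
    components, assigned = [], set()
    for s in reversed(order):
        if s in assigned:
            continue
        assigned.add(s)
        comp, stack = [], [s]
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in radj[u]:
                if v not in assigned:
                    assigned.add(v)
                    stack.append(v)
        components.append(sorted(comp))
    return components

g = {1: [2], 2: [3], 3: [1, 4], 4: [5], 5: [4], 6: [4]}
print(sorted(strongly_connected_components(g)))  # [[1, 2, 3], [4, 5], [6]]
```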
Giant component

Giant connected component

A connected component that contains a significant fraction of all the nodes.
When a network (e.g., a friendship network) contains a giant component, it almost always contains only one.
The other connected components are very small by comparison.
In the example network, the largest connected component would break apart into three distinct components if a single key node were removed [related to the robustness of networks].

Web giant component

Web graph

The Web contains a giant strongly connected component (containing the home pages of many of the major commercial, governmental, and nonprofit organizations).
IN: nodes that can reach the giant SCC but cannot be reached from it, i.e., nodes that are “upstream” of it.
Distance and diameter

Distance or Geodesic distance

The distance between two vertices in a graph is the number of edges in a shortest path between them.
The diameter is the length of the “longest shortest path”, i.e., the maximum distance between any two vertices of a graph.
An Erdös number (when finite) is bounded by the diameter of the collaboration graph.
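Distances in an unweighted graph come from breadth-first search, and the diameter is the largest BFS distance over all sources. This sketch (hypothetical names) assumes a connected, undirected graph given as an adjacency dictionary and raises an error otherwise, since some distances would be infinite.

```python
from collections import deque

def bfs_distances(adj, source):
    """Number of edges on a shortest path from source to each reachable vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(adj):
    """Longest shortest path, using one BFS per vertex; requires a connected graph."""
    best = 0
    for s in adj:
        dist = bfs_distances(adj, s)
        if len(dist) < len(adj):
            raise ValueError("graph is disconnected: some distances are infinite")
        best = max(best, max(dist.values()))
    return best

# Path a-b-c-d: the distance from a to d is 3, which is also the diameter.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(bfs_distances(path, "a")["d"], diameter(path))  # 3 3
```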
Mean Geodesic distance of undirected networks

Definition
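$L = \frac{1}{\frac{1}{2}n(n+1)} \sum_{i \ge j} d_{ij}$,

where $n$ denotes the number of nodes and $d_{ij}$ is the shortest-path distance between nodes $i$ and $j$.
The mean geodesic distance includes the distance from each node to itself ($d_{ii} = 0$); this is why the sum runs over $i \ge j$ and the normalization is $\frac{1}{2}n(n+1)$.
It can be computed in $O(mn)$ time with one breadth-first search per node, where $m$ denotes the number of edges.
What happens if the network has multiple connected components?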
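If the network has multiple connected components, some $d_{ij} = \infty$ and $L$ diverges; the harmonic mean avoids this, since such pairs contribute $d_{ij}^{-1} = 0$:
$L^{-1} = \frac{1}{\frac{1}{2}n(n+1)} \sum_{i \ge j} d_{ij}^{-1}$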
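A sketch of both averages, assuming an undirected graph stored as an adjacency dictionary (function names hypothetical). `mean_geodesic` follows the definition of $L$, including self-pairs with $d_{ii} = 0$, and uses one BFS per node for $O(mn)$ total time. `harmonic_mean_geodesic` deviates slightly from the harmonic formula as written: it skips the $i = j$ terms, whose reciprocals $d_{ii}^{-1}$ are undefined, and normalizes by the $\frac{1}{2}n(n-1)$ remaining pairs; treat that as an assumption.

```python
import math
from collections import deque

def bfs_distances(adj, source):
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def mean_geodesic(adj):
    """L = sum_{i>=j} d_ij / (n(n+1)/2); self-pairs add d_ii = 0. Needs a connected graph."""
    n = len(adj)
    total = 0
    for s in adj:
        dist = bfs_distances(adj, s)
        if len(dist) < n:
            raise ValueError("disconnected graph: use the harmonic mean instead")
        total += sum(dist.values())
    return (total / 2) / (n * (n + 1) / 2)  # each unordered pair was summed twice

def harmonic_mean_geodesic(adj):
    """Harmonic mean over pairs i > j; unreachable pairs contribute 1/inf = 0."""
    n = len(adj)
    inv_sum = 0.0
    for s in adj:
        inv_sum += sum(1 / d for d in bfs_distances(adj, s).values() if d > 0)
    inv_sum /= 2  # each unordered pair was counted from both endpoints
    if inv_sum == 0:
        return math.inf  # no pair of distinct nodes is connected
    return (n * (n - 1) / 2) / inv_sum

path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(mean_geodesic(path), harmonic_mean_geodesic(path))
```

Usage on a disconnected graph: `harmonic_mean_geodesic({1: [2], 2: [1], 3: []})` still returns a finite value, while `mean_geodesic` raises.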
Graph Concepts Graph Modeling

Adjacency matrix

Definition
Given a finite graph $G = (V, E)$, the adjacency matrix $A$ is a $|V| \times |V|$ matrix whose elements indicate whether pairs of vertices are adjacent in the graph.
For a simple graph, the adjacency matrix is a (0,1)-matrix with zeros on its diagonal.
If the graph is undirected, the adjacency matrix is symmetric.
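The adjacency matrix $A$ of a bipartite graph whose two parts have $r$ and $s$ vertices can be written in the form
$A = \begin{pmatrix} 0_{r,r} & B \\ B^{T} & 0_{s,s} \end{pmatrix}$,
where $B$ is an $r \times s$ matrix (for an undirected graph).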
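The block form can be built directly from the $r \times s$ matrix $B$, whose rows index one part and whose columns index the other. In this sketch (names illustrative, undirected graph assumed), the result is symmetric with zero diagonal blocks, matching the adjacency-matrix properties above.

```python
def bipartite_adjacency(B):
    """Build A = [[0_{r,r}, B], [B^T, 0_{s,s}]] from an r x s biadjacency matrix B."""
    r, s = len(B), len(B[0])
    A = [[0] * (r + s) for _ in range(r + s)]
    for i in range(r):
        for j in range(s):
            A[i][r + j] = B[i][j]  # upper-right block: B
            A[r + j][i] = B[i][j]  # lower-left block: B^T
    return A

# Two users (rows) and three items (columns), e.g. an adoption network.
B = [[1, 0, 1],
     [0, 1, 1]]
A = bipartite_adjacency(B)
for row in A:
    print(row)
```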


Storing a graph
Adjacency lists

An adjacency list is a collection of unordered lists used to represent a graph $G$. Each list describes the set of neighbors of a vertex in the graph.
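A minimal adjacency-list builder from an edge list (names illustrative). For sparse graphs this uses $O(|V| + |E|)$ storage, compared with $|V|^2$ entries for an adjacency matrix.

```python
from collections import defaultdict

def adjacency_list(edges, directed=False):
    """Map each vertex to the list of its neighbors (out-neighbors if directed)."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        if directed:
            adj.setdefault(v, [])  # keep sink vertices as keys with no out-neighbors
        else:
            adj[v].append(u)       # undirected: store the edge in both lists
    return dict(adj)

edges = [(0, 1), (0, 2), (2, 3)]
print(adjacency_list(edges))  # {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}
```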

Random walk of a graph

Markov chain
Suppose that $G = (V, E)$ is a graph on $n$ vertices with vertex set $V$ and edge set $E \subseteq V \times V$. Let $N(x) = \{y \mid (x, y) \in E\}$, and denote the degree of vertex $x$ by $d(x) = |N(x)|$.
Note that $x$ is an isolated vertex if $N(x) = \emptyset$.
Assume $G$ is undirected, so $(x, y) \in E$ if and only if $(y, x) \in E$.
For each non-isolated $x \in V$, the transition probability is $P(y \mid x) = \frac{1}{d(x)}$ if $y \in N(x)$, and $P(y \mid x) = 0$ otherwise.
Let $X$ be the random walk on $G$. If $G$ is connected, then $X$ is irreducible.
For connected $G$, $X$ has period 2 if and only if $G$ is bipartite, in which case the two parts are the cyclic classes of $X$.
Let $D = \operatorname{diag}(d_1, d_2, \cdots, d_n)$ be the diagonal degree matrix; then the transition matrix is $P = D^{-1}A$.
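The transition matrix $P = D^{-1}A$ can be built row by row. As a sanity check, this sketch uses a standard fact not stated on the slide: for a random walk on a connected undirected graph, $\pi(x) = d(x)/2|E|$ is a stationary distribution, i.e., $\pi P = \pi$. Exact fractions avoid rounding; the example graph and function names are hypothetical.

```python
from fractions import Fraction

def transition_matrix(A):
    """P = D^{-1} A: row x spreads probability 1/d(x) over the neighbors of x."""
    P = []
    for row in A:
        d = sum(row)
        if d == 0:
            raise ValueError("isolated vertex: P(. | x) is undefined")
        P.append([Fraction(a, d) for a in row])
    return P

def step(pi, P):
    """One step of the walk: (pi P)(y) = sum_x pi(x) P(y | x)."""
    n = len(P)
    return [sum(pi[x] * P[x][y] for x in range(n)) for y in range(n)]

# Triangle 0-1-2 plus a pendant vertex 3 attached to 2 (connected, not bipartite).
A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
P = transition_matrix(A)
two_m = sum(map(sum, A))                       # sum of degrees = 2|E|
pi = [Fraction(sum(row), two_m) for row in A]  # candidate stationary law d(x) / 2|E|
print(step(pi, P) == pi)  # True
```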

Combinatorial Laplacian of graph

Definition
Given a graph $G$, the (combinatorial) Laplacian of $G$ is $L = D - A$, i.e.,
$L(u, v) = \begin{cases} d_v, & \text{if } u = v; \\ -1, & \text{if } u \text{ and } v \text{ are adjacent}; \\ 0, & \text{otherwise}. \end{cases}$

If $G$ is undirected and its Laplacian matrix $L$ has eigenvalues $\lambda_0 \le \lambda_1 \le \cdots \le \lambda_{n-1}$, then:
$L$ is symmetric and singular (some $\lambda_i = 0$).
Since every row sum and column sum of $L$ is zero, $\lambda_0 = 0$ with eigenvector $v_0 = (1, 1, \cdots, 1)^T$.
The second smallest eigenvalue $\lambda_1$ is called the algebraic connectivity.
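A sketch of $L = D - A$ on a small undirected graph (names and graph illustrative). It checks that every row of $L$ sums to zero, so $L(1, \cdots, 1)^T = 0$, and the standard identity $x^T L x = \sum_{(u,v) \in E} (x_u - x_v)^2$, which shows that $L$ is positive semidefinite and hence that $\lambda_0 = 0$ is indeed the smallest eigenvalue.

```python
def laplacian(A):
    """Combinatorial Laplacian L = D - A for a symmetric 0/1 adjacency matrix."""
    n = len(A)
    return [[(sum(A[u]) if u == v else 0) - A[u][v] for v in range(n)] for u in range(n)]

def quadratic_form(L, x):
    """x^T L x."""
    n = len(L)
    return sum(x[u] * L[u][v] * x[v] for u in range(n) for v in range(n))

A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
L = laplacian(A)
print([sum(row) for row in L])  # [0, 0, 0, 0]

x = [3, -1, 2, 5]
edges = [(u, v) for u in range(4) for v in range(u + 1, 4) if A[u][v]]
print(quadratic_form(L, x) == sum((x[u] - x[v]) ** 2 for u, v in edges))  # True
```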
Incidence matrix
Definition
An incidence matrix $B$ is a $|V| \times |E|$ matrix that describes the relationship between the vertices and edges of a graph $G = (V, E)$.
Each column corresponds to an edge $e = (v_i, v_j)$ (with $i < j$): its entry is $1$ in the row corresponding to $v_i$, $-1$ in the row corresponding to $v_j$, and $0$ elsewhere.
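With this sign convention, the incidence matrix satisfies $BB^T = D - A = L$, the identity behind the normalized Laplacian form $D^{-1/2} B B^{T} D^{-1/2}$. The sketch below (hypothetical helper names) builds $B$ for a small graph and multiplies it by its transpose; flipping the orientation of any edge only negates one column, so $BB^T$ does not depend on that choice.

```python
def incidence_matrix(n, edges):
    """|V| x |E| matrix: column e = (v_i, v_j), i < j, has +1 in row i and -1 in row j."""
    B = [[0] * len(edges) for _ in range(n)]
    for e, (i, j) in enumerate(edges):
        i, j = min(i, j), max(i, j)
        B[i][e] = 1
        B[j][e] = -1
    return B

def times_transpose(B):
    """Return B B^T."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in B] for ri in B]

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
B = incidence_matrix(4, edges)
for row in times_transpose(B):
    print(row)  # rows of D - A: degrees on the diagonal, -1 for adjacent pairs
```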
Normalized Laplacian of graph

Definition

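Given a graph $G$ without isolated vertices, the normalized Laplacian of $G$ is $\mathcal{L} = D^{-1/2} L D^{-1/2}$, i.e.,
$\mathcal{L}(u, v) = \begin{cases} 1, & \text{if } u = v; \\ -\frac{1}{\sqrt{d_u d_v}}, & \text{if } u \text{ and } v \text{ are adjacent}; \\ 0, & \text{otherwise}. \end{cases}$
$\mathcal{L} = D^{-1/2} B B^{T} D^{-1/2} = I - D^{-1/2} A D^{-1/2} = D^{1/2} (I - P) D^{-1/2}$. Thus, $\mathcal{L}$ is positive semidefinite and $0 \le \lambda(\mathcal{L}) \le 2$.
$\mathcal{L}$ is singular and symmetric, and $\lambda_0 = 0$ corresponds to the eigenvector $D^{1/2} v_0 = D^{1/2} (1, 1, \cdots, 1)^T$.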
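A numerical check of the definition, assuming a simple graph without isolated vertices (function name and example graph hypothetical): the sketch forms $\mathcal{L} = I - D^{-1/2} A D^{-1/2}$ and verifies that $D^{1/2}(1, \cdots, 1)^T$ is mapped to zero up to floating-point error, i.e., that it is an eigenvector for eigenvalue 0.

```python
import math

def normalized_laplacian(A):
    """I - D^{-1/2} A D^{-1/2}; assumes a simple graph with no isolated vertices."""
    n = len(A)
    d = [sum(row) for row in A]
    if min(d) == 0:
        raise ValueError("isolated vertex: D^{-1/2} is undefined")
    return [[(1.0 if u == v else 0.0) - A[u][v] / math.sqrt(d[u] * d[v])
             for v in range(n)] for u in range(n)]

A = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
NL = normalized_laplacian(A)
v0 = [math.sqrt(sum(row)) for row in A]  # D^{1/2} (1, ..., 1)^T
residual = [sum(NL[u][k] * v0[k] for k in range(4)) for u in range(4)]
print(max(abs(r) for r in residual) < 1e-12)  # True
```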
Properties of normalized Laplacian [WebScience 2013]

Properties

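The eigenvalues $0 = \lambda_1 \le \lambda_2 \le \cdots \le \lambda_n$ of the normalized Laplacian matrix of a graph $G$ with $n$ vertices and no isolated vertices satisfy the following properties:
$0 \le \lambda_2 \le \frac{n}{n-1} \le \lambda_n \le 2$.
$\lambda_2 = \cdots = \lambda_n = \frac{n}{n-1}$ if and only if $G$ is a clique.
$\lambda_n = 2$ if and only if $G$ has a bipartite connected component (for connected $G$: if and only if $G$ is bipartite).
$G$ has at least $i$ connected components if and only if $\lambda_j = 0$ for $j = 1, 2, \cdots, i$.
The mean of the eigenvalues $\lambda_2, \lambda_3, \cdots, \lambda_n$ is $\frac{n}{n-1}$.
The variance of the eigenvalues $\lambda_2, \lambda_3, \cdots, \lambda_n$ is $\frac{1}{n-1} \sum_{i=1}^{n} \sum_{j \ne i} \frac{A_{ij}}{d(v_i)\, d(v_j)} - \frac{n}{(n-1)^2}$ (R-energy).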
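The mean and variance properties follow from two trace identities. The derivation below is a sketch assuming a simple graph with no isolated vertices, so that every diagonal entry of $\mathcal{L}$ is 1, $\lambda_1 = 0$, and $A_{ij}^2 = A_{ij}$; it uses the symmetry of $\mathcal{L}$ and the population variance over the $n - 1$ eigenvalues $\lambda_2, \cdots, \lambda_n$.

```latex
\begin{aligned}
\sum_{i=1}^{n} \lambda_i &= \operatorname{Tr}(\mathcal{L}) = n,
\qquad \lambda_1 = 0
\;\Rightarrow\; \frac{1}{n-1}\sum_{i=2}^{n} \lambda_i = \frac{n}{n-1}, \\
\sum_{i=1}^{n} \lambda_i^2 &= \operatorname{Tr}(\mathcal{L}^2)
= \sum_{i=1}^{n}\sum_{j=1}^{n} \mathcal{L}_{ij}^2
= n + \sum_{i=1}^{n}\sum_{j \ne i} \frac{A_{ij}}{d(v_i)\,d(v_j)}, \\
\operatorname{Var}(\lambda_2, \ldots, \lambda_n)
&= \frac{1}{n-1}\sum_{i=2}^{n} \lambda_i^2 - \Big(\frac{n}{n-1}\Big)^2
= \frac{1}{n-1}\sum_{i=1}^{n}\sum_{j \ne i} \frac{A_{ij}}{d(v_i)\,d(v_j)}
+ \frac{n}{n-1} - \frac{n^2}{(n-1)^2} \\
&= \frac{1}{n-1}\sum_{i=1}^{n}\sum_{j \ne i} \frac{A_{ij}}{d(v_i)\,d(v_j)} - \frac{n}{(n-1)^2}.
\end{aligned}
```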

Network Generation

Network generation

Generators
Erdös-Renyi model
Preferential attachment
Variations + extensions
Copying model
Triad-closing
Butterfly model
Recursion - Kronecker generator

Random network generator: Erdös-Renyi model

The Erdös-Renyi model, also known as the random graph model, generates undirected random graphs.
Parameters: $N$ (number of vertices) and $p$ (probability of forming an edge).
For each possible node pair, the model creates an edge independently with probability $p$. Thus, the expected number of edges is $p \frac{N(N-1)}{2}$.
Degree distribution:
$P(\text{node has degree } k) = \binom{N-1}{k} p^{k} (1-p)^{N-1-k}$,
a binomial distribution with mean $(N-1)p$ and variance $(N-1)p(1-p)$ (not a power-law distribution).
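A direct, $O(N^2)$ sketch of the $G(N, p)$ generator (function name and parameters illustrative; faster generators that skip over non-edges exist). A seeded `random.Random` makes runs reproducible, and the printout compares the empirical mean and variance of the degrees with the binomial values above.

```python
import random

def erdos_renyi(n, p, seed=None):
    """G(N, p): each of the N(N-1)/2 vertex pairs becomes an edge independently with prob. p."""
    rng = random.Random(seed)
    adj = {v: [] for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adj[u].append(v)
                adj[v].append(u)
    return adj

n, p = 1000, 0.01
g = erdos_renyi(n, p, seed=42)
degs = [len(nbrs) for nbrs in g.values()]
mean = sum(degs) / n
var = sum((k - mean) ** 2 for k in degs) / n
print(f"mean degree {mean:.2f} vs (N-1)p = {(n - 1) * p:.2f}")
print(f"variance    {var:.2f} vs (N-1)p(1-p) = {(n - 1) * p * (1 - p):.2f}")
```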

Scale-free network generator

Preferential attachment model
The more connected a node is, the more likely it is to receive new links (i.e., “rich get richer”, the Matthew effect, or Pareto’s law).
Price model
Barabasi Albert model

Price model for citation networks

Each new paper is generated with $m$ citations (on average).
New papers cite previous papers with probability proportional to their in-degree (number of citations).
The resulting in-degree distribution follows a power law with exponent $\alpha = 2 + \frac{1}{m}$ [Science 1965].

Barabasi Albert model

Model

Start with an initial network of $m_0 \ge 2$ nodes in which every node has degree at least 1; otherwise a node would remain isolated forever, since its probability of receiving a link is proportional to its degree.
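The slide text breaks off after the initial condition, so the growth step below is an assumption based on the usual Barabási-Albert description: each new node brings $m \le m_0$ edges and attaches them to distinct existing nodes with probability proportional to their current degree. The seed network (a path on $m_0$ nodes, so every initial degree is at least 1), the function name, and the parameters are hypothetical. Degree-proportional sampling uses a list in which every node appears once per incident edge.

```python
import random
from collections import Counter

def barabasi_albert(n, m, m0, seed=None):
    """Grow a graph to n nodes; each new node links to m distinct existing nodes,
    chosen with probability proportional to their current degree."""
    if not (1 <= m <= m0 and 2 <= m0 <= n):
        raise ValueError("need 1 <= m <= m0, 2 <= m0 <= n")
    rng = random.Random(seed)
    # Assumed seed network: a path on m0 nodes, so every initial degree is >= 1.
    edges = [(i, i + 1) for i in range(m0 - 1)]
    # Each node appears in `pool` once per incident edge, so a uniform draw
    # from `pool` picks node v with probability d(v) / (2|E|).
    pool = [v for e in edges for v in e]
    for new in range(m0, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(pool))
        for t in targets:
            edges.append((new, t))
            pool.extend((new, t))
    return edges

edges = barabasi_albert(5000, m=2, m0=3, seed=7)
deg = Counter(v for e in edges for v in e)
print(len(edges))  # (m0 - 1) + m * (n - m0) = 2 + 2 * 4997 = 9996
print(max(deg.values()), 2 * len(edges) / len(deg))  # largest hub vs. mean degree
```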
Kronecker product of matrices

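Given two matrices $U \in \mathbb{R}^{n \times m}$ and $V \in \mathbb{R}^{p \times q}$, the Kronecker product matrix $S \in \mathbb{R}^{np \times mq}$ is given by
$S = U \otimes V = \begin{pmatrix} u_{11}V & u_{12}V & \cdots & u_{1m}V \\ u_{21}V & u_{22}V & \cdots & u_{2m}V \\ \vdots & \vdots & \ddots & \vdots \\ u_{n1}V & u_{n2}V & \cdots & u_{nm}V \end{pmatrix}$.
$A \otimes (aB + C) = (aA) \otimes B + A \otimes C$, but in general $A \otimes B \ne B \otimes A$.
$(A \otimes B)(C \otimes D) = (AC) \otimes (BD)$ whenever the products $AC$ and $BD$ are defined.
$(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$ for invertible $A$ and $B$, and $(A \otimes B)^T = A^T \otimes B^T$.
$|A \otimes B| = |A|^m |B|^n$ and $\operatorname{Tr}(A \otimes B) = \operatorname{Tr}(A)\operatorname{Tr}(B)$ if $A \in \mathbb{R}^{n \times n}$ and $B \in \mathbb{R}^{m \times m}$.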
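A small sketch of the Kronecker product on list-of-rows matrices (helper names hypothetical), checking several of the identities above with exact integer arithmetic on arbitrary example matrices: non-commutativity, the mixed-product rule, the transpose rule, and the trace rule.

```python
def kron(U, V):
    """Kronecker product of matrices stored as lists of rows: block (i, j) is u_ij * V."""
    return [[u * v for u in urow for v in vrow] for urow in U for vrow in V]

def matmul(X, Y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 1]]
C = [[2, 0], [1, 3]]
D = [[1, -1], [0, 2]]
print(kron(A, B) == kron(B, A))                                            # False
print(matmul(kron(A, B), kron(C, D)) == kron(matmul(A, C), matmul(B, D)))  # True
print(transpose(kron(A, B)) == kron(transpose(A), transpose(B)))           # True
print(trace(kron(A, B)) == trace(A) * trace(B))                            # True
```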
Kronecker model cont.

Model
Instead of matching a single property of a network, the Kronecker model can match multiple properties at once, which makes it attractive for fitting real networks.
Deterministic Kronecker model: it begins with an initiator graph $G_1$ with $N_1$ nodes and produces successively larger graphs $G_2, \cdots, G_n$ such that the $k$-th graph $G_k$ has $N_k = N_1^k$ nodes.
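A sketch of the deterministic Kronecker model, assuming (as is common in the Kronecker graph literature) that the adjacency matrix of $G_k$ is the $k$-th Kronecker power of the initiator's adjacency matrix, $G_k = G_{k-1} \otimes G_1$. The initiator below, a 3-node chain with a self-loop on every node, is a hypothetical example. The printout shows $N_k = N_1^k$ nodes; because nonzero counts multiply under the Kronecker product, the number of nonzero entries grows as $7^k$ for this initiator.

```python
def kron(U, V):
    """Kronecker product of matrices stored as lists of rows."""
    return [[u * v for u in urow for v in vrow] for urow in U for vrow in V]

def kronecker_graph(initiator, k):
    """Adjacency matrix of G_k = G_1 ⊗ G_1 ⊗ ... ⊗ G_1 (k factors), for k >= 1."""
    A = initiator
    for _ in range(k - 1):
        A = kron(A, initiator)
    return A

# Hypothetical initiator: a 3-node chain with a self-loop on every node.
K1 = [[1, 1, 0],
      [1, 1, 1],
      [0, 1, 1]]
for k in range(1, 4):
    Ak = kronecker_graph(K1, k)
    print(k, len(Ak), sum(map(sum, Ak)))  # prints: 1 3 7 / 2 9 49 / 3 27 343
```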
Sources for generator

Generators
Erdös Renyi: http://ladamic.com/netlearn/NetLogo501/ErdosRenyiDegDist.html
BRITE: http://www.cs.bu.edu/brite/
INET: http://topology.eecs.umich.edu/inet
Kronecker:
[email protected]
http://www.cc.gatech.edu/dimacs10/archive/kronecker.shtml

Take-home messages

Graph
Motivations
Patterns
Graph aspects
Graph types
Properties
Graph modeling
Network generation
Erdös Renyi model
Barabasi Albert model
Kronecker model

No ratings yet
Vocal Technique For Singers Workshop
55 pages
Redis源代码分析
No ratings yet
Redis源代码分析
32 pages
64格导游大师 - 国际象棋实战教科书
No ratings yet
64格导游大师 - 国际象棋实战教科书
316 pages
Sliding Window Topk
No ratings yet
Sliding Window Topk
30 pages
Social Network Data Mining Guide
No ratings yet
Social Network Data Mining Guide
28 pages
Math Project: Karnataka Law Society's Gogte Institute of Technology, Belgaum
No ratings yet
Math Project: Karnataka Law Society's Gogte Institute of Technology, Belgaum
13 pages
Life Hacks Sample
No ratings yet
Life Hacks Sample
30 pages
MOD 3 Homophily, Influence
No ratings yet
MOD 3 Homophily, Influence
39 pages
Copyright Essay
No ratings yet
Copyright Essay
3 pages
Finding Top-K Shortest Simple Paths With Diversity
No ratings yet
Finding Top-K Shortest Simple Paths With Diversity
26 pages
SNA-UNIT-1 Full
No ratings yet
SNA-UNIT-1 Full
84 pages
TSClu Win
No ratings yet
TSClu Win
24 pages
GT-100 System Update Procedure
No ratings yet
GT-100 System Update Procedure
4 pages
Play The Barry Attack: Andrew Martin
No ratings yet
Play The Barry Attack: Andrew Martin
27 pages
9152
No ratings yet
9152
25 pages
Bogoljubov Vol1 Sample
No ratings yet
Bogoljubov Vol1 Sample
25 pages
Unit - III
No ratings yet
Unit - III
34 pages
Data Science Ethics - Lecture 3
No ratings yet
Data Science Ethics - Lecture 3
79 pages
Supply Chain Management: No Reproduction or Distribution Without The Prior Written Consent of Mcgraw-Hill Education
No ratings yet
Supply Chain Management: No Reproduction or Distribution Without The Prior Written Consent of Mcgraw-Hill Education
40 pages
Key Elementsof Chess Strategy Excerpt
No ratings yet
Key Elementsof Chess Strategy Excerpt
17 pages
Waigani City Centre - Dev Control Policy
No ratings yet
Waigani City Centre - Dev Control Policy
36 pages
Social Network Visualization Guide
No ratings yet
Social Network Visualization Guide
20 pages
Human Aspects of Software Engineering
100% (1)
Human Aspects of Software Engineering
14 pages
Vldb2008 Ps 4up
No ratings yet
Vldb2008 Ps 4up
16 pages
Tutorial 26 Sarma Non-Vertical Slices
No ratings yet
Tutorial 26 Sarma Non-Vertical Slices
6 pages
Microsoft Impact Summary 2022
No ratings yet
Microsoft Impact Summary 2022
26 pages
Winawer Sample
No ratings yet
Winawer Sample
17 pages
Data Science Ethics - Lecture 10 - Ethical Deployment
No ratings yet
Data Science Ethics - Lecture 10 - Ethical Deployment
60 pages
AI In: Mechanical Engineering
No ratings yet
AI In: Mechanical Engineering
12 pages
A Algorithm
No ratings yet
A Algorithm
22 pages
软件逆向工程原理与实践
No ratings yet
软件逆向工程原理与实践
162 pages
Examples and Videos of Markov Decision Processes (MDPS) and Reinforcement Learning
No ratings yet
Examples and Videos of Markov Decision Processes (MDPS) and Reinforcement Learning
36 pages
Bogoljubov Volume 2 Sample
No ratings yet
Bogoljubov Volume 2 Sample
15 pages
Geometric Deep Learning for Fake News
No ratings yet
Geometric Deep Learning for Fake News
15 pages
Social Network Analysis Unit-3
No ratings yet
Social Network Analysis Unit-3
28 pages
Network+ Lab Manual Answers
No ratings yet
Network+ Lab Manual Answers
39 pages
BCS-052 Solved Assignment 2024-25 @BCAcrackers
100% (1)
BCS-052 Solved Assignment 2024-25 @BCAcrackers
10 pages
The Role of Media in Public Relations Crisis Commu
No ratings yet
The Role of Media in Public Relations Crisis Commu
10 pages
XMPP Protocol Overview & Features
No ratings yet
XMPP Protocol Overview & Features
17 pages
2024-LS-G8-NMP Mathematics Q1 W3 D2
No ratings yet
2024-LS-G8-NMP Mathematics Q1 W3 D2
13 pages
Construction Delay Analysis Guide
No ratings yet
Construction Delay Analysis Guide
11 pages
SPOJ Solutions for Coders
No ratings yet
SPOJ Solutions for Coders
4 pages
Gerson The Relational Unconscious Psychoanalytic Quarterly
No ratings yet
Gerson The Relational Unconscious Psychoanalytic Quarterly
19 pages
Knowledge Representation
No ratings yet
Knowledge Representation
47 pages
Link Prediction: Leonid E. Zhukov
No ratings yet
Link Prediction: Leonid E. Zhukov
24 pages
Dimensionless Numbers
No ratings yet
Dimensionless Numbers
13 pages
Span of Control
No ratings yet
Span of Control
3 pages
Unit 5 Ids
No ratings yet
Unit 5 Ids
19 pages
8 图数据库系统
No ratings yet
8 图数据库系统
72 pages
Food Chains
No ratings yet
Food Chains
5 pages
The Benoni For The Tournament Player (John Nunn) (Z-Library)
No ratings yet
The Benoni For The Tournament Player (John Nunn) (Z-Library)
164 pages
The Web
No ratings yet
The Web
3 pages
COS4840 Oncology Assignment1
No ratings yet
COS4840 Oncology Assignment1
5 pages
Sna It Unit5
No ratings yet
Sna It Unit5
20 pages
Lecture 01 Properties of Sea Water PDF
No ratings yet
Lecture 01 Properties of Sea Water PDF
6 pages
Monday - Mercury, Venus and The Great Attractor - Astrology and Horoscopes by Eric Francis 261215
No ratings yet
Monday - Mercury, Venus and The Great Attractor - Astrology and Horoscopes by Eric Francis 261215
4 pages
Personal Data Sheet: Single Married Annulled Widowed Separated Others, Specify
No ratings yet
Personal Data Sheet: Single Married Annulled Widowed Separated Others, Specify
4 pages
Guru Harkrishan Public School, India Gate Holiday Homework (2019 - 20) Class 8 English
No ratings yet
Guru Harkrishan Public School, India Gate Holiday Homework (2019 - 20) Class 8 English
5 pages
CP5074 - SNA Unit III Notes
No ratings yet
CP5074 - SNA Unit III Notes
27 pages
Semantic Web SN
No ratings yet
Semantic Web SN
22 pages
AbstractAlgebra PID - Ufd
No ratings yet
AbstractAlgebra PID - Ufd
3 pages
Sna Unit V
No ratings yet
Sna Unit V
9 pages
Pharma Operations Expert CV
No ratings yet
Pharma Operations Expert CV
2 pages
Vaişeshika's Prāgabhāva in Politics
0% (1)
Vaişeshika's Prāgabhāva in Politics
5 pages
Social Network Analytics Course Overview
No ratings yet
Social Network Analytics Course Overview
35 pages
Semantic Web Applications
No ratings yet
Semantic Web Applications
24 pages
Bda - 2 Unit
No ratings yet
Bda - 2 Unit
12 pages
Social Media Analytics and Data Analysis (UNIT 3)
No ratings yet
Social Media Analytics and Data Analysis (UNIT 3)
22 pages
Earth Surface Changes Explained
No ratings yet
Earth Surface Changes Explained
31 pages
ZigBee Short Range Communication
No ratings yet
ZigBee Short Range Communication
56 pages
Bakugan's Global Journey
No ratings yet
Bakugan's Global Journey
2 pages
Social Networks: Ankur Chawla Deepak Saini Karan Dhamija Niharika Sachdeva Sajal Marwah
No ratings yet
Social Networks: Ankur Chawla Deepak Saini Karan Dhamija Niharika Sachdeva Sajal Marwah
36 pages
Unit 6 Mining Social Network Graph
No ratings yet
Unit 6 Mining Social Network Graph
9 pages
Wipro 2
No ratings yet
Wipro 2
8 pages
MA Anthropology Curriculum
No ratings yet
MA Anthropology Curriculum
1 page
Software Engineering Courses
No ratings yet
Software Engineering Courses
9 pages
2.routing Shortest Path Routing
No ratings yet
2.routing Shortest Path Routing
26 pages
Relative Insertion of Business To Customer URL by Discover Web Information Schemas
No ratings yet
Relative Insertion of Business To Customer URL by Discover Web Information Schemas
4 pages
An Introduction To Social Network Analysis
100% (8)
An Introduction To Social Network Analysis
38 pages
What Is PDP?: Personal Development Planning The Engineering Subject Centre 1
0% (1)
What Is PDP?: Personal Development Planning The Engineering Subject Centre 1
5 pages
Big Data Course for MBA Students
No ratings yet
Big Data Course for MBA Students
27 pages