IdeCozmanRamos ECAI04

Generating Random Bayesian Networks with Constraints on Induced Width

Jaime S. Ide and Fabio G. Cozman and Fabio T. Ramos 1
Abstract. We present algorithms for the generation of uniformly distributed Bayesian networks with constraints on induced width. The algorithms use ergodic Markov chains to generate samples. The introduction of constraints on induced width leads to realistic networks but requires new techniques. A tool that generates random networks is presented and applications are discussed.

1 INTRODUCTION

It is often the case that theoretical questions involving artificial intelligence techniques are hard to answer exactly. Many such questions appear in the theory of Bayesian networks; for example, how do quasi-random sampling algorithms compare to pseudo-random sampling? Significant insight into such questions could be obtained by analyzing large samples of Bayesian networks. However, it may be difficult to collect hundreds of "real" Bayesian networks for an experiment, or it may be the case that an experiment must be conducted for a specific type of Bayesian network for which few "real" examples are available. One must then randomly generate Bayesian networks that are somehow close to "real" networks. In fact, many researchers have used random processes to generate networks in the past, but without guarantees that every allowed graph is produced with the same uniform probability (for example, [14, 15]).

We would like to have a method that generates Bayesian networks uniformly; that is, we would like to guarantee that averages taken with generated networks produce unbiased estimates. We would also like to have generation methods that are flexible, in the sense that constraints on generated networks can be added with relative ease. For example, it should be possible to add a constraint on the maximum number of parents for nodes, the average number of children, or the maximum number of loops. Ad hoc methods are usually concocted for a particular set of constraints, and it is hard to imagine ways to add constraints to them.

Finally, we would like to generate "realistic" networks, however hard it may be to define what a "real" Bayesian network is. A reasonable strategy is to look for properties that are commonly used to characterize Bayesian networks, and to allow some control over them. This is the strategy followed by Ide and Cozman [5]: they allow control over the degree of a node, thus allowing some control over the "density" of the connections in the generated Bayesian networks. We have found that such a strategy is reasonable but not perfect. Restrictions solely on node degree and number of edges lead to "overly random" edges; real networks often have their variables distributed in groups, with few edges between groups.2 Another strategy, suggested by T. Kocka (personal communication), would be to produce Bayesian networks with a large number of equivalent graphs, as this is a property observed in real networks. However, we would like to use properties with clear intuitive meaning, so that users of our algorithms would quickly grasp the properties of generated networks.

A quantity that characterizes the algorithmic complexity of Bayesian networks, and is easy to explain and to understand, is the induced width. Indirectly, the induced width captures how dense a network is. Besides, it makes sense to control induced width, as we are usually interested in comparing algorithms or parameterizing results with respect to the complexity of the underlying network.3 Unfortunately, the generation of random graphs with constraints on induced width is significantly more involved than the generation of graphs with constraints on node degree and number of edges. In this paper we report on new algorithms that accomplish generation of graphs with simultaneous constraints on all these quantities: induced width, node degree, and number of edges.

Following the work of Ide and Cozman [5], we divide the generation of random Bayesian networks into two steps. First we generate a random directed acyclic graph that satisfies constraints on induced width, node degree, and number of edges; then we generate probability distributions for the graph. To generate the random graph, we construct ergodic Markov chains with appropriate stationary distributions, so that successive sampling from the chains leads to the generation of properly distributed networks. The necessary theory and algorithms are presented in Sections 2 and 3.

The methods presented in this paper focus on Bayesian networks, but they convey a general method for generation of testing examples in artificial intelligence. The idea is to generate uniformly distributed examples using Markov chains. This strategy allows one to easily add and modify constraints on the generated examples, provided that a few steps are taken. The theory in Section 3 can serve as a guide for exactly what steps must be taken to guarantee appropriate results.

A freely distributed program for Bayesian network generation is presented in Section 4. In Section 4 we also discuss applications of random networks.

1 Escola Politécnica, Univ. de São Paulo, São Paulo, Brazil. Email: [email protected]
2 Tomas Kocka brought this fact to our attention.
3 Carlos Brito suggested this strategy.

2 BASIC CONCEPTS

This section summarizes material from [5] and [3].

A directed graph is composed of a set of nodes and a set of edges. An edge (u, v) goes from a node u (the parent) to a node v (the child). A path is a sequence of nodes such that each pair of consecutive nodes is adjacent. A path is a cycle if it contains more than two nodes and the first and last nodes are the same. A cycle is directed if we can reach the same nodes while following arcs that are in the
same direction. A directed graph is acyclic (a DAG) if it contains no directed cycles. A graph is connected if there exists a path between every pair of nodes. A graph is singly-connected, also called a polytree, if there exists exactly one path between every pair of nodes; otherwise, the graph is multiply-connected (or multi-connected for short). An extreme sub-graph of a polytree is a sub-graph that is connected to the remainder of the polytree by a single path. In an undirected graph, the direction of the edges is ignored. An ordered graph is a pair containing an undirected graph and an ordering of nodes. The width of a node in an ordered graph is the number of its neighbors that precede it in the ordering. The width of an ordering is the maximum width over all nodes. The induced width of an ordered graph is the width of the ordered graph obtained as follows: nodes are processed from last to first; when node X is processed, all its preceding neighbors are connected (call these connections induced connections, and the resulting graph the induced graph). An example is presented in Figure 1. The induced width of a graph is the minimal induced width over any ordering; the computation of induced width is an NP-hard problem [3], and computations are usually based on heuristics [6].

A Bayesian network represents a joint probability density over a set of variables X [10]. The density is specified through a directed acyclic graph; every node in the graph is associated with a variable Xi in X, and with a conditional probability density p(Xi | pa(Xi)), where pa(Xi) denotes the parents of Xi in the graph. A Bayesian network represents a unique joint probability density [10]: p(X) = ∏_i p(Xi | pa(Xi)) (a consequence of a Markov condition). The moral graph of a Bayesian network is obtained by connecting the parents of any variable and ignoring the direction of edges. The induced width of a Bayesian network is the induced width of its moral graph. An inference is a computation of a posterior probability density for a query variable given observed variables; the complexity of inferences is directly related to the induced width of the underlying Bayesian network [3].

We use Markov chains to generate random graphs, following [8]. Consider a Markov chain {X_t, t ≥ 0} over a finite domain S, and take P = (p_ij) to be an M × M matrix representing transition probabilities, where M is the number of states and p_ij = Pr(X_{t+1} = j | X_t = i), for all t [11, 13]. The s-step transition probabilities are given by P^s = (p_ij^(s)), with p_ij^(s) = Pr(X_{t+s} = j | X_t = i), independent of t. A Markov chain is irreducible if for all i, j there exists s that satisfies p_ij^(s) > 0; a Markov chain is irreducible if and only if all pairs of states intercommunicate. A Markov chain is positive recurrent if every state i ∈ S can be returned to in a finite number of steps; it follows that a finite irreducible chain is positive recurrent. A Markov chain is aperiodic if the greatest common divisor of all those s for which p_ii^(s) > 0 is equal to one (that is, GCD({s : p_ii^(s) > 0}) = 1). Aperiodicity is ensured if p_ii > 0 (p_ii is a self-loop probability). A Markov chain is ergodic if there exists a vector π (the stationary distribution) satisfying lim_{s→∞} p_ij^(s) = π_j, for all i and j; a finite aperiodic, irreducible and positive recurrent chain is ergodic. A transition matrix is called doubly stochastic if its rows and columns sum to one (that is, if ∑_{j=1}^{M} p_ij = 1 and ∑_{i=1}^{M} p_ij = 1). A Markov chain with such a transition matrix has a uniform stationary distribution [11].

3 GENERATING RANDOM DAGS

In this section we show how to generate random DAGs with constraints on induced width, node degree and number of edges. After such a random DAG is generated, it is easy to construct a complete Bayesian network by randomly generating associated probability distributions: if all variables in the Bayesian network are categorical, probability distributions are produced by sampling Dirichlet distributions. More general methods can be contemplated (for example, it may be interesting to generate logical nodes together with probabilistic nodes) and are left for future work.

To generate random DAGs with specific constraints, we construct an ergodic Markov chain with uniform limiting distribution, such that every state of the chain is a DAG satisfying the constraints. By running the chain for many iterations, eventually we obtain a satisfactory DAG.

Algorithm PMMixed produces an ergodic Markov chain with the required properties (Figure 2). The algorithm is significantly more complex than the algorithms presented by Ide and Cozman [5]. The added complexity comes from the constraints on induced width. Such a price is worth paying, as the induced width is a property that characterizes a Bayesian network much more accurately than node degree.

The algorithm works as follows. We create a set of n nodes (from 0 to n−1) and a simple network to start. The loop between lines 03 and 08 constructs the next state (next DAG) from the current state. Lines 05 and 08 verify whether the induced width of the current DAG satisfies the maximum value allowed; constraints on maximum node degree and maximum number of edges must also be checked there. If the current DAG is a polytree, the next DAG is constructed in lines 04 and 05; if the current DAG is multi-connected, the next DAG is constructed in lines 07 and 08. Depending on the current graph, different operations are performed (the procedures AorR and AR correspond to the valid operations). Note that the particular procedure to be performed and the acceptance (or not) of the resulting DAG are probabilistic, parameterized by p.

Algorithm PMMixed is essentially a mixture of procedures AorR and AR. These procedures are used by Ide and Cozman [5] to produce respectively multi-connected graphs and polytrees with constraints on node degree. We need both to guarantee irreducibility of Markov chains when constraints on induced width are present; the procedure AR creates a needed "path" in the space of polytrees that is used in Theorem 3. The mixture of procedures has two other benefits: first, it creates more complex transitions, hopefully increasing the convergence of the chain; second, it eliminates a restriction on node degree that was needed by Ide and Cozman [5].

The PMMixed algorithm can be understood as a sequence of probabilistic transitions that follow the scheme in Figure 3.
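The structure of this generation loop can be sketched in Python. The sketch below is not the paper's implementation: it keeps only a Procedure-AorR-style move with constraints on connectivity, acyclicity and number of edges, and omits Procedure AR and the induced-width test, so it illustrates the shape of the Markov chain rather than the full PMMixed algorithm; all function names are ours.

```python
import random

def is_connected(n, arcs):
    """Connectivity of the underlying undirected graph."""
    adj = {v: set() for v in range(n)}
    for u, v in arcs:
        adj[u].add(v); adj[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in seen:
                seen.add(w); stack.append(w)
    return len(seen) == n

def is_acyclic(n, arcs):
    """Directed acyclicity via Kahn's topological sort."""
    indeg = {v: 0 for v in range(n)}
    out = {v: [] for v in range(n)}
    for u, v in arcs:
        out[u].append(v); indeg[v] += 1
    queue = [v for v in range(n) if indeg[v] == 0]
    removed = 0
    while queue:
        u = queue.pop()
        removed += 1
        for w in out[u]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    return removed == n

def add_or_remove_step(n, arcs, max_edges):
    """One 'add or remove an arc' move; rejected moves keep the state
    (the resulting self-loop probabilities also ensure aperiodicity)."""
    i, j = random.sample(range(n), 2)
    new = set(arcs)
    if (i, j) in new:
        new.discard((i, j))
        if not is_connected(n, new):
            return arcs            # removal would disconnect: keep state
    else:
        new.add((i, j))
        if len(new) > max_edges or not is_acyclic(n, new):
            return arcs            # violates constraints: keep state
    return frozenset(new)

def generate_dag(n, iterations, max_edges, seed=0):
    random.seed(seed)
    # initial state: a chain 0 -> 1 -> ... -> n-1 (every node has one
    # parent, except the first node)
    state = frozenset((k, k + 1) for k in range(n - 1))
    for _ in range(iterations):
        state = add_or_remove_step(n, state, max_edges)
    return state

arcs = generate_dag(n=6, iterations=2000, max_edges=8)
print(len(arcs), is_acyclic(6, arcs), is_connected(6, arcs))
```

Every state visited by this loop is a connected DAG satisfying the edge constraint; the full algorithm adds the AR move and the (heuristic) induced-width check at the acceptance step.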
Figure 1. (a) Network, (b) moral graph, (c) induced graph for ordering F, L, D, H, B, and (d) induced graph for ordering L, H, D, B, F. Dashed lines represent induced connections.

Algorithm PMMixed: Generating DAGs with induced width control
Input: Number of nodes (n), number of iterations (N), maximum induced width, and possibly constraints on node degree and number of edges.
Output: A connected DAG with n nodes.
01. Create a network with n nodes, where all nodes have just one parent, except the first node, which does not have any parent;
02. Repeat N times:
03.   If current graph is a polytree:
04.     With probability p, call Procedure AorR; with probability (1−p), call Procedure AR.
05.     If the resulting graph satisfies imposed constraints, accept the graph; otherwise, keep previous graph;
06.   else (graph is multi-connected):
07.     Call Procedure AorR.
08.     If the resulting graph is a polytree and satisfies imposed constraints, accept with probability p; else accept if it satisfies imposed constraints; otherwise keep previous graph.
09. Return current graph after N iterations.

Procedure AR: Add and Remove
01. Generate uniformly a pair of distinct nodes i, j;
02. If the arc (i, j) exists in the current graph, keep the same state; else
03. Invert the arc with probability 1/2 to (j, i), and then
04. Find the predecessor node k in the path between i and j, remove the arc between k and j, and add an arc (i, j) or arc (j, i) depending on the result of line 03.

Procedure AorR: Add or Remove
01. Generate uniformly a pair of distinct nodes i, j;
02. If the arc (i, j) exists in the current graph, delete the arc, provided that the underlying graph remains connected; else
03. Add the arc if the underlying graph remains acyclic; otherwise keep same state.

Figure 2. Algorithm for generating DAGs, mixing operations AR and AorR.

Figure 3. Structure of PMMixed.

We now establish ergodicity of Algorithm PMMixed.

Theorem 1 The Markov chain generated by Algorithm PMMixed is aperiodic.

Theorem 2 The transition matrix defined by the Algorithm PMMixed is doubly stochastic.

Proof. If we have symmetric transition probabilities between two neighbor states, the rows and columns of the transition matrix sum to one, because the self-loop probabilities are complementary to all other probabilities. Procedure AorR is clearly symmetric; procedure AR is also symmetric [5]. We just have to check that transitions between polytrees and multi-connected graphs are symmetric; this is true because transitions from polytree to multi-connected are accepted with probability p, and multi-connected to polytree transitions are also accepted with the same probability. QED

We need the following lemma to prove Theorem 3.

Lemma 1 After removal of an arc from a multi-connected DAG, its induced width does not increase.

Proof. When we remove an arc, the moral graph stays the same or contains fewer arcs; by just keeping the same ordering, the induced width cannot increase. QED

Theorem 3 The Markov chain generated by Algorithm PMMixed is irreducible.

Proof. Suppose that we have a multi-connected DAG with n nodes; if we prove that from this graph we can reach a simple sorted tree (Figure 4(c)), the opposite transformation is also possible, because of the symmetry of our transition matrix, and therefore we could reach any state from any other (during these transitions, graphs must remain acyclic, connected, and must satisfy imposed constraints). So, we start by finding a loop cutset and removing enough arcs to obtain a polytree from the multi-connected DAG [10]. The induced width does not increase during removal operations, by Lemma 1. From a polytree we can move to a simple polytree (Figure 4(b)) in a recursive way. For all extreme sub-graphs of our polytree, for each pair of extreme sub-graphs (call them branches), it is possible to "cut" a branch and add it to the other branch, by the procedure AR, without ever increasing the induced width. Doing this we get a unique branch. If we have more than two branches connected to a node, we repeat this process by pairs; we do this recursively until we get a simple polytree. Now that we have a simple polytree, we get a simple tree (Figure 4(a)) just by inverting arcs to the same direction, without ever getting an induced width greater than two. The last step is to get a simple sorted tree (Figure 4(c)) from the simple tree. The idea here is illustrated in Figure 5. We want to sort labelled nodes from 1 to n. Start by removing arc (n, k) and adding arc (l, i) (steps 1 to 2). Remove arc (j, n) and add arc (n−1, n) (steps 2 and 3). Note that in this configuration, the induced width is one. Now, remove arc (n−1, o) and add arc (j, k) (steps 3 and 4). Repeat steps 2 to 4 for all nodes. So, from any multi-connected DAG it is possible to reach a simple sorted tree. The opposite path is clearly analogous, so we can go from any DAG to any other DAG, and the chain is irreducible. Note that constraints on node degree and maximum number of edges can be dealt with within the same processes. QED

Figure 4. Simple trees used in our proofs: (a) Simple tree, (b) Simple polytree, (c) Simple sorted tree.

Figure 5. Basic moves to obtain a simple sorted tree.

Figure 7. Structure of PMMixed with procedure J.

By the previous theorems we obtain:

Theorem 4 The Markov chain generated by Algorithm PMMixed is ergodic and its unique stationary distribution is the uniform distribution.
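The role of Theorem 2 in this conclusion can be illustrated numerically: a doubly stochastic, irreducible, aperiodic chain converges to the uniform distribution from any starting point. The example below is ours, not from the paper.

```python
# A symmetric (hence doubly stochastic) 3x3 transition matrix with
# positive self-loops: irreducible and aperiodic, so the chain is ergodic.
P = [
    [0.5, 0.3, 0.2],
    [0.3, 0.4, 0.3],
    [0.2, 0.3, 0.5],
]

def step(dist, P):
    """One step of the chain: dist' = dist * P."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

dist = [1.0, 0.0, 0.0]          # start concentrated on state 0
for _ in range(200):
    dist = step(dist, P)

print(dist)  # converges to the uniform distribution [1/3, 1/3, 1/3]
```

This is exactly why PMMixed's moves are designed to be symmetric: symmetry gives a doubly stochastic matrix, and ergodicity then makes the uniform distribution over constrained DAGs the unique limit.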
The algorithm PMMixed can be implemented quite efficiently, except for the computation of induced width: finding this value is an NP-hard problem with no easy solution. There are heuristics for computing induced width, some of which have been found to be of high quality [6]. Consequently, we must change our goal: instead of adopting constraints on exact induced width, we assume that the user specifies a maximum width given a particular heuristic. We call this width the heuristic width. Our goal then is to produce random DAGs on the space of DAGs that have constraints on heuristic width.

Apparently we could still use the PMMixed algorithm here, with the obvious change that lines 05 and 08 must check heuristic width instead of induced width. However, such a simple modification is not sufficient: because heuristic width is usually computed with local operations, we cannot predict the effect of adding and removing edges on it. That is, we cannot adapt Lemma 1 to heuristic width in general, and then we cannot predict whether a "path" between DAGs can in fact be followed by the chain without violating heuristic width constraints. We must create a mechanism that allows the chain to transit between arbitrary DAGs regardless of the adopted heuristic. Our solution is to add a new type of operation, specified by Procedure J (Figure 6); this procedure allows "jumps" from arbitrary multi-connected DAGs to polytrees. We also assume that any adopted heuristic is such that, if the DAG is a polytree, then the heuristic width is equal to the induced width. Even if a given heuristic does not satisfy this property, the heuristic can be easily modified to do so: test whether the DAG is a polytree and, if so, return the induced width of the polytree (the maximum number of parents amongst all nodes).

Procedure J must be called with probability (1 − p − q) both after line 04 and after line 07 in the algorithm PMMixed. The complete algorithm can be understood as a sequence of probabilistic transitions that follow the scheme in Figure 7. All previous theorems can be easily extended to this new situation; the only one that must be substantially modified is Theorem 3. Transitions from polytree to multi-connected DAGs are performed with probability (1 − q); transitions from multi-connected DAGs back to polytrees are accepted with probability p/(p + q̄), where q̄ = 1 − q (see Figure 7). The values of p and q control the mixing rate of the chain; we have observed remarkable insensitivity to these values.

4 THE BNGenerator AND APPLICATIONS

The algorithm PMMixed (with the modifications indicated in Figure 7) can be efficiently implemented with existing ordering heuristics, and the resulting DAGs are quite similar to existing Bayesian networks. We have implemented the algorithm using an O(n log n) implementation of the minimum weight heuristic. The result is the BNGenerator package, freely distributed under the GNU license (at http://www.pmr.poli.usp.br/ltd/Software/BNGenerator). The software uses the facilities in the JavaBayes system, including the efficient implementation of ordering heuristics (http://www.cs.cmu.edu/˜javabayes). The BNGenerator accepts specification of the number of nodes, maximum node degree, maximum number of edges, and maximum heuristic width (for the minimum weight heuristic, but other heuristics can be added). The software also performs uniformity tests using a χ² test. Such tests can be performed only for a small number of nodes (as the number of possible DAGs grows extremely quickly [12]), but they allowed us to test the

Procedure J: Sequence of AorR
01. If the current graph is a polytree:
02.   Generate uniformly a pair of distinct nodes i, j;
03.   If arc (i, j) does not exist in current graph, add the arc; otherwise, keep the same state.
04. If the current graph is multi-connected:
05.   Generate uniformly a pair of distinct nodes i, j.
06.   If arc (i, j) exists in current graph, remove the arc; otherwise, keep the same state.
07. If the new graph satisfies imposed constraints, accept the graph; otherwise, keep previous graph.
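The constraint checks in PMMixed (lines 05 and 08) and in Procedure J (line 07) require a heuristic width. As an illustration only, the sketch below bounds the induced width by greedy elimination on the moral graph, using min-degree as a simple stand-in for the minimum weight heuristic used by BNGenerator; the names and structure are ours, not the package's.

```python
def moral_graph(parents):
    """parents: dict node -> set of parents.  Returns the undirected
    moral graph: original arcs plus 'marriages' between co-parents."""
    adj = {v: set() for v in parents}
    for v, ps in parents.items():
        for p in ps:
            adj[v].add(p); adj[p].add(v)
        for a in ps:                       # connect parents of v pairwise
            for b in ps:
                if a != b:
                    adj[a].add(b)
    return adj

def heuristic_width(parents):
    """Greedy min-degree elimination; the width of the resulting
    ordering is an upper bound on the induced width."""
    adj = {v: set(nb) for v, nb in moral_graph(parents).items()}
    width = 0
    while adj:
        v = min(adj, key=lambda u: len(adj[u]))  # smallest current degree
        nbs = adj.pop(v)
        width = max(width, len(nbs))
        for a in nbs:                      # add fill-in edges among v's
            adj[a].discard(v)              # remaining neighbors
            adj[a] |= (nbs - {a})
    return width

# A small polytree (0 -> 2, 1 -> 2, 2 -> 3): for polytrees the heuristic
# width equals the induced width, here the maximum number of parents (2).
parents = {0: set(), 1: set(), 2: {0, 1}, 3: {2}}
print(heuristic_width(parents))  # -> 2
```

Note that this stand-in satisfies the property assumed above for polytrees, since eliminating a polytree's moral graph leaf-first never creates fill-in beyond a node's family.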