Abstract

We compare synchronous dataflow (SDF) and cyclo-static dataflow (CSDF), which are each special cases of a model of computation we call dataflow process networks. In SDF, actors have static firing rules: they consume and produce a fixed number of data tokens in each firing. This model is well suited to multirate signal processing applications and lends itself to efficient, static scheduling, avoiding the run-time scheduling overhead incurred by general implementations of process networks. In CSDF, which is a generalization of SDF, actors have cyclically changing firing rules. In some situations, the added generality of CSDF can unnecessarily complicate scheduling. We show how higher-order functions can be used to transform a CSDF graph into an SDF graph, simplifying the scheduling problem. In other situations, CSDF has a genuine advantage over SDF: simpler precedence constraints. We show how this makes it possible to eliminate unnecessary computations and expose additional parallelism. We use digital sample rate conversion as an example to illustrate these advantages of CSDF.

[…]put stream, a firing function f may consume just one token and produce one output token:

$$ f([x_1, x_2, x_3, \ldots]) = f(x_1) $$

To produce an infinite output stream, the actor must be fired repeatedly. A process formed from repeated firings of a dataflow actor is called a dataflow process [7]. The higher-order function map converts an actor firing function into a process:

$$ \mathrm{map}(f)[x_1, x_2, x_3, \ldots] = [f(x_1), f(x_2), f(x_3), \ldots] $$

A higher-order function takes a function as an argument and returns another function. When the function returned by map(f) is applied to the input stream $[x_1, x_2, x_3, \ldots]$, the result is a stream in which the firing function f is applied pointwise to each element of the input stream. The map function can also be described recursively using the stream-building function cons, which inserts an element at the head of a stream:

$$ \mathrm{map}(f)[x_1, x_2, x_3, \ldots] = \mathrm{cons}(f(x_1), \mathrm{map}(f)[x_2, x_3, \ldots]) $$
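To make the stream semantics concrete, here is a minimal Python sketch, assuming streams are modeled as lazy Python iterators (an illustrative choice, not how any particular dataflow system represents streams). The names `map_process`, `map_recursive`, and `cons` are hypothetical: the first applies the firing function once per input token, and the second follows the recursive cons definition directly.

```python
from itertools import count, islice
from typing import Callable, Iterable, Iterator, TypeVar

A = TypeVar("A")
B = TypeVar("B")

# Assumption of this sketch: a stream is a (possibly infinite) Python iterator.


def map_process(f: Callable[[A], B]) -> Callable[[Iterable[A]], Iterator[B]]:
    """map(f): turn a firing function into a process over a stream."""
    def process(stream: Iterable[A]) -> Iterator[B]:
        for x in stream:          # one firing of f per input token
            yield f(x)
    return process


def cons(head: B, tail: Iterator[B]) -> Iterator[B]:
    """Insert an element at the head of a stream (lazily)."""
    yield head
    yield from tail


def map_recursive(f: Callable[[A], B]) -> Callable[[Iterable[A]], Iterator[B]]:
    """map(f)[x1, x2, ...] = cons(f(x1), map(f)[x2, ...]).

    Each token adds one level of generator nesting, so this mirrors the
    equation for short stream prefixes; it is not an efficient implementation.
    """
    def process(stream: Iterable[A]) -> Iterator[B]:
        it = iter(stream)
        try:
            x1 = next(it)         # head token of the stream
        except StopIteration:
            return                # empty stream gives an empty result
        yield from cons(f(x1), process(it))
    return process


def double(x: int) -> int:
    return 2 * x


# Usage: apply a firing function pointwise to the infinite stream 1, 2, 3, ...
print(list(islice(map_process(double)(count(1)), 5)))    # [2, 4, 6, 8, 10]
print(list(islice(map_recursive(double)(count(1)), 5)))  # [2, 4, 6, 8, 10]
```

Because both versions are lazy, they can be applied to an infinite stream as long as only a finite prefix is requested.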
[Figure 1 (a), (b), (c): the commutator actor C in its SDF, CSDF, and stateless CSDF forms. Diagrams not reproduced.]

Firing rules define the consumption of data from input streams when a dataflow process is constructed with map. For example, the SDF form of the commutator actor, shown in figure 1(a), has one firing rule: $f([x], [y]) = [x, y]$. It consumes a single token from each of two input streams, and produces a two-element sequence on the output. Both input streams must have at least one token available before f can fire.

The CSDF form of the commutator, shown in figure 1(b), has two firing rules: $f_1([x], [\,]) = [x]$ and $f_2([\,], [y]) = [y]$. In the first firing it consumes a single token x from one input stream and copies it to the output. In the second firing it copies a token y from the other input to the output. This firing sequence then repeats cyclically. In general, CSDF actors have one rule for each firing of a cyclically repeated sequence.

An internal state variable could serve as an index to enforce the proper firing sequence of a CSDF actor. If instead we follow a purely functional dataflow model in which actors are not allowed to have internal state, we must modify the firing rules so that the sequence index is shown explicitly as a function argument. The modified actor, shown in figure 1(c), has the firing rules $f([1], [x], [\,]) = ([2], [x])$ and $f([2], [\,], [y]) = ([1], [y])$. Each firing rule is enabled only when the proper value is available on the index stream, and produces the appropriate index value to enable the next firing in the sequence.

The self-loop used to keep track of the sequence index is a form of state feedback. SDF actors are a trivial example of such state-dependent firing rules: there is only one state, […]

$$ h(x) = f(g(x)) \iff h = f \circ g $$

[Figure 2: actors G and F composed into the actor H. Diagram not reproduced.]

[Figure 3: A graph that is not well-ordered (actors A, B, C, D). Diagram not reproduced.]

Dataflow actors can be composed in a similar manner, but it is necessary to define a firing of the new composite actor. Assuming that the actors f and g shown in figure 2 each consume and produce a single token, a natural definition for one firing of the composite actor h would be a firing of g followed by a firing of f. The graph in this figure is "well-ordered" because there is only one topological sort, that is, one natural execution order. The graph in figure 3, however, is not well-ordered because once actor A has fired, both actors B and C are enabled and could fire in any order or even in
parallel.

If we require that tokens be available on all inputs before execution begins, then a composite actor follows the SDF model. This gives the greatest possible flexibility in implementing a composite actor's internal schedule because data is available for any sequential or parallel schedule. However, imposing the SDF model on a composite actor leaves the least flexibility for the rest of the system, which must interact with that actor. All input tokens must be available simultaneously even if the tokens are actually consumed sequentially. This can introduce deadlock, as illustrated in figure 4. A new directed cycle is introduced in figure 4(b) by combining actors A and B, and there are too few tokens initially on the arcs of this cycle for any of the actors to be enabled. Composition can also introduce deadlock in other similar situations [8].

If instead we allow composite actors to follow the CSDF model, we can strike a balance between flexibility for the internal and external schedules. If the graph is well-ordered and there is only one natural execution order for the internal system, then the cyclo-static model describes the behavior of the composite actor completely: tokens are consumed and produced in the same order as in the original graph. Thus no parallelism is lost and deadlock is not introduced, as in figure 4(c).

[Figure 4: Deadlock introduced by imposing the SDF model on a composite dataflow actor. (a) actors A, B, and C; (b) A and B combined into an SDF actor AB; (c) A and B combined into a CSDF actor AB with rates [1,0] and [0,1]. Diagrams not reproduced.]

2 Synchronous dataflow scheduling

An SDF graph can be described by a topology matrix $\Gamma$, where the element $\Gamma_{ij}$ is defined as the number of tokens produced on the ith arc by the jth actor [6]. A negative value indicates that the actor consumes tokens on that arc. There is one row in this matrix for each arc in the graph, with one positive element for the actor that produces tokens and one negative element for the actor that consumes tokens. All the other elements in the row are zero.

[Figure 5: A multirate SDF graph with an exponential number of actor firings in a complete cycle. Actors A, B, C, D form a chain; each arc has production rate M and consumption rate 1. Diagram not reproduced.]

Figure 5 shows an example of a multirate SDF graph. The topology matrix for this graph is:

$$ \Gamma = \begin{bmatrix} M & -1 & 0 & 0 \\ 0 & M & -1 & 0 \\ 0 & 0 & M & -1 \end{bmatrix} $$

For the system to be balanced, a non-trivial positive repetition vector $\vec{r}$ must be found that satisfies the balance equations:

$$ \Gamma \vec{r} = \vec{0} $$

where each element $r_j$ of the repetition vector specifies the number of firings of the jth SDF actor, and $\vec{0}$ is the zero vector. In this example, the minimal integer solution for the balance equations is:

$$ \vec{r} = \begin{bmatrix} 1 & M & M^2 & M^3 \end{bmatrix}^T $$

When each actor is fired the number of times specified by $\vec{r}$, the total number of tokens produced on each arc is equal to the total number of tokens consumed. We define this to be a complete cycle. In a complete cycle, a balanced system returns to its initial state with the same number of tokens on each arc. Thus the total memory required for the buffers associated with the arcs is bounded. If the balance equations have a non-trivial solution and a complete cycle can be executed (i.e. there is no deadlock), then this firing sequence can be repeated infinitely in bounded memory.

[Figure 6: A CSDF system that becomes deadlocked when transformed to SDF. A distributor D and a commutator C, with rates [1], [1,0], and [0,1]. Diagram not reproduced.]

3 Cyclo-static dataflow scheduling

Unlike the scalar token consumption and production parameters $\Gamma_{ij}$ for SDF, these parameters are vectors $\vec{\gamma}_{ij}$ for
CSDF [3]. Figure 6 shows an example of a simple CSDF graph using a commutator and a distributor. The distributor is the counterpart to the commutator: it distributes tokens from its input stream to several output streams. The first input token goes to the first output, the second input token goes to the second output, and so on. In this example, the token production parameters are $\vec{\gamma}_{11} = \vec{\gamma}_{12} = [1]$, $\vec{\gamma}_{21} = \vec{\gamma}_{22} = [1, 0]$, and $\vec{\gamma}_{31} = \vec{\gamma}_{32} = [0, 1]$.

Let $p_{ij} = \dim(\vec{\gamma}_{ij})$ be the length or period of the token production pattern for the ith arc connected to the jth actor. If there is no connection, then $p_{ij} = 1$. The jth actor fires in a cycle with period $P_j = \operatorname{lcm}_i(p_{ij})$, the least common multiple of the consumption and production periods for all the arcs connected to that actor. In our example, $p_{11} = p_{12} = 1$ and $p_{21} = p_{22} = p_{31} = p_{32} = 2$. The cycle periods for the commutator and distributor in figure 6 are $P_1 = P_2 = 2$.

If we let $\sigma_{ij}$ be the sum of the elements in $\vec{\gamma}_{ij}$, then the total number of tokens produced on an arc in a cycle of firings is given by:

$$ \Gamma_{ij} = \frac{\sigma_{ij}}{p_{ij}} P_j $$

We can now solve the balance equations as described previously for SDF. For our example in figure 6, with the commutator as actor 1 and the distributor as actor 2, the topology matrix and repetition vector are:

$$ \Gamma = \begin{bmatrix} 2 & -2 \\ -1 & 1 \\ -1 & 1 \end{bmatrix}, \qquad \vec{r} = \begin{bmatrix} 1 & 1 \end{bmatrix}^T $$

In CSDF, however, the repetition vector $\vec{r}$ represents not the number of actor firings, but the number of cycles. The number of firings of the jth actor is $r_j P_j$.

4 Transforming CSDF to SDF

The number of actor firings that must be scheduled can be exponential relative to the number of nodes in an SDF graph [8]. Figure 5 is an example of such a graph. If there are N nodes in such a chain, then a complete cycle requires $1 + M + M^2 + \cdots + M^{N-1}$ actor firings, more than $M^{N-1}$. This exponential explosion in the number of actor firings is only made worse by having a cycle of $P_j$ firings for CSDF actors. Remember that the balance equations determine the number of cycles for a CSDF actor. The number of firings is the repetition count $r_j$ multiplied by the cycle period $P_j$. If all the periods $p_{ij}$ for the arcs leading into a node are relatively prime, then $P_j$ can be quite large (periods of 2, 3, and 5, for instance, give $P_j = 30$). The problem with this explosion is that the parallelism expressed in the dataflow graph can far exceed the parallelism available in the target hardware. It is counterproductive to expose hundreds or thousands of operations that can execute in parallel when there are many fewer processors available.

Incremental compilation heuristics have been developed to make parallel SDF scheduling tractable [8]. We would like to simplify CSDF scheduling and take advantage of all the scheduling techniques that already exist for SDF. To do this, we can transform a cycle of CSDF actor firings into a single SDF actor firing with a higher-order function. Instead of using the map function to form a process from an infinite number of actor firings, we use the loop function to define a new actor g that is equivalent to N consecutive firings of the original actor f:

$$ \mathrm{loop}(f, N)[x_1, x_2, x_3, \ldots] = [f(x_1), f(x_2), f(x_3), \ldots, f(x_N)] $$

By choosing $N = P_j$, we force all firings of a cycle to be scheduled together and transform the CSDF actor into an SDF actor that implements a cycle of firings.

[Figure 7: Deadlock is caused by a directed cycle in the precedence graph. (a) firings D1, D2, C1, C2 of the CSDF graph; (b) the combined firings D and C. Diagrams not reproduced.]

One pitfall of this transformation is that it may introduce deadlock, as in figure 6. The repetition vector for this graph, $\vec{r} = [1\ 1]^T$, specifies that there should be one cycle of each actor, and each actor has two firings in a cycle. The precedence relationships for this CSDF graph are shown in figure 7(a). When the firings of a cycle are combined into a single firing, deadlock is caused by the introduction of a directed cycle in the precedence graph in figure 7(b). We can safely transform CSDF actors that are not in a directed cycle of the dataflow graph. However, when an actor is part of a directed cycle, we might introduce deadlock as just demonstrated. In such cases, we must test the resulting CSDF graph for deadlock using more sophisticated methods [1].

This transformation from CSDF to SDF reduces the number of operations that must be scheduled, and allows us to use the many existing SDF scheduling techniques. But we have seen that transforming a CSDF graph into an SDF graph can introduce deadlock. There are other situations where it is undesirable to perform this transformation, as we shall see in the following examples.
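The loop transformation can be illustrated on the stateless commutator of figure 1(c). The following minimal Python sketch makes several assumptions: input streams are modeled as FIFO queues (`collections.deque`), the index token on the self-loop is passed in and returned explicitly and threaded through the N firings, and `commutator_firing` and `loop` are hypothetical names. With $N = P = 2$, one firing of the looped actor consumes one token from each input and produces [x, y], matching the SDF firing rule of figure 1(a).

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

# A firing takes the index token plus input FIFOs; returns (next index, outputs).
Firing = Callable[..., Tuple[int, List[int]]]


def commutator_firing(index: int, a: Deque[int], b: Deque[int]) -> Tuple[int, List[int]]:
    """One firing of the stateless CSDF commutator of figure 1(c).

    The sequence index arrives as an explicit argument and the next index is
    returned, standing in for the token on the self-loop (an assumption of this sketch).
    """
    if index == 1:
        return 2, [a.popleft()]   # f([1], [x], [ ]) = ([2], [x])
    if index == 2:
        return 1, [b.popleft()]   # f([2], [ ], [y]) = ([1], [y])
    raise ValueError(f"no firing rule for index {index}")


def loop(f: Firing, n: int) -> Firing:
    """loop(f, N): a new actor whose single firing is N consecutive firings of f.

    The index token is threaded from one firing to the next. Unlike an SDF
    scheduler, this sketch does not check that all needed input tokens are
    available before the first of the N firings starts.
    """
    def g(index: int, *inputs: Deque[int]) -> Tuple[int, List[int]]:
        out: List[int] = []
        for _ in range(n):
            index, produced = f(index, *inputs)
            out.extend(produced)
        return index, out
    return g


# Usage: with N = P = 2 the looped actor behaves like the SDF commutator.
sdf_commutator = loop(commutator_firing, 2)
a, b = deque([10, 11]), deque([20, 21])
index, out = sdf_commutator(1, a, b)
print(index, out)   # 1 [10, 20]
index, out = sdf_commutator(index, a, b)
print(index, out)   # 1 [11, 21]
```

After each looped firing the index returns to 1, so the composite actor has no visible state between firings, which is what lets it be scheduled as an ordinary SDF actor.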
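The balance-equation computations of sections 2 and 3 can be checked mechanically. The Python sketch below is a minimal illustration, assuming a connected graph given as a list of arcs (producer, consumer, tokens produced, tokens consumed); the function names are hypothetical. `repetition_vector` finds the smallest positive integer solution of $\Gamma \vec{r} = \vec{0}$ by propagating rational firing ratios along arcs, and `csdf_cycle_rates` applies $P_j = \operatorname{lcm}_i(p_{ij})$ and $\Gamma_{ij} = P_j \sigma_{ij} / p_{ij}$ to turn CSDF patterns into per-cycle rates. The arc directions used for figure 6 are an assumption, and the sketch does not test for deadlock, which is a separate question.

```python
from fractions import Fraction
from functools import reduce
from math import gcd, lcm
from typing import List, Optional, Sequence, Tuple

Arc = Tuple[int, int, int, int]                 # (producer, consumer, produced, consumed)
CsdfArc = Tuple[int, int, List[int], List[int]]  # (producer, consumer, out pattern, in pattern)


def repetition_vector(n_actors: int, arcs: Sequence[Arc]) -> List[int]:
    """Smallest positive integer r with r[src] * produced == r[dst] * consumed on every arc."""
    rates: List[Optional[Fraction]] = [None] * n_actors
    rates[0] = Fraction(1)
    changed = True
    while changed:                              # propagate ratios until nothing new is learned
        changed = False
        for src, dst, produced, consumed in arcs:
            if rates[src] is not None and rates[dst] is None:
                rates[dst] = rates[src] * produced / consumed
                changed = True
            elif rates[dst] is not None and rates[src] is None:
                rates[src] = rates[dst] * consumed / produced
                changed = True
            elif rates[src] is not None and rates[src] * produced != rates[dst] * consumed:
                raise ValueError("balance equations have only the trivial solution")
    if any(r is None for r in rates):
        raise ValueError("graph is not connected")
    scale = reduce(lcm, (r.denominator for r in rates))
    ints = [int(r * scale) for r in rates]
    g = reduce(gcd, ints)
    return [v // g for v in ints]


def csdf_cycle_rates(n_actors: int, arcs: Sequence[CsdfArc]) -> Tuple[List[int], List[Arc]]:
    """Cycle periods P_j and per-cycle token counts Gamma_ij = P_j * sigma_ij / p_ij."""
    lengths: List[List[int]] = [[] for _ in range(n_actors)]
    for src, dst, out_pat, in_pat in arcs:
        lengths[src].append(len(out_pat))
        lengths[dst].append(len(in_pat))
    periods = [reduce(lcm, ls, 1) for ls in lengths]   # P_j (1 if unconnected)
    per_cycle = [
        (src, dst,
         periods[src] * sum(out_pat) // len(out_pat),
         periods[dst] * sum(in_pat) // len(in_pat))
        for src, dst, out_pat, in_pat in arcs
    ]
    return periods, per_cycle


# Figure 5 with M = 2: chain A -> B -> C -> D, each arc producing M and consuming 1.
M = 2
print(repetition_vector(4, [(0, 1, M, 1), (1, 2, M, 1), (2, 3, M, 1)]))  # [1, 2, 4, 8]

# Figure 6: actor 0 = commutator, actor 1 = distributor (arc directions assumed).
periods, per_cycle = csdf_cycle_rates(2, [
    (0, 1, [1], [1]),         # commutator output -> distributor input
    (1, 0, [1, 0], [1, 0]),   # distributor outputs -> commutator inputs
    (1, 0, [0, 1], [0, 1]),
])
r = repetition_vector(2, per_cycle)
print(periods, r, [rj * pj for rj, pj in zip(r, periods)])  # [2, 2] [1, 1] [2, 2]
```

The last line reproduces the distinction made in section 3: the balance equations give one cycle per actor, and each cycle contains $P_j = 2$ firings.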
5 Dead code elimination

[Figure 8: An FIR anti-aliasing filter followed by a 1:3 downsampler. Diagram not reproduced.]
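In the system of figure 8, the 1:3 downsampler keeps only one of every three filter outputs, so two of every three FIR computations produce values that are immediately discarded. A minimal Python sketch shows that computing only the kept outputs gives the same result with one third of the filter work. It assumes integer samples, zero initial conditions, and a downsampler that keeps the first sample of each group of three; the helper names are hypothetical.

```python
from typing import List, Sequence


def fir(h: Sequence[int], x: Sequence[int]) -> List[int]:
    """Direct-form FIR filter: y[n] = sum_k h[k] * x[n - k], with x[n] = 0 for n < 0."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n >= k) for n in range(len(x))]


def downsample(y: Sequence[int], m: int = 3) -> List[int]:
    """1:m downsampler that keeps y[0], y[m], y[2m], ... (the phase is an assumption)."""
    return list(y[::m])


def fir_downsample_pruned(h: Sequence[int], x: Sequence[int], m: int = 3) -> List[int]:
    """Same result, but only the FIR outputs the downsampler keeps are computed."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n >= k)
            for n in range(0, len(x), m)]


h = [1, 2, 3, 4]
x = list(range(12))
naive = downsample(fir(h, x))          # computes 12 filter outputs, keeps 4
pruned = fir_downsample_pruned(h, x)   # computes only the 4 kept outputs
print(naive == pruned, pruned)         # True [0, 10, 40, 70]
```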
[Figure 13: The SDF and CSDF precedence graphs for polyphase filtering, panels (a) and (b). Diagrams not reproduced.]

[Figure 14: Added constraints in the CSDF precedence graph when every actor is assumed to have internal state. Diagram not reproduced.]
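Polyphase filtering restructures a decimating FIR filter into m subfilters $e_k[i] = h[mi + k]$, each driven by its own phase of the input, $u_k[j] = x[mj - k]$. As a hedged illustration of this standard decomposition (a textbook multirate technique, not necessarily the exact graph of figure 13), the Python sketch below checks it against the direct form. The m branches share no intermediate results, so they could in principle be computed in parallel. The function names are hypothetical, and the direct form is included only as a reference.

```python
from typing import List, Sequence

# Illustrative sketch; function names are hypothetical.


def decimate_direct(h: Sequence[int], x: Sequence[int], m: int = 3) -> List[int]:
    """Reference: full FIR filter followed by keeping y[0], y[m], y[2m], ..."""
    y = [sum(h[t] * x[n - t] for t in range(len(h)) if n >= t) for n in range(len(x))]
    return y[::m]


def decimate_polyphase(h: Sequence[int], x: Sequence[int], m: int = 3) -> List[int]:
    """Decimation by m computed as the sum of m independent polyphase branches."""
    n_out = (len(x) + m - 1) // m                 # number of kept outputs
    y = [0] * n_out
    for k in range(m):                            # branches do not depend on each other
        e_k = list(h[k::m])                       # subfilter e_k[i] = h[m*i + k]
        u_k = [x[m * j - k] if m * j >= k else 0  # input phase u_k[j] = x[m*j - k]
               for j in range(n_out)]
        for j in range(n_out):
            y[j] += sum(e_k[i] * u_k[j - i] for i in range(len(e_k)) if j >= i)
    return y


h = [1, 2, 3, 4, 5]
x = list(range(11))
print(decimate_direct(h, x) == decimate_polyphase(h, x))  # True
```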
[Figure 15: The CSDF mixer and an equivalent combination of the commutator and distributor. (a) mixer M with two inputs ([1,0], [0,1]) and three outputs ([1,0,0], [0,1,0], [0,0,1]); (b) commutator C followed by distributor D. Diagrams not reproduced.]

[…]not have internal state or side effects that require sequential execution, this is the safest way to ensure a correct execution order. However, this hides many of the advantages of the simpler precedence relationships of CSDF. None of the optimizations we have discussed are possible without knowing which actors have internal state and/or side effects. In the Ptolemy system [9], our approach is to have the designer of an actor specify attributes for it. Thus the designer can assert whether or not an actor has internal state or side effects.

Our loop transformation allows us to use existing SDF scheduling techniques for CSDF graphs. But it also hides some important advantages of CSDF. Instead of developing schedulers that exploit the full generality of CSDF, we could extend existing SDF schedulers to treat certain multirate actors as special cases. In fact, we need only one multirate actor: the mixer [10], shown in figure 15(a). The mixer is a generalization of the distributor and commutator. It can have any number of inputs and outputs, and is functionally equivalent to a combination of a commutator and distributor, as shown in figure 15(b). Because commutators and distributors are sufficient for building any multirate system [11], we could use a simple dataflow model where the mixer is the only multirate actor. This would give us all the advantages of CSDF without the need to support its full generality.

Acknowledgments

This work is part of the Ptolemy project, which is supported by the Advanced Research Projects Agency and the U.S. Air Force (under the RASSP program, contract F33615-93-C-1317), the Semiconductor Research Corporation (project 95-DC-324), the National Science Foundation (MIP-9201605), the State of California MICRO program, and the following companies: Bellcore, Bell Northern Research, Dolby Laboratories, Hitachi, Mentor Graphics, Mitsubishi, NEC, Pacific Bell, Philips, and Rockwell. José Luis Pino is also supported by AT&T Bell Laboratories as part of the Cooperative Research Fellowship Program.

References

[1] G. Bilsen, M. Engels, R. Lauwereins, and J. A. Peperstraete. Cyclo-static data flow. In IEEE Int. Conf. ASSP, pages 3255–3258, Detroit, Michigan, May 1995.

[2] J. T. Buck. Static scheduling and code generation from dynamic dataflow graphs with integer valued control signals. In Asilomar Conf. Sig. Sys. and Comp., Pacific Grove, California, Oct. 1994. http://ptolemy.eecs.berkeley.edu/papers/IDF Asilomar.ps.Z.

[3] M. Engels, G. Bilsen, R. Lauwereins, and J. Peperstraete. Cyclo-static dataflow: Model and implementation. In Asilomar Conf. Sig. Sys. and Comp., Pacific Grove, California, Oct. 1994.

[4] G. Kahn. The semantics of a simple language for parallel programming. In J. L. Rosenfeld, editor, Information Processing, pages 471–475, Stockholm, Aug. 1974. International Federation for Information Processing, North-Holland Publishing Company.

[5] G. Kahn and D. B. MacQueen. Coroutines and networks of parallel processes. In B. Gilchrist, editor, Information Processing, pages 993–998, Toronto, Aug. 1977. International Federation for Information Processing, North-Holland Publishing Company.

[6] E. A. Lee and D. G. Messerschmitt. Static scheduling of synchronous data flow programs for digital signal processing. IEEE Trans. Comput., C-36(1):24–35, Jan. 1987.

[7] E. A. Lee and T. M. Parks. Dataflow process networks. Proc. IEEE, 83(5):773–799, May 1995. http://ptolemy.eecs.berkeley.edu/papers/processNets.

[8] J. L. Pino, S. S. Bhattacharyya, and E. A. Lee. A hierarchical multiprocessor scheduling framework for synchronous dataflow graphs. Technical Report UCB/ERL M95/36, University of California, Berkeley, May 1995. http://ptolemy.eecs.berkeley.edu/papers/erl-95-36.

[9] J. L. Pino, S. Ha, E. A. Lee, and J. T. Buck. Software synthesis for DSP using Ptolemy. J. VLSI Sig. Proc., 9(1):7–21, Jan. 1995.

[10] S.-I. Shih. Code generation for VSP software tool in Ptolemy. Master's thesis, University of California, Berkeley, May 1994.

[11] P. P. Vaidyanathan. Multirate Systems and Filter Banks. Prentice Hall, Englewood Cliffs, 1993.