Masaryk University
Faculty of Informatics
Library for Handling
Asynchronous Events in C++
Bachelor’s Thesis
Branislav Ševc
Brno, Spring 2019
Declaration
Hereby I declare that this paper is my original authorial work, which
I have worked out on my own. All sources, references, and literature
used or excerpted during elaboration of this work are properly cited
and listed in complete reference to the due source.
Branislav Ševc
Advisor: Mgr. Jan Mrázek
Acknowledgements
I would like to thank my supervisor, Jan Mrázek, for his valuable guid-
ance and highly appreciated help with avoiding death traps during
the design and implementation of this project.
Abstract
The subject of this work is the design, implementation and documentation
of a library for the C++ programming language which aims to simplify
asynchronous event-driven programming. The Reactive Blocks Library (RBL)
allows users to define a program as a graph of interconnected function
blocks which control message flow inside the graph. The benefits include
decoupling a program’s logic from the method of its execution, with
emphasis on increased readability of the program’s logical behavior
through understanding of its reactive components.
This thesis focuses on explaining the programming model, then proceeds
to defend design choices and to outline the implementation layers. A brief
analysis of current competing solutions and their comparison with RBL is
included, along with benchmarks of the overhead of RBL’s abstractions
over a pure C++ approach.
Keywords
C++ Programming Language, Inversion of Control, Event-driven Pro-
gramming, Asynchronous Programming, Declarative Programming,
Functional Programming
Contents
Introduction
  Motivation
  Thesis Structure
1 Programming Concepts
  1.1 Imperative and Procedural Programming Paradigm
  1.2 Inversion of Control
  1.3 Event-driven Programming Paradigm
  1.4 Declarative Programming Paradigm
  1.5 Functional Programming Paradigm
2 Design
  2.1 Operation as a Block
  2.2 Program as a Graph of Blocks
  2.3 Events
  2.4 Synchronous and Asynchronous Blocks
  2.5 Nested Graph Composition
  2.6 Syntactic Sugars
3 Implementation
  3.1 Core Functionality
  3.2 Built-in Blocks
  3.3 Executor Blocks
  3.4 Algorithm Blocks
  3.5 Introspection Layer
  3.6 Expression Template Layer
4 Evaluation
  4.1 Use Case Comparison
    4.1.1 Boost.Asio
    4.1.2 RxCpp
    4.1.3 Intel® Threading Building Blocks
    4.1.4 Coroutines
  4.2 Performance Analysis
    4.2.1 Overhead Analysis
    4.2.2 Performance Benchmarks
  4.3 Debugging
5 Conclusion
  5.1 Future Work
A Asynchronous API Model
  A.1 Asynchronous Operations
  A.2 Boost.Asio – An Asynchronous API Example
B Technical Details
  B.1 Build requirements
  B.2 Third-party libraries
  B.3 Project Structure
Bibliography
Introduction
Most of today’s computer systems and their software have to react to
some form of external change that does not have an exactly defined
time of occurrence. The practice of handling these events falls into the
field of event-driven programming (see 1.3).
Notices of external events can originate from the system’s hardware
(I/O devices, network, timers, etc.) or from software that exposes
its application programming interface (API) as a set of asynchronous
functions (see A.1). An example of such software is an operating system
and its time, peripheral I/O (e.g. networking) and task scheduling
related APIs1.
Modern programming languages, including C++, provide support
for asynchronous code to some extent. C++’s implementation mainly
comprises built-in language primitives for multi-threaded programming
and the associated synchronization [2, Chapter 41]. Despite not being
viewed as a conservative, lower-level language, C++ still lacks the
standard concepts and utilities (as of the C++17 standard2) that would
provide higher-level abstractions for more convenient event-driven
programming.
In recent years, frameworks and libraries which aim to simplify
event-driven programming, e.g. Reactive Extensions (Rx) [3], have
started to appear in various higher-level languages. While a few such
libraries, including Rx, have also been written for C++ (elaborated
in 4.1.2), we have decided to design and implement our own library
with increased generalization and transparency over the existing
solutions. We have given it the name Reactive Blocks Library, or RBL.
Motivation
The aim of this work is to implement a C++ library to support general-
purpose event-driven programming with a higher level of abstraction
than the plain C++17 standard provides. The library should make
asynchronous event-driven programming easier in terms of code size,
readability, and expressiveness.
1. E.g. POSIX aio (7), POSIX timer_create (2) [1].
2. This should change with the arrival of C++20 and its coroutines, see 4.1.4.
We would like to be able to write programs based on asynchronous
event handling with an emphasis on code clarity in terms of event
consequences. Functional and declarative programming concepts can
help us express the behavior of an event-driven program in a cleaner
and more condensed way.
The usual approach to event handling relies on IoC (see 1.2) and
its most straightforward application – callback functions, as in A.1.
To understand a program written as a set of callbacks, the user has
to thoroughly read the associated logic and piece together the
information about the structure of execution of these callbacks.
The purpose of RBL is to separate this information from the
semantics of individual functions or operations. That is, it segregates
the logic among function blocks, which are in turn loosely connected
to form a graph of consequent operations with the desired collective
behavior.
Code with callbacks may look like this:
void callback_1() {
    ...; asynchronous_operation_2(callback_2); ...;
}

void callback_2() {
    ...; asynchronous_operation_3(callback_3); ...;
}

void callback_3() {
    ...; asynchronous_operation_2(callback_2); ...;
}
As you can also see in Listing A.1, it may not be clear what the
flow of execution is, especially in larger programs with multiple asyn-
chronous branches, chained operations, and cycles.
Our library should capture the dependencies of operations more
explicitly. Since the ellipses (...) could represent many statements,
the information about event consequences is more densely contained
within a syntax like so (considering the ... parts may be grouped
into operations as well):
...;
asynchronous_operation_1 -> asynchronous_operation_2;
...;
asynchronous_operation_2 -> asynchronous_operation_3;
...;
asynchronous_operation_3 -> asynchronous_operation_2;
...;
From this notation, the dependencies are straightforward to
visualize:
Figure 1: Operation Consequences – asynchronous_operation_1 feeds asynchronous_operation_2, which feeds asynchronous_operation_3, which in turn feeds back into asynchronous_operation_2.
Since standard exceptions that represent errors during an operation
propagate in the same direction as return values, error handling has
to be partly reworked to conform to the IoC paradigm.
Additional goals are therefore:
• the ability to formulate the handling of exceptional cases in a way
similar to the processing of data;
• basic introspection and debugging support, which is typically
weakened by each abstraction;
• extensibility and modularity, i.e. having different logical parts,
or layers.
Thesis Structure
Chapter 1 – Programming Concepts familiarizes us with the concepts
forming the building ground of this thesis.
Chapter 2 – Design explains the design choices and the programming
paradigm that RBL introduces.
Chapter 3 – Implementation outlines RBL’s API layer by layer.
Chapter 4 – Evaluation compares RBL against the existing solutions
and analyzes the benefits and disadvantages of RBL from both conve-
nience and performance perspectives.
Appendix A – Asynchronous API Model describes the traditional
interface of an asynchronous API in C++.
Appendix B – Technical Details summarizes the structure and tech-
nologies used by the project.
1 Programming Concepts
To become fluent with RBL, one has to understand the concepts upon
which it stands. RBL shifts asynchronous programming from the tra-
ditional, procedural approach to the sphere of declarative and functional
programming, which is more suitable for event-driven programming.
All of this is done thanks to the Inversion of Control (IoC) principle.
RBL combines all of these concepts. This chapter further explains
the terms and their roles in RBL.
1.1 Imperative and Procedural Programming
Paradigm
Perhaps the oldest programming paradigm is writing a program as a
list of sequentially executed commands that manipulate the state of a
program and its acting environment. The imperative programming
paradigm remains the most popular way to program. This is due to
its closeness to the hardware that performs the computations (in terms
of instructions), which allows us to achieve maximum performance.
Procedural programming is a form of imperative programming in which
commands are structured into functions that define the behavior.
The programmer has complete control over the program’s execu-
tion flow and potentially the entire state of the program. That person is
the only one who dictates whether and when certain actions happen.
In terms of RBL: RBL departs from these paradigms but does not
aim to abolish them. Imperative code fragments should still be used
for implementing synchronous operations that are not logically divisible
any further, or where doing so is the only efficient way. It is, after all, the
only way to perform the underlying fundamental operations in C++.
RBL therefore provides minimal adapters to its interface for describing
operations as classic functions – callbacks containing imperative code.
Said interface is based on the ideas described in the following sections.
1.2 Inversion of Control
Inversion of Control (IoC) [5] is a broader programming principle,
which, in its most general form, liberates the client system (or a pro-
grammer) from the responsibility for the program’s execution mecha-
nisms. The control is usually transferred to an execution managing
entity, also known as a framework. Under this paradigm, the client
code supplies the implementation of a program’s logic to the frame-
work as a set of functions or objects. The framework then uses this
additional behavior when it is necessary.
In terms of RBL: RBL implements the IoC model, but it does not
manage asynchronous execution by itself. To complete RBL into a fully
functioning model of asynchronous execution, we wrap the API of an
existing asynchronous framework in RBL’s structures. We have chosen
to create a proof of concept with one such API – Boost.Asio (see A.2).
IoC, in the scope of RBL, means that a function, which would normally
be called from a site that requires its return value for given arguments,
is now invoked the other way around: by the presence of its arguments
(an information source), it produces its output value, which is in turn
an information source for other functions or tasks.
1.3 Event-driven Programming Paradigm
Imperative programming alone is certainly not the best way of han-
dling events with no exact time of occurrence. A programmer has
to frequently check the state of the program and its environment to
detect changes that should initiate further actions. Event-driven pro-
gramming transfers this responsibility from the programmer to the
event handling system.
Short of creating a specialized event-driven language, such a system
can be constructed in any higher-level language with an API following
the IoC principle – allowing the implementation of abstract interfaces,
or callback functions. The underlying mechanism remains based on an
internal event loop, an adaptation of an existing asynchronous API, or
even manually written interrupt routines.
In terms of RBL: The execution of the user’s code is guided by events.
External events (acquired from the wrapped asynchronous API) ini-
tiate action inside a program. The behavior is then further specified
using the same principle – as a chain of event handling callbacks
generating and reacting to (now application-internal) events.
1.4 Declarative Programming Paradigm
The declarative programming paradigm is a tool for building logical
structures specifying a program’s behavior without explicitly stating
the execution steps. The semantics of execution are implicit, and the
implementations may vary diametrically, as long as they produce
identical results.
In terms of RBL: RBL further utilizes the declarative approach to
describe connections between event handlers as a dependency graph.
These declarations become the essence of an RBL program. All the
necessary information about a program written with RBL is contained
within the declared connection graph – its function (vertex) types and
topology. Only once the program’s graph has been constructed can it
be executed. The execution, however, follows a fixed set of rules.
1.5 Functional Programming Paradigm
The last kind of programming which has found its use in RBL is a form
of declarative programming – functional programming. It is based upon
building structures that represent computations. These structures are
treated as objects and can be composed as such to form more complex
computations and whole programs.
In terms of RBL: RBL does not implement the functional paradigm
in its pure sense. RBL’s graph consists of functions that are allowed to
have an internal state that persists between calls. RBL, like C++ itself,
contains only some elements of functional programming. It supports
the construction of functional-like data transformations, e.g. to map,
filter, or reduce values. The functional paradigm is visible mainly
around algorithm blocks (see 3.4).
2 Design
At its core, RBL takes the IoC paradigm, which is inherent to asynchronous
operations (i.e. functions with callback parameters), a step further: RBL
builds all synchronous operations in the IoC way as well. The motivation
behind this is to eliminate any difference between the way synchronous
and asynchronous operations are written. This implies more uniform
code and easily interchangeable operations of distinct semantics. Since
the IoC concept needs to know the structure of operations, we need to
devise the computation structure and its elements.
2.1 Operation as a Block
We begin by designing a representative for each synchronous or asyn-
chronous function – an object called block. A block possesses the same
characteristics as a function – its behavior, inputs and (potentially
more than one) output. The inputs and outputs have fixed event types.
Because N-ary functions and functions with multiple independent
outputs require additional concepts (see 2.5), let us continue with the
simple case – a unary function. Its body is invoked at the place where
its result is required:
auto return_value = function(argument);
With IoC, the same semantics of function are hidden inside a
block. The interface is what differs:
auto block = rbl::make_block(function);
block(argument);
The function can still be invoked using the same syntax, but we no
longer use the return value as the immediate result (as with
asynchronous functions). The output value can only be acquired
under the IoC concept, that is, by providing a result handler which
the block should call. Not by coincidence, the handler is another block.
But how do we inform block of the handler’s presence? By declaring
communication connections between blocks:
auto handler = rbl::make_block(handler_function);
connect(block, handler);
Likewise, block can become a handler of another operation:
auto source = rbl::make_block(source_function);
connect(source, block);
This model allows blocks to represent operations that are more
universal than the mathematical definition of a function. Multiple
output ports are allowed, and each output port may produce multiple
events during the block’s invocation.
2.1.1 Termination
Blocks, as objects, may have a limited lifetime. Termination is the
act of block removal from the program, which can have various reasons
and significant effects (see 3.1). Considering termination policies as
part of a block’s semantics, we can employ the view of a block as a
transformation of input event sequences into output event sequences.
The sequences can be either infinite or finite – ended by termination.
This becomes useful in understanding algorithm blocks (see 3.4).
2.2 Program as a Graph of Blocks
Connections represent information (i.e. event) flow between oper-
ations. Each connection exists between one block, considered as a
source of the events, and the other, their recipient. Together, blocks
and connections form a directed computational graph of a program
as vertexes and edges. The graph persists through its use and may
be modified by adding or removing blocks and connections at the
program’s run-time. RBL achieves this without a central authority,
i.e. entity, that would manage the graph. Connections are part of the
block’s internal state.
RBL further introduces an enhancement – the output of each block
can be connected to multiple blocks’ inputs and vice versa. Thus, one
operation can launch multiple following ones, and a block may be
used as a handler of multiple events.
Our program is indeed only a declaration of vertexes and edges:
// vertexes
block_1_t block_1;
block_2_t block_2;
block_3_t block_3;

// edges
connect(block_1, block_2);
connect(block_1, block_3);
connect(block_2, block_3);
For apparent reasons, RBL introduces an abstraction layer for sim-
plifying the syntax and condensing the information about the connec-
tion topology (see 2.6).
2.3 Events
Events are the information-representing objects, which flow through
established block graphs. An event can represent a value, a valueless
occurrence or an error. RBL uses a more general term for the objects
in its implementation – messages (see 3.1).
2.4 Synchronous and Asynchronous Blocks
With the rules established above, there is no syntactical difference
between the usage of synchronous and asynchronous blocks. Such
information is only contained within the type of operation, which the
block represents.
Synchronous blocks produce events only as immediate reactions to
the input events during their invocation.
Asynchronous blocks initiate asynchronous operations with minimal
blocking and the production of the output events is in the hands of
the underlying asynchronous API.
This conceptual distinction is most visible when we employ the block
graph perspective. We can see that the graph consists of communica-
tion chains – directed paths in the graph.
2.4.1 Synchronous Subgraphs
Multiple connections of the same block introduce branches – di-
rected trees. A directed subgraph which is delimited by asynchronous
blocks is called a synchronous subgraph. In other words, a synchronous
subgraph does not involve any asynchronous operations, and it is
executed in a manner that blocks the initiating block. This should
not be an issue – there will always be a portion of code that is executed
synchronously. Quite the opposite is true; in order to minimize
the overhead of asynchronous execution scheduling, programmers
should strive to write their programs with the least amount of
asynchronicity that still produces the desired behavior.
With multiple connections, a block sends the event to each recipient
in a blocking, iterative manner, in the order of their connection. This
results in depth-first execution. As long as the execution of one branch
does not prevent another branch from executing in due time, this is
a perfectly valid construction. If one branch takes a long time to
execute and incurs an unacceptable delay on others, it should be either
ordered near the end of the list of connections (registered among the
last ones) or delimited by an execution-deferring asynchronous block
(i.e. an executor, see 3.3). Consider the following block graph:
Figure 2.1: Synchronous Subgraph
A is the initiating block, C1 is an asynchronous block. D1 and D2 there-
fore do not belong to the synchronous subgraph. Given the connec-
tions being created in the top-down order and each block producing
exactly one event for each input event, the invocation order is:
(A), B1, C1, C2, B2, C1 (again), C3, D3, D3 (again).
Note that X and Y belong to the subgraph, because synchronous com-
munication may also be present in the backward manner (see 3.1).
A synchronous subgraph may be executed by at most one event
source at a time, so as not to impose a thread-safety requirement upon
blocks. Implicitly parallel execution would also be harder to grasp and
use with confidence.
2.4.2 Cycles
A cycle in the block graph is a result of expressing dependency of
an operation upon its previous results.
Synchronous cycles In general, RBL does not allow synchronous
cycles – cycles consisting entirely of synchronous blocks. This would
also lead to the imposition of the reentrancy1 condition upon the
participating blocks, which would be a nuisance to implement, or
even impossible, given a block’s semantics and internal state.
Instead, we can design a special synchronous block, which will
make all synchronous cycles containing this type of block valid, under
special semantics (see 3.2).
Asynchronous cycles Cycles containing asynchronous blocks are
easier to imagine, and no extra support is required from RBL. The syn-
chronous execution stops at each asynchronous block and recursion
does not occur. The only things the programmer has to pay attention to
(also in the synchronous case) are:
• the termination condition, i.e. a situation, which breaks the
asynchronous loop;
1. Reentrant functions are those with well-defined recursive execution in the same
thread of execution.
• the possible multiplicative (exponential) growth of the number of
events in the loop, originating from consecutive branching and
merging of event paths.
2.5 Nested Graph Composition
A graph of blocks grows with the increasing complexity of a program.
In such cases, it is desirable to add more structural information – to
divide the graph into subgraphs with related purposes. RBL thus allows
wrapping a subgraph of a program in a composite block, or, vice versa,
it allows implementing a block in terms of a nested block graph.
Such a block is called a group. It is only an RBL concept of what a
block-aggregating class should look like. In practice, it is essentially
a user-defined class which holds its blocks as private attributes and
creates the connections at its construction. A group shall expose the
inputs and outputs of the internal subgraph as its own inputs and
outputs via member attributes/functions:
connect(source_1, add_numbers.input_1);
connect(source_2, add_numbers.input_2);
connect(add_numbers.output, handler);
A group does not necessarily have to be implemented in terms of
an internal subgraph. It is, however, a mandatory element when building
blocks with more than one input and/or output. In RBL, such blocks
have their internals hooked to the exposed I/O blocks, and their behavior
is implemented imperatively. Because input events arrive at different
times, a group with multiple inputs usually keeps an internal state
to collect the data required for producing an output. Examples of
such blocks are the combiners (see 3.4).
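For illustration, here is a hedged sketch of what the add_numbers group used above might look like. The member blocks and their construction (rbl::subscriber from 3.2, rbl::identity and its parametrization) are illustrative assumptions; only the exposed names input_1, input_2 and output match the connect declarations above.

#include <optional>

class add_numbers_t {
public:
    // Exposed inputs: each stores its operand and tries to emit the sum.
    rbl::subscriber<int> input_1{[this](int value) { lhs = value; try_emit(); }};
    rbl::subscriber<int> input_2{[this](int value) { rhs = value; try_emit(); }};
    // Exposed output: a pass-through node the computed sum is sent through.
    rbl::identity<int> output;

private:
    std::optional<int> lhs, rhs;

    void try_emit() {
        if (lhs && rhs) {
            output(*lhs + *rhs);   // invoke the output node with the sum
            lhs.reset();
            rhs.reset();
        }
    }
};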
Lastly, groups may figure in declarations in a hybrid way:
connect(source_1, zip_events.input_1);
connect(source_2, zip_events.input_2);
connect(zip_events, handler);
Notice that the last connection uses the group block directly, because
we can unambiguously identify the one output it has, the same way
as handler has one input. The details are further elaborated in 3.1.
2.6 Syntactic Sugars
Writing programs by declaring blocks and connections in the verbose
manner above is, in fact, worse than the existing approaches. Since
our goals are to simplify programming, RBL has to pull out a secret
weapon – expression templates.
Expression templates are a C++ idiom for implementing class tem-
plates, which represent computations at compile time [6, Chapter 27].
All the required information to building the computation is embed-
ded within an expression template’s instantiated type. The expression
classes are usually composed so that an outer expression template is
instantiated with the types of its nested expressions.
RBL uses this technique to implement expressions, which build the
underlying block graph around the provided arguments. The simplest
expression is a chain expression, constructible with operator>>, as
with the chain function.
Let us consider the following example (A, B and C being blocks):
connect(A, B);
connect(A, C);
connect(B, C);
Such structure may be also constructed like this:
A >> B >> C;
A >> C;
Which is equivalent to its more verbose version:
chain(A, B, C); // same as chain(chain(A, B), C);
chain(A, C);
With both forms above, we have established a
minimalist declarative syntax. We are able to specify an arbitrarily
long chain of connections in one statement.
RBL provides some other expressions for general use, partly as
a proof of concept and an example. Most of these expressions also
generate additional blocks and branches in their produced graph.
Expression templates are implemented in a manner which allows
the user to create potentially domain-specific expressions by re-using
RBL’s existing expression API. The implemented expression templates
are explained more thoroughly in 3.6.
3 Implementation
The implementation of RBL is divided into the following logically
layered parts:
• core – low-level, object-oriented block API;
• concrete blocks – built-in, executor and algorithm blocks;
• expression template layer – syntactic sugars for block graph
declaration;
• introspection layer – an additional layer to core, for debugging
and visualization.
3.1 Core Functionality
The core layer of RBL implements its variant of IoC using block objects.
A block, however, is a broader term at the API level, as explained in
the following section.
3.1.1 Blocks
All RBL blocks are classes derived from rbl::block::block_base.
The base class defines the following internal states, which are common
to all blocks:
• uninitialized – the block has not been connected yet, it does not
have any effect on an RBL program;
• running – the block is connected and has the potential for re-
ceiving and sending events;
• terminated – the block is in the process of destruction, no more
events will be received nor produced;
• invalid – the block has been invalidated by move semantics [7,
§11.3.4.2]; it does not have any effect on the RBL program.
Publisher and Subscriber Blocks
The atomic block objects are divided into two connectible counterparts:
a publisher and a subscriber.
Publisher is a block, which is a source of events (e.g. periodic event
generator). It is derived from rbl::block::publisher_base class tem-
plate, parametrized by the type of events it produces.
Subscriber is a block, which is a recipient of events (e.g. standard
output writer). It is derived from rbl::block::subscriber_base class
template, parametrized by the type of events it accepts.
These two classes implement the connection and event sending mech-
anisms. They are abstract and further extensible via inheritance.
Connection
Connections represent potential event paths between blocks and can
be formed, at run-time, only between a publisher and a subscriber
instance of the same event type. Each publisher holds a list of references
to its subscribers and vice versa. Different matching (derived) block
types can therefore be connected to their counterparts, without any
distinction on the other side.
Termination
Termination of a block can be a result of:
• an explicit request from inside/outside the block,
• termination of all subscriber’s publishers,
• termination of all publisher’s subscribers.
The last two mentioned scenarios implement a strategy for auto-
matic block graph cleanup. When a block terminates, it notifies all its
connected blocks of this event, and they remove it from their connection
lists. The process can continue recursively, resulting in a part of a
synchronous subgraph being destructed at once.
RBL allows the termination of blocks which are currently being executed
from the same thread (e.g. a subscriber terminates its publisher
upon receiving an event from it). The publisher and subscriber base
classes implement a mechanism to register disconnections (and
connections) without immediate effect, so as to avoid iterator
invalidation of the underlying sequence container of block references
[7, §22.3.11.5/1, 3].
Listing 3.1: Block Termination
if block is not currently entered
    enter the block
    add / remove the other block
    if the list of block counterparts is empty
        terminate
    leave the block
else
    stage the other block for addition / removal
The staged changes are applied after each iteration through the
list of connected blocks. In all cases, the iteration serves only a
communication purpose (see 3.1).
Transformer Blocks
Transformer is a block that is both a subscriber and a publisher, i.e.
a class derived from both. Transformers represent operations with a
single input and a single output. A transformer may appear to be in
one state as a subscriber, and in another as a publisher. This is the case
with asynchronous transformers – executors (see 3.3).
The transformer_base<InputType, OutputType> base class represents
transformers, resolving the ambiguities created by the questionable,
although intentional, multiple (non-virtual and indirect) inheritance
of the rbl::block::block_base class [2, Section 21.3.6]. Namely, it
implements termination as the termination of its subscriber and
publisher parts – in that order – and disambiguates between the
operator() overloads, using those of the underlying subscriber.
Figure 3.1: Block Class Diagram
Group Blocks
RBL does not allow inheriting from more than one subscriber or publisher
class at once. For the purpose of multi-input and multi-output
operations, it utilizes object composition. A group is any block derived
from the group class template, parametrized by the type of the
base block.
The base block is the type from which the group derives and which
provides the group’s implicit semantics. Therefore, a group can be a
subscriber, publisher, transformer or none of those. If not provided,
the base block is taken to be block_base and the group’s subblocks are
combined using only composition.
Here is an example of a strictly composed group (i.e. implicitly
neither a subscriber nor a publisher):
connect(input_1, group.input_1);
connect(input_2, group.input_2);
connect(group.output_1, output_1);
connect(group.output_2, output_2);
If a group has only one output (publisher), we can make our class
derive from group<PublisherType> and use it as:
connect(input_1, group.input_1);
connect(input_2, group.input_2);
connect(group, output);
The same method applies to a single input. A block with a single
input and a single output shall not be implemented as a group derived
from group<TransformerType>; this class only serves introspection
purposes.
A group instantiated with block_base is an exceptional case, where
the base class provides the necessary block semantics for the group
type, which it has no other way to acquire.
Lifetime of Blocks
It is forbidden for an RBL block to be destructed in the running state.
A block has to be uninitialized, terminated or invalidated beforehand.
Not obeying this rule will lead to undefined behavior1 . A group shall
terminate all its inner blocks as part of its own termination process
and vice-versa.
Dynamic Blocks Blocks can be allocated dynamically, although the
meaning is more restricted in RBL. A dynamically allocated block
is a block created via the rbl::dynamic or rbl::dynamic_ptr functions.
The result is a reference or an rbl::dynamic_block_ptr (RBL’s variant
of std::shared_ptr), respectively. The created block is in both cases a
self-owning block, meaning that it is not destructed at least until it
terminates. This (shared) ownership is further prolonged for the lifetime
of the last living external user-held rbl::dynamic_block_ptr instance.
1. Because of leaving invalid pointers in the block lists of the connected blocks.
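A hedged usage sketch of a dynamically allocated block follows; the call shape of rbl::dynamic_ptr (block type as a template argument, constructor arguments forwarded) and the blocks source and sink are assumptions, while the self-owning semantics come from the description above.

auto squarer = rbl::dynamic_ptr<rbl::transformer<int, int>>(
    [](int value) { return value * value; });

connect(source, *squarer);   // wired like any other block
connect(*squarer, sink);
// The handle may now be dropped: the block owns itself until it terminates,
// and any surviving rbl::dynamic_block_ptr copy extends that ownership.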
3.1.2 Messages
Communication between connected blocks works in both direc-
tions and is implemented via message sending. A block always mul-
ticasts its message to the connected blocks in the order in which the
connections occurred.
Forward Messages A publisher can send three types of forward
messages, according to their informative purpose:
• Data message – rbl::data_message<ValueType> – transmits
data,
• Error message – rbl::error_message<ExceptionType> – trans-
mits errors,
• Termination message – rbl::termination_message – transmits
termination signals.
Backward Messages A subscriber can send a backward message of
one type to its publishers:
• Backward termination message –
rbl::backward_termination_message – transmits backward
termination signals.
Data Messages A data message holds a value of the given type. The
message may be used to transmit data only between a publisher and a
subscriber of the same data type. A data message may be valueless;
RBL provides the empty rbl::event class for this. From now on, we
will refer to rbl::data_message<rbl::event> simply as events.
Error Messages An error message transmits exceptions held by
std::exception_ptr or std::shared_ptr<rbl::exception> for plat-
forms without support of standard exceptions (-fno-exceptions com-
piler switch). rbl::exception is a mandatory base class for all excep-
tions that can be transferred this way.
Termination Messages A forward/backward termination message
holds a reference to its originating publisher/subscriber. Based on
this information, the receiving side knows which block to unregister
from its list of publishers/subscribers.
Messages are handled by overriding virtual receive member function
overloads of the publisher_base and subscriber_base classes, e.g.:
Listing 3.2: Squaring Transformer
class squarer : public transformer_base<int, int> {
    void receive(data_message<int> const& message) override {
        // send squared value
        this->send(message.get() * message.get());
    }
    void receive(error_message const& message) override {
        // forward (default behavior, in fact)
        this->send(message);
    }
};
The above behavior can be specified in a simpler way than with
inheritance, see 3.2.
Message Containers
Since messages often cannot be processed by blocks instantly – for
example, because the block is waiting for an additional message from
the same or another input – they have to be stored as part of the block’s
internal state. RBL provides basic message containers in its core, which it later
uses to implement the shipped concrete general-purpose blocks.
Message containers provide a unified, first-in, first-out interface
for storing all three types of forward messages:
• insertion,
• observation – the type of the top-most message, number of
stored messages,
• top-most message destructive/non-destructive access (by message
type, by polling – callbacks),
• top-most message discarding without reading,
• discarding the given number of future messages.
Message Slot A message slot is a storage unit that can hold at most
one message at a time. An incoming message overwrites the stored
one (if any). The class templates are:
• rbl::message_slot – includes ignoring of future messages;
• rbl::message_slot_basic – does not include ignoring of future
messages (a feature which requires an internal counter), serving
as a space optimization.
Message Ring A message ring is a circular buffer with a capacity
specified beforehand. The internal buffer, implemented as a sequence of
message slots, may be dynamically allocated. When full, an
incoming message overwrites the oldest message in the ring. The
corresponding class templates are:
• rbl::message_ring<ValueType>,
• rbl::message_ring_static<ValueType, Capacity>.
Message Queue Message queues are dynamically sized containers,
which, in contrast to slots and rings, store their messages in separate
queue structures for each type. The reason behind this is space opti-
mization. The above containers store their messages in a std::variant
instance, the size of which is at least the size of the biggest type it can
hold. Since data messages can have an arbitrary size, we did not want
to incur the same cost on storing the small, fixed-size error and
termination messages, however infrequent they may be.
A message queue class has to preserve the relative order of messages
as they were inserted. This is done by assigning an internal
identification number (ID) to each message. The ID is later used as
a deciding factor in choosing from which internal queue the next
message should be accessed.
The queue variants are:
• rbl::message_queue<ValueType> – for data, error and termi-
nation messages, not thread-safe;
• rbl::message_queue_dataless – for error and termination
messages, not thread-safe;
• rbl::message_queue_locked<ValueType> – thread-safe variant
of rbl::message_queue<ValueType>;
• rbl::message_queue_dataless_locked – thread-safe variant
of rbl::message_queue_dataless.
3.2 Built-in Blocks
RBL has a collection of built-in concrete blocks, the behavior of which
can be modified or supplied by (template) arguments (flags/functors),
instead of more verbose class inheritance.
3.2.1 Convenience Blocks
A user can construct concrete subscriber and transformer blocks from
callable objects or simple functions:
auto sub = rbl::subscriber<int>([](auto value) {
    std::cout << value << std::endl;
});

int power(int value) { return value * value; }
auto tran = rbl::transformer<int, int>(power);
If the power function threw an exception, it would be caught and
sent as an error message.
The type of the block can even be deduced from the type of the
callable argument (equivalent to the previous example):
auto sub = rbl::make_block([](int value) {});
auto tran =
    rbl::make_block([](int value) { return value; });
In addition, there exists a block, which simplifies implementation
of a transformer’s body with a nested graph:
auto tran = rbl::composite<input_type, output_type>(
    [](rbl::passive_subscriber<output_type>& sink,
       input_type const& value)
    {
        auto start = rbl::once(value);
        start >> ... >> sink;
        start();
    });
The body is executed (i.e. the subgraph is re-constructed) for each
input message. The main use case is the parametrization of multiple
temporary blocks according to an input value. The sink parameter
represents an internal subscriber of the composite block, which in turn
directly forwards the messages as output messages of the composite
block itself.
3.2.2 Identity Block
The identity block can serve two purposes. First, it can be used as a
connection node between multiple blocks. Secondly, it can perform a
conversion from the input type to the output type, which may differ.
3.2.3 Constant Blocks
The constant block transforms each input event message to an
output data message of a given value or an output error message with
a given exception.
The once block is a version of constant which terminates immedi-
ately after sending the first message.
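As a hedged usage sketch (the printer block is illustrative; rbl::once, rbl::make_block and the chain expression appear as in the listings above and in 2.6), a one-shot constant could be wired up as follows:

#include <iostream>

auto start   = rbl::once(42);    // sends the value 42 once, then terminates
auto printer = rbl::make_block([](int value) {
    std::cout << value << std::endl;
});

start >> printer;   // chain expression from 2.6
start();            // send an input event: 42 is printed, then start terminates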
3.2.4 Termination Signaler
The block terminated sends out an event message before its own
termination. The connected subscribers can react to such an event in
the regular way.
3.2.5 Error Handlers
By default, each transformer block implements the manipulation
of error messages by forwarding them. If an error message reaches a
subscriber (or a transformer without subscribers) without a handling
strategy at the end of the connection chain, the program is terminated.
There are three types of dedicated blocks that manipulate the flow
of error messages:
• catch_errors – invokes a user-provided handler if an incom-
ing error message contains an exception matching a given type,
otherwise forwards all messages;
• ignore_errors – prevents an error message from being for-
warded if its exception type matches a given type, otherwise
forwards all messages;
• capture_errors – transforms matching exceptions received as
error messages into exceptions sent as data messages; input
data messages are ignored.
The following example shows how the control flow can be expressed
in the same way for exceptions as for data, by injecting the
error-handling blocks into the graph:
Listing 3.3: Error Manipulation
produce_or_fail >> ignore_errors<data_t, exception_t>()
                >> on_success;
produce_or_fail >> capture_errors<data_t, exception_t>()
                >> print_trying_again_message;
produce_or_fail >> capture_errors<data_t, exception_t>()
                >> produce_or_fail;
ignore_errors and capture_errors represent two disjunctive paths
to be taken. On failure, a diagnostic message is printed and the cycle
for another attempt is entered.
3.2.6 Collect Block
As you might have noticed in Listing 3.3, if produce_or_fail is
synchronous, we have constructed an invalid synchronous cycle (see 2.4).
The collect block is the only synchronous block that can be used to
"break" such cycles.
The block is reentrant with the following behavior:
Listing 3.4: Collect Block
if block is not currently entered
    enter the block
    propagate the event further, as normally
    foreach event in the input queue
        propagate the event
        pop the event from the queue
    leave the block
else
    push the event to the input queue
3.2.7 Switch Blocks
RBL provides group blocks, which have the transformation se-
mantics of identity, but only accept/send messages from/to the cur-
rently active subscriber/publisher. Activation of the next running
subscriber/publisher is instigated by the termination of the currently
active one. The switch blocks terminate when there are no further
subscribers/publishers to switch to. These blocks are useful for im-
plementing the concatenation expression (see 3.6).
The classes are:
• switch_outputs – sequentially chooses the active publisher,
• switch_inputs – sequentially chooses the active subscriber.
3.2.8 State Interfacing Blocks
Lastly, a part of the built-in blocks consists of blocks for accessing
and modifying external variables. They exist mainly to support
interaction with imperatively written code, a complete rewrite of
which into RBL code might be undesirable. A usage sketch follows
the list below.
The blocks include:
• get – reads the associated variable and sends its value immedi-
ately upon receiving an input event message;
• set – assigns each data message’s value to the associated vari-
able upon receiving;
• read – reads values from a forward iterator range, includes
overloads for reading from standard input streams;
• write – writes values to an output/forward iterator range, in-
cludes overloads for writing to standard output streams;
• insert, push_front and push_back – insert values into a standard-
compliant container.
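A hedged sketch of interfacing with an external variable; passing the variable by reference to rbl::get and rbl::set, and invoking the get block with no argument to feed it an input event, are assumptions based on the descriptions above.

#include <iostream>

int counter = 0;

auto load      = rbl::get(counter);    // event in -> current value of counter out
auto increment = rbl::make_block([](int value) { return value + 1; });
auto store     = rbl::set(counter);    // received value is assigned to counter

load >> increment >> store;
load();                                // one traversal: counter becomes 1
std::cout << counter << std::endl;     // prints 1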
3.3 Executor Blocks
Executors are transformers which transfer the execution from one
synchronous subgraph to another or change the context of execution
for the remaining subgraph in another way.
3.3.1 Asynchronous Blocks
Among the most common executors should be the ones which directly
interact with an asynchronous API. RBL shows how to implement
these blocks on top of Boost.Asio’s contexts and I/O objects (TCP sockets).
They are similar to the above state interfacing blocks in that they
modify the state of the associated I/O context/object – objects external
to the block graph.
The I/O context operating blocks are (contained within
the rbl::asio::block namespace):
• executor – issues a post/dispatch/defer call to a Boost.Asio
io_context according to its policy; the internal completion
handler then forwards the input message that was stored alongside it;
• strand_executor – issues a post/dispatch/defer call to
a Boost.Asio’s io_context::strand;
• priority_executor – issues a post/dispatch/defer call to a
custom-made rbl::asio::priority_context;
• delay_executor – delays the re-sending of the input message
via an io_context and boost::asio::basic_waitable_timer
objects.
delay_executor has three possible policies:
• all – delays all messages,
• first – ignores messages while there is a waiting message
(inspired by RxCpp’s debounce operator),
• last – a new message cancels the previous waiting message.
The other executors listed above treat the I/O context simply as a
queue (see A.2), which makes it possible to transform the depth-first
execution strategy of what would otherwise be a synchronous subgraph
into a breadth-first pattern.
Figure 3.2: Controlled Breadth-first Execution
In the above graph, the prepended executors change the execution
order from B, D, E, C, F, G to B, C, D, E, F, G.
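To make the pattern concrete, here is a hedged sketch on a smaller tree A -> {B -> D, C}; the executor's template parameter, its io_context constructor argument and the implied default post policy are assumptions, while the block and namespace names come from the list above.

#include <boost/asio/io_context.hpp>
#include <iostream>

boost::asio::io_context io;

auto a = rbl::make_block([](int value) { return value; });
auto b = rbl::make_block([](int value) { std::cout << "B\n"; return value; });
auto c = rbl::make_block([](int value) { std::cout << "C\n"; return value; });
auto d = rbl::make_block([](int value) { std::cout << "D\n"; return value; });

// Hypothetical parametrization: message type as a template argument,
// the io_context taken by reference, post policy left at its default.
rbl::asio::block::executor<int> to_b{io};
rbl::asio::block::executor<int> to_c{io};
rbl::asio::block::executor<int> to_d{io};

a >> to_b >> b >> to_d >> d;
a >> to_c >> c;

a(0);       // without the executors, the synchronous order would be B, D, C
io.run();   // with them, the handlers run breadth-first: B, C, D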
The priority executor uses rbl::asio::priority_context, which
is an adapter for prioritized execution on top of a standard io_context.
Each executor has an associated constant priority, which determines
the relative order of execution between executors.
As a result, if A in the above example generated more than one
message in a row and the executors were each prioritized with a different
value (consider the order B, D, F, C, E, G), the execution pattern
would be B, B, D, D, C, F, C, F, E, E, G, G.
The blocks for asynchronous TCP operations (see A.2) are (con-
tained within the rbl::asio::tcp::block namespace):
• connect – sends an asynchronous TCP connection request to
the endpoint given as the input message, produces a connected
TCP socket;
• accept – asynchronously accepts an TCP connection request
upon receiving an input event message, produces a connected
TCP socket;
• resolve – asynchronously resolves a query into a TCP endpoint
using the Boost.Asio’s mechanisms;
• read/write – issues an asynchronous read/write operation
from/to the referenced socket, can be either formatted (using
Boost.Serialization), or raw (binary).
One limitation of Boost.Asio sockets is that they can process at most
one asynchronous operation at a time in the way that is meaningful
for us. This is because, internally, the object can partition an
asynchronous operation into smaller chunks (of data to be written,
for example), and these chunks could become interleaved with those
of another asynchronous operation. RBL handles this problem by
sequencing requests after the completion of previous ones.
3.3.2 Thread Block
The rbl::thread block transfers the execution of its input messages
to its internally managed thread, with the use of a buffering message
queue. The thread can be started and joined explicitly, as well as
automatically – running from the first connection until the block’s
termination.
A potential hazard is that the internal queue may become overloaded
if messages are produced faster than they are consumed. There is
currently no support for detecting this situation and handling it
gracefully, but such behavior could be implemented with relative ease.
3.3.3 Lock Block
The rbl::lock block can manage either an external mutex2 , or its
own. The mutex is held during the forwarding of each message and
released before returning control to the invoking publisher. In RBL’s
terms, the block is required as an entry point to a shared synchronous
subgraph connected to concurrently executed subgraphs (e.g. multiple
rbl::thread blocks).
An unlocking block counterpart does not exist. Because of how
synchronous subgraphs work (see 2.4), we would not want to leave
an exclusively held subgraph before returning to the effective locking
block. The block states may become disrupted before backtracking or
immersing into another branch. Therefore, a locked subgraph may be
escaped by a message only through asynchronous or thread blocks.
3.3.4 Termination
Executors only operate in the forward direction (apart from the lock
block). Backward termination messages from an executor’s subscribers
do not propagate past the executor, neither in a synchronous nor in an
asynchronous manner. That would violate the assumptions of the
synchronous subgraph model we established earlier.
As a result, a program may attempt to use a terminated executor. It
is at this point that the source subgraph becomes notified of the termi-
nation. To avoid misconceptions, terminating an executor from either
side (not manually) does not cancel its possible pending asynchronous
effects; the termination is enqueued afterwards and happens as the
last action.
Figure 3.3: Executor Termination
When A terminates, the subscriber side of the executor terminates
immediately, but the termination of the publisher’s side and of B is
sequenced after all pending effects. If no effects are pending, the
termination of B is also immediate.
2. A mutex is a synchronization primitive used to protect shared data from being
simultaneously accessed by multiple threads [2, Section 42.3.1].
When B terminates, the publisher side of the executor terminates
immediately, but the subscriber’s side and A are not affected. Later, when
A attempts to communicate a data or an error message, the propagation
of the withheld backward termination message is resumed. The
original message from A does not have an effect in this case.
This is an inherent shortcoming of executors trying to implement
the same semantics as synchronous blocks. Nevertheless, there should
be no visible difference in a program’s behavior because of this, other
than the possible performance of unnecessary calculations in the source
subgraph.
3.4 Algorithm Blocks
As a continuation of built-in blocks, RBL provides basic concrete blocks
for building algorithms. The implemented algorithms are largely RBL
variants of algorithms that can be found in the C++ standard library.
The RBL algorithms represent computations on inputs (messages)
distributed in time, as opposed to algorithms distributed in space3.
The latter need to have the whole input data available before starting;
the former work by processing inputs one by one. RBL implements
mostly online algorithms4 because of their natural fitness for this case.
RBL implements these algorithms in categories that share general
semantics; the exact behavior is usually specified by user-provided
functions, as with convenience blocks (see 3.2). Moreover,
some blocks implement multiple policies for their behavior, which are
again selected by parametrization.
3.4.1 Mappers
Mappers (essentially transformers) are the simplest of algorithm
blocks – they transform the values from the input sequence to values
of the output sequence in a one-to-one relation. Currently, the only
concrete mappers are the variants of standard clamping functions –
clamp_min, clamp_max and clamp.
3. We have been partly inspired also by RxCpp (see 4.1.2).
4. An online algorithm is an algorithm, which can compute its output data sequen-
tially from its input data, without needing the whole input to be available before
producing a part of its output.
3.4.2 Filters
Filters are blocks which do not transform messages but decide
whether to forward or discard them. A usage sketch follows the list
below.
There are several versions:
• filter_if/filter_if_not – accept or ignore values based on
a user-provided predicate;
• filter/filter_not – accept or ignore values based on equality
to a given value;
• filter_if_consecutive/filter_if_not_consecutive –
accept or ignore data messages based on a user-provided binary
predicate, called with two consecutive values
(filter_unique_consecutive is a concrete example);
• filter_unique and filter_unique_unordered – remember all
previously seen values and forward only the unique ones; the
variants differ in the internal storage used – std::set
or std::unordered_set.
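A hedged usage sketch; the template parameters and constructor arguments below are assumptions, only the block names come from the list above, and source and sink are hypothetical blocks.

auto evens_only  = rbl::filter_if<int>([](int value) { return value % 2 == 0; });
auto drop_zeroes = rbl::filter_not<int>(0);

source >> evens_only >> drop_zeroes >> sink;   // forward only non-zero even values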
3.4.3 Accumulators
Accumulators progressively calculate one output value for the
whole input sequence. Accumulators may be constructed with a user-
provided function, which, based on the current state and an input
value, changes the value of the state. The type of the state value may
differ from both input and output value types, for maximum generic-
ity5 . The state type is transformed to the output type, again, with a
user-provided function6 .
Accumulators can be constructed with two policies: total and
partial. The partial policy behavior sends each intermediate result,
while total sends only the final result. If the input sequence was empty
when the block terminated, the initial value, if provided, is sent;
otherwise no output is generated.
5. For example, this is useful for the average accumulator, since it holds both the
sum and the count of yet received values.
6. In the case of the average accumulator, this function performs the division of
the two state’s components.
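As an illustration, here is a hedged sketch of the average accumulator mentioned in the footnotes. The block name rbl::accumulate, its template parameters and the argument order are assumptions; only the split into a state-updating function and a state-to-output transformation follows the description above.

#include <cstddef>

struct average_state { double sum = 0; std::size_t count = 0; };

// Hypothetical construction: fold function plus finishing transformation,
// with the total policy assumed by default.
auto average = rbl::accumulate<int, double, average_state>(
    [](average_state state, int value) {        // update the running state
        state.sum += value;
        ++state.count;
        return state;
    },
    [](average_state const& state) {            // transform the state into the output
        return state.count ? state.sum / state.count : 0.0;
    });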
3.4.4 Combiners
Combiners are algorithm blocks for processing multiple input
sequences in order to generate a single output sequence. They wrap
a user-defined combining function with an arbitrary number of parameters.
There are numerous policies available:
• all – all input values have to be combined (they are stored
in input queues and an output is generated when all inputs are
available); an error message propagates further and discards
all messages which would otherwise have been combined with a
data message in its place, including future error messages;
• ordered – input values are accepted strictly in the sequential
order, messages from inactive input sequences are ignored, an
error message propagates further and resets the currently active
input to the first one;
• first – the input values are collected and processed when all
of them are present, subsequent messages from the same input
sequence are ignored until an output value is generated;
• last – the input values are collected and processed when all of
them are present, subsequent messages from the same input
sequence overwrite the stored ones;
• first_partial – a variant of first, which sends all intermedi-
ate combinations;
• last_partial – a variant of last, which sends all intermediate
combinations.
3.4.5 Quantifiers
Quantifier blocks are similar to accumulators, but they are limited
to producing exactly one boolean output value, signaling whether the
input sequence has matched the given predicate in combination with
the quantification assertion (existential or universal). As the quantifi-
cation result may become known before the end of the input sequence,
the block may (correctly) terminate prematurely. Quantifiers work
with predicates applied on single or consecutive elements. The latter
case allows checking for consecutive value uniqueness and monotonic-
ity of value sequences.
3.4.6 Generators
Generators are a more general category of algorithm blocks, which
produce larger or informationally denser output sequences than the
input ones they accept. The generate block outputs a value on each
incoming input event and internally computes its successor via a user-
provided function. The repeat block sends each input message the
specified number of times, immediately after receiving it. The unpack
block expects a standard forward-iterable container or a homogeneous
std::tuple as its input value type and sends its individual elements
one by one.
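A plain C++ sketch of the successor-function idea behind generate (RBL's actual block interface is not shown here):

#include <functional>
#include <iostream>
#include <utility>

// Sketch of the successor-function idea: on each incoming event, emit the
// current value and advance it with a user-provided function.
template <typename T>
class generator_state {
    T current_;
    std::function<T(T)> successor_;

public:
    generator_state(T initial, std::function<T(T)> successor)
        : current_(std::move(initial)), successor_(std::move(successor)) {}

    T on_event() {                    // called once per incoming input event
        T out = current_;
        current_ = successor_(current_);
        return out;
    }
};

int main() {
    generator_state<int> powers(1, [](int v) { return v * 2; });
    for (int i = 0; i < 5; ++i)
        std::cout << powers.on_event() << ' ';   // prints: 1 2 4 8 16
    std::cout << '\n';
}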
3.4.7 Other Algorithms
There are numerous uncategorized blocks, such as a sliding window
(the inverse of pack), a discretely delayed block (delayed by a given
number of messages), a block for unzipping a sequence of tuples into
multiple sequences of individual elements (unzip), a block for lexico-
graphical comparison of two input sequences and, lastly, the common
set operations.
The set operations accept two sorted input sequences (according to
a specifiable comparator) to create one sorted output sequence. Their
variants have been adapted from the standard library – union (and
merge – preserving duplicates), intersection, difference and symmetric
difference [2, Section 32.6.3].
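The corresponding standard algorithms, from which these blocks were adapted, behave as follows:

#include <algorithm>
#include <iostream>
#include <iterator>
#include <vector>

int main() {
    // Both input ranges must be sorted by the same comparator, just as the
    // corresponding set-operation blocks require sorted input sequences.
    std::vector<int> a{1, 2, 4, 6};
    std::vector<int> b{2, 3, 6, 7};

    std::vector<int> united, intersected;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(),
                   std::back_inserter(united));              // 1 2 3 4 6 7
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::back_inserter(intersected));  // 2 6

    for (int v : united) std::cout << v << ' ';
    std::cout << '\n';
    for (int v : intersected) std::cout << v << ' ';
    std::cout << '\n';
}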
3.5 Introspection Layer
The IoC paradigm, accompanied by asynchronicity, severely de-
grades debugging options. We hope to compensate for this with the
introspection layer, which currently consists of logging and visualiza-
tion support. Introspection can be disabled at compile time, incurring
no run-time cost.
3.5.1 Logging
RBL logs information about its internal proceedings at multiple
verbosity levels, selectable via command-line flags:
• operations (-v1) e.g. socket requested to send data,
• state changes (-v2) e.g. block has been terminated,
• communication events (-v3) e.g. a block is sending a value
(calling its subscribers’ receive member functions),
• internal events (-v4) e.g. a block has been removed from the list
of connections.
3.5.2 Visualization
The other, more prominent feature, which has no analogy
in pure C++, is the ability to visualize the program's control flow.
The output takes the form of a static graph, capturing the state at the
point of the user's request. Graph generation is not an atomic, thread-
safe operation; therefore, the block graph must not be undergoing
modification at the same time.
Visualization is performed by an rbl::intro::visualizer object
bound to a standard output stream. The visualizer allows the user to
customize the subject and options of visualization before they decide
to invoke the write member function. This writes the program’s block
graph to the output stream, usually being an output file stream, in
the DOT format. The DOT format can be visualized externally, using
GraphViz [8].
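In code, producing the DOT output can look roughly like the following sketch; the class name and the write member function are as described above, while the exact constructor signature, the option-setting calls and the header path are assumptions:

// #include <fstream> and the (hypothetical) rbl introspection header
std::ofstream out("graph.dot");      // DOT output, rendered externally by GraphViz
rbl::intro::visualizer vis(out);     // the visualizer is bound to an output stream
// ... customize the subject and options of the visualization here ...
vis.write();                         // writes the block graph in the DOT format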
The following information is visualized:
• generated (default) or user-defined block names;
• block class types (possibly simplified);
• block traits, i.e. whether they are dynamically allocated and/or
executors;
• individual input/output ports of group blocks and their names;
• direction, data types and relative order of connections.
Graph scopes serve during visualization as bounds of the visualized
subgraph. A user can declare a (nested) scope hierarchy and request
visualization of a specific scope they are interested in. Only blocks that
were created during the time the requested scope was on the scope
stack (active) will be visualized, including its child scopes. Otherwise,
scopes do not play any role in the program’s logic.
The following graphic shows two nested scopes, a dynamic (dashed)
block, an executor (filled) block, a block (group) with multiple inputs,
etc. The group block uses the default identifier, generated from the
object's address; the other blocks have been explicitly named by the
user according to their functions.
The graph has been generated from the contrived code of
example/intro/visualization_basic.cpp.
Figure 3.4: Block Graph Visualization
3.6 Expression Template Layer
Expression templates (or just RBL expressions) are used to stamp
out distinct block graphs (see 2.6). Expressions can be viewed as the
building blocks of a custom C++-internal language. Once we have
determined what our atomic expressions are (terminal symbols), we
can continue building compound expressions (nonterminal symbols)
around them.
Each expression is derived from expression<Input, Output> for
common semantics, where Input/Output is the input/output data
type. An expression may, therefore, have at most one input and output,
which are directly connected to the underlying blocks. The type of
a compound expression is automatically deduced from its nested
subexpressions. The purpose of each expression is to generate its
block subgraph and further represent its input and output blocks for
use in enclosing expressions.
3.6.1 Block Expression
In our case, block is the single atomic expression. It is represented
by block_expression or temporary_block_expression. The former
masks named (lvalue) blocks in expressions and the latter the anony-
mous (rvalue) ones.
With some additional techniques, blocks can be used in compound
expressions directly and be converted to block expressions under the
hood. Moreover, functions that are compatible with convenience
blocks (see 3.2) implicitly construct the corresponding convenience
block, which is then fed to the aforementioned step. As a result,
non-RBL callables (lambdas, simple functions) may also appear in the
expressions directly:
publisher<int>()
    >> [](int value) { return value * value; }
    >> [](int value) { std::cout << value << std::endl; };
3.6.2 Chain Expression
Chain expression is the simplest compound expression. Its purpose
is to create connections between its nested expressions. A chain can
potentially spawn a hidden identity block (see 3.2), which is used to
convert messages between two adjacent blocks that are convertible,
but not identical.
Compound expressions are variadic7 and automatically flattened
to simpler types, in favor of optimizations (smaller block graphs) and
shorter compile error messages (in cases of invalid template instan-
tiation [6, Chapter 9.4]). The two-operand forms are also available as
overloaded binary operators, so the following statements are
equivalent:
chain(A, B, C);
chain(chain(A, B), C);
chain(A, chain(B, C));
A >> B >> C;
(A >> B) >> C;
A >> (B >> C);
Read as: A triggers (sends messages to) B, which triggers C
3.6.3 Complex Expressions
There are three kinds of more intricate expressions implemented in
RBL. They are complex in the sense that their underlying subgraph is
composed of:
• the subgraphs of the nested expressions (not different from
chains),
• a prepended hidden block with one input and multiple output
connections,
• an appended hidden block with one output and multiple input
connections.
The types of hidden blocks are automatically deduced from the
types of subexpressions. The blocks, along with the deduction rules,
are the only thing that differs between the following expressions.
7. Meaning they can be instantiated with a variable number of distinct types or
deduced-from values [6, Chapter 4].
The topology remains the same in all cases: the source block A feeds
both operand subgraphs B and C, whose outputs feed the target block
D (see Figure 3.5).
Figure 3.5: Complex Expression Graph
Disjunction Expression
The disjunction expression (operator |) uses identity for both of its
hidden block types. Because of this, each input message sent to the generated
graph will be synchronously sent to each subexpression’s subgraph.
Output messages from all subexpressions’ subgraphs are collected
and sent as output messages of the generated graph.
A >> (B | C) >> D;
Read as: A triggers B and C, which both trigger D (so D may be triggered
twice as often as A fires)
In practice, the disjunction expression can be used to remove the
repetition from Listing 3.3 to the equivalent:
produce_or_fail >> (
    (
        ignore_errors<data_t, exception_t>() >> on_success
    ) | (
        capture_errors<data_t, exception_t>()
            >> (print_trying_again_message | produce_or_fail)
    ));
Conjunction Expression
Conjunction expression (operator & or &&) uses a pair of unzip (see
3.4) and zip hidden blocks. The input and output types are tuples.
Operator & uses a zip combiner (see 3.4) with the all policy, while &&
uses the ordered policy.
A >> (B & C) >> D;
Read as: A triggers B and C with individual tuple elements, which
combine their output values into an output tuple to trigger D
This construction has a potential usage in the MapReduce pattern
(splitting the work to be done on each message among a static number
of worker blocks/graphs), with some additional data transformations:
input >> map_value_to_tuple_of_n_elements
      >> (worker_1 & ... & worker_n)
      >> combine_tuple_of_n_elements_to_value >> output;
Conjunction expressions are implicitly flattened as well, which
now affects the input and output types. What would previously be
std::tuple<std::tuple<A, B>, C> becomes std::tuple<A, B, C>.
To prevent this action in case it is unwanted, there is the no_fold
function, which wraps the compound expression we would like to
keep intact. The function works for other compound expressions as
well, though it brings no semantic difference there.
Concatenation Expression
The concatenation expression (operator +) spawns a hidden pair of
complementary switch blocks (see 3.2). As a result, an input message
is only forwarded to the active expression's subgraph, and only the
output messages of the active subgraph are forwarded as output mes-
sages of the generated graph8.
A >> (B + C) >> D;
Read as: A triggers only B while B is running, then switches to triggering
only C; D is triggered only by B while B is running, then switches to
being triggered only by C
8. Since a subgraph may appear terminated on one side and still running on the
other, the two switches may come to consider subgraphs of different expressions
as the active one.
The practical usage can be seen in the scenario of handling a mes-
sage sequence with transient strategies, e.g. using take block variants:
input >> (
    (take_10 >> strategy_1) +
    (strategy_2 >> take_while_less_than_42) +
    (take_until_consecutive_equal >> strategy_3) +
    final_strategy
) >> output;
Expression Committing
Committing is the act of expressions taking effect – generating their
block graphs. Because an expression may undergo further expansions,
this is not done immediately at the point of construction.
The chain expression is the only one which commits itself automat-
ically at its destruction (i.e. at the end of the statement). Committing is
recursive for all subexpressions, regardless of their types.
An expression may be committed from the input and/or output
side. This supports a lazy block graph generation:
A >> (B | C);
In the above example, the disjunction expression only instantiates
its hidden input block and its connections because the output of the
expression was never requested.
Expression Capturing
Potential unnamed blocks and created hidden blocks have to be stored
somewhere. In the above cases, they would be individually dynami-
cally allocated, which could result in poor cache locality9. To fix this,
there is the following construction that produces expressions holding
blocks with automatic storage duration (on the stack):
auto expr = rbl::capture(expression);
9. Cache locality here means storing related structures (in terms of access time)
close to each other in the virtual address space of a program. It is done to maximize
the effect of the prefetching from main memory performed by the CPU.
Capturing an expression moves it into the expr object and commits it.
Unfortunately, the information about how an outer expression uses its
subexpressions is not reflected in the subexpressions' types. Therefore,
they always have to reserve space for their hidden blocks, including
the unused ones. The extra cost also includes unneeded references to
the named blocks contained within block_expressions. The expressions
are already complex enough that addressing these issues with further
type modifications via template metaprogramming has been left out,
though it should be achievable.
4 Evaluation
4.1 Use Case Comparison
In this section, we look at different designs of existing solutions for
asynchronous event handling in terms of syntactic and semantic dif-
ferences. The respective use case examples can be found in the
example/comparison directory.
4.1.1 Boost.Asio
Compared to Boost.Asio (see A.2), RBL makes the structure of writ-
ing sequences of asynchronous operations more linear and coherent.
Without RBL, the flow of operations is segmented into individual
functions, which are connected to their initiators as callbacks. In RBL,
such sequence of connections can be formed in one or few statements,
tightly packing the information about consequences in one place.
Boost.Asio implements its asynchronous model and minimal API.
It does not attempt to abstract things further, like RBL, leaving syn-
chronous operations to be implemented in the traditional, imperative
manner. This is where RBL picks up and continues.
Since version 1.54.0, Boost.Asio has integrated Boost's coroutine
implementation (see 4.1.4) [9]. While allowing fine-
grained control over program’s asynchronous execution involving
I/O operations, the code suffers from the same uncertainty of logical
consequences in larger programs. We will skip the analysis of this
combination, as the results can be extrapolated from the elementary
studies.
The comparison example is located in asio.cpp and asio_rbl.cpp
source files.
4.1.2 RxCpp
RxCpp is an implementation of Reactive Extensions in C++. Like RBL,
it simplifies event-driven programming with the use of IoC, but with
abstractions built differently.
RxCpp’s main building entities are observables (event sources)
and observers (receivers). Unlike RBL’s blocks, observables directly
provide the interface to be extended and composed using procedural-
like syntax. Such a syntactic environment is generally referred to as
Language Integrated Query (LINQ)1 .
Here is an example of query construction, chained with operator|:
auto observable = range(0, 10) |
    map([](int i) { return i * i; }) |
    filter([](int i) { return i < 20; }) |
    reduce(
        std::vector<int>(), [](std::vector<int> v, int i) {
            v.push_back(i);
            return v;
        });
observable | subscribe<std::vector<int>>([](auto v) {
    std::copy(v.begin(), v.end(),
              std::ostream_iterator<long>(std::cout, " "));
});
The code transforms a range of integers to their squares, then filters
and collects them to a vector to be printed out. A more sophisticated
comparison example can be found in rx.cpp and rx_rbl.cpp files.
Querying an observable potentially creates another observable, simi-
larly to RBL's block chaining. In RBL, we did not aim to mimic this;
we have chosen a more explicit (connection-declaring) style instead.
RBL's blocks could be hidden away under the
LINQ-styled expression templates to produce a similar interface.
Overall, RxCpp comes with more concise syntax, but RBL is more
transparent, thanks to multiple layers of abstractions and their open-
ness to the user.
4.1.3 Intel® Threading Building Blocks
Intel® Threading Building Blocks (TBB) is a library designed for writ-
ing multi-threaded applications [11], which is another approach to
asynchronous programming. The library also contains functionality to
define programs as a graph of interconnected nodes (the equivalent of
RBL's blocks), contained within the tbb::flow namespace. Between
two nodes, the IoC concept applies.
1. LINQ first appeared in the .NET Framework [10].
TBB primarily focuses on implementing a more complex communica-
tion protocol, intended mainly for multi-threaded producer-consumer
patterns. It places more emphasis on task management and synchro-
nization. The protocol, unlike RBL's, is bidirectional: it switches be-
tween push and pull mechanics, while RBL is based only on push
mechanics. In the "push" model, communication is initiated by the
producer of data; in the "pull" model, it is requested by its consumer.
TBB’s nodes implicitly implement this complex set of behavior,
which is further specifiable for each node. In RBL, we would most likely
implement additional scheduling mechanisms via executor blocks.
In addition, TBB’s nodes are controlled by a central authority – a
graph object, which can be used as Boost.Asio’s context to some extent,
to run and wait for the execution in each of its nodes to finish. In RBL,
we have more flexibility, and the possibility to register a termination
handler to each block individually.
Lastly, TBB is aimed at, and recommends, nodes of larger granu-
larity. RBL should be able to handle a smaller grain size with less
overhead, because of its simpler communication protocol. Therefore,
TBB is more suitable for building a data flow graph of more complex,
long-running tasks, rather than of discrete asynchronous operations.
See tbb.cpp and tbb_rbl.cpp files for the difference in code.
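For a flavor of TBB's node-based interface (independent of the comparison files), a minimal flow-graph example using the classic tbb/flow_graph.h header might look like this:

#include <iostream>
#include <tbb/flow_graph.h>

int main() {
    tbb::flow::graph g;                                    // the central graph authority

    // A node squaring its input; "unlimited" allows concurrent invocations.
    tbb::flow::function_node<int, int> square(
        g, tbb::flow::unlimited, [](int v) { return v * v; });

    // A serial node printing the results.
    tbb::flow::function_node<int, int> print(
        g, tbb::flow::serial, [](int v) { std::cout << v << '\n'; return v; });

    tbb::flow::make_edge(square, print);                   // connect the two nodes

    for (int i = 1; i <= 5; ++i)
        square.try_put(i);                                 // push values into the graph
    g.wait_for_all();                                      // wait for all nodes to finish
}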
4.1.4 Coroutines
Coroutines [12] are a relatively new concept for the C++ language.
Their main application is also asynchronous programming, however,
with minimal structural code changes from procedurally written syn-
chronous code. Coroutines change the mechanics of function execu-
tion, without using the IoC principle.
Routines
Normally, a function represents a continuous list of statements, includ-
ing nested function calls. Inside a thread of execution, each function
is executed in a blocking manner without interruption2 .
2. With the exception of interrupt routines, of course.
Coroutines
With coroutines, execution of a function may become discontinuous.
A coroutine function can be run as a normal one, but it can also put
itself into a suspended state. At that time, the control flow returns
to the calling function, but the coroutine’s internal state is moved
to a separately held, usually dynamically allocated data structure.
The caller is provided with a handle to this structure. They can later
use it to resume the execution of the associated coroutine from the
state, in which it was suspended. The side-allocated coroutine state
is destroyed when no longer needed – the coroutine has no more
instructions to run.
A suspension initiated from within a coroutine is associated either
with a produced output value of the given return type, or with waiting
for another resource. The mechanism, therefore, allows coroutines to
produce (yield) multiple output values, enabling so-called "generating"
functions3 (including infinite ones).
Unlike RBL’s or RxCpp’s IoC approaches, the way of declaring
coroutine functions and dependencies remains at the level of simple
function declarations and function calls. Also, if an RBL block has
multiple values to output, it does so without intermediate suspensions,
and value production is guided by the producer, not the consumer.
Standard Proposal
The Coroutines TS4 is expected to become a part of the C++20 standard.
The implementation includes reserved keywords for expressing asyn-
chronicity:
• co_await – wait for a coroutine,
• co_yield – produce an output value,
• co_return – produce the last output value.
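To illustrate co_yield in practice, here is a minimal, self-contained sketch of a "generating" coroutine. It assumes a C++20 compiler with the <coroutine> header (at the time of the TS, the types still lived in an experimental namespace), and the small generator type is our own illustrative scaffolding, not part of the standard or of RBL:

#include <coroutine>
#include <exception>
#include <iostream>

// A deliberately minimal generator type; its promise_type wires the coroutine
// machinery so that co_yield hands a value to the caller and suspends.
template <typename T>
struct generator {
    struct promise_type {
        T current{};
        generator get_return_object() {
            return generator{std::coroutine_handle<promise_type>::from_promise(*this)};
        }
        std::suspend_always initial_suspend() { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        std::suspend_always yield_value(T value) { current = value; return {}; }
        void return_void() {}
        void unhandled_exception() { std::terminate(); }
    };

    std::coroutine_handle<promise_type> handle;

    explicit generator(std::coroutine_handle<promise_type> h) : handle(h) {}
    generator(generator&& other) noexcept : handle(other.handle) { other.handle = {}; }
    generator(const generator&) = delete;
    ~generator() { if (handle) handle.destroy(); }

    bool next() { handle.resume(); return !handle.done(); }  // resume until the next co_yield
    T value() const { return handle.promise().current; }
};

generator<int> squares(int count) {
    for (int i = 0; i < count; ++i)
        co_yield i * i;          // suspend here; the caller decides when to resume
}

int main() {
    auto gen = squares(5);
    while (gen.next())
        std::cout << gen.value() << ' ';   // prints: 0 1 4 9 16
    std::cout << '\n';
}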
With this, writing asynchronous code becomes syntactically little
different from writing synchronous code. Coroutines are therefore
relatively low-level and effective primitives while being perfectly
usable. We think RBL and other IoC-based solutions can still compete,
but only under the following requirements:
• mild emphasis on performance – coroutines are expected to per-
form much better, even allowing compilers to perform unusual
intrinsic optimizations (avoiding the dynamic state allocations
mentioned above, among others);
• strong emphasis on coherence of dependency information –
RBL's graph declarations dominate in this area.
3. These are similar to Python's yield generators.
4. Technical specification ISO/IEC TS 22277:2017.
Extension
The standard implementation of coroutines is open to user-defined
behavior in some places. This is done by providing two counterpart
class interfaces – promise (for the callee, not to be confused with
std::promise) and awaitable (for the caller). Both classes allow users
to insert additional state and behavior, which will be executed at
points such as the suspension of execution.
The CppCoro5 library makes use of this extensibility to build a
slightly higher-level layer over coroutines.
Future Vision
If standardized, coroutines will eventually become the preferred way
of asynchronous programming in C++, if only for their incorporation
into the standard. This will make them part of the common knowledge
base of a modern C++ programmer, which will further promote their
popularity.
The standard implementation, however, requires a substantial
amount of work to be performed by the compiler and library ven-
dors, as the feature is tied to the most basic concepts of C++’s abstract
machine. This means that support for coroutines could be delayed for
some time before arriving on some platforms. Coincidentally, embed-
ded platforms, a major target for event-driven programming, are not
among those that follow the cutting-edge standards. RBL is built with
relatively modest requirements, albeit on the C++17 standard. It is,
therefore, ready to be ported with minimal changes to all platforms
for which a C++17 compiler exists.
5. https://github.com/lewissbaker/cppcoro
Other Implementations of the Coroutine Concept
Apart from the standard library and compiler vendors’ implementa-
tions, coroutines have been available in other forms to a certain degree,
most notably the Boost.Coroutine2 library [13]. The development of
this support library began even before C++11. It implements the same
mechanics, although without dedicated language keywords. The func-
tionality comes with classes, which are used as function parameters
and manipulated inside functions that become coroutines. The sup-
port to suspend a function’s execution and save the state comes from
the Boost.Context library [14].
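A minimal example of this style, using Boost.Coroutine2's pull/push class pair (a sketch; see the library's documentation for the full interface):

#include <boost/coroutine2/all.hpp>
#include <iostream>

int main() {
    using coro = boost::coroutines2::coroutine<int>;

    // The push_type parameter is the class "used as a function parameter":
    // calling sink(value) suspends the lambda and hands the value to the caller.
    coro::pull_type numbers([](coro::push_type& sink) {
        for (int i = 1; i <= 3; ++i)
            sink(i * i);
    });

    for (int value : numbers)        // each iteration resumes the coroutine
        std::cout << value << ' ';   // prints: 1 4 9
    std::cout << '\n';
}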
We did not invest time in a use-case comparison example with
coroutines, as there is not yet a finalized standard, and they are
conceptually too distant from RBL. The exact fitness would have to be
evaluated with a number of examples, many of which could be biased
towards one side.
4.2 Performance Analysis
RBL’s implementation of IoC introduces some performance drawbacks.
In this section, we will identify the most prominent ones and assess
their severity with benchmarks of simple non-RBL versus RBL code.
4.2.1 Overhead Analysis
In the following analysis, we leave individual concrete blocks aside
and focus on the common performance factors. Arguably the most
prominent C++ optimization feature is function call inlining, which
enables the compiler to perform further optimizations based on the
code model available to it. Because of the dynamic nature of RBL (a
program's graph is built at run-time, not compile-time), we lose this
capability between blocks.
Static Connection Overhead
More obvious overhead mainly appears in the form of pointer indirection
(which prevented inlining in the first place) and dynamic dispatching
[15]. To transmit a message over one connection, an indirection to the
dynamically allocated list of block pointers is performed. Then, each
pointer in the vector is dereferenced (another indirection), to call the
appropriate virtual function.
The first indirection, as well as the dynamic allocation altogether,
can be avoided. RBL can be compiled to create a statically sized
list of block pointers (std::array) instead of a std::vector for each
block. This should theoretically also result in better cache locality.
The static number of block pointers to store is specified by the user.
If a block needs more connections than this number, it will allocate a
dynamic list and the whole static list will be unused. If the static size
is too large, a big fraction of the designated space may be unused.
Since the variation between static and dynamic container types
is implemented with std::variant, a reasonable default value for
the static size is sizeof(std::vector<void*>) / sizeof(void*), not
adding any space overhead, except that of std::variant itself6 . The
number depends on the platform, compiler, and the standard library’s
implementation; e.g. on x86-64 GCC with libstdc++, it is generously 3.
We should also not forget the extra condition check on each access,
which might make things worse. In our benchmarks, the performance
difference between these two approaches was not noticeable, so the
static size should rather be tuned for each program individually.
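The following is a simplified, hypothetical sketch of that idea – it is not RBL's actual connection storage – showing how std::variant can switch between an in-place array and a heap-allocated vector:

#include <array>
#include <cstddef>
#include <variant>
#include <vector>

struct block;   // stands in for RBL's block base class

// Default capacity: reuse the space that std::vector<void*> itself occupies
// (three pointers on x86-64 GCC with libstdc++).
constexpr std::size_t static_capacity = sizeof(std::vector<void*>) / sizeof(void*);

struct static_list {
    std::array<block*, static_capacity> items{};
    std::size_t size = 0;
};

class connection_list {
    std::variant<static_list, std::vector<block*>> storage_;

public:
    void add(block* b) {
        if (auto* s = std::get_if<static_list>(&storage_)) {
            if (s->size < static_capacity) {            // still fits into the static part
                s->items[s->size++] = b;
                return;
            }
            // Overflow: fall back to a dynamically allocated vector.
            std::vector<block*> v(s->items.begin(), s->items.end());
            v.push_back(b);
            storage_ = std::move(v);
            return;
        }
        std::get<std::vector<block*>>(storage_).push_back(b);
    }

    template <typename F>
    void for_each(F&& f) const {                        // the extra branch on every access
        if (auto* s = std::get_if<static_list>(&storage_)) {
            for (std::size_t i = 0; i < s->size; ++i)
                f(s->items[i]);
        } else {
            for (block* b : std::get<std::vector<block*>>(storage_))
                f(b);
        }
    }
};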
We will look at how the implemented inter-block communication
impacts the performance in 4.2.2.
Dynamic Connection Overhead
Connections and disconnections are asymptotically costly, because
each of them searches the underlying container of connections in O(n)
time, to avoid duplicate connections or to find the connection to remove.
But, assuming that these actions form only a fraction of an RBL program
compared to actual communication and are negligible with the usual,
low connection branching factor, we will not dwell on this.
6. std::variant stores the index of the type currently being held.
Communication Overhead
Messages are passed from a publisher to its subscriber without the use
of move semantics, as one message may have multiple destinations and
therefore cannot be invalidated; it must be copied instead.
Executable Size
The C++ templates used extensively in RBL result in multiple instances
of block logic and internal communication functions being compiled,
one for each data type [2, Section 23.2.2]. The binary executable is
therefore visibly larger than that of equivalent non-RBL code7.
In general, we cannot correlate executable size with the speed of the
program. The binary size, however, is an important trait to consider in
environments with limited program memory, e.g. embedded systems.
By using dynamic dispatching in block graphs, instead of extending
the block templates to hold the type information about connected
blocks at compile-time, we have avoided additional growth of the
executable size (code bloat), at the run-time cost analyzed above.
Compile-time Overhead
Extensive use of templates in each layer of RBL visibly degrades com-
pilation times, and deep template instantiations require a substantial
amount of system memory to compile. This is a well-known drawback
of the generic use of C++ [6, Section 23.3]. The remedies available to
the user include precompiled headers [6, Section 9.9], explicit instan-
tiation [6, Section 14.5], and (future) modules [6, Section 17.11].
7. It is still smaller than the RxCpp executable in the comparison example
(rx_rbl.cpp compiled to 1.1 MB versus rx.cpp with 1.9 MB).
4.2.2 Performance Benchmarks
Each of the following benchmarks measures the performance of RBL
compared to equivalent imperatively written code (see B.3 for technical
details). By doing so, we get a picture of the inherent communication
overhead. All benchmarks are constructed around the transmission of
a given total data size, with the message granularity parametrized
on the x-axis, increasing exponentially. There are two y-axes:
• left – the absolute time required to complete the transmission
with base-10 logarithmic scale,
• right – the efficiency percentage (performance of RBL relative
to non-RBL) with a linear scale.
The most basic case in which we were interested was synchronous
forward data message transmission between a publisher and a sub-
scriber.
Figure 4.1: Synchronous Benchmarks
As expected, RBL performs the worst with smaller messages, com-
pared to non-RBL, because the communication mechanism (indirec-
tion) is used more intensively and takes the majority of the CPU time.
In combination with Boost.Asio's io_context, this cost becomes
much less prominent8, and with TCP sockets, it is barely visible.
Figure 4.2: Asynchronous Benchmarks
8. Note that there are as many as three blocks and two connections in the first case's
loop; four blocks and four connections in the second (see execute.cpp and read_write.cpp).
Lastly, in the case of the producer-consumer pattern with separate
threads and a buffering queue, the overhead is also acceptable. This
pattern is not suitable for fine-grained communication in either case,
so the values for small message sizes are of little interest to us.
Figure 4.3: Asynchronous Benchmarks (thread)
There are many more evaluated variants of all the aforementioned
benchmarks, performed on executables compiled with Clang 6.0.0
as well as GCC 8.1.0. The benchmarks were produced on 64-bit Ubuntu
16.04 LTS, running on Intel® Core™ i5-4210U CPU @ 1.70GHz × 4
with 8GB of DDR3, 1600MHz RAM.
4.3 Debugging
The introspection layer does not bring the usual debugging options
back to the levels of code without IoC. We cannot debug IoC code
using breakpoints, step-execution and subsequent state inspection
with the same level of comfort, as the program’s logic is scattered
among loosely coupled functions. The call stack is dominated by RBL's
internal communication functions rather than the user-defined ones. There
is a large potential for extension of debugging options (see 5.1).
5 Conclusion
In this thesis, we have designed and implemented a C++ library that
changes the way event-driven programs are expressed. We have ex-
plained the basic building principles, which we have later solidified
by showing their usability to implement concrete blocks for various
operations and algorithms. We have shown the power of C++ in terms
of being able to design our own declarative sublanguage with the
purpose of simplifying the syntax of block graph construction.
We have performed a concise comparison between the library
and similar existing solutions. The library, built on general concepts,
proved to be able to compete with various, more specialized libraries
in terms of usability and performance. Finally, we have described
an upcoming C++ language feature – coroutines – which is, however,
expected to significantly narrow down the prominent use cases of our
library.
5.1 Future Work
There are potential improvements in each layer of RBL.
Multi-platform support RBL has been developed on the x86-64 ar-
chitecture and Linux. Certain parts may need to be ported to work
on different (embedded) platforms, most notably the bare metal ones1,
e.g. the Espressif ESP32 or Atmel AVR32.
Optimizations The core of RBL is the most critical part but has been
implemented mainly as a proof of concept. It could benefit from deeper
performance analysis and profiling (e.g. using the Callgrind tool)
for the user-targeted platforms. The (micro)optimizations could be
afterward incorporated into RBL’s platform-independent repository.
Additional concrete blocks We have implemented the most common
built-in blocks that we found a use for while considering use cases.
There may be more concrete blocks worth adding to RBL's base
collection.
1. Bare metal is a computer environment in which a program is run without the
presence of an operating system.
Additional asynchronous API adaptations RBL is meant to be fitted
onto existing asynchronous frameworks other than Boost.Asio as well.
The executors do not fully wrap the functionality of Boost.Asio, either.
In the future, Boost.Asio support could be separated out of RBL's
base into a stand-alone repository with the same rank as other APIs.
On embedded platforms especially, translating various interrupts into
RBL events seems viable.
Additional/reworked expressions The expression template layer
demonstrates the power of C++ and the ability to abstract RBL's
low-level API. There may be reasons to implement other types of
expressions with slightly different semantics, such as an expression
for creating cyclic graphs (loops). However, the designed expressions
(namely the type deduction rules of complex expressions) may be too
complex for an average user. Following the example of the existing
expressions, this layer may either be extended or rebuilt into expres-
sions with simpler semantics, perhaps using the LINQ approach.
Improved debugging capabilities The introspection layer could be
expanded with run-time, perhaps visual, tools for observing the pro-
cess of execution. Such an extension could, at the very
least, provide the feature of placing breakpoints2 onto selected blocks
and/or their connections, halting the execution upon observing any
events. The breakpoints could either be created via a function call for
each block type, or there could be a designated block for this purpose,
intended for connection to the graph at the critical places.
2. Breakpoints can be created on Linux in-code with raise(SIGTRAP) call (see
RAISE(3) and SIGNAL(7) [1]).
A Asynchronous API Model
A.1 Asynchronous Operations
An asynchronous API consists of functions which initiate operations
that usually take a long or unspecified amount of time, but which
return control to the caller immediately, regardless of whether the
operation has completed or not. The result of the asynchronous opera-
tion can either be:
• manually queried by the user in an execution blocking or non-
blocking manner,
• automatically conveyed to the user by calling their callback
function.
Synchronous execution would block the caller until completion of
the associated operation, making it unfit for communication with an
outside environment.
Callback A callback function (a callable object) is provided by the
user as an argument to each function of an API that uses IoC. It is
executed as soon as the associated operation completes, either with
success or failure. This is the base case of IoC, and at that point, the
asynchronous API has to be in control of the program. To be in control
means to have at least one thread designated to the execution of the
managing code, typically in a loop.
A.2 Boost.Asio – An Asynchronous API Example
Since RBL serves only as a transformation of programming techniques,
it has to look for asynchronous execution scheduling and synchro-
nization mechanisms elsewhere. The intention behind RBL is to be
fitted onto existing asynchronous APIs. One such API can be found
in the well-known and well-received Boost C++ Libraries collection,
under the name Boost.Asio [16]. While being designed primarily as
an asynchronous networking support library for C++, the execution
model of the Boost.Asio library allows more general use, suitable even
outside the domain of networking and asynchronous I/O.
This section is devoted to explaining the relevant portions of the
Boost.Asio library, for which RBL and some of the use case examples
of this thesis have been constructed.
A.2.1 io_context
The model revolves around an execution-managing entity –
io_context. It is an object which accepts asynchronous requests
from a user, enqueues the operation or forwards the request to the
operating system, and calls the appropriate completion handlers. The
handlers are provided in the form of callbacks as arguments of each
asynchronous operation. The callbacks are then stored in the
io_context's internal queue for processing.
At some point in the program, the user has to delegate the responsi-
bility for control flow to the created io_context instance to execute the
enqueued asynchronous operations. This is done by calling the
io_context::run function. The run function blocks while the registered
operations are being executed.
It is perfectly valid, and a common practice to chain asynchronous
operations together. In other words, a completion handler of an asyn-
chronous operation can register additional asynchronous operations
for execution, possibly within the same io_context::run call. This
even allows asynchronous cyclic dependencies between operations.
A.2.2 Execution Options
The registered handlers can appear in two states: pending and
ready. Pending handlers are those for which the associated asyn-
chronous operation has not yet completed. Ready handlers are
the remaining ones.
Registering an asynchronous operation is roughly equivalent to:
1. registering a ready handler launching the operation,
2. registering a pending handler (if any) as a completion handler
of the operation.
Overall, the io_context class provides these functions for greater
control over the execution:
• run_one – executes at most one ready handler including its
completion handler in a blocking fashion;
• run – executes all pending handlers and their completion han-
dlers, blocks until completion, sets the io_context object to a
stopped state afterwards;
• poll_one – executes at most one ready handler and returns
immediately afterwards;
• poll – executes all ready handlers;
• stop – stops the event processing loop, no more handlers will
be executed;
• restart – prepares a stopped io_context instance for a repeated
run call.
A.2.3 Asynchronous Operations
An asynchronous Boost.Asio operation can be one of the following:
• I/O operations,
• time-delayed execution of a callback,
• deferred execution of a callback.
I/O Operations
Input/output manipulation operations are the only type among the
ones above with visible side effects, which are their sole purpose.
Creating a context and an I/O object The following code declares
an io_context and a TCP socket as an I/O object instance associated
with it:
boost::asio::io_context context;
boost::asio::ip::tcp::socket socket(context, endpoint);
endpoint is a TCP socket identifier (an IP address and a port number).
Asynchronous reading This is how an asynchronous read operation
to a buffer called inbound_data is requested:
boost::asio::async_read(socket, inbound_data,
    [](boost::system::error_code ec, size_t length) {
        if (ec) { /* an error has occurred */ }
    });
The socket and inbound_data objects have to outlive the whole asyn-
chronous operation. The callback function is called with the return
status of the operation and the number of bytes that have been trans-
ferred.
Asynchronous writing This is the analogical data writing code:
boost::asio::async_write(socket, outbound_data,
    [](boost::system::error_code ec, size_t length) {
        if (ec) { /* an error has occurred */ }
    });
Boost.Asio’s sockets also provide an interface for issuing a read/write
operation of multiple data buffers as one asynchronous operation
(scatter-gather I/O), or for manipulation of the sockets as with I/O
streams of characters. This is beyond the scope of RBL.
Execution
We can launch the previously registered handlers:
context.run();
The run call blocks until both read and write operations finish.
In case we wish to execute the operations synchronously in relation
to each other, we can write:
context.run_one();
context.run_one();
The first run_one call blocks until reading finishes, the second one
blocks until writing finishes.
If we would like to initiate the operations without blocking and
do some other work, we can do:
context.poll_one();
context.poll_one();
// other work ...
context.run();
The first poll_one call initiates an asynchronous read, the second a
write operation.
Time-delayed Execution
Boost.Asio allows registering callbacks that react not on an I/O event,
but an event caused entirely by the passing of time. The library pro-
vides timer classes as a supplement to an I/O object.
The following code launches the user-provided callback function
with a custom time delay:
boost::asio::io_context context;
boost::asio::deadline_timer timer(context);
timer.expires_from_now(delay);
timer.async_wait([](boost::system::error_code ec) {
    if (ec) { /* the timer has been cancelled */ }
    else { /* run the delayed code */ }
});
context.run();
delay specifies the duration to wait, relative to the expires_from_now
call. As with sockets, the timer object should outlive the delay period.
Otherwise, the operation will be canceled, and the handler will be
called with an error code signifying failure. The associated handler
will only be called in the context of the run function, which blocks
until the completion of the delayed handler.
Deferred Execution
Lastly, it is possible to treat the io_context in a more direct way, as a
queue of handlers (tasks) to be executed. A user can enqueue custom
handlers to be invoked in a specific order, relative to other handlers.
This is achieved using the post function and its relatives – defer
and dispatch:
Listing A.1: Deferred Execution
boost::asio::io_context context;
boost::asio::post(context, []() { /* 1. */ });
boost::asio::post(context, [&context]() {
    // 2.
    boost::asio::post(context, []() { /* 4. */ });
    // 3.
});
context.run();
The code portions will be executed in increasing order, as marked in
the comments. The last post call represents a chained asynchronous
operation.
A.2.4 Implicit Synchronization
The API of Boost.Asio is thread-safe, and it provides the very useful
guarantee that a context's handlers are only called from within the
processing functions (run, run_one, poll, poll_one). These functions
can be called from multiple threads at once. This might lead to syn-
chronization problems between handlers, which Boost.Asio promptly
resolves with the introduction of execution strands.
Strand
A strand is another kind of Boost.Asio I/O object, which holds a
reference to an io_context instance. It can substitute the io_context
instance in the function calls, such as post. The handlers that are
registered through a strand are guaranteed to be run sequentially
without overlapping, even while executing the associated io_context
from multiple threads.
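A minimal sketch of this guarantee in action (assuming Boost 1.66 or newer, where io_context and the free post function are available):

#include <boost/asio.hpp>
#include <iostream>
#include <thread>
#include <vector>

int main() {
    boost::asio::io_context context;
    boost::asio::io_context::strand strand(context);

    int counter = 0;   // modified only by handlers registered through the strand

    for (int i = 0; i < 1000; ++i)
        boost::asio::post(strand, [&counter] { ++counter; });  // strand substitutes the context

    std::vector<std::thread> workers;
    for (int i = 0; i < 4; ++i)
        workers.emplace_back([&context] { context.run(); });   // run from several threads
    for (auto& t : workers)
        t.join();

    std::cout << counter << '\n';   // always 1000; the handlers never overlapped
}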
B Technical Details
RBL is an open-source project under the Mozilla Public License 2.0.
Its code is managed via the Git version control system. A clone of
the repository is currently available at https://gitlab.fi.muni.cz/
xsevc/rbl.
B.1 Build requirements
The project uses CMake, a build system generation tool1 . The mini-
mum acceptable version is 3.6.
RBL requires C++17 standard support from the compiler. Some
parts require the systems’ multithreading support, such as POSIX
Threads on UNIX-like operating systems. Other parts depend upon
some of the Boost C++ Libraries, specifically Boost.Asio, Boost.Seriali-
zation and Boost.TypeIndex. A few compilation switches can be speci-
fied for a build, as described in README.md.
For the purpose of continuous integration and easier development
environment setup, a docker2 image with the necessary prerequisites
has been created, and is available at https://gitlab.fi.muni.cz/
xsevc/rbl-docker.
B.2 Third-party libraries
B.2.1 Boost.Serialization
The Boost.Serialization library implements data serialization and de-
serialization mechanisms in a more uniform way than the C++ I/O
streams. Under the same syntax, it is possible to write and read data
(built-in types, std::string and user-defined types) with the guaran-
tee that the data obtained back will be the same. This is not the case
for I/O streams; e.g. formatted extraction of std::string (originally
containing whitespace) stops at the first whitespace [7, §21.3.3.4/(1.3)];
the remedy is syntactically different from extracting non-problematic
types. The library is currently used only in RBL's TCP formatted
reading and writing blocks (Section 3.3).
1. https://cmake.org/
2. Docker is open-source software for operating-system-level virtualization of
isolated environments (images and containers) directly interfacing with the host's
kernel. Docker images represent a lightweight hierarchical packaging structure,
useful for software deployment.
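As a brief illustration of the round-trip guarantee described above (using the text archives; the string serialization header is required):

#include <boost/archive/text_iarchive.hpp>
#include <boost/archive/text_oarchive.hpp>
#include <boost/serialization/string.hpp>
#include <iostream>
#include <sstream>
#include <string>

int main() {
    std::stringstream buffer;
    {
        boost::archive::text_oarchive out(buffer);
        std::string original = "hello world";   // contains whitespace
        out << original;
    }
    {
        boost::archive::text_iarchive in(buffer);
        std::string restored;
        in >> restored;
        std::cout << restored << '\n';           // "hello world", unchanged
    }
}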
B.2.2 Boost.TypeIndex
Boost.TypeIndex provides a portable API for getting static or run-time
(RTTI3) C++ type information about objects. RBL uses this to produce
human-readable C++ type names, which are part of RBL’s visualized
block graphs.
B.2.3 Loguru
Loguru (https://github.com/emilk/loguru) is a lightweight C++
logging library. We have chosen it for its relative simplicity and ro-
bustness, its advertised performance, its modern impression and its
ongoing development. The main features we were looking for were
modest: the ability to select a verbosity level and to view timestamps,
as well as thread identifiers, in the output. The library is not required
when logging is disabled.
B.2.4 Catch2
Catch2 (https://github.com/catchorg/Catch2) is a minimalistic unit
test framework, in which RBL’s unit tests are written. The library con-
sists of macros that are wrapped around user-provided code blocks
representing full test cases, test case sections or asserted conditions.
The framework is explicitly given control at the entry point of
a program (the main function) to run all the defined test cases auto-
matically. Additional behavioral and test case filtering options can be
supplied as command-line arguments.
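A minimal test file in this style looks as follows (using the Catch2 v2 single-include header):

#define CATCH_CONFIG_MAIN        // let Catch2 supply the main function
#include <catch2/catch.hpp>

#include <vector>

TEST_CASE("vector grows when elements are pushed") {
    std::vector<int> v;

    SECTION("a single push_back") {
        v.push_back(42);
        REQUIRE(v.size() == 1);
        REQUIRE(v.front() == 42);
    }

    SECTION("an untouched vector stays empty") {
        REQUIRE(v.empty());
    }
}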
3. Run-time type information (RTTI) is a feature of C++ that allows introspection of
the dynamic object types, i.e. objects of types determined at run-time [2, Chapter 22].
B.3 Project Structure
The project's repository is organized into subdirectories in the fol-
lowing way:
• bench, bench/results – code and results of performance bench-
marks (Section 4.2.2, B.3);
• doc/draft, doc/html – basic documentation in form of a brief
draft of each layer, source code documentation;
• example – code of basic usage and comparison examples;
• include/rbl – header files;
• src – source files;
• test – unit tests;
• third-party – third-party libraries;
• utils – header files of general convenience functions.
B.3.1 Utilities library
The source code located in the utils directory is a collection of our
own convenience functions, which we found a need for during the
development of RBL. These functions were too general to be placed
alongside the code that implements RBL’s specialized concepts.
The utilities extend the C++ standard library’s support of:
• generic run-time and compile-time tuple manipulation,
• functional programming,
• Resource Acquisition is Initialization (RAII) applications,
• type traits,
• template metaprogramming with variadic template parameter
packs.
B.3.2 Source-code Structure
Due to a high amount of generic template code, RBL largely takes
the form of a header library. There is only a small portion of non-
generic code that is compiled into a library.
RBL’s source code is logically divided into different parts (also
folders), each of which focuses on implementing one RBL feature, or
concept. These parts are:
• core – Core Functionality – RBL’s object-oriented block concept
and communication;
• builtin – Built-in Blocks – convenience blocks instantiable by
user-provided functions as their behavior, error handling and
other concrete blocks;
• exec – Executor Blocks – blocks implementing RBL’s executor
concept;
• algo – Algorithm Blocks – blocks implementing algorithms
operating on sequences of messages;
• intro – Introspection Layer – optional support for run-time
logging and visualization of an RBL program;
• expr – Expression Template Layer – syntactic simplification of
block graph creation.
All of RBL's source code is located inside the rbl namespace
and its inner namespaces. Namespaces named detail, or containing
detail in their names, are not to be used by the user.
The namespace hierarchy does not mirror the logical and directory
structure. All block classes are contained within a block namespace,
which can be either rbl::block or appear further down the names-
pace structure, e.g. rbl::asio::block.
It is recommended to instantiate blocks via their creation func-
tions, which are usually named the same (or similarly). These
functions are located next to the corresponding block namespace. RBL
is forced to use this pattern because class template argument deduc-
tion rules are weaker than those of normal functions4. Furthermore,
there can be more creation functions (or overloads) dedicated to the
construction of one block in different ways.
4. Apparently, partial class template argument deduction (essential for explicitly
specifying a block's input/output type while deducing other types) is not supported
[17, Template argument deduction for class templates].
RBL’s core implementation contains several run-time assertions
that can identify invalid constructions before undefined behavior
takes place.
B.3.3 Documentation
RBL’s sources are briefly documented via Doxygen5 comments. The
implementation details are omitted from the documentation, as well as
obvious parameter meanings (e.g. source object for copy constructor)
or return values (e.g. member access functions).
B.3.4 Unit Tests
As is expected of larger projects, and of support libraries even
more so, we would like to have a tangible assurance of correctness.
As the main method, we have adopted unit testing, suitable for testing
decoupled parts, which RBL blocks certainly are.
RBL contains unit tests, although only moderately covering the
functionality. The behavior of all blocks is tested by interaction with
various input messages. Complete instantiation (compile-time) tests
and tests for value semantics are currently lacking. Basic instantiation
validity tests of expressions are only present in examples.
The core layer is tested most thoroughly. Other layers are tested via
contrived block graphs, which usually contain built-in error handling
or message counting blocks to re-use what RBL provides.
B.3.5 Performance Benchmarks
The values in Section 4.2.2 are acquired as a mean of 10 measure-
ments for each parametrization. The compiled executables produce
data tables saved in .csv files. These can then be plotted into graph
images by the gnuplot6 program. For this, there is an automated Bash
script called plot.bash that is copied to the directory of the .csv files
(the RBL_BENCHMARKS_OUTPUT_DIR CMake option) on each build.
5. http://www.doxygen.nl/
6. http://www.gnuplot.info/
Bibliography
1. KERRISK, Michael. Linux Programmer’s Manual [online]. 2019 [vis-
ited on 2019-04-30]. Available from: http://man7.org/linux/
man-pages/index.html.
2. STROUSTRUP, Bjarne. The C++ Programming Language.
4th. Addison-Wesley Professional, 2013. ISBN 0321563840,
9780321563842.
3. CAMPBELL, Lee. Introduction to Rx [online]. 2012 [visited on
2019-04-30]. Available from: http://introtorx.com/.
4. GABBRIELLI, Maurizio; MARTINI, Simone. Programming Lan-
guages: Principles and Paradigms. 1st. Springer Publishing Com-
pany, Incorporated, 2010. ISBN 1848829132, 9781848829138.
5. FOWLER, Martin. InversionOfControl [online]. 2005 [visited
on 2019-04-30]. Available from: https://martinfowler.com/
bliki/InversionOfControl.html.
6. VANDEVOORDE, David; JOSUTTIS, Nicolai M.; GREGOR, Dou-
glas. C++ Templates: The Complete Guide (2nd Edition). Addison-
Wesley Professional, 2017. ISBN 0321714121, 9780321714121.
7. ISO. ISO/IEC 14882:2017 Information technology — Programming
languages — C++. Fifth. 2017. Available also from: https://www.
iso.org/standard/68564.html.
8. GANSNER, Emden R.; NORTH, Stephen C. An open graph vi-
sualization system and its applications to software engineering.
SOFTWARE - PRACTICE AND EXPERIENCE. 2000, vol. 30, no.
11.
9. SCHÄLING, Boris. Boost.Asio Coroutines [online]. 2019 [visited on
2019-04-30]. Available from: https://theboostcpplibraries.
com/boost.asio-coroutines.
10. MICROSOFT DOCS. LINQ (Language-Integrated Query) [online].
2017 [visited on 2019-04-30]. Available from: https://docs.
microsoft.com/en-us/previous-versions/bb397926(v=vs.140).
11. INTEL CORPORATION. Intel® Threading Building Blocks Docu-
mentation [online]. 2018 [visited on 2019-04-30]. Available from:
https://software.intel.com/en-us/node/506211.
12. BAKER, Lewis. Coroutine Theory [online]. 2017 [visited on 2019-
04-30]. Available from: https://lewissbaker.github.io/2017/
09/25/coroutine-theory.
13. KOWALKE, Oliver. Boost.Coroutine2 [online]. 2014 [visited on
2019-04-30]. Available from: https://www.boost.org/doc/libs/
1_70_0/libs/coroutine2/doc/html/coroutine2/overview.html.
14. KOWALKE, Oliver. Boost.Context [online]. 2014 [visited on 2019-
04-30]. Available from: https://www.boost.org/doc/libs/1_
70_0/libs/context/doc/html/context/overview.html.
15. BENDERSKY, Eli. The cost of dynamic (virtual calls) vs. static (CRTP)
dispatch in C++ [online]. 2013 [visited on 2019-04-30]. Available
from: https://eli.thegreenplace.net/2013/12/05/the-cost-
of-dynamic-virtual-calls-vs-static-crtp-dispatch-in-c.
16. TORJO, John. Boost.Asio C++ Network Programming. Packt Pub-
lishing, 2013. ISBN 9781782163268.
17. BALLO, Botond. Trip Report: C++ Standards Meeting in Oulu, June
2016 [online]. 2016 [visited on 2019-04-30]. Available from: https:
//botondballo.wordpress.com/2016/07/06/trip-report-c-
standards-meeting-in-oulu-june-2016/.