FOCUS: HIGH-PERFORMANCE NETWORKS

Fighting Physics: A Tough Battle
Consider a company that wants to build a distributed application that does IPC (interprocess communication) over the long haul. The obvious advice is "just say no"—don't do it. If you're going far outside your local networking environment, the physics of distance and the speed of light, combined with the delays that come from the Internet's routing infrastructure, tell us that it will be much too slow. These concepts are not generally understood, however, and even when they are, they're sometimes forgotten.

So, exactly what are the basic principles related to the speed of light and network hops that all software developers need to acquaint themselves with? This article answers that question by first working out some quantitative preliminaries with an example, moving on to the networking implications, and then covering applications. Finally, it provides some rules of thumb to keep in mind as applications and architectures evolve in reaction to new network capabilities and unchanging physics.

Preliminaries: The physics
The speed of light in a vacuum is exactly 299,792,458 meters/second.1 This is as fast as you can move a bit of data, and according to our current understanding of physics, it is a fundamental constraint of the universe we live in. In fiber, the speed of light is 2.14 × 10^8 meters/second, or about 70 percent of the speed of light in a vacuum. If a fiber were stretched in a straight line from New York to San Francisco, it would be about 4,125 kilometers long, and it would take about 19 milliseconds (4,125 km ÷ 214 km/ms) for light to make the one-way trip. Assuming an 8,250-km length of fiber was used, you can just double this time to get an estimate for the minimum round-trip time.

At first glance, 19 ms might seem like a short time, certainly on a human scale. As computer scientists, however, we are usually concerned with a different time scale: that of the computer. Here we can calculate the 19 ms in terms of instructions, the fundamental units of work for computers. As an example, we can use a 2003-vintage single-core machine: the Intel Pentium 4 Extreme Edition, which at a 3.2-GHz clock rate was rated at 9,726 million instructions per second: 9,726 million × 0.019 is 184 million instructions—sufficient, for example, to search through or sort millions of names.

It is always important to keep in mind that the purpose of computer networking is to interconnect computers, and that computers operate on very short timescales. Also, a single human operation sometimes translates to many computer operations (i.e., round-trips). For example, opening a single Web page usually requires many round-trips, even if you are getting only a single large object (e.g., a large picture), as is discussed further later in this article.

Propagation, Bandwidth, Latencies, and Hops
The traversal of the fiber loop between New York and San Francisco presumes a data-transfer unit of a single encoded binary digit of information. The lower bound for that traversal would be 2 × 19, or 38 ms (or 368 million instructions). The time for this bit to travel from its source to its destination and back again is called its propagation delay.

Propagation delay is important, but compared with the much more common metric of bandwidth—measured in bits per second—it is rarely quoted as a figure of merit. At least partially, this is because the observed propagation delay depends on context, whereas bandwidth (say, of a fiber-optic transmission system) can be measured in isolation. Bandwidth can also be increased through engineering (for example, through encoding schemes for transmission systems that encode multiple bits per symbol) and thus is more attractive as a figure of merit to those who build transmission systems. Finally, bandwidth is a measure of work, which is attractive to purchasers.

Bandwidth can also affect latency, which is distinct, in my view, from propagation delay; the propagation delay […]
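The back-of-the-envelope figures above are easy to check with a short script. The constants are the ones quoted in the text; note that the unrounded division gives about 19.3 ms rather than the rounded 19 ms, so the instruction count per round trip comes out slightly above the article's 368 million.

```python
# Sanity check of the propagation-delay arithmetic in the text.
# All constants are taken directly from the article.

C_FIBER = 2.14e8        # speed of light in fiber, meters/second
NY_SF_KM = 4_125        # straight-line NY-SF fiber length, kilometers
IPS_P4 = 9_726e6        # Pentium 4 EE rating: instructions/second

one_way_s = (NY_SF_KM * 1000) / C_FIBER        # one-way propagation time
round_trip_s = 2 * one_way_s                   # minimum round-trip time
instructions_per_rtt = IPS_P4 * round_trip_s   # work forgone per round trip

print(f"one-way:    {one_way_s * 1e3:.1f} ms")
print(f"round-trip: {round_trip_s * 1e3:.1f} ms")
print(f"instructions per round trip: {instructions_per_rtt / 1e6:.0f} million")
```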
[Figure 1: Disparate network types are overcome by internetworking technology. Hosts H1–H8 on an Ethernet, an FDDI ring, and a point-to-point link (e.g., ISDN) are interconnected by routers R1–R3.]

Our analysis of an unobstructed fiber did not account for these routers, nor for the possibility that packets did not travel "as the crow flies" between the source and destination. The total propagation delay through this network, then, is equal to the sum of the propagation time across each subnet, plus the time required to pass through the routers.

Modern routers such as the Cisco CRS-1 exhibit average latencies of about 100 microseconds.2 Our Philadelphia–Palo Alto example would include roughly 30 of them in the round-trip path, making the total router latency about 3 ms. The other causes of delay are more difficult to measure. Routers attempt to optimize a path between two points, but that may be difficult, so in addition to the delay through the routers we can expect a certain delay caused by path selections that deviate from a straight line. Other possible sources of delay include slower routers and other intervening appliances (such as firewalls), queuing (to wait for access to a busy shared link), poor route selection, and slow links. Nonetheless, it is impressive that the IP routing infrastructure is only about a factor of two "slower" than the speed of light in fiber: 88 ms versus 38 ms.

This observation of the difference between pencil-and-paper and measured results leads to the definition of the throughput of a system, which is how many bits per second you can send after taking into account all the real-world limitations—propagation delays, bandwidth, latency, and hops.

Interprocess communication and protocols
In a distributed system, processes that need to communicate do so via one or more schemes for IPC.3 Example schemes include messages, reliable streams, and remote procedure calls. It is easiest to think of IPC in terms of messages, sometimes called ADUs (application data units), as they are the building blocks on which other IPC mechanisms, including reliable bytestreams, are built. Messages may require multiple IP packets. The socket API is one example of a way in which message and reliable bytestream services can be accessed. It resembles input/output, supporting a read/write style of interface. The impact of the IPC software on a single message's latency is typically low; ping measurements of a local loopback interface on klondike.cis.upenn.edu show times of about 20 microseconds of latency. The largest cause of propagation delays in IPC is protocols.

Protocols are rules for communicating intended to provide desired properties, such as high application throughput, reliability, or ordering. Reliable message delivery is a common application requirement and usually requires confirmation from the receiver to the sender, thus implying a round-trip. Communications requiring more data than a single packet must use multiple packets, implying multiple round-trip times. To see the impact of the physics on a naïve protocol, imagine an IPC system that uses 10-Kbit packets and must move 100 Kbits (10 packets' worth of data) across the U.S., which as we have seen (for a single transcontinental piece of fiber) should require about 19 ms. If a new packet is sent only when a previous one has been acknowledged, one packet will be sent every 38 ms, and the communication will require 380 ms, or almost half a second, independent of the bandwidth of the network. Yet it's clear that with a high-throughput network, one could have sent all 10 of the packets in a row and waited for a confirmation that all 10 arrived, and this could be done in 38 ms.

This example, along with figure 2, illustrates what is often called the bandwidth-delay product, which is a measure of the capacity of a path in bits between a source and a destination. Figure 2 shows that there may be usable capacity that is not being used, illustrated here by the spaces between packets. If the network were fully […]

[Figure 2]
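The stop-and-wait example can be sketched in a few lines. The model below ignores transmission and router delays (an assumption made to keep the focus on propagation, as in the text), and the 1-Gbit/s path in the usage line is a hypothetical figure, not one from the article.

```python
# Transfer-time model for the naive protocol example: 10 packets of
# 10 Kbits each across a path with ~19 ms one-way propagation delay.

ONE_WAY_MS = 19
RTT_MS = 2 * ONE_WAY_MS   # 38 ms round trip
PACKETS = 10

# Stop-and-wait: each packet is sent only after the previous
# acknowledgment arrives, so every packet costs a full round trip.
stop_and_wait_ms = PACKETS * RTT_MS

# Pipelined: send all packets back to back, then wait for a single
# cumulative acknowledgment; one round trip covers the whole burst.
pipelined_ms = RTT_MS

print(f"stop-and-wait: {stop_and_wait_ms} ms, pipelined: {pipelined_ms} ms")

def bdp_bits(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bits in flight needed to fill the path."""
    return bandwidth_bps * rtt_s

# A hypothetical 1-Gbit/s transcontinental path would need ~38 Mbits
# in flight to be kept full:
print(f"{bdp_bits(1e9, 0.038) / 1e6:.0f} Mbits in flight")
```

The design point is that the pipelined sender's completion time is set by propagation, not bandwidth, which is why windowed protocols adapt their window toward the bandwidth-delay product.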
[…] large distances and small data objects. Noise can also be […]

[Figure 3: bottleneck bandwidth vs. time]

[…]sive when latencies are long, but no cache when it's not necessary.
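The surviving fragment above suggests caching only when fetches are expensive and skipping the cache when they are not. A minimal sketch of such a latency-aware cache follows; the class name and threshold are hypothetical illustrations, not from the article.

```python
# Sketch of an adaptive cache: retain a result only when fetching it
# was slow enough for caching to pay off. Hypothetical illustration.

import time

class LatencyAwareCache:
    """Cache fetch results only when the origin is slow enough to matter."""

    def __init__(self, threshold_s: float = 0.010):
        self.threshold_s = threshold_s   # cache only if fetch took longer
        self.store = {}

    def get(self, key, fetch):
        if key in self.store:            # hit: no round trip at all
            return self.store[key]
        start = time.monotonic()
        value = fetch(key)               # miss: pay the fetch latency once
        elapsed = time.monotonic() - start
        if elapsed > self.threshold_s:   # slow (long-haul) fetch: keep it
            self.store[key] = value
        return value
```

On a LAN, fetches stay under the threshold and the cache stays empty; over a transcontinental path with a ~38-ms round trip, results are retained and subsequent lookups avoid the propagation delay entirely.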
Spend available resources (such as throughput and storage capacity) to save precious ones, such as response time. This may be the most important of these rules. An example is the use of caches, including preemptive caching of data. In principle, caches can be replicated local to applications, causing some cost in storage and throughput (to maintain the cache) to be incurred. In practice, this is almost always a good bet when replicas can be made, because storage capacities and network throughputs appear to be growing on a steady exponential. Prefilling the cache with data likely to be used means that some capacity will be wasted (what is fetched but not needed) but that the effects of some delays will be mitigated when predictions of what is needed are good.

Think relentlessly about the architecture of the distributed application. One key observation is that a distributed system can be distributed based on function. To return to the design of a system with a live data store (such as a stock market), we might place the program trading of stocks near the relevant exchanges, while placing the user-interaction functionality, account management, compliance logging, etc., remotely in less exchange-local real estate. Part of such a functional decomposition exercise is identifying where latency makes a difference and where the delay must be addressed directly rather than via caching techniques.

Where possible, adapt to varying latencies. The example of protocols maximizing throughput by adapting to bandwidth-delay capacities shows how a wide range of latencies can be accommodated. For distributed applications, this might be accomplished by dynamically relocating elements of a system (e.g., via process migration or remote evaluation).

None of these suggestions will allow you to overcome physics, although prefetching in the best of circumstances might provide this illusion. With careful design, however, responsive distributed applications can be architected and implemented to operate over long distances.

Summary
Propagation delay is an important physical limit. This measure is often given short shrift in system design as application architectures evolve, but it may have more performance impact on real distributed applications than bandwidth, the most commonly used figure of merit for networks. Modern distributed applications require adherence to some rules of thumb to maintain their responsiveness over a wide range of propagation delays.

Acknowledgments
Comments from the ACM Queue editorial board, particularly Eric Allman, and from Craig Partridge greatly improved this article.

References
1. Mohr, P. J., Taylor, B. N. 2005. CODATA recommended values of the fundamental physical constants. Reviews of Modern Physics 77(1): 1-107.
2. 40-gig router test results. 2004. Light Reading; http://www.lightreading.com/document.asp?doc_id=63606&page_number=4&image_number=9.
3. Partridge, C. 1994. Gigabit Networking. Addison-Wesley Professional.
4. Shaffer, J. H., Smith, J. M. 1996. A new look at bandwidth latency tradeoffs. University of Pennsylvania, CIS TR MS-CIS-96-10; http://repository.upenn.edu/cgi/viewcontent.cgi?article=1192&context=cis_reports.

LOVE IT, HATE IT? LET US KNOW
[email protected]

JONATHAN M. SMITH is the Olga and Alberico Pompa Professor of Engineering and Applied Science and a professor of computer and information science at the University of Pennsylvania. He served as a program manager at DARPA from 2004 to 2006 and was awarded the OSD (Office of the Secretary of Defense) Medal for Exceptional Public Service in 2006. He is an IEEE Fellow. His current research interests range from programmable network infrastructures and cognitive radios to disinformation theory and architectures for computer-augmented immune response.

© 2009 ACM 1542-7730/09/0200 $5.00