
FOCUS: High-Performance Networks

Fighting Physics: A Tough Battle

Thinking of doing IPC over the long haul? Think again. The laws of physics say you're hosed.

Jonathan M. Smith, University of Pennsylvania

Over the past several years, SaaS (software as a service) has become an attractive option for companies looking to save money and simplify their computing infrastructures. SaaS is an interesting group of techniques for moving computing from the desktop to the cloud; however, as it grows in popularity, engineers should be aware of some of the fundamental limitations they face when developing these kinds of distributed applications, in particular the finite speed of light.

Consider a company that wants to build a distributed application that does IPC (interprocess communication) over the long haul. The obvious advice is "just say no": don't do it. If you're going far outside your local networking environment, the physics of distance and the speed of light, combined with the delays that come from the Internet's routing infrastructure, tell us that it will be much too slow. These concepts are not generally understood, however, and even when they are, they're sometimes forgotten.

So, exactly what are the basic principles related to the speed of light and network hops that all software developers need to acquaint themselves with? This article answers that question by first working out some quantitative preliminaries with an example, moving on to the networking implications, and then covering applications. Finally, it provides some rules of thumb to keep in mind as applications and architectures evolve in reaction to new network capabilities and unchanging physics.

Preliminaries: The Physics

The speed of light in a vacuum is exactly 299,792,458 meters/second.[1] This is as fast as you can move a bit of data, and according to our current understanding of physics, it is a fundamental constraint of the universe we live in. In fiber, the speed of light is about 2.14 × 10^8 meters/second, or about 70 percent of the speed of light in a vacuum. If a fiber were stretched in a straight line from New York to San Francisco, it would be about 4,125 kilometers long, and it would take about 19 (4,125 ÷ 214) milliseconds for light to make the one-way trip. Assuming an 8,250-km loop of fiber is used for the round trip, you can just double this time to get an estimate of the minimum round-trip time.

At first glance, 19 ms might seem like a short time, certainly on a human scale. As computer scientists, however, we are usually concerned with a different time scale: that of the computer. Here we can calculate the 19 ms in terms of instructions, the fundamental units of work for computers. As an example, we can use a 2003-vintage single-core machine, the Intel Pentium 4 Extreme Edition, which at a 3.2-GHz clock rate was rated at 9,726 million instructions per second: 9,726 × 0.019 is 184 million instructions, sufficient, for example, to search through or sort millions of names.
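To make the arithmetic easy to check, here is the same calculation as a small Python sketch. The constants are the ones quoted above; expect small rounding differences from the 19-ms and 184-million figures, which were rounded in the text.

    # Back-of-the-envelope check of the numbers above.
    C_VACUUM = 299_792_458      # speed of light in a vacuum, meters/second
    C_FIBER = 2.14e8            # approximate speed of light in fiber, meters/second
    NY_TO_SF_M = 4_125_000      # straight-line fiber length, meters
    MIPS_P4 = 9_726             # Pentium 4 Extreme Edition, millions of instructions/second

    one_way_s = NY_TO_SF_M / C_FIBER             # ~0.019 s, i.e., about 19 ms
    instructions = MIPS_P4 * 1e6 * one_way_s     # ~187 million (the text rounds to 184)
    print(f"one-way: {one_way_s * 1e3:.1f} ms, "
          f"{instructions / 1e6:.0f} million instructions")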
It is always important to keep in mind that the purpose of computer networking is to interconnect computers, and that computers operate on very short timescales. Also, a single human operation sometimes translates to many computer operations (i.e., round trips). For example, opening a single Web page usually requires many round trips, even if you are getting only a single large object (e.g., a large picture), as is discussed further later in this article.

Propagation, Bandwidth, Latencies, and Hops

The traversal of the fiber loop between New York and San Francisco presumes a data-transfer unit of a single encoded binary digit of information. The lower bound for that traversal would be 2 × 19, or 38 ms (or 368 million instructions). The time for this bit to travel from its source to its destination and back again is called its propagation delay.

Propagation delay is important, but compared with the much more common metric of bandwidth, measured in bits per second, it is rarely quoted as a figure of merit. At least partially, this is because the observed propagation delay depends on context, whereas bandwidth (say, of a fiber-optic transmission system) can be measured in isolation. Bandwidth can also be increased through engineering (for example, through encoding schemes for transmission systems that encode multiple bits per symbol) and thus is more attractive as a figure of merit to those who build transmission systems. Finally, bandwidth is a measure of work, which is attractive to purchasers.
Bandwidth can also affect latency, which is distinct, in my view, from propagation delay; the propagation delay is a metric for the first bit, while latency is a metric for the entire data unit, which may contain more than one bit. In general:

latency = propagation delay + data unit size ÷ bandwidth

What this basically says is that the propagation delay is only part of the picture and that bandwidth affects performance as well. A look at the impact of the bandwidth in an example system shows why propagation delay is so important. Consider a 10-Gbps transmission system and a 1,250-byte (or, equivalently, 10-Kbit, chosen both to reflect a reasonable maximum transmission unit with Ethernets and to make the arithmetic easier!) data unit. The propagation time for the first bit in the NY/SF loop is 38 ms, and the last bit arrives a microsecond (10 Kbits ÷ 10 Gbps) later, making the total latency 38.001 ms. The majority of the latency is propagation delay. An interesting arithmetic exercise is to compute the distance at which a transmission system's latency is double the propagation delay. For a 10-Gbps transmission system and a 10-Kbit data-unit size, this is about 214 meters, or a few city blocks. For smaller data units or longer distances, propagation delay is the majority of the latency. (John Shaffer and I offer more detail on propagation delay versus latency in a previous paper.[4])
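Both the formula and the 214-meter figure can be verified with a few lines of Python, a sketch using the 10-Gbps and 10-Kbit numbers from the example above:

    C_FIBER = 2.14e8    # meters/second in fiber

    def latency_s(distance_m, size_bits, bandwidth_bps):
        # latency = propagation delay + data unit size / bandwidth
        return distance_m / C_FIBER + size_bits / bandwidth_bps

    # 10-Kbit data unit over the 8,250-km NY/SF loop at 10 Gbps:
    # ~38.6 ms; the 38.001 ms in the text uses the propagation
    # term rounded to 38 ms.
    print(latency_s(8.25e6, 10e3, 10e9))

    # Distance at which serialization time equals propagation delay,
    # doubling the latency: d / C_FIBER == size / bandwidth.
    print(C_FIBER * 10e3 / 10e9)    # 214.0 meters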
It is instructive to take a few measurements to see what is what. Using the ping utility to send ICMP ECHO packets, I measured the round-trip latency between the University of Pennsylvania (klondike.cis.upenn.edu) and Stanford University (cs.stanford.edu), two well-connected sites, as being about 87.5 ms. Rounding this to 88 ms and subtracting the fiber propagation time of 38 ms leaves a 50-ms difference. (Note that the New York-San Francisco numbers are assumed to be roughly equivalent to those from Philadelphia to Palo Alto.) Since these data units are only about 500 bits long, bandwidth between Penn and Stanford would have to be pretty bad (500 bits in 50 ms would be about 10 Kbps!) to explain the delay. So what could be the cause?

There are at least a couple of possible factors, both of which can be explained with the notion of hops. To understand hops, it helps to understand how a network differs from our 8,250-km loop of fiber. A real network is constructed of many interconnected pieces: for example, local area networks and wide area networks. Figure 1 represents a real physical network topology, with many types of networks and multiple devices. Hosts are labeled with H, routers with R, and network types are shown to be multiple in nature.

Many different packet formats and data units are in use, and the genius of the Internet is that it has a solution to make them all work together. This interoperability layer consists of a packet format and an address that are interpreted by devices called IP routers. The subnets interconnecting the routers can use whatever technology they choose as long as they can carry encapsulated IP packets between routers. Each router-router path is called a hop. As before with ping, it is instructive to obtain a measurement, which I obtained using traceroute between the two hosts I used before. There are 17 hops reported.

[Figure 1: Disparate network types are overcome by internetworking technology. Hosts (H1-H8) attach to an Ethernet, an FDDI ring, and a point-to-point link (e.g., ISDN), interconnected by routers (R1-R3).]

Our analysis of an unobstructed fiber did not account for these routers, nor for the possibility that packets did not travel "as the crow flies" between the source and destination. The total propagation delay through this network, then, is equal to the sum of the propagation time across each subnet, plus the time required to pass through the routers.

Modern routers such as the Cisco CRS-1 exhibit average latencies of about 100 microseconds.[2] Our Philadelphia-Palo Alto example would include roughly 30 of them in the round-trip path, making the total router latency about 3 ms. The other causes of delay are more difficult to measure. Routers attempt to optimize a path between two points, but that may be difficult, so in addition to the delay through the routers we can expect a certain delay caused by path selections that deviate from a straight line. Other possible sources of delay include slower routers and other intervening appliances (such as firewalls), queuing (to wait for access to a busy shared link), poor route selection, and slow links. Nonetheless, it is impressive that the IP routing infrastructure is only about a factor of two "slower" than the speed of light in fiber: 88 ms versus 38 ms.
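As a rough delay budget, using only the numbers above (the remainder is attributed to the slower causes just listed, not separately measured):

    measured_rtt_ms = 88.0      # ping, Penn to Stanford (rounded)
    fiber_rtt_ms = 38.0         # speed-of-light-in-fiber lower bound
    router_ms = 30 * 0.100      # ~30 routers x ~100 us each

    leftover_ms = measured_rtt_ms - fiber_rtt_ms - router_ms
    print(router_ms, leftover_ms)   # 3.0 ms in routers; ~47 ms for queuing,
                                    # indirect paths, and slow links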

This observation of the difference between pencil-and-paper and measured results leads to the definition of the throughput of a system, which is how many bits per second you can send after taking into account all the real-world limitations: propagation delays, bandwidth, latency, and hops.

Interprocess Communication and Protocols

In a distributed system, processes that need to communicate do so via one or more schemes for IPC.[3] Example schemes include messages, reliable streams, and remote procedure calls. It is easiest to think of IPC in terms of messages, sometimes called ADUs (application data units), as they are the building blocks on which other IPC mechanisms, including reliable bytestreams, are built. Messages may require multiple IP packets. The socket API is one example of a way in which message and reliable bytestream services can be accessed. It resembles input/output, supporting a read/write style of interface. The impact of the IPC software on a single message's latency is typically low; ping measurements of a local loopback interface on klondike.cis.upenn.edu show times of about 20 microseconds of latency. The largest cause of propagation delays in IPC is protocols.

Protocols are rules for communicating intended to provide desired properties, such as high application throughput, reliability, or ordering. Reliable message delivery is a common application requirement and usually requires confirmation from the receiver to the sender, thus implying a round trip. Communications requiring more data than a single packet must use multiple packets, implying multiple round-trip times. To see the impact of the physics on a naïve protocol, imagine an IPC system that uses 10-Kbit packets and must move 100 Kbits (10 packets' worth of data) across the U.S., which as we have seen (for a single transcontinental piece of fiber) should require about 19 ms. If a new packet is sent only when a previous one has been acknowledged, one packet will be sent every 38 ms, and the communication will require 380 ms, or almost one-half second, independent of the bandwidth of the network. Yet it's clear that with a high-throughput network, one could have sent all 10 of the packets in a row and waited for a confirmation that all 10 arrived, and this could be done in 38 ms.
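A minimal model of the two strategies makes the difference plain. This Python sketch counts only propagation delay, ignoring serialization and processing time, as in the example above:

    RTT_MS = 38.0       # transcontinental round trip
    PACKETS = 10        # 10 x 10-Kbit packets = 100 Kbits

    # Naive stop-and-wait: each packet waits for the previous acknowledgment.
    stop_and_wait_ms = PACKETS * RTT_MS     # 380 ms, regardless of bandwidth

    # Pipelined: send all packets back to back, await one confirmation.
    pipelined_ms = RTT_MS                   # ~38 ms (serialization ignored)

    print(stop_and_wait_ms, pipelined_ms)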
This example, along with figure 2, illustrates what is often called the bandwidth-delay product, which is a measure of the capacity of a path in bits between a source and a destination. Figure 2 shows that there may be usable capacity that is not being used, illustrated here by the spaces between packets. If the network were fully utilized, then all of the capacity would be occupied by packets in flight. When the network is fully occupied with packets, a bandwidth-delay product of bits will be in flight between a source and destination. The challenge is estimating the available capacity at any given time, as network dynamics could make this estimate highly variable. If we overestimate the capacity, too many packets will be pushed into the network, resulting in congestion. If we underestimate the capacity, too few packets will be in flight and performance will suffer.

[Figure 2: Packets in flight between a sender and a receiver. The gaps between packets represent usable path capacity that is going unused.]
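For the hypothetical 10-Gbps transcontinental path above, the bandwidth-delay product works out as follows; keeping the path full means keeping this many bits in flight:

    bandwidth_bps = 10e9    # 10-Gbps transmission system
    rtt_s = 0.038           # 38-ms NY/SF round trip

    bdp_bits = bandwidth_bps * rtt_s
    print(bdp_bits)         # 3.8e8 bits: 380 Mbits, or 38,000 of our 10-Kbit packets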
Optimizing protocols to the available bandwidth-delay product has been a long-standing problem of interest to the networking community, resulting in many algorithms for flow control and congestion control. TCP/IP, for example, uses acknowledgments from the receiver to pace the sender, opening and closing a window of unacknowledged packets that is a measure of the bandwidth-delay product. If a packet loss occurs, TCP/IP assumes it is congestion and closes the window. Otherwise, it continues trying to open the window to discover new bandwidth as it becomes available.
Figure 3 shows how TCP/IP attempts to discover the correct window size for a path through the network. The line indicates what is available, and, significantly, this changes with time, as competing connections come and go and capacities change with route changes. When new capacity becomes available, the protocol tries to discover it by pushing more packets into the network until losses indicate that too much capacity is used; in that case the protocol quickly reduces the window size to protect the network from overuse. Over time, the "sawtooth" reflected in this figure results as the algorithm attempts to learn the network capacity.

[Figure 3: TCP/IP attempts to discover the available network capacity. The congestion window traces a sawtooth around the bottleneck bandwidth as time passes.]
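The sawtooth can be reproduced with a toy additive-increase/multiplicative-decrease loop. This is a deliberately simplified sketch of TCP's behavior, not the actual algorithm, and the constants are arbitrary:

    capacity = 100      # bottleneck capacity, packets per RTT (arbitrary)
    window = 1          # congestion window, in packets

    for rtt in range(50):
        if window > capacity:       # loss: too many packets pushed into the network
            window = window // 2    # multiplicative decrease protects the network
        else:
            window += 1             # additive increase probes for new bandwidth
        print(rtt, window)          # plotted against time, this traces the sawtooth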
A major "physics" challenge for TCP/IP is that it is learning on a round-trip timescale and is thus affected by distance. Some new approaches based on periodic router estimates of available capacity are not subject to round-trip time variation and may be better at achieving high throughputs on high bandwidth-delay paths.

Implications for Distributed Systems

Many modern distributed systems are built as if all network locations were roughly equivalent. As we have seen, even if there is connectivity, delay can affect some applications and protocols more than others. In a request/response type of IPC, such as a remote procedure call, remote copies of data can greatly delay application execution, since the procedure call is blocked waiting on the response. Early Web applications were slow because the original HTTP opened a new TCP/IP connection for each fetched object, meaning that the new connection's estimate of the bandwidth-delay product was almost always an underestimate. Newer versions of HTTP exhibit persistent learning of bandwidth-delay estimates and perform much better.

The implication for distributed systems is that one size does not fit all. For example, use of a centralized data store will create large numbers of hosts that cannot possibly perform well if they are distant from the data store. In some cases, where replicas of data or services are viable, data can be cached and made local to applications. This, for example, is the logical role of a Web-caching system. In other cases, however, such as stock exchanges, the data is live, and the latency characteristics in such circumstances have significant financial implications, so caching is not effective for applications such as computerized trading. While in principle distributed systems might be built that take this latency into account, in practice it has proven easier to move the processing close to the market.

Rules of Thumb to Hold Your Own with Physics

Here are a few suggestions that may help software developers adapt to the laws of physics.

Bandwidth helps latency, but not propagation delay. If a distributed application can move fewer, larger messages, this can help the application, as the total cost in delay is reduced because fewer round-trip delays are introduced. The effects of bandwidth are quickly lost for large distances and small data objects. Noise can also be a big issue for increasingly common wireless links, where shorter packets suffer a lower per-packet risk of bit errors. The lesson for the application software designer is to think carefully about a design's assumptions about latency. Assume large latencies, make the design work under those circumstances, and take advantage of lower latencies when they are available. For example, use a Web-embedded caching scheme to ensure the application is responsive when latencies are long, but no cache when it's not necessary.
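Applying the latency formula from earlier shows the payoff of coalescing messages; the numbers here are illustrative:

    rtt_s = 0.038           # round-trip propagation delay
    bandwidth_bps = 10e9
    msg_bits = 10e3
    n_msgs = 100

    # One round trip per small message versus one batched message.
    one_per_rtt_s = n_msgs * (rtt_s + msg_bits / bandwidth_bps)    # ~3.8 s
    batched_s = rtt_s + n_msgs * msg_bits / bandwidth_bps          # ~0.038 s

    print(one_per_rtt_s, batched_s)    # a roughly hundredfold difference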
Spend available resources (such as throughput and storage capacity) to save precious ones, such as response time. This may be the most important of these rules. An example is the use of caches, including preemptive caching of data. In principle, caches can be replicated local to applications, causing some cost in storage and throughput (to maintain the cache) to be incurred. In practice, this is almost always a good bet when replicas can be made, because growth in storage capacities and network throughputs appears to be on a steady exponential. Prefilling the cache with data likely to be used means that some capacity will be wasted (what is fetched but not needed) but that the effects of some delays will be mitigated when predictions of what is needed are good.
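A minimal sketch of the idea in Python; fetch_remote and predict_next are hypothetical stand-ins for an application's own wide-area fetch and prediction logic:

    cache = {}

    def fetch_remote(key):
        # Hypothetical stand-in for an expensive wide-area fetch.
        return "value-of-%s" % key

    def predict_next(key):
        # Hypothetical stand-in for application-specific prediction;
        # here we naively guess that integer keys are read in sequence.
        return [key + 1] if isinstance(key, int) else []

    def get(key):
        if key not in cache:                # miss: pay the round trip once
            cache[key] = fetch_remote(key)
        for k in predict_next(key):         # spend throughput and storage now...
            if k not in cache:
                cache[k] = fetch_remote(k)  # ...to save response time later
        return cache[key]

    print(get(1), get(2))   # the second call is served locally (prefetched)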
Think relentlessly about the architecture of the distributed application. One key observation is that a distributed system can be distributed based on function. To return to the design of a system with a live data store (such as a stock market), we might place the program trading of stocks near the relevant exchanges, while placing the user-interaction functionality, account management, compliance logging, etc., remotely, in real estate away from the exchange. Part of such a functional decomposition exercise is identifying where latency makes a difference and where the delay must be addressed directly rather than via caching techniques.
Where possible, adapt to varying latencies. The example of protocols maximizing throughput by adapting to bandwidth-delay capacities shows how a wide range of latencies can be accommodated. For distributed applications, this might be accomplished by dynamically relocating elements of a system (e.g., via process migration or remote evaluation).
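One way to adapt is to track a path's round-trip time continuously, in the spirit of TCP's smoothed RTT estimate. The following is an illustrative sketch, not TCP's actual algorithm; the 0.125 gain is TCP's traditional smoothing constant, and the sample values are made up:

    class RttEstimator:
        """Exponentially weighted moving average of observed round-trip times."""

        def __init__(self, alpha=0.125):
            self.alpha = alpha
            self.srtt = None    # smoothed RTT, seconds

        def observe(self, sample_s):
            if self.srtt is None:
                self.srtt = sample_s
            else:
                # Nudge the estimate toward each new sample.
                self.srtt += self.alpha * (sample_s - self.srtt)
            return self.srtt

    est = RttEstimator()
    for sample in (0.090, 0.088, 0.120, 0.086):    # measured RTTs, seconds
        print(est.observe(sample))

An application could use such an estimate to decide, at runtime, which replicas to talk to or where to migrate a component.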
None of these suggestions will allow you to overcome physics, although prefetching in the best of circumstances might provide this illusion. With careful design, however, responsive distributed applications can be architected and implemented to operate over long distances.

Summary

Propagation delay is an important physical limit. This measure is often given short shrift in system design as application architectures evolve, but it may have more performance impact on real distributed applications than bandwidth, the most commonly used figure of merit for networks. Modern distributed applications require adherence to some rules of thumb to maintain their responsiveness over a wide range of propagation delays.

Acknowledgments

Comments from the ACM Queue editorial board, particularly Eric Allman, and from Craig Partridge greatly improved this article.

References

1. Mohr, P. J., Taylor, B. N. 2005. CODATA recommended values of the fundamental physical constants. Reviews of Modern Physics 77(1): 1-107.
2. 40-gig router test results. 2004. Light Reading; http://www.lightreading.com/document.asp?doc_id=63606&page_number=4&image_number=9.
3. Partridge, C. 1994. Gigabit Networking. Addison-Wesley Professional.
4. Shaffer, J. H., Smith, J. M. 1996. A new look at bandwidth latency tradeoffs. University of Pennsylvania CIS TR MS-CIS-96-10; http://repository.upenn.edu/cgi/viewcontent.cgi?article=1192&context=cis_reports.

JONATHAN M. SMITH is the Olga and Alberico Pompa Professor of Engineering and Applied Science and a professor of computer and information science at the University of Pennsylvania. He served as a program manager at DARPA from 2004 to 2006 and was awarded the OSD (Office of the Secretary of Defense) Medal for Exceptional Public Service in 2006. He is an IEEE Fellow. His current research interests range from programmable network infrastructures and cognitive radios to disinformation theory and architectures for computer-augmented immune response.

© 2009 ACM 1542-7730/09/0200 $5.00


