Clustering and Sharing Incentives in BitTorrent Systems

Arnaud Legout
I.N.R.I.A.
Sophia Antipolis, France
[email protected]

Nikitas Liogkas, Eddie Kohler, Lixia Zhang
University of California, Los Angeles
Los Angeles, CA USA
{nikitas, kohler, lixia}@cs.ucla.edu

ABSTRACT
Peer-to-peer protocols play an increasingly instrumental role in Internet content distribution. It is therefore important to gain a complete understanding of how these protocols behave in practice and how their operating parameters affect overall system performance. This paper presents the first detailed experimental investigation of the peer selection strategy in the popular BitTorrent protocol. By observing more than 40 nodes in instrumented private torrents, we validate three protocol properties that, though believed to hold, have not been previously demonstrated experimentally: the clustering of similar-bandwidth peers, the effectiveness of BitTorrent’s sharing incentives, and the peers’ high uplink utilization. In addition, we observe that BitTorrent’s modified choking algorithm in seed state provides uniform service to all peers, and that an underprovisioned initial seed leads to the absence of peer clustering and less effective sharing incentives. Based on our results, we provide guidelines for seed provisioning by content providers, and discuss a tracker protocol extension that addresses an identified limitation of the protocol.

Categories and Subject Descriptors
C.2.2 [Computer-Communication Networks]: Network Protocols; C.2.4 [Computer-Communication Networks]: Distributed Systems; C.4 [Performance of Systems]

General Terms
Algorithms, Measurement, Performance

Keywords
BitTorrent, choking algorithm, clustering, incentives, seed provisioning

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SIGMETRICS’07, June 12–16, 2007, San Diego, California, USA.
Copyright 2007 ACM 978-1-59593-639-4/07/0006 ...$5.00.

1. INTRODUCTION
In just a few years, peer-to-peer content distribution has come to generate a significant portion of the total Internet traffic [14]. The widespread adoption of such protocols for delivering large data volumes on a global scale is arguably due to their scalability and robustness properties. Understanding the mechanisms that affect the performance of such protocols and overcoming the existing shortcomings will ensure the continued success of peer-to-peer data delivery. To that end, this paper presents a detailed experimental study of the peer selection strategy in BitTorrent, one of the most popular peer-to-peer content distribution protocols.

Recently, researchers have formulated analytical models for the problem of efficient data exchange among peers, and measurement studies using actual download traces have attempted to shed light on the success of BitTorrent. However, certain properties of these studies have interfered with their accurate evaluation of the dynamics of BitTorrent algorithms and their impact on overall system performance. For example, analytical models can provide valuable insight, but they are typically based on unrealistic assumptions, such as giving all participants global system knowledge [21]; actual download traces may differ substantially from their predictions [11, 20]. Furthermore, most measurement studies have evaluated peers connected to public torrents (BitTorrent download sessions) [11, 12, 20]. They provide detailed data about the overall behavior of deployed BitTorrent systems; however, the inherent limitations in collecting per-peer information in a public torrent obstruct the understanding of individual peer decisions during the download. Legout et al. [15] recently attempted to evaluate those decisions, but only from the viewpoint of a single peer.

To overcome these limitations, we conduct extensive experiments on a private testbed and collect data from all peers in a controlled environment. In particular, we focus on the so-called choking algorithm for peer selection, which may be the driving factor behind BitTorrent’s high performance [8]. This approach allows us to examine the behavior of individual peers under a microscope and observe their decisions and interactions during the download.

Our main contribution is to demonstrate that the choking algorithm facilitates the formation of clusters of similar-bandwidth peers, ensures effective sharing incentives by rewarding peers who contribute data to the system, and maintains high upload utilization for the majority of the download duration. These properties have been hinted at in previous work; this study constitutes their first experimental validation. We also show that, if the seed is underprovisioned, all peers tend to complete their download around the same time, independently of how much they upload. Clusters are no longer formed, and, interestingly, high-capacity peers assist the seed in disseminating data to low-capacity ones, resulting in everyone maintaining high upload utilization. Finally, based on our observations, we provide guidelines for seed provisioning by content providers, and discuss a tracker protocol extension that addresses an identified limitation of the protocol, namely the low upload utilization at the beginning of a torrent’s lifetime.

The rest of this paper is organized as follows. Section 2 provides a description of the BitTorrent protocol and an explanation of the choking algorithm, as implemented in the official BitTorrent client. Section 3 describes our experimental methodology and the rationale behind the experiments, while Section 4 presents our results. Section 5 discusses seed provisioning guidelines and the proposed tracker protocol extension. Lastly, Section 6 sets this study in the context of related work, and Section 7 concludes.

2. BACKGROUND
BitTorrent is a peer-to-peer content distribution protocol that scales well with the number of participating peers. A BitTorrent system capitalizes on the upload capacity of each peer in order to increase global system capacity as the number of peers increases. A major factor behind BitTorrent’s success is a built-in incentives mechanism, implemented by its choking algorithm, which is designed to encourage peers to contribute data. The rest of this section introduces the terminology used in the paper and describes BitTorrent’s operation in detail, with a particular focus on the choking algorithm.

2.1 Terminology
The terminology used in the BitTorrent community is not standardized. For the sake of clarity, we define here the terms used throughout this paper.

• Torrent. A torrent is the set of peers cooperating to download the same content using the BitTorrent protocol.

• Tracker. The tracker is the only centralized component of the system. It is not involved in the actual distribution of the content, but it keeps track of all peers currently participating in the download, and it collects statistics.

• Pieces and Blocks. Content transferred using BitTorrent is split into pieces, with each piece being split into multiple blocks. Although blocks are the transmission unit, peers can only share complete pieces with others.

• Metainfo file. The metainfo file, also called a torrent file, contains all the information necessary to download the content and includes the number of pieces, SHA-1 hashes for all the pieces that are used to verify received data, and the IP address and port number of the tracker.

• Interested and Choked. We say that peer A is interested in peer B when B has pieces of the content that A does not have. Conversely, peer A is not interested in peer B when B only has a subset of the pieces of A. We also say that peer A is choked by peer B when B decides not to send any data to A. Conversely, peer A is unchoked by peer B when B is willing to send data to A. Note that this does not necessarily mean that peer B is uploading data to A, but rather that B will upload to A if A issues a data request.

• Peer Set. Each peer maintains a list of other peers to which it has open TCP connections. We call this list the peer set, and it is also known as the neighbor set.

• Local and Remote Peers. When describing the choking algorithm, we take the viewpoint of a single peer, which we call the local peer. We refer to the peers in the local peer’s peer set as remote peers.

• Leecher and Seed. A peer can be in one of two states: the leecher state, when it is still downloading pieces of the content, and the seed state, when it has all the pieces and is sharing them with others.

• Initial Seed. The initial seed is the first peer that offers the content for download. There can be more than one initial seed. In this paper, however, we only consider the case of a single initial seed.

• Rarest-First Algorithm. The rarest-first algorithm is the piece selection strategy used by BitTorrent clients. It is also known as the local rarest-first algorithm since it bases the selection on the information available locally at each peer. Peers independently maintain a list of the pieces each of their remote peers has and build a rarest-pieces set containing the indices of the pieces with the least number of copies. This set is updated every time a remote peer announces that it acquired a new piece, and is used by the local peer to select the next piece to download.

• Choking Algorithm. The choking algorithm, also known as the tit-for-tat algorithm, is the peer selection strategy used by BitTorrent clients. We provide a detailed description of this algorithm in Section 2.3.

• Official BitTorrent Client. The official BitTorrent client [1], also known as the mainline client, was the first BitTorrent implementation and was initially developed by Bram Cohen, BitTorrent’s creator.

2.2 BitTorrent Operation
Prior to distribution, the content is divided into multiple pieces, and each piece into multiple blocks. The metainfo file is then created by the content provider. To join a torrent, a peer P retrieves the metainfo file out of band, usually from a well-known website, and contacts the tracker, which responds with a peer set of randomly selected peers, possibly including both seeds and leechers. P then starts contacting peers in this set and requesting different pieces of the content.

Most clients nowadays use the rarest-first algorithm for piece selection. In this manner, a peer selects the next piece to download from its rarest-pieces set. A local peer determines which pieces its remote peers have based on ’bitfield’ messages exchanged upon establishing new connections, which contain a list of all the pieces a peer has. Peers also send ’have’ messages to everyone in their peer set when they successfully receive and verify a new piece.

A peer uses the choking algorithm to decide which peers to exchange data with. The algorithm generally gives preference to those peers who upload data at high rates. Once per rechoke period, typically set to ten seconds, a peer re-calculates the data receiving rates from all peers in its peer set. It then selects a fixed number of the fastest ones and uploads only to those for the duration of the period. In BitTorrent parlance, a peer unchokes the fastest uploaders via a regular unchoke, and chokes all the rest. In addition, it unchokes a randomly selected peer via a so-called optimistic unchoke. The logic behind this is explained in detail in Section 2.3.

Seeds, who do not need to download any pieces, follow a different unchoke strategy. Most implementations dictate that seeds unchoke those leechers that download data at the highest rates, in order to better utilize seed capacity in disseminating the content as efficiently as possible. However, the official BitTorrent client recently introduced a modified unchoke algorithm in seed state, in version 4.0.0. We perform the first detailed experimental evaluation of this modified algorithm and show that it enables a more uniform utilization of the seed bandwidth across all leechers.
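
To make the rarest-first selection of Sections 2.1 and 2.2 concrete, the following Python sketch tracks which pieces each remote peer has advertised and derives the rarest-pieces set from those counts. It is a minimal illustration under stated assumptions, not the official client's code: the function names (`rarest_pieces`, `on_have`, `next_piece`), the dictionary-of-sets layout, and the random choice among equally rare pieces are all hypothetical choices made here.

```python
import random
from collections import Counter


def rarest_pieces(local_have, remote_bitfields):
    """Return the indices of missing pieces with the fewest copies.

    local_have: set of piece indices the local peer already holds.
    remote_bitfields: dict mapping a remote peer id to the set of piece
        indices it has advertised (via 'bitfield' and 'have' messages).
    """
    counts = Counter()
    for pieces in remote_bitfields.values():
        counts.update(pieces)  # one copy per remote peer holding the piece
    # Only pieces the local peer lacks are candidates for download.
    wanted = {i: c for i, c in counts.items() if i not in local_have}
    if not wanted:
        return set()
    fewest = min(wanted.values())
    return {i for i, c in wanted.items() if c == fewest}


def on_have(remote_bitfields, peer, piece):
    """Record a 'have' announcement from a remote peer."""
    remote_bitfields.setdefault(peer, set()).add(piece)


def next_piece(local_have, remote_bitfields, rng=random):
    """Pick the next piece from the rarest-pieces set.

    Breaking ties at random is an assumption of this sketch.
    """
    rarest = rarest_pieces(local_have, remote_bitfields)
    return rng.choice(sorted(rarest)) if rarest else None
```

Recomputing the set on every call keeps the sketch short; a real client would more plausibly update the counts incrementally as 'have' messages arrive.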

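
The leecher-side peer selection summarized in Section 2.2 can be sketched as follows. This is a simplified illustration, not the official client's implementation: the function name `rechoke`, the `regular_slots` parameter, and the rate dictionary are hypothetical, and the sketch leaves out the snubbing rule and the every-three-rounds timing of optimistic unchokes that Section 2.3 describes.

```python
import random

RECHOKE_PERIOD_S = 10  # rechoke period given in the text (typically ten seconds)


def rechoke(receive_rates, interested, regular_slots, rng=random):
    """Select the peers to unchoke for one rechoke period.

    receive_rates: dict mapping a remote peer id to the rate (kB/s) at
        which the local peer received data from it.
    interested: set of remote peers interested in the local peer.
    regular_slots: number of regular unchokes (hypothetical parameter).
    Returns (regular, optimistic); optimistic is None if no peer is left.
    """
    # Fastest uploaders first; the peer id breaks ties deterministically.
    ranked = sorted(interested, key=lambda p: (-receive_rates.get(p, 0.0), p))
    regular = ranked[:regular_slots]
    remaining = ranked[regular_slots:]
    # One additional peer is unchoked at random (optimistic unchoke).
    optimistic = rng.choice(remaining) if remaining else None
    return regular, optimistic
```

A caller would invoke `rechoke` once every `RECHOKE_PERIOD_S` seconds and choke every peer that appears in neither result.
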
2.3 Choking Algorithm
We now describe the choking algorithm in detail as implemented in the official client, version 4.0.2. The algorithm was initially introduced to foster a high level of data exchange reciprocation and is one of the main factors behind BitTorrent’s fairness model: peers that contribute data to others at high rates should receive high download throughput, and free-riders, peers that do not upload, should be penalized by being unable to achieve high download rates. It is worth noting that, although the algorithm has been shown to perform well in a variety of scenarios, it has recently been found that it does not completely eliminate free-riding [16, 17, 23]. In particular, a peer may improve its download rates by downloading from seeds, acquiring a large view of the peers in the torrent, or benefiting from many optimistic unchokes. We discuss this issue further in Section 4.1.2.

As we noted earlier, the choking algorithm is different for leechers and seeds. When in leecher state, a peer P unchokes a fixed number of remote peers. Unless specified explicitly by the user, this number of parallel uploads is determined by P’s upload bandwidth. For example, for an upload limit greater than or equal to 15 kB/s but less than 42 kB/s this number is set to 4. For generality, in the following we assume that the number of parallel uploads is set to n.

In leecher state, the choking algorithm is executed periodically at every rechoke period, i.e., every ten seconds, and in addition, whenever an unchoked and interested peer leaves the peer set, or whenever an unchoked peer switches its interest state. As a result, the time interval between two executions of the algorithm can sometimes be shorter than a rechoke period. Every time the choking algorithm is executed, we say that a new round starts, and the following steps are taken.

1. The local peer orders interested remote leechers according to the rates at which it received data from them, and ignores leechers that have not sent any data in the last thirty seconds. These so-called snubbed peers are excluded from consideration in order to guarantee that only contributing peers are unchoked.

2. The n − 1 leechers with the highest rates are unchoked via a regular unchoke.

3. In addition, every three rounds, an interested candidate peer is chosen at random to be unchoked via an optimistic unchoke. If this peer is not unchoked via a regular unchoke, it is unchoked via an optimistic unchoke and the round completes. If this peer is already unchoked via a regular unchoke, a new candidate peer is chosen at random.
   (a) If the candidate peer is interested in the local peer, it is unchoked via an optimistic unchoke and the round completes.
   (b) Otherwise, the candidate peer is unchoked anyway, and step 3a is repeated with a new randomly chosen candidate. The round completes when an interested peer is found or when there are no more peers to choose, whichever comes first.

Although more than n peers can be unchoked by the algorithm, only n interested peers can be unchoked in the same round. Unchoking non-interested peers improves the reaction time in case one of those peers becomes interested during the following rechoke period; data transfer can begin right away without waiting for the choking algorithm to be executed. Furthermore, optimistic unchokes serve two major purposes. They function as a resource discovery mechanism to continually evaluate the upload bandwidth of peers in the peer set in an effort to discover better partners. They also enable new peers that do not have any pieces yet to bootstrap into the torrent by giving them some initial pieces without requiring any reciprocation.

In the seed state, older versions of the official client, as well as many current versions of other clients, perform the same steps as in leecher state, with the only difference being that the ordering in step 1 is based on data transmission rates from the seed, rather than to it. Consequently, peers with high download capacity are favored independently of their contribution to the torrent, a fact that could be exploited by free-riders [16].

In version 4.0.0, the official client introduced a modified choking algorithm in seed state. According to this modified algorithm, a seed performs the same fixed number of n parallel uploads as in leecher state, but with different peer selection criteria. The algorithm is executed periodically at every rechoke period, i.e., every ten seconds, and in addition, whenever an unchoked and interested peer leaves the peer set, or whenever an unchoked peer switches its interest state. Every time the choking algorithm is executed, a new round starts, and the following steps are taken.

1. The local peer orders the interested and unchoked remote leechers according to the time at which it sent them an unchoke message, most recently unchoked peers first. This is the initial time the local peer unchoked them; if the local peer keeps uploading to them for more than one rechoke period, it does not send them additional unchoke messages. This step only considers leechers to which an unchoke message has been sent recently (less than twenty seconds ago) or leechers that have pending requests for blocks (to ensure that they get the requested data as soon as possible). In case of a tie, leechers are ordered according to their download rates from the seed, fastest ones first, just like the old algorithm did. Note that, as leechers do not upload anything to seeds, the notion of snubbed peers does not exist in seed state.

2. The number of optimistic unchokes to perform over the duration of the next three rechoke periods, i.e., thirty seconds, is determined using a heuristic. These optimistic unchokes are uniformly spread over this duration, performing n_o optimistic unchokes per rechoke period. Due to rounding issues, n_o can be different for each of the three rechoke periods. For instance, when the number of parallel uploads is 4, the heuristic dictates that only two optimistic unchokes be performed in the entire thirty-second period. Thus, one optimistic unchoke is performed during each of the first two periods and none during the last.

3. At each rechoke period, the first n − n_o leechers in the list from step 1 are unchoked via regular unchokes.

Step 1 includes the key feature of the modified algorithm in seed state. On the one hand, leechers are no longer unchoked based on their observed download rates from the seed, but mainly based on the last time an unchoke message was sent to them. Thus, after a seed has been sending data to a leecher for six rechoke periods (when the number of parallel uploads is 4), it will stop doing so and select another leecher to serve. In this manner, a seed will provide service to all leechers sooner or later, preventing any single leecher from monopolizing it. On the other hand, according to the official client’s version notes, this modified choking algorithm in seed state also aims to reduce the amount of duplicate data a seed needs to upload before it has pushed out a full copy of the content into the torrent. It strives to achieve that by keeping leechers unchoked for
six rechoke periods, in order to prevent high leecher turnover from resulting in the transmission of the same pieces to different leechers. Interestingly, the most recent version of the official client has reverted to the original choking algorithm in seed state. Although the modified version of the algorithm we described here is more robust to modified free-riding implementations, it might be less efficient in torrents with compliant peers. Since the company behind the official client has been targeting legal content distribution, where client alteration would arguably be harder, it may aim to optimize the implementation for this scenario.

Some other implementations have included a super-seeding feature with similar goals, in particular to assist a service provider with limited upload capacity in seeding a large torrent. A seed with this feature masquerades as a normal leecher with no data. As other peers connect to it, it will advertise a piece that it has never uploaded before or that is very rare. After uploading this piece to a given leecher, the seed will not advertise any new pieces to that leecher until it sees another peer’s ’have’ message for the piece, indicating that the leecher has indeed shared the piece with others. This algorithm has anecdotally resulted in much higher seeding efficiencies by reducing the amount of duplicate pieces uploaded by the seed, and limiting the amount of data sent to peers who do not contribute [2]. A single seed running in this mode is rumored to be able to upload a full copy of the content after only uploading 105% of the content data volume. Since the official client has not implemented this feature, our experiments do not measure its effect on the efficiency of the initial seed. We instead measure the number of duplicate pieces uploaded when employing the modified choking algorithm in seed state.

3. METHODOLOGY

3.1 Experimental Setup
All our experiments were performed in private torrents on the PlanetLab experimental platform [5]. PlanetLab’s convenient tools for collecting measurements from geographically dispersed clients greatly facilitated our work. For instance, in order to deploy and launch BitTorrent clients on PlanetLab nodes, we utilize the pssh tools [4]. PlanetLab nodes are typically not behind NATs, so each peer in our experiments can be uniquely identified by its IP address.

We chose to experiment on private torrents, as opposed to simulation, in order to examine both individual peer decisions and the resulting impact on the torrent. Although simulation would have enabled us to run many more experiments, it would have been a difficult task to accurately model the dynamics of a BitTorrent system. Private torrents allow us to observe and record the behavior of all peers in real scenarios. We can also vary experimental parameters, such as peers’ upload rate limits, which helps us distinguish which factors are responsible for the observed behavior.

We performed experiments with the different torrent configurations described in Section 3.2. There are no agreed-upon parameters in the BitTorrent community, so we set our experiment parameters empirically and based on current best practice. During each experiment, leechers download a single file of 113 MB that consists of 453 pieces, 256 kB each.

All our experiments were performed with peers that do not change their available upload bandwidth during the download, or disconnect before receiving a complete copy of the file. There is a single initial seed, and in all experiments, all leechers join the torrent at the same time, emulating a flash crowd scenario. Although the behavior of the system might be different with other peer arrival patterns, we are interested in examining peer decisions under circumstances of high load. The initial seed stays connected to the torrent for the duration of the experiment, while leechers disconnect immediately after completing their download.

We consider both a well-provisioned and an underprovisioned initial seed. Seed upload capacity has already been shown to be critical to the performance at the beginning of a torrent’s lifetime, before the seed has uploaded a complete copy of the content [7, 15]. However, the impact of an initial seed with limited capacity on system properties is not clear. Nevertheless, appropriate provisioning of initial seeds is of critical importance to content providers. We attempt to sketch recommendations on this issue in Section 5.1 based on our experimental results.

The available bandwidth of PlanetLab nodes is relatively high for typical torrents. We define upload limits on the leechers and seed to model realistic scenarios, but do not define any download limits, nor do we attempt to match our upload limits to inherent limitations of PlanetLab nodes. Thus, we might end up defining a high upload limit on a node that cannot possibly send data that fast, due to network or other problems. Our results include the effects of local network fluctuations, but we believe that the conclusions we draw are not predicated on such effects. Our experiments utilize 41 PlanetLab nodes, of which 2 are located in Canada and the rest are spread across the continental United States. We conduct all runs of an experiment consecutively in time on the same set of machines.

We collect our measurements using a modified version of the official BitTorrent implementation, instrumented to record interesting events and peer interactions. Our instrumented client, which is based on version 4.0.2 of the official client (released in May 2005), is publicly available for download [3]. We collect a log of each message sent or received along with the content of the message, a log of each state change, the rate estimates for remote peers used by the choking algorithm, and other relevant information, such as the internal states of the choking algorithm. Unless otherwise specified, we run our experiments with the default client parameters.

3.2 Torrent Configurations
We experimented with several torrent configurations. The parameters we changed from configuration to configuration are the upload rate limits for the seed and leechers and the upload bandwidth distribution of leechers. As mentioned before, leecher download bandwidth is never artificially limited, although local network characteristics may impose an effective upload or download limit. We ran experiments with the following configurations.

• Two-class. Leechers are divided into two categories with different upload limits. This configuration enables us to observe system behavior in highly bipolar scenarios. Our experiments involve similar numbers of slow peers, with 20 kB/s upload limit, and fast peers, with 200 kB/s upload limit.

• Three-class. Leechers are divided into three categories with different upload limits. This configuration helps us identify the qualitative behavioral differences of more distinct classes of peers. Our experiments involve similar numbers of slow peers, with 20 kB/s upload limit; medium peers, with 50 kB/s upload limit; and fast peers, with 200 kB/s upload limit.

• Uniform-increase. Upload limits are defined on leechers according to a uniform distribution, with a small 5 kB/s step. The slowest leecher has an upload limit of 20 kB/s, the second slowest a limit of 25 kB/s, and so on. This configuration provides insight into the behavior of torrents with more uniform distribution of peer bandwidth.

Our graphs in Section 4 correspond to experiments run with the three-class configuration, but the conclusions we draw accord well
with the results of other experiments. We stress distinctions where appropriate. We also ran preliminary experiments where the initial seed disconnects after uploading an entire copy of the content, but leechers remain connected after they complete their download, serving as seeds for a short time. Peers in these experiments have somewhat lower completion times thanks to the extra help from leechers in content dissemination, but appear otherwise similar.

3.3 Experiment Rationale
The goal of our experiments is to understand the dynamics of the choking algorithm. To that end, we consider four metrics.

Clustering: The choking algorithm aims to encourage high peer reciprocation by favoring peers who upload. Therefore, we expect that peers will more frequently unchoke other peers with similar upload capacities, since those are the ones that can reciprocate with high enough rates. The rules for peer selection by Qiu et al. [21] also support this hypothesis. Consequently, it is expected that the choking algorithm converges towards good clustering shortly after the beginning of the download by grouping together peers with similar upload capacity. This behavior, however, is not guaranteed and has never been previously verified experimentally. Indeed, let’s consider a simple example. Peer A will unchoke peer B if B has been uploading data at a high rate to A. In order for B to continue uploading to A, A should also start sending data to B at a high enough rate. The only way to initiate such a reciprocal relationship is via an optimistic unchoke. Yet, since optimistic unchokes are performed at random, it is not clear whether and when A and B will get a chance to interact. Therefore, in order to preserve clustering, optimistic unchokes should successfully initiate interactions between peers with similar upload capacities. In addition, such interactions should persist despite potential disruptions, such as optimistic unchokes by others or network bandwidth fluctuations.

Sharing incentives: A major goal of the choking algorithm is to give peers an incentive to share data. The algorithm strives to encourage peers to contribute, since doing so will improve their own download rates. We evaluate the effectiveness of these sharing incentives by measuring how peers’ upload contributions affect their download completion time. We expect that the more a peer contributes, the sooner it will complete its download. However, we do not expect to observe strict data volume fairness, where all peers contribute the same amount of data; peers who upload at high rates may end up contributing more data than others. They should be rewarded though, by completing their download sooner.

Upload utilization: Upload utilization constitutes a reliable metric of efficiency in peer-to-peer content distribution systems, since the total upload capacity of all peers represents the maximum throughput the system can achieve as a whole. As

[Figure 1 plot: "Regular Unchoke Duration (All Runs)"; x-axis: Uploading peer ID; y-axis: Downloading peer ID; color bar: 0 to 1200 seconds]

Figure 1: Time duration that peers unchoked each other via a regular unchoke, averaged over all runs. Darker squares represent longer unchoke times (the unit of the color bar on the right is in seconds). Peers 1 to 13 have a 20 kB/s upload limit, peers 14 to 27 have a 50 kB/s upload limit, and peers 28 to 40 have a 200 kB/s upload limit. The seed (peer 41) is limited to 200 kB/s. The creation of clusters is clearly visible.

4. EXPERIMENTAL RESULTS
We now report the results of representative experiments that demonstrate our main observations. For conciseness, we present only results drawn from the three-class torrent configuration, but our conclusions are consistent with our observations from other configurations as well.

4.1 Well-Provisioned Initial Seed
We first examine a scenario with a well-provisioned initial seed, i.e., a seed that can sustain high upload rates. We expect this to be common for commercial torrents, whose service providers typically make sure there is adequate bandwidth to initially seed the torrent. An example might be Red Hat distributing its latest Linux distribution. Section 4.2 shows that peer behavior in the presence of an underprovisioned initial seed can differ substantially.

We consider an experiment with a single seed and 40 leechers: 13 slow peers (20 kB/s upload limit), 14 medium peers (50 kB/s upload limit), and 13 fast peers (200 kB/s upload limit). The seed, which is represented as peer 41 in the following figures, is limited to uploading at 200 kB/s, as fast as a fast peer. Different peer upload limits are defined in order to model different levels of contribution.

The results we report are based on thirteen experiment runs. Although the official BitTorrent implementation would set the number of parallel uploads based on the defined upload limit (4 for the slow, 5 for the medium, and 10 for the fast peers and the seed), we set this number to 4 for all peers, which in fact is what most
a result, a peer-to-peer content distribution protocol should other clients would do. This ensures homogeneous conditions in
aim at maximizing peers’ upload utilization. We are inter- the torrent and makes it easier to interpret the results.
ested in measuring this utilization in BitTorrent systems, and
identifying the factors that can adversely affect it.
4.1.1 Clustering
As explained in Section 3.3, we expect to observe clustering
Seed service: The modified choking algorithm in seed state bases based on peers’ upload capacities. Figure 1 demonstrates that peers
its decisions on the time peers have been waiting for seed indeed form clusters. The figure plots the total time peers unchoked
service, in addition to their download rates from the seed. each other via a regular unchoke, averaged over all runs of the ex-
Thus, we expect to see uniform sharing of the seed upload periment. It is clear that peers in the same class cluster together,
bandwidth among all peers. It should also be impossible for in the sense that they prefer to upload to each other. This behavior
fast leechers to monopolize the seed. becomes more apparent when considering a metric such as the clus-

[Plot “Regular Unchoke Duration Clustering Index (All Runs)”: clustering index (0 to 1) versus peer ID, with separate markers for fast, medium, and slow peers.]

Figure 2: Clustering index for all peers, averaged over all runs, in the presence of a well-provisioned seed. Error bars represent the 10th and 90th percentiles. Peers 1 to 13 have a 20 kB/s upload limit, peers 14 to 27 have a 50 kB/s upload limit, and peers 28 to 40 have a 200 kB/s upload limit. The seed (peer 41) is limited to 200 kB/s. Peers show a strong preference to unchoke others in the same class.

[Plot “Peer Download Speed (All Runs)”: time (in 60-second slots) versus downloading peer ID, shaded by download speed.]

Figure 4: Peer download speeds for all 60-second time intervals during the download, averaged over all runs. Darker rectangles represent higher speeds (the unit of the color bar on the right is in kB/s). Peers 1 to 13 have a 20 kB/s upload limit, peers 14 to 27 have a 50 kB/s upload limit, while peers 28 to 40 have a 200 kB/s upload limit. The seed (peer 41) is limited to 200 kB/s. Peer 27 achieves lower download rates than other peers in its class, while peer 8 is the last one to finish.
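The clustering index plotted in Figure 2 is, for each peer, the duration of its regular unchokes to peers of its own class divided by the duration of its regular unchokes to all peers. The following is a minimal sketch of that ratio, not the authors' measurement code. The function name, the `(uploader, downloader)` dictionary layout, and the sample durations are assumptions made for illustration.

```python
from collections import defaultdict

def clustering_index(unchoke_seconds, peer_class):
    """Per-peer ratio of same-class regular-unchoke time to total regular-unchoke time.

    unchoke_seconds: {(uploader, downloader): seconds of regular unchoke} (assumed layout)
    peer_class: {peer: class label such as "slow", "medium", or "fast"}
    """
    total = defaultdict(float)
    same_class = defaultdict(float)
    for (uploader, downloader), seconds in unchoke_seconds.items():
        total[uploader] += seconds
        if peer_class[uploader] == peer_class[downloader]:
            same_class[uploader] += seconds
    # Peers that never performed a regular unchoke have no defined index.
    return {peer: same_class[peer] / total[peer] for peer in total if total[peer] > 0}

# Hypothetical durations, not taken from the experiments.
peer_class = {1: "slow", 2: "slow", 3: "fast", 4: "fast"}
unchoke_seconds = {(1, 2): 300.0, (1, 3): 100.0, (3, 4): 900.0, (3, 1): 100.0}
index = clustering_index(unchoke_seconds, peer_class)
```

An index close to 1 means a peer spent almost all of its regular-unchoke time on its own class. In this toy input, peer 1 spends 300 of 400 seconds on its own class and peer 3 spends 900 of 1000.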
tering index. We define this for a given peer in a given class (fast, medium, or slow) as the ratio of the duration of regular unchokes to the peers of its class over the duration of regular unchokes to all peers. A high clustering index indicates a strong preference to upload to peers in the same class. Figure 2 plots this index for all peers and demonstrates that peers prefer to unchoke other peers in their own class, thereby forming clusters. Further experiments with upload limits following a uniform distribution also show that peers have a clear preference for peers with similar upload capacities.

Although from Figure 1 it might seem that slow peers show a proportionally stronger preference for their own class, this is an artifact of the experiment. Slow peers take longer to complete their download (as shown in Figure 3), and so they perform a higher number of regular unchokes on average than fast peers. Also notice that medium peer 27 interacts frequently with slow peers. This peer’s download capacity is inherently limited, arguably due to machine or local network limitations, as seen in Figure 4 that plots observed peer download speeds over time. As a result, it stays connected to the torrent even after all other peers of its class have completed their download. During that last period it has to interact with slow leechers, since those are the only ones left.

Figure 1 also shows that reciprocation is not necessarily mutual. Slow peers frequently unchoke medium peers, but the favor is not returned. Indeed, the slow peers unchoked medium peers for a total of 501,844 seconds, as shown by the relatively dark center-left partition. However, the medium peers unchoked slow peers for only 273,985 seconds, as shown by the lighter bottom-center partition. This lack of reciprocation is due to the fact that slow peers are of little use to medium ones, since they cannot offer high enough upload rates.

In summary, the choking algorithm facilitates clustering, where peers mostly interact with others in the same class, with the occasional exception of random optimistic unchokes.

[Plot “Download Completion Time (All Runs)”: cumulative fraction of peers versus completion time (s), one curve per class (fast, medium, slow).]

Figure 3: Cumulative distribution of the download completion time for the three different classes of leechers, in the presence of a well-provisioned seed (limited to 200 kB/s), for all runs. The vertical line represents the earliest possible time that the download could complete. Fast peers finish much earlier than slow ones.

4.1.2 Sharing Incentives
We now examine whether BitTorrent’s choking algorithm provides effective sharing incentives, in the sense that a peer who contributes more to the torrent is rewarded by completing its download sooner than the rest. Figure 3 indeed demonstrates this to be the case. We plot the cumulative distribution of completion time for the three classes of leechers in the previous experiment. The vertical line in the figure represents the optimal completion time, the earliest possible time that any peer could complete its download. This is the time the seed finished uploading a complete copy of the content. On average, this time is around 650 seconds for the experiment.

Fast leechers complete their download soon after the optimal completion time. Medium and, especially, slow leechers take sig-

[Plot “Aggregate Amount of Uploaded Data (All Runs)”: downloading peer ID versus uploading peer ID, shaded by bytes uploaded (color bar scaled by 10^7).]

Figure 5: Total number of bytes uploaded by peers to each other, averaged over all runs. Darker squares represent more data (the unit of the color bar on the right is in bytes). Peers 1 to 13 have a 20 kB/s upload limit, peers 14 to 27 have a 50 kB/s upload limit, and peers 28 to 40 have a 200 kB/s upload limit. The seed (peer 41) is limited to 200 kB/s. Fast peers upload much more data than the rest.

[Plot “Global Upload Utilization (All Runs)”: upload utilization (0 to 1) versus time slot (60 s).]

Figure 6: Scatterplot of peers’ upload utilization for all 60-second time intervals during the download, in the presence of a well-provisioned seed (limited to 200 kB/s). Each point represents the average upload utilization over all peers for a given experiment run. Utilization is kept high during most of the download session.
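The utilization metric behind Figure 6 is torrent-wide: for each minute, the upload bandwidth used by the peers is summed and divided by the upload capacity available over that minute for all peers still connected at the minute's end. The sketch below illustrates that computation under stated assumptions. The function name, the input layout, the convention that `slots_connected[p]` counts the leading slots at whose end peer `p` is still connected, and the sample numbers (with 1 kB taken as 1000 bytes) are all hypothetical.

```python
def upload_utilization(uploaded, limit, slots_connected, slot_seconds=60):
    """Torrent-wide upload utilization per time slot.

    uploaded: {peer: [bytes uploaded in slot 0, slot 1, ...]} (assumed layout)
    limit: {peer: upload limit in bytes per second}
    slots_connected: {peer: number of leading slots at whose end the peer is still connected}
    """
    n_slots = max(len(series) for series in uploaded.values())
    result = []
    for t in range(n_slots):
        # Bandwidth actually used by all peers during slot t.
        used = sum(series[t] for series in uploaded.values() if t < len(series))
        # Capacity counts only peers still connected at the end of slot t.
        capacity = sum(limit[p] * slot_seconds for p in uploaded if t < slots_connected[p])
        result.append(used / capacity if capacity > 0 else None)
    return result

# Hypothetical two-peer torrent: A is slow (20 kB/s), B is fast (200 kB/s).
limit = {"A": 20_000, "B": 200_000}
uploaded = {"A": [600_000, 1_200_000], "B": [6_000_000]}
slots_connected = {"A": 2, "B": 1}
utilization = upload_utilization(uploaded, limit, slots_connected)
```

Because capacity only counts peers that remain connected, the denominator shrinks as peers finish and leave, which matches the paper's remark that total capacity decreases over time.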

nificantly longer to finish. Contributing to the torrent enables a leecher to enter the fast cluster and receive data at higher rates. This in turn ensures a short download completion time. The choking algorithm does indeed foster reciprocation by rewarding contributing peers. In experiments with upload limits following a uniform distribution, the peer completion time is also uniform: completion time decreases when a peer’s upload contribution increases. This further indicates the algorithm’s consistent properties with respect to effective sharing incentives.

Note, however, that this does not imply any notion of data volume fairness. Fast peers end up uploading significantly more data than the rest. Figure 5, which plots the actual volume of uploaded data averaged over all runs, demonstrates that fast peers are the major contributors to the torrent. Most of their bandwidth is expended on other fast peers, per the clustering principle. Interestingly, the slow leechers end up downloading more data from the seed. The seed provides equal service to peers of any class, as we show in Section 4.1.4, but slow peers have more opportunities than others to download from the seed, since they take longer to complete.

In summary, BitTorrent provides effective incentives for peers to contribute, as doing so will reward a leecher with significantly higher download rates. Recent studies [16, 17, 23] have shown that limited free-riding is possible in BitTorrent under specific circumstances, although such free-riders do not appear to severely impact the quality of service for compliant peers. However, these studies do not significantly challenge the effectiveness of sharing incentives enforced by the choking algorithm. Although free-riding is possible, such peers typically achieve lower download rates than they could if they followed the protocol. As a result, if peers wish to obtain the highest possible rates, it is in their best interest to conform to the protocol.

4.1.3 Upload Utilization
We now turn our attention to performance by examining whether the choking algorithm can maintain high utilization of peers’ upload bandwidth. Figure 6 is a scatterplot of such utilization in the aforementioned setup. A utilization of 1 represents taking full advantage of the available upload capacity. Average utilization for each of the thirteen runs is plotted once per minute. The metric is torrent-wide: for each minute, we sum the upload bandwidth used by the peers during that minute, and divide by the upload capacity available over that minute for all peers still connected at the minute’s end. The total capacity decreases over time as peers complete their downloads and disconnect. Utilization is low at the beginning and the end of the session, but close to optimal for the majority of the download. It rises slightly after approximately 15 minutes, which corresponds to when fast peers leave the torrent. Perhaps the four-peer limit on parallel uploads restricts fast peers’ utilization. In any case, utilization is good overall.

In summary, the choking algorithm, in cooperation with other BitTorrent mechanisms such as rarest-first piece selection, does a good job of ensuring high utilization of the upload capacity of leechers during most of the download. Low utilization during the startup period may pose a problem for small contents, for which it could dominate the total download time. We discuss a potential solution to this in Section 5.2.

4.1.4 Seed Service
The official client introduced a modified choking algorithm in seed state, as described in Section 2.3, although it reverted to the original in the most recent version. The client’s version notes claim that the modified algorithm aims to reduce the amount of duplicate data a seed needs to upload before it has pushed out a full copy of the content into the torrent. We study this modified algorithm for the first time and examine this claim.

Figure 7 shows the duration of unchokes, both regular and optimistic, performed by the seed in a representative run of the aforementioned setup. Leechers are unchoked in a uniform manner, regardless of upload speed. Fast peers, those with higher peer IDs, complete their download sooner, after which time the seed divides its upload bandwidth among the remaining leechers. Leecher 8 is the last to complete (as shown in Figure 4), and receives exclusive service from the seed during the end of its download. We therefore see that the modified choking algorithm in seed state provides uniform service; this is because it bases its unchoking decisions on the

[Plot “Seed Unchoke Events”: downloading peer ID versus time (s).]

Figure 7: Duration of all unchokes (regular and optimistic) performed by a well-provisioned seed to each peer. Results for a single representative run. Peers 1 to 13 have a 20 kB/s upload limit, peers 14 to 27 have a 50 kB/s upload limit, and peers 28 to 40 have a 200 kB/s upload limit. The seed (peer 41) provides uniform service to all leechers.

[Plot “Regular Unchoke Duration (All Runs)”: downloading peer ID versus uploading peer ID, shaded by unchoke duration in seconds.]

Figure 9: Time duration that peers unchoked each other via a regular unchoke, averaged over all runs. Darker squares represent longer unchoke times (the unit of the color bar on the right is in seconds). Peers 1 to 12 have a 20 kB/s upload limit, peers 13 to 26 have a 50 kB/s upload limit, and peers 28 to 40 have a 200 kB/s upload limit. The seed (peer 27) is limited to 100 kB/s. There is no discernible clustering.
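A heat map such as Figure 9 can also be summarized numerically by summing regular-unchoke durations over each (uploader class, downloader class) partition, which is the kind of per-partition total the text quotes when comparing slow-to-medium and medium-to-slow unchokes. The sketch below is one possible way to do this and is not taken from the paper. The function names, the input layout, and the sample durations are assumptions. The class boundaries follow the peer numbering given in the Figure 9 caption.

```python
from collections import defaultdict

def peer_class(peer_id):
    """Class by peer ID, following the numbering in the Figure 9 caption."""
    if 1 <= peer_id <= 12:
        return "slow"
    if 13 <= peer_id <= 26:
        return "medium"
    if 28 <= peer_id <= 40:
        return "fast"
    return None  # peer 27 is the seed; regular unchokes by the seed are not counted here

def block_totals(unchoke_seconds):
    """Sum unchoke seconds per (uploader class, downloader class) partition."""
    totals = defaultdict(float)
    for (uploader, downloader), seconds in unchoke_seconds.items():
        up, down = peer_class(uploader), peer_class(downloader)
        if up is None or down is None:
            continue
        totals[(up, down)] += seconds
    return dict(totals)

# Hypothetical durations, not taken from the experiments.
sample = {(1, 2): 100.0, (1, 13): 50.0, (13, 1): 20.0, (30, 31): 400.0, (27, 5): 999.0}
totals = block_totals(sample)
```

With clustering, the diagonal partitions (slow to slow, medium to medium, fast to fast) dominate. Comparing `totals[("slow", "medium")]` with `totals[("medium", "slow")]` exposes non-mutual reciprocation.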

time peers have been waiting for seed service. As a result, the risk of fast leechers downloading the entire content and quickly disconnecting from the torrent is significantly reduced. Furthermore, this behavior would mitigate the effectiveness of exploits that attempt to monopolize seeds [16].

According to anecdotal evidence [2], initial seeds using the old algorithm might have to upload 150% to 200% of the total content size before other peers become seeds. Our experiments show that the modified algorithm avoids this problem. Figure 8 plots the number of pieces uploaded by the seed during the download session for a representative run. A total of 527 pieces are sent out before an entire copy of the content (453 pieces) has been uploaded. Thus, the duplicate piece overhead is around 14%, indicating that the modified choking algorithm in seed state avoids unnecessarily uploading duplicate pieces to a certain extent. This number was consistent across all our experiments, ranging from 11 to 15%. However, to the best of our knowledge, there has been no experimental evaluation of the corresponding overhead in the old algorithm, so it is not clear how much of an improvement this is.

[Plot “Pieces Uploaded by the Seed”: cumulative number of pieces versus time (s), with one line for Unique and one for Total.]

Figure 8: Number of pieces uploaded by the seed (limited to 200 kB/s), for a single representative run. The Unique line represents the pieces that had not been previously uploaded, while the Total line represents the total number of pieces uploaded so far. We observe a 14% duplicate piece overhead.

In any case, 14% duplication represents an opportunity for improvement. The official client always issues requests for pieces in the rarest-pieces set in the same order. As a result, leechers might end up requesting the same piece from the seed at approximately the same time. It would be preferable for leechers to request rarest pieces in random order instead.

4.2 Underprovisioned Initial Seed
We now turn our attention to a scenario with an underprovisioned initial seed and demonstrate that the seed upload capacity is critical to performance during the beginning of a torrent’s lifetime. The experiment we present here involves a single seed and 39 leechers, 12 slow, 14 medium, and 13 fast. These nodes are different from the nodes used in the previous experiment. The initial seed, represented as peer 27 in the following figures, is in this case limited to 100 kB/s, instead of 200 kB/s. We set the number of parallel uploads again to four for the seed and all the leechers. The results we present are based on eight experiment runs and are consistent with our observations from experiments with other torrent configurations. Peer behavior in the presence of an underprovisioned initial seed is substantially different from that with a well-provisioned one.

4.2.1 Clustering
Figure 9 shows the total time peers unchoked each other via a regular unchoke, averaged over all runs of the experiment. In contrast to Figure 1, there is no discernible clustering among peers in the same class. The lack of clustering in the presence of an underprovisioned initial seed becomes more apparent when considering the clustering index metric defined in Section 4.1.1. Figure 10

[Plot “Regular Unchoke Duration Clustering Index (All Runs)”: clustering index (0 to 1) versus peer ID, with separate markers for fast, medium, and slow peers.]

Figure 10: Clustering index for all peers in the presence of an underprovisioned seed, averaged over all runs. Error bars represent the 10th and 90th percentiles. Peers 1 to 12 have a 20 kB/s upload limit, peers 13 to 26 have a 50 kB/s upload limit, and peers 28 to 40 have a 200 kB/s upload limit. The seed (peer 27) is limited to 100 kB/s. Peers do not show a clear preference to unchoke other peers in any particular class.

[Plot “Download Completion Time (All Runs)”: cumulative fraction of peers versus completion time (s), one curve per class (fast, medium, slow).]

Figure 12: Cumulative distribution of the download completion time for the three different classes of leechers, in the presence of an underprovisioned seed (limited to 100 kB/s), for all runs. The vertical line represents the earliest possible time that the download could complete. Most peers complete at approximately the same time, regardless of their contribution, soon after the seed finishes uploading a complete copy of the content.
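The cumulative distributions in Figures 3 and 12 give, for each completion time, the fraction of peers in a class that have finished by that time. A minimal sketch of such an empirical distribution follows. The function name and the sample completion times are assumptions chosen for illustration, not measured values.

```python
def completion_cdf(completion_times):
    """Empirical CDF as a list of (completion time, cumulative fraction of peers)."""
    ordered = sorted(completion_times)
    n = len(ordered)
    return [(t, (i + 1) / n) for i, t in enumerate(ordered)]

# Hypothetical completion times in seconds for four peers of one class.
fast_times = [700, 650, 720, 680]
fast_cdf = completion_cdf(fast_times)
```

Plotting one such curve per class makes the incentive effect visible. Curves that are well separated indicate that higher contributors finish sooner, while nearly overlapping curves indicate that contribution does not shorten completion time.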

shows this metric for all peers. They are all similar, indicating a lack of preference to unchoke peers in any particular class.

Figure 11 attempts to explain this behavior by plotting the peer availability of each peer to every other peer, averaged over all runs of the experiment. We define the peer availability of a downloading peer Y to an uploading peer X as the ratio of the time X was interested in Y to the time that Y spent in the peer set of X. A peer availability of 1 means that the uploading peer was always interested in the downloading peer, while a peer availability of 0 means that the uploading peer was never interested in the downloading peer.

[Plot “Peer Availability (All Runs)”: downloading peer ID versus uploading peer ID, shaded by peer availability (0 to 1).]

Figure 11: Normalized interested time duration for each peer pair, averaged over all runs. Darker squares represent higher peer availability. Peers 1 to 12 have a 20 kB/s upload limit, peers 13 to 26 have a 50 kB/s upload limit, and peers 28 to 40 have a 200 kB/s upload limit. The seed (peer 27) is limited to 100 kB/s. Fast peers have poor peer availability to all other peers.

We can see that the fast peers have poor peer availability to all other peers. This is because the seed is uploading new pieces at a low rate, so even if it uploaded only to fast peers, those would quickly replicate every piece as it was completed, remaining non-interested for the rest of the time. The same is not true for slow peers, however, since they upload even more slowly than the seed. In addition, when a fast leecher is unchoked by a slow leecher, it will always reciprocate with high rates, and thereby be preferred by the slow leecher. As a result, fast peers will get new pieces even from medium and slow peers. In this manner, fast peers prevent clustering by taking up slower peers’ unchoke slots and thus breaking any clusters that might be starting to form. This prevents medium and slow peers from clustering together, even though the seed is fast enough with respect to them. Further experiments with other torrent configurations, including one with the initial seed further limited to 20 kB/s, confirm this conclusion.

In summary, when the initial seed is underprovisioned, the choking algorithm does not enable peer clustering. We study in the next section how this lack of clustering affects the effectiveness of sharing incentives.

4.2.2 Sharing Incentives
We now examine how the lack of clustering affects the effectiveness of sharing incentives. In particular, we investigate whether fast peers still complete their download sooner than the rest. Figure 12 shows that this is no longer the case. Most peers complete their download at approximately the same time. The points in the tail of the figure are due to a single slow peer, peer 8, which completed its download last in every run. This PlanetLab node has a poor effective download speed independently of the choking algorithm, likely due to machine or local network limitations. All other peers, for all runs, complete their download less than 2,000 seconds after the beginning of a run. Clearly, seed upload capacity is the performance bottleneck. Once the seed finishes uploading a complete copy of the content, all peers complete soon thereafter. Since uploading data to others does not shorten a peer’s completion time,

[Plot “Aggregate Amount of Uploaded Data (All Runs)”: downloading peer ID versus uploading peer ID, shaded by bytes uploaded (color bar scaled by 10^7).]

Figure 13: Total number of bytes peers uploaded to each other, averaged over all runs. Darker squares represent more data (the unit of the color bar on the right is in bytes). Peers 1 to 12 have a 20 kB/s upload limit, peers 13 to 26 have a 50 kB/s upload limit, and peers 28 to 40 have a 200 kB/s upload limit. The seed (peer 27) is limited to 100 kB/s. Fast peers upload the most data, spreading their bandwidth evenly.

[Plot “Global Upload Utilization (All Runs)”: upload utilization (0 to 1) versus time slot (60 s).]

Figure 15: Scatterplot of peers’ upload utilization for all 60-second time intervals during the download, in the presence of a severely underprovisioned seed (limited to 20 kB/s). Each point represents the average upload utilization over all peers for a given experiment run. Utilization is poor when the seed is very slow.
BitTorrent’s sharing incentives do not seem to be effective in this situation.

Fast peers are again the major contributors in the torrent, but in this case their upload bandwidth is expended equally across other fast and slower peers alike. Figure 13, which plots the amount of uploaded data between each peer pair, shows that fast peers made the most contributions, distributing their bandwidth evenly to all other peers.

In summary, when the initial seed is underprovisioned, the choking algorithm does not provide effective incentives to contribute. Nevertheless, the available upload capacity of fast peers is effectively utilized to replicate the pieces being uploaded by the seed.

4.2.3 Upload Utilization
Interestingly, even with a slow seed, upload utilization remains relatively high, as shown in Figure 14. Leechers manage to exchange data productively among themselves once new pieces are downloaded from the seed, so that the lack of clustering does not degrade overall performance significantly. The BitTorrent design seems to lead the system to do the right thing: fast peers contribute their bandwidth to reduce the burden on the initial seed, helping disseminate the available pieces to slower peers. Although this destroys clustering, it improves overall efficiency, which is a reasonable trade-off given the situation.

[Plot “Global Upload Utilization (All Runs)”: upload utilization (0 to 1) versus time slot (60 s).]

Figure 14: Scatterplot of peers’ upload utilization for all 60-second time intervals during the download, in the presence of an underprovisioned seed (limited to 100 kB/s). Each point represents the average upload utilization over all peers for a given experiment run. Utilization is kept at acceptable levels despite the seed limitation.

We also experimented with a seed limited to an upload capacity of 20 kB/s. Figure 15 shows that, with this extremely low seed capacity, there are few new pieces available to exchange at any point in time, and each new piece gets disseminated rapidly after it is retrieved from the seed. The overall upload utilization is now low. Slow peers exhibit slightly higher utilization than the rest, since they do not need many available pieces to use up their available upload capacity.

In summary, even in situations where the initial seed is underprovisioned, the global upload utilization can be high. However, our experiments only involve compliant clients, who do not try to adapt their upload contributions according to a utility function of the observed download speed. On the other hand, in an environment with free-riders and an underprovisioned seed, one might expect a lower upload utilization due to the lack of altruistic peer contributions.

5. DISCUSSION
We now discuss two limitations of the choking algorithm that we identified through our experiments: the initial seed upload capacity is fundamental to the proper operation of the incentives mechanism, and peers take some time to reach full upload utilization at the beginning of the download session.

5.1 Seed Provisioning
When the initial seed is underprovisioned, the choking algorithm does not lead to the clustering of similar-bandwidth peers. Even

without clustering, however, we observed high upload utilization. Interestingly, in the presence of a slow initial seed, the protocol mechanisms lead the fast leechers to contribute to the download of all other peers, fast or slow, thereby improving performance.

However, whenever feasible, one should engineer adequate initial seed capacity in order to allow fast leechers to achieve optimal performance. Our results show that the lack of clustering occurs when fast peers cannot maintain their interest in other fast peers. In order to avoid this situation, the initial seed should at least be able to upload data at a speed that matches that of the fastest peers in the torrent. This suggestion is of course a rule-of-thumb guideline, and assumes that the service provider knows a priori the maximum upload capacity of the peers that may join the torrent in the future. In practice, reasonable bounds could be derived from measurements or from an analysis of deployed network technologies. Further research is needed to evaluate the exact impact of initial seed capacity. We are currently developing an analytical model that attempts to express the effect of this parameter on peer performance.

5.2 Tracker Protocol Extension
When a new leecher first joins the torrent, it connects to a random subset of already-connected peers that are returned by the tracker. However, in order to reach its optimal bandwidth utilization, this new leecher needs to exchange data with those peers that have a similar upload capacity to itself. If there are few such peers in the torrent, it may take some time to discover them, since this has to be done via random optimistic unchokes that occur only once every 30 seconds.

Consequently, it might be preferable to utilize the tracker in matching similar-bandwidth leechers. In this manner, the duration of the discovery period could decrease and the upload utilization would be high even at the beginning of a peer’s download. The new leecher could report its available upload capacity to the tracker when joining the torrent. This parameter can be configured in the client software, or may possibly be the actual maximum upload rate measured during previous downloads. The tracker would then reply with a random subset of peers as usual, along with their upload capacities. The new leecher could optionally perform optimistic unchokes first to peers with similar upload capacity, in an effort to discover the best partners sooner.

Using this new tracker protocol extension, if the peer set contains only a few leechers with similar upload capacity, they will discover each other quickly. Leechers should employ some means of detecting and punishing others who lie about their available upload capacity. For instance, if a leecher does not respond to an optimistic unchoke with an upload rate close to the one it announced to the tracker, that leecher will not be unchoked again for some period of time. In this manner, the possibility of a remote leecher initiating a new interaction is left open, yet the benefit from free-riding behavior is limited since free-riders will eventually end up choked by most peers. Since the tracker still returns a random subset of peers, independently of the advertised upload capacity, there is no risk of creation of disconnected clusters. In a collaborative environment, however, the tracker might even want to return peers based on their capacity, as previously proposed [7], in order to speed up cluster creation even more. Of course, although the proposed tracker extension is promising, further investigation is required to verify that it will work as expected.

6. RELATED WORK
There has been a fair amount of work on the performance and behavior of BitTorrent systems. Bram Cohen, the protocol’s creator, has described BitTorrent’s main mechanisms and their design rationale [8]. There have been several measurement studies examining real BitTorrent traffic. Izal et al. [12] measure several peer characteristics derived from the tracker log for the Red Hat Linux 9 ISO image, including the number of active peers, the proportion of seeds and leechers, and the geographical spread of peers. They observe that while there is a correlation between upload and download rates, indicating that the choking algorithm is working, the majority of content is contributed by only a few leechers and the seeds. Pouwelse et al. [20] study the content availability, integrity, and download performance for torrents on a once-popular tracker website. They observe that the centralized tracker component could potentially be a bottleneck. Andrade et al. [6] study BitTorrent sharing communities. They find that sharing-ratio enforcement and the use of RSS feeds to advertise new content may improve peer contributions, yet torrents with a large number of seeds present ample opportunity for free-riding. Furthermore, Guo et al. [11] demonstrate that the peer arrival and departure rate is exponential, and that performance fluctuates widely in small torrents. Inter-torrent collaboration is proposed as an alternative to providing extra incentives for leechers to stay connected after the completion of their download. A more recent study by Legout et al. [15] presents the results of extensive experiments on real torrents. They show that the rarest-first and choking algorithms play a critical role in BitTorrent’s performance, and claim that the replacement with a volume-based tit-for-tat algorithm, as proposed by other researchers [13], is not appropriate. However, they do not identify the reasons behind the properties of the choking algorithm and fail to examine its dynamics due to the single-peer viewpoint.

Several analytical studies have formulated models for BitTorrent-like protocols. Qiu et al. [21] provide a solution to a fluid model of BitTorrent, where they study the choking algorithm and its effect on performance. They observe that optimistic unchoking may provide a way for peers to free-ride on the system. Their model assumes peer selection based on global knowledge of all peers in the torrent, as well as uniform distribution of pieces. Massoulie et al. [18] introduce a probabilistic model of BitTorrent-like systems and argue that overall system performance does not depend critically on either altruistic peer behavior or the rarest-first piece selection strategy. Fan et al. [9] characterize the complete design space of BitTorrent-like protocols by providing a model that captures the fundamental trade-off between performance and fairness. Whereas all these models provide valuable insight into the behavior of BitTorrent systems, unrealistic assumptions limit their applicability in real scenarios [11, 20].

Other researchers have relied on simulations to understand BitTorrent’s properties. Felber et al. [10] conducted an initial investigation of the impact of different peer arrival rates, peer capacities, and peer and piece selection strategies. Bharambe et al. [7] utilize a discrete event simulator to evaluate the impact of BitTorrent’s core mechanisms and observe that the rate-based tit-for-tat strategy is ineffective in preventing unfairness in peer contributions. They also find that the rarest-first algorithm outperforms alternative piece selection strategies. However, they do not evaluate a peer set larger than 15 peers, whereas the official implementation has a default value of 80. This may affect the results since the accuracy of the piece selection strategy is affected by the peer set size. Furthermore, Tian et al. [24] study peer performance towards the end of
the download and propose a new peer selection strategy which en-

311
ables more clients to complete their download after the departure intuition-based engineering choices; we would like to conduct a
of all the seeds. systematic evaluation of system behavior under different parameter
Researchers have also looked into the feasibility of selfish behav- values.
ior, when peers attempt to circumvent BitTorrent mechanisms to
gain unfair benefit. Shneidman et al. [22] were the first to demon- Acknowledgments
strate that BitTorrent exploits are feasible. They briefly describe an
attack to the tracker and an exploit involving leechers lying about We wish to thank the anonymous reviewers and Michael Sirivianos
the pieces they have. Jun et al. [13] argue that the choking al- for their invaluable feedback.
gorithm is not sufficient to prevent free-riding and propose a new
algorithm to enforce fairness in peers’ data exchanges. Liogkas et 8. REFERENCES
al. [16] design and implement three exploits that allow a peer who [1] BitTorrent mainline client.
does not contribute to maintain high download rates under specific http://www.bittorrent.com/download.html.
circumstances. Even though such selfish peers can obtain more [2] BitTorrent Specification wiki.
http://wiki.theory.org/BitTorrentSpecification/.
bandwidth, there is no considerable degradation of the overall sys- [3] Instrumented BitTorrent client. http://www-sop.inria.fr/
tem’s quality of service. Locher et al. [17] extend the work in [16] planete/Arnaud.Legout/Projects/p2p_cd.html#software.
and demonstrate that limited free-riding is feasible even in the ab- [4] Parallel openssh tools. http://www.theether.org/pssh/.
sence of seeds. They also describe selfish behavior in BitTorrent [5] PlanetLab platform. http://www.planet-lab.org.
sharing communities. In addition, Sirivianos et al. [23] evaluate [6] N. Andrade, M. Mowbray, A. Lima, G. Wagner, and M. Ripeanu. Influences
on Cooperation in BitTorrent Communities. In Proc. of the Workshop on
an exploit based on maintaining a larger-than-normal view of the Economics of Peer-to-Peer Systems (P2PEcon’05), Philadelphia, PA, August
torrent. Piatek et al. [19] observe that high-capacity peers typi- 2005.
cally provide low-capacity ones with an unfair share of the data. [7] A. R. Bharambe, C. Herley, and V. N. Padmanabhan. Analyzing and
Improving a BitTorrent Network’s Performance Mechanisms. In Proc. of
They design a choking algorithm optimization that reallocates the Infocom’06, Barcelona, Spain, April 2006.
superfluous upload bandwidth to others in order to maximize peer [8] B. Cohen. Incentives Build Robustness in BitTorrent. In Proc. of the Workshop
download rates. on Economics of Peer-to-Peer Systems (P2PEcon’03), Berkeley, CA, June
Our work differs from all previous studies in its approach and 2003.
[9] B. Fan, D.-M. Chiu, and J. C. Lui. The Delicate Tradeoffs in BitTorrent-like
results. We perform the first extensive experimental study of Bit- File Sharing Protocol Design. In Proc. of ICNP’06, Santa Barbara, CA,
Torrent in a controlled environment, by monitoring all peers in the November 2006.
torrent and examining peer behavior in a variety of scenarios. Our [10] P. A. Felber and E. W. Biersack. Self-scaling Networks for Content
results validate protocol properties that have not been previously Distribution. In Proc. of the International Workshop on Self-* Properties in
Complex Information Systems (Self-*’04), Bertinoro, Italy, May 31–June 2,
demonstrated experimentally, and identify new properties related 2004.
to the impact of the initial seed on clustering and sharing incen- [11] L. Guo, S. Chen, Z. Xiao, E. Tan, X. Ding, and X. Zhang. Measurements,
tives. Analysis, and Modeling of BitTorrent-like Systems. In Proc. of IMC’05,
Berkeley, CA, October 2005.
[12] M. Izal, G. Urvoy-Keller, E. W. Biersack, P. Felber, A. A. Hamra, and
7. CONCLUSION L. Garcés-Erice. Dissecting BitTorrent: Five Months in a Torrent’s Lifetime.
In Proc. of PAM’04, Antibes Juan-les-Pins, France, April 2004.
In this paper we presented the first experimental investigation [13] S. Jun and M. Ahamad. Incentives in BitTorrent Induce Free Riding. In Proc.
of BitTorrent systems that links per-peer decisions and overall tor- of the Workshop on Economics of Peer-to-Peer Systems (P2PEcon’05),
rent behavior. Our results validate three BitTorrent properties that, Philadelphia, PA, August 2005.
[14] T. Karagiannis, A. Broido, N. Brownlee, kc claffy, and M. Faloutsos. Is P2P
though believed to hold, have not been previously demonstrated ex- dying or just hiding? In Proc. of Globecom’04, Dallas, TX, November
perimentally. We show that the choking algorithm enables cluster- 29–December 3, 2004.
ing of similar-bandwidth peers, fosters effective sharing incentives [15] A. Legout, G. Urvoy-Keller, and P. Michiardi. Rarest First and Choke
by rewarding peers who contribute, and achieves high peer upload Algorithms Are Enough. In Proc. of IMC’06, Rio de Janeiro, Brazil, October
2006.
utilization for the majority of the download duration. We also ex- [16] N. Liogkas, R. Nelson, E. Kohler, and L. Zhang. Exploring the Robustness of
amined the properties of the modified choking algorithm in seed BitTorrent Peer-to-Peer Systems. Concurrency and Computation: Practice
state and the impact of initial seed capacity on the overall system and Experience, 2007. DOI: 10.1002/cpe.1187.
performance. In particular, we showed that an underprovisioned [17] T. Locher, P. Moor, S. Schmid, and R. Wattenhofer. Free Riding in BitTorrent
is Cheap. In Proc. of HotNets-V, Irvine, CA, November 2006.
initial seed does not facilitate the clustering of peers and does not [18] L. Massoulie and M. Vojnovic. Coupon Replication Systems. In Proc. of
provide effective sharing incentives. However, even in such a case, SIGMETRICS’05, Banff, Canada, June 2005.
the choking algorithm facilitates efficient utilization of the avail- [19] M. Piatek, T. Isdal, T. Anderson, A. Krishnamurthy, and A. Venkataramani.
able resources by having fast peers help others with their down- Do incentives build robustness in BitTorrent? In Proc. of NSDI’07,
Cambridge, MA, April 2007.
load. Based on our observations, we offered guidelines for content [20] J. Pouwelse, P. Garbacki, D. Epema, and H. Sips. The BitTorrent P2P
providers regarding seed provisioning, and discussed a proposed file-sharing system: Measurements and Analysis. In Proc. of IPTPS’05,
tracker protocol extension that addresses an identified limitation of Ithaca, NY, February 2005.
the protocol. [21] D. Qiu and R. Srikant. Modeling and Performance Analysis of
BitTorrent-Like Peer-to-Peer Networks. In Proc. of SIGCOMM’04, Portland,
This work opens up many avenues for future research. We are OR, August 30–September 3, 2004.
currently developing an analytical model to express the impact of [22] J. Shneidman, D. Parkes, and L. Massoulie. Faithfulness in Internet
seed capacity on peer performance. It would also be interesting to Algorithms. In Proc. of the Workshop on Practice and Theory of Incentives
and Game Theory in Networked Systems (PINS’04), Portland, OR, September
run experiments with the old choking algorithm in seed state and 2004.
compare its properties to the modified algorithm, especially with [23] M. Sirivianos, J. H. Park, R. Chen, and X. Yang. Free-riding in BitTorrent
respect to the upload of duplicate pieces. In addition, we would Networks with the Large View Exploit. In Proc. of IPTPS’07, Bellevue, WA,
like to investigate the impact of different numbers of regular and February 2007.
[24] Y. Tian, D. Wu, and K. W. Ng. Modeling, Analysis and Improvement for
optimistic unchokes on the protocol’s properties. It has recently BitTorrent-Like File Sharing Networks. In Proc. of Infocom’06, Barcelona,
been argued that there is a fundamental trade-off between these two Spain, April 2006.
kinds of unchokes [9]. The current values used by the protocol are

