

Dynamodb Part 1

Dynamo is a highly available key-value storage system developed by Amazon to manage application state reliably at massive scale, ensuring continuous service availability despite component failures. It employs techniques like object versioning, application-assisted conflict resolution, and a decentralized architecture to prioritize availability over strict consistency, allowing for flexible performance trade-offs. The system is designed to meet the stringent operational demands of Amazon's e-commerce platform, supporting a wide range of applications with varying storage requirements.

Uploaded by

Sandeep Naidu

Dynamo: Amazon’s Highly Available Key-value Store

Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels
Amazon.com

ABSTRACT
Reliability at massive scale is one of the biggest challenges we face at Amazon.com, one of the largest e-commerce operations in the world; even the slightest outage has significant financial consequences and impacts customer trust. The Amazon.com platform, which provides services for many web sites worldwide, is implemented on top of an infrastructure of tens of thousands of servers and network components located in many datacenters around the world. At this scale, small and large components fail continuously and the way persistent state is managed in the face of these failures drives the reliability and scalability of the software systems.

This paper presents the design and implementation of Dynamo, a highly available key-value storage system that some of Amazon’s core services use to provide an “always-on” experience. To achieve this level of availability, Dynamo sacrifices consistency under certain failure scenarios. It makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.

Categories and Subject Descriptors
D.4.2 [Operating Systems]: Storage Management; D.4.5 [Operating Systems]: Reliability; D.4.2 [Operating Systems]: Performance;

General Terms
Algorithms, Management, Measurement, Performance, Design, Reliability.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SOSP 2007, Oct., 2007, Stevenson, WA, USA.
Copyright 2007 ACM XXX…$5.00.

1. INTRODUCTION
Amazon runs a world-wide e-commerce platform that serves tens of millions of customers at peak times using tens of thousands of servers located in many data centers around the world. There are strict operational requirements on Amazon’s platform in terms of performance, reliability and efficiency. Reliability is one of the most important requirements because even the slightest outage has significant financial consequences and impacts customer trust. In addition, to support continuous growth, the platform needs to be highly scalable.

One of the lessons our organization has learned from operating Amazon’s platform is that the reliability and scalability of a system is dependent on how its application state is managed. Amazon uses a highly decentralized, loosely coupled, service oriented architecture consisting of hundreds of services. In this environment there is a particular need for storage technologies that are always available. For example, customers should be able to view and add items to their shopping cart even if disks are failing, network routes are flapping, or data centers are being destroyed by tornados. Therefore, the service responsible for managing shopping carts requires that it can always write to and read from its data store, and that its data needs to be available across multiple data centers.

Dealing with failures in an infrastructure composed of millions of components is our standard mode of operation; there is always a small but significant number of server and network components that are failing at any given time. As such, Amazon’s software systems need to be constructed in a manner that treats failure handling as the normal case without impacting availability or performance.

To meet the reliability and scaling needs, Amazon has developed a number of storage technologies, of which the Amazon Simple Storage Service (also available outside of Amazon and known as Amazon S3) is probably the best known. This paper presents the design and implementation of Dynamo, another highly available and scalable distributed data store built for Amazon’s platform. Dynamo is used to manage the state of services that have very high reliability requirements and need tight control over the tradeoffs between availability, consistency, cost-effectiveness and performance. Amazon’s platform has a very diverse set of applications with different storage requirements. A select set of applications requires a storage technology that is flexible enough to let application designers configure their data store appropriately based on these tradeoffs to achieve high availability and guaranteed performance in the most cost effective manner.

There are many services on Amazon’s platform that only need primary-key access to a data store. For many services, such as those that provide best seller lists, shopping carts, customer preferences, session management, sales rank, and product catalog, the common pattern of using a relational database would lead to inefficiencies and limit scale and availability. Dynamo provides a simple primary-key only interface to meet the requirements of these applications.

Dynamo uses a synthesis of well known techniques to achieve scalability and availability: Data is partitioned and replicated using consistent hashing [10], and consistency is facilitated by object versioning [12]. The consistency among replicas during updates is maintained by a quorum-like technique and a
decentralized replica synchronization protocol. Dynamo employs a gossip based distributed failure detection and membership protocol. Dynamo is a completely decentralized system with minimal need for manual administration. Storage nodes can be added and removed from Dynamo without requiring any manual partitioning or redistribution.

In the past year, Dynamo has been the underlying storage technology for a number of the core services in Amazon’s e-commerce platform. It was able to scale to extreme peak loads efficiently without any downtime during the busy holiday shopping season. For example, the service that maintains the shopping cart (Shopping Cart Service) served tens of millions of requests that resulted in well over 3 million checkouts in a single day and the service that manages session state handled hundreds of thousands of concurrently active sessions.

The main contribution of this work for the research community is the evaluation of how different techniques can be combined to provide a single highly-available system. It demonstrates that an eventually-consistent storage system can be used in production with demanding applications. It also provides insight into the tuning of these techniques to meet the requirements of production systems with very strict performance demands.

The paper is structured as follows. Section 2 presents the background and Section 3 presents the related work. Section 4 presents the system design and Section 5 describes the implementation. Section 6 details the experiences and insights gained by running Dynamo in production and Section 7 concludes the paper. There are a number of places in this paper where additional information may have been appropriate but where protecting Amazon’s business interests requires us to reduce some level of detail. For this reason, the intra- and inter-datacenter latencies in section 6, the absolute request rates in section 6.2 and outage lengths and workloads in section 6.3 are provided through aggregate measures instead of absolute details.

2. BACKGROUND
Amazon’s e-commerce platform is composed of hundreds of services that work in concert to deliver functionality ranging from recommendations to order fulfillment to fraud detection. Each service is exposed through a well defined interface and is accessible over the network. These services are hosted in an infrastructure that consists of tens of thousands of servers located across many data centers world-wide. Some of these services are stateless (i.e., services which aggregate responses from other services) and some are stateful (i.e., a service that generates its response by executing business logic on its state stored in persistent store).

Traditionally production systems store their state in relational databases. For many of the more common usage patterns of state persistence, however, a relational database is a solution that is far from ideal. Most of these services only store and retrieve data by primary key and do not require the complex querying and management functionality offered by an RDBMS. This excess functionality requires expensive hardware and highly skilled personnel for its operation, making it a very inefficient solution. In addition, the available replication technologies are limited and typically choose consistency over availability. Although many advances have been made in recent years, it is still not easy to scale out databases or use smart partitioning schemes for load balancing.

This paper describes Dynamo, a highly available data storage technology that addresses the needs of these important classes of services. Dynamo has a simple key/value interface, is highly available with a clearly defined consistency window, is efficient in its resource usage, and has a simple scale out scheme to address growth in data set size or request rates. Each service that uses Dynamo runs its own Dynamo instances.

2.1 System Assumptions and Requirements
The storage system for this class of services has the following requirements:

Query Model: simple read and write operations to a data item that is uniquely identified by a key. State is stored as binary objects (i.e., blobs) identified by unique keys. No operations span multiple data items and there is no need for relational schema. This requirement is based on the observation that a significant portion of Amazon’s services can work with this simple query model and do not need any relational schema. Dynamo targets applications that need to store objects that are relatively small (usually less than 1 MB).

ACID Properties: ACID (Atomicity, Consistency, Isolation, Durability) is a set of properties that guarantee that database transactions are processed reliably. In the context of databases, a single logical operation on the data is called a transaction. Experience at Amazon has shown that data stores that provide ACID guarantees tend to have poor availability. This has been widely acknowledged by both the industry and academia [5]. Dynamo targets applications that operate with weaker consistency (the “C” in ACID) if this results in high availability. Dynamo does not provide any isolation guarantees and permits only single key updates.

Efficiency: The system needs to function on a commodity hardware infrastructure. In Amazon’s platform, services have stringent latency requirements which are in general measured at the 99.9th percentile of the distribution. Given that state access plays a crucial role in service operation, the storage system must be capable of meeting such stringent SLAs (see Section 2.2 below). Services must be able to configure Dynamo such that they consistently achieve their latency and throughput requirements. The tradeoffs are in performance, cost efficiency, availability, and durability guarantees.

Other Assumptions: Dynamo is used only by Amazon’s internal services. Its operation environment is assumed to be non-hostile and there are no security related requirements such as authentication and authorization. Moreover, since each service uses its distinct instance of Dynamo, its initial design targets a scale of up to hundreds of storage hosts. We will discuss the scalability limitations of Dynamo and possible scalability related extensions in later sections.

2.2 Service Level Agreements (SLA)
To guarantee that the application can deliver its functionality in a bounded time, each and every dependency in the platform needs to deliver its functionality with even tighter bounds. Clients and services engage in a Service Level Agreement (SLA), a formally negotiated contract where a client and a service agree on several system-related characteristics, which most prominently include the client’s expected request rate distribution for a particular API and the expected service latency under those conditions. An example of a simple SLA is a service guaranteeing that it will
provide a response within 300ms for 99.9% of its requests for a peak client load of 500 requests per second.

In Amazon’s decentralized service oriented infrastructure, SLAs play an important role. For example, a page request to one of the e-commerce sites typically requires the rendering engine to construct its response by sending requests to over 150 services. These services often have multiple dependencies, which frequently are other services, and as such it is not uncommon for the call graph of an application to have more than one level. To ensure that the page rendering engine can maintain a clear bound on page delivery, each service within the call chain must obey its performance contract.

Figure 1: Service-oriented architecture of Amazon’s platform

Figure 1 shows an abstract view of the architecture of Amazon’s platform, where dynamic web content is generated by page rendering components which in turn query many other services. A service can use different data stores to manage its state and these data stores are only accessible within its service boundaries. Some services act as aggregators by using several other services to produce a composite response. Typically, the aggregator services are stateless, although they use extensive caching.

A common approach in the industry for forming a performance oriented SLA is to describe it using average, median and expected variance. At Amazon we have found that these metrics are not good enough if the goal is to build a system where all customers have a good experience, rather than just the majority. For example, if extensive personalization techniques are used then customers with longer histories require more processing, which impacts performance at the high end of the distribution. An SLA stated in terms of mean or median response times will not address the performance of this important customer segment. To address this issue, at Amazon, SLAs are expressed and measured at the 99.9th percentile of the distribution. The choice for 99.9% over an even higher percentile has been made based on a cost-benefit analysis which demonstrated a significant increase in cost to improve performance that much. Experiences with Amazon’s production systems have shown that this approach provides a better overall experience compared to those systems that meet SLAs defined based on the mean or median.

In this paper there are many references to this 99.9th percentile of distributions, which reflects Amazon engineers’ relentless focus on performance from the perspective of the customers’ experience. Many papers report on averages, so these are included where it makes sense for comparison purposes. Nevertheless, Amazon’s engineering and optimization efforts are not focused on averages. Several techniques, such as the load balanced selection of write coordinators, are purely targeted at controlling performance at the 99.9th percentile.

Storage systems often play an important role in establishing a service’s SLA, especially if the business logic is relatively lightweight, as is the case for many Amazon services. State management then becomes the main component of a service’s SLA. One of the main design considerations for Dynamo is to give services control over their system properties, such as durability and consistency, and to let services make their own tradeoffs between functionality, performance and cost-effectiveness.

2.3 Design Considerations
Data replication algorithms used in commercial systems traditionally perform synchronous replica coordination in order to provide a strongly consistent data access interface. To achieve this level of consistency, these algorithms are forced to trade off the availability of the data under certain failure scenarios. For instance, rather than dealing with the uncertainty of the correctness of an answer, the data is made unavailable until it is absolutely certain that it is correct. From the very early replicated database works, it is well known that when dealing with the possibility of network failures, strong consistency and high data availability cannot be achieved simultaneously [2, 11]. As such, systems and applications need to be aware which properties can be achieved under which conditions.

For systems prone to server and network failures, availability can be increased by using optimistic replication techniques, where changes are allowed to propagate to replicas in the background, and concurrent, disconnected work is tolerated. The challenge with this approach is that it can lead to conflicting changes which must be detected and resolved. This process of conflict resolution introduces two problems: when to resolve them and who resolves them. Dynamo is designed to be an eventually consistent data store; that is, all updates reach all replicas eventually.

An important design consideration is to decide when to perform the process of resolving update conflicts, i.e., whether conflicts should be resolved during reads or writes. Many traditional data stores execute conflict resolution during writes and keep the read complexity simple [7]. In such systems, writes may be rejected if the data store cannot reach all (or a majority of) the replicas at a given time. On the other hand, Dynamo targets the design space of an “always writeable” data store (i.e., a data store that is highly available for writes). For a number of Amazon services, rejecting customer updates could result in a poor customer experience. For instance, the shopping cart service must allow customers to add and remove items from their shopping cart even amidst network and server failures. This requirement forces us to push the complexity of conflict resolution to the reads in order to ensure that writes are never rejected.
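
As a rough illustration of pushing conflict resolution to the read path, the following Python sketch keeps every write and reconciles only when a key is read. It is a single-process toy under stated assumptions, not Dynamo's implementation: the names AlwaysWriteableStore, put, get and union_merge are hypothetical, and replication, failures and object versioning are not modeled.

```python
from typing import Callable, Dict, List, Set

# Hypothetical names throughout; this is an illustration, not Dynamo's API.
Cart = Set[str]
Resolver = Callable[[List[Cart]], Cart]


class AlwaysWriteableStore:
    """Toy single-process store that never rejects a write."""

    def __init__(self) -> None:
        # Each key maps to every version written since the last reconciliation.
        self._versions: Dict[str, List[Cart]] = {}

    def put(self, key: str, value: Cart) -> None:
        # A write is only appended, so it cannot fail because of a conflict.
        self._versions.setdefault(key, []).append(set(value))

    def get(self, key: str, resolve: Resolver) -> Cart:
        # Conflict resolution happens here, on the read path.
        versions = self._versions.get(key, [])
        if not versions:
            return set()
        merged = resolve(versions)
        # Keep the reconciled value so later reads see a single version.
        self._versions[key] = [set(merged)]
        return merged


def union_merge(versions: List[Cart]) -> Cart:
    """Reader-side policy assumed for this sketch: keep every item seen."""
    merged: Cart = set()
    for version in versions:
        merged |= version
    return merged


store = AlwaysWriteableStore()
# Two writers update the same cart without seeing each other's change.
store.put("cart:alice", {"book"})
store.put("cart:alice", {"lamp"})
cart = store.get("cart:alice", union_merge)
print(sorted(cart))  # prints ['book', 'lamp']
```

Both writes succeed even though they conflict, and the reader sees one reconciled cart. A union policy like this one never loses an added item, but an item removed in one version can reappear if another version still contains it.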
The next design choice is who performs the process of conflict resolution. This can be done by the data store or the application. If conflict resolution is done by the data store, its choices are rather limited. In such cases, the data store can only use simple policies, such as “last write wins” [22], to resolve conflicting updates. On the other hand, since the application is aware of the data schema it can decide on the conflict resolution method that is best suited for its client’s experience. For instance, the application that maintains customer shopping carts can choose to “merge” the conflicting versions and return a single unified shopping cart. Despite this flexibility, some application developers may not want to write their own conflict resolution mechanisms and choose to push it down to the data store, which in turn chooses a simple policy such as “last write wins”.

Other key principles embraced in the design are:

Incremental scalability: Dynamo should be able to scale out one storage host (henceforth, referred to as “node”) at a time, with minimal impact on both operators of the system and the system itself.

Symmetry: Every node in Dynamo should have the same set of responsibilities as its peers; there should be no distinguished node or nodes that take special roles or an extra set of responsibilities. In our experience, symmetry simplifies the process of system provisioning and maintenance.

Decentralization: An extension of symmetry, the design should favor decentralized peer-to-peer techniques over centralized control. In the past, centralized control has resulted in outages and the goal is to avoid it as much as possible. This leads to a simpler, more scalable, and more available system.

Heterogeneity: The system needs to be able to exploit heterogeneity in the infrastructure it runs on. For example, the work distribution must be proportional to the capabilities of the individual servers. This is essential in adding new nodes with higher capacity without having to upgrade all hosts at once.

3. RELATED WORK

3.1 Peer to Peer Systems
There are several peer-to-peer (P2P) systems that have looked at the problem of data storage and distribution. The first generation of P2P systems, such as Freenet and Gnutella¹, were predominantly used as file sharing systems. These were examples of unstructured P2P networks where the overlay links between peers were established arbitrarily. In these networks, a search query is usually flooded through the network to find as many peers as possible that share the data. P2P systems evolved to the next generation into what is widely known as structured P2P networks. These networks employ a globally consistent protocol to ensure that any node can efficiently route a search query to some peer that has the desired data. Systems like Pastry [16] and Chord [20] use routing mechanisms to ensure that queries can be answered within a bounded number of hops. To reduce the additional latency introduced by multi-hop routing, some P2P systems (e.g., [14]) employ O(1) routing where each peer maintains enough routing information locally so that it can route requests (to access a data item) to the appropriate peer within a constant number of hops.

Various storage systems, such as Oceanstore [9] and PAST [17], were built on top of these routing overlays. Oceanstore provides a global, transactional, persistent storage service that supports serialized updates on widely replicated data. To allow for concurrent updates while avoiding many of the problems inherent with wide-area locking, it uses an update model based on conflict resolution. Conflict resolution was introduced in [21] to reduce the number of transaction aborts. Oceanstore resolves conflicts by processing a series of updates, choosing a total order among them, and then applying them atomically in that order. It is built for an environment where the data is replicated on an untrusted infrastructure. By comparison, PAST provides a simple abstraction layer on top of Pastry for persistent and immutable objects. It assumes that the application can build the necessary storage semantics (such as mutable files) on top of it.

¹ http://freenetproject.org/, http://www.gnutella.org

3.2 Distributed File Systems and Databases
Distributing data for performance, availability and durability has been widely studied in the file system and database systems community. Compared to P2P storage systems that only support flat namespaces, distributed file systems typically support hierarchical namespaces. Systems like Ficus [15] and Coda [19] replicate files for high availability at the expense of consistency. Update conflicts are typically managed using specialized conflict resolution procedures. The Farsite system [1] is a distributed file system that does not use any centralized server like NFS. Farsite achieves high availability and scalability using replication. The Google File System [6] is another distributed file system built for hosting the state of Google’s internal applications. GFS uses a simple design with a single master server for hosting the entire metadata and where the data is split into chunks and stored in chunkservers. Bayou is a distributed relational database system that allows disconnected operations and provides eventual data consistency [21].

Among these systems, Bayou, Coda and Ficus allow disconnected operations and are resilient to issues such as network partitions and outages. These systems differ on their conflict resolution procedures. For instance, Coda and Ficus perform system level conflict resolution and Bayou allows application level resolution. All of them, however, guarantee eventual consistency. Similar to these systems, Dynamo allows read and write operations to continue even during network partitions and resolves update conflicts using different conflict resolution mechanisms. Distributed block storage systems like FAB [18] split large size objects into smaller blocks and store each block in a highly available manner. In comparison to these systems, a key-value store is more suitable in this case because: (a) it is intended to store relatively small objects (size < 1 MB) and (b) key-value stores are easier to configure on a per-application basis. Antiquity is a wide-area distributed storage system designed to handle multiple server failures [23]. It uses a secure log to preserve data integrity, replicates each log on multiple servers for durability, and uses Byzantine fault tolerance protocols to ensure data consistency. In contrast to Antiquity, Dynamo does not focus on the problem of data integrity and security and is built for a trusted environment. Bigtable is a distributed storage system for managing structured data. It maintains a sparse, multi-dimensional sorted map and allows applications to access their data using multiple attributes [2]. Compared to Bigtable, Dynamo targets applications that require only key/value access with primary focus on high availability where updates are not rejected even in the wake of network partitions or server failures.
