
HTTP Session Replication for the Tomcat Web Server

(May 2012)

Chris Simoes, Alex Bednarczyk; Sponsoring Professor: Vijay Garg
Software Engineering Master's Program, The University of Texas at Austin

Session replication is an important problem facing modern web servers. Amazon's S3 data store provides an excellent mechanism for web servers to store their sessions externally, allowing easy session migration from one web server to another. We have built an implementation of Apache Tomcat that uses Amazon S3 to back up all of its sessions. We discuss our design and discoveries, and then investigate the performance overhead incurred by using Amazon S3 as an external data store instead of Apache Tomcat's default session replication techniques.

I. INTRODUCTION

HTTP, the Hypertext Transfer Protocol, is the foundation of the World Wide Web. HTTP functions as a request-response protocol that allows web servers to connect with clients such as web browsers. The web servers return via HTTP the HTML files that make up the web pages of the Internet. HTTP is a stateless protocol, meaning that every request it handles is independent of all other requests made to the same web server. This allows the server to avoid retaining state information about all of the requests made to it. In principle this simplifies server design, because there is no need to dynamically allocate storage to deal with multiple requests in process. Also, if a client dies in mid-transaction, no clean up should be necessary. However, this has a big downside in practical applications where we want to know whether a user is returning to our website.

To track users over HTTP, various methods of session management have been created. The most common uses a cookie that is stored in the client browser to identify the client with each request. This also adds overhead to the server, which must track which cookie belongs to which user and allocate memory to track information about each user. The server passes the client a cookie with a "session ID" that is then also stored in the web server's internal memory. This works well for development environments and for small servers that are not required to run 24 hours a day, 7 days a week. For the modern, robust web applications that power sites such as Amazon's store, a single web server is not sufficient; to power a large site, tens if not hundreds of web servers are needed. If all of these web servers store their sessions in internal memory, this presents a new problem. What happens when one of these servers crashes? What happens when a server needs to be taken down for service? A simple answer has been to stop taking on any new sessions and to allow existing sessions to log out or time out before shutting down the server. Unfortunately, for modern busy websites this can take hours or even days to complete. What we would really like is to be able to immediately direct web traffic from one web server to another without any disruption to the user's experience, and without having to wait hours or even days.

In order to accomplish this we have to allow the sessions on our web servers to "migrate" from one server to the next. We have to stop storing the session information only in the internal memory of a single server. We have studied two approaches to this problem, and this paper discusses them both. Our two researched solutions are to store our sessions:

- in a central data store that is highly reliable and fault tolerant
- distributed on our other web servers

Below we discuss our research and findings by comparing and contrasting these two approaches. We also present our performance measurements of each approach, followed by our next steps in our research.

II. ENVIRONMENT

The Apache Tomcat project is an open source web server that is used to power some of the largest websites on the World Wide Web. We chose to do our research using the Apache Tomcat web server for several reasons. First, it is an open source project written in Java, which allows us to easily make modifications and investigate behaviors. Second, it has a large community around it that provides data and support for our research; during our initial investigation it was very easy to find others who had the same goal in mind and who were open and interested in sharing their work on session replication with Tomcat. Third, Tomcat already has a built-in "high availability" (HA) mode that we could compare with our approach of using a centralized data store.

Our Tomcat instances were all installed on Ubuntu Linux servers running in the Amazon EC2 (Elastic Compute Cloud) environment. The Amazon cloud allowed us to easily create new servers to simulate a cluster of computers and to copy server configurations from one machine to the next. Amazon's EC2 cloud was also preferable over other cloud providers since we intended to use Amazon's Dynamo distributed hash table for our centralized data store tests.
III. DYNAMO

Amazon has outlined its proprietary implementation of a highly available key-value store named "Dynamo" [1]. Amazon needed a massively scaled key-value data store that provided high reliability and performance to run its huge ecommerce store. Dynamo is designed to provide an easy-to-use interface for the programmer with a guaranteed level of service. The actual implementation of Dynamo is hidden from the developer, and it is built on a distributed network of servers spread across the country that presents an "always-on" appearance to the user.

In January 2012, Amazon announced the beta release of its DynamoDB web service, so we based our initial research on DynamoDB. Unfortunately, it quickly became apparent to us that there were significant drawbacks to using DynamoDB as our central data store. DynamoDB is marketed as a NoSQL database service, but in reality it only stores information as strings. This is undesirable since our session information is most naturally represented as an array of bytes. A further limitation is that the value of any given column is limited to 64,000 bytes. This is also undesirable since our sessions can be an arbitrarily sized array of bytes that could easily be larger than 64,000 bytes. Upon learning of these limitations we abandoned DynamoDB as our key-value store.

We discovered that the appropriate web service to use is Amazon's S3 (Simple Storage Service). While Amazon S3's documentation does not specifically state that it is built on Amazon's Dynamo technology, it does state that:

"Amazon S3 provides a simple web services interface that can be used to store and retrieve any amount of data, at any time, from anywhere on the web. It gives any developer access to the same highly scalable, reliable, secure, fast, inexpensive infrastructure that Amazon uses to run its own global network of web sites." [2]

Upon further inspection it became clear to us that Amazon's S3 service was the correct technology for us to build on. It was released in March of 2006, and it allows writing, reading, and deleting of key-value pairs where the value can be from 1 byte to 5 terabytes in size. It allows an unlimited number of objects to be stored, and it provides a 99.9% monthly uptime guarantee. As of March 2012, Amazon S3 stores over 905 billion objects [3]. For these reasons we chose Amazon S3 as our centralized data store.
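To make this key-value interface concrete, the listing below is a minimal sketch (not code from our implementation, and not the S3Client class discussed later) of writing, reading, and deleting a byte array in S3 using the AWS SDK for Java; the credentials and bucket name are placeholders.

    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3Client;
    import com.amazonaws.services.s3.model.ObjectMetadata;
    import com.amazonaws.services.s3.model.S3Object;
    import java.io.ByteArrayInputStream;

    public class S3KeyValueSketch {
        public static void main(String[] args) throws Exception {
            // Placeholder credentials and bucket name.
            AmazonS3 s3 = new AmazonS3Client(
                    new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"));
            String bucket = "example-session-bucket";

            // Write: store an arbitrary byte array under a key.
            byte[] value = "serialized session bytes".getBytes("UTF-8");
            ObjectMetadata meta = new ObjectMetadata();
            meta.setContentLength(value.length);
            s3.putObject(bucket, "session-1234", new ByteArrayInputStream(value), meta);

            // Read: fetch the value back by key.
            S3Object stored = s3.getObject(bucket, "session-1234");
            System.out.println("stored bytes: " + stored.getObjectMetadata().getContentLength());
            stored.getObjectContent().close();

            // Delete: remove the key-value pair.
            s3.deleteObject(bucket, "session-1234");
        }
    }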

IV. CENTRAL DATA STORE THAT IS HIGHLY RELIABLE AND FAULT TOLERANT

Our main body of research was to see whether we could externalize Apache Tomcat's session management to a centralized data store. Before we began writing any code, we first researched whether anyone else had already tried this approach.

A. memcached

We discovered that no one had used Amazon S3 to externalize Apache Tomcat's session management; however, we did find an interesting project called "memcached-session-manager" [4] that externalized session management. It used memcached [5], an open source, high performance, distributed memory object caching system that provides an in-memory key-value store for small chunks of arbitrary data. While similar to our needs, memcached has shortcomings in comparison to Amazon S3's service. Memcached expects clients to understand which server to send data to and which servers to fetch data from, so in this sense memcached is not a centralized data store. Also, memcached is built to use physical memory, and it is not ideal for persisting data on machines that may need to restart, which puts the durability of our data in question.

The memcached-session-manager project did provide us with an excellent starting point for our research. The project was first released in October 2009 by Martin Grotzke, and it has subsequently had numerous updates and improvements. It supports Apache Tomcat 6 and 7, and it handles many special cases such as sticky sessions and server failover. We investigated the code thoroughly and decided to follow its design for integrating with Apache Tomcat.
B. Implementation

Our implementation is straightforward in theory: we would refactor the memcached-session-manager project to use Amazon S3 as a data store instead of memcached. In practice, we learned a lot about the inner workings of Apache Tomcat and session management to complete this work.

Before we could begin, we first had to download and investigate the source code for Tomcat. We studied how Tomcat loads, tracks, and stores sessions, and we investigated different approaches for integrating with it. Tomcat has a "ManagerBase" class [6] that we extended to interface with Tomcat's session management; this class controls session persistence and storage at a high level. We then implemented a class called DynamoSessionService that is responsible for actually finding and storing our sessions. It also provides the methods for serializing and deserializing our session objects. For serialization we chose Java's default serialization API, which requires that every object placed into our web server's session implement the "java.io.Serializable" interface. Other serialization libraries exist that offer faster performance.
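As a simplified illustration of this serialization requirement (hypothetical class and method names, not our DynamoSessionService), the sketch below flattens a map of session attributes to the byte array form that could be pushed to an external store, and restores it again, using Java's default serialization API.

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.ObjectInputStream;
    import java.io.ObjectOutputStream;
    import java.util.HashMap;

    public class SessionSerializationSketch {
        // Serialize a map of session attributes to bytes; every value must
        // implement java.io.Serializable or writeObject() will fail.
        static byte[] toBytes(HashMap<String, Object> attributes) throws Exception {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            ObjectOutputStream out = new ObjectOutputStream(buffer);
            out.writeObject(attributes);
            out.close();
            return buffer.toByteArray();
        }

        // Restore the attribute map from the stored bytes.
        @SuppressWarnings("unchecked")
        static HashMap<String, Object> fromBytes(byte[] data) throws Exception {
            ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data));
            return (HashMap<String, Object>) in.readObject();
        }

        public static void main(String[] args) throws Exception {
            HashMap<String, Object> attributes = new HashMap<String, Object>();
            attributes.put("username", "alice");
            attributes.put("lastRequestTime", System.currentTimeMillis());
            byte[] data = toBytes(attributes);                    // what we would push to S3
            System.out.println(fromBytes(data).get("username"));  // prints "alice"
        }
    }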
We also extended Tomcat's "StandardSession" class with our own version called "DynamoBackupSession" that tracks changes to the session, so we know whether it is dirty relative to our in-memory cache. This wrapper class allows us to track all of the extra attributes we need in order to implement our externalized data store.
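A heavily simplified sketch of this dirty-tracking idea is shown below; it assumes Tomcat 7's org.apache.catalina.session.StandardSession and uses hypothetical names, so it is not our actual DynamoBackupSession class.

    import org.apache.catalina.Manager;
    import org.apache.catalina.session.StandardSession;

    // Marks itself dirty whenever an attribute is added or removed, so the
    // manager can skip the external backup when nothing has changed.
    public class DirtyTrackingSession extends StandardSession {

        private transient boolean dirty = false;

        public DirtyTrackingSession(Manager manager) {
            super(manager);
        }

        @Override
        public void setAttribute(String name, Object value) {
            super.setAttribute(name, value);
            dirty = true;
        }

        @Override
        public void removeAttribute(String name) {
            super.removeAttribute(name);
            dirty = true;
        }

        public boolean isDirty() {
            return dirty;
        }

        public void clearDirty() {
            dirty = false;
        }
    }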
Apache Tomcat uses "Valves" to represent components that are inserted into the processing pipeline of a web request. We implemented our own valve, called "SessionTrackerValve", that monitors whenever a session is modified in internal memory. The design we copied from memcached-session-manager only persists a session to Amazon S3 if the session has changed; if the session is accessed but not changed, we continue to use the valid copy in our internal memory cache. This optimization is critical to minimizing the number of external calls our web server makes to the external store.
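A minimal sketch of such a valve is shown below, assuming Tomcat 7's ValveBase and the hypothetical DirtyTrackingSession from the previous listing; it is not our SessionTrackerValve source, and the backup hook is only indicated by a comment.

    import java.io.IOException;
    import javax.servlet.ServletException;
    import org.apache.catalina.Session;
    import org.apache.catalina.connector.Request;
    import org.apache.catalina.connector.Response;
    import org.apache.catalina.valves.ValveBase;

    // Runs after the rest of the pipeline and asks for a backup only when the
    // current request actually modified the session.
    public class SessionBackupValve extends ValveBase {

        @Override
        public void invoke(Request request, Response response)
                throws IOException, ServletException {
            getNext().invoke(request, response);   // let the application run first

            Session session = request.getSessionInternal(false);
            if (session instanceof DirtyTrackingSession
                    && ((DirtyTrackingSession) session).isDirty()) {
                // Hypothetical hook: hand the session to the manager/service
                // that serializes it and pushes it to the external store.
                System.out.println("session " + session.getId() + " needs backup");
                ((DirtyTrackingSession) session).clearDirty();
            }
        }
    }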
In our first pass of refactoring the memcached-session-manager code, we changed all references to the memcached client to instead write and read sessions from the local disk. This allowed us to quickly investigate and debug how sessions get loaded and invalidated from memory. During this phase we learned that all backing up of session information happens asynchronously through a task service, so we wrote our own "BackupSessionTask" to handle the storing of sessions to disk. Once we had file system backups working correctly, we built a stand-alone Amazon S3 client to store data in S3. This is the S3Client class in the org.simoes.session.s3 package. The S3Client is responsible for authenticating our program with Amazon Web Services, and for providing a simple interface for putting and getting key-value pairs in our external data store on S3. Upon completing this component we integrated the S3Client with the core Tomcat codebase and successfully ran Tomcat while it replicated its sessions externally to Amazon S3.
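The sketch below illustrates the asynchronous shape of this design with hypothetical names (the real BackupSessionTask and S3Client differ): a small executor accepts backup jobs so the request-handling thread never waits on the network.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Queues session backups so the request-handling thread returns immediately.
    public class AsyncSessionBackupSketch {

        private final ExecutorService backupPool = Executors.newFixedThreadPool(2);

        // Called by the valve/manager after a request that dirtied the session.
        public void scheduleBackup(final String sessionId, final byte[] serializedSession) {
            backupPool.submit(new Runnable() {
                @Override
                public void run() {
                    // In the real implementation this would call the S3 client,
                    // e.g. put(bucket, sessionId, serializedSession).
                    System.out.println("backing up " + sessionId
                            + " (" + serializedSession.length + " bytes)");
                }
            });
        }

        public void shutdown() {
            backupPool.shutdown();
        }
    }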

We next investigated the performance characteristics of our implementation. Obviously, the big advantage of external session storage is the ability to easily change the web server a client is connecting to, with no downtime. The big disadvantage is the latency potentially introduced by needing to make serialization calls over the network to load and store session information. We wanted to study this potential limitation to see how much latency we would need to trade for portability. In order to perform a fair performance assessment we also wanted to establish a baseline, so we chose to also research the "high availability" feature built into Tomcat that allows the web server to replicate its sessions to other Tomcat web servers on the network.

V. DISTRIBUTED ON OUR OTHER WEBSERVERS

Tomcat comes bundled with the ability to replicate sessions to other Tomcat web servers. The class Tomcat uses to perform this replication is the "SimpleTcpCluster" class [7]. It is a cluster implementation using a simple multicast protocol, and it is responsible for setting up a cluster and for sending and receiving messages to and from the other servers. The SimpleTcpCluster configuration enables all-to-all session replication: it tracks when a session changes and then sends the modified session to all other servers. This is a commonly used configuration, and we hoped that our implementation would perform close to this reference implementation, but with the added benefit of a centralized store for the session information.

One downside of the SimpleTcpCluster approach is that it complicates your network architecture. While it works fine for 2-4 servers, as you expand to tens or even hundreds of servers the network overhead grows linearly with the number of servers. This wastes a lot of network bandwidth and introduces many unnecessary messages. A better approach recommended by Apache Tomcat is to group web servers into clusters behind a load balancer.

[Figure: Tomcat recommended configuration]

Like our Dynamo implementation, Tomcat's SimpleTcpCluster also assumes that all of the objects added to your web server's session implement the java.io.Serializable interface.


VI. PERFORMANCE ANALYSIS

A. Sample Programs

In order to test the performance of our Amazon S3 backed version of Apache Tomcat, we needed a sample servlet program that would store values in the session. We created a SampleLogin program that allows a user to log in to a website. It stores the user name and password in the session, along with the current time for each request of the servlet. We added the time attribute so that the session's contents change with every page reload, thus triggering the session to be replicated externally. We used our SampleLogin program to test both the Amazon S3 backed version and the default Apache Tomcat SimpleTcpCluster version.
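A minimal servlet in the spirit of SampleLogin might look like the sketch below; this is not the actual test program, and the parameter and attribute names are illustrative.

    import java.io.IOException;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import javax.servlet.http.HttpSession;

    public class SampleLoginSketch extends HttpServlet {

        @Override
        protected void doGet(HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException {
            HttpSession session = request.getSession(true);   // create on first visit

            // Store the login details and a timestamp that changes on every
            // request, so the session is dirtied (and re-replicated) each time.
            if (request.getParameter("username") != null) {
                session.setAttribute("username", request.getParameter("username"));
                session.setAttribute("password", request.getParameter("password"));
            }
            session.setAttribute("lastRequestTime", System.currentTimeMillis());

            response.setContentType("text/html");
            response.getWriter().println("<p>Logged in as "
                    + session.getAttribute("username")
                    + "</p><a href=\"index.html\">Click here to stay logged in</a>");
        }
    }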
B. Amazon Cloud

To test our Amazon S3 backed Tomcat implementation we launched two modified Tomcat web servers on the same machine, one listening on port 8080 and one on port 8081:

- http://ec2-23-22-79-203.compute-1.amazonaws.com:8080/SampleLogin/index.html
- http://ec2-23-22-79-203.compute-1.amazonaws.com:8081/SampleLogin/index.html

Both of these instances of Apache Tomcat would access Amazon S3 to load and store their sessions. In our development environment we saw a noticeable lag the first time an S3Client was initialized, due to the time it takes to set up connections and verify credentials. In order to minimize this lag, Amazon recommends that programmers reuse the S3 client instead of instantiating a new one each time, and we followed this recommendation to improve performance (see the sketch below). We also chose to locate our test servers on the Amazon EC2 network for performance reasons. The lag in upload speeds from a home computer using a cable modem is noticeable when you are measuring in the hundreds of milliseconds, while the Amazon EC2 cloud provides impressive network response times and throughput, particularly for calls between Amazon services (in our case between Amazon EC2 and Amazon S3).
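Following that recommendation amounts to holding one client for the lifetime of the web application, roughly as in the hypothetical holder class below (credentials are placeholders).

    import com.amazonaws.auth.BasicAWSCredentials;
    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3Client;

    // One S3 client shared by all requests: the expensive connection setup and
    // credential verification happen once instead of on every session backup.
    public final class SharedS3ClientHolder {

        private static final AmazonS3 CLIENT = new AmazonS3Client(
                new BasicAWSCredentials("ACCESS_KEY", "SECRET_KEY"));

        private SharedS3ClientHolder() {
        }

        public static AmazonS3 get() {
            return CLIENT;
        }
    }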
For our SimpleTcpCluster configuration we launched two servers in Amazon EC2 with exactly the same configuration. We then modified their configuration files so that they would broadcast session changes to each other. We again expected that these two servers would benefit from being collocated on the Amazon cloud infrastructure. Since these were two separate boxes, we struggled to show that the sessions were replicating properly between the two servers. By observing the log files of the two servers it was clear that many network calls were occurring between them, but modern browsers discouraged us from trying to hack the session id. We ultimately decided that, since this was our baseline, and also because of the extensive use of caching by the default Tomcat implementation, spending time on a way to hack the session id was not a top priority. We instead focused on the tests of our Apache Tomcat backed by S3 implementation, and wrote tests to ensure that the default Tomcat implementation did have to store and replicate many session changes.

C. JMeter

To automate our testing we chose the open source project JMeter [8]. JMeter is designed to load test functional behavior and measure performance, and it is used by the Apache family of projects for load testing Tomcat and the Apache web server. JMeter, however, is not a web browser, and it is not well suited to testing web pages that contain Javascript or require a lot of client-side processing. We therefore kept our SampleLogin test program free of these dependencies so we could focus on testing the performance of servers storing their sessions externally.

For our tests of the Apache Tomcat web server backed by Amazon S3, we configured JMeter's cookie manager to enable session tracking. We then pointed JMeter at our server listening on port 8080. JMeter would contact that server and pass it a username and password to log in. When Tomcat processes this request it creates a new session that it then stores in Amazon S3; for discussion's sake, say this session has a session id of "1234". JMeter would then follow the "Click here to stay logged in" link, which updates the time attribute in session 1234 and again triggers Apache Tomcat to back up the session to Amazon S3. JMeter then goes to the same web server, but this time on port 8081 instead of port 8080. Apache Tomcat sees this as a request from the same client, so the request carries the same session id of 1234. However, the web server on port 8081 is a different web server running in a different JVM from the one on port 8080. Our 8081 instance of Apache Tomcat looks up session id 1234 in its local memory and, predictably, does not find it there. It then makes a remote call to Amazon S3 to see whether session id 1234 can be found in our external session store. Session id 1234 is found, so the 8081 Apache Tomcat web server loads the session into local memory, and the user continues to access the website uninterrupted even though they are now being served by a completely different web server.
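Outside of JMeter, the same failover flow can be exercised by hand with a few lines of Java. The sketch below is illustrative only (cookie handling is simplified and the hostname matches our test URLs): it logs in on port 8080 and then replays the JSESSIONID cookie against port 8081.

    import java.io.InputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class FailoverFlowSketch {
        public static void main(String[] args) throws Exception {
            String host = "http://ec2-23-22-79-203.compute-1.amazonaws.com";

            // 1. Log in against the first Tomcat instance (port 8080).
            HttpURLConnection first = (HttpURLConnection) new URL(
                    host + ":8080/SampleLogin/index.html?username=alice&password=secret")
                    .openConnection();
            String cookie = first.getHeaderField("Set-Cookie");   // e.g. JSESSIONID=1234...
            drain(first.getInputStream());

            // 2. Replay the same session cookie against the second instance (port 8081).
            //    That server misses in local memory and loads the session from S3.
            HttpURLConnection second = (HttpURLConnection) new URL(
                    host + ":8081/SampleLogin/index.html").openConnection();
            if (cookie != null) {
                second.setRequestProperty("Cookie", cookie.split(";")[0]);
            }
            System.out.println("second server responded: " + second.getResponseCode());
            drain(second.getInputStream());
        }

        private static void drain(InputStream in) throws Exception {
            while (in.read() != -1) { /* discard */ }
            in.close();
        }
    }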
This test is performed over 1000 times, and JMeter then plots the performance. We were encouraged to find that the average response time was 77 ms. As we dug into this finding, we realized it is due to our implementation caching the changed session in internal memory and then asynchronously queueing a task to back up the session to Amazon S3. In the plot below, the blue line represents the load time of each page while the green line represents the throughput we are achieving. The overall throughput of our Tomcat version was 761 requests per minute.

[Figure: JMeter results for the S3-backed Tomcat: page load time (blue) and throughput (green); average response time 77 ms, overall throughput 761 requests per minute.]

We also performed the same type of test against our default SimpleTcpCluster Tomcat setup. This test showed an average response time of 58 ms. Again the blue line represents the load time of each page while the green line represents the throughput we are achieving. The overall throughput of the default Tomcat version was 994 requests per minute.

[Figure: JMeter results for the default SimpleTcpCluster Tomcat: page load time (blue) and throughput (green); average response time 58 ms, overall throughput 994 requests per minute.]

This was not overly surprising given that we only had two servers running; we would expect this performance to degrade as more servers are added to the cluster. We also expected the default implementation to perform well given that its code has been improving for the past five years. We expect that we could improve our implementation's performance, since we have so far spent no time on code optimization. Also, given that we are measuring in milliseconds, while the difference of 19 ms is statistically meaningful, in the real world it was not particularly concerning.

VII. FUTURE RESEARCH

We have several ideas on where our research should proceed next. While our preliminary analysis was encouraging, to simulate more real-world conditions we will need to test our implementation with larger objects stored in the session. Our tests stored only a few bytes, while a production web server would probably store sessions on the order of 100 kilobytes to several megabytes. We would also like to test our solution on 4 and then 8 servers running concurrently. We suspect that our implementation will scale better, as it is only bottlenecked by the ability of Amazon S3 to scale, and Amazon claims that this scaling problem is effectively solved for S3.

We would also be interested to see how much adding 4, 8, or even more servers to Apache Tomcat's default configuration slows the servers down. Testing on servers outside of Amazon EC2 would also be a useful data point. Finally, we would like to improve our implementation: while we did implement the ability to put and get sessions from the Amazon S3 external store, we did not implement the ability to delete or expire old session values stored there.

VIII. CONCLUSION

We were very encouraged by the progress we were able to make on creating a web server that persists its sessions to an external data store. Amazon's S3 key-value store provides a reliable, scalable, distributed central store that proves to have very fast response times in the Amazon cloud. Our implementation was able to service web requests in less than 100 ms, which leads us to believe that this is a viable implementation for modern websites to build upon.

REFERENCES

[1] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels, "Dynamo: Amazon's Highly Available Key-value Store," Proceedings of the 21st ACM Symposium on Operating Systems Principles (SOSP 2007), 2007.
[2] Amazon Simple Storage Service (S3), http://aws.amazon.com/s3/
[3] Amazon Web Services Blog, http://aws.typepad.com/aws/2012/04/amazon-s3-905-billion-objects-and-650000-requestssecond.html
[4] memcached-session-manager project on Google Code, http://code.google.com/p/memcached-session-manager/
[5] memcached home page, http://memcached.org/
[6] Tomcat's ManagerBase class, http://tomcat.apache.org/tomcat-7.0-doc/api/index.html?org/apache/catalina/session/ManagerBase.html
[7] Tomcat clustering documentation, http://tomcat.apache.org/tomcat-7.0-doc/cluster-howto.html
[8] Apache JMeter project, http://jmeter.apache.org/
