Level: Introductory
Brian Goetz (brian@quiotix.com), Principal Consultant, Quiotix Corp
29 Jul 2004
Most nontrivial Web applications require maintaining some
sort of per-session state, such as the contents of a user's shopping
cart. How state will be managed and replicated in a clustered server
application has a significant impact on the scalability of the
application. Many J2SE and J2EE applications store state in the HttpSession provided by the Servlet API. This month, columnist Brian Goetz examines some of the options for state replication and how to most
effectively use HttpSession to provide good
scalability and performance.
Whether you are building J2EE or J2SE server applications, chances
are that you are using Java Servlets in one form or another -- either
directly, through a presentation layer such as JSP technology, Velocity, or
WebMacro, or through a servlet-based Web services implementation such
as Axis or Glue. One of the most important functions provided by the
Servlet API is session management -- authentication, expiration, and
maintenance of per-user session state through the HttpSession interface.
Session state
Nearly every Web application has some session state, which might be
as simple as remembering whether you are logged in, or might be a more
detailed history of your session, such as the contents of your
shopping cart, cached results of previous queries, or the complete
response history for a 20-page dynamic questionnaire. Because the HTTP
protocol is itself stateless, session state needs to be stored
somewhere and associated with your browsing session in a way that can
be easily retrieved the next time you request a page from the same Web
application. Fortunately, J2EE provides several means of managing
session state -- state could be stored in the data tier, in the Web
tier using the HttpSession interface from
the Servlet API, in the Enterprise JavaBeans (EJB) tier using stateful session beans, or even
in the client tier using cookies or hidden form fields. Unfortunately,
injudicious management of session state can cause serious performance
problems.
If your application is suited to storing per-user state in the
HttpSession, this option is often better
than the alternatives. Storing session state in the client using HTTP
cookies or hidden form fields has significant security risks -- it
exposes a part of your application internals to the untrusted client
layer. (One early e-commerce site stored the shopping cart contents,
including price, in hidden form fields, enabling a relatively simple
exploit that allowed any HTML- and HTTP-savvy user to buy any item for
$0.01. Oops.) Besides, using cookies or hidden form fields is messy,
error-prone, and brittle (and a cookie-based approach won't work at
all if the user has disabled the use of cookies in the browser).
The other alternatives for storing server-side state in J2EE
applications are to use stateful session beans or to store conversational
state in the database. While stateful session beans allow for greater
flexibility in session state management, there are still advantages to
storing the session state in the Web tier where practical. If the
business objects are stateless, then the application can often be scaled by
simply adding more Web servers, rather than more Web servers and more
EJB containers, which is generally less expensive and easier to do.
Another advantage of using the HttpSession to store
conversational state is that the Servlet API offers an easy way to be
notified when a session expires. Storing conversational state in the
database can be prohibitively expensive.
The servlet specification does not mandate that a servlet container
perform any type of session replication or persistence, but it does
suggest that state replication is considered an important part of the
raison d'etre for servlets in the first place, and it imposes
some requirements for containers that choose to do session
replication. Session replication enables a host of benefits -- load
balancing, scalability, fault tolerance, and high availability.
Accordingly, most servlet containers support some form of
HttpSession replication, but the mechanism,
configuration, and timing of replication are
implementation-dependent.
The HttpSession API
Briefly, the HttpSession interface
supports several methods that a servlet, JSP page, or other
presentation-layer component can use to maintain session information
across multiple HTTP requests. The session is tied to a specific user,
but shared across all servlets in a Web application -- it is not
specific to a single servlet. A useful way to think about the session
is that it is like a Map that stores
objects for the duration of a session -- you can store session
attributes by name using setAttribute and
retrieve them using getAttribute. The HttpSession interface also contains session
lifecycle methods, such as invalidate()
(which notifies the container that the session should be discarded).
Listing 1 shows the most commonly used elements of the HttpSession interface:
Listing 1. HttpSession API
public interface HttpSession {
    Object getAttribute(String s);
    Enumeration getAttributeNames();
    void setAttribute(String s, Object o);
    void removeAttribute(String s);
    boolean isNew();
    void invalidate();
    void setMaxInactiveInterval(int i);
    int getMaxInactiveInterval();
    ...
}
Theoretically, it is possible to completely replicate session state
coherently across a cluster, so that any node in the cluster can
service any request, and a dumb load balancer can simply route the
request in a round-robin fashion, avoiding hosts that have failed.
However, such tight replication has a considerable performance cost
and implementation complexity, and may also have scalability problems
as the cluster approaches a certain size.
A more common approach is to combine load balancing with session
affinity -- the load balancer is able to associate connections with
sessions and route subsequent requests within a session to the same
server. This feature is supported by numerous hardware and software
load balancers and means that replicated session information is only
accessed when the primary connection host fails and the session needs
to be failed over to another server.
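The routing decision behind session affinity can be sketched in a few lines. This is only an illustration under stated assumptions: StickyRouter and its hash-based choice of server are hypothetical, and real load balancers typically rely on a routing cookie or their own session table rather than a hash of the session ID.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sticky router; not the algorithm of any particular load balancer. */
final class StickyRouter {
    private final List<String> servers;

    StickyRouter(List<String> servers) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("need at least one server");
        }
        this.servers = new ArrayList<>(servers);
    }

    /**
     * Picks the server for a session. The same session ID always maps to the
     * same preferred server; if that server has failed, the next live server
     * in the list is used, which is the moment a replica is actually needed.
     */
    String route(String sessionId, Set<String> failedServers) {
        int preferred = Math.floorMod(sessionId.hashCode(), servers.size());
        for (int i = 0; i < servers.size(); i++) {
            String candidate = servers.get((preferred + i) % servers.size());
            if (!failedServers.contains(candidate)) {
                return candidate;
            }
        }
        throw new IllegalStateException("no live servers available");
    }
}
```

As long as the preferred server is alive, every request in the session lands on it and the replicated copy is never read; only the failover branch depends on replication having happened.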
Replication approaches
Replication offers a number of potential benefits, including
availability, fault-tolerance, and scalability. In
addition, there are numerous methods available for session replication;
the choice of method will depend on the size of the application
cluster, the goal of replication, and the replication facilities
supported by your servlet container. Replication has performance
costs, including CPU cycles (to serialize objects stored in the
session), network bandwidth (to propagate updates), and, for disk-based
schemes, the cost of writing to the disk or database.
Nearly all servlet containers perform HttpSession replication by serializing objects
stored in the HttpSession, so if you wish
to create a distributable application, you should make sure to
place only serializable objects in the session. (Some containers have
special handling for entities like EJB references, transaction
contexts, and other nonserializable J2EE object types as well.)
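One way to catch nonserializable session attributes before they reach a cluster is to round-trip them through Java serialization in a unit test. The sketch below assumes a hypothetical helper class, SerializableCheck, and a hypothetical CartItem type; neither is part of the Servlet API.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical test helper for checking candidate session attributes. */
final class SerializableCheck {

    /** A cart item that can be replicated because it implements Serializable. */
    static final class CartItem implements Serializable {
        private static final long serialVersionUID = 1L;
        final String sku;
        final int quantity;

        CartItem(String sku, int quantity) {
            this.sku = sku;
            this.quantity = quantity;
        }
    }

    /** Returns true if the whole object graph rooted at value can be serialized. */
    static boolean isReplicable(Object value) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return true;
        } catch (NotSerializableException e) {
            // Some object reachable from value does not implement Serializable.
            return false;
        } catch (IOException e) {
            throw new IllegalStateException("unexpected I/O failure on an in-memory stream", e);
        }
    }
}
```

Note that the check walks the entire graph: a serializable collection that holds a single nonserializable element fails just as the element itself would.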
JDBC-based replication
One approach for session replication is simply to serialize the
session contents and write them to a database. This approach is
straightforward enough, and has the advantage that not only can the
session fail over to any other host, but the session data can survive
the failure of the entire cluster. The downside of database-backed
replication is the performance cost -- database transactions are
expensive. While it scales well in the Web tier, it may create a
scaling problem in the data tier -- if the cluster grows large enough,
it may be difficult or cost-prohibitive to scale the data tier to
accommodate the volume of session data.
File-based replication
File-based replication is similar to using a database to store
serialized sessions, except that a shared file server is used to store
the session data, rather than a database. This approach generally has
lower costs (hardware costs, software licenses, and computing overhead) than
using a database, at the cost of some reliability (databases make
stronger persistence guarantees than do file systems).
Memory-based replication
Another approach to replication is to share copies of the
serialized session data with one or more other servers in the cluster.
Replicating all sessions to all hosts provides maximal availability
and is easiest on the load balancer, but will eventually place an
upper limit on the size of the cluster because of the memory
consumption requirements on each node and the network bandwidth
consumed by replication messages. Some application servers support
memory-based replication to "buddy" nodes, where each session exists
on one primary server and one (or more) backup servers. Such schemes
scale better than replicating all sessions to all servers, but
complicate the job of the load balancer when it needs to fail the
session over to another server because it has to figure out which
other server(s) has that session.
Timing considerations
In addition to deciding how to store replicated session data,
there is also a question of when to replicate data. The most reliable,
but also most expensive, approach would be to replicate the data every
time it changes (such as at the end of each servlet invocation). Less
expensive, but introducing a risk of some lost data in the event of
failover, would be to replicate data no more than every N seconds.
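The two timing policies can be modeled with a single small class. This is a sketch, not how any container implements it: ReplicationTimer is a hypothetical name, time is passed in explicitly to keep the example deterministic, and a real time-based scheme would more likely use a background timer than wait for the next request.

```java
/** Hypothetical policy object; real containers expose this choice through configuration. */
final class ReplicationTimer {
    private final long minIntervalMillis;
    private long lastReplicatedAt;
    private boolean everReplicated;
    private boolean pendingChanges;

    /** Pass 0 to replicate after every request that changed the session. */
    ReplicationTimer(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    /** Called at the end of each request; returns true if a replication should be sent now. */
    boolean endOfRequest(long nowMillis, boolean sessionChanged) {
        pendingChanges |= sessionChanged;
        boolean due = !everReplicated || nowMillis - lastReplicatedAt >= minIntervalMillis;
        if (pendingChanges && due) {
            lastReplicatedAt = nowMillis;
            everReplicated = true;
            pendingChanges = false;
            return true;
        }
        return false;
    }

    /** True while the replica is behind the primary copy; this is the data at risk on failover. */
    boolean replicaIsStale() {
        return pendingChanges;
    }
}
```

With an interval of 0 the replica is never stale between requests; with a longer interval, replicaIsStale() marks the window in which a failure would lose updates.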
Related to the question of timing is the question of whether to
replicate the entire session or to try to replicate only the
attributes in the session that have changed (which might comprise
significantly less data). These are all tradeoffs between reliability
and performance, and where to make the tradeoff depends on your
application. Servlet developers should realize that it's possible
that, in the event of failover, the session state might be "stale"
(based on a replica from several requests ago) and should be prepared
to deal with the session contents not being up-to-date. (For example,
if step 3 in an interview creates a session attribute, and when the
user is on step 4, the request fails over to a system whose session
state replica is two requests old, the servlet code for step 4 should
be prepared not to find that attribute in the session, and take action
-- such as redirecting -- accordingly, rather than assuming it to be
there and throwing a NullPointerException
when it is not.)
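The defensive check described above might look like the following sketch. To keep it runnable without a servlet container, a Map stands in for the session attributes; the attribute name and page paths are hypothetical. In a real servlet the lookup would be HttpSession.getAttribute and the fallback would typically be a call to HttpServletResponse.sendRedirect.

```java
import java.util.Map;

/** Hypothetical interview flow; the Map stands in for HttpSession attributes. */
final class InterviewStep4 {
    static final String STEP3_ANSWER = "step3.answer";   // hypothetical attribute name

    /** Returns the page to show next, given the (possibly stale) session contents. */
    static String nextPage(Map<String, Object> session) {
        Object answer = session.get(STEP3_ANSWER);
        if (answer == null) {
            // The replica may predate step 3: send the user back rather than
            // dereferencing a missing attribute.
            return "/interview/step3";
        }
        return "/interview/step4";
    }
}
```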
Container support
Servlet containers vary in their options for HttpSession
replication and how to configure these options. IBM WebSphere® offers
the greatest variety of replication options, offering a choice of
in-memory or database-based replication, a choice of end-of-servlet or
time-based replication timing, and a choice of propagating the full
session snapshot or just the changed attributes. Memory-based
replication is based on JMS publish-subscribe, which can replicate to
all clones, a single "buddy" replica, or a dedicated replication
server.
WebLogic also offers a host of choices, including in-memory (using
a single buddy replica), file-based, or database-based. JBoss, when
using either the Tomcat or Jetty servlet containers, performs
memory-based replication with a choice of end-of-servlet or time-based
replication timing, and an option (in JBoss 3.2 and later) to snapshot
only changed attributes. Tomcat 5.0 offers memory-based replication to
all cluster nodes. In addition, through projects such as WADI, session
replication can be added to servlet containers such as Tomcat or Jetty
through the servlet filtering mechanism.
Improving performance in distributed Web applications
Whatever mechanism you decide on for session replication, you can
improve the performance and scalability of your Web application in a
few ways. First of all, remember that in order to gain the benefit of
session replication, you will need to mark your Web application as
distributable in the deployment descriptor, and make sure that
anything placed in the session is serializable.
Keep the session minimal
Because replicating sessions has a cost that increases with the size
of the object graphs stored in the session, you should strive to put
as little data in the session as practical. Doing so reduces the
serialization overhead, the network bandwidth requirements, and disk
requirements for replication. In particular, it's generally a bad idea
to store shared objects in the session, because then they will be
replicated for each session in which they belong.
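To get a rough feel for what a shared object costs when it rides along in every session, you can measure the serialized size of the session contents. The sketch below is an approximation under stated assumptions: SessionSize and its attribute names are hypothetical, a list of strings stands in for a shared catalog, and containers may add their own overhead on top of plain Java serialization.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;

/** Hypothetical measurement helper; HashMaps stand in for session attribute sets. */
final class SessionSize {

    /** Number of bytes Java serialization produces for the given object graph. */
    static int serializedSize(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new IllegalStateException("could not serialize session contents", e);
        }
        return bytes.size();
    }

    /** Builds a stand-in for a shared product catalog with the given number of entries. */
    static ArrayList<String> sharedCatalog(int entries) {
        ArrayList<String> catalog = new ArrayList<>();
        for (int i = 0; i < entries; i++) {
            catalog.add("product-" + i);
        }
        return catalog;
    }

    /** Session contents that embed the shared catalog, so it is replicated once per session. */
    static HashMap<String, Object> sessionWithCatalog(ArrayList<String> catalog) {
        HashMap<String, Object> attrs = new HashMap<>();
        attrs.put("catalog", catalog);
        attrs.put("cartSkus", new ArrayList<>(Arrays.asList("product-3", "product-7")));
        return attrs;
    }

    /** Session contents that hold only keys; the catalog is looked up elsewhere when needed. */
    static HashMap<String, Object> sessionWithKeysOnly() {
        HashMap<String, Object> attrs = new HashMap<>();
        attrs.put("cartSkus", new ArrayList<>(Arrays.asList("product-3", "product-7")));
        return attrs;
    }
}
```

Comparing serializedSize for the two variants, multiplied by the number of live sessions, gives a first estimate of what the shared object adds to replication traffic.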
Don't bypass setAttribute
When mutating attributes in the session, beware that if the servlet
container is trying to do some sort of minimal updating (only
propagating attributes that have changed), the container may not
notice that you've changed the attribute if you don't call setAttribute. (Imagine you have a Vector in the session representing the items in
the shopping cart -- if you just call getAttribute() to retrieve the Vector and then add something to it without then
calling setAttribute again, the container
might not realize that Vector has been
changed.)
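The hazard is easier to see with a toy model of a container that tracks changes only through setAttribute. DirtyTrackingSession below is a hypothetical stand-in, not any container's actual implementation, and it uses a List where the example above uses a Vector.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for a container that replicates only changed attributes. */
final class DirtyTrackingSession {
    private final Map<String, Object> attributes = new HashMap<>();
    private final Set<String> dirty = new HashSet<>();

    Object getAttribute(String name) {
        return attributes.get(name);
    }

    void setAttribute(String name, Object value) {
        attributes.put(name, value);
        dirty.add(name);   // the only place a change is recorded
    }

    /** Names that would be sent to replicas; clears the dirty set afterwards. */
    Set<String> drainDirty() {
        Set<String> result = new HashSet<>(dirty);
        dirty.clear();
        return result;
    }

    /** Adds an item the safe way: mutate, then re-set so the change is noticed. */
    static void addToCart(DirtyTrackingSession session, String sku) {
        @SuppressWarnings("unchecked")
        List<String> cart = (List<String>) session.getAttribute("cart");
        if (cart == null) {
            cart = new ArrayList<>();
        }
        cart.add(sku);
        session.setAttribute("cart", cart);
    }
}
```

In this model, adding to the list obtained from getAttribute without the final setAttribute call leaves drainDirty() empty, so the replica never sees the new item.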
Use fine-grained session attributes
For containers that support minimal updating, you can reduce the
cost of session replication by placing multiple, finer-grained objects in
the session rather than one big monolithic object. That way, changes
to faster-changing data will not force the container to serialize and
propagate the slower-changing data as well.
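A quick way to estimate the difference is to add up the serialized size of only the attributes that changed. The following sketch assumes a container that propagates changed attributes individually; AttributeGranularity, the attribute names, and the 500-entry profile are all hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical comparison of one monolithic attribute versus several fine-grained ones. */
final class AttributeGranularity {

    static int serializedSize(Serializable value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new IllegalStateException("could not serialize attribute", e);
        }
        return bytes.size();
    }

    /** Bytes a changed-attributes-only container would send for one update. */
    static int bytesToReplicate(Map<String, Serializable> attributes, Set<String> changed) {
        int total = 0;
        for (String name : changed) {
            total += serializedSize(attributes.get(name));
        }
        return total;
    }

    /** A large, rarely changing value (stand-in for a user profile). */
    static ArrayList<String> slowChangingProfile() {
        ArrayList<String> profile = new ArrayList<>();
        for (int i = 0; i < 500; i++) {
            profile.add("preference-" + i);
        }
        return profile;
    }

    /** Everything bundled into a single "userState" attribute. */
    static Map<String, Serializable> monolithic(String lastPage) {
        HashMap<String, Serializable> userState = new HashMap<>();
        userState.put("profile", slowChangingProfile());
        userState.put("lastPage", lastPage);
        Map<String, Serializable> session = new HashMap<>();
        session.put("userState", userState);
        return session;
    }

    /** The same data split into separate attributes. */
    static Map<String, Serializable> fineGrained(String lastPage) {
        Map<String, Serializable> session = new HashMap<>();
        session.put("profile", slowChangingProfile());
        session.put("lastPage", lastPage);
        return session;
    }
}
```

Changing only the last-visited page dirties "userState" in the monolithic layout, so the whole profile is serialized again, whereas the fine-grained layout resends just the small "lastPage" value.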
Invalidate when done
If you know that the user is finished with the session (for instance, the
user has chosen to log out), make sure to call HttpSession.invalidate(). Otherwise, the session
will persist until it expires, which will consume memory, potentially
for a long time, depending on the session expiration timeout. Many
servlet containers place a limit on the amount of memory that can be
used across all sessions, and when that limit is reached, will
serialize the least recently used session and write it to disk. If you
know that the user is done with the session, save the container the
work and invalidate it.
Keep the session clean
If any large items in the session are only of use for a
portion of the session, remove them when they are no longer needed.
Removing them will reduce the cost of session replication. (This practice is
similar to the use of explicit nulling to help the garbage collector, which,
as regular readers know, I do not recommend in general, but in
this case the cost of maintaining garbage in a session is so much
higher because of replication that it is worth trying to help the
container in this way.)
Summary
Servlet containers, through HttpSession
replication, can do much of the heavy lifting for you in building a
replicated, highly available Web application. However, a
number of configuration options for replication exist, varying by container,
and the choice of replication strategy has consequences for the
fault-tolerance, performance, and scalability of the application. The
choice of replication strategy should not be an afterthought -- you
should consider it when building your Web applications. And, of
course, don't forget to do load testing to determine your
application's scalability -- before your customers do it for you.
Resources
- Read the complete Java theory and practice series by Brian Goetz.
- Kyle Brown and Keys Botzum's article "Improving HttpSession Performance with Smart Serialization" (developerWorks, November 2003) illustrates how to reduce the cost of serializing objects placed in the session.
- "Storing
objects in HTTP sessions" (developerWorks, August 2001) by Harvey W. Gunther looks at the performance cost of storing large object graphs in an
HttpSession.
- Also by Harvey W. Gunther, "Not
creating HttpSessions in JSPs by default" (developerWorks, August 2001) examines the performance
improvement resulting from
<%@ page session="false"%>.
- Get an idea of the performance, security, and development convenience tradeoffs of various session replication schemes in "Large-scale
Servlet Programming" (developerWorks, November 2000) by Kyle Brown, Rachel Reinitz, and Skyler Thomas.
- Rod Johnson's book, Expert One-On-One J2EE Design and Development (John Wiley & Sons, 2002), offers a wealth of useful information on planning and building J2EE applications.
- For information on Servlet technology, go straight to the source.
About the author
Brian Goetz has been a
professional software developer for over 17 years. He is a Principal
Consultant at Quiotix, a
software development and consulting firm located in Los Altos,
California, and he serves on several JCP Expert Groups. See Brian's
published and
upcoming articles in popular industry publications.