Level: Intermediate
Uche Ogbuji (uche@ogbuji.net), Principal Consultant, Fourthought, Inc.
01 Aug 2002
Apache 2.0 has provided many API improvements. Uche Ogbuji presents an example Apache 2.0 filter module and uses it to illustrate the new API.
Apache became the most popular Web server in part because of the rich
availability of third-party extensions for the server, and because its
open architecture made it quite easy to roll your own extensions. Of
course, nothing is ever just easy enough, so in developing Apache 2.0, one
of the main goals was to improve the Apache API to make it even easier to
develop extensions.
One key change was to specialize a very common
selection of typical extension modules, and make developing this
specialized subset even easier. Apache 2.0 has special APIs for
developing modules that only need to modify the contents of the response
to the user, or that only need to modify the details of the user's HTTP
request. These are called output filters and input filters, respectively. Output
filters are by far the most common, and a good example is the standard
Apache 2.0 module for computing the length of the content returned to the
user, in order to update the appropriate header and log entries. Another
example is a module for automatic spell-checking of outbound content (such
as the cleverly named "mod_speling" in Apache 1.3).
Setting up
Apache 2.0 is still a bit raw, and the docs are one area where the
developers haven't quite caught up (I'm sure they'd be happy for
contributors). I'll outline the steps I took to set up an Apache 2.0
install suitable for developing modules. I grabbed httpd-2.0.39.tar.gz
from the Apache FTP site (see Resources for a
link) and unpacked it to a good spot. Then I built the code using the trio
of commands well-known to any recent UNIX user:
./configure --prefix=/usr/local/apache/
make
make install
Use the --prefix option to configure if you'd
prefer not to use the default of "/usr/local/apache2". Now you'll want to build
the full API docs, since these do not seem to be available online, and
they are the life-blood of the Apache module developer. In order to build
these, you'll need ScanDoc (see Resources), which generates HTML
documentation from special comments in the code, similar to JavaDoc. I
downloaded ScanDoc 0.14 and unpacked it to the same
directory as the Apache source. Then from the created directory:
$ cp scandoc ../httpd-2.0.39/srclib/apr/build/scandoc.pl
$ cp -r images/ ../httpd-2.0.39/docs/
I also had to patch the Apache source slightly to keep
ScanDoc from choking. The tiny patch is in Listing 1.
Listing 1. Patch to allow generation of docs
--- include/util_time.h.old 2002-07-26 00:59:28.000000000 -0600
+++ include/util_time.h 2002-07-26 00:59:37.000000000 -0600
@@ -65,7 +65,7 @@
#endif
/**
- * @package Apache date/time handling functions
+ * @package Apache date-time handling functions
*/
/* Maximum delta from the current time, in seconds, for a past time
And for this effort, you are partially rewarded with the API docs in the
docs/api subdirectory of the source directory. Start with
docs/api/index.html. The make install command doesn't seem to do anything with
the API docs, so you might want to create a soft link by hand from the
api directory to the "manual" directory created by make install.
I say "partially rewarded" because there are some glaring gaps
in the documentation comments in the Apache headers. The generated docs
show quite a few functions whose names aren't given. I ended up using
the generated docs as a good starting point for finding what I needed, and
just reverted to "use the source, Luke" when necessary. I did find that a
handy technique for finding API functions is to search the include
directory using something like:
grep -C7 'AP.*_DECLARE' /usr/local/apache/include/* | grep -C7 [search-keyword]
A simple output filter
An output filter modifies the content or headers generated by other
modules in some way. The simple example I'll demonstrate is a filter
that looks for the magic string "***TIME-COOKIE***" and replaces it with
the current time on the server, rendered as an RFC 822 date string. Of course, this
could be done as easily with an Apache server-side include or other such
utility, but this example allows us to demonstrate the Apache API. We'll
also throw in a little twist: if the magic string appears multiple times
in the content, our filter will use the same time stamp in every
case, even though the clock might have ticked a few times between the
first occurrence and later ones. This twist demonstrates
the management of filter context.
The Apache 2.0 runtime model is very thoughtfully designed, and acts
like a stream of content passing from one filter to the next. In fact,
the metaphor the Apache team has chosen is that of a bucket brigade. One
filter fills a "bucket" with content and passes it on to the next in the
chain. A filter may thus be called multiple times during the processing
of one HTTP transaction, as different chunks come through the bucket
brigade. For all but the most trivial filter, this means that filters
have to be able to save context of some sort between calls. In the case
of my example, the filter needs to remember what the time stamp was when
it substituted the first string. Listing 2 presents the filter. (The code in this article was tested
under Linux (Debian "sid"). Minor changes are required to use it on Windows.)
Listing 2. Output filter
#include "httpd.h"
#include "util_filter.h"
#include "http_config.h"
#include "http_log.h"
#include "apr_strings.h" /* for apr_pstrmemdup */

/* The string which is to be replaced by the time stamp */
static const char TIME_COOKIE[] = "***TIME-COOKIE***";

/* Declare the module name */
module AP_MODULE_DECLARE_DATA time_cookie;

typedef struct tc_context_ {
    apr_bucket_brigade *bb;
    apr_time_t timestamp;
} tc_context;

/*
   This function is passed the system filter information (f)
   and the bucket brigade representing content to be filtered (bb)
*/
static apr_status_t time_cookie_filter(ap_filter_t *f, apr_bucket_brigade *bb)
{
    tc_context *ctx = f->ctx;      /* The filter context */
    apr_bucket *curr_bucket;
    apr_pool_t *pool = f->r->pool; /* The pool for all memory requests */
    /* The buffer where we shall place the time stamp string.
       APR_RFC822_DATE_LEN is the fixed length of such strings */
    char time_str[APR_RFC822_DATE_LEN+1];
    apr_time_t timestamp;

    if (ctx == NULL) {
        /* The first time this filter has been invoked for this transaction */
        f->ctx = ctx = apr_pcalloc(pool, sizeof(*ctx));
        ctx->bb = apr_brigade_create(pool, f->c->bucket_alloc);
        timestamp = apr_time_now();
        ctx->timestamp = timestamp;
    }
    else {
        /* Get the time stamp we've already set */
        timestamp = ctx->timestamp;
    }
    /* Render the time into a string in RFC822 format */
    apr_rfc822_date(time_str, timestamp);

    /*
       Iterate over each bucket in the brigade.
       Find each "cookie" in the "kitchen" and replace with the time stamp
    */
    APR_BRIGADE_FOREACH(curr_bucket, bb) {
        const char *data, *kitchen, *cookie;
        apr_size_t len;
        apr_status_t rv;

        if (APR_BUCKET_IS_EOS(curr_bucket) || APR_BUCKET_IS_FLUSH(curr_bucket)) {
            APR_BUCKET_REMOVE(curr_bucket);
            APR_BRIGADE_INSERT_TAIL(ctx->bb, curr_bucket);
            /* Hand our brigade to the next filter and report its status */
            return ap_pass_brigade(f->next, ctx->bb);
        }
        rv = apr_bucket_read(curr_bucket, &data, &len, APR_BLOCK_READ);
        if (rv != APR_SUCCESS) {
            return rv;
        }
        /* Bucket data is not NUL-terminated, so search a terminated copy.
           A cookie split across two buckets is not detected. */
        kitchen = apr_pstrmemdup(pool, data, len);
        while (*kitchen != '\0') {
            /* Return a pointer to the next occurrence of the cookie */
            cookie = ap_strstr_c(kitchen, TIME_COOKIE);
            if (cookie) {
                /* Write the text up to the cookie, then the time stamp,
                   to the next filter in the chain
                */
                ap_fwrite(f->next, ctx->bb, kitchen, cookie-kitchen);
                ap_fputs(f->next, ctx->bb, time_str);
                kitchen = cookie + sizeof(TIME_COOKIE) - 1;
                /*
                   The following is an example of writing to the error log.
                   The message is actually not really appropriate for the error log,
                   but it serves as example.
                */
                ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, f->r,
                              "Replacing cookie with \"%s\"", time_str);
            } else {
                /* No more cookies found, so just write the rest of the
                   string and flag that we're done
                */
                ap_fputs(f->next, ctx->bb, kitchen);
                kitchen = "";
            }
        }
    }
    return APR_SUCCESS;
}

/* Register the filter function as a filter for modifying the HTTP body (content) */
static void time_cookie_register_hook(apr_pool_t *pool)
{
    /* This is the three-argument form from Apache 2.0.39. Later Apache
       releases insert a filter-init argument (which may be NULL) before
       the filter type. */
    ap_register_output_filter("TIMECOOKIE", time_cookie_filter,
                              AP_FTYPE_CONTENT_SET);
}

/* Define the module data */
module AP_MODULE_DECLARE_DATA time_cookie =
{
    STANDARD20_MODULE_STUFF,
    NULL,                     /* dir config creator */
    NULL,                     /* dir merger --- default is to override */
    NULL,                     /* server config */
    NULL,                     /* merge server config */
    NULL,                     /* command apr_table_t */
    time_cookie_register_hook /* register hook */
};
Figuring out which #includes are required can also be a bit of an art,
and should be easier to determine when the 2.0 API documentation matures.
The module name declaration is very important. In their
configuration files, users will have to explicitly load your module. (I
will show how this works below, by adding the time cookie module to mine.)
They will specify the compiled object file from which it is loaded as well
as the module name, which must match the AP_MODULE_DECLARE_DATA declaration.
The next global
construct, the context structure, is also very important. Because the
filter might be called several times in order to complete its operation,
most filters will need to maintain information between invocations.
Globals and static local variables are not an option, as is usual in
multi-threaded programs, because the same function could be
invoked in the handling of multiple simultaneous requests. The Apache API
solves this by maintaining a context for each filter instance between invocations.
In our context we keep track of:
- The bucket brigade, which we pass on to the next filter down the line. To stretch the metaphor, we are gathering a brand new set of buckets, and as the filter before us hands us buckets of content, we'll modify the content as needed and pour it into the brand new buckets, which we'll hand down to the next filter. We keep track of this new batch of buckets in the context.
- The time stamp, which we generated upon first invocation. This ensures that all instances of the cookie in the document will be replaced with the same string regardless of how long the request takes to process.
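The context pattern can be studied outside Apache in plain C. The following is a minimal standalone model, not the Apache API: toy_ctx, toy_request, and toy_filter are hypothetical names, and a caller-supplied number stands in for the clock. It shows only the idea of creating the context on the first call for a request and reusing it on every later call.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the state a filter keeps per request */
typedef struct toy_ctx {
    long timestamp; /* fixed on the first invocation */
    int calls;      /* how many times the filter has run */
} toy_ctx;

/* Hypothetical stand-in for the filter record; ctx plays the role of f->ctx */
typedef struct toy_request {
    toy_ctx *ctx;   /* NULL until the filter's first call */
} toy_request;

/* Called once per chunk of content; 'now' stands in for the clock.
   Returns the time stamp that this request should use. */
static long toy_filter(toy_request *r, long now)
{
    assert(r != NULL);
    if (r->ctx == NULL) {
        /* First invocation for this request: create and fill the context */
        r->ctx = calloc(1, sizeof(*r->ctx));
        if (r->ctx == NULL) {
            abort();
        }
        r->ctx->timestamp = now;
    }
    r->ctx->calls++;
    return r->ctx->timestamp;
}

/* Apache's pools free request memory for us; this sketch frees by hand */
static void toy_request_destroy(toy_request *r)
{
    assert(r != NULL);
    free(r->ctx);
    r->ctx = NULL;
}
```

Because the state hangs off the request rather than living in a global, two requests handled at the same time each get their own time stamp, which is the property the real filter relies on.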
In the filter function itself, the context is passed in as part of the
filter information structure, as well as the incoming bucket brigade,
which we will use as the source of content for processing. The context is
actually NULL the very first time the function is called per request.
Apache provides very sophisticated functions to offload most
memory-allocation worries from the programmer. We choose to use the pool
that we were handed for our resource requests rather than create our own,
which might be needed in some very advanced cases. Apache provides a
library of time functions as well. The apr_time_t type is a single number (microseconds since the Unix epoch) that represents a
point in time. It can be converted to any of several representations of
time, including the human-readable string representation.
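To make the conversion concrete without an Apache build, here is a sketch in standard C rather than APR. The helper format_rfc822 is hypothetical: it takes a count of microseconds since the epoch, the same unit apr_time_t uses, and writes a GMT date in the RFC 822 style. It illustrates the idea and makes no claim about how APR implements it.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: 'usec' is microseconds since the Unix epoch.
   Writes a string such as "Thu, 01 Jan 1970 00:00:00 GMT" into 'buf'
   and returns its length, or 0 if the buffer is too small or the
   time cannot be converted. */
static size_t format_rfc822(char *buf, size_t buflen, long long usec)
{
    time_t secs = (time_t)(usec / 1000000);
    struct tm *gmt;

    assert(buf != NULL);
    gmt = gmtime(&secs);
    if (gmt == NULL) {
        return 0;
    }
    /* Day and month names are English in the default "C" locale */
    return strftime(buf, buflen, "%a, %d %b %Y %H:%M:%S GMT", gmt);
}
```

The result is 29 characters long, which is why a fixed-size buffer is enough for this representation.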
The first time the filter is invoked, the context structure is allocated
and an outgoing bucket brigade created. The current time stamp is also
obtained. In all cases, the time is then converted to human-friendly
representation. Then it's time to process the content, using APR_BRIGADE_FOREACH, the macro that Apache provides
for iterating over bucket brigades. The first order of business is to
check for the special marker buckets that signal a flush or the end of the content. A filter is
always able to say "I've processed enough content for the moment: I'd like
to pass it on and wait my turn again". This is done using the ap_fflush function. When this is called by the
filter before ours, we get a special bucket that is detected with APR_BUCKET_IS_FLUSH. Once this bucket or
the actual end of content is reached, we clean up the bucket brigade,
pass it on to the next filter in line, and return a status code; APR_SUCCESS lets Apache know we had no
problems.
And finally, we get to the real work of the filter. From each bucket
we read the content (which becomes the kitchen
variable) and search for instances of the cookie, copying all other text
to the outgoing brigade as is, and writing the time stamp in place of any
cookies found (note that by "cookie," I do mean cookie in the general
programming sense, as a magic value -- and not a Web browser cookie).
Naturally, I use the Apache versions of the stream reading and writing
functions for buckets, but I also use an Apache version of the strstr function, which at first seems unnecessary
since all the relevant parameters are simple character pointers. Apache
provides a full string library, and you should always use Apache's version
for resource management and security reasons.
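The search-and-replace loop is easier to study in isolation. Below is a standalone sketch in plain C, not the Apache API: replace_cookie is a hypothetical function that treats its input as a length-delimited chunk, as bucket data is, and writes the result to a caller-supplied buffer instead of an outgoing brigade. Like the filter, it will not notice a cookie that straddles two chunks.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static const char COOKIE[] = "***TIME-COOKIE***";

/* Copy 'len' bytes of 'chunk' into 'out', replacing each COOKIE with
   'stamp'. 'out' receives a NUL-terminated string. Returns the number
   of replacements, or -1 if 'out' (of size 'outsize') is too small. */
static int replace_cookie(const char *chunk, size_t len, const char *stamp,
                          char *out, size_t outsize)
{
    const size_t cookie_len = sizeof(COOKIE) - 1;
    size_t stamp_len;
    size_t in = 0, used = 0;
    int count = 0;

    assert(chunk != NULL && stamp != NULL && out != NULL);
    stamp_len = strlen(stamp);

    while (in < len) {
        if (len - in >= cookie_len &&
            memcmp(chunk + in, COOKIE, cookie_len) == 0) {
            /* Found a cookie: write the time stamp in its place */
            if (used + stamp_len >= outsize) {
                return -1;
            }
            memcpy(out + used, stamp, stamp_len);
            used += stamp_len;
            in += cookie_len;
            count++;
        } else {
            /* Ordinary byte: copy it through unchanged */
            if (used + 1 >= outsize) {
                return -1;
            }
            out[used++] = chunk[in++];
        }
    }
    if (used >= outsize) {
        return -1;
    }
    out[used] = '\0';
    return count;
}
```

Comparing with memcmp against an explicit length avoids any assumption that the chunk ends in a NUL byte, which matters because data read from a bucket carries a length rather than a terminator.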
The Apache API, like many other programming frameworks, uses hooks to
invoke customized code. A hook is a function that is registered with
Apache and gets called at a particular point. There are hooks that get
invoked during configuration, as Apache is being started up, and so on.
This example does not use any of these hooks, so they are represented by
NULL values. There are also hooks for code to be invoked at
request time, which the example does use. The hook registration function
is invoked at start-up time, and specifies other hooks to be registered
with Apache. In this example, the filter registers itself in the normal
chain of filters. I specify the filter as type AP_FTYPE_CONTENT_SET, meaning that it operates on
content. There are also filter types for operating on headers and network
parameters. Then the module structure that was declared at the top of
the file is filled in. I only need to supply the function that registers
the filter. More complex modules may need to fill in other entries in the
structure.
Trying it out
It's easy enough to try this module out. Build it as follows:
gcc -fPIC -I$INCLUDE -c time_cookie.c -o time_cookie.o
gcc -shared time_cookie.o -L$LIB -lapr -laprutil -o time_cookie.so
Make sure that $INCLUDE and $LIB include the paths to the Apache headers and
libraries. Then copy the resulting time_cookie.so file to your Apache
modules directory (say, /usr/local/apache/modules). Then add lines such as
the following to your httpd.conf:
LoadModule time_cookie modules/time_cookie.so
AddOutputFilter TIMECOOKIE .html
Then just start or restart the server using apachectl
start or apachectl restart. Then copy
Listing 3 to a file named time-cookie-test.html in a
place where it can be served up by Apache. The easiest way is to copy it to
the Apache installation's htdocs directory, in which case you can see
the fruits by browsing to http://localhost/time-cookie-test.html. Since not
everybody has superuser access, I did my work for this article --
including the Apache installation itself -- as a regular user, to make
sure that it would work for everyone. Since Apache was installed this way,
it could not listen on port 80, so I set the listening port to 8000 in
httpd.conf and therefore used the URL http://localhost:8000/time-cookie-test.html. Figure 1
shows the result.
Listing 3. An HTML demonstration of the filter
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Apache 2.0 Time Cookie Filter Test (***TIME-COOKIE***)</title>
</head>
<body bgcolor="#FFFFFF" text="#000000" link="#0000FF"
vlink="#000080" alink="#FF0000">
<h3>Apache HTTP Server Version 2.0</h3>
<h1>Apache 2.0 Time Cookie Filter Test</h1>
<p>This page request was generated on ***TIME-COOKIE***</p>
<p>--Uche Ogbuji</p>
</body>
</html>
Figure 1. An HTML page rendered through the time cookie filter

Modules
Filters merely modify data generated by other components. To
write an Apache extension that generates original content based on HTTP
requests, you would write a module. Well-known modules include the
built-in module for serving simple files, the Apache module for
server-side includes, mod_cgi for common gateway interface scripts, and
mod_perl and mod_python for invoking scripting languages efficiently and
directly. Writing modules tends to be a bit more complex than writing
filters. But since filters accommodate so many extension needs, it was
very helpful for Apache 2.0 to split them out into a simple API.
Modules use the same AP_MODULE_DECLARE_DATA structure for registering
hooks with the server. The main handler entry point has a simpler
signature, simply taking a request record from the HTTP message. Here is an
example from mod_python 3.0, which is tailored for Apache 2.0:
static int PythonHandler(request_rec *req) {
/* Handler stuff */
}
Modules are usually registered to handle particular file types or other
such criteria in the configuration file. This involves using a "magic
string" to identify the handler earmarked for the particular criteria.
Therefore, installing mod_python involves adding something such as the
following to the config file:
<Directory /pydir>
AddHandler python-program .py
PythonHandler script
</Directory>
This says that requests with a .py extension in the /pydir path are to be
handled by the handler that understands the magic string "python-program"
(in other words, mod_python). The third line is a special configuration directive
defined and processed by the mod_python module on server startup (the
module is invoked through a registered hook to process the configuration
file). Apache invokes all handlers for every request, so each handler
should quickly decide whether the request is meant for it. Most handlers
thus start off with lines such as:
if (!req->handler || strcmp(req->handler, "python-program"))
return DECLINED;
This checks whether the magic handler string associated with the request
based on the path and other criteria matches the magic string mod_python
understands. If not, it declines the request.
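The accept-or-decline pattern can be modeled in plain C without the Apache headers. In this sketch every name prefixed with toy_ is hypothetical, as is the "toy-program" handler string; only the numeric values of TOY_OK and TOY_DECLINED are borrowed from Apache's OK and DECLINED. The dispatcher is a simplification meant to show why each handler checks the magic string first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins; the values mirror Apache's OK and DECLINED */
#define TOY_OK 0
#define TOY_DECLINED (-1)

typedef struct toy_request_rec {
    const char *handler; /* magic string set by configuration, may be NULL */
} toy_request_rec;

typedef int (*toy_handler_fn)(const toy_request_rec *req);

static int toy_python_handler(const toy_request_rec *req)
{
    assert(req != NULL);
    if (!req->handler || strcmp(req->handler, "python-program"))
        return TOY_DECLINED;
    /* Handler stuff would go here */
    return TOY_OK;
}

static int toy_other_handler(const toy_request_rec *req)
{
    assert(req != NULL);
    if (!req->handler || strcmp(req->handler, "toy-program"))
        return TOY_DECLINED;
    return TOY_OK;
}

/* Offer the request to each handler in turn. Returns the index of the
   handler that accepted it, or -1 if every handler declined. */
static int toy_dispatch(const toy_request_rec *req)
{
    static const toy_handler_fn handlers[] = {
        toy_python_handler, toy_other_handler
    };
    size_t i;

    for (i = 0; i < sizeof(handlers) / sizeof(handlers[0]); i++) {
        if (handlers[i](req) != TOY_DECLINED) {
            return (int)i;
        }
    }
    return -1;
}
```

Since every handler is offered every request, keeping the string check as the very first step keeps the cost of declining small.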
Much of the rest of the fundamentals of writing a module are the same as
for writing a filter. You create a bucket brigade and put the content in
it as it is generated. You also use the ap_pass_brigade function when you're done, but the
recipient argument you pass it is r->output_filters, assuming you called the request
record structure r. Apache then takes over the
task of running the content through the filters.
Go forth and hack...
The Apache 2.0 API has clearly benefited from the long consideration and
design that the Apache developers have put into it. It is much less
haphazard than the 1.3 API and feels very coherent. Most of the general
tasks common for module writers are provided in functions, and
resource management is much simpler than in many other C APIs. The
main problem is the incompleteness of the documentation, and the hoops you
need to go through to generate it. At least this is something that one
can expect to get better over time. There is, on the other hand, no
improving a bad API design over time, so Apache gets the most important
matter right.
I felt a lot more comfortable with the process of writing a 2.0 module
than with 1.x. You should also feel bold in experimenting. Starting with
filters is definitely the way to get warmed up, and I would actually
suggest, if possible, writing a prototype filter variation on any module
project you might have. I only covered output filters, but the other
types are similar enough. Once you do go on to module writing, you may
want to look at the source code for mod_asis, which comes with Apache
(modules/generators/mod_asis.c), as a nice and simple example.
With the improvements in the Apache 2.0 API, you'll never have to worry
too much if you discover that Apache doesn't accommodate your every
need. After checking your favorite search engine to be sure no one else
has already scratched your itch, just jump in and roll your own Apache
extensions.
Resources
- Start at the Apache home page, and in particular, the Apache httpd server home page. Don't forget to bookmark Apache Week.
- Go to the Developer Documentation for Apache-2.0, starting with the Apache API notes to get the authoritative details on Apache module and filter development.
- Learn a different set of details on the Apache 2.0 API from
an insider by reading the Apache 2.0 basics, a series by Ryan Bloom.
- Use ScanDoc, maintained on SourceForge, to generate the very handy Apache 2.0 API documentation from the source package.
- If you would rather use C++ than C for Apache module development, check out Codea, a C++ toolkit for Apache 2.0, and Zachary C. Miller's Apache API C++ Cookbook, which includes coverage of 2.0 API.
- Read David Seager's article "Language support in Apache through negotiation" (developerWorks, February 2001).
- Also take a look at the tutorial "Customizing Apache for maximum performance", by Jonathan Hassell (developerWorks, June 2002).
- Another helpful tutorial is "Inside the Apache directory structure", by Tom Syroid (developerWorks, March 2001).
- Also check out the tutorial "Apache Web Development with IBM DB2 for Linux", by Roger Midgette (developerWorks, January 2002).
About the author
Uche Ogbuji is a consultant and co-founder of Fourthought Inc., a software vendor and
consultancy specializing in XML solutions for enterprise knowledge
management applications. Fourthought develops 4Suite, an open source platform for XML, RDF,
and knowledge-management applications. Uche is a computer engineer
and author born in Nigeria, living and working in Boulder, Colorado. You can contact him at uche@ogbuji.net.