Connecting middleware to Apache 2.0

A kinder, gentler, more useful API

Level: Intermediate

Uche Ogbuji (uche@ogbuji.net), Principal Consultant, Fourthought, Inc.

01 Aug 2002

Apache 2.0 brings many API improvements. Uche Ogbuji walks through an Apache 2.0 filter module to illustrate the new API.

Apache became the most popular Web server in part because of the rich availability of third-party extensions for the server, and because its open architecture made it quite easy to roll your own extensions. Of course, nothing is ever just easy enough, so in developing Apache 2.0, one of the main goals was to improve the Apache API to make it even easier to develop extensions.

One key change was to single out a very common class of extension module and make developing modules of that class even easier. Apache 2.0 has special APIs for developing modules that only need to modify the contents of the response to the user, or that only need to modify the details of the user's HTTP request. These are called output filters and input filters, respectively. Output filters are by far the more common, and a good example is the standard Apache 2.0 filter for computing the length of the content returned to the user, in order to update the appropriate header and log entries. Another example would be a module for automatic spell-checking of outbound content (not to be confused with the cleverly named "mod_speling" in Apache 1.3, which corrects misspelled URLs in requests).

Setting up

Apache 2.0 is still a bit raw, and the docs are one area where the developers haven't quite caught up (I'm sure they'd be happy for contributors). I'll outline the steps I took to set up an Apache 2.0 install suitable for developing modules. I grabbed httpd-2.0.39.tar.gz from the Apache FTP site (see Resources for a link) and unpacked it to a good spot. Then I built the code using the trio of commands well-known to any recent UNIX user:

./configure --prefix=/usr/local/apache/
make
make install

Use the --prefix option to configure to choose the installation directory if you'd prefer not to use the default. Now you'll want to build the full API docs, since these do not seem to be available online, and they are the lifeblood of the Apache module developer. In order to build these, you'll need ScanDoc (see Resources), which generates HTML documentation from special comments in the code, similar to JavaDoc. I downloaded ScanDoc 0.14 and unpacked it to the same directory as the Apache source. Then from the created directory:

$ cp scandoc ../httpd-2.0.39/srclib/apr/build/scandoc.pl
$ cp -r images/ ../httpd-2.0.39/docs/

I also had to patch the Apache source a tad to keep ScanDoc from choking. The tiny patch is in Listing 1.


Listing 1. Patch to allow generation of docs
		
--- include/util_time.h.old     2002-07-26 00:59:28.000000000 -0600
+++ include/util_time.h 2002-07-26 00:59:37.000000000 -0600
@@ -65,7 +65,7 @@
 #endif

 /**
- * @package Apache date/time handling functions
+ * @package Apache date-time handling functions
  */

 /* Maximum delta from the current time, in seconds, for a past time

And for this effort, you are partially rewarded with the API docs in the docs/api subdirectory of the source directory. Start with docs/api/index.html. The make install command doesn't seem to do anything with the API docs, so you might want to create a soft link by hand from the api directory to the "manual" directory created by make install.

I say "partially rewarded" because it seems there are some glaring gaps in the documentation comments in the Apache headers. The generated docs show quite a few functions whose names aren't given. I ended up using the generated docs as a good starting point for finding what I needed, and just reverted to "use the source, Luke" when necessary. I did find that a handy technique for finding API functions is to search the include directory using something like:

grep -C7 'AP.*_DECLARE' /usr/local/apache/include/*.h | grep -C7 'search-keyword'





A simple output filter

An output filter modifies the content or headers generated by other modules in some way. The simple example I'll demonstrate is a filter that looks for the magic string "***TIME-COOKIE***" and replaces it with a display of the current time on the server, rendered in RFC 822 format (which means GMT rather than local time). Of course, this could be done as easily with an Apache server-side include or other such utility, but this example allows us to demonstrate the Apache API. We'll also throw in a little twist: if the magic string appears multiple times in the content, our filter will use the same time stamp in each case, even though the clock might have ticked a few times between the first and subsequent times it finds such strings. This twist demonstrates the management of filter context.

The Apache 2.0 runtime model is very thoughtfully designed, and acts like a stream of content passing from one filter to the next. In fact, the metaphor the Apache team has chosen is that of a bucket brigade. One filter fills a "bucket" with content and passes it on to the next in the chain. A filter may thus be called multiple times during the processing of one HTTP transaction, as different chunks come through the bucket brigade. For all but the most trivial filter, this means that filters have to be able to save context of some sort between calls. In the case of my example, the filter needs to remember what the time stamp was when it substituted the first string. Listing 2 presents the filter. (The code in this article was tested under Linux (Debian "sid"). Minor changes are required to use it on Windows.)


Listing 2. Output filter
		
#include <string.h>

#include "httpd.h"
#include "util_filter.h"
#include "http_config.h"
#include "http_log.h"
#include "apr_strings.h"

/* The string which is to be replaced by the time stamp */
static char TIME_COOKIE[] = "***TIME-COOKIE***";

/* Declare the module name */
module AP_MODULE_DECLARE_DATA time_cookie;

typedef struct tc_context_ {
    apr_bucket_brigade *bb;
    apr_time_t timestamp;
} tc_context;

/*
This function passes in the system filter information (f)
and the bucket brigade representing content to be filtered (bb)
*/
static apr_status_t time_cookie_filter(ap_filter_t *f, apr_bucket_brigade *bb)
{
  tc_context *ctx = f->ctx;       /* The filter context */
  apr_bucket *curr_bucket;
  apr_pool_t *pool = f->r->pool;  /* The pool for all memory requests  */
  /* The buffer where we shall place the time stamp string.
     APR_RFC822_DATE_LEN is the fixed length of such strings */
  char time_str[APR_RFC822_DATE_LEN+1];
  apr_time_t timestamp;

  if (ctx == NULL) {
    /* The first time this filter has been invoked for this transaction */
    f->ctx = ctx = apr_pcalloc(pool, sizeof(*ctx));
    ctx->bb = apr_brigade_create(pool, f->c->bucket_alloc);
    timestamp = apr_time_now();
    ctx->timestamp = timestamp;
  }
  else {
    /* Get the time stamp we've already set */
    timestamp = ctx->timestamp;
  }

  /* Render the time into a string in RFC822 format */
  apr_rfc822_date(time_str, timestamp);

  /*
    Iterate over each bucket in the brigade.
    Find each "cookie" in the "kitchen" and replace with the time stamp
   */
  APR_BRIGADE_FOREACH(curr_bucket, bb) {
    const char *kitchen, *cookie;
    apr_size_t len;
    apr_status_t rv;

    if (APR_BUCKET_IS_EOS(curr_bucket) || APR_BUCKET_IS_FLUSH(curr_bucket)) {
      APR_BUCKET_REMOVE(curr_bucket);
      APR_BRIGADE_INSERT_TAIL(ctx->bb, curr_bucket);
      ap_pass_brigade(f->next, ctx->bb);
      return APR_SUCCESS;
    }
    /* Read the bucket, blocking until its data is available */
    rv = apr_bucket_read(curr_bucket, &kitchen, &len, APR_BLOCK_READ);
    if (rv != APR_SUCCESS) {
      return rv;
    }
    /* Bucket data is not NUL-terminated, so make a NUL-terminated copy
       in the request pool before using the string functions on it */
    kitchen = apr_pstrmemdup(pool, kitchen, len);
    while (kitchen && strcmp(kitchen, "")) {
      /* Return a pointer to the next occurrence of the cookie */
      cookie = ap_strstr(kitchen, TIME_COOKIE);
      if (cookie) {
        /* Write the text up to the cookie, then the cookie
           to the next filter in the chain
        */
        ap_fwrite(f->next, ctx->bb, kitchen, cookie-kitchen);
        ap_fputs(f->next, ctx->bb, time_str);
        kitchen = cookie + sizeof(TIME_COOKIE) - 1;
        /*
          The following is an example of writing to the error log.
          The message is actually not really appropriate for the error log,
          but it serves as example.
        */
        ap_log_rerror(APLOG_MARK, APLOG_ERR, 0, f->r,
                      "Replacing cookie with \"%s\"", time_str);
      } else {
        /* No more cookies found, so just write the rest of the
           string and flag that we're done
        */
        ap_fputs(f->next, ctx->bb, kitchen);
        kitchen = "";
      }
    }
  }
  return APR_SUCCESS;	
}

/* Register the filter function as a filter for modifying the HTTP body (content) */
static void time_cookie_register_hook(apr_pool_t *pool)
{
  ap_register_output_filter("TIMECOOKIE", time_cookie_filter,
                            AP_FTYPE_CONTENT_SET);
}

/* Define the module data */
module AP_MODULE_DECLARE_DATA time_cookie = 
{
  STANDARD20_MODULE_STUFF,
  NULL,                        /* dir config creator */
  NULL,                        /* dir merger --- default is to override */
  NULL,                        /* server config */
  NULL,                        /* merge server config */
  NULL,                        /* command apr_table_t */
  time_cookie_register_hook    /* register hooks */
};

Figuring out which #includes are required can be a bit of an art, and should become easier as the 2.0 API documentation matures. The module name declaration is very important. Users have to load your module explicitly in their configuration files. (I will show how this works below, by adding the time cookie module to mine.) They specify the compiled object file from which the module is loaded as well as the module name, which must match the AP_MODULE_DECLARE_DATA declaration. The next global construct, the context structure, is also very important. Because the filter might be called several times in order to complete its operation, most filters need to maintain information between invocations. Globals and static local variables are not an option, as is usually the case in multi-threaded programs, because the same function could be invoked while handling multiple simultaneous requests. The Apache API solves this by maintaining a context for the filter between invocations. In our context we keep track of:

  • The bucket brigade, which we pass on to the next filter down the line. To stretch the metaphor, we are gathering a brand new set of buckets, and as the filter before us hands us buckets of content, we'll modify the content as needed and pour it into the brand new buckets, which we'll hand down to the next filter. We keep track of this new batch of buckets in the context.
  • The time stamp, which we generated upon first invocation. This ensures that all instances of the cookie in the document will be replaced with the same string regardless of how long the request takes to process.
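To make the idea of per-request context concrete, here is a minimal sketch in plain C that does not use the Apache API at all. The names toy_filter, toy_context, toy_filter_call, and toy_filter_done are hypothetical stand-ins I made up for illustration; the `now` parameter stands in for reading the clock. It only models the pattern: the context pointer starts out NULL, is allocated on the first call, and carries the first time stamp through every later call for the same request.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-request filter record
   (this is NOT Apache's ap_filter_t) */
typedef struct toy_filter {
    void *ctx;          /* NULL until the first invocation */
} toy_filter;

/* What this toy filter remembers between invocations */
typedef struct toy_context {
    long timestamp;     /* fixed on the first call */
    int calls;          /* how many chunks have been seen so far */
} toy_context;

/* Called once per chunk of content. Returns the time stamp to use
   for this chunk, or -1 if the context could not be allocated. */
long toy_filter_call(toy_filter *f, long now)
{
    toy_context *ctx = f->ctx;

    if (ctx == NULL) {
        /* First invocation for this request: create the context */
        ctx = calloc(1, sizeof(*ctx));
        if (ctx == NULL)
            return -1;
        ctx->timestamp = now;
        f->ctx = ctx;
    }
    ctx->calls++;
    return ctx->timestamp;
}

/* Release the context when the request is finished */
void toy_filter_done(toy_filter *f)
{
    free(f->ctx);
    f->ctx = NULL;
}
```

Because each request has its own toy_filter record, two requests handled at the same time never share a time stamp, which is exactly what a global or static variable could not guarantee. In the real filter the explicit cleanup step is unnecessary, since the context is allocated from the request pool.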

In the filter function itself, the context is passed in as part of the filter information structure, along with the incoming bucket brigade, which we will use as the source of content for processing. The context is actually NULL the very first time the function is called per request. Apache provides very sophisticated functions to offload most memory-allocation worries from the programmer. We choose to use the request pool that we were handed for our resource requests rather than create our own pool, which might be needed in some very advanced cases. Apache provides a library of time functions as well. The apr_time_t type is a single number that represents a point in time (a count of microseconds since the UNIX epoch). It can be converted to any of several representations of time, including the human-readable string representation.
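As a rough illustration of what the conversion to a human-readable string involves, the following sketch uses only the standard C library rather than APR. The function name toy_rfc822_date and the constant TOY_RFC822_DATE_LEN are my own hypothetical names, and the sketch assumes the default "C" locale so that day and month names come out in English. It takes a count of microseconds since the epoch, as apr_time_t holds, and renders it in the fixed-width GMT format.

```c
#include <stddef.h>
#include <time.h>

/* 29 visible characters plus the terminating NUL */
#define TOY_RFC822_DATE_LEN 30

/* Render a microsecond count since the epoch as an RFC 822 date in GMT.
   Returns 0 on success, or -1 if the time cannot be converted or the
   buffer is too small. */
int toy_rfc822_date(char *buf, size_t size, long long usec)
{
    time_t secs = (time_t)(usec / 1000000);   /* drop the microseconds */
    struct tm *gmt = gmtime(&secs);           /* broken-down time in GMT */

    if (gmt == NULL)
        return -1;
    if (strftime(buf, size, "%a, %d %b %Y %H:%M:%S GMT", gmt) == 0)
        return -1;
    return 0;
}
```

Passing 0 yields "Thu, 01 Jan 1970 00:00:00 GMT". Note that gmtime returns a pointer to shared static storage, so a threaded server would want a reentrant variant; APR's own functions take care of that for you.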

The first time the filter is invoked, the context structure is allocated and an outgoing bucket brigade created. The current time stamp is also obtained. In all cases, the time is then converted to a human-friendly representation. Then it's time to process the content, using APR_BRIGADE_FOREACH, the macro that Apache provides for iterating over bucket brigades. The first order of business is to check for the two special marker buckets: flush and end of stream. A filter is always able to say "I've processed enough content for the moment: I'd like to pass it on and wait my turn again". This is done using the ap_fflush function. When this is called by the filter before ours, we get a special bucket, which we detect with APR_BUCKET_IS_FLUSH. Once this bucket or the end of the content (detected with APR_BUCKET_IS_EOS) is reached, we move the marker bucket to the tail of our outgoing brigade, pass that brigade on to the next filter in line, and return APR_SUCCESS, which lets Apache know we had no problems.

And finally, we get to the earnest work of the article. From each bucket we read the content (which becomes the kitchen variable) and search for instances of the cookie, copying all other text to the outgoing brigade as is, and writing the time stamp in place of any cookies found (note that by "cookie," I do mean cookie in the general programming sense, as a magic value -- and not a Web browser cookie). Naturally, I use the Apache versions of the stream reading and writing functions for buckets, but I also use an Apache version of the strstr function, which at first seems unnecessary since all the relevant parameters are simple character pointers. Apache provides a full string library, and you should always use Apache's version for resource management and security reasons.
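The search-and-replace step can be sketched on its own, outside Apache, in plain C. The function replace_marker below is a hypothetical helper of my own, not part of any Apache library. Unlike the string functions, it works on a buffer with an explicit length, which is how bucket data arrives (the data is not guaranteed to end in a NUL byte).

```c
#include <stddef.h>
#include <string.h>

/* Copy `len` bytes from `in` to `out`, replacing every occurrence of
   `marker` with `repl`. `in` need not be NUL-terminated; `out` always is
   on success. Returns the number of bytes written (not counting the NUL),
   or (size_t)-1 if `out` is too small. */
size_t replace_marker(const char *in, size_t len,
                      const char *marker, const char *repl,
                      char *out, size_t out_size)
{
    size_t mlen = strlen(marker);
    size_t rlen = strlen(repl);
    size_t i = 0;   /* read position in `in` */
    size_t o = 0;   /* write position in `out` */

    while (i < len) {
        if (mlen > 0 && len - i >= mlen &&
            memcmp(in + i, marker, mlen) == 0) {
            /* Marker found: write the replacement, skip the marker */
            if (o + rlen >= out_size)
                return (size_t)-1;
            memcpy(out + o, repl, rlen);
            o += rlen;
            i += mlen;
        } else {
            /* Ordinary byte: copy it through unchanged */
            if (o + 1 >= out_size)
                return (size_t)-1;
            out[o++] = in[i++];
        }
    }
    if (out_size == 0)
        return (size_t)-1;
    out[o] = '\0';
    return o;
}
```

Like the filter in Listing 2, this sketch looks at one chunk at a time, so a marker that happens to be split across two chunks passes through unreplaced. Handling that case would mean holding back the tail of each chunk in the context until the next one arrives.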

The Apache API, like many other programming frameworks, uses hooks to invoke customized code. A hook is a function that is registered with Apache and gets called at a particular point. There are hooks that get invoked during configuration, as Apache is being started up, and so on. This example does not use any of these hooks, so they are represented by NULL values. There are also hooks for code to be invoked at request time, which the example does use. The hook registration function is invoked at start-up time, and specifies other hooks to be registered with Apache. In this example, the filter registers itself in the normal chain of filters. I specify the filter as type AP_FTYPE_CONTENT_SET, meaning that it operates on content. There are also filter types for operating on headers and network parameters. Then the module declaration that was declared at the top of the file is filled in. I only need to register the hook for registering the module. More complex filters may need to register other hooks in the structure.





Trying it out

It's easy enough to try this module out. Build it as follows:

gcc -fPIC -I$INCLUDE -c time_cookie.c -o time_cookie.o
gcc -shared -L$LIB -lapr -laprutil time_cookie.o -o time_cookie.so

Make sure that $INCLUDE and $LIB point to the directories containing the Apache headers and libraries. Then copy the resulting time_cookie.so file to your Apache modules directory (say, /usr/local/apache/modules). Then add lines such as the following to your httpd.conf:

LoadModule time_cookie modules/time_cookie.so
AddOutputFilter TIMECOOKIE .html

Then just start or restart the server using apachectl start or apachectl restart. Next, copy Listing 3 to a file named time-cookie-test.html in a place where Apache can serve it. The easiest way is to copy it to the Apache installation's htdocs directory, in which case you can see the fruits by browsing to http://localhost/time-cookie-test.html. Since not everybody has superuser access, I did my work for this article -- including the Apache installation itself -- as a regular user, to make sure that it would work for everyone. Running as a regular user precludes listening on port 80, so I set the listening port to 8000 in httpd.conf and therefore used the URL http://localhost:8000/time-cookie-test.html. Figure 1 shows the result.


Listing 3. An HTML demonstration of the filter
		
  <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Apache 2.0 Time Cookie Filter Test (***TIME-COOKIE***)</title>
  </head>

  <body bgcolor="#FFFFFF" text="#000000" link="#0000FF"
        vlink="#000080" alink="#FF0000">
      <h3>Apache HTTP Server Version 2.0</h3>

    <h1>Apache 2.0 Time Cookie Filter Test</h1>

    <p>This page request was generated on ***TIME-COOKIE***</p>
    <p>--Uche Ogbuji</p>


  </body>
</html>


Figure 1. An HTML page rendered through the time cookie filter




Modules

Filters merely modify data generated by other components. To write an Apache extension that generates original content based on HTTP requests, you would write a module. Well-known modules include the built-in module for serving simple files, the Apache module for server-side includes, mod_cgi for common gateway interface scripts, and mod_perl and mod_python for invoking scripting languages efficiently and directly. Writing modules tends to be a bit more complex than writing filters. But since filters accommodate so many extension needs, it was very helpful for Apache 2.0 to split them out into a simple API.

Modules use the same AP_MODULE_DECLARE_DATA structure for registering hooks with the server. The main handler entry point has a simpler signature, simply taking a request record from the HTTP message. Here is an example from mod_python 3.0, which is tailored for Apache 2.0:

static int PythonHandler(request_rec *req) {
  /* Handler stuff */
}

Modules are usually registered to handle particular file types or other such criteria in the configuration file. This involves using a "magic string" to identify the handler earmarked for the particular criteria. Therefore, installing mod_python involves adding something such as the following to the config file:

<Directory /pydir>
    AddHandler python-program .py
    PythonHandler script
</Directory>

This says that requests for files with a .py extension under the /pydir directory are to be handled by the handler that understands the magic string "python-program" (in other words, mod_python). The third line is a special configuration directive defined and processed by the mod_python module on server startup (the module is invoked through a registered hook to process the configuration file). Apache invokes all handlers for every request, so each handler should quickly decide whether the request is meant for it. Most handlers thus start off with lines such as:

    if (!req->handler || strcmp(req->handler, "python-program"))
        return DECLINED;

This checks whether the magic handler string associated with the request based on the path and other criteria matches the magic string mod_python understands. If not, it declines the request.
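The decline-or-handle convention can be modeled in a few lines of plain C. Everything here is a toy of my own making and none of it is the Apache API: toy_request, toy_dispatch, and the two handlers are hypothetical, and TOY_OK and TOY_DECLINED merely mirror the roles of Apache's OK and DECLINED return codes. The sketch shows why the check at the top of each handler matters: the dispatcher simply tries handlers in order until one does not decline.

```c
#include <stddef.h>
#include <string.h>

#define TOY_OK         0
#define TOY_DECLINED (-1)

/* Hypothetical, stripped-down request record */
typedef struct toy_request {
    const char *handler;    /* magic handler string, may be NULL */
    const char *served_by;  /* filled in by whichever handler accepts */
} toy_request;

typedef int (*toy_handler_fn)(toy_request *req);

/* Accepts only requests earmarked with "python-program" */
static int toy_python_handler(toy_request *req)
{
    if (!req->handler || strcmp(req->handler, "python-program"))
        return TOY_DECLINED;
    req->served_by = "python";
    return TOY_OK;
}

/* Catch-all that accepts whatever nobody else wanted */
static int toy_default_handler(toy_request *req)
{
    req->served_by = "default";
    return TOY_OK;
}

/* Offer the request to each handler in turn until one accepts it */
int toy_dispatch(toy_request *req)
{
    static const toy_handler_fn handlers[] = {
        toy_python_handler,
        toy_default_handler
    };
    size_t i;

    for (i = 0; i < sizeof(handlers) / sizeof(handlers[0]); i++) {
        int rv = handlers[i](req);
        if (rv != TOY_DECLINED)
            return rv;
    }
    return TOY_DECLINED;
}
```

A request whose handler string is "python-program" is served by the first handler, while any other request, including one with no handler string at all, falls through to the catch-all.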

Much of the rest of the fundamentals of writing a module are the same as for writing a filter. You create a bucket brigade and put the content in it as it is generated. You also use the ap_pass_brigade function when you're done, but the recipient argument you pass it is r->output_filters, assuming you called the request record structure r. Apache then takes over the task of running the content through the filters.





Go forth and hack...

The Apache 2.0 API has clearly benefited from the long consideration and design that the Apache developers have put into it. It is much less haphazard than the 1.3 API and feels very coherent. Most of the general tasks common for module writers are provided in functions, and resource management is much simpler than in many other C APIs. The main problem is the incompleteness of the documentation, and the hoops you need to go through to generate it. At least this is something that one can expect to get better over time. There is, on the other hand, no improving a bad API design over time, so Apache gets the most important matter right.

I felt a lot more comfortable with the process of writing a 2.0 module than with 1.x. You should also feel bold in experimenting. Starting with filters is definitely the way to get warmed up, and I would actually suggest, if possible, writing a prototype filter variation on any module project you might have. I only covered output filters, but the other types are similar enough. Once you do go on to module writing, you may want to look at the source code for mod_asis, which comes with Apache (modules/generators/mod_asis.c), as a nice and simple example.

With the improvements in the Apache 2.0 API, you'll never have to worry too much if you discover that Apache doesn't accommodate your every need. After checking your favorite search engine to be sure no one else has already scratched your itch, just jump in and roll your own Apache extensions.



Resources



About the author


Uche Ogbuji is a consultant and co-founder of Fourthought Inc., a software vendor and consultancy specializing in XML solutions for enterprise knowledge management applications. Fourthought develops 4Suite, an open source platform for XML, RDF, and knowledge-management applications. Uche is a computer engineer and author born in Nigeria, living and working in Boulder, Colorado. You can contact him at uche@ogbuji.net.



