Hacking mod_jk

Okay, can I just complain about what a pain in the neck this has been?

The Problem

We connect Apache (web server) to Tomcat (app server) using a small module called mod_jk which basically just forwards requests from Apache to Tomcat and shuttles information back and forth.

Since Apache isn't up to the task* of serving the content when the URI looks like "/resource-search/images/select.gif;jsessionid=BLAHBLAH", we need to have Tomcat do it for us.

However, we don't want to map everything to Tomcat, since the whole point of using Apache is to have it serve our static content (like images).

Guess what? The mod_jk connector does not support URI mapping with complex maps like "/resource-search/*;jsessionid=*", so I had to add that feature. Yep, I had to hack the module code. And it's in C. (I love C).

A Solution

My solution was to introduce a new type of URI match: MATCH_TYPE_JSESSIONID. It basically allows you to use a configuration directive like this:

JkMount /context/*;jsessionid* my_worker

...and have your URIs containing jsessionid information forwarded to Tomcat.

However, the solution is not as general as that configuration directive might lead you to believe. It does not actually support arbitrary wildcard matching. After the "/context/", the following string *must* be "*;jsessionid*".

At least it works. ;)
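
To make the restriction concrete, here is a rough sketch of the kind of check involved. This is not the code from my diff: the function name uri_matches_jsessionid is made up for illustration, and it assumes the context prefix (like "/context/") has already been split off from the "*;jsessionid*" part of the mount.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not the real mod_jk function): returns 1 when `uri`
 * starts with `context` (e.g. "/context/") and the rest of the URI contains
 * ";jsessionid", otherwise 0.
 */
int uri_matches_jsessionid(const char *context, const char *uri)
{
    size_t context_len;

    assert(context != NULL && uri != NULL);

    context_len = strlen(context);

    /* The URI must begin with the context prefix. */
    if (strncmp(uri, context, context_len) != 0)
        return 0;

    /* "*;jsessionid*": anything, then the marker, then anything. */
    return strstr(uri + context_len, ";jsessionid") != NULL;
}
```

With a context of "/resource-search/", a URI like "/resource-search/images/select.gif;jsessionid=BLAHBLAH" matches, while the same URI without the ";jsessionid" part does not.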

Here's the diff for common/jk_uri_worker_map.c that adds this ability. Comments welcome!

Next Steps

Perhaps a better solution would be to actually support full regular expressions instead of very simple wildcards in mod_jk. I mean, all it's really doing is matching a URI to a context and forwarding the request -- it's not rocket science. Then again, that's probably why it's so simple -- because it doesn't have to be very complicated.

Anyhow, a completely-regexp-based solution sucks for several reasons:

  1. Not every Tomcat user knows how to use regular expressions (but they should!)
  2. All existing configurations are broken (d'oh!), since "*" means something different to a regexp parser
  3. Regular expressions are pretty slow when compared to brute-force strstr and strchr matches

Another solution would be to create another type of JkMount directive, like this:

JkMountRegexp /context/.*;jsessionid.* my_worker

This would give you both speed (because you'd probably have few regexp mounts) and flexibility (the regexps), with the cost of configuration complexity. I think it's probably worth it. I think I'll look into this type of solution.
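
For what it's worth, here is a minimal sketch of how a regexp mount could be matched, using the POSIX regex functions (regcomp, regexec, regfree). JkMountRegexp does not exist, and the struct and function names below are my own invention, so treat this as an illustration of the idea and not as mod_jk code. The pattern is compiled once at configuration time, so the per-request cost is a single regexec call.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical mount entry: a compiled pattern plus the worker it maps to. */
struct regexp_mount {
    regex_t pattern;
    const char *worker;
};

/* Compile the pattern once, at configuration time. Returns 0 on success. */
int regexp_mount_init(struct regexp_mount *mount, const char *pattern,
                      const char *worker)
{
    assert(mount != NULL && pattern != NULL && worker != NULL);
    mount->worker = worker;
    /* REG_NOSUB: we only need a yes/no answer, not the captured groups. */
    return regcomp(&mount->pattern, pattern, REG_EXTENDED | REG_NOSUB);
}

/* Return the worker name when the URI matches the pattern, or NULL. */
const char *regexp_mount_lookup(const struct regexp_mount *mount,
                                const char *uri)
{
    assert(mount != NULL && uri != NULL);
    return regexec(&mount->pattern, uri, 0, NULL, 0) == 0 ? mount->worker
                                                          : NULL;
}

/* Release the compiled pattern. */
void regexp_mount_free(struct regexp_mount *mount)
{
    assert(mount != NULL);
    regfree(&mount->pattern);
}
```

One thing to watch: regexec matches anywhere in the string, so a real implementation would probably want to anchor the pattern (for example "^/context/.*;jsessionid.*") or do the anchoring itself.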

Another Solution

Another solution to this problem is to continue to have Apache serve the files by using some trickery with mod_rewrite. mod_rewrite is a module that lets you mangle URIs during processing. In our case, we just want to strip out the ";jsessionid=STUFF" and give the rest to Apache.

This can be done with the following incantation:

# This handles URIs that Tomcat will not process.
RewriteEngine On
#RewriteLog /usr/local/apache/logs/rewrite.log
#RewriteLogLevel 0
RewriteRule /context-name/(.*);jsessionid=[0-9A-Z]*(.*) /context-name/$1$2 [PT,L]

A bit of explanation. RewriteEngine On turns on the rewriting engine for the current host (or virtual host). The log setup should be obvious, except that RewriteLogLevel 0 means no logging, and I think that anything 3 or above is effectively full logging.

Lastly, the rule itself. mod_rewrite works using regular expressions. The first argument to RewriteRule is the regex to match, and the second is the replacement string to use. The regex contains some things in parentheses, and the text they capture is available in the replacement string as the back-references $1 and $2.
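
If the back-references are hard to picture, here is a small sketch that applies the same pattern by hand using the POSIX regex functions. It only imitates what the rule does to a URI; it is not how mod_rewrite is implemented, and the function name strip_jsessionid is made up for this example.

```c
#include <assert.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: applies the rule's pattern to `uri` and writes
 * "/context-name/" + $1 + $2 into `out`. Returns 1 if the URI was rewritten,
 * 0 if the pattern did not match or the result does not fit in `out`.
 */
int strip_jsessionid(const char *uri, char *out, size_t out_size)
{
    static const char *pattern =
        "/context-name/(.*);jsessionid=[0-9A-Z]*(.*)";
    regex_t re;
    regmatch_t m[3]; /* m[0] = whole match, m[1] = $1, m[2] = $2 */
    int matched, written;

    assert(uri != NULL && out != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;
    matched = (regexec(&re, uri, 3, m, 0) == 0);
    regfree(&re);
    if (!matched)
        return 0;

    /* Glue the two captured pieces back together, minus the session ID. */
    written = snprintf(out, out_size, "/context-name/%.*s%.*s",
                       (int)(m[1].rm_eo - m[1].rm_so), uri + m[1].rm_so,
                       (int)(m[2].rm_eo - m[2].rm_so), uri + m[2].rm_so);
    return written >= 0 && (size_t)written < out_size;
}
```

Given "/context-name/images/select.gif;jsessionid=ABC123", $1 is "images/select.gif" and $2 is empty, so the result is "/context-name/images/select.gif". A URI with no ";jsessionid=" in it is left alone.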

Note that this rule is actually theoretical. I haven't gotten it to work just like this. To get it to work for me, I had to do this:

RewriteRule /context-name/(.*);jsessionid=[0-9A-Z]*(.*) /path/to/webapps/context/$1$2 [L]

I had to do this because I have statically compiled mod_rewrite into Apache, and therefore have no control over the order in which the URI processors get to handle the request. :(

The last bit of text in the rule is the flags. I use two different flags here: PT and L. The PT (pass-through) flag indicates that after the URI replacement shown above, the result should be handed to the other URI handlers as a URI. This is important because otherwise mod_rewrite would treat the result as a filename, which usually doesn't work out the way you want it to.

The second flag is probably unnecessary. The L flag tells mod_rewrite that if this rule matches the URI, no further rules should be processed. I figure that it can't hurt to have this flag in there, and if I add other rules in the future, they won't interfere.

There are several advantages to this solution:

  1. mod_rewrite comes bundled with Apache.
  2. You don't have to re-compile your mod_jk.
  3. You don't have to maintain non-standard mod_jk code.
  4. The mapping configuration exists where it should -- within httpd.conf (instead of partially in the C code, too)
  5. The solution is much more flexible (i.e. isn't pretty much hard-coded)

There is at least one disadvantage to this solution:

  1. Even when mod_jk is destined to handle the request, mod_rewrite takes a crack at it. I'm pretty sure that this is due to the fact that I've statically compiled mod_rewrite into Apache, rather than using a loadable module. I'll comment on this at some point in the future.
