mod_scgi redirection

While working on a new Django project, I noticed something odd about running it under mod_scgi: if you POSTed to a URL, /foo for example, and the view for that URL did a relative redirect, as in django.http.HttpResponseRedirect('/bar'), the 302 redirect wasn't making it back to the browser. Instead, the browser acted as if the result of POST /foo was a 200 OK followed by the data you'd get from GET /bar, without the browser knowing that it was coming from a new location. The big drawback is that if you do a reload, the browser tries to POST to /foo again, instead of just doing GET /bar. The Django docs recommend always responding to a POST with a redirect for exactly this reason.
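
For concreteness, here's the kind of view that triggers it - a minimal sketch with made-up names, showing the usual "handle the POST, then redirect" pattern:

# Hypothetical Django view: handle a POST to /foo, then answer with a
# relative redirect to /bar - exactly the pattern mod_scgi intercepts.
from django.http import HttpResponse, HttpResponseRedirect

def foo(request):
    if request.method == 'POST':
        # ... process the posted form data here ...
        return HttpResponseRedirect('/bar')   # relative Location header
    return HttpResponse('<form method="post"><input type="submit"></form>')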

Strictly speaking, redirects should use absolute URLs (see section 14.30 of the HTTP/1.1 spec, RFC 2616), and if you use one, everything acts as expected. Django is full of relative redirects, and the framework at this time doesn't seem to try to convert them to absolute ones. There is ticket #987 in the Django Trac that talks about this a bit.
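
In the meantime, one workaround is to build the absolute URL yourself before redirecting - a quick sketch, assuming the host you want is whatever the client sent in its Host header (available as request.META['HTTP_HOST']); the helper name is mine, not Django's:

# Sketch of a workaround: send an absolute Location header so mod_scgi
# leaves the response alone.
from django.http import HttpResponseRedirect

def absolute_redirect(request, path):
    host = request.META['HTTP_HOST']
    return HttpResponseRedirect('http://%s%s' % (host, path))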

Browsers seem to handle relative redirects OK though, and the behavior doesn't occur with the Django test HTTP server. Having mod_scgi quietly conceal what Django is doing is not so good.

Digging into the mod_scgi source code (apache2/mod_scgi.c) reveals the section of code that's causing this change:

location = apr_table_get(r->headers_out, "Location");

if (location && location[0] == '/' &&
    ((r->status == HTTP_OK) || ap_is_HTTP_REDIRECT(r->status))) {

    apr_brigade_destroy(bb);

    /* Internal redirect -- fake-up a pseudo-request */
    r->status = HTTP_OK;

    /* This redirect needs to be a GET no matter what the original
     * method was.
     */
    r->method = apr_pstrdup(r->pool, "GET");
    r->method_number = M_GET;

    ap_internal_redirect_handler(location, r);
    return OK;
}

Tossing that section of code causes mod_scgi to leave the relative redirects alone.

Django, SCGI, and AJP

I've been doing a lot with Django lately, and initially set it up using mod_python as the Django docs recommend, but still have some reservations about that kind of arrangement. I'd like to go back to running it under SCGI or something similar.

Django has built-in support for FastCGI, but after trying to install mod_fastcgi in my Apache 2.0.x setup, I decided it was a PITA. mod_scgi is quite easy to set up in Apache (even though the documentation is mostly nonexistent). After finding where Django implements its FastCGI support using the flup module, I saw that with just a few minor tweaks Django could be made to support all of flup's protocols, including SCGI and AJP (Apache JServ Protocol).
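
The tweak basically comes down to choosing which flup server module gets loaded - here's a rough sketch of the idea (the function and argument names are mine for illustration, not the actual patch or Django's API):

# Pick the flup server class by protocol name instead of hard-coding FastCGI.
from django.core.handlers.wsgi import WSGIHandler

FLUP_MODULES = {
    'fcgi': 'flup.server.fcgi_fork',
    'scgi': 'flup.server.scgi_fork',
    'ajp':  'flup.server.ajp_fork',
}

def run_flup(protocol='fcgi', host='127.0.0.1', port=8040):
    module = __import__(FLUP_MODULES[protocol], fromlist=['WSGIServer'])
    server = module.WSGIServer(WSGIHandler(), bindAddress=(host, int(port)))
    server.run()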

AJP turns out to be very interesting because it's included as standard with Apache 2.2 as mod_proxy_ajp, and it can work with mod_proxy_balancer - meaning you could set up multiple Django instances and have Apache share the load between them.

After testing a bit, I submitted a patch, and will probably switch to running my Django sites as AJP servers managed by daemontools and fronted by Apache 2.2.

Getting PyBlosxom SCGI working under Lighttpd

Took another whack at getting PyBlosxom/SCGI working with Lighttpd, this time with better success (I'm still getting up to speed with Lighttpd). This is working with the exact same SCGI setup I was working on the other day.

To elaborate a bit, the setup I'm trying to achieve is to:

  • Have the blog live completely under "/blog/" in the URL namespace.
  • Not get it confused with anything else that begins with "/blog", such as "/blog2".
  • Use "/blog/static/" URLs for serving static resources like CSS stylesheets and images off the disk (instead of running those requests through PyBlosxom's CGI code).

This is what I ended up with. It seems to work fairly well, and I'm impressed with how easy Lighttpd makes it to put together an understandable configuration.

#
# External redirection to add a trailing "/" if exactly 
# "/blog" is requested
#
url.redirect = (
                "^/blog$" => "http://barryp.org:81/blog/",
               )

#
# The PyBlosxom Blog, lives under the "/blog/" url namespace
#
$HTTP["url"] =~ "^/blog/" {
    #
    # Static resources served from the disk
    #
    $HTTP["url"] =~ "^/blog/static/" {
        alias.url = ("/blog/static/" => "/data/blog/static/")
    }

    #
    # Everything non-static goes through SCGI
    #
    $HTTP["url"] !~ "^/blog/static/" {
        scgi.server = ( "/blog" => (
                                     (
                                     "host" => "127.0.0.1",
                                     "port" => 8040,
                                     "check-local" => "disable",
                                     )
                                   )
        )
    }
}

FastCGI, SCGI, and Apache: Background and Future

Ran across Mark Mayo's blog entry: FastCGI, SCGI, and Apache: Background and Future, which discusses exactly the things I've been struggling with this weekend. I have to agree that sticking an interpreter like Python directly into Apache is a lot of trouble. I've delved into the Apache source code, and the mass of macros and #ifdefs is enough to send you running away screaming. Trying to graft Python onto that is just begging for trouble - and I've had some experience myself with grafting interpreters onto other things.

Running your web code in separate processes just makes a lot of sense. You have much more freedom in your choice of language and language version. You can easily run things under different userids, chrooted, in jails/zones, on completely separate machines, on completely separate OSes, maybe within virtual machines running different OSes on the same hardware.

Anyhow, thought I'd mention this because Mark's writeup made a lot of sense to me and I thought it was worth keeping a link to it.

Running a SCGI server under daemontools

Yesterday, I was working on Running PyBlosxom through SCGI, but at the time I was running the SCGI server by hand in a console window. Once it was working, I needed a way to run it in a more permanent fashion. Daemontools seemed like an easy way to set this up, since I already had it running on my server.

Daemontools runs a process called svscan that looks for directories in /var/service (the default when installed through the FreeBSD port) that contain an executable named run. If svscan also finds a log/run executable in that directory, it starts that too and ties the two together with a pipe. Daemontools includes a multilog program that reads from the pipe (stdin), and writes out and rotates log files for you automatically.

To get PyBlosxom/SCGI running under this, I started by making a temporary directory and copying in the three files needed to run PyBlosxom through SCGI:

mkdir /tmp/pyblosxom
cd /tmp/pyblosxom

cp ~/config.py .
cp ~/wsgi_app.py .
cp ~/scgi_server.py .

(The first two files come from the PyBlosxom distribution; the first one is customized. The third file is the one I came up with yesterday.)

Next, I came up with a run script to execute the SCGI server under the www userid, with stderr tied to stdout. Daemontools has a setuidgid program that makes this pretty easy:

#!/bin/sh
exec 2>&1
exec setuidgid www ./scgi_server.py

Next, made a log subdirectory, and a log/main subdirectory under it to hold the actual log files (owned by www).

mkdir log
mkdir log/main
chown www:www log/main

And in the log directory put another tiny run script.

#!/bin/sh
exec setuidgid www multilog t ./main

Finally, made both run scripts executable, and moved the whole thing into /var/service:

chmod +x run
chmod +x log/run
cd ..
mv pyblosxom /var/service

svscan sees the new directories within a few seconds, starts up both run scripts automatically, and you're in business. See the current contents of the log with:

cat /var/service/pyblosxom/log/main/current

Stop and restart the server with the Daemontools svc utility:

cd /var/service
svc -d pyblosxom ; svc -u pyblosxom

Running PyBlosxom through SCGI

Out of curiosity, ran the ApacheBench program (ab) on the plain CGI installation of PyBlosxom on my little server (-n 100 -c 10), and got around 1.5 requests/second. Decided to give SCGI a try, and got some better results.

Went about this based on what I had read in Deploying TurboGears with Lighttpd and SCGI. Tried Lighttpd at first, and it mostly worked, but I've got an Apache setup right now, so I wanted to stick with that for the moment (and it seems a bit quicker anyhow). Basically started by installing flup with easy_install:

    easy_install flup

Copied the config.py and wsgi_app.py files from the PyBlosxom distribution into a directory, and added this little script into that same directory:

#!/usr/bin/env python
import sys

from flup.server.scgi_fork import WSGIServer
from wsgi_app import application

# Serve the PyBlosxom WSGI application over SCGI, forking a child per
# request. scriptName and bindAddress need to line up with the SCGIMount
# directive in the Apache config below.
server = WSGIServer(application,
                    scriptName='/blog',
                    bindAddress=('127.0.0.1', 8040))
ret = server.run()
sys.exit(ret and 42 or 0)

Installed mod_scgi built for Apache2, and added these two lines to the config:

LoadModule scgi_module libexec/apache2/mod_scgi.so

SCGIMount /blog 127.0.0.1:8040

Notice how the scriptName and bindAddress parameters in the Python code are matched by the SCGIMount Apache directive. With this setup, running the same ab benchmark yields about 10 to 15 requests/second - not too bad. Running the threaded SCGI server (remove the _fork from the first import line) wasn't as good, only 3 to 8 requests/second.

The setup seems a bit shaky in that the benchmark values seem to keep decreasing with every run, especially in the threaded mode. So there may be some problems in my setup or in flup/scgi/pyblosxom_wsgi.

Even if it was working fine, SCGI is probably overkill for running PyBlosxom when you're not expecting a lot of traffic. And if you were, you'd probably run it with --static to generate static pages. But it was a reasonable thing to fool with for the day when you want to run a more dynamic WSGI app.