Running Python WSGI apps with SCGI and inetd

Using scgi-inetd-wsgi

Previously, I wrote about running CGI Scripts with Nginx using SCGI, with the help of a super-server such as inetd and a small C shim that takes a SCGI request from stdin and sets up a CGI environment.

There's also a companion project on GitHub for doing something similar with Python WSGI apps. The code works on Python 2.6 or higher (including Python 3.x), and can easily be patched for Python 2.5 or lower with a simple string substitution mentioned in the source file.

It's not something you'd want to run a frequently-accessed app with, because there'd be quite a bit of overhead launching a Python process to handle each request. It may be useful, however, for infrequently used apps where you don't want to have to keep and monitor a long-running process, or for development of a WSGI app where you don't want to have to stop/start a process every time you make a change.

Let's take a look at a diagram to see what the flow will be:

[diagram: SCGI <-> WSGI]

  1. Nginx opens a socket listened to by inetd
  2. inetd spawns a Python script with stdin and stdout connected to the accepted connection
  3. The Python script imports inetd_scgi and calls its run_app function, passing in a WSGI app to actually handle the request. run_app reads the SCGI request from stdin, sets up a WSGI environment, calls the handler, and sends the handler's response back to Nginx via stdout (a rough sketch of the idea follows this list).
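
To make that last step concrete, here's a rough sketch of what a run_app-style function has to do internally. This is an illustration of the idea only (assuming Python 3), not the actual scgi-inetd-wsgi code:

#!/usr/bin/env python
# Sketch of a run_app-style function: read one SCGI request from stdin,
# build a WSGI environ, call the app, write the response to stdout.
# Illustrative only -- not the actual scgi-inetd-wsgi implementation.
import sys
from io import BytesIO

def run_app(app):
    stdin, stdout = sys.stdin.buffer, sys.stdout.buffer

    # SCGI frames its headers in a netstring: "<length>:<body>,"
    length = b''
    c = stdin.read(1)
    while c != b':':
        length += c
        c = stdin.read(1)
    fields = stdin.read(int(length)).split(b'\0')[:-1]
    stdin.read(1)  # eat the trailing ','
    environ = {k.decode('latin-1'): v.decode('latin-1')
               for k, v in zip(fields[::2], fields[1::2])}

    # Read the request body, then fill in the rest of the WSGI environ
    body = stdin.read(int(environ.get('CONTENT_LENGTH') or 0))
    environ.update({
        'wsgi.version': (1, 0),
        'wsgi.url_scheme': 'http',
        'wsgi.input': BytesIO(body),
        'wsgi.errors': sys.stderr,
        'wsgi.multithread': False,
        'wsgi.multiprocess': True,
        'wsgi.run_once': True,
    })

    def start_response(status, headers, exc_info=None):
        # SCGI requires a Status line rather than an HTTP status line
        stdout.write(('Status: %s\r\n' % status).encode('latin-1'))
        for name, value in headers:
            stdout.write(('%s: %s\r\n' % (name, value)).encode('latin-1'))
        stdout.write(b'\r\n')
        return stdout.write  # WSGI's legacy write() callable

    for chunk in app(environ, start_response):
        stdout.write(chunk)
    stdout.flush()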

Here's how you'd wire up the Hello World app from PEP 3333:

#!/usr/bin/env python
HELLO_WORLD = b"Hello world!\n"

def simple_app(environ, start_response):
    """Simplest possible application object"""
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    return [HELLO_WORLD]

if __name__ == '__main__':
    import inetd_scgi
    inetd_scgi.run_app(simple_app)

If you had saved that script as, say, /local/test.py, you might add this to /etc/inetd.conf to serve it up:

:www:www:200:/var/run/test.sock  stream   unix   nowait/4  www /local/test.py /local/test.py

and in Nginx with:

location /test {
    scgi_pass unix:/var/run/test.sock;
    include /usr/local/etc/nginx/scgi_params;
    fastcgi_split_path_info ^(/test)(.*);
    scgi_param  SCRIPT_NAME $fastcgi_script_name;
    scgi_param  PATH_INFO $fastcgi_path_info;
}    

Then, accessing http://localhost/test should show 'Hello world!'
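
A quick way to check that without a browser (assuming Nginx is listening on localhost port 80) is a couple of lines of Python:

# Quick smoke test against the Nginx location above
from urllib.request import urlopen

print(urlopen('http://localhost/test').read())  # should print b'Hello world!\n'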

AWStats under Nginx and SCGI

Earlier, I wrote about running CGI Scripts with Nginx using SCGI with the help of a small C shim. One particular CGI app I've had to alter slightly to work under this setup is AWStats, a decent-sized Perl app that needs only one line added to satisfy SCGI's requirement of a Status line at the beginning of the response.

Here's a patch against AWStats 7.0:

--- awstats.pl.original 2011-09-11 21:20:40.954555528 -0500
+++ awstats.pl  2011-03-31 00:19:35.867343845 -0500
@@ -750,6 +750,7 @@
 #------------------------------------------------------------------------------
 sub http_head {
        if ( !$HeaderHTTPSent ) {
+                print "Status: 200 OK\n";
                my $newpagecode = $PageCode ? $PageCode : "utf-8";
                if ( $BuildReportFormat eq 'xhtml' || $BuildReportFormat eq 'xml' ) {
                        print( $ENV{'HTTP_USER_AGENT'} =~ /MSIE|Googlebot/i

CGI Scripts with Nginx using SCGI

Using scgi_run with Nginx

Nginx is a great web server, but one thing it doesn't support is CGI scripts. Not all webapps need to be high-performance setups capable of hundreds or thousands of requests per second. Sometimes you just want something capable of handling a few requests now and then, and don't want to keep a long-running process going all the time just for that one webapp. How do you handle something like that under Nginx?

Well, it turns out you're going to need some kind of long-running external process to help Nginx out (because Nginx can't spawn processes itself); it just doesn't have to be dedicated to any one particular webapp. One way to go would be to set up another webserver that can run CGI scripts, and have Nginx proxy to it when need be.

Apache is one possibility, something like this:

[diagram: Nginx <-> Apache]

But Apache is a fairly big program, with lots of features and a potentially complicated configuration. That kind of defeats the purpose of going to a lighter-weight program like Nginx. What else can we do?

Super-servers

Many Unix-type systems have a super-server available to launch daemons as needed when a network connection is made. On BSD boxes it's typically inetd, Mac OS X has launchd, and Linux distros often have xinetd or other choices available.

If we already have a super-server running on our box, why not point Nginx at that, and let the super-server take care of launching our CGI script? We just need one extra piece of the puzzle: something to read a web request over the socket Nginx opened, set up the CGI environment, and execute the script.

Wait, that sounds like a web server - aren't we back to something like Apache again? No, it doesn't have to be anything nearly that complicated if we use the SCGI protocol instead of HTTP.

SCGI

SCGI is a very simple protocol that's supported by Nginx and many other webservers. It's much, much simpler than FastCGI, and maps pretty closely to the CGI specification, with one minor difference to note...

In the CGI RFC, the response may contain an optional Status line, as in:

Status: 200 OK

In the SCGI protocol, the Status line is required, not optional.

Nginx will function with the Status line missing, but there'll be warnings in your error log.

If you can alter your CGI scripts to include a Status line, or live with warnings in logs, we have a way forward now.
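
For scripts you control, adding the Status line is a one-line change. For example, a minimal SCGI-friendly CGI script in Python might look like:

#!/usr/bin/env python
# Minimal CGI script adjusted for SCGI: the Status line comes first,
# before any other headers.
print("Status: 200 OK")
print("Content-Type: text/plain")
print()
print("Hello from a SCGI-friendly CGI script")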

scgi_run

I've got a C project on GitHub that implements this small piece of glue to turn a SCGI request into a CGI environment. The binary weighs in at around 8 to 12 kilobytes after being stripped.

Basically, we're looking at a flow like this:

[diagram: Nginx <-> SCGI]

  1. Nginx connects to a socket listened to by inetd
  2. inetd spawns scgi_run, with stdin and stdout wired to the accepted connection
  3. scgi_run reads SCGI request headers from stdin and sets up a CGI environment
  4. scgi_run execs the CGI script (stdin and stdout are still connected to the socket back to Nginx)
  5. The CGI script reads the request body (if any) from stdin and writes its response out through stdout.

A couple of things to note here:

  • When we get to the final step, the CGI script is talking directly to Nginx - there's no buffering by any other applications like there would be in an Apache setup.
  • scgi_run is no longer executing - it execed the CGI script, so there's not another process hanging around waiting on anything.
  • A super-server like inetd can typically be configured to run the handler under any userid you want, so you basically get suEXEC-type functionality for free here.

The scgi_run code on GitHub operates in two modes:

  1. If argv[1] ends with a slash /, then argv[1] is taken to be a directory name, and the program will look for the SCRIPT_FILENAME passed by Nginx in that directory.
  2. Otherwise, argv[1] is taken as the path to a specific CGI script (so SCRIPT_FILENAME is ignored), and any additional arguments are passed on to the CGI script.
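
To make the two modes concrete, here's a rough Python translation of that logic - a sketch of the idea only, not the actual C code from the project:

#!/usr/bin/env python
# Sketch of scgi_run's logic -- illustrative, not the real C implementation.
import os
import sys

def main():
    stdin = sys.stdin.buffer

    # Read the netstring-framed SCGI headers: "<length>:<body>,"
    length = b''
    c = stdin.read(1)
    while c != b':':
        length += c
        c = stdin.read(1)
    fields = stdin.read(int(length)).split(b'\0')[:-1]
    stdin.read(1)  # trailing ','
    env = {k.decode(): v.decode()
           for k, v in zip(fields[::2], fields[1::2])}

    os.environ.update(env)  # the SCGI headers become the CGI environment

    target = sys.argv[1]
    if target.endswith('/'):
        # Directory mode: run the SCRIPT_FILENAME Nginx asked for,
        # but only from inside the configured directory
        script = os.path.join(target, os.path.basename(env['SCRIPT_FILENAME']))
        args = [script]
    else:
        # Fixed-script mode: SCRIPT_FILENAME is ignored, and any extra
        # arguments are passed along to the CGI script
        script = target
        args = [script] + sys.argv[2:]

    # exec: stdin/stdout stay connected to the socket back to Nginx,
    # and this process is replaced by the CGI script
    os.execv(script, args)

if __name__ == '__main__':
    main()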

Configuration

A simple setup looks something like this, assuming you've compiled scgi_run and have the binary stored as /local/scgi_run.

For FreeBSD's inetd, for example, you might add a line to /etc/inetd.conf like this:

:www:www:600:/var/run/scgi_localcgi.sock stream  unix    nowait/16   www /local/scgi_run /local/scgi_run /local/cgi-bin/

This causes inetd to listen on a Unix socket named /var/run/scgi_localcgi.sock and, when a connection is made, spawn /local/scgi_run with argv[0] set to /local/scgi_run and argv[1] set to /local/cgi-bin/. As a bonus, the socket ownership is set to www:www and the mode to 0600, which limits who can connect to it.

In Nginx, you might have something like:

location /local-cgi/ {
    alias /local/cgi-bin/;

    scgi_pass unix:/var/run/scgi_localcgi.sock;
    include /usr/local/etc/nginx/scgi_params;
    scgi_param  SCRIPT_NAME $fastcgi_script_name;
    scgi_param  PATH_INFO $fastcgi_path_info;
    scgi_param  SCRIPT_FILENAME $request_filename;
}

And then for a simple script, you might have /local/cgi-bin/hello.sh as:

#!/bin/sh
echo "Status: 200 OK"
echo "Content-Type: text/plain"
echo ""
echo "Hello World"

You'd run it by hitting http://localhost/local-cgi/hello.sh.
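
Besides hitting it through Nginx, you can poke the inetd socket directly to see the raw protocol. Here's a small hypothetical SCGI client sketch in Python (the socket path and script match the examples above; given the 0600 socket mode, you'd need to run it as www):

#!/usr/bin/env python
# Hand-rolled SCGI client sketch, for poking at the inetd socket directly.
import socket

def scgi_request(sock_path, headers):
    # CONTENT_LENGTH must come first, and SCGI: 1 must be present
    pairs = [('CONTENT_LENGTH', '0'), ('SCGI', '1')] + sorted(headers.items())
    body = b''.join(k.encode() + b'\0' + v.encode() + b'\0' for k, v in pairs)
    netstring = str(len(body)).encode() + b':' + body + b','

    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.connect(sock_path)
    s.sendall(netstring)
    chunks = []
    while True:
        chunk = s.recv(4096)
        if not chunk:
            break
        chunks.append(chunk)
    s.close()
    return b''.join(chunks)

print(scgi_request('/var/run/scgi_localcgi.sock',
                   {'REQUEST_METHOD': 'GET',
                    'SCRIPT_FILENAME': '/local/cgi-bin/hello.sh'}).decode())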

Conclusion

So, with a tiny 8KB binary and the help of a super-server like inetd, Nginx (or any other SCGI client) can execute CGI scripts (keeping in mind, though, the requirement for the Status line). It's a fairly lightweight solution that may also be useful in embedded situations.

Enjoy, and go buy some hard drives to store your CGI scripts on; I hear SSDs are very nice. :)

mod_scgi redirection

While working on a new Django project, I noticed something odd about running it under mod_scgi: if you POSTed to a URL, /foo for example, and the view for that URL did a relative redirect, as in django.http.HttpResponseRedirect('/bar'), the 302 redirect wasn't making it back to the browser. Instead, the browser acted as if the result of POST /foo was a 200 OK followed by the data you'd receive from GET /bar, without knowing the content was coming from a new location. The big drawback is that if you do a reload, the browser tries to POST to /foo again, instead of just GETting /bar. The Django docs recommend always responding to POSTs with redirects for exactly this reason.

Strictly speaking, redirects should be absolute URLs (see section 14.30 of the HTTP spec), and if you use one of those, everything acts as expected. Django is full of relative redirects, and the framework at this time doesn't seem to try to convert them to absolute. Ticket #987 in the Django Trac talks about this a bit.

Browsers seem to handle relative redirects OK though, and the behavior doesn't occur with the Django test HTTP server. Having mod_scgi conceal what Django is doing is not so good.

Digging into the mod_scgi source code (apache2/mod_scgi.c) reveals the section of code causing this change:

location = apr_table_get(r->headers_out, "Location");

if (location && location[0] == '/' &&
    ((r->status == HTTP_OK) || ap_is_HTTP_REDIRECT(r->status))) {

    apr_brigade_destroy(bb);

    /* Internal redirect -- fake-up a pseudo-request */
    r->status = HTTP_OK;

    /* This redirect needs to be a GET no matter what the original
     * method was.
     */
    r->method = apr_pstrdup(r->pool, "GET");
    r->method_number = M_GET;

    ap_internal_redirect_handler(location, r);
    return OK;
}

Tossing that section of code causes mod_scgi to leave the relative redirects alone.
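
Alternatively, if you'd rather not patch mod_scgi: the branch above only fires when Location starts with '/', so sending an absolute URL sidesteps it entirely. A hypothetical view (assuming a Django version with HttpRequest.build_absolute_uri):

# Hypothetical view: send an absolute Location so mod_scgi's
# internal-redirect branch (Location values starting with '/') never fires.
from django.http import HttpResponseRedirect

def submit(request):
    # ... handle the POST ...
    return HttpResponseRedirect(request.build_absolute_uri('/bar'))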

Django, SCGI, and AJP

I've been doing a lot with Django lately, and initially set it up using mod_python as the Django docs recommend, but still have some reservations about that kind of arrangement. I'd like to go back to running it under SCGI or something similar.

Django has built-in support for FastCGI, but after trying to install mod_fastcgi in my Apache 2.0.x setup, I decided it was a PITA. mod_scgi is quite easy to set up in Apache (even though the documentation is mostly nonexistent). After finding where Django implements its FastCGI support using the flup module, I saw that with just a few minor tweaks Django could be made to support all of flup's protocols, including SCGI and AJP (Apache JServ Protocol).

AJP turns out to be very interesting because it's included standard with Apache 2.2 as mod_proxy_ajp, and can work with mod_proxy_balancer - meaning you could set up multiple Django instances and have Apache share the load between them.

After testing a bit, I submitted a patch, and will probably switch to running my Django sites as AJP servers managed by daemontools, fronted by Apache 2.2.
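
For reference, the flup side of that arrangement only takes a few lines. A sketch of an AJP server for a Django site ('mysite.settings' and the port are placeholders, not from the actual patch):

#!/usr/bin/env python
# Sketch: serve a Django site over AJP with flup, for Apache 2.2's
# mod_proxy_ajp to talk to.  'mysite.settings' is a placeholder.
import os
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'mysite.settings')

from flup.server.ajp import WSGIServer
from django.core.handlers.wsgi import WSGIHandler

# The Apache side might be:  ProxyPass /myapp ajp://127.0.0.1:8009/
WSGIServer(WSGIHandler(), bindAddress=('127.0.0.1', 8009)).run()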

Getting PyBlosxom SCGI working under Lighttpd

Took another whack at getting PyBlosxom/SCGI working with Lighttpd, this time with better success (I'm still getting up to speed with Lighttpd). It's working with the exact same SCGI setup I was working on the other day.

To elaborate a bit, the setup I'm trying to achieve is to:

  • Have the blog live completely under "/blog/" in the URL namespace
  • Not get it confused with anything else that begins with "/blog", such as "/blog2".
  • Use "/blog/static/" URLs for serving static resources like CSS stylesheets and images off the disk (instead of running those requests through PyBlosxom's CGI code).

This is what I ended up with. It seems to work fairly well, and I'm impressed with how easy Lighttpd makes it to put together an understandable configuration.

#
# External redirection to add a trailing "/" if exactly 
# "/blog" is requested
#
url.redirect = (
                "^/blog$" => "http://barryp.org:81/blog/",
               )

#
# The PyBlosxom Blog, lives under the "/blog/" url namespace
#
$HTTP["url"] =~ "^/blog/" {
    #
    # Static resources served from the disk
    #
    $HTTP["url"] =~ "^/blog/static/" {
        alias.url = ("/blog/static/" => "/data/blog/static/")
    }

    #
    # Everything non-static goes through SCGI
    #
    $HTTP["url"] !~ "^/blog/static/" {
        scgi.server = ( "/blog" => (
                                     (
                                     "host" => "127.0.0.1",
                                     "port" => 8040,
                                     "check-local" => "disable",
                                     )
                                   )
        )
    }
}

FastCGI, SCGI, and Apache: Background and Future

Ran across Mark Mayo's blog entry: FastCGI, SCGI, and Apache: Background and Future, which discusses exactly the things I've been struggling with this weekend. I have to agree that sticking an interpreter like Python directly into Apache is a lot of trouble. I've delved into the Apache source code, and the mass of macros and #ifdefs is enough to send you running away screaming. To try and graft Python onto that is just begging for trouble - and I've had some experience myself with grafting interpreters onto other things.

Running your webcode in separate processes just makes a lot of sense. You have much more freedom with choice of language and version of language. You can easily run things under different userids, chrooted, in jails/zones, on completely separate machines, completely separate OSes, maybe within virtual machines running different OSes on the same hardware.

Anyhow, thought I'd mention this because Mark's writeup made a lot of sense to me and I thought it was worth keeping a link to it.

Running a SCGI server under daemontools

Yesterday, I was working on Running PyBlosxom through SCGI, but during that time I was running the SCGI server by hand in a console window. Once it was working, I needed a way to run it in a more permanent fashion. Daemontools seemed like an easy way to set this up, and I already had it running on my server.

Daemontools runs a process called svscan that looks for directories in /var/service (the default when installed through the FreeBSD port) that contain an executable named run. If svscan also finds a log/run executable in that directory, it starts that too and ties the two together with a pipe. Daemontools includes a multilog program that reads from that pipe (stdin), and writes out and rotates log files for you automatically.

To get PyBlosxom/SCGI running under this, I started by making a temporary directory and copying in the three files needed to run PyBlosxom through SCGI:

mkdir /tmp/pyblosxom
cd /tmp/pyblosxom

cp ~/config.py .
cp ~/wsgi_app.py .
cp ~/scgi_server.py .

(The first two files come from the PyBlosxom distribution, with the first one customized. The third file is the one I came up with yesterday.)

Next, I came up with a run script to execute the SCGI server under the www userid, with stderr tied to stdout. Daemontools has a setuidgid program that makes this pretty easy:

#!/bin/sh
exec 2>&1
exec setuidgid www ./scgi_server.py

Next, I made a log subdirectory, with a log/main subdirectory inside it to hold the actual log files (owned by www):

mkdir log
mkdir log/main
chown www:www log/main

And in the log directory, I put another tiny run script:

#!/bin/sh
exec setuidgid www multilog t ./main

Finally, I made both run scripts executable and moved the whole thing into /var/service:

chmod +x run
chmod +x log/run
cd ..
mv pyblosxom /var/service

svscan sees the new directory within a few seconds, starts up both run scripts automatically, and you're in business. See the current contents of the log with:

cat /var/service/pyblosxom/log/main/current

Stop and restart the server with the Daemontools svc utility:

cd /var/service
svc -d pyblosxom ; svc -u pyblosxom

Running PyBlosxom through SCGI

Out of curiosity, I ran the Apache benchmark program ab against the plain CGI installation of PyBlosxom on my little server (-n 100 -c 10), and got around 1.5 requests/second. So I decided to give SCGI a try, and got some better results.

I went about this based on what I had read in Deploying TurboGears with Lighttpd and SCGI. I tried Lighttpd at first, and it mostly worked, but I've got an Apache setup right now, so I wanted to stick with that for the moment (and it seems a bit quicker anyhow). I started by installing flup with easy_install:

easy_install flup

I copied the config.py and wsgi_app.py files from the PyBlosxom distribution into a directory, and added this little script to that same directory:

#!/usr/bin/env python
import sys
from flup.server.scgi_fork import WSGIServer
from wsgi_app import application

server = WSGIServer(application,
                    scriptName='/blog',
                    bindAddress=('127.0.0.1', 8040))
ret = server.run()
sys.exit(ret and 42 or 0)

I installed mod_scgi built for Apache 2, and added two lines to the config:

LoadModule scgi_module libexec/apache2/mod_scgi.so

SCGIMount /blog 127.0.0.1:8040

Notice how the scriptName and bindAddress parameters in the Python code are matched by the SCGIMount directive in the Apache config. With this setup, running the same ab benchmark yields about 10 to 15 requests/second - not too bad. Running the threaded SCGI server (remove the _fork from the first import line) wasn't as good, only 3 to 8 requests/second.

The setup seems a bit shaky, in that the benchmark numbers keep decreasing with every run, especially in threaded mode. So there may be some problems in my setup, or in flup/scgi/pyblosxom_wsgi.

Even if it were working fine, SCGI is probably overkill for running PyBlosxom when you're not expecting a lot of traffic - and if you were, you'd probably run it with --static to generate static pages. But it was a reasonable thing to fool with, for the day when you want to run a more dynamic WSGI app.