Fun with ones and zeros - scgi
Barry's notes on computer software and hardware
2024-03-29T08:32:13-07:00

Running Python WSGI apps with SCGI and inetd
2011-09-18T06:00:00-07:00
Barry Pederson, bp@barryp.org
<body><p><em>Using <a href="https://github.com/barryp/scgi-inetd-wsgi">scgi-inetd-wsgi</a></em></p>
<p>Previously, I wrote about running <a href="/blog/entries/cgi-scripts-nginx-using-scgi/">CGI Scripts with Nginx using SCGI</a>
with the help of a super-server such as <code>inetd</code> and a small C shim that
takes a SCGI request from stdin and sets up a CGI environment.</p>
<p>There's also a companion project <a href="https://github.com/barryp/scgi-inetd-wsgi">on GitHub</a> for doing something
similar with Python WSGI apps. The code works on Python 2.6 or higher
(including Python 3.x). <em>It can easily be patched for Python 2.5 or lower
with a simple string substitution mentioned in the source file.</em></p>
<p>It's not something you'd want to run a frequently-accessed app with,
because there'd be quite a bit of overhead launching a Python process to
handle each request. It may be useful, however, for infrequently used
apps where you don't want to have to keep and monitor a long-running
process, or for development of a WSGI app where you don't want to have
to stop/start a process every time you make a change.</p>
<p>Let's take a look at a diagram to see what the flow will be:</p>
<p><img src="/assets/10/23/scgi_wsgi.png" alt="SCGI<->WSGI"></p>
<ol>
<li>Nginx opens a socket listened to by inetd</li>
<li>inetd spawns a Python script with stdin and stdout connected to the
accepted connection</li>
<li>The Python script imports <code>inetd_scgi</code> and calls its <code>run_app</code> function,
passing a WSGI app to actually handle the request. <code>run_app</code> reads
the SCGI request from stdin, sets up a WSGI environment, calls the handler, and
sends the handler's response back to Nginx via stdout.</li>
</ol>
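<p>To make step 3 more concrete, here is a minimal sketch of what a function like <code>run_app</code> has to do. It is not the actual <code>inetd_scgi</code> code: the names <code>read_scgi_headers</code> and <code>serve_one_request</code> are made up for illustration, the response is buffered in full, and the optional WSGI <code>write()</code> callable is left out.</p>

```python
import io
import sys


def read_scgi_headers(stream):
    """Parse the netstring-framed SCGI header block into a dict."""
    digits = b""
    while True:
        ch = stream.read(1)
        if ch == b":":
            break
        if not ch.isdigit():
            raise ValueError("malformed netstring length")
        digits += ch
    block = stream.read(int(digits))
    if stream.read(1) != b",":
        raise ValueError("netstring is missing its trailing comma")
    fields = block.split(b"\0")[:-1]  # the block ends with a NUL
    return dict(
        (k.decode("latin-1"), v.decode("latin-1"))
        for k, v in zip(fields[::2], fields[1::2])
    )


def serve_one_request(app, stdin, stdout):
    """Hypothetical stand-in for run_app: handle exactly one SCGI request."""
    environ = read_scgi_headers(stdin)
    environ.update({
        "wsgi.version": (1, 0),
        "wsgi.url_scheme": "http",  # assumption: no HTTPS detection here
        "wsgi.input": stdin,        # the request body follows the headers
        "wsgi.errors": sys.stderr,
        "wsgi.multithread": False,
        "wsgi.multiprocess": True,
        "wsgi.run_once": True,      # one process per request under inetd
    })
    response = {}

    def start_response(status, headers, exc_info=None):
        # Simplification: the write() callable from PEP 3333 is not provided.
        response["status"] = status
        response["headers"] = headers

    result = app(environ, start_response)
    try:
        body = b"".join(result)
    finally:
        if hasattr(result, "close"):
            result.close()

    # SCGI responses look like CGI responses, led by a Status line.
    lines = ["Status: " + response["status"]]
    lines += ["%s: %s" % header for header in response["headers"]]
    stdout.write(("\r\n".join(lines) + "\r\n\r\n").encode("latin-1"))
    stdout.write(body)
    stdout.flush()
```

<p>Under Python 3 a script spawned by inetd could call this as <code>serve_one_request(app, sys.stdin.buffer, sys.stdout.buffer)</code>, since both ends of the socket carry bytes.</p>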
<p>Here's how you'd wire up the <code>Hello World</code> app from <a href="http://www.python.org/dev/peps/pep-3333/">PEP 3333</a>:</p>
<div class="source"><pre><span></span><span class="ch">#!/usr/bin/env python</span>
<span class="n">HELLO_WORLD</span> <span class="o">=</span> <span class="sa">b</span><span class="s2">"Hello world!</span><span class="se">\n</span><span class="s2">"</span>
<span class="k">def</span> <span class="nf">simple_app</span><span class="p">(</span><span class="n">environ</span><span class="p">,</span> <span class="n">start_response</span><span class="p">):</span>
    <span class="sd">"""Simplest possible application object"""</span>
    <span class="n">status</span> <span class="o">=</span> <span class="s1">'200 OK'</span>
    <span class="n">response_headers</span> <span class="o">=</span> <span class="p">[(</span><span class="s1">'Content-type'</span><span class="p">,</span> <span class="s1">'text/plain'</span><span class="p">)]</span>
    <span class="n">start_response</span><span class="p">(</span><span class="n">status</span><span class="p">,</span> <span class="n">response_headers</span><span class="p">)</span>
    <span class="k">return</span> <span class="p">[</span><span class="n">HELLO_WORLD</span><span class="p">]</span>
<span class="k">if</span> <span class="vm">__name__</span> <span class="o">==</span> <span class="s1">'__main__'</span><span class="p">:</span>
    <span class="kn">import</span> <span class="nn">inetd_scgi</span>
    <span class="n">inetd_scgi</span><span class="o">.</span><span class="n">run_app</span><span class="p">(</span><span class="n">simple_app</span><span class="p">)</span>
</pre></div>
<p>If you had saved that script as, say, <code>/local/test.py</code>, you might add this to
<code>/etc/inetd.conf</code> to serve it up:</p>
<pre><code>:www:www:200:/var/run/test.sock stream unix nowait/4 www /local/test.py /local/test.py</code></pre>
<p>and in Nginx with:</p>
<pre><code>location /test {
    scgi_pass unix:/var/run/test.sock;
    include /usr/local/etc/nginx/scgi_params;
    fastcgi_split_path_info ^(/test)(.*);
    scgi_param SCRIPT_NAME $fastcgi_script_name;
    scgi_param PATH_INFO $fastcgi_path_info;
}</code></pre>
<p>Then, accessing <a href="http://localhost/test">http://localhost/test</a> should show 'Hello world!'</p></body>

AWStats under Nginx and SCGI
2011-09-14T06:00:00-07:00
Barry Pederson, bp@barryp.org
<body><p>Earlier, I wrote about running <a href="/blog/entries/cgi-scripts-nginx-using-scgi">CGI Scripts with Nginx using SCGI</a>
with the help of a small C shim. One particular CGI app I've had to
alter slightly to work under this setup is <a href="https://awstats.sourceforge.io/">AWStats</a>, which
is a decent-sized Perl app, but only requires one line added to satisfy
SCGI's requirement of a <code>Status</code> line at the beginning of a response.</p>
<p>Here's a patch to AWStats 7.0:</p>
<div class="source"><pre><span></span><span class="gd">--- awstats.pl.original 2011-09-11 21:20:40.954555528 -0500</span>
<span class="gi">+++ awstats.pl 2011-03-31 00:19:35.867343845 -0500</span>
<span class="gu">@@ -750,6 +750,7 @@</span>
#------------------------------------------------------------------------------
sub http_head {
if ( !$HeaderHTTPSent ) {
<span class="gi">+ print "Status: 200 OK\n";</span>
my $newpagecode = $PageCode ? $PageCode : "utf-8";
if ( $BuildReportFormat eq 'xhtml' || $BuildReportFormat eq 'xml' ) {
print( $ENV{'HTTP_USER_AGENT'} =~ /MSIE|Googlebot/i
</pre></div></body>

CGI Scripts with Nginx using SCGI
2011-09-11T15:37:00-07:00
Barry Pederson, bp@barryp.org
<body><p><em>Using <a href="https://github.com/barryp/scgi-inetd-cgi">scgi_run</a> with Nginx</em></p>
<p><a href="https://www.nginx.com/resources/wiki/">Nginx</a> is a great web server, but one thing it doesn't support
is <a href="https://en.wikipedia.org/wiki/Common_Gateway_Interface">CGI scripts</a>. Not all webapps need to be high-performance
setups capable of hundreds or thousands of requests per second. Sometimes
you just want something capable of handling a few requests now and then,
and don't want to keep a long-running process going all the time just
for that one webapp. How do you handle something like that under Nginx?</p>
<p>Well, it turns out you're going to have to have <em>something</em> running as a
long-running external process to help Nginx out (because Nginx can't
spawn processes itself). It just doesn't have to be dedicated to any
one particular webapp. One way to go would be to set up <em>another</em>
webserver that <em>can</em> do CGI scripts, and have Nginx proxy to that when
need be.</p>
<p>Apache is one possibility, something like this:</p>
<p><img src="/assets/9/21/nginx_apache.png" alt="Nginx <-> Apache"></p>
<p>But Apache's a fairly big program, with lots of features and a potentially
complicated configuration. That kind of defeats the purpose of going to a
lighter-weight program like Nginx. What else can we do?</p>
<h2>Super-servers</h2>
<p>Many Unix-type systems will have a <em><a href="https://en.wikipedia.org/wiki/Super-server">super-server</a></em>
available to launch daemons as need be when some network connection is
made. On BSD boxes it's typically <code>inetd</code>, Mac OS X has <code>launchd</code>, and Linux
distros often have <code>xinetd</code> or other choices available.</p>
<p>If we already have a super-server running on our box, why not set up
Nginx to connect to that, and let the super-server take care of launching
our CGI script? We just need one extra piece of the puzzle: something to
read a web request over the socket Nginx opened up, set up the CGI
environment, and execute the script.</p>
<p>Wait, that sounds like a web server - aren't we back to something like
Apache again? No, it doesn't have to be anything nearly that
complicated if we were to use the <a href="https://en.wikipedia.org/wiki/Simple_Common_Gateway_Interface">SCGI</a> protocol, instead of HTTP.</p>
<h2>SCGI</h2>
<p>SCGI is a very simple protocol that's <a href="http://nginx.org/en/docs/http/ngx_http_scgi_module.html">supported by Nginx</a>
and many other webservers. It's much, much simpler than FastCGI, and
maps pretty closely to the CGI specification, with one minor difference
to note...</p>
<p>In the <a href="https://www.ietf.org/rfc/rfc3875.txt">CGI RFC</a>, the response may contain an optional Status line,
as in:</p>
<pre><code>Status: 200 OK</code></pre>
<p>In the SCGI protocol, the <code>Status</code> line is required, not optional. </p>
<p><em>Nginx will function with the <code>Status</code> line missing, but there'll be
warnings in your error log.</em></p>
<p>If you can alter your CGI scripts to include a <code>Status</code> line, or live with
warnings in logs, we have a way forward now.</p>
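<p>If a script can't easily be edited, its output could instead be passed through a small filter. The sketch below is only an illustration of the idea: <code>ensure_status_line</code> is a hypothetical helper, it assumes the whole response is already in memory as bytes, and it only checks whether a <code>Status</code> header is present somewhere in the header block.</p>

```python
def ensure_status_line(response, default=b"200 OK"):
    """Return the CGI response with a Status header prepended if it has none."""
    for line in response.splitlines():
        if not line.strip():
            break  # blank line: the header block has ended
        if line.lower().startswith(b"status:"):
            return response  # the script already supplied a Status header
    return b"Status: " + default + b"\n" + response
```

<p>For example, <code>ensure_status_line(b"Content-Type: text/plain\n\nhi")</code> gains a leading <code>Status: 200 OK</code> line, while a response that already carries its own <code>Status</code> header is returned unchanged.</p>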
<h2>scgi_run</h2>
<p>I've got a C <a href="https://github.com/barryp/scgi-inetd-cgi">project on GitHub</a> that implements this small piece of
glue to turn a SCGI request into a CGI environment. The binary weighs in at
around 8 to 12 kilobytes after being stripped.</p>
<p>Basically, we're looking at a flow like this:</p>
<p><img src="/assets/9/22/nginx_scgi.png" alt="Nginx <-> SCGI"></p>
<ol>
<li>Nginx connects to a socket listened to by inetd</li>
<li>inetd spawns <code>scgi_run</code>, with stdin and stdout wired to the accepted
connection</li>
<li><code>scgi_run</code> reads SCGI request headers from stdin and sets up a CGI environment</li>
<li><code>scgi_run</code> execs CGI script (stdin and stdout are still connected to the
socket to Nginx)</li>
<li>CGI script reads request body if necessary from stdin and writes
response out through stdout.</li>
</ol>
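<p>Steps 3 and 4 can be sketched in Python to show the idea. This is not a translation of the C code: <code>read_exact</code>, <code>read_scgi_env</code> and <code>exec_cgi</code> are names invented here, and replacing the whole environment with the SCGI headers is an assumption. The point worth noticing is that the headers are read straight from the file descriptor without buffering, so the request body stays in the socket for the CGI script to read after the exec.</p>

```python
import os


def read_exact(fd, n):
    """Read exactly n bytes from a file descriptor, never more."""
    data = b""
    while len(data) < n:
        chunk = os.read(fd, n - len(data))
        if not chunk:
            raise EOFError("connection closed mid-request")
        data += chunk
    return data


def read_scgi_env(fd):
    """Parse '<length>:<name>NUL<value>NUL...,' into a dict of strings."""
    digits = b""
    while True:
        ch = read_exact(fd, 1)
        if ch == b":":
            break
        if not ch.isdigit():
            raise ValueError("malformed netstring length")
        digits += ch
    block = read_exact(fd, int(digits))
    if read_exact(fd, 1) != b",":
        raise ValueError("netstring is missing its trailing comma")
    fields = block.split(b"\0")[:-1]  # the block ends with a NUL
    return dict(
        (k.decode("latin-1"), v.decode("latin-1"))
        for k, v in zip(fields[::2], fields[1::2])
    )


def exec_cgi(script_path, fd=0):
    """Replace this process with the CGI script; fds 0 and 1 stay on the socket."""
    env = read_scgi_env(fd)
    # Assumption: the script sees only the SCGI headers as its environment.
    os.execve(script_path, [script_path], env)
```

<p>Because <code>os.execve</code> never returns on success, nothing of the helper is left running once the script starts, which matches the "no extra process" property of the flow above.</p>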
<p>A couple of things to note here:</p>
<ul>
<li>When we get to the final step, the CGI script is talking directly to
Nginx - there's no buffering by any other applications like there would be
in an Apache setup.</li>
<li><code>scgi_run</code> is no longer executing; it exec'd the CGI script, so there's
not another process hanging around waiting on anything.</li>
<li>A super-server like inetd can typically be configured to run the handler
under any userid you want, so you basically get SUEXEC-type functionality for
free here.</li>
</ul>
<p>The <code>scgi_run</code> code on GitHub operates in two modes:</p>
<ol>
<li>If argv[1] ends with a slash <code>/</code>, then argv[1] is taken to be a directory
name, and the program will look for the <code>SCRIPT_FILENAME</code> passed by Nginx
in that directory.</li>
<li>Otherwise, argv[1] is taken as the path to a specific CGI script
(so <code>SCRIPT_FILENAME</code> is ignored), and any additional arguments are passed
on to the CGI script.</li>
</ol>
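<p>The two modes might be sketched like this in Python. This is one plausible reading rather than the actual C logic: <code>resolve_script</code> is a hypothetical helper, and the check that <code>SCRIPT_FILENAME</code> must sit inside the configured directory is an assumption about what "look for it in that directory" means. The comparison here is purely lexical and does not resolve symlinks.</p>

```python
import os


def resolve_script(argv, scgi_env):
    """Return (script_path, extra_args) for directory mode or single-script mode."""
    target = argv[1]
    if target.endswith("/"):
        # Directory mode: the script is named by SCRIPT_FILENAME.
        base = os.path.normpath(target)
        script = os.path.normpath(scgi_env["SCRIPT_FILENAME"])
        # Assumption: refuse anything that is not strictly inside the directory.
        if script == base or os.path.commonpath([base, script]) != base:
            raise ValueError("SCRIPT_FILENAME is outside " + target)
        return script, []
    # Single-script mode: SCRIPT_FILENAME is ignored, extra args are passed on.
    return target, list(argv[2:])
```

<p>With <code>argv[1]</code> set to <code>/local/cgi-bin/</code> and a <code>SCRIPT_FILENAME</code> of <code>/local/cgi-bin/hello.sh</code>, the helper picks that script; with <code>argv[1]</code> set to a file path, it returns that path plus whatever arguments follow.</p>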
<h2>Configuration</h2>
<p>A simple setup looks something like this, assuming you've compiled <code>scgi_run</code> and have
the binary stored as <code>/local/scgi_run</code>.</p>
<p>For FreeBSD <code>inetd</code>, for example, you might add a line to <code>/etc/inetd.conf</code> like this:</p>
<pre><code>:www:www:600:/var/run/scgi_localcgi.sock stream unix nowait/16 www /local/scgi_run /local/scgi_run /local/cgi-bin/</code></pre>
<p>This causes <code>inetd</code> to listen on a Unix socket named
<code>/var/run/scgi_localcgi.sock</code>, and when a connection is made, it spawns
<code>/local/scgi_run</code> with argv[0] set to <code>/local/scgi_run</code> and argv[1] set
to <code>/local/cgi-bin/</code>. As a bonus, the socket ownership is set to www:www
and its mode to 0600, which limits who can connect to it.</p>
<p>In Nginx, you might have something like:</p>
<pre><code>location /local-cgi/ {
    alias /local/cgi-bin/;
    scgi_pass unix:/var/run/scgi_localcgi.sock;
    include /usr/local/etc/nginx/scgi_params;
    scgi_param SCRIPT_NAME $fastcgi_script_name;
    scgi_param PATH_INFO $fastcgi_path_info;
    scgi_param SCRIPT_FILENAME $request_filename;
}</code></pre>
<p>And then for a simple script, you might have <code>/local/cgi-bin/hello.sh</code> as:</p>
<div class="source"><pre><span></span><span class="ch">#!/bin/sh</span>
<span class="nb">echo</span> <span class="s2">"Status: 200 OK"</span>
<span class="nb">echo</span> <span class="s2">"Content-Type: text/plain"</span>
<span class="nb">echo</span> <span class="s2">""</span>
<span class="nb">echo</span> <span class="s2">"Hello World"</span>
</pre></div>
<p>You would run that by hitting <code>http://localhost/local-cgi/hello.sh</code>.</p>
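<p>To poke at the inetd side without going through Nginx, a small SCGI client can be handy. The following is a rough sketch using only the standard library; <code>build_scgi_request</code> and <code>scgi_fetch</code> are illustrative names, and which headers a given script needs beyond <code>CONTENT_LENGTH</code> and <code>SCGI</code> depends on the script.</p>

```python
import socket


def build_scgi_request(headers, body=b""):
    """Encode a dict of headers plus a body as a single SCGI request."""
    # The protocol wants CONTENT_LENGTH first and an SCGI header set to "1".
    pairs = [("CONTENT_LENGTH", str(len(body))), ("SCGI", "1")]
    pairs += [(k, v) for k, v in headers.items()
              if k not in ("CONTENT_LENGTH", "SCGI")]
    block = b"".join(
        k.encode("latin-1") + b"\0" + v.encode("latin-1") + b"\0"
        for k, v in pairs
    )
    # Headers travel as a netstring: "<length>:<bytes>,"
    return str(len(block)).encode("ascii") + b":" + block + b"," + body


def scgi_fetch(sock_path, headers, body=b""):
    """Send one request to a Unix socket and return the raw response bytes."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        s.sendall(build_scgi_request(headers, body))
        chunks = []
        while True:
            data = s.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

<p>With the configuration above, something like <code>scgi_fetch("/var/run/scgi_localcgi.sock", {"REQUEST_METHOD": "GET", "SCRIPT_FILENAME": "/local/cgi-bin/hello.sh"})</code> should return the script's raw output, provided the calling user is allowed to connect to the socket.</p>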
<h2>Conclusion</h2>
<p>So, with the help of a tiny binary (8 to 12 KB) and a super-server like
<code>inetd</code>, Nginx (or any other SCGI client) can execute CGI scripts
(keeping in mind, though, the requirement for the <code>Status</code> line). It's a
fairly lightweight solution that may also be useful in embedded situations.</p>
<p>Enjoy, and <a href="/">go buy some hard drives</a> to store your CGI scripts on, I hear
SSDs are very nice. :)</p></body>

mod_scgi redirection
2007-01-11T10:39:38-08:00
Barry Pederson, bp@barryp.org
<p>While working on a new Django project, I noticed something odd about running it under mod_scgi: if you were POSTing to a URL, <code>/foo</code> for example, and the view for that URL did a relative redirect, as in <code>django.http.HttpResponseRedirect('/bar')</code>, the 302 redirect wasn't making it back to the browser. Instead, the browser was acting as if the result of <code>POST /foo</code> was a <code>200 OK</code> followed by the data you'd receive from <code>GET /bar</code>, without the browser knowing that it was coming from a new location. The big drawback to this is that if you do a reload, the browser tries to POST to <code>/foo</code> again, instead of just <code>GET /bar</code>. The Django docs recommend always responding to POSTs with redirects, for just this reason.
</p>
<p>Strictly speaking, redirects <em>should</em> be absolute URLs (see section <a href="http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html">14.30</a> in the HTTP specs), and if you use one of those, it acts as expected. Django is full of relative redirects, but the framework at this time doesn't seem to try and convert them to absolute. There is ticket <a href="http://code.djangoproject.com/ticket/987">#987</a> in the Django Trac that talks about this a bit. <br />
</p>
<p>Browsers seem to handle relative redirects OK though, and that behavior doesn't occur with the Django test HTTP server. Having mod_scgi conceal what Django is doing is not so good.
</p>
<p>Digging into the mod_scgi source code in <code>apache2/mod_scgi.c</code> reveals a section of code that's causing this change:
</p>
<div class="source"><pre><span class="n">location</span> <span class="o">=</span> <span class="n">apr_table_get</span><span class="p">(</span><span class="n">r</span><span class="o">-></span><span class="n">headers_out</span><span class="p">,</span> <span class="s">"Location"</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">location</span> <span class="o">&&</span> <span class="n">location</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">==</span> <span class="sc">'/'</span> <span class="o">&&</span>
    <span class="p">((</span><span class="n">r</span><span class="o">-></span><span class="n">status</span> <span class="o">==</span> <span class="n">HTTP_OK</span><span class="p">)</span> <span class="o">||</span> <span class="n">ap_is_HTTP_REDIRECT</span><span class="p">(</span><span class="n">r</span><span class="o">-></span><span class="n">status</span><span class="p">)))</span> <span class="p">{</span>
    <span class="n">apr_brigade_destroy</span><span class="p">(</span><span class="n">bb</span><span class="p">);</span>
    <span class="c">/* Internal redirect -- fake-up a pseudo-request */</span>
    <span class="n">r</span><span class="o">-></span><span class="n">status</span> <span class="o">=</span> <span class="n">HTTP_OK</span><span class="p">;</span>
    <span class="c">/* This redirect needs to be a GET no matter what the original</span>
    <span class="c"> * method was.</span>
    <span class="c"> */</span>
    <span class="n">r</span><span class="o">-></span><span class="n">method</span> <span class="o">=</span> <span class="n">apr_pstrdup</span><span class="p">(</span><span class="n">r</span><span class="o">-></span><span class="n">pool</span><span class="p">,</span> <span class="s">"GET"</span><span class="p">);</span>
    <span class="n">r</span><span class="o">-></span><span class="n">method_number</span> <span class="o">=</span> <span class="n">M_GET</span><span class="p">;</span>
    <span class="n">ap_internal_redirect_handler</span><span class="p">(</span><span class="n">location</span><span class="p">,</span> <span class="n">r</span><span class="p">);</span>
    <span class="k">return</span> <span class="n">OK</span><span class="p">;</span>
<span class="p">}</span>
</pre></div>
<p>Tossing that section of code causes mod_scgi to leave the relative redirects alone.
</p>
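<p>For anyone who would rather not patch mod_scgi, the other route is to hand it absolute URLs, since the check above only fires when <code>Location</code> starts with a slash. A minimal sketch of that conversion, using only the standard library, might look like this (<code>absolute_location</code> is a hypothetical helper, and how you obtain the scheme, host and path depends on your framework):</p>

```python
from urllib.parse import urljoin


def absolute_location(scheme, host, request_path, location):
    """Resolve a possibly relative Location value against the request URL."""
    request_url = "%s://%s%s" % (scheme, host, request_path)
    # urljoin leaves an already absolute location untouched.
    return urljoin(request_url, location)
```

<p>With a request for <code>/foo</code> on <code>example.com</code>, a redirect to <code>/bar</code> becomes <code>http://example.com/bar</code>, which no longer begins with a slash and so is passed through to the browser as a normal 302.</p>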

Django, SCGI, and AJP
2006-11-20T18:33:44-08:00
Barry Pederson, bp@barryp.org
<p>I've been doing a lot with Django lately, and initially set it up using mod_python as the Django docs recommend, but still have some <a href="/blog/entries/backend_protocols/">reservations</a> about that kind of arrangement. I'd like to go back to running it under SCGI or something similar. <br />
</p>
<p>Django has built-in support for FastCGI, but after trying to install mod_fastcgi in my Apache 2.0.x setup, I decided it was a PITA. mod_scgi is quite easy to set up in Apache (even though the documentation is mostly nonexistent). After finding where Django implements its FastCGI support using the <a href="http://www.saddi.com/software/flup/">flup</a> module, I saw that with just a few minor tweaks Django could be made to support all of flup's protocols, including SCGI and AJP (Apache JServ Protocol).
</p>
<p>AJP turns out to be very interesting because it's included standard with Apache 2.2 as <a href="http://httpd.apache.org/docs/2.2/mod/mod_proxy_ajp.html">mod_proxy_ajp</a>, and can work with <a href="http://httpd.apache.org/docs/2.2/mod/mod_proxy_balancer.html">mod_proxy_balancer</a> - meaning you could set up multiple Django instances and have Apache share the load between them.
</p>
<p>After testing a bit, I <a href="http://code.djangoproject.com/ticket/3047">submitted a patch</a>, and will probably switch to running my Django sites as AJP servers managed by daemontools and fronted by Apache 2.2.
</p>