Going to PyCon 2007

I've got my plane ticket, hotel reservation and conference registration for PyCon 2007 all lined up, so I'll be headed for Texas in 6 weeks.

mod_scgi redirection

While working on a new Django project, I noticed something odd about running it under mod_scgi: if you were POSTing to a URL, /foo for example, and the view for that URL did a relative redirect, as in django.http.HttpResponseRedirect('/bar'), the 302 redirect wasn't making it back to the browser. Instead, the browser was acting as if the result of POST /foo was a 200 OK followed by the data you'd receive from GET /bar, without the browser knowing that it was coming from a new location. The big drawback to this is that if you do a reload, the browser tries to POST to /foo again, instead of just doing GET /bar. The Django docs recommend always responding to POSTs with redirects, for just this reason.

Strictly speaking, redirects should be absolute URLs (see section 14.30 of the HTTP/1.1 spec, RFC 2616), and if you use one of those, it acts as expected. Django is full of relative redirects, but the framework at this time doesn't seem to try to convert them to absolute ones. There is ticket #987 in the Django Trac that talks about this a bit.
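Turning a relative redirect target into an absolute one is mostly a matter of resolving it against the URL of the request being answered. A minimal sketch using Python's standard urljoin; the function name absolute_location and the example.com URLs are made up for illustration and are not part of Django:

```python
from urllib.parse import urljoin


def absolute_location(request_url, location):
    """Resolve a possibly-relative Location value against the request URL."""
    # urljoin leaves an already-absolute URL untouched and resolves
    # path-only values such as '/bar' against request_url's scheme and host.
    return urljoin(request_url, location)


if __name__ == "__main__":
    print(absolute_location("http://example.com/foo", "/bar"))
```

With something like this applied before the header is sent, the Location value would no longer start with a slash.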

Browsers seem to handle relative redirects OK though, and the swallowed-redirect behavior doesn't occur with the Django test HTTP server. Having mod_scgi conceal what Django is doing is not so good.

Digging into the mod_scgi source code (apache2/mod_scgi.c) reveals the section of code that's causing this change:

location = apr_table_get(r->headers_out, "Location");

if (location && location[0] == '/' &&
    ((r->status == HTTP_OK) || ap_is_HTTP_REDIRECT(r->status))) {

    apr_brigade_destroy(bb);

    /* Internal redirect -- fake-up a pseudo-request */
    r->status = HTTP_OK;

    /* This redirect needs to be a GET no matter what the original
    * method was.
    */
    r->method = apr_pstrdup(r->pool, "GET");
    r->method_number = M_GET;

    ap_internal_redirect_handler(location, r);
    return OK;
}

Tossing that section of code causes mod_scgi to leave the relative redirects alone.
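To make it easier to see which responses get caught by that block, here is the same condition restated as a small Python predicate. It is only a model of the C test above, not code from mod_scgi; the function name is made up, and the 300-399 range is what Apache's ap_is_HTTP_REDIRECT macro checks:

```python
def is_internal_redirect(status, location):
    """Model the condition mod_scgi uses to turn a response into an internal redirect."""
    # location && location[0] == '/'
    starts_with_slash = bool(location) and location[0] == "/"
    # r->status == HTTP_OK
    is_ok = status == 200
    # ap_is_HTTP_REDIRECT(r->status): any 3xx status
    is_redirect = 300 <= status < 400
    return starts_with_slash and (is_ok or is_redirect)


if __name__ == "__main__":
    print(is_internal_redirect(302, "/bar"))                    # relative redirect
    print(is_internal_redirect(302, "http://example.com/bar"))  # absolute redirect
```

A 302 with Location: /bar matches, which is why the browser never sees it, while an absolute URL falls through untouched.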

Full Text Searching with SQLite

I'd like to add a search feature back on this site. Previously, I had an arrangement set up with PyBlosxom and ht://Dig, but now that it's Django-powered, I'd like to do something that works directly from the database instead of crawling the site like ht://Dig did.

After looking a while at various text search engines, I remembered seeing that SQLite just added an FTS1 module in version 3.3.8, which sounds pretty easy to use. Unfortunately the FreeBSD port databases/sqlite3 doesn't build with that feature.

After poking around a bit, I got it to build with FTS manually, and after a whole bunch more messing around, came up with a patch to add an option to the port to build sqlite3 with FTS. The patch has been submitted to the FreeBSD bug tracker as ports/106281. Hopefully I have all my ducks lined up on that.


Anyhow, in testing the FTS a bit, I found one thing they only hint at on the SQLite website. There's a Porter stemmer built in, even though the wiki says: "The module does not perform stemming of any sort." You activate it by adding tokenize porter to the table declaration, for example (adapted from their example):

CREATE VIRTUAL TABLE recipe USING FTS1(name, ingredients, tokenize porter);

once you've done that, and inserted some sample data:

INSERT INTO recipe VALUES('broccoli stew', 'broccoli peppers cheese tomatoes');

the searches don't have to be quite as exact, for example:

SELECT name FROM recipe WHERE ingredients MATCH 'pepper';

hits the 'broccoli stew' recipe even though it has 'peppers' and you searched for 'pepper'.

Not sure why the Porter stemmer isn't documented in the SQLite wiki; perhaps it's still a work in progress or being changed for FTS2.
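The same experiment can be run from Python's sqlite3 module. This is a sketch under a few assumptions: it uses the later FTS5 module (falling back to FTS4) as a stand-in for FTS1, so the tokenize syntax differs slightly from the declaration above, and it assumes the SQLite library behind Python was built with one of those modules. The helper names are made up for illustration:

```python
import sqlite3


def build_index(conn):
    """Create a Porter-stemmed full-text table and load the sample recipe."""
    try:
        # FTS5 spelling of the tokenizer option
        conn.execute(
            "CREATE VIRTUAL TABLE recipe USING fts5(name, ingredients, tokenize='porter')"
        )
    except sqlite3.OperationalError:
        # Older builds without FTS5: FTS4 spelling of the same option
        conn.execute(
            "CREATE VIRTUAL TABLE recipe USING fts4(name, ingredients, tokenize=porter)"
        )
    conn.execute(
        "INSERT INTO recipe (name, ingredients) VALUES (?, ?)",
        ("broccoli stew", "broccoli peppers cheese tomatoes"),
    )


def search_ingredients(conn, word):
    """Return recipe names whose ingredients column matches the (stemmed) word."""
    rows = conn.execute(
        "SELECT name FROM recipe WHERE recipe MATCH ?", ("ingredients:" + word,)
    )
    return [name for (name,) in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_index(conn)
    print(search_ingredients(conn, "pepper"))
```

The ingredients: prefix restricts the match to that column, which plays the same role as putting the column name on the left of MATCH in the FTS1 query.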

Django, SCGI, and AJP

I've been doing a lot with Django lately, and initially set it up using mod_python as the Django docs recommend, but still have some reservations about that kind of arrangement. I'd like to go back to running it under SCGI or something similar.

Django has support built in for FastCGI, but after trying to install mod_fastcgi in my Apache 2.0.x setup, I decided it was a PITA. mod_scgi is quite easy to set up in Apache (even though the documentation is mostly nonexistent). After finding where Django implements its FastCGI support using the flup module, I saw that with just a few minor tweaks Django could be made to support all of flup's protocols, including SCGI and AJP (Apache JServ Protocol).

AJP turns out to be very interesting because it's included standard with Apache 2.2 as mod_proxy_ajp, and can work with mod_proxy_balancer - meaning you could set up multiple Django instances and have Apache share the load between them.

After testing a bit, I submitted a patch, and will probably switch to running my Django sites as AJP servers managed by daemontools and fronted by Apache 2.2.

Expy Update

I just updated Exim on my home server to 4.63, and built it with my py-exim-localscan (AKA expy) module linked to Python 2.5.

The only minor glitch was a C compile warning, probably due to better warnings in a newer version of GCC than what I had when the package was originally developed. I fixed it and bundled up a new release - mainly to show that it's not abandonware.

Fibre Channel part 2

524 bytes per sector on a harddisk? That may work for some IBM RAID array, but FreeBSD isn't going to stand for it. After some searching, it appears the thing to do is to somehow change the sector size and do a low-level format.

Turns out Seagate has a very detailed manual on these drives, and it mentions that the sector size can be between 512 and 528 bytes. The sector size setting is apparently stored on a "SCSI mode page" 3, and the FreeBSD camcontrol utility has support for viewing/editing this. Running

camcontrol modepage da0 -m 3

shows:

Tracks per Zone:  7564 
Alternate Sectors per Zone:  0
Alternate Tracks per Zone:  20
Alternate Tracks per Logical Unit:  0
Sectors per Track:  784
Data Bytes per Physical Sector:  524
Interleave:  1
Track Skew Factor:  272 
Cylinder Skew Factor:  120
SSEC:  0
HSEC:  1
RMB:  0
SURF:  0

which are the current settings, and the same command with the addition of -P 2 to view the default settings shows:

Tracks per Zone:  7564
Alternate Sectors per Zone:  0
Alternate Tracks per Zone:  20
Alternate Tracks per Logical Unit:  0
Sectors per Track:  809
Data Bytes per Physical Sector:  512
Interleave:  1
Track Skew Factor:  272
Cylinder Skew Factor:  120
SSEC:  0
HSEC:  1
RMB:  0
SURF:  0

Those settings would be ideal. Notice how the smaller sector size gives more sectors per track. How to get the drive to switch to them? Adding a -e parameter to the camcontrol command as in: camcontrol modepage da0 -m 3 -e should allow for changing that modepage, but all I got back was: camcontrol: no editable entries. Running camcontrol modepage da0 -m 3 -P 1 to see the changeable values shows:

Tracks per Zone:  0
Alternate Sectors per Zone:  0
Alternate Tracks per Zone:  0
Alternate Tracks per Logical Unit:  0
Sectors per Track:  0
Data Bytes per Physical Sector:  0
Interleave:  0
Track Skew Factor:  0
Cylinder Skew Factor:  0
SSEC:  0
HSEC:  0
RMB:  0
SURF:  0

Dang, the IBM firmware has locked out making changes to the sector size. According to the Seagate manual:

The changeable values list can only be changed by downloading new firmware into the flash E-PROM.

Oh crap, where do you get new firmware, and how would you load it? Doesn't seem that Seagate has new firmware available for easy download (like you get with motherboards and such). Doing some digging with Google turned up some potential sources from deep within Sun and Grass Valley Group websites, but for all I know they're non-standard similar to how the IBM stuff is. This was not looking good.

Some more digging turned up this thread from a FreeBSD mailing list, which suggests setting a sector-size value and immediately reformatting the disk with:

camcontrol cmd da1 -v -c "15 10 0 0 v:i1 0" 12 -o 12 "0 0 0 8  0 0:i3 0 v:i3" 512
camcontrol cmd -n da -u 1 -v -t 7200 -c "4 0 0 0 0 0"

I wasn't sure if this was applicable to my particular FC drive. Would just changing the sector size also adjust the sectors-per-track or other possibly related settings? After doing a lot of reading on the camcontrol command and various SCSI specs, I felt I had some understanding of what the above commands did, and figured it was worth a shot.
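As a rough sketch of what the first raw command appears to send, the bytes can be rebuilt in Python. This is one reading of the camcontrol template against the SCSI MODE SELECT(6) layout, not something taken from the mailing list thread: the function names are made up, the hex values in the template are taken as opcode 0x15 with the PF bit (0x10) set, and the 12 data bytes are taken as a 4-byte mode parameter header followed by an 8-byte block descriptor:

```python
def mode_select6_cdb(param_list_length):
    """Build the 6-byte command block for "15 10 0 0 v:i1 0"."""
    return bytes([
        0x15,               # opcode: MODE SELECT(6)
        0x10,               # PF bit: data follows the standard page format
        0x00, 0x00,         # reserved
        param_list_length,  # "v:i1", filled from the argument 12
        0x00,               # control byte
    ])


def block_length_params(block_length):
    """Build the 12 data bytes for "0 0 0 8  0 0:i3 0 v:i3"."""
    header = bytes([0, 0, 0, 8])            # last byte: block descriptor length
    descriptor = (
        bytes([0])                          # density code
        + (0).to_bytes(3, "big")            # "0:i3", number of blocks
        + bytes([0])                        # reserved
        + block_length.to_bytes(3, "big")   # "v:i3", filled from the argument 512
    )
    return header + descriptor


if __name__ == "__main__":
    params = block_length_params(512)
    print(mode_select6_cdb(len(params)).hex(), params.hex())
```

Read this way, the command only asks the drive to use a new block length; nothing on the platters changes until the second command ("4 0 0 0 0 0", which is FORMAT UNIT) reformats the disk.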

What do you know, it worked! There was a bit of trouble with the second camcontrol command in that it timed out too soon, but the sector size did change and it stayed changed. For my second drive, I tried the regular camcontrol format command instead of the raw one given in the example above.

camcontrol cmd da0 -v -c "15 10 0 0 v:i1 0" 12 -o 12 "0 0 0 8  0 0:i3 0 v:i3" 512
camcontrol format da0

That actually worked a bit better, because camcontrol format shows a nice progress display, instead of making you guess a timeout and wait while you hope it's working.

So now I've got a couple 10K rpm FC drives with FreeBSD installed and booting off them.

da0 at isp0 bus 0 target 120 lun 0
da0: <IBM-SSG S0BE146 3706> Fixed Direct Access SCSI-3 device 
da0: 100.000MB/s transfers, Tagged Queueing Enabled
da0: 140014MB (286749488 512 byte sectors: 255H 63S/T 17849C)
da1 at isp0 bus 0 target 124 lun 0
da1: <IBM-SSG S0BE146 3709> Fixed Direct Access SCSI-3 device 
da1: 100.000MB/s transfers, Tagged Queueing Enabled
da1: 140014MB (286749488 512 byte sectors: 255H 63S/T 17849C)

I noticed the smaller sector size gives more total space on the drive. They're fairly fast, but not life-changingly fast. It's been an interesting experience messing with them, but next time around I'll probably go back to SATA drives (or maybe SAS will be cheap on eBay by then....)
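The numbers in the dmesg output and mode pages can be checked with a little arithmetic. A small sketch, assuming the "MB" that FreeBSD prints means 1024*1024 bytes rounded down (the helper names are made up):

```python
MIB = 1024 * 1024


def capacity_mb(sectors, bytes_per_sector):
    """Total capacity in whole MB, the way the da(4) probe line reports it."""
    return sectors * bytes_per_sector // MIB


def track_bytes(sectors_per_track, bytes_per_sector):
    """User data bytes held by one track."""
    return sectors_per_track * bytes_per_sector


if __name__ == "__main__":
    # Sector counts from the dmesg lines before and after the reformat
    print(capacity_mb(275154368, 524), capacity_mb(286749488, 512))
    # Sectors per track from mode page 3, current vs. default
    print(track_bytes(784, 524), track_bytes(809, 512))
```

The 524-byte layout works out to 137501 MB and the 512-byte layout to 140014 MB, matching the dmesg lines, so the reformat gained about 2.5 GB per drive. Per track, it's 410816 bytes versus 414208.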

Fibre Channel

I'm going to be updating my home server soon, and I've often thought it would be nice to have some fast server-class harddisks for speed and reliability, maybe even arranged in a mirror because I've got a lot of stuff I wouldn't want to lose. A couple weeks ago I started looking into Fibre Channel gear that's available on eBay and was surprised to see how cheap some of this stuff was, with several 10k rpm and 15k rpm drives advertised as new going for 1/4 or less of the retail prices. I bit the bullet and bought a bunch of HBAs, cables, and drives for under $300.

The first HBA I got was a Qlogic QLA2000 for $1.99, which seems to be a stripped-down version of the QLA2100, with a 32-bit PCI interface instead of the 2100's 64-bit PCI-X setup. The machine I'm putting this into only has regular 32-bit PCI slots, so I'm not really losing out on anything by using the cheaper card. I also got a QLA2200, which is a 64-bit card, but it works fine in a 32-bit PCI slot.

The drives are a pair of "new" 146.8GB Seagate Cheetah 10K.7 drives, part # ST3146707FC with an IBM label on them for $230 total. To buy the SCSI versions of those drives from NewEgg at today's price would cost $430 each, so I've saved over $600 by going this route. However, I entered one of the drive's serial numbers into Seagate's warranty webpage, and found that it's not eligible for warranty through them - you must go through the OEM they sold the drives to (IBM in this case). Did some poking around on IBM's site and it's not obvious if/how you'd get warranty service through them. I guess that's the price you pay buying this type of drive.

A 5-pack of HSSDC-DB9 cables was just $20, and the final piece was a "Start" T-Card directly from CK Computer Systems for $34. (the FAQ on that site was really helpful)

Basically just plugged it all together, and fired the machine up. The Qlogic card shows a BIOS boot message saying to hit ALT-Q to get into their setup. There's one part in their BIOS utility where it scans your loop for devices - it showed the card, and then 15 blank spots. I thought I was screwed at first, but after hitting page down several times I found the drive at id 120. Wow, FC can handle a lot of devices compared to SCSI.

Booted FreeBSD 6.1 (already installed on a regular ATA disk), and saw it detect the Qlogic card with the isp(4) driver, and a da0 drive. Once I saw that working, I tried adding ispfw_load="YES" to /boot/loader.conf. On the next reboot, it paused after it detected the isp card, presumably loading the firmware that comes with FreeBSD. The relevant dmesg parts are:

isp0: <Qlogic ISP 2100 PCI FC-AL Adapter> port 0xce00-0xceff mem 
      0xfe7df000-0xfe7dffff irq 17 at device 2.0 on pci2
isp0: [GIANT-LOCKED]
----
da0 at isp0 bus 0 target 120 lun 0
da0: <IBM-SSG S0BE146 3709> Fixed Direct Access SCSI-3 device
da0: 100.000MB/s transfers, Tagged Queueing Enabled
da0: 137501MB (275154368 524 byte sectors: 255H 63S/T 17127C)

The camcontrol utility seems to work with the HBA/drive combination no problem at all. However, when I tried to do a fdisk /dev/da0 it errored out with:

fdisk: can't read fdisk partition table
fdisk: /boot/mbr: length must be a multiple of sector size

Oops, didn't like the 524 byte sectors. I'll cover how I dealt with that in part 2.

USB GPS on MacOSX

Since I'm fooling around with USB GPS stuff today, also figured I'd stick the Holux in my MacBook (which I'm really loving). Found the gpsdX FAQ was a good starting point. Downloaded and installed the Prolific driver from the link in the FAQ, rebooted, and now see that a /dev/tty.usbserial device has appeared.

Once the machine was back up, installed gpsdX which installed like most other Mac programs, ran the gpsdXConfig app to select the tty.usbserial device, and that was about it. Am now able to telnet localhost 2947 and type some simple commands like d to get the date from the GPS. KisMAC seems to work fine with it, and the gps2geX app fired up Google Earth, zoomed down and put an icon right on the roof of my house - pretty slick.

USB GPS on FreeBSD

A while ago I picked up a Holux GR-213U USB GPS receiver for pretty cheap on eBay. It's worked well in Windows, even on Windows within-a-Mac using Parallels. I thought I should give it a try using gpsd on FreeBSD, since I see nobody's reported it as working or not on their hardware page.

Stuck it into one of my FreeBSD 6.1 boxes, and saw in /var/log/messages:

ugen0: Prolific Technology Inc. USB-Serial Controller, rev 1.10/3.00, addr 2

That sounded pretty good, never messed with USB serial on FreeBSD before, so wasn't sure if the /dev/ugen0 device was what gpsd needed to talk to. Turns out it wasn't. After digging for a while, tried

kldload uplcom

and then unplugged/replugged the USB receiver - and now it shows up as

ucom0: Prolific Technology Inc. USB-Serial Controller, rev 1.10/3.00, addr 2

and a /dev/cuaU0 device showed up. I guess that makes sense now that I see it working. The uplcom(4) module is required because the device is a Prolific chip, and that module also brings in the ucom(4) module automatically, which provides the tty interface (/dev/cuaU*) gpsd needs to operate. Other USB serial devices might require a different module than "uplcom" - the SEE ALSO section of the ucom man page shows other possibilities.

Tried running gpsd in debug mode with

gpsd -N -n -D 2 /dev/cuaU0

and was rewarded with lots of output from the receiver. Ran "cgps" and saw a human-friendly display of the GPS readings, but it kept flipping between 2D and 3D fix. Not sure what that's about yet, but at least the USB connection is working.

Django-powered

Haven't posted anything in a while, because I've been redoing this site in Django. Previously I had a photo-gallery written as a direct mod_python app, the software part was Zope 2.x, and this blog was in PyBlosxom.

mod_python is pretty bare-bones (as it should be), and I've been down on Zope for some time now. PyBlosxom was nice, but I've become quite a Django fan, and felt I could do much more with that framework. So I figured it would be good to do a kind of unification - and learn some more Django at the same time.

I'm using Markdown for editing the bodies of blog entries now, and found it was pretty easy to transfer the old PyBlosxom files into Django database records, with Markdown mostly able to handle the HTML I had entered for those old entries with just a few minor tweaks.
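The transfer itself doesn't need much code. A minimal sketch of the parsing half, assuming the usual PyBlosxom entry layout (first line is the title, optional "#key value" metadata lines, the rest is the body); the function name and the dictionary keys are made up, and saving the result into a Django model is left out because it depends on the model's fields:

```python
def parse_entry(text):
    """Split a PyBlosxom-style entry into title, metadata, and body."""
    lines = text.splitlines()
    title = lines[0].strip() if lines else ""

    # Metadata lines, if any, directly follow the title and start with '#'
    metadata = {}
    index = 1
    while index < len(lines) and lines[index].startswith("#"):
        key, _, value = lines[index][1:].partition(" ")
        metadata[key] = value.strip()
        index += 1

    # Everything else is the (HTML) body, which Markdown passes through
    body = "\n".join(lines[index:]).strip()
    return {"title": title, "metadata": metadata, "body": body}


if __name__ == "__main__":
    print(parse_entry("Hello\n#tags a,b\n<p>Body</p>\nmore\n"))
```

Since PyBlosxom takes the entry date from the file's modification time, a real import would also want to read that with os.path.getmtime.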

The Django URLs were planned so that Apache would be able to rewrite the old PyBlosxom URLs into the new format - so hopefully existing links will still work. URLs for the old feeds should be handled transparently, but I'm omitting the old entries from the feeds because their links have changed, and I didn't want them to reappear as new entries for whoever's subscribed to them.