Daemontools mishap

While working on my DiskCompare.com website I had a weird problem with daemontools. The Django process I had running under daemontools became unresponsive, it wouldn't shutdown normally with svc -d, and after kill -9 and restarting, it still wouldn't respond - it just seemed hung up or frozen. No messages in /var/log/messages, but running the service manually outside of daemontools worked fine. What the heck?

I had the service also setup to run multilog, and it turns out that I had mistakenly changed the ownership of the log directory it was supposed to be writing to. My guess is that it was the multilog process that was really hung up, since it couldn't write to the files it wanted to, and that my Django processes were then blocked because the pipe between them and multilog was full.

Simply correcting the permissions of the log directory cleared the jam and things took off again. So if you're seeing strange behavior with daemontools, that's one area to check out.

DiskCompare.com

What's a good deal in hard disks? I don't mean which reseller sells a particular model for less than other resellers - but how does one model compare to another in terms of what you get for the money. If you're looking to buy a 400gb disk, does it make sense to spend a little more for a 500gb disk? or are you paying an excessive premium for that extra 100gb?

A good way to compare disks is to look at their value in terms of Gigabytes-per-dollar. Resellers usually don't display that information or make it easy to compare one disk against another, so I whipped together a website named DiskCompare.com that does that, based on prices from Newegg.

So for example, if you're interested in a Seagate 7200.10 500gb disk, you'd see a page like the one below. It compares the Seagate to other 500gb drives, and possibly to larger drives that are not too far off in price - and shows that some Western Digital models give you more for the money (at least at the moment I took this screenshot), although one is at the expense of a smaller cache.

screengrab of dicksompare.com

Other things it shows are relevant reviews, recent Newegg price changes, and price histories for each drive (some of those price changes are a bit shocking, today one SAS drive went up $390)

The site is still a work-in-progress, powered by Django, Python, PostgreSQL, and jQuery. It's been an interesting winter-time project, something to do when it's too dang cold to be outside. If you happen to follow the links on the site to Newegg and make a purchase, I'll get a small commission to help defray my bandwith costs, although I hope the site isn't too obnoxious about it.

Django comments

I've tried adding a comments feature to this site, using the stock django.contrib.comments application, mostly following the instructions from the Django Wiki, and also using James Bennet's comment-utils described here.

The Wiki is pretty thorough, but one thing I think it missed was that in the template for displaying comments, you should have a

<a name="c{{ comment.id }}" />

Near the beginning of where your comment starts being output, so the standard get_absolute_url() method for the FreeComment model has a place to point to.

A couple things that were not really talked about in the comment-utils docs were where to get the Python Akismet library, and the need for a comment_utils/comment_notification_email.txt template if you want e-mail notifications. I ended up with something simple like:

Comment: http://127.0.0.1{{ comment.get_absolute_url }}
From: {{ comment.person_name }}

-----
{{ comment.comment }}
-----

Admin: http://127.0.0.1/admin/comments/freecomment/{{comment.id}}/

I'm a bit skeptical about how well the Akismet spam-filtering service will work, but we'll see how it goes.

Apache in FreeBSD Jail Error

In one of my FreeBSD 6.2 jails running Apache, even though the server seemed to respond ok, I saw lots of these errors in the logfile:

[warn] (61)Connection refused: connect to listener on 0.0.0.0:443

Google searching found lots of other people asking about this, but I didn't really see any good answers. Others complained about the same thing on port 80

[warn] (61)Connection refused: connect to listener on 0.0.0.0:80

I think the problem is just that Apache in a jail can't listen to :443 or 0.0.0.0:443 (or :80 or 0.0.0.0:80). If your jail has the IP 1.2.3.4 for example, then in httpd.conf, changing

Listen 80

to

Listen 1.2.3.4:80

and/or in extra/httpd-ssl.conf

Listen 443

to

Listen 1.2.3.4:443

Seems to fix the problem

Split tunnel between FreeBSD boxes using OpenVPN

At work there are machines I'd like to access from home using Windows networking (Samba servers mostly), but the catch is that the work firewall is blocking NetBIOS traffic (an awfully good idea). My home network uses a FreeBSD box for NAT (Network Address Translation). Here's a diagram of what we're talking about.

Diagram of NetBIOS being blocked

In the picture above, my home NAT box has the public IP 1.2.3.4, the internal home network is 10.0.0.0/24, and I'm trying to reach a work server 5.6.7.100 on the 5.6.7.0/24 network.

Fortunately, I have a FreeBSD box at work, with an address 5.6.7.8, and it's fairly easy to setup a simple OpenVPN tunnel between 1.2.3.4 and 5.6.7.8, and route NetBIOS traffic over that.

OpenVPN will make it appear as if the two machines have a point-to-point network connection, when in reality the traffic is passing encrypted over the public internet. We need to pull a couple IP numbers out of our hat to use for the VPN endpoints - I'll use 192.168.88.1 for the home machine and 192.168.88.2 for the work machine.

Diagram of using OpenVPN

Settting up OpenVPN

On each box install the security/openvpn port. After that's done, on one machine, go to /usr/local/etc/openvpn and run:

openvpn --genkey --secret mykey

Copy the mykey file you just generated over to the other box's /usr/local/etc/openvpn directory. The two OpenVPN endpoints will used that shared key to authenticate each other.

On the 1.2.3.4 machine, create a /usr/local/etc/openvpn/openvpn.conf file containing:

remote 5.6.7.8
dev tun0
ifconfig 192.168.88.1 192.168.88.2
secret mykey

On the 5.6.7.8 machine, create a /usr/local/etc/openvpn/openvpn.conf file containing:

remote 1.2.3.4
dev tun0
ifconfig 192.168.88.2 192.168.88.1
secret mykey

(note that the ifconfig line swapped the IPs compared to the other machine's config)

Throw an openvpn_enable="YES" in each machine's /etc/rc.conf, and start the daemons: /usr/local/etc/rc.d/openvpn start

If necessary, allow OpenVPN traffic through your firewall, for the 1.2.3.4 box it might look something like:

pass in on $ext_if inet proto udp from 5.6.7.8 to $ext_if port 1194
pass on tun0

If this works, you should be able to sit at the 1.2.3.4 box and ping 192.168.88.2 and get a response. On the 5.6.7.8 box, running tcpdump -n -i tun0 should show the ICMP packets reaching the machine.

Routing specific traffic

I don't want to route all my traffic going to the 5.6.7.0/24 network through the VPN, I just want the NetBIOS stuff so I'll setup a split tunnel. PF makes it pretty easy to redirect network traffic through the VPN, in fact, I ended up doing a double-NAT, one on each end of the tunnel.

Diagram of double-natting

So when the home workstation contacts the Samba server, the Samba server sees the traffic as coming from the 5.6.7.8 box, and the 5.6.7.8 box saw the traffic as coming from the home FreeBSD NAT machine. So interestingly, neither of the work machines needs to have any clue about the home network. The PF state tables take care of reversing everything when the Samba server responds.

On the home 1.2.3.4 machine, these lines are added in the appropriate places to /etc/pf.conf:

int_if="eth1"
internal_net="10.0.0.0/24"
work_net="5.6.7.0/24"
.
.
nat on tun0    from $internal_net to any -> (tun0)
.
.
pass in on $int_if route-to tun0 proto tcp from any to $work_net port {139, 445} flags S/SA modulate state

That last line is the key to the whole thing, it's responsible for diverting the traffic we want to go through the VPN instead of over the public internet. If you want to secure additional protocols, just add similar lines.

The PF config on the work 5.6.7.8 machine is simpler, just

nat on $ext_if from 192.168.88.1 to any -> 5.6.7.8

To perform that 2nd NATting, making VPN traffic seem like it came from the work box.

Lastly, both machine need gateway_enable="YES" in /etc/rc.conf. A home NAT box probably already has that though.

There's a lot more that OpenVPN can do, we barely scratched the surface with the simple setup described above, check the docs for more info.

More DHCP Failover

Earlier I wrote about DHCP failover, but there's another thing I thought I might mention that could be useful to others....

I had a problem in that one of my servers' CMOS clocks tends to be a bit off, maybe 90 seconds. When dhcpd starts up, it is unable to enter a normal failover state because of the time difference between it and the other dhcpd server.

I have

ntpdate_enable="YES"
ntpdate_flags="-b x.x.x.x"

in my /etc/rc.conf, along with running openntpd, but for some reason ntpdate wasn't setting the clock at boot time, and by the time openntpd got the clock tuned up, dhcpd had given up on trying to re-establish failover. Restarting dhcpd by hand later on always worked OK.

I think what was happening was that the network jack this server was plugged into wasn't coming alive quick enough to be up and running when ntpdate tried to do its thing. Something to do with the Cisco switch not having portfast enabled.

I don't have access to do anything about the switches, so I came up with the workaround of adding a simple script /usr/local/etc/rc.d/000.afterboot.sh to schedule a job to run a few minutes after the machine boots - to adjust the clock and restart dhcpd. It looks something like:

#!/bin/sh
at now + 5 minutes <<EOF
/etc/rc.d/ntpdate restart
/usr/local/etc/rc.d/isc-dhcpd restart
EOF

It's a bit of a kludge, but seems to do the trick.

Migrating Django databases

I started out a few Django projects using SQLite as the DB backend a while back, and decided to upgrade them to PostgreSQL. Turns out using the manage.py dumpdata and loaddata, the switch can be fairly smooth. The steps turned out to be something like:

  • Create empty database in PostgreSQL

  • Copy the existing project's settings.py to a new_settings.py

  • Edit new_settings.py to replace the SQLite connection info with PgSQL info.

  • Run ./manage.py syncdb --settings=new_settings (note that there's not a '.py' at the end of that parameter) to create the PgSQL tables (and also test the connection info). Say 'no' when asked if you want to create a user.

  • Run ./manage.py dbshell --settings=new_settings to get into the psql shell, to clean out a few tables populated by syncdb

    • delete from auth_permission;
    • delete from django_content_type;
  • Dump the data from the old DB backend with: ./manage.py dumpdata >olddata.json

  • Load the data into the new DB backend with: ./manage.py loaddata --settings=new_settings olddata.json

  • Swap the settings files: mv settings.py old_settings.py; mv new_settings.py settings.py

  • Restart the server

Barring any complications, the entire switch can take just a few minutes. Of course, for anything you really care about you'd do some more testing before going live, but that's up to you.

PostgreSQL full text search with Django

PostgreSQL 8.3 is coming out soon with full text search integrated into the core database system. It's pretty well documented in chapter 12 of the PostgreSQL docs. The docs are a bit intimidating, but it turns out to be pretty easy to use with Django.

Let's say you're doing a stereotypical blog application, named 'blog', and have a model for entries such as:

from django.db import models

class Entry(models.Model):
    title = models.CharField(max_length=128)
    body = models.TextField()

after adding your app, and doing a syncdb, you should have a PostgreSQL table named blog_entry. You'll need to connect to the database with psql, or probably more conveniently with manage.py dbshell to execute a few SQL commands to setup full text searching.

PostgreSQL Setup

First, you'll want to add a column to hold a tsvector for the blog entry, which is a preprocessed version of text optimized for searching.

ALTER TABLE blog_entry ADD COLUMN body_tsv tsvector;

Next, you'll want to add a trigger to update the body_tsv column whenever a record is inserted or updated:

CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE ON blog_entry 
    FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger(body_tsv, 'pg_catalog.english', body);

You'll probably also want an index on that field, to make searches more efficient:

CREATE INDEX blog_entry_tsv ON blog_entry USING gin(body_tsv);

Lastly, if you already have records in the blog_entry table, you'll want to update the body_tsv column for those records (any new records will be automatically taken care of by the trigger).

UPDATE blog_entry SET body_tsv=to_tsvector(body);

As a test, to search for the words 'hello world' for example, you could try:

SELECT title FROM blog_entry WHERE body_tsv @@ plainto_tsquery('hello world');

and get back a list of entries with those words in the body.

Full text searching in Django

Full text searching through the Django ORM is possible using the .extra() queryset modifier. To fetch the same entries with 'hello world' we searched for in raw SQL, you'd do something like:

q = 'hello world'
queryset = Entry.objects.extra(
    where=['body_tsv @@ plainto_tsquery(%s)'], 
    params=[q])
for entry in queryset:
    print entry.title

The full text features in PostgreSQL also allow for ranking the search results using the PostgreSQL ts_rank_cd() function, and generating fragments of documents with search terms highlighted using the ts_headline() function. An example of raw SQL for doing this might be:

SELECT title, ts_headline(body, query) AS snippet, ts_rank_cd(body_tsv, query, 32) AS rank
  FROM blog_entry, plainto_tsquery('hello world') AS query
  WHERE body_tsv @@ query
  ORDER BY rank DESC;

The Django ORM can generate an equivalent query with

q = 'hello world'
queryset = Entry.objects.extra(
    select={
        'snippet': "ts_headline(body, query)",
        'rank': "ts_rank_cd(body_tsv, query, 32)",
        },
    tables=["plainto_tsquery(%s) as query"],
    where=["body_tsv @@ query"],
    params=[q]
    ).order_by('-rank')

for entry in queryset:
    print entry.title, entry.snippet, entry.rank

Addendum, 2008-04-30

With the new Queryset Refactor (QS-RF) branch of Django, one of the few backwards-incompatible changes is that you can't use an order_by method with a field generated by an extra method. Instead, ordering by that field should be specified using an order_by parameter in the extra method, as in (the change is only in the last two lines)

Also, having a function call in the 'tables' parameter no longer seems to work, so as a workaround for now I've been repeating plainto_tsquery() 3 times:

q = 'hello world'
queryset = Entry.objects.extra(
    select={
        'snippet': "ts_headline(body, plainto_tsquery(%s))",
        'rank': "ts_rank_cd(body_tsv, plainto_tsquery(%s), 32)",
        },
    where=["body_tsv @@ plainto_tsquery(%s)"],
    params=[q],
    select_params=[q, q],
    order_by=('-rank',)
    )

There is a lot more that can be done with the PostgreSQL full text search feature, check the docs for more info. But hopefully this is enough to get started with.

Addendum, 2009-01-28

Ross Poulton has a writup on Full-text searching in Django with PostgreSQL and tsearch2 - using PostgreSQL 8.1 (on a Debian system) where tsearch2 was a contibuted addon module.

DHCP Failover

I've been setting up DHCP servers at work to use the failover feature available in ISC-DHCP (the net/isc-dhcp3-server port in FreeBSD). That allows for two servers to work together, sharing a pool of addresses and keeping track of leases handed out by both servers. The dhcpd.conf(5) manpage discusses this feature somewhat. I'll jot down some notes here that are a bit more specific about what all had to be done.

Let's assume the two DHCP servers will have the IP addresses 10.0.0.10 and 10.0.0.20 - with .10 being the 'primary' and .20 being the 'secondary' server. It doesn't really matter which is which - although the logs on the 'primary' server seem a bit more complete. We'll also assume these servers are giving out addresses from a pool of numbers 10.0.0.100-10.0.0.200, and are running DNS caches - so that the DHCP clients should be told to use them for DNS servers.

We'll also use TCP port 520 for communications between the DHCP servers, so be sure to allow for that through any firewalls.

Configuration

On the 10.0.0.10 'primary' machine, the /usr/local/etc/dhcpd.conf file might look like:

failover peer "foo" 
    {
    primary;
    mclt 1800;  # only specified in the primary
    split 128;  # only specified in the primary

    address 10.0.0.10;
    port 520;

    peer address 10.0.0.20;
    peer port 520;

    max-response-delay 30;
    max-unacked-updates 10;
    load balance max seconds 3;                
    }

option domain-name-servers 10.0.0.10, 10.0.0.20;

include "/usr/local/etc/dhcp/master.conf";

and the same file on the 'secondary' 10.0.0.20 machine is very similar:

failover peer "foo" 
    {
    secondary;

    address 10.0.0.20;
    port 520;

    peer address 10.0.0.10;
    peer port 520;

    max-response-delay 30;
    max-unacked-updates 10;
    load balance max seconds 3;                
    }

option domain-name-servers 10.0.0.20, 10.0.0.10;

include "/usr/local/etc/dhcp/master.conf";

The failover peer name, "foo" in this example, will also appear in the DHCP pool configuration, and will be used in a script change the failover state later on.

I created a directory /usr/local/etc/dhcp/ to hold the DHCP config files that will be common to both DHCP servers. That way, it's just a matter of copying the entire directory between servers when a change is made. The /usr/local/etc/dhcp/master.conf file I included from the main server config might look something like:

omapi-port 7911;

default-lease-time 16200;  # 4.5 hours
max-lease-time 16200;

subnet 10.0.0.0 netmask 255.255.255.0
        {
        option routers 10.0.0.1;

        pool
                {
                failover peer "foo";
                deny dynamic bootp clients;

                range 10.0.0.100  10.0.0.200;
                }
        }

The deny dynamic bootp clients;directive is required for any failover pool. The omapi-port 7911; directive will be useful later on for when a server needs to be put into the 'partner-down' state because the other server will be off for a while.

To sync and restart the two servers whenever there's a change to the DHCP configuration, I setup the 10.0.0.20 server to allow root logins through SSH from the root account of 10.0.0.10 using public/private keys, and then put a script named restart_dhcp on the 10.0.0.10 server that looks like:

#!/bin/sh
/usr/local/etc/rc.d/isc-dhcpd restart
scp -pr /usr/local/etc/dhcp 10.0.0.20:/usr/local/etc
ssh 10.0.0.20 /usr/local/etc/rc.d/isc-dhcpd restart

That copies the entire /usr/local/etc/dhcp directory, so if you need to break up your config into more files that get included, they'll all be copied over when you do a restart.

Failover

When one server stops unexpectedly, the remaining server will go into a communications-interrupted state, and continue offering up addresses from its half of the DHCP pools, and will renew leases it knows were given out by the other server.

If the downed server will be out for longer than the mclt value from the server config (1800 seconds (30 minutes) in the examples above). You may want to let the surviving server know that it's on its own so that it can use the entire pool of available addresses. This is done by putting the surviving server into partner-down state.

This has to be done after the other server is really down. Doing it before shutting down the other server doesn't work, because the two servers will get themselves back into a normal state very quickly, probably before you get a chance to shut the 2nd server down.

The omshell program can be used to communicate with a running DHCP server daemon to control it in various ways, including changing the failover state. I put this partner-down script on both the primary and secondary servers:

#!/bin/sh
omshell << EOF
connect
new failover-state
set name = "foo"
open
set local-state = 1
update
EOF

so when one server is going to be down for a while, I can connect to the other server and just run that script.

When the downed server comes back up, the two servers automatically start communicating and eventually get themselves back into a normal state. But only after the recovering server has spent mclt time in recover-wait state, where it renews existing leases but won't offer up new ones. So you probably wouldn't want to go into a partner-down state if the other server will be down for less than that amount of time.

Running the partner-down script when both servers are really up and running doesn't seem to do any harm, as mentioned above the two servers will quickly move back into a normal state. This can be seen by watching the DHCP logs.

Clean Failover

It's possible using OMAPI to shut down a server and have the remaining server automatically switch to "partner-down" mode in a clean way, so that when the downed server comes back up both servers quickly move to "normal" mode, without spending the mclt time in recover-wait state. This script does the trick:

#!/bin/sh
omshell << EOF
connect
new control
open
set state = 2
update
EOF

When run, it causes the dhcpd daemon on the current server to shutdown, and the dhcpd daemon on the other server takes over completely the DHCP pools.


Update: I wrote a bit more about DHCP failover, talking about how to deal with a clock sync problem when the machine boots by scheduling a dhcpd restart a few minutes after boot time.

amqplib 0.2

I noticed the other day that my two RabbitMQ servers were consuming more and more memory - one had gone from an initial 22mb size to over 600mb. As I sat and watched it would grow by 4k or so at regular intervals.

I think what had happened is that I had created an exchange which received lots of messages, and then ran scripts that created automatically-named queues bound to that exchange, but defaulted to not auto-deleting them. I ran these scripts many many times, which left many many queues on the server, all swelling up with lots of messages that would never be consumed. Good thing I caught it, it might have eventually killed my server.

This message in the rabbitmq-discuss list gives useful info on how to get in and see what queues exist on a RabbitMQ server, and how big they are.

It seems to me that having the auto_delete parameter of Channel.queue_declare() default to False is a really bad idea. If you want to keep a queue around after your program exits, I think you should explicitly say so, so I changed the default to True. The Channel.exchange_declare() also has a auto_delete parameter, which I also change the default to True for consistency.

I also did some work on supporting the redirect feature of AMQP, where a server you connect to can tell you to go somewhere else, useful for balancing a cluster. I don't actually have a RabbitMQ cluster, so I put together a utility to fake an AMQP server that tells you to redirect. It works well enough to run the uniitests unchanged against it, each test case being redirected from the fake server to the real server.

With those two changes, I put out a 0.2 release, on my software page and on the Cheeseshop.