mod_python segfault fixed

Just as a followup, it seems the segfault in mod_python on FreeBSD I mentioned before was found and fixed. Turns out to not be any kind of pointer/memory corruption like I thought, but rather a mishandled return code from an APR (Apache Portable Runtime) function. Oh well, I got to play with gdb, ddd, and valgrind a bit, which is good stuff to be familiar with.

Debugging mod_python with Valgrind

Other people have reported the same problem with mod_python on FreeBSD I had seen before, so I'm happy that I'm not losing my mind.

I took a stab at using Valgrind to find the problem. Didn't actually find anything, but I thought I'd jot down notes on how I went about this.

First, the Valgrind port didn't seem to work on FreeBSD 6.0. When I tried running it against the sample code in the Valgrind Quick Start guide, it didn't find anything wrong with it. Ended up finding a FreeBSD 5.4 machine, which did see the expected problem.

Next, I built the Apache 2.0.x port with: make WITH_THREADS=1 WITH_DEBUG=1, and then built mod_python which uses APXS and picks up the debug compile option from that.

Then, in the mod_python distribution, went into the test directory, and downloaded a Valgrind suppression file for Python, valgrind-python.supp, and in it uncommented the suppressions for PyObject_Free and PyObject_Realloc (otherwise the Valgrind output is full of stuff that is really OK). Then tweaked test/test.py around line 307 where it starts Apache, to insert

valgrind --tool=memcheck --logfile=/tmp/valgrind_httpd --suppressions=valgrind-python.supp

At the front of the cmd variable that's being composed to execute httpd.

Finally, ran python test.py, and then looked at /tmp/valgrind_httpd.pid#### to see the results.

mod_python segfault on FreeBSD

I've been testing mod_python 3.2.x betas as requested by the developers on their mailing list. Unfortunately there seems to be some subtle memory-related but that only occurs on FreeBSD (or at least FreeBSD the way I normally install it along with Apache and Python).

Made some mention of it here and an almost identical problem is reported for MacOSX, even down to the value 0x58 being at the top of the backtrace.

Did a lot of poking around the core with gdb and browsing of the mod_python and Apache sourcecode, but never quite saw where the problem could be. Took another approach and started stripping down the big mod_python testsuite, and found that the test that was failing ran fine by itself, but when it ran after another test for handling large file uploads - then it would crash.

So I suspect there's a problem in a whole different area of mod_python, that's screwing something up in memory that doesn't trigger a segfault til later during the connectionhandler test. My latest post to the list covers some of that.