Markdown and Pygments

This blog is mainly being written as Markdown text stored in a database, and I thought it would be nice to add the ability to use Pygments to add syntax highlighting to various bits of code within the entries.

There are some DjangoSnippets entries on how to do this, notably #360 which first runs text through Markdown to generate HTML and then BeautifulSoup to extract parts marked up in the original pre-Markdown text as <pre class="foo">...</pre> to be run through Pygments and then re-inserted back into the overall Markdown-generated HTML.

The problem with this is that the text within <pre>...</pre> needs to valid HTML with things like: e_mail='<foo@bar.edu>' escaped as e_mail='&lt;foo@bar.edu>', otherwise BeautifulSoup thinks in that example that you have a screwed up <foo> tag and tries to fix that up.

Making sure all the <, &, and other characters special to HTML are escaped within a large chunk of code misses out on the convenience of using Markdown. I decided to go with an arrangement in which regular Markdown code blocks are used, but if the first line begins with pygments:<lexer>, then that block is pygmentized.

So if I enter something like:

Here is some code

    pygments:python
    if a < b:
        print a

It ends up as:


Here is some code

if a < b:
    print a

What I came up with is this derivative of Snippet #360

from htmlentitydefs import name2codepoint
from HTMLParser import HTMLParser
from markdown import markdown
from BeautifulSoup import BeautifulSoup
from pygments.lexers import LEXERS, get_lexer_by_name
from pygments import highlight
from pygments.formatters import HtmlFormatter

# a tuple of known lexer names
_lexer_names = reduce(lambda a,b: a + b[2], LEXERS.itervalues(), ())

# default formatter
_formatter = HtmlFormatter(cssclass='source')    

class _MyParser(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.text = []
    def handle_data(self, data):
        self.text.append(data)
    def handle_entityref(self, name):
        self.text.append(unichr(name2codepoint[name]))

def _replace_html_entities(s):
    """
    Replace HTML entities in a string
    with their unicode equivalents.  For
    example, '&amp;' is replaced with just '&'

    """
    mp = _MyParser()
    mp.feed(s)
    mp.close()
    return u''.join(mp.text)  

def markdown_pygment(txt):
    """
    Convert Markdown text to Pygmentized HTML

    """
    html = markdown(txt)
    soup = BeautifulSoup(html)
    dirty = False
    for tag in soup.findAll('pre'):
        if tag.code:
            txt = tag.code.renderContents()
            if txt.startswith('pygments:'):
                lexer_name, txt = txt.split('\n', 1)
                lexer_name = lexer_name.split(':')[1]
                txt = _replace_html_entities(txt)
                if lexer_name in _lexer_names:
                    lexer = get_lexer_by_name(lexer_name, stripnl=True, encoding='UTF-8')
                    tag.replaceWith(highlight(txt, lexer, _formatter))
                    dirty = True
    if dirty:
        html = unicode(soup)

    return html

Stackless Python and Sockets

I've been intrigued by Stackless Python for a while, and finally got around to installing it one one of my machines. FreeBSD doesn't have a port available, so after creating an ezjail to isolate the installation, it was just a matter of fetching and extracting stackless-251-export.tar.bz2 and doing a standard ./configure && make && make install

The installation looks pretty much like a normal Python installation on FreeBSD, with a /usr/local/bin/python binary and libraries in /usr/local/lib/python2.5

Networking is something I especially wanted to check out with Stackless, and the examples on the Stackless website mostly make use of a stacklesssocket.py module which is a separate download. That module has unittests built in as the module's main function, but when running it on my FreeBSD 7.0-CURRENT box, it died with an exception ending in:

File "stacklesssocket.py.ok", line 286, in handle_connect
  self.connectChannel.send(None)
AttributeError: 'NoneType' object has no attribute 'send'

after doing some digging, I found that stacklesssocket.py has a dispatcher class which is a subclass of a class by the same name in Python's asyncore.py module. stacklesssocket.dispatcher.connect() calls asyncore.dispatcher.connect() which may directly call the object's handle_connect() method before returning back to stacklesssocket.dispatcher.connect(). However stacklesssocket.dispatcher.connect() doesn't setup that channel until after the call to asyncore.dispatcher.connect() returns. So when handle_connect() tries to send a message over a channel that doesn't exist yet, an exception is raised.

This trivial patch seems to fix the problem - only sending a message over the channel if it exists (which should only happen if there's another tasklet waiting on it back in a stacklesssocket.dispatcher.connect() method).

--- stacklesssocket.py.orig       2007-09-18 20:58:02.000835000 -0500
+++ stacklesssocket.py  2007-09-18 22:03:13.370709131 -0500
@@ -282,7 +282,7 @@

    # Inform the blocked connect call that the connection has been made.
    def handle_connect(self):
-        if self.socket.type != SOCK_DGRAM:
+        if (self.socket.type != SOCK_DGRAM) and self.connectChannel:
            self.connectChannel.send(None)

    # Asyncore says its done but self.readBuffer may be non-empty

With that patch, the unittests run successfully - at least on my box.

Going to PyCon 2007

I've got my plane ticket, hotel reservation and conference registration for PyCon 2007 all lined up, so I'll be headed for Texas in 6 weeks.

Expy Update

I just updated Exim on my home server to 4.63, and built it with my py-exim-localscan (AKA expy) module linked to Python 2.5

Only minor glitch was a C compile warning, that's probably due to better warnings in a newer version of GCC than what I had when the package was originally developed. I fixed it and bundled up a new release - mainly to assure that it's not abandonware.

Django-powered

Haven't posted anything in a while, because I've been redoing this site in Django. Previously I had a photo-gallery written as a direct mod_python app, the software part was Zope 2.x, and this blog was in PyBlosxom.

mod_python is pretty bare-bones (as it should be), and I've been down on Zope for some time now. PyBlosxom was nice, but I've become quite a Django fan, and felt I could do much more with that framework. So I figured it would be good to do a kind of unification - and learn some more Django at the same time.

I'm using Markdown for editing the bodies of blog entries now, and found it was pretty easy to transfer the old PyBlosxom files into Django database records, with Markdown mostly able to handle the HTML I had entered for those old entries with just a few minor tweaks.

The Django URLs were planned so that Apache would be able to rewrite the old PyBlosxom URLs into the new format - so hopefully existing links will still work. URLs for the old feeds should be handled transparently, but I'm omitting the old entries from the feeds because their links had changed, and didn't want them to reappear as new entries for whoever's subscribed to them.

Returned from PyCon

Got back from PyCon 2006, in mostly one piece. Picked up a terrible cold at the conference, I suppose scrounging food off the same buffet tables as 400 other people wasn't the most hygenic thing in the world.

Attended the mainly web-oriented sessions, came away very impressed with Django. I had sort of blown it off before because I didn't like the look of the templating language, and the ORM seemed weird. But after seeing what's coming in the Removing the Magic branch, I think it will be much much nicer. Was even inspired to spend the little time I had there Monday morning and afternoon sprinting with the Django guys, but I don't see how one can sprint effectively in such a short time with the limited knowledge of the codebase I had. Maybe if I go next year ... and I know more Django ... and can spend more than a day there, then I could accomplish something useful during the time.

The TurboGears guys demonstrated some nice things with AJAX widgets, but the SQLObject part of TG has given me trouble in the past when working with an existing DB, and seems to get in the way more than it helps. Even so, the TG guys, and Ian Bicking seemed pretty cool, so I hope they polish things up a bit more. Maybe SQLObject 2 will be the answer, or maybe a switch to SQLAlchemy (which wasn't represented at the conference), would make TG a nicer environment to work in.

I've been struggling with Zope for some years now. From a user standpoint I guess it's OK, from a programmer standpoint it's a nightmare, both 2.x and 3.x. The documentation and community attitude have rubbed me wrong for a long time. I attended a couple Zope sessions at the conference, but didn't hear anything to inspire me to keep up with it. I'll probably switch what little Zope things I have going to Django/TurboGears/CherryPy/whatever.

The PyParsing presentation on writing an adventure game was interesting, wish I could have attended the more in-depth one but it conflicted with a Django session. PyParsing looks to make a hard job pretty easy, and I'd love to play with it somewhere.

The party at NerdBooks had some decent food, they had a pretty deep selection of books, and the prices on some of the things I looked up were much better than Amazon. Will definitely look there next time I need something.

Lastly, I hope Django or someone who was at the sprint uses the codename "Vacuum Assassin" somewhere. That would just be too cool.

Running PyBlosxom through SCGI

Out of curiosity, ran the Apache Benchmark program ab on the plain CGI installation of PyBlosxom on my little server (-n 100 -c 10), and got around 1.5 requests/second. Decided to give SCGI a try, and got some better results.

Went about this based on what I had read in Deploying TurboGears with Lighttpd and SCGI. Tried Lighttpd at first, and it mostly worked, but I've got an Apache setup right now, so wanted to stick with that for the moment (and it seems a bit quicker anyhow). Basically started by loading flup with easy_install.

 
    easy_install flup

Copied the config.py and wsgi_app.py files from the PyBlosxom distribution into a directory, and added this little script into that same directory:

#!/usr/bin/env python
import sys
from flup.server.scgi_fork import WSGIServer
from wsgi_app import application

server = WSGIServer(application, 
                 scriptName='/blog', 
                 bindAddress=('127.0.0.1', 8040)
                )
ret = server.run()
sys.exit(ret and 42 or 0)

Installed mod_scgi built for Apache2 and added two lines to the config

LoadModule scgi_module libexec/apache2/mod_scgi.so

SCGIMount /blog 127.0.0.1:8040

Notice how the scriptName and bindAddress parameters in the Python code are matched in the SCGIMount Apache directive. With this setup, running the same ab benchmark yields about 10 to 15 requests/second - not too bad. Running the threaded SCGI server (remove the _fork from the first import line) wasn't as good, only 3 or 8 requests/second.

The setup seems a bit shaky in that the benchmark values seem to keep decreasing with every run, especially in the threaded mode. So there may be some problems in my setup or in flup/scgi/pyblosxom_wsgi.

Even if it was working fine, SCGI is probably overkill for running PyBlosxom when you're not expecting a lot of traffic. And if you were, you'd probably run it with --static to generate static pages. But it was a reasonable thing to fool with for the day when you want to run a more dynamic WSGI app.

Going to PyCon 2006

Found out this week that I get to attend PyCon 2006 on my employer's dime. Never been to anything like this before, so it should be interesting. Probably will attend the more web-related presentations.

Trying PyBlosxom

Going to give PyBlosxom a try, seems like a pretty simple system for throwing together a simple blog. Right now simple sounds pretty good. A lot of the Python-related blogs I normally end up seeing seem to use this software, so I figure it can't be too bad. Things like Zope/Plone/Turbogears seem like way overkill for just a simple one-person setup.

I'm kind of interested in the idea of a blog as a resume, so I'll try to write down some of the things I've worked on or figured out.

Previously, I've put some things on Advogato, however I always felt a bit guilty entering items that were too lengthy or not interesting enough for other readers. I guess that doesn't stop most bloggers, but at least on my own server I feel I can abuse it as much as I want.