PostgreSQL full text search with Django
PostgreSQL 8.3 is coming out soon with full text search integrated into the core database system. It's pretty well documented in chapter 12 of the PostgreSQL docs. The docs are a bit intimidating, but it turns out to be pretty easy to use with Django.
Let's say you're doing a stereotypical blog application, named 'blog', and have a model for entries such as:
from django.db import models class Entry(models.Model): title = models.CharField(max_length=128) body = models.TextField()
after adding your app, and doing a syncdb
, you should have a PostgreSQL table named blog_entry
. You'll need to connect to the database with psql
, or probably more conveniently with manage.py dbshell
to execute a few SQL commands to setup full text searching.
PostgreSQL Setup
First, you'll want to add a column to hold a tsvector
for the blog entry, which is a preprocessed version of text optimized for searching.
ALTER TABLE blog_entry ADD COLUMN body_tsv tsvector;
Next, you'll want to add a trigger to update the body_tsv
column whenever a record is inserted or updated:
CREATE TRIGGER tsvectorupdate BEFORE INSERT OR UPDATE ON blog_entry FOR EACH ROW EXECUTE PROCEDURE tsvector_update_trigger(body_tsv, 'pg_catalog.english', body);
You'll probably also want an index on that field, to make searches more efficient:
CREATE INDEX blog_entry_tsv ON blog_entry USING gin(body_tsv);
Lastly, if you already have records in the blog_entry
table, you'll want to update the body_tsv
column for those records (any new records will be automatically taken care of by the trigger).
UPDATE blog_entry SET body_tsv=to_tsvector(body);
As a test, to search for the words 'hello world' for example, you could try:
SELECT title FROM blog_entry WHERE body_tsv @@ plainto_tsquery('hello world');
and get back a list of entries with those words in the body.
Full text searching in Django
Full text searching through the Django ORM is possible using the .extra()
queryset modifier. To fetch the same entries with 'hello world' we searched for in raw SQL, you'd do something like:
q = 'hello world' queryset = Entry.objects.extra( where=['body_tsv @@ plainto_tsquery(%s)'], params=[q]) for entry in queryset: print entry.title
The full text features in PostgreSQL also allow for ranking the search results using the PostgreSQL ts_rank_cd()
function, and generating fragments of documents with search terms highlighted using the ts_headline()
function. An example of raw SQL for doing this might be:
SELECT title, ts_headline(body, query) AS snippet, ts_rank_cd(body_tsv, query, 32) AS rank FROM blog_entry, plainto_tsquery('hello world') AS query WHERE body_tsv @@ query ORDER BY rank DESC;
The Django ORM can generate an equivalent query with
q = 'hello world' queryset = Entry.objects.extra( select={ 'snippet': "ts_headline(body, query)", 'rank': "ts_rank_cd(body_tsv, query, 32)", }, tables=["plainto_tsquery(%s) as query"], where=["body_tsv @@ query"], params=[q] ).order_by('-rank') for entry in queryset: print entry.title, entry.snippet, entry.rank
Addendum, 2008-04-30
With the new Queryset Refactor (QS-RF) branch of Django, one of the few backwards-incompatible changes is that you can't use an order_by
method with a field generated by an extra
method. Instead, ordering by that field should be specified using an order_by
parameter in the extra
method, as in (the change is only in the last two lines)
Also, having a function call in the 'tables' parameter no longer seems to work, so as a workaround for now I've been repeating plainto_tsquery() 3 times:
q = 'hello world' queryset = Entry.objects.extra( select={ 'snippet': "ts_headline(body, plainto_tsquery(%s))", 'rank': "ts_rank_cd(body_tsv, plainto_tsquery(%s), 32)", }, where=["body_tsv @@ plainto_tsquery(%s)"], params=[q], select_params=[q, q], order_by=('-rank',) )
There is a lot more that can be done with the PostgreSQL full text search feature, check the docs for more info. But hopefully this is enough to get started with.
Addendum, 2009-01-28
Ross Poulton has a writup on Full-text searching in Django with PostgreSQL and tsearch2 - using PostgreSQL 8.1 (on a Debian system) where tsearch2 was a contibuted addon module.