<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0">
<channel>
<title>Fun with ones and zeros - sqlite</title>
<description><![CDATA[Barry's notes on computer software and hardware]]></description>
<link>/blog/tags/sqlite</link>
<lastBuildDate>Sun, 03 May 2026 05:10:53 -0700</lastBuildDate>
<item>
<title>Full Text Searching with SQLite</title>
<link>/blog/entries/full-text-searching-sqlite</link>
<pubDate>Sun, 03 Dec 2006 16:20:17 -0800</pubDate>
<author>bp@barryp.org (Barry Pederson)</author>
<description><![CDATA[
<p>I'd like to add a search feature back on this site.  Previously, I had an arrangement setup with PyBlosxom and ht://Dig, but now that it's Django-powered, I'd like to do something that worked directly from the database instead of crawling the site like ht://Dig did.
</p>
<p>After looking a while at various text search engines, I remembered seeing that <a href="http://sqlite.org">SQLite</a> just added an <a href="http://www.sqlite.org/cvstrac/wiki?p=FtsOne">FTS1</a> module in version 3.3.8, which sounds pretty easy to use.  Unfortunately the FreeBSD port <code>databases/sqlite3</code> doesn't build with that feature.
</p>
<p>After poking around a bit, I got it to build with FTS manually, and after a whole bunch more messing around, came up with a patch to add an option to the port to build sqlite3 with FTS.   The patch has been submitted to the FreeBSD bug tracker as <a href="http://www.freebsd.org/cgi/query-pr.cgi?pr=ports/106281">ports/106281</a>.  Hopefully I have all my ducks lined up on that.
</p>
<hr />
<p>Anyhow, in testing the FTS a bit, I found one thing they only hint at on the SQLite website.  There's a <a href="http://www.tartarus.org/~martin/PorterStemmer/index.html">Porter stemmer</a> built in, even though the wiki says: &quot;The module does not perform stemming of any sort.&quot;  You activate it by adding <code>tokenize porter</code> to the table declaration, for example (adapted from their example).
</p>
<div class="source"><pre><span class="k">CREATE</span> <span class="n">VIRTUAL</span> <span class="k">TABLE</span> <span class="n">recipe</span> <span class="k">USING</span> <span class="n">FTS1</span><span class="p">(</span><span class="n">name</span><span class="p">,</span> <span class="n">ingredients</span><span class="p">,</span> <span class="n">tokenize</span> <span class="n">porter</span><span class="p">);</span>
</pre></div>
<p>once you've done that, and inserted some sample data:
</p>
<div class="source"><pre><span class="k">INSERT</span> <span class="k">INTO</span> <span class="n">recipe</span> <span class="k">VALUES</span><span class="p">(</span><span class="s1">&#39;broccoli stew&#39;</span><span class="p">,</span> <span class="s1">&#39;broccoli peppers cheese tomatoes&#39;</span><span class="p">);</span>
</pre></div>
<p>the searches don't have to be quite as exact, for example:
</p>
<div class="source"><pre><span class="k">SELECT</span> <span class="n">name</span> <span class="k">FROM</span> <span class="n">recipe</span> <span class="k">WHERE</span> <span class="n">ingredients</span> <span class="k">MATCH</span> <span class="s1">&#39;pepper&#39;</span>
</pre></div>
<p>hits the 'broccoli stew' recipe even through it has 'peppers' and you searched for 'pepper'.
</p>
<p>Not sure why the Porter stemmer isn't documented in the SQLite wiki, perhaps it's still a work in progress or being changed for <a href="http://www.sqlite.org/cvstrac/wiki?p=FtsTwo">FTS2</a>.
</p>
]]></description>
</item>
</channel>
</rss>