Generating an index of Haskell haddock docs automatically

Haskell’s great, and has a lovely documentation generation system called haddock. Even better, if you install one of Haskell’s many third-party libraries using the excellent cabal install, it will (if you configure it to do so) generate these docs for you. Having local copies of documentation like this is always good for when you’re offline, so this is A Good Thing.

Sadly, there’s no master index of what’s installed. I’ve got cabal configured to install globally, so the docs go into /usr/local/share/doc, and each (version of each) package gets a folder of its own there. If you’ve got cabal calling haddock, that folder will contain an html folder with the docs in, but it’s tedious to click through an (otherwise empty) folder to get to it each time, and the whole setup’s not very pretty or informative (and the lexicographic sorting is case-sensitive, which I don’t much like). E.g.:

[Screenshot: bare index of Haskell docs]
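
The layout described above (one folder per package version, with the docs in an html subfolder) is easy to probe with a few lines of Python. This is only a sketch, not the script itself: the function name find_doc_dirs is made up for illustration, and it assumes a package has docs exactly when html/index.html exists. It also sorts the names without regard to case, which sidesteps the case-sensitive ordering mentioned above.

```python
import os


def find_doc_dirs(root):
    """Split the subdirectories of root into (with_docs, without_docs).

    Hypothetical helper: a directory counts as documented when it
    contains html/index.html.
    """
    with_docs, without_docs = [], []
    # key=str.lower gives a case-insensitive ordering of the names.
    for name in sorted(os.listdir(root), key=str.lower):
        full = os.path.join(root, name)
        if not os.path.isdir(full):
            continue  # plain files (such as a generated index.html) are skipped
        if os.path.isfile(os.path.join(full, 'html', 'index.html')):
            with_docs.append(name)
        else:
            without_docs.append(name)
    return with_docs, without_docs
```

Calling find_doc_dirs('/usr/local/share/doc') on a global install should return the documented packages first and everything else second.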

People have attacked this problem before, but PHP makes my skin itch, and I can’t be bothered with apache, so a simpler, static solution seemed right for me.

Thus, I’ve knocked up a (quick, dirty, nasty) python script to generate the index. As a happy side-effect, I’ve pointed it at the hackage CSS file, so it’s pleasingly pretty and familiar:

[Screenshot: pretty index of Haskell docs]

I did it in python because that’s still my go-to “hack stuff together in a hurry” language, I guess; but I very nearly did it in Haskell. Mainly it was knowing the python libraries better that swung it. Also perhaps I fancied a break. Hmmm. Next time, for sure.

If anyone’s interested in the code (the doc path is hardcoded, so if you’re installing user docs, change it), you can view it pretty/download it via this link, or just check it out below. Oh, and what about the “automatically” part? Well, just stick it in a cron job… ;-)
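
Since the doc path is hardcoded, one small change would be to let it come from the command line instead, which also makes the same cron job usable for user installs. A minimal sketch, using only the standard library; the names DEFAULT_PATH and doc_path are invented here and aren't in the script:

```python
import sys

# Same default as the script's hard-coded PATH.
DEFAULT_PATH = "/usr/local/share/doc"


def doc_path(argv):
    """Return the docs directory: the first argument if given, else the default.

    Hypothetical helper, not part of the original script.
    """
    if len(argv) > 1:
        return argv[1]
    return DEFAULT_PATH


if __name__ == '__main__':
    # In the script, PATH = doc_path(sys.argv) would replace the constant.
    print(doc_path(sys.argv))
```

Run with no arguments it falls back to the global location, so existing cron entries would keep working unchanged.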

Update: I realised I wasn’t linking to the actual index.html file, which kinda defeated the point of the script! However, it’s an easy fix. The line that says:

                s.write('<a href="file://%s">' % package.path)

should actually say:

                s.write('<a href="file://%s">' % os.path.join(package.path, 'html', 'index.html'))
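
To make the difference concrete, here’s a small sketch of the link the fixed line produces. The helper name index_link and the package folder foo-1.0 are made up for illustration, and it assumes POSIX-style absolute paths, as in the script.

```python
import os


def index_link(package_path):
    """Build the opening <a> tag pointing at the package's html/index.html."""
    target = os.path.join(package_path, 'html', 'index.html')
    return '<a href="file://%s">' % target


print(index_link('/usr/local/share/doc/foo-1.0'))
# On a POSIX system this prints:
# <a href="file:///usr/local/share/doc/foo-1.0/html/index.html">
```

The Package class already has an index() method that builds the same path, so package.index() would do equally well in the fixed line.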


#!/usr/bin/env python

# Horribly hacked-together script to generate an HTML index page for
# Haddock-generated Haskell package docs.  Note that the docs
# directory is hard-coded, below: see the comments there.

import os
import re
import StringIO

# This is the path to the docs.  Any subdirectory which contains an
# 'html/index.html' will get an entry in the index (all the others are
# listed separately at the end).
#
# The index is written to 'index.html' at this path.

PATH = "/usr/local/share/doc"

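class Package:

    def __init__(self, name, path):
        self.name = name
        self.path = path
        self.title = self._seekTitle()

    def __cmp__(self, other):
        # Compare by name, ignoring case, so that sorting a list of
        # packages is not case-sensitive.
        return cmp(self.name.lower(), other.name.lower())

    def index(self):
        return os.path.join(self.path, 'html', 'index.html')

    # Expects a page title of the form '...: description'.
    titleRE = r'>.*: (.*)</TITLE'

    def _seekTitle(self):
        f = open(self.index())
        d = f.read()
        f.close()
        m = re.search(self.titleRE, d)
        if not m:
            raise ValueError("Can't extract title from %s" % self.index())
        return m.group(1)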

class Builder:

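    def __init__(self, path):
        self.path = path
        libs = [l for l in os.listdir(self.path)
                if os.path.isdir(os.path.join(path, l))]
        self.has_html = []
        self.no_html = []
        for lib in libs:
            full = os.path.join(path, lib)
            if self.seek_html(full):
                self.has_html.append(Package(lib, full))
            else:
                self.no_html.append((lib, full))
        # Package defines __cmp__, so this sorts by package name.
        self.has_html.sort()
        # Write the generated page to 'index.html' in the docs directory.
        p = os.path.join(path, 'index.html')
        o = open(p, 'w')
        o.write(self.html())
        o.close()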

    def html(self):
        s = StringIO.StringIO()
        s.write('<title>Local Haskell package docs (from Haddock)</title>\n')
        # The href should be the URL of the hackage CSS file (not
        # reproduced here).
        s.write(('<link rel="stylesheet" type="text/css" '
                 'href="..." />\n'))
        s.write('<h2>Local packages with docs</h2>\n')

        # Group the packages by initial letter; the keys are sorted so the
        # table of contents and the sections come out in alphabetical order.
        split = self.splitAlpha()
        keys = sorted(split.keys())

        s.write('<p class="toc">\n')
        links = ['<a href="#%s">%s</a>' % (start, start) for start in keys]
        s.write(' &bull; '.join(links))
        s.write('\n</p>\n')
        for start in keys:
            packages = split[start]
            s.write('<h3 class="category"><a name="%s">%s</a></h3>\n' % (start, start))
            s.write('  <ul class="packages">\n')
            for package in packages:
                s.write('    <li>')
                # See the update above: this should link to package.index().
                s.write('<a href="file://%s">' % package.path)
                s.write(package.name)
                s.write('</a>: %s</li>\n' % package.title)
            s.write('  </ul>\n')
        s.write('<hr />')
        s.write('<h2>Directories without HTML docs</h2>\n')
        s.write('<ul>\n')
        for (name, path) in self.no_html:
            s.write('  <li><a href="file://%s">%s</a></li>\n' % (path, name))
        s.write('</ul>\n')
        return s.getvalue()

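    def splitAlpha(self):
        # Group the packages by the upper-cased first letter of their name.
        al = {}
        for package in self.has_html:
            start = package.name[0].upper()
            try:
                al[start].append(package)
            except KeyError:
                al[start] = [package]
        return al

    def seek_html(self, path):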
        items = os.listdir(path)
        if 'html' not in items:
            return False
        html = os.path.join(path, 'html')
        if not os.path.isdir(html):
            return False
        return 'index.html' in os.listdir(html)

# Constructing a Builder does all the work: it scans PATH and writes
# PATH/index.html.
if __name__ == '__main__':
    builder = Builder(PATH)
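
The title extraction is probably the most fragile part of the script, so here is a standalone sketch of what the titleRE pattern does. The sample HTML strings are invented, not real Haddock output, and the function name extract_title is hypothetical. It assumes the page title has the form 'name: description' inside an upper-case TITLE tag, as the pattern expects; a page that writes the tag in lower case would need something like re.IGNORECASE, shown here as an option.

```python
import re

TITLE_RE = r'>.*: (.*)</TITLE'  # same pattern as Package.titleRE


def extract_title(html, ignore_case=False):
    """Return the description part of the title, or None if it isn't found."""
    flags = re.IGNORECASE if ignore_case else 0
    m = re.search(TITLE_RE, html, flags)
    return m.group(1) if m else None


# Made-up sample input in the shape the pattern expects.
sample = '<HEAD><TITLE>foo-1.0: A made-up library</TITLE></HEAD>'
print(extract_title(sample))  # A made-up library

# The same title with lower-case tags only matches when case is ignored.
lower = '<head><title>foo-1.0: A made-up library</title></head>'
print(extract_title(lower))                    # None
print(extract_title(lower, ignore_case=True))  # A made-up library
```

Returning None rather than raising, as the sketch does, would let the index skip the description for a package whose title can't be parsed instead of aborting the whole run.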

One Response to “Generating an index of Haskell haddock docs automatically”

  1. October 25th, 2010 | 11:56 pm

    I did eventually rewrite it in Haskell.