Generating an index of Haskell haddock docs automatically

Haskell’s great, and has a lovely documentation generation system called haddock. Even better, if you install one of Haskell’s many third-party libraries using the excellent cabal install, it will (if you configure it to do so) generate these docs for you. Having local copies of documentation like this is always good for when you’re offline, so this is A Good Thing.

Sadly, there’s no master index of what’s installed. I’ve got it configured to install globally, so the docs go into /usr/local/share/doc, and each (version of each) package gets a folder of its own there; if you’ve got cabal calling haddock, that folder will contain an html folder with the docs in, but it’s tedious to click through an (otherwise empty) folder to get to it each time, and the whole setup’s not very pretty or informative (and the lexicographic sorting is case-sensitive, which I don’t much like). Eg:

Bare index of Haskell docs

People have attacked this problem before, but PHP makes my skin itch, and I can’t be bothered with apache, so a simpler, static solution seemed right for me.

Thus, I’ve knocked up a (quick, dirty, nasty) python script to generate the index. As a happy side-effect, I’ve pointed it at the hackage CSS file, so it’s pleasingly pretty and familiar:

Pretty index of Haskell docs

I did it in python because that’s still my go-to “hack stuff together in a hurry” language, I guess; but I very nearly did it in Haskell. Mainly it was library knowledge that drove me. Also perhaps I fancied a break. Hmmm. Next time, for sure.

If anyone’s interested in the code (the doc path is hardcoded, so if you’re installing user docs, change it), you can view it pretty/download it via this link, or just check it out here. Oh, and what about the “automatically” part? Well, just stick it in a cron job… ;-)

Update: I realised I wasn’t linking to the actual index.html file, which kinda defeated the point of the script! However, it’s an easy fix. The line that says:


                s.write('<a href="file://%s">' % package.path)

should actually say:


                s.write('<a href="file://%s">' % os.path.join(package.path, 'html', 'index.html'))
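For anyone who’d rather see the shape of the thing than download it, here is a minimal sketch of the approach described above. It is not the original script: the function name build_index is made up for illustration, the hackage CSS is left out, and it simply assumes each package folder holds an html/index.html when haddock has run.

```python
import html
import os


def build_index(doc_root):
    """Return an HTML page linking to each package's haddock index.

    Hypothetical helper, not the original script: doc_root is assumed to
    hold one folder per package version, e.g. /usr/local/share/doc.
    """
    rows = []
    # Sort case-insensitively, unlike a bare directory listing.
    for name in sorted(os.listdir(doc_root), key=str.lower):
        index = os.path.join(doc_root, name, "html", "index.html")
        if os.path.isfile(index):  # skip packages built without haddock
            rows.append('<li><a href="file://%s">%s</a></li>'
                        % (html.escape(index), html.escape(name)))
    return "<html><body><ul>\n%s\n</ul></body></html>" % "\n".join(rows)
```

Writing the returned string to a file from a cron job would give the “automatically” part.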

Aaaanyway…


A short survey of some chart-drawing options for Python and Haskell

I’ve been thinking about chart drawing a bit lately, partially because I’ve been doing some work which needs it, and partially because I keep seeing pretty pictures like the ones here (or in these slides) and wondering how people produce them.

Perhaps old news, but today I came across the Google charts API, for drawing charts (line, bar, pie, scatter, radar, etc.) via URLs. It’s clearly not capable of the prettiness linked above, but seems quite neat for “workhorse” charting, e.g.

Example pie chart

I particularly like the maps option:

Example map

Naturally, there exist Python and Haskell bindings.
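To give a flavour of “charts via URLs”, here is a rough sketch that only builds the URL and fetches nothing. The endpoint and the parameter names (cht for chart type, chs for size, chd for data, chl for labels) are as I remember them from the old image charts service, which has since been retired, so treat them as assumptions; chart_url is a made-up helper name.

```python
from urllib.parse import urlencode

# Assumed endpoint of the (now retired) image charts service.
CHART_ENDPOINT = "https://chart.googleapis.com/chart"


def chart_url(chart_type, size, values, labels):
    """Build a chart URL; only constructs the string, makes no request."""
    params = [
        ("cht", chart_type),                              # e.g. "p3" for a 3D pie
        ("chs", "%dx%d" % size),                          # width x height in pixels
        ("chd", "t:" + ",".join(str(v) for v in values)),  # text-encoded data
        ("chl", "|".join(labels)),                        # one label per value
    ]
    # Keep ':', ',' and '|' readable rather than percent-encoding them.
    return CHART_ENDPOINT + "?" + urlencode(params, safe=":,|")
```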

I’ve previously looked at NetworkX, matplotlib and gnuplot, all of which are a bit more heavyweight — though I think only NetworkX, if any, could handle the prettiness mentioned initially.

HaskellCharts was mentioned in the latest Haskell Weekly News.

Today I also found Cairoplot, Chaco (very plot-centric), and the fruity Mac goodness that is NodeBox — very pretty, and looks lots of fun, but not exactly a charting app.

Right. That should be enough to be getting on with, anyway…

On prototypes and real applications

Quite so: prototypes and real applications.

Your prototype needs to be written quickly and then it needs to change quickly. You’ll only be able to do that with a maintainable, flexible code base. In short, a well-written code base. You’re a proficient software engineer, you know how to do this. You probably do it without even thinking.

And at some level, everyone knows this. That’s why prototypes are created in languages like Python. A language that you can write quickly, but also write well, quickly.

Visualizing regular expressions with reAnimator

reAnimator — a very cool tool for visualizing regular expressions. Given an RE, renders the corresponding NFA and DFA and animates acceptance (or not) of an input string. Try out the “a|ab|abc|abcd” example with input “a” for a neato example. A flash app, written using python.

A Subversion Pre-Commit Hook in Python

A Subversion Pre-Commit Hook (in Python) [smallcool]. That’s bound to come in handy some time… :-)

“Perl, I’m leaving you”

Damn straight.

All about python and unicode

Handy for future reference and pointing confused ASCII-loving students at: all about python and unicode, just the thing to explain the mysteries of unicode in a friendly manner, especially, er, if you know python.

Threads should pass messages, not share memory

Highly recommended reading for any of my students out there: a comparison of message-passing concurrency vs. shared-memory concurrency, with a healthy dose of historical perspective. The author introduces Erlang-style concurrency in a Java-ish setting, and does so quite well, to my mind.

Reading the introductory remarks about candidates in interviews, I was pleased, nay, smug to realise that – albeit inadvertently – I came to multi-threaded programming via the message-passing route, and would probably have made him quite happy if he’d interviewed me. Back when I worked at Frontier I did my first multi-threading work, in Python, and made heavy use of its excellent Queue class for inter-thread communication. Queue provides a thread-safe message passing mechanism, hiding all the nasty details of locking from me, which was exactly what I was looking for. My threads shared almost no state, and what state they did share was mostly Queue objects. They communicated by passing messages through Queues (messages could be anything, and often were), and it was all lovely and clean.
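For students who haven’t met it, the pattern looks roughly like this in present-day Python, where the module is spelled queue. It’s a toy sketch, not the Frontier code: the worker, the squaring job and the None “no more work” sentinel are all invented for illustration.

```python
import queue
import threading


def worker(inbox, outbox):
    """Square each number received until a None sentinel arrives."""
    while True:
        item = inbox.get()      # blocks; Queue does the locking for us
        if item is None:        # sentinel message: no more work
            break
        outbox.put(item * item)


def run_squares(numbers):
    """Send numbers to a worker thread and collect its replies."""
    inbox, outbox = queue.Queue(), queue.Queue()
    thread = threading.Thread(target=worker, args=(inbox, outbox))
    thread.start()
    for n in numbers:
        inbox.put(n)
    inbox.put(None)
    thread.join()               # worker has finished, so outbox is complete
    results = []
    while not outbox.empty():
        results.append(outbox.get())
    return results
```

The only things the two threads share are the two Queue objects; everything else travels as a message.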

Why did I go down that route? No genius; I just got lucky (yeah, lucky in that I was using Python not Java or C or C++). I had excellent advice from the good folk on comp.lang.python/python-list: this was the way to proceed. Of course, looking back I realise many of these guys knew all about message passing vs shared memory, they knew about Erlang, they knew about Haskell, hell some of them even knew about Lisp. A community as smart and welcoming as that one is a precious resource for a budding programmer.

Anyway, this led to two strongly noticeable results.

First, my code worked well, and didn’t suffer from mysterious hard-to-debug race conditions, etc. It “just worked”, as is often the way with Python.

Second (confession time), I didn’t actually learn properly about semaphores, monitors, shared memory concurrency and all its ridiculous fiddly baggage until I came to teach them in the Operating Systems module at Swansea! By then I’d already formed a strong sense that high-level languages (and Python in particular) made life so much sensibler, so the shared memory stuff slotted quite readily into the mental space of “low level stuff which has to be understood, but is best done by software not humans” (like much of that module).

I was discussing this whole issue with one of my students earlier in the week. If she closed her app’s main window while a worker thread was running, the program would exit uncleanly. This being Python, it was nothing more drastic than an exception/traceback, but she found this properly displeasing and wanted to clean it up (good, I said). It turned out that the main thread wasn’t waiting for the worker to finish: it exited immediately, cleaning up all data, including data the worker was trying to update. Hence, exception city. I showed the simple fix (make the main thread wait for the worker to finish, using a shared boolean “I’m not dead yet” variable), but then I tried to persuade her that message-passing concurrency was the way to go for all inter-thread communication. Even, said I, right down to the (frequent, many) interface updates caused by the worker thread. That is, I suggested, the worker shouldn’t update the GUI component directly, because the GUI is owned by the main thread. Instead, the worker should pass messages to the main thread telling it what updates to perform, and the main thread should poll for these messages and do the updates. I don’t think I sold her on it entirely, but maybe I planted sump’n.
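Roughly what I had in mind, as a sketch with no real GUI toolkit involved: a plain list stands in for the widget, and the message names (“progress”, “done”) are made up. The worker never touches the “widget”; it only posts messages, and the main thread polls for them and does the updates itself.

```python
import queue
import threading


def worker(updates, steps):
    """Do some work, reporting progress as messages instead of touching the UI."""
    for i in range(1, steps + 1):
        updates.put(("progress", i))
    updates.put(("done", None))


def main_loop(steps):
    """Stand-in for a GUI main thread: the only place 'widgets' get updated."""
    updates = queue.Queue()
    label_history = []      # hypothetical stand-in for a label's text over time
    thread = threading.Thread(target=worker, args=(updates, steps))
    thread.start()
    finished = False
    while not finished:
        try:
            kind, value = updates.get(timeout=0.1)   # poll for messages
        except queue.Empty:
            continue        # a real GUI would process its own events here
        if kind == "progress":
            label_history.append("step %d of %d" % (value, steps))
        elif kind == "done":
            finished = True
    thread.join()           # the main thread only exits after the worker has
    return label_history
```

Note that the “done” message also replaces the shared boolean: the main thread knows the worker has finished because the worker said so.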

(Caveat: yes, if performance really matters – eg you’re doing volume graphics – this may be poor advice. For the other 95% of us, however…)

woof

woof – very handy for occasional one-shot (or multi-shot) file serving. Put it somewhere in your path and every now and then it will make you happy.

Woof (Web Offer One File) tries a different approach. It assumes that everybody has a web-browser or a commandline web-client installed. Woof is a small simple stupid webserver that can easily be invoked on a single file. Your partner can access the file with tools he trusts (e.g. wget). No need to enter passwords on keyboards where you don’t know about keyboard sniffers, no need to start a huge lot of infrastructure, just do a

$ woof filename

and tell the recipient the URL woof spits out. When he got that file, woof will quit and everything is done.

Why the iPhone puts Java into the past tense (apparently)

In Which I Think About Java Again, But Only For A Moment [smallcool].

Me, I defected long ago. I’m another of those Apple Java engineers who dropped out. I spent five years as a raving Java fanboy, but I gave up after optimizing AWT, implementing drag and drop, and trying to make 1,200 pages of crappy APIs do the right thing on the Mac. Then I took a one-week Cocoa training course, and wrote the first prototype of iChat.

Desktop Java never worked because Sun tried to build their own OS on top of the real OS, duplicating every API and feature. This led to terrible bloat, making every app as heavyweight to launch as Photoshop. Worse, the GUI portions of the Java platform are awful, because Sun is a server company with no core competency at GUIs. The APIs are too clumsy to code to, and compared to any decent Mac app, the results look like a Soviet tractor built on a Monday.

He almost makes me want to get a Mac, ditch BSD and emacs, and start writing Cocoa apps – except that then life would just be too darn easy, and I’d never hear the end of it from TR (or Bash, probably).

It’s always somewhat depressing, or at least downheartening, to see someone like this tell me I shouldn’t be using emacs. It always makes me wonder if, maybe, they’re right – maybe out there there’s an editor that’ll do everything emacs does for me, but somehow nicer, more productive. Usually, as far as I can tell, that means the editor does IDE-like things such as autocompletion, code browsing, etc. – and yes, that’s stuff I just don’t use when coding. But gosh darn it, I’ve tried a lot of editors and nothing has ever come close, for me, to the feeling of power and (welcome) flexibility emacs gives me. Effortlessly editing multiple files in multiple split views in multiple windows (across multiple virtual desktops), powerful and easy regexp and macro capabilities, and (as one of the commenters on the linked article says) just doing The Right Thing with indentation in Python, Java, C, Haskell, … So the downheartening aspect is the tantalising feeling that there’s something else out there I should be using, but I just can’t find it! Of course, not using a Mac probably doesn’t help me, here. ;-)

I haven’t listened to Depeche Mode for a while, mind…

Oh, and here’s another “Java is rubbish” story (comparing EJB lines of code with python/django) from the same source.

You can’t beat python for seeing in the new year…

Gimbo’s New Year’s Eve 06/07: human_twister.py

Rails vs Django

Rails vs Django [smallcool].

Interesting and balanced. I tried Django about a year ago and did indeed get going with it quite quickly, although the lack of migration was a big pain in the butt, and sounds like a killer feature in Rails.

jobtimer.py – timing small repetitive jobs

It’s exam marking season hereabouts (and, thanks to the AUT industrial action, coursework marking time), so I’ve got my head down in piles of exam scripts.

One exam (on IT security) was a complete nightmare to mark – essay questions, loads of text, oh it just took ages. It really seemed to go on forever. At least I only had to mark half of it – but I really wasn’t looking forward to my other exams, to be marked all on my own.

Python to the rescue!

No, not a random number generator (though it’s sometimes tempting). Instead, a motivational tool: something to keep me focussed and “in the game”.

I have two problems when marking, basically: one is that when I’ve got a huge pile to get through, and it’s going fairly slowly, oh it’s sooo painful and you want it to be over, but it isn’t, and it won’t do itself, and, well, it’s all very antimotivational. The other problem is that I get distracted easily, so I’ll do a few, then chat, then do a few, then play Urban Dead for a bit, then do a few more, etc. Naturally these two problems feed into each other, and a snail’s pace is achieved.

It’s all mental – the issue is focus. Thus, we present jobtimer.py, a little script I knocked up in a hurry yesterday evening to help me stay focussed. And I gotta say, it’s proven instrumental in helping me hammer through the networks exams in record time.

Basically, jobtimer.py is a simple tool for keeping track of progress through a large batch of small repetitive jobs, where you want to know how long you’re spending on each job on average, and how many you’ve done so far. It has a simple text-mode keyboard interface, whose central feature is “you hit space to tell it you’ve finished one job and are starting another”. You can pause it, report on averages, and see how much time since you started has been spent “unpaused and working” (as opposed to “paused and playing Shartak”, say).

It’s very simple: no persistence between sessions, no flashy graphics, and probably only works on Unix – it uses select() on stdin to catch keypresses; a Windows version could be hacked using msvcrt, I guess. The code’s not beautiful, but it does the job beautifully well for me.

Read the comment at the start of the code to see exactly how to use it. It’s dead simple.

Screenshot:

Screenshot of jobtimer.py in action

The clock starts ticking at 15:06:07; the first job takes 27 seconds, then 2 mins 5, then 1:42, then 2:51. 10 seconds into the next job (at 15:06:23), the clock is paused. 15 seconds later, it’s unpaused, and three seconds later that job’s complete. At 15:06:44 we hit ‘a’ to get a reading of averages/stats: 5 done, average 1 min 27, elapsed wall clock time is 7 mins 36, 96% of which has been spent with the clock ticking. Etc., etc. – you get the picture. Actually, looking at this shows me a bug/feature: when you quit, it doesn’t count the job that was just running – so end a job before quitting if you care about accuracy. :-)
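The bookkeeping behind those numbers can be sketched like this. It isn’t the jobtimer.py code: the class name JobTimer and the idea of passing timestamps in explicitly (rather than reading the clock and catching keypresses with select()) are assumptions made to keep the example small.

```python
class JobTimer:
    """Track finished jobs and unpaused time, given explicit timestamps in seconds."""

    def __init__(self, now):
        self.start = now
        self.job_start = now
        self.durations = []       # seconds spent on each finished job
        self.paused_at = None     # timestamp of the current pause, if any
        self.paused_total = 0.0

    def finish_job(self, now):
        """The 'hit space' action: end one job and start the next."""
        self.durations.append(now - self.job_start)
        self.job_start = now

    def toggle_pause(self, now):
        if self.paused_at is None:
            self.paused_at = now
        else:
            gap = now - self.paused_at
            self.paused_total += gap
            self.job_start += gap     # paused time doesn't count against the job
            self.paused_at = None

    def stats(self, now):
        elapsed = now - self.start
        paused = self.paused_total
        if self.paused_at is not None:
            paused += now - self.paused_at
        done = len(self.durations)
        return {
            "done": done,
            "average": sum(self.durations) / done if done else 0.0,
            "elapsed": elapsed,
            "working_fraction": (elapsed - paused) / elapsed if elapsed else 1.0,
        }
```

Like the real thing, this sketch doesn’t count a job that is still running when you ask for stats.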

Scripts in ruby a la python’s __name__ == ‘__main__’ idiom

A common idiom in python is to check the special variable __name__ to see if the current module is being run as a script or not. For example:

class Foo:
    ...

def bar():
    ...

if __name__ == '__main__':
    bar()

Here, if the module is run as a script (ie passed directly to the python interpreter), then __name__ has the value “__main__”, this is detected, and (in this case) the bar() function is called. On the other hand, if the module is just imported from some other module, __name__ has a different value (the module’s own name, ie the file name minus its .py extension), and bar() doesn’t get called.

This is nice for a number of reasons – for example, you might put unit tests into bar().

How to do this in Ruby? It’s not in the FAQ, which surprised me. I was about to ask on ruby-talk but then remembered the biggest FAQ of them all, and turned to google. Aha (and eek, what a horrible mailing list interface). Anyway, it’s:

if __FILE__ == $0
    bar()
end

OK, so why does this work?

$0 contains the name of the script being executed – ie, the name of the file that was passed to the interpreter. Whatever code you’re executing, this value never changes over a particular run of ruby. On the other hand, __FILE__ is always the name of the current source file. If the two match, then the current source file is the one that was passed to the interpreter.

I guess that’s pretty clear. Cool.

Operational Semantics of a Simple Process Algebra in Python and Haskell

As promised, though I’m still working on the shiny LaTeX article which actually explains all this stuff…

From the README:

This is an investigative/learning exercise in transforming process algebraic expressions into labelled transition systems according to their operational semantics, using Python and Haskell.

It started out just as an exercise in operational semantics, using Python because it’s the language I get things done fastest in. That proved interesting enough that I wanted to re-do it in Haskell, which I’m learning and which arguably handles this sort of thing better.

The Python version is more advanced. It includes mu-recursion, which the Haskell doesn’t, and is quite tidy IMHO. OTOH the Haskell is less developed, and in particular the functions for actually creating the graphs could, I’m sure, be improved. Always nice to have some future work to do…

I’m publishing this in case anyone is interested in the code. In particular, it is perhaps useful as an example of:

I’m working on a paper describing the problem, the semantics, and the approaches taken in the two languages, but it’s very much an ongoing WIP and isn’t ready to publish yet.

Homepage for this work: http://www.cs.swan.ac.uk/~csandy/research/op_sem_py_hs/

gimbo gimbo, where art thou?

I’m still here, I’ve just been too busy to blog. However, while I’m waiting for ghc-6.4 to compile (that’s my excuse anyway), I thought I’d do a quick blogdump…

I was going to write about my week in London, and a bit about what I’ve been reading lately, but I started by writing the stuff below instead, and now I think that’s enough for one post, so I’ll publish this and follow up with the rest maybe tomorrow or Sunday. (Short version: London fun except bombs, Quicksilver OK, Accelerando completely barmy but getting a bit dull at 40%). Colin, I should warn you that the rest of this post is of the kind you don’t like. The London diary thing might be better. Sorry!

Work’s been truly excellent this week. No students so no teaching, and also no admin for a while too. Some time I need to do some preparation for next year’s teaching, but for the next two months I’m mainly going to be focussing on research at last, aiming to break the back of my MPhil. I made a breakthrough on the parsing side last Sunday (or rather, I cleared up a misconception I’d been harbouring), but have spent this week on a different tack, learning about operational semantics through the magical medium of Python. Specifically, Markus gave me a toy process algebra and its semantics, and outlined the algorithm for turning expressions in the PA into their labelled transition systems, then I went away and programmed it.
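To give an idea of what that involves, here is a much-reduced sketch. It is not Markus’s algebra, nor my actual code: it assumes a made-up toy language with just Stop, action prefix and choice. The function steps encodes the operational semantics (which single transitions a term can make), and build_lts explores everything reachable to produce the labelled transition system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stop:
    """The process that does nothing."""


@dataclass(frozen=True)
class Prefix:
    """Perform an action, then behave like 'then'."""
    action: str
    then: object


@dataclass(frozen=True)
class Choice:
    """Behave like either 'left' or 'right'."""
    left: object
    right: object


def steps(term):
    """One-step transitions of a term, as a set of (action, successor) pairs."""
    if isinstance(term, Prefix):
        return {(term.action, term.then)}
    if isinstance(term, Choice):
        return steps(term.left) | steps(term.right)
    return set()    # Stop has no transitions


def build_lts(start):
    """Explore all reachable terms; return (states, transitions)."""
    states, transitions, todo = {start}, set(), [start]
    while todo:
        term = todo.pop()
        for action, successor in steps(term):
            transitions.add((term, action, successor))
            if successor not in states:
                states.add(successor)
                todo.append(successor)
    return states, transitions
```

The terms themselves serve as the states of the graph, which is why they are frozen (and hence hashable) dataclasses.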

It’s been excellent fun, and I got it basically done in no time at all, mainly because I chose to do it in Python. It’s quite remarkable, in fact… For months I’ve been struggling to get anywhere with my research, and it’s been very depressing and making me feel like I can’t hack this stuff. Now I realise it’s just that I’m a complete Haskell newbie. If I was doing my research in Python, I’d probably have finished by now. Alas, I have to do it in Haskell, because of the system we’re interacting with, but it’s encouraging to realise my problems relate to the language/paradigm I’m working in, not some basic failing on my part to understand what the heck I’m trying to do.

Anyway, I’m writing up an article explaining what I’ve done, and either later today or early next week I’ll publish it and my code, so anyone who reads the above and says “huh?” can take a look if they want. (JOn? You reading this? ;-))

Next week is graduation week at Swansea University and I’m acting as a marshall on Monday, which is Science day. So I get to see all this year’s third years do their stuff. With luck and effort, I should be there myself this time next year.

What else is new? I recently made the shift from ion to ion3. Ion’s been my window manager of choice for about three years now, mainly because I can’t be bothered with all that tedious point-and-click move-and-resize nonsense you have to do with windows in most WMs. TR occasionally moans at me that it’s modal but I don’t see it as a problem, it works for me and is extremely keyboard friendly and fast, so I’m happy. But I’ve been feeling behind the curve, and in particular some apps (eg the Gimp) don’t play well with the tiled model – which is why ion3 is nice because it adds optional “float workspaces” which act more like a conventional tedious point-and-click point-and-resize window manager if and when that’s what you need. Making the move was non-trivial because my config files had to be ported to Lua, but now it’s done and I’m very happy with my window manager indeed. Once again, I’d recommend looking at Ion if you’re getting dissatisfied with your Gnome/KDE experience and want to strip things down.

Finally, a couple of Python goodies via the Python-URL: try/except vs if/else in terms of speed (an oldie but a goodie, especially when advocating Python to curmudgeons), and Sparklines, which are kinda weird and kinda cool, but I’ve no idea if they’d actually be useful.

Hackers and Painters

Developers have much to learn from Hackers & Painters, a review of Paul Graham‘s book “Hackers and Painters”. Sounds like there’s lots of functional programming evangelism going on here, and it’s interesting to read that Graham asserts:

The programmers you’ll be able to hire to work on a Java project wont be as smart as the ones you could get to work on a project written in Python.

… particularly because I’ve heard that quote before, but it was Haskell that was being bigged up, not Python. All the same, it’s nice to have my status as a cognoscente reaffirmed in public. ;-)

Lisp in Web-based Applications sees Graham expanding on what makes Lisp great, if you don’t want to buy the book just yet… There’s a particularly interesting bit about using closures to elegantly solve the problem of HTTP’s statelessness. Quote: “by using closures, we could make it look to the user, and to ourselves, as if we were just doing a subroutine call.” I’ve bolded the important bit: all web apps these days make it look to the user like you’re just doing a subroutine call, but to make it look like that to the developer is much more impressive. Sure, there are mechanisms in Java or whatever, but I love this idea of using closures: so much simpler and more elegant. (Here’s a nice explanation of closures, for the unsure.)
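Python has closures too, so the trick can be sketched there. This is emphatically not Graham’s implementation: ToyServer, the /k/ URL scheme and the colour-picking page are all invented, and no real HTTP is involved. The point is that the handler for each link simply remembers username and colour from the code that created it, so nothing has to be stuffed into cookies or hidden form fields.

```python
import itertools


class ToyServer:
    """Maps opaque URLs to stored closures (a stand-in for a web server)."""

    def __init__(self):
        self._pending = {}
        self._ids = itertools.count(1)

    def link(self, callback):
        """Remember a closure and return the URL that will invoke it."""
        token = str(next(self._ids))
        self._pending[token] = callback
        return "/k/" + token

    def follow(self, url):
        """Simulate the user clicking a link: run the stored closure."""
        return self._pending.pop(url[len("/k/"):])()


def choose_colour_page(server, username):
    """Build a 'page' of links; each handler closes over username and colour."""
    def make_handler(colour):
        def handler():
            return "%s chose %s" % (username, colour)
        return handler
    return {c: server.link(make_handler(c)) for c in ("red", "blue")}
```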

Finally, I disagree that “its hard to find successful adults now who don’t claim to have been nerds in high school”. For my whole life the world has seemed to be full of successful ignorant bullies and deceivers, and I don’t really see any signs of that changing. It’s a nice dream for a geek to have, I guess, and I can see how rising to the top during the dotcom bubble would surround you by enough successful nerds that you might think it was even true.

But who cares about the iniquities of the world when we’ve got shiny shinies like Haskell and Python to play with, eh? :-)

Operating Systems in Python

More project ideas for next year: further to this post about writing an operating system kernel in Haskell, it looks like some crazy dudes are doing similar things using Python. Ladies and gentlemen, we present: Cleese, and the delightfully-named Unununium [python-list].

Apparently Cleese started with the idea to “make the Python intepreter a micro-kernel and boot directly to the Python prompt.”, but it doesn’t look like there’s been any movement there in over a year, and the project hasn’t released any files.

Unununium looks interesting, and very much under development. The introduction states that components and a unified filesystem namespace (a la Plan 9) are key goals.

OK, maybe all this is too flakey and out-there for a decent project, but nonetheless, nice to know it’s happening.

More on factorial in Python

More stuff over at tiddly-pom on the whole 10,000 factorial thing. Julian does a good job of pointing at me and laughing (rightly so, rightly so), and there are a couple of imperative approaches to the problem suggested, including one using generators – yay.

Of course, something I forgot to point out about Stephen’s reduce based solution is that it’s a functional approach, and so has more in common with what’s happening in the Haskell version than these imperative versions. Anyway.
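For flavour, a generator-based version might look something like the sketch below. It isn’t the code posted over at tiddly-pom, just an illustration of the idea: the generator yields 0!, 1!, 2!, … forever, and the caller takes as many as it wants.

```python
import itertools


def factorials():
    """Yield 0!, 1!, 2!, ... indefinitely."""
    n, acc = 0, 1
    while True:
        yield acc
        n += 1
        acc *= n


def nth_factorial(n):
    """Pick n! out of the stream by skipping the first n values."""
    return next(itertools.islice(factorials(), n, None))
```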

Python still amazingly cool also

Oh, how quickly we forget. Or to put it another way, how stupid I am. Or to put it another way, what a fickle language whore I am.

Further to my recent joy regarding Haskell, Stephen Judd appeared from nowhere and reminded me that of course, Python also handles arbitrary-length integers “out of the box”. How the hell did I forget that?

In fact, he went one better and gave me a single line of Python for calculating 10,000 factorial, viz:

reduce(lambda x,y:x*y, range(1,10001))

Admittedly this is harder to grok than the Haskell version:

bigFac :: Integer -> Integer
bigFac n
  | n == 0 = 1
  | n >  0 = n * bigFac(n-1)

(at least, without knowing the semantics of reduce) – but it does also seem to run a bit quicker on my box.
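A note for anyone trying this on a present-day Python: reduce is no longer a builtin in Python 3 and has to be imported from functools. A sketch of the same idea, with the function name big_factorial made up to echo the Haskell:

```python
from functools import reduce
from operator import mul


def big_factorial(n):
    """Multiply 1 * 2 * ... * n; the initial value 1 also covers n == 0."""
    return reduce(mul, range(1, n + 1), 1)
```

Integers are still arbitrary-length out of the box, so big_factorial(10000) works fine.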

Hurrah for Python!

Of course, this all just reminds me that I’m spending too much time lately thinking about theoretical computer science, and not enough time getting my hands dirty programming… :-/

Update 2007-01-26: Of course, the python version is not easier to grok than this alternative Haskell version:

product [1..10000]

Now that’s beautiful, as is the definition of product.

Where Python meets Haskell

Alex Martelli explaining how Python’s list comprehensions and (new and groovy) generator expressions relate to Haskell, and why Python doesn’t (in general) do lazy evaluation. As someone with a foot in both camps, but no deep understanding (alas – yet) of what’s actually happening, this is pretty interesting stuff…
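A tiny example of the difference, for the record (my own illustration, not taken from Martelli’s post): the list comprehension builds the whole list immediately, so it needs a finite range, whereas the generator expression computes values only on demand and so can sit on top of an infinite counter, a bit like a lazy Haskell list.

```python
import itertools


def squares_eager(limit):
    """List comprehension: every square is computed right now."""
    return [n * n for n in range(limit)]


def squares_lazy():
    """Generator expression: nothing is computed until someone asks."""
    return (n * n for n in itertools.count())


# Take just the first five values from the infinite lazy stream.
first_five = list(itertools.islice(squares_lazy(), 5))
```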