Well Quite!


The Rants, Raves, and Rituals of Matthew Sackman
Sunday, November 4th, 2007

New Site (followup)

Even a few months ago, I guess I would not have attempted to write scripts for generating a website in Haskell. I would have succumbed to old habits, reasoning that Perl is the correct language for text manipulation and that I should just bite the bullet and get on with it. In hindsight, I'm very glad I did attempt it in Haskell. I'm left with just three programs: one that takes articles and generates JSON from them, one that takes the JSON and generates HTML from it, and one that takes the JSON and generates RSS from it. All three are either just under or just over 100 lines of code, and sure, I used as many libraries as I could, but I fail to see how I could have done the same work in less code in Perl.

So, a little bit about the structure of this site then. I write text files, where the file name is the computer name of the article, the first line of the file is the human name of the article, and the rest is raw HTML. I maintain a set of files that point at the head of each chain: the blog is a chain, as are the other pages linked from the left. Each plain text file is converted to JSON and inserted into its chain, which is a doubly linked list. The time is also added at this point.
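As a rough illustration of the layout just described, here is a minimal sketch in Haskell. The Entry record, its field names, and the fromFile and linkChain helpers are all hypothetical names invented for this example, not the actual scripts; the JSON encoding and the time are left out.

```haskell
-- Hypothetical sketch: none of these names come from the real scripts.
data Entry = Entry
    { entryName  :: String        -- computer name (the file name)
    , entryTitle :: String        -- human name (first line of the file)
    , entryBody  :: String        -- raw HTML (the rest of the file)
    , entryPrev  :: Maybe String  -- name of the previous entry in the chain
    , entryNext  :: Maybe String  -- name of the next entry in the chain
    } deriving (Show, Eq)

-- Split a file's contents into title and body; links are filled in later.
fromFile :: FilePath -> String -> Entry
fromFile name contents =
    case lines contents of
      []             -> Entry name "" "" Nothing Nothing
      (title : rest) -> Entry name title (unlines rest) Nothing Nothing

-- Give every entry the names of its neighbours, so the chain is doubly linked.
linkChain :: [Entry] -> [Entry]
linkChain entries = zipWith3 link prevs entries nexts
    where
      names = map entryName entries
      prevs = Nothing : map Just names
      nexts = map Just (drop 1 names) ++ [Nothing]
      link p e n = e { entryPrev = p, entryNext = n }
```

Storing neighbours by name rather than by reference is an assumption made here so that each entry could be written out to its own file independently.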

The JSON version is then used to generate the full perma-link version by combining it with a template, using the Text.HTML.Chunks module I wrote a while ago. Then you just point the RSS builder at the pointers to the head of the chain and it generates RSS for the last ten entries. Pretty simple really. There's a JSON module that I'm very familiar with, and I grabbed an RSS module from Hackage which I then edited so that it uses Data.Time rather than the old and broken System.Time module.

To take just one example of the power that becomes available: the JSON files store the date in a human-readable form. This is probably a mistake, but if I used a proper timestamp, I'd then have to deal with that in JavaScript, which probably wouldn't be pleasant. To build the RSS, though, the RSS module I use wants the PubDate as a proper time value, not just a String, so I have to parse it. The problem is that none of the time formatting codes cover the English ordinal suffixes, e.g. the nd in 2nd. So, in a normal language, I'd have to write some horrible nested if statement to test for %est, %end, %erd or %eth (%e is the day of the month, space padded). But in Haskell, parsing is well known as an action that can fail, so the result is wrapped in Maybe. Maybe is also an instance of MonadPlus, so I can simply write:

rebuildDate :: String -> Maybe UTCTime
rebuildDate dateStr
    = msum . map (flip (parseTime dtl) dateStr) $ parsers
    where
      dtl = defaultTimeLocale
      parsers = [ "%A, %B %est, %Y"
                , "%A, %B %end, %Y"
                , "%A, %B %erd, %Y"
                , "%A, %B %eth, %Y"
                ]

Now that is rather beautiful. The first parser to succeed is the one whose result is returned: that's what msum and MonadPlus get us. OK, so if you're not used to Haskell, it might not strike you as being that clear, but let's consider the alternatives.

  • Haskell, but without the msum:
    rebuildDate :: String -> Maybe UTCTime
    rebuildDate dateStr
        = case parseTime dtl "%A, %B %est, %Y" dateStr of
            (Just date) -> return date
            Nothing -> case parseTime dtl "%A, %B %end, %Y" dateStr of
                         (Just date) -> return date
                         Nothing -> case parseTime dtl "%A, %B %erd, %Y" dateStr of
                                      (Just date) -> return date
                                      Nothing -> parseTime dtl "%A, %B %eth, %Y" dateStr
        where
          dtl = defaultTimeLocale
    

    Yep, that's really nice. We've got much more code there, plus the horrible nested control flow. Now consider expanding that to twenty different date formats. In fact, this is going to be the same pattern in all normal languages, so I won't bother repeating it here. The only difference is that most languages won't force you to deal with the error case, so your code will probably be buggy.

  • Regular Expressions. The problem with this approach is dealing with everything else in the format string: you don't want to start putting the days of the week (%A) or the full names of the months (%B) in the regular expression. In fact, you don't want to use \d+ for the day of the month either, and \d{,2} won't do: both allow values for the day of the month that %e does not (e.g. 99). So the best you could do would be to use a date library to parse the "%A, %B %e" part and then, if all's okay, drop the next 4 chars and parse the "%Y". But that accepts strings that should be rejected, so you'd use the regular expression to match on "(st|nd|rd|th), " and then grab the "%Y".
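The first-success behaviour that the msum version relies on can be seen in isolation with a small sketch that needs only the base library. The parseDay and parseOrdinal functions are hypothetical toys that handle just the day number and its suffix; they stand in for parseTime here and are not part of the site's scripts.

```haskell
import Control.Monad (msum)
import Data.Char (isDigit)
import Text.Read (readMaybe)

-- Hypothetical toy parser: accept a run of digits followed by exactly
-- the given suffix, and fail with Nothing on anything else.
parseDay :: String -> String -> Maybe Int
parseDay suffix str =
    case span isDigit str of
      (digits, rest)
          | rest == suffix -> readMaybe digits
          | otherwise      -> Nothing

-- Try each suffix in turn; msum returns the first Just it finds,
-- or Nothing if every attempt fails.
parseOrdinal :: String -> Maybe Int
parseOrdinal str = msum . map (flip parseDay str) $ suffixes
    where
      suffixes = ["st", "nd", "rd", "th"]
```

In GHCi, parseOrdinal "2nd" gives Just 2, while parseOrdinal "2" gives Nothing because no suffix matches. Supporting another suffix means adding one more string to the list, with no extra control flow.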

My point is that lots of people seem to think, firstly, that Haskell isn't suitable for real world tasks, and secondly, that the things Haskell makes you deal with just get in the way. Hopefully, this has illustrated that the things Haskell makes you deal with are useful things, like errors and failures, and that it has rather wonderful machinery, which other languages lack, to prevent you from forgetting about them. And what's more, you can use such stuff to your advantage to write small amounts of very reliable code.

I don't really think that Haskell makes it easier to write simple programs. But it does make it quite a lot harder to write faulty programs.