New Site (followup)
Even a few months ago, I would not have attempted to write the scripts for generating a website in Haskell. I would have succumbed to old habits, reasoning that Perl is the correct language for text manipulation and that I should just bite the bullet and get on with it. In hindsight, I'm very glad I did attempt it in Haskell. I'm left with just three programs: one that takes articles and generates JSON from them, one that takes JSON and generates HTML from it, and one that takes JSON and generates RSS from it. All three are either just under or just over 100 lines of code, and sure, I used as many libraries as I could, but I fail to see how I could have done the same work in less code in Perl.
So, a little bit about the structure of this site. I write text files, where the file name is the computer name of the article, the first line of the file is the human name of the article, and the rest is raw HTML. I maintain a set of files that point at the head of each chain - the blog is a chain, as are the other pages linked from the left. Each plain text file is converted to JSON and inserted into the list - it's a doubly linked list. The time is also added at this point.
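To make the file layout concrete, here is a minimal sketch of how one article file might be read and linked into a chain. The Article record, its field names, and the functions parseArticle and linkChain are all hypothetical, invented for illustration; the real scripts also go through JSON and add the time, both of which are left out here.

```haskell
-- | Hypothetical in-memory form of one article (field names are assumptions).
data Article = Article
  { articleName  :: String        -- computer name, taken from the file name
  , articleTitle :: String        -- human name, the first line of the file
  , articleBody  :: String        -- raw HTML, everything after the first line
  , articlePrev  :: Maybe String  -- computer name of the previous list entry
  , articleNext  :: Maybe String  -- computer name of the following list entry
  } deriving (Eq, Show)

-- | Split a file's contents into title and body.
-- An empty file has no title line, so parsing fails with Nothing.
parseArticle :: String -> String -> Maybe Article
parseArticle name contents =
  case lines contents of
    []             -> Nothing
    (title : body) -> Just (Article name title (unlines body) Nothing Nothing)

-- | Link a chain, given head first, so that every entry names its neighbours.
linkChain :: [Article] -> [Article]
linkChain arts = zipWith3 link prevs arts nexts
  where
    names = map articleName arts
    prevs = Nothing : map Just names
    nexts = map Just (drop 1 names) ++ [Nothing]
    link p a n = a { articlePrev = p, articleNext = n }
```

Storing the neighbours as names rather than as direct references is one way to keep each entry serialisable on its own, which is roughly what a doubly linked list spread over separate JSON files needs.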
The JSON version is then used to generate the full perma-link
version by combining with a template, using the Text.HTML.Chunks module I wrote a while ago. Then
you just point the RSS builder at the pointers to the head of the
chain and it generates RSS for the last ten entries. Pretty simple
really. There's a JSON module that I'm very familiar with, and I
grabbed an RSS module from Hackage, which I then edited so
that it uses Data.Time rather than the old and broken
System.Time module.
To take just an example of the power that becomes available: The
JSON files store the date in a human form. This is probably a mistake,
but if I did a proper timestamp, I'd then have to deal with that in
JavaScript, which probably wouldn't be pleasant. So, to build the
RSS, the RSS module I use wants the PubDate as a proper time value,
not just a String. So I have to parse it. The
problem is that none of the time formatting codes include the English
number suffixes: e.g. the nd in 2nd. So, in a normal
language, I'd have to write some horrible nested if statement
to test for %est, %end,
%erd or %eth
(%e is the day of month, space padded). But in
Haskell, parsing is well known as an action that can fail, so it's
wrapped in Maybe. Maybe is also an instance of
MonadPlus, so I can simply write:
    rebuildDate :: String -> Maybe UTCTime
    rebuildDate dateStr
        = msum . map (flip (parseTime dtl) dateStr) $ parsers
        where
          dtl = defaultTimeLocale
          parsers = [ "%A, %B %est, %Y"
                    , "%A, %B %end, %Y"
                    , "%A, %B %erd, %Y"
                    , "%A, %B %eth, %Y"
                    ]
Now that is rather beautiful. The first one to succeed will be the
one whose result will be returned - that's what msum and
MonadPlus get us. Ok, so if you're not used to Haskell,
it might not strike you as being that clear, but let's consider the
alternatives.
- Haskell, but without the msum:

      rebuildDate :: String -> Maybe UTCTime
      rebuildDate dateStr
          = case parseTime dtl "%A, %B %est, %Y" dateStr of
              (Just date) -> return date
              Nothing ->
                case parseTime dtl "%A, %B %end, %Y" dateStr of
                  (Just date) -> return date
                  Nothing ->
                    case parseTime dtl "%A, %B %erd, %Y" dateStr of
                      (Just date) -> return date
                      Nothing -> parseTime dtl "%A, %B %eth, %Y" dateStr
          where
            dtl = defaultTimeLocale

  Yep, that's really nice. We've got much more code there and the horrible nested control flow. Now consider expanding that to twenty different date formats. In fact, this is going to be the same pattern for all normal languages, so I won't bother repeating it here. The only difference is that most languages won't force you to deal with the error case, so your code will probably be buggy.
- Regular Expressions. The problem with this approach is dealing
  with everything else in the format string - you don't want to start
  putting the days of the week (%A) or the full name of the month (%B)
  in the regular expression. In fact, you don't want to use \d+ for
  the day of the month either. Nor will \d{,2} do - both allow values
  for the day (number) of the month that %e does not (e.g. 99). So the
  best you could do would be to parse the "%A, %B %e" using a date
  library and then, if all's okay, drop the next 4 chars and then
  parse the "%Y". But that accepts strings that should be rejected, so
  use the regular expression to match on "(st|nd|rd|th), ". And then
  grab the "%Y".
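For anyone trying the msum version today: more recent releases of the time package deprecated and then removed parseTime in favour of parseTimeM, which takes an extra Bool controlling whether surrounding whitespace is accepted. What follows is a sketch of the same idea against that newer API, not the code this site actually runs; the helper name tryFormat is my own.

```haskell
import Control.Monad (msum)
import Data.Time (UTCTime, defaultTimeLocale, parseTimeM)

-- | Try each ordinal-suffix format in turn; the first successful parse wins.
rebuildDate :: String -> Maybe UTCTime
rebuildDate dateStr = msum (map tryFormat formats)
  where
    -- The True argument lets the parser accept leading and trailing whitespace.
    tryFormat fmt = parseTimeM True defaultTimeLocale fmt dateStr
    formats =
      [ "%A, %B %est, %Y"
      , "%A, %B %end, %Y"
      , "%A, %B %erd, %Y"
      , "%A, %B %eth, %Y"
      ]
```

Adding a twentieth format is still a one-line change to the list, and a string that matches none of the formats still comes back as Nothing rather than as an exception or a half-parsed date.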
My point is that lots of people seem to think firstly that Haskell isn't suitable for real world tasks, and secondly, that the things Haskell makes you deal with just get in the way. Hopefully, this has illustrated that the things Haskell makes you deal with are useful things, like errors and failures, and that it has rather wonderful machinery, which other languages lack, to prevent you from forgetting about them. And what's more, you can use such stuff to your advantage to write small amounts of very reliable code.
I don't really think that Haskell makes it easier to write simple programs. But it does make it quite a lot harder to write faulty programs.