0. Small scripts with Haskell. I write blog posts about the fun, interesting or advanced things I do with Haskell. But that isn't a true picture of my programming life! Most of the time I write small scripts that do little tasks, so I thought I'd describe one of those. This post is a literate program, which means you save the contents as a .hweb file, from which you generate .shtml and .hs files.

1. The task was to take a directory of files and, for each file foo.txt, generate the files foo_m1.txt to foo_m3.txt, where each output file is one block of lines from the original, the blocks being delimited by blank lines. For example, given the file with lines ["", "1", "1", "", "2", "", "3"], the two "1" lines would go in foo_m1.txt, the "2" in foo_m2.txt and the "3" in foo_m3.txt.

2. This blog post isn't how I actually wrote the original script: I didn't use literate Haskell (since I find it ugly), I didn't give explicit import lists (since they are needlessly verbose), I didn't give type signatures (but I should have) and I didn't split the IO and non-IO as well (but again, I should have). It is intended as a guide to the simple things you can easily do with Haskell. Now on to the code...

3. Every Haskell program starts with main, which is an IO action. For this program, we are going to keep all the IO in main, and otherwise only use functions. With most file processing applications it's best to read files from one directory, and write them to another. That way, if anything goes wrong, it is usually easy to recover. In this case we read from "data" and write to "res".

Needed libraries
main :: IO ()
main = do
    Set files to be list of files in the directory "data"
    Process each file in files which has the extension ".txt"

4.

Needed libraries:

import System.Directory(getDirectoryContents)

5.

Set `files` to be list of files in the directory "data":

    files <- getDirectoryContents "data"

6.

Needed libraries:

import Control.Monad
import System.FilePath(takeExtension, dropExtension, (<.>), (</>))

7.

Process each file in `files` which has the extension ".txt":

    forM_ files $ \file -> when (takeExtension file == ".txt") $ do
        Set src to be the result of reading the file
        Process each numbered result of reading the file

8.

Set `src` to be the result of reading the file:

        src <- readFile $ "data" </> file

9.

Process each numbered result of reading the file:

        forM_ (zip [1..] (split_file src)) $ \(i, x) ->
            Write out the value to "res/file_m#.txt" where # is the 1-based index into the results list

10.

Write out the value to "res/file_m#.txt" where # is the 1-based index into the results list:

            writeFile ("res" </> dropExtension file ++ "_m" ++ show i <.> "txt") x
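
Assembled, the IO sections above come out roughly as follows. This is a sketch rather than the exact tangled output: the names processDir and outputName are hypothetical, and the splitting function is passed in as an argument so that the block stands on its own. It assumes the "res" directory already exists, since writeFile will not create it.

```haskell
import Control.Monad (forM_, when)
import System.Directory (getDirectoryContents)
import System.FilePath (dropExtension, takeExtension, (<.>), (</>))

-- Hypothetical helper: the output path for chunk number i of an input file.
-- (<.>) binds tighter than (++) and (</>), so the extension is added to
-- the number first, and the pieces are then joined left to right.
outputName :: FilePath -> Int -> FilePath
outputName file i = "res" </> dropExtension file ++ "_m" ++ show i <.> "txt"

-- Hypothetical wrapper around the body of main; the splitter is a parameter.
processDir :: (String -> [String]) -> IO ()
processDir splitter = do
    files <- getDirectoryContents "data"
    -- "." and ".." are also returned, but the extension test skips them.
    forM_ files $ \file -> when (takeExtension file == ".txt") $ do
        src <- readFile $ "data" </> file
        forM_ (zip [1..] (splitter src)) $ \(i, x) ->
            writeFile (outputName file i) x
```

With this in place, main would simply be processDir applied to split_file.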

11. We can now move on to the functional bits left over. We want a function split_file that takes a string and splits it into three chunks, one for each of the blocks in the string. When processing text there will often be stray blank lines, and the term "blank line" here also covers lines consisting only of spaces. The code is:

split_file :: String -> [String]
split_file xs = 
    let 
        Split the text into lines
        Drop all leading spaces from each line
        Drop all leading blank lines
        Break on the first empty line, the bits before are chunk 1
        Drop all leading blank lines for the rest
        Break on the first empty line in the rest, before is chunk 2, after is chunk 3
    in
        Put the lines of each chunk back together, and tabify them

12.

Split the text into lines:

        as = lines xs

13.

Needed libraries:

import Data.Char(isSpace)

14.

Drop all leading spaces from each line:

        bs = map (dropWhile isSpace) as

15.

Drop all leading blank lines:

        cs = dropWhile null bs

16.

Break on the first empty line, the bits before are chunk 1:

        (s1, _:rest) = break null cs

17.

Drop all leading blank lines for the rest:

        ds = dropWhile null rest

18.

Break on the first empty line in the rest, before is chunk 2, after is chunk 3:

        (s2,_:s3) = break null ds

19.

Put the lines of each chunk back together, and tabify them:

        map (tabify . unlines) [s1,s2,s3]
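
One thing to be aware of: the patterns (s1, _:rest) and (s2, _:s3) assume there are at least two blank separator lines after the first block, and the match fails at run time if there are fewer. Where that can't be guaranteed, a more forgiving splitter might look like the sketch below. The name splitBlocks is hypothetical and this is not how the original script worked: it splits at every run of blank lines, returns however many blocks it finds, and leaves out the tabify step.

```haskell
import Data.Char (isSpace)

-- Hypothetical total variant: split text into blocks separated by blank lines.
-- Unlike split_file, it yields any number of blocks, including none.
splitBlocks :: String -> [String]
splitBlocks = map unlines . go . map (dropWhile isSpace) . lines
  where
    -- Skip blank lines, take lines up to the next blank one, then repeat.
    go ls = case dropWhile null ls of
        []  -> []
        ls' -> let (block, rest) = break null ls'
               in block : go rest

-- splitBlocks "\n1\n1\n\n2\n\n3" == ["1\n1\n", "2\n", "3\n"]
-- splitBlocks ""                 == []
```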

20. The tabify requirement was added afterwards. The person decided that all contiguous runs of spaces should be converted to tabs, so the file could be loaded into a spreadsheet more easily. Easy enough to add, with just a simple bit of recursive programming:

tabify xs = 
    case xs of
        If the list is empty, then we're done
        If you encounter a space, drop it and all successive spaces, and write out a tab
        Otherwise just continue onwards

21.

If the list is empty, then we're done:

        [] -> []

22.

If you encounter a space, drop it and all successive spaces, and write out a tab:

        (' ':ys) -> '\t' : tabify (dropWhile (== ' ') ys)

23.

Otherwise just continue onwards:

        (y:ys) -> y : tabify ys
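
Put together, the three cases give the function below, shown here with a type signature added and a couple of sample results as comments. Only the ' ' character is collapsed; existing tabs and newlines pass through unchanged.

```haskell
-- The three cases of tabify assembled; the type signature is an addition.
tabify :: String -> String
tabify xs =
    case xs of
        -- If the list is empty, then we're done.
        [] -> []
        -- A space: drop it and all successive spaces, and write out a tab.
        (' ':ys) -> '\t' : tabify (dropWhile (== ' ') ys)
        -- Otherwise just continue onwards.
        (y:ys) -> y : tabify ys

-- tabify "a  b c"   == "a\tb\tc"
-- tabify "x\n  y\n" == "x\n\ty\n"
```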

24. Haskell is a great language for writing short scripts, and as the libraries improve it just keeps getting better.

25. Names of the sections.
Needed libraries
Set files to be list of files in the directory "data"
Process each file in files which has the extension ".txt"
Set src to be the result of reading the file
Process each numbered result of reading the file
Write out the value to "res/file_m#.txt" where # is the 1-based index into the results list
Split the text into lines
Drop all leading spaces from each line
Drop all leading blank lines
Break on the first empty line, the bits before are chunk 1
Drop all leading blank lines for the rest
Break on the first empty line in the rest, before is chunk 2, after is chunk 3
Put the lines of each chunk back together, and tabify them
If the list is empty, then we're done
If you encounter a space, drop it and all successive spaces, and write out a tab
Otherwise just continue onwards
