@*Small scripts with Haskell.
@qNormally@> I write blog posts detailing the fun, interesting or advanced
stuff I do with Haskell. But that isn't a real representation of my
programming life! Most of the time I am writing small scripts that do
little tasks, so I thought I'd describe one of those. This post is
a literate program: you can save its contents as a
.hweb file, from which you generate the .shtml and .hs files.
@h
@*.
The task I had to complete was to take a directory of files and, for
each file foo.txt, generate the files foo_m1.txt
to foo_m3.txt, where each file is a block
of lines from the original delimited by blank lines. For example, given
a file with lines ["", "1", "1", "", "2", "", "3"], the two "1" lines
would go in foo_m1.txt, the "2" in foo_m2.txt and the "3" in foo_m3.txt.
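To make the task concrete, here is a minimal sketch of the splitting rule applied to the example above. The names `blocksOf` and `assignFiles` are hypothetical helpers introduced only for illustration; they are not the post's split_file, which comes later and always produces exactly three chunks. This sketch assumes any run of blank (or whitespace-only) lines separates two blocks.

```haskell
import Data.Char (isSpace)

-- Hypothetical helper (not the post's split_file): split a file's lines into
-- blocks separated by one or more blank lines. A line containing only
-- whitespace counts as blank, as the post requires later on.
blocksOf :: [String] -> [[String]]
blocksOf ls = case dropWhile isBlank ls of
  []   -> []
  rest -> let (block, more) = break isBlank rest in block : blocksOf more
  where
    isBlank = all isSpace

-- Hypothetical helper: pair each block with the file it should go to,
-- numbering from 1 (base_m1.txt, base_m2.txt, ...).
assignFiles :: String -> [String] -> [(FilePath, [String])]
assignFiles base ls =
  [ (base ++ "_m" ++ show i ++ ".txt", block)
  | (i, block) <- zip [1 :: Int ..] (blocksOf ls) ]
```

In GHCi, `blocksOf ["", "1", "1", "", "2", "", "3"]` evaluates to `[["1","1"],["2"],["3"]]`. Unlike a fixed three-way split, this version returns however many blocks the input contains.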
@h
@*.
This blog post isn't how I actually wrote the original script: I
didn't use literate Haskell (since I find it ugly), I didn't give
explicit import lists (since they are needlessly verbose), I didn't
give type signatures (but I should have) and I didn't separate the IO
and non-IO code as cleanly (but again, I should have). It is intended as
a guide to the simple things you can easily do with Haskell. Now on
to the code...
@h
@*.
Every Haskell program starts with main, which is an IO
action. For this program, we are going to keep all the IO in main,
and otherwise only use pure functions. With most file processing
applications it's best to read files from one directory and write
the results to another. That way, if anything goes wrong, it is usually easy
to recover. In this case we read from "data" and write to "res".
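The directory convention boils down to a few path manipulations. Below is a minimal sketch, assuming the `filepath` package that ships with GHC; `isInput`, `inputPath` and `outputPath` are hypothetical helper names introduced here, mirroring the expressions the program uses inline.

```haskell
import System.FilePath (dropExtension, takeExtension, (<.>), (</>))

-- Hypothetical helper: should this directory entry be processed?
-- Only names whose extension is ".txt" qualify.
isInput :: FilePath -> Bool
isInput file = takeExtension file == ".txt"

-- Hypothetical helper: where an input file is read from.
inputPath :: FilePath -> FilePath
inputPath file = "data" </> file

-- Hypothetical helper: where block number i of an input file is written,
-- e.g. "foo.txt" and 2 give "res/foo_m2.txt" (with "/" on POSIX systems).
outputPath :: FilePath -> Int -> FilePath
outputPath file i = "res" </> ((dropExtension file ++ "_m" ++ show i) <.> "txt")
```

Because `getDirectoryContents` returns bare names (including `.` and `..`), they have to be joined back onto "data" with `</>`; the `.txt` test also skips those two special entries.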
@h
@
main :: IO ()
main = do
@files to be list of files in the directory "data"@>
@files which has the extension ".txt"@>
@*.
@=
import System.Directory(getDirectoryContents)
@*.
@files to be list of files in the directory "data"@>=
files <- getDirectoryContents "data"
@*.
@=
import Control.Monad
import System.FilePath(takeExtension, dropExtension, (<.>), (</>))
@*.
@files which has the extension ".txt"@>=
forM_ files $ \file -> when (takeExtension file == ".txt") $ do
@src to be the result of reading the file@>
@
@*.
@src to be the result of reading the file@>=
src <- readFile $ "data" </> file
@*.
@=
forM_ (zip [1..] (split_file src)) $ \(i, x) ->
@
@*.
@=
writeFile ("res" </> dropExtension file ++ "_m" ++ show i <.> "txt") x
@*. We can now move on to the functional bits left over. We want a
function split_file that takes a string and splits
it into three chunks, one for each of the blocks in the string. When
processing text, there will often be stray blank lines, and the term
"blank lines" should also cover lines consisting only of spaces. The
code is:
@h
split_file :: String -> [String]
split_file xs =
let
@
@
@
@
@
@
in
@
@*.
@=
as = lines xs
@*.
@=
import Data.Char(isSpace)
@*.
@=
bs = map (dropWhile isSpace) as
@*.
@=
cs = dropWhile null bs
@*.
@=
(s1, _:rest) = break null cs
@*.
@=
ds = dropWhile null rest
@*.
@=
(s2,_:s3) = break null ds
@*.
@=
map (tabify . unlines) [s1,s2,s3]
@*.
The tabify requirement was added later: the person decided that all
contiguous runs of spaces should be converted to tabs, so the file
could be loaded into a spreadsheet more easily. Easy enough to add, just
a simple bit of recursive programming:
@h
tabify :: String -> String
tabify xs =
case xs of
@
@
@
@*.
@=
[] -> []
@*.
@=
(' ':ys) -> '\t' : tabify (dropWhile (== ' ') ys)
@*.
@=
(y:ys) -> y : tabify ys
@*.
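The same transformation can also be written without explicit recursion. The sketch below is a hypothetical alternative, `tabifyRuns` (a name introduced here), built on `group` from `Data.List`: it splits the string into runs of equal characters and replaces each run of spaces with a single tab. It is meant to agree with the recursive tabify; which one reads better is a matter of taste.

```haskell
import Data.List (group)

-- Hypothetical alternative to the post's recursive tabify: split the string
-- into runs of equal characters, then replace each run of spaces with one tab.
-- Other characters (including tabs and newlines) pass through unchanged.
tabifyRuns :: String -> String
tabifyRuns = concatMap collapse . group
  where
    collapse (' ' : _) = "\t"
    collapse run       = run
```

For example, `tabifyRuns "a  b c"` gives `"a\tb\tc"`.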
Haskell is a great language for writing short scripts, and as the libraries
improve it just keeps getting better.
@h