Haskell shell application techniques

Posted on September 20, 2016

Haskell terminal applications

This is part one of a two-part blog series about Haskell terminal applications. In this post I’ll cover some techniques for writing a Haskell application that behaves well as a shell application. In part two I’ll show a simple text classification implementation using these techniques.

Interacting with the terminal

Parsing command line arguments

optparse-generic makes parsing command line arguments easy. Doing this manually is tedious and not terribly interesting, so it’s great to have a simple library that handles it well.

{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE DataKinds #-}
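{-# LANGUAGE TypeOperators #-}

-- Assumption: Protolude stands in for the Prelude (it supplies Text, Generic, etc.).
-- getArgs is hidden so the definition below does not clash with Protolude's export.
import Protolude hiding (getArgs)
-- From optparse-generic: ParseRecord, getRecord, unHelpful and the <?> type operator.
import Options.Generic

data Arguments = Arguments {train :: Text <?> "Path to training data"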
                           ,input :: Maybe Text <?> "Input file to categorise. If missing stdin will be used"
                           ,parser :: Maybe Text <?> "Parser type, defaults to lines. Options are lines/detail/csv"
                           ,popts :: Maybe Text <?> "Parser options"
                           ,clean :: Maybe Text <?> "Optional name of text cleaner - see docs"
                           } deriving (Generic, Show)

instance ParseRecord Arguments

getArgs :: IO Arguments
getArgs = do
  args <- getRecord "Your app name here."
  pure (args :: Arguments)

The <?> operator here lets you specify help text for each argument. Running your app with --help will print the help message using this text.

Input from stdin or a file

It is often useful to allow terminal apps to get their input data either from an input file or from data piped to the app (stdin). System.IO defines a set of functions for reading and writing that all take an explicit handle. For example, hGetLine:

hGetLine :: Handle -> IO String

System.IO also defines the stdin, stdout and stderr standard IO handles.

This means that you can pass either stdin or a file handle to hGetLine and it will work the same.
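As a minimal sketch of that idea, using only base: the helper below takes any Handle, so the caller decides whether it reads from stdin or from a file. The names firstLine and firstLineOfFile are hypothetical and are not part of the application described in this post.

```haskell
import System.IO

-- Hypothetical helper: read one line from whichever handle is supplied.
-- Returns Nothing if the handle is already at end of file.
firstLine :: Handle -> IO (Maybe String)
firstLine h = do
  eof <- hIsEOF h
  if eof
    then pure Nothing
    else Just <$> hGetLine h

-- Hypothetical helper: the same function applied to a file handle.
-- withFile opens the file and closes the handle when firstLine returns.
firstLineOfFile :: FilePath -> IO (Maybe String)
firstLineOfFile path = withFile path ReadMode firstLine
```

Calling firstLine stdin reads from piped input instead, with no change to the function itself.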

In the example arguments above I’ve allowed the user to specify an input file with the --input option. If it is missing, stdin is used.

handle <- case unHelpful $ input args of
            Just t -> 
              openFile (Txt.unpack t) ReadMode
            Nothing ->
               pure stdin

Notice that unHelpful is called to unwrap the value from a record field that carries a description.

While you could use the parsed data directly, I chose instead to create another record populated from the parsed command line arguments. This lets me have names tailored to the command line in one record and names tailored to my code in another. In this interpreted record I also store the handle to use for input.

getOptions :: IO Options
getOptions = do
  args <- getRecord "Your app name here."
  hin_ <- case unHelpful $ input args of
             Just t -> 
               openFile (Txt.unpack t) ReadMode
             Nothing ->
                pure stdin

  pure Options {trainingPath = unHelpful (train args)
               ,parserType = fromMaybe "lines" $ unHelpful (parser args)
               ,parserOptions = unHelpful (popts args)
               ,hout = stdout
               ,hin = hin_
               }
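The Options record itself is not shown in this post. A plausible definition, as a sketch only, might look like the following. The field names come from getOptions above; the field types are assumptions inferred from how each field is populated, and defaultOptions is a hypothetical helper added purely for illustration.

```haskell
import Data.Text (Text, pack)
import System.IO (Handle, stdin, stdout)

-- Assumed shape of the interpreted record; types are inferred, not confirmed.
data Options = Options
  { trainingPath  :: Text        -- from the required train argument
  , parserType    :: Text        -- falls back to "lines" when not supplied
  , parserOptions :: Maybe Text  -- passed through unchanged
  , hout          :: Handle      -- where results are written
  , hin           :: Handle      -- stdin or the opened input file
  }

-- Hypothetical helper: the defaults used when only a training path is given.
defaultOptions :: Text -> Options
defaultOptions path = Options
  { trainingPath  = path
  , parserType    = pack "lines"
  , parserOptions = Nothing
  , hout          = stdout
  , hin           = stdin
  }
```

Keeping the handles in the record means the rest of the code never needs to know whether input came from a file or a pipe.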

Processing a line at a time

It is often desirable to allow terminal applications to process and respond to a single line of data at a time (e.g. sed). There are several ways to do this in Haskell. One of the simplest is to use whileM_ and check for EOF.

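import System.IO
import Control.Monad.Loops (whileM_)  -- from the monad-loops package

-- where inputH is the input handle (stdin or a file, as above)
hSetBuffering inputH LineBuffering

whileM_ (not <$> hIsEOF inputH) $ do
  line <- hGetLine inputH
  hPutStrLn stdout line  -- placeholder: replace with your own per-line processing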
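whileM_ comes from the monad-loops package. If you would rather not add that dependency, the same line-at-a-time behaviour can be sketched with explicit recursion and only base. This is an alternative to the loop above, not the code used in part two, and processLines is a hypothetical name.

```haskell
import Control.Monad (unless)
import System.IO

-- Hypothetical helper: apply f to each input line and write the result
-- straight away, so a downstream reader sees output as each line completes.
processLines :: (String -> String) -> Handle -> Handle -> IO ()
processLines f inH outH = do
  hSetBuffering outH LineBuffering  -- flush the output after every newline
  loop
  where
    loop = do
      eof <- hIsEOF inH
      unless eof $ do
        line <- hGetLine inH
        hPutStrLn outH (f line)
        loop
```

For example, processLines reverse stdin stdout echoes each line of piped input reversed, one line at a time.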

Using another terminal app for processing

Using an existing terminal application is a quick way to leverage existing functionality. For example, you may want to use sed to manipulate some text. This is fairly straightforward in Haskell.

import System.IO
import System.Process

(Just inp, Just outp, _, phandle) <- createProcess (proc "command_name_here" []) { std_out = CreatePipe, std_in = CreatePipe }
hSetBuffering outp NoBuffering
hSetBuffering inp LineBuffering

You now have an input handle (inp, connected to the application’s stdin) and an output handle (outp, connected to its stdout). If the application supports line-at-a-time input from stdin you can simply write your data and read the result back. Alternatively you may need to write the entire contents and wait for a result.
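Here is a minimal sketch of the line-at-a-time case. It assumes a cat executable is on the PATH and uses it as a stand-in for a real filter, since it simply echoes its input. The name roundTrip is hypothetical. Tools that buffer their output may not reply until their input is closed, so this pattern only suits commands that respond per line.

```haskell
import System.IO
import System.Process

-- Hypothetical helper: send one line to a child process and read one line back.
-- Assumption: "cat" is available and echoes each line as soon as it arrives.
roundTrip :: String -> IO String
roundTrip line = do
  (Just inp, Just outp, _, phandle) <-
    createProcess (proc "cat" []) { std_in = CreatePipe, std_out = CreatePipe }
  hSetBuffering inp LineBuffering  -- flush to the child after every newline
  hPutStrLn inp line               -- write to the child's stdin
  reply <- hGetLine outp           -- read the echoed line from its stdout
  hClose inp                       -- end of input lets the child exit
  _ <- waitForProcess phandle      -- reap the child process
  hClose outp
  pure reply
```

Closing inp before waiting matters: the child only sees end of file once its stdin is closed, and waitForProcess would otherwise block.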

See part two to see these techniques in use in a text classification application.

Links

  1. optparse-generic
  2. Stack
  3. Protolude
  4. Haskell Programming from First Principles