Haskell shell applications techniques
Haskell terminal applications
This is part one of a two-part blog series about Haskell terminal applications. In this post I'll cover some techniques for writing a Haskell application that behaves well as a shell application. In part two I'll show a simple text classification implementation using these techniques.
Interacting with the terminal
Parsing command line arguments
optparse-generic makes parsing command line arguments easy. Doing this manually is tedious and not terribly interesting, so it's great to have a simple library that handles it well.
{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE TypeOperators #-}

-- NoImplicitPrelude means an alternative prelude must also be imported
import Options.Generic

data Arguments = Arguments
  { train  :: Text <?> "Path to training data"
  , input  :: Maybe Text <?> "Input file to categorise. If missing stdin will be used"
  , parser :: Maybe Text <?> "Parser type, defaults to lines. Options are lines/detail/csv"
  , popts  :: Maybe Text <?> "Parser options"
  , clean  :: Maybe Text <?> "Options name of text cleaner - see docs"
  } deriving (Generic, Show)

instance ParseRecord Arguments

getArgs :: IO Arguments
getArgs = do
  args <- getRecord "Your app name here."
  pure (args :: Arguments)
The <?> operator here lets you specify help text for each argument. Running your app with --help prints a help message built from this text:
Usage: appName --train STRING [--input STRING] [--parser TEXT] [--popts TEXT]
               [--clean TEXT]

Available options:
  -h,--help                Show this help text
  --train TEXT             Path to training data
  --input TEXT             Input file to categorise. If missing stdin will be
                           used
  --parser TEXT            Parser type, defaults to lines. Options are
                           lines/detail/csv
  --popts TEXT             Parser options
  --clean TEXT             Options name of text cleaner - see docs
Input from stdin or a file
It is often useful to let a terminal app take its input either from a file or from data piped to it (stdin). System.IO defines a set of functions for reading and writing that all take an explicit handle, for example hGetLine.
System.IO also defines the stdin, stdout and stderr standard IO handles.
This means that you can pass either stdin or a file handle to hGetLine and it will work the same.
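As a small illustration of handle-based IO, here is a minimal sketch that uses only System.IO. The function name echoLine and the "read: " prefix are made up for this example; the point is only that the same code works for any pair of handles.

```haskell
import System.IO

-- | Read one line from the input handle and write it, with a prefix,
-- to the output handle. Nothing here knows whether the handles are
-- files or the standard streams. (echoLine is a hypothetical helper.)
echoLine :: Handle -> Handle -> IO ()
echoLine hIn hOut = do
  line <- hGetLine hIn
  hPutStrLn hOut ("read: " ++ line)
```

Calling echoLine stdin stdout echoes a line typed at the terminal, while echoLine fileHandle stderr would read from an opened file and report on stderr, with no change to the function itself.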
In the example arguments above I've allowed the user to specify an input file with the --input option. If that is missing, stdin is used.
handle <- case unHelpful $ input args of
  Just t ->
    openFile (Txt.unpack t) ReadMode
  Nothing ->
    pure stdin
Notice that unHelpful is called to get the value out of a record field that carries a description.
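One thing the snippet above leaves to the caller is closing the file once it is finished with. A possible variation, sketched here under my own assumptions, wraps the choice in a helper that closes a file handle automatically but leaves stdin alone. The name withInput is hypothetical, and it takes a plain FilePath rather than Text so that it needs nothing beyond System.IO.

```haskell
import System.IO

-- | Run an action against the chosen input source.
-- With no path the action gets stdin, which is left open.
-- With a path the file is opened for reading and closed again when
-- the action finishes, even if it throws (withFile handles that).
-- (withInput is a hypothetical helper, not part of any library.)
withInput :: Maybe FilePath -> (Handle -> IO a) -> IO a
withInput Nothing action = action stdin
withInput (Just path) action = withFile path ReadMode action
```

With the parsed arguments this could be called as withInput (Txt.unpack <$> unHelpful (input args)) processInput, where processInput stands for whatever function consumes the handle.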
While you could use the parsed data directly, I chose to create another record populated from the parsed command line arguments. This lets me have names tailored for the command line in one record and names tailored for my code in another. In this interpreted record I also store the handle to use for input.
getOptions :: IO Options
getOptions = do
  args <- getRecord "Your app name here."
  hin_ <- case unHelpful $ input args of
    Just t ->
      openFile (Txt.unpack t) ReadMode
    Nothing ->
      pure stdin
  pure Options { trainingPath  = unHelpful (train args)
               , parserType    = fromMaybe "lines" $ unHelpful (parser args)
               , parserOptions = unHelpful (popts args)
               , hout          = stdout
               , hin           = hin_
               }
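The Options record itself is not shown above. A plausible definition, with field types that I am inferring from how the fields are populated (so treat them as assumptions), could look like the sketch below. The mkOptions helper is hypothetical as well; it just isolates the pure part of building the record so that the defaulting logic is easy to see.

```haskell
import Data.Maybe (fromMaybe)
import Data.Text (Text)
import qualified Data.Text as Txt
import System.IO

-- | Interpreted options used by the rest of the program.
-- Field names come from the code above; the types are assumed.
data Options = Options
  { trainingPath  :: Text
  , parserType    :: Text
  , parserOptions :: Maybe Text
  , hout          :: Handle
  , hin           :: Handle
  }

-- | Build Options from already unwrapped argument values and the
-- input handle. The parser type falls back to "lines" when missing.
-- (mkOptions is a hypothetical helper for illustration.)
mkOptions :: Text -> Maybe Text -> Maybe Text -> Handle -> Options
mkOptions trainArg parserArg poptsArg inputHandle =
  Options
    { trainingPath  = trainArg
    , parserType    = fromMaybe (Txt.pack "lines") parserArg
    , parserOptions = poptsArg
    , hout          = stdout
    , hin           = inputHandle
    }
```

Keeping the handles in the record means the rest of the code never needs to know whether it is reading from a file or from stdin.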
Processing a line at a time
It is often desirable to allow terminal applications to process and respond to a single line of data at a time (e.g. sed). There are several ways to do this in Haskell. One of the simplest is to use whileM_ (from the monad-loops package) and check for EOF.
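import System.IO
import Control.Monad.Loops (whileM_)  -- from the monad-loops package

-- where inputH is the input handle: stdin or a file, as above
hSetBuffering inputH LineBuffering
whileM_ (not <$> hIsEOF inputH) $ do
  line <- hGetLine inputH
  -- process `line` here; the block must end with an expression
  pure ()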
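If you would rather not depend on monad-loops, the same loop can be written with plain recursion. The following is a minimal sketch using only System.IO; the name processLines and the idea of passing the per-line transformation as a function are my own choices for the example.

```haskell
import System.IO

-- | Read lines from hIn until EOF, writing the transformed line to
-- hOut as soon as each one has been read.
-- (processLines is a hypothetical helper for illustration.)
processLines :: Handle -> Handle -> (String -> String) -> IO ()
processLines hIn hOut f = do
  -- Line buffering so each result is flushed as its line is written
  hSetBuffering hIn LineBuffering
  hSetBuffering hOut LineBuffering
  loop
  where
    loop = do
      eof <- hIsEOF hIn
      if eof
        then pure ()
        else do
          line <- hGetLine hIn
          hPutStrLn hOut (f line)
          loop
```

For example, processLines stdin stdout reverse turns the program into a filter that prints each input line reversed as soon as it arrives.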
Using another terminal app for processing
Using an existing terminal application is a quick way to leverage existing functionality. For example, you may want to use sed to manipulate some text. This is fairly straightforward in Haskell.
import System.IO
import System.Process
(Just inp, Just outp, _, phandle) <- createProcess (proc "command_name_here" []) { std_out = CreatePipe, std_in = CreatePipe }
hSetBuffering outp NoBuffering
hSetBuffering inp LineBuffering
You now have an input (inp) and an output (outp) handle for the application. If the application supports line-at-a-time input from stdin, you can simply write your data and read the result back. Alternatively, you may need to write the entire contents and wait for a result.
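The second approach, writing everything and then waiting, might look like the sketch below. The helper name runFilter is hypothetical, and the sketch assumes a command that reads all of its input before producing much output (sort, for example) and input small enough that the pipes do not fill up; a command that streams a lot of output while still reading could block.

```haskell
import Control.Exception (evaluate)
import System.IO
import System.Process

-- | Send all of `input` to an external command's stdin and return
-- everything the command writes to stdout.
-- (runFilter is a hypothetical helper for illustration.)
runFilter :: FilePath -> [String] -> String -> IO String
runFilter cmd args input = do
  (Just inp, Just outp, _, phandle) <-
    createProcess (proc cmd args) { std_in = CreatePipe, std_out = CreatePipe }
  hPutStr inp input
  hClose inp                    -- signals EOF so the command can finish
  out <- hGetContents outp      -- lazy read of the command's output
  _ <- evaluate (length out)    -- force the whole output before waiting
  _ <- waitForProcess phandle
  pure out
```

For instance, runFilter "sort" [] "b\na\nc\n" hands three lines to sort and returns them in sorted order. System.Process also provides readProcess, which covers this simple case directly; the explicit version is shown because it builds on the handles created above.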
See part two for these techniques in use in a text classification application.