Refactoring to pure code and dealing with exceptions.

Posted on October 31, 2017

Overview

(TL;DR: I use a record of functions & a record of wrapper functions that catch all synchronous exceptions and convert to ExceptT for the pure code)

See also the followup post on using Free as an alternative

This is a quick overview of how I refactored one of my first production haskell application that had a majority of IO code to be more pure. I’ve seen several approaches with different trade offs but none that fit exactly what I was doing 100%.

The design I needed was a pipeline of actions that needs to be performed, where each job in the pipeline is considered user code (think plugin)

Some design notes

Job
1. Can run any IO action and as a result these actions can fail with exceptions
2. I don’t want to constrain what a job can do in any way, i.e. full IO access
3. This is custom code and can fail for any number of reasons, network errors, disk permissions etc
Pipeline
1. Must run impure jobs but be as pure as possible itself
2. Must be able to handle job failure (exceptions)
3. Supports different storage mechanisms, e.g. on disk for local dev vs cloud for production

Both jobs and the pipeline should be testable

Step 1 - Just use IO and refactor later

Jobs

data Job = Job { jobName :: Text
               , jobFn :: Text -> IO Text
               }

A job has a name and an IO action to perform. In this demo a job takes a text value, does some transformation and returns a resulting text value. In a real world application this record would contain more operations and probably be polymorphic

Operations

data Operations = Operations { opRead :: IO Text
                             , opWrite :: Text -> IO ()
                             }

Operations is a record of operations that the pipeline uses to persist job results. I.e. this is the pluggable storage and related functions

I’ve chosen to pass a record of functions rather than using a type class. For discussion on records vs typeclass for this type of design see

A pluggable set of operations feels more natural as a record of functions to me than a typeclass. For me this is because they are just a named set of functions rather than something that has a set of rules (“an algebra”).

Pipeline

runPipeline :: Operations -> Text -> [Job] -> IO Text
runPipeline ops init jobs = do
  opWrite ops init
  id <- foldlM runJob 0 jobs

  putText $ "\nfinal job id = " <> show id
  opRead ops

  where
    runJob (id :: Int) (Job name fn) = do
      putText $ "running job: " <> name

      prev <- opRead ops
      r <- fn prev
      opWrite ops r
      
      putText $ "  = " <> r
      putText "  ----"

      pure $ id + 1

Run each job in order, for each job

Load the last data
Run the job
Store the result
Pass a unique (for the run) id to each step

(Obviously I could simply pass the previous state as part of the fold and do away with the load and store. Its done this way for the demonstration)

Storage

import           Protolude
import qualified Data.Text.IO as Txt

import qualified Step1.Impl1 as I


readFileOp :: FilePath -> IO Text
readFileOp = Txt.readFile


writeFileOp :: FilePath -> Text -> IO ()
writeFileOp = Txt.writeFile


mkFileOps :: FilePath -> I.Operations
mkFileOps p =
  I.Operations { I.opRead = readFileOp p
               , I.opWrite = writeFileOp p
               }

Here the operations are implemented by reading and writing to a file, using Data.Text.IO. The example code in github also has a STM backed storage implementation

Example jobs

job1 :: Text -> IO Text
job1 v = do
  putText "in job1"
  pure $ "1:" <> v

job2 :: Text -> IO Text
job2 v = do
  putText "in job2"
  void . throwIO $ DemoException "oops"
  pure $ "2:" <> v

job3 :: Text -> IO Text
job3 v = do
  putText "in job3"
  pure $ "3:" <> v


newtype DemoException = DemoException Text

instance Show DemoException where
  show (DemoException s) = Txt.unpack s
  
instance Exception DemoException

These are three example jobs. Notice that the second one explicitly throws an exception. A job can explicitly throw an exception like this or it could throw an exception on failure (e.g. network error)

Running

import           Protolude hiding (catch, throwIO)

import qualified Step1.Impl1 as I1
import qualified Step1.Storage1 as S1

main :: IO ()
main = do
  let jobs = [ I1.Job "j1" job1
             , I1.Job "j2" job2
             , I1.Job "j3" job3
             ]
  
  let ops = S1.mkFileOps
  r <- I1.runPipeline ops "0" jobs
  putText r

When run this will fail with an exception in job 2 and result in the application terminating

---------
 Demo1: in memory
---------
running job: j1
in job1
  = 1:0
  ----
running job: j2
in job2
Exception: oops

Step 2 - (Monad m)

So far the example has a working pipeline but everything is in IO and it does nothing about exceptions.

Lets remove some IO constraints.

{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE ScopedTypeVariables #-}

module Step2.Impl2 ( Operations (..)
                   , Job (..)
                   , runPipeline
                   ) where

import Protolude


data Operations m = Operations { opRead :: m Text
                               , opWrite :: Text -> m ()
                               , opLog :: Text -> m ()
                               }


data Job m = Job { jobName :: Text
                 , jobFn :: Text -> m Text
                 }

runPipeline :: (Monad m) => Operations m -> Text -> [Job m] -> m Text
runPipeline ops init jobs = do
  opWrite ops init
  id <- foldlM runJob 0 jobs

  opLog ops $ "\nfinal job id = " <> show id
  opRead ops

  where
    runJob (id :: Int) (Job name fn) = do
      opLog ops $ "running job: " <> name

      prev <- opRead ops
      r <- fn prev
      opWrite ops r
      
      opLog ops $ "  = " <> r
      opLog ops "  ----"

      pure $ id + 1

Here I’ve removed all explicit IO actions for the pipeline types and functions

opRead

opRead has changed from

opRead :: IO Text

opRead :: m Text

i.e. the type no longer explicitly says IO but now accepts any kind * -> * (e.g. any monad)

Operations

The Operations type has changed from

data Operations = Operations {..}

data Operations m = Operations {..}

runPipeline

runPipeline has changed from

runPipeline :: Operations -> Text -> [Job] -> IO Text

runPipeline :: (Monad m) => Operations m -> Text -> [Job m] -> m Text

There is a constraint saying that m must be a monad. This is so that we can use the monadic type class (pure, >>= etc)

(Monad m) =>

And we use the Operations m and Job m types

Storage

These changes mean that the pipeline and its types no longer require IO

The storage implementations need IO so you specialize the m to IO. Apart from that nothing changes I.e.

readFileOp :: FilePath -> IO Text
readFileOp = Txt.readFile

writeFileOp :: FilePath -> Text -> IO ()
writeFileOp = Txt.writeFile

mkFileOps :: FilePath -> Operations IO
mkFileOps p =
  Operations { opRead = readFileOp p
             , opWrite = writeFileOp p
             , opLog = putText
             }

Notice that I added opLog. It is used for logging rather than calling putText etc which can not be done since there is no IO (or MonadIO constraint)

Running

import           Protolude hiding (catch, throwIO)

import qualified Step2.Storage2 as S2
import qualified Step2.Impl2 as I2


main :: IO ()
main = do
  let jobs = [ I2.Job "j1" job1
             , I2.Job "j2" job2
             , I2.Job "j3" job3
             ]
  
  let ops = S2.mkFileOps
  r <- I2.runPipeline ops "0" jobs
  putText r

Note that we can use the same jobs as we did in step 1 (job1, job2 and job3).

And… so?

This simple change has already resulted in a few nice improvements.

The pipeline code is pure, no IO at all
This means that the pipeline can already be tested as pure code.
The pipeline can be specialized to IO and can run IO jobs or as above kept pure when testing or running non-IO jobs
The types are now compatible with monad transformers since a concrete monad type was not specified
The jobs can be used as is.

That is a pretty good for not much extra work. However I have done nothing about exceptions and when this code runs it still breaks with an exception as it did before.

Step 3 - Exceptions

If you have not seen the Exceptions best practices in haskell article, I think its worth looking at before continuing. The two points from the article I want to address are

Mixing ExceptT and exceptions mean you have to deal with multiple failure modes
When you are using IO anything can fail and throw an exception

Not only must we accept that jobs can fail with exceptions but we should treat this as normal. I.e. when someone is writing a IO job they should be able to throw exceptions and the pipeline should handle them.

However I don’t want the pure code to have to deal with exceptions, I’d much rather deal with an Either type there. Which may seem to contradict point 1 from the article above. But this is not the case, remember that jobs perform IO and thus should deal with exceptions and the pipeline is pure and should not, so ExceptT/Either is fine.

And so we need something to bridge the gap between the two worlds.

The wrapper type

data OpsError = ErrRead Text
              | ErrWrite Text
              | ErrLogging Text
              | ErrRunning Text
              deriving (Show, Eq)

data OperationsWrapper m = OperationsWrapper { opRead :: ExceptT OpsError m Text
                                             , opWrite :: Text -> ExceptT OpsError m ()
                                             , opLog :: Text -> ExceptT OpsError m ()
                                             , opRun :: (Text -> m Text) -> Text -> ExceptT OpsError m Text
                                             }

OperationsWrapper is a record with a wrapper function for each function from the Operations record. Each function will run the corresponding Operation function, catch any synchronous exception and covert it into an ExceptT transformer type. (See the section on catching all exceptions below)

The code for the wrapper is not terribly pretty, but I feel its a fair price to pay for separating the two concerns without resorting to any higher level magic (that I’m not comfortable with yet)

mkOpsWrapper :: (MonadCatch m) => I2.Operations m -> OperationsWrapper m
mkOpsWrapper o =
  OperationsWrapper { opRead = E.ExceptT ((Right <$> I2.opRead o) `catch` readError)
                    , opWrite = \t -> E.ExceptT ((Right <$> I2.opWrite o t) `catch` writeError)
                    , opLog = \t -> E.ExceptT ((Right <$> I2.opLog o t) `catch` logError)
                    , opRun = \fn t -> E.ExceptT ((Right <$> fn t) `catch` logError)
                    }
  where
    readError :: (Monad m) => SomeException -> m (Either OpsError b)
    readError e = 
      pure . Left . ErrRead $ "Error reading: " <> show e
    
    writeError :: (Monad m) => SomeException -> m (Either OpsError b)
    writeError e = 
      pure . Left . ErrWrite $ "Error writing: " <> show e
    
    logError :: (Monad m) => SomeException -> m (Either OpsError b)
    logError e = 
      pure . Left . ErrLogging $ "Error logging: " <> show e

Each function works as follows

Call the “wrapped” corresponding function and on success return the result as a Right value
```
Right <$> I2.opRead o
```

Catch any exception and return it as a Left OpsError

... `catch` readError

readError :: (Monad m) => SomeException -> m (Either OpsError b)
readError e = 
  pure . Left . ErrRead $ "Error reading: " <> show e

Pipeline using the wrapper

runPipeline :: (Monad m) => OperationsWrapper m -> Text -> [I2.Job m] -> m (Either OpsError Text)
runPipeline ops init jobs = runExceptT $ do
  opWrite ops init
  id <- foldlM runJob 0 jobs

  opLog ops $ "\nfinal job id = " <> show id
  opRead ops

  where
    runJob (id :: Int) (I2.Job name fn) = do
      opLog ops $ "running job: " <> name

      prev <- opRead ops
      r <- opRun ops fn prev -- don't just lift, use opRun
      opWrite ops r
  
      opLog ops $ "  = " <> r
      opLog ops "  ----"

      pure $ id + 1

The changes from Step2 are

The result type is an Either. I.e. failure is now explicit
Its using OperationsWrapper not Operations
It is using the ExceptT monad, so runExceptT is used
Each function from the wrapper will abort the monad if it returns a Left (i.e. if the wrapped function throws)

Also notice that I added a opRun function to wrap the running of the job. If you just lifted the job’s run function, then the exception would not be handled. So we need a wrapper function for this as well.

Running

import           Protolude

import qualified Step2.Storage2 as S2
import qualified Step2.Impl2 as I2
import qualified Step3.Impl3 as I3


main :: IO ()
main = do
  let jobs = [ I2.Job "j1" job1
             , I2.Job "j2" job2
             , I2.Job "j3" job3
             ]
  
  let ops = S2.mkFileOps

  r <- I3.runPipeline (I3.mkOpsWrapper ops) "0" jobs

  case r of
    Right x -> putText $ "Success: " <> x
    Left e -> putText $ "Exception: " <> show e

When run this will catch the exception in job 2 and correctly report the error, i.e. no runtime failure

 Demo3: use file
---------
running job: j1
in job1
  = 1:0
  ----
running job: j2
in job2
Exception: ErrLogging "Error logging: oops"

Testing

For completeness here is an example of how the pipeline can be tested using only pure code. I’m using the State monad (which you may not want to use in production but for my test and this example its just fine). Skip this the section if you are not interested in the tests

{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE OverloadedStrings #-}

module PipelineSpec where

import           Protolude 
import           Test.Hspec
import qualified Control.Monad.Except as E
import qualified Control.Monad.State.Strict as S

import qualified Step2.Impl2 as I2
import qualified Step3.Impl3 as I3

spec :: Spec
spec = do
  describe "simple pipeline" $ do
    it "should run in correct order" $ do
      let jobs = [ I2.Job "j1" job1
                 , I2.Job "j2" job2
                 ]
      
      let (r, _) = S.runState (testPipeline jobs "0") ""
      r `shouldBe` (Right "2:1:0")



testPipeline :: [I2.Job (S.State Text)] -> Text -> S.State Text (Either I3.OpsError Text)
testPipeline jobs initial = do
  let ops = I3.OperationsWrapper { I3.opRead = E.ExceptT $ do
                                     r <- get
                                     pure . Right $ r

                                 , I3.opWrite = \t -> E.ExceptT $ do
                                     put $ t
                                     pure . Right $ ()

                                 , I3.opRun = \fn t -> E.ExceptT $ do
                                     r <- fn t
                                     pure . Right $ r

                                 , I3.opLog = \t -> E.ExceptT . pure . Right $ ()
                                 }

  I3.runPipeline ops initial jobs


job1 :: Text -> (S.State Text) Text
job1 v = pure $ "1:" <> v

job2 :: Text -> (S.State Text) Text
job2 v = pure $ "2:" <> v

Notes on exceptions

Catching all exceptions

Catching all exceptions is generally considered to be a bad idea. See for example the docs for Control.Exception.

For a thorough discussion of alternatives see Catching all exceptions from www.schoolofhaskell.com

Safe-exceptions

In this example I am using the safe-exceptions package. The catch function comes from Control.Exception.Safe not Control.Exception so only synchronous exceptions are caught. Take a look at the source code to see the cabal file and the explicit imports from Control.Exception.Safe

Using async

Another approach that @thumphriees pointed out to me on twitter (and is discussed in the “Catching All Exceptions” article) is to use the async library to help with exceptions. This is the approach I’ll probably be using with the production version of this code as it for almost no extra cost gives me simple timeout and cancellation control.

As you can see from the code below this is a pretty simple to use

demoAsyncCatch :: IO ()
demoAsyncCatch = do
  r <- async jobBad >>= waitCatch

  case r of
    Right _ -> putText "demo async - Right"
    Left e -> putText $ "demo async - Left: " <> show e

  where
    jobBad = do
      putText "in jobBad"
      void . throwIO $ DemoException "oops"

Conclusion

The wrapper record separates the pure and effectful worlds and converts all synchronous IO exceptions into Eithers.
Passing around a record of functions is a alternative to using a typeclass and I believe makes more sense in many cases.
It is not that hard to refactor much of the impure code from a codebase

The obvious downside to this method is that you end up with boilerplate code for the wrapper. You’ll have to decide on if this is a problem for you or not. For me this is working really well so far.

Refactoring to pure code and dealing with exceptions.

Overview

Step 1 - Just use IO and refactor later

Jobs

Operations

Pipeline

Storage

Example jobs

Running

Step 2 - (Monad m)

opRead

Operations

runPipeline

Storage

Running

And… so?

Step 3 - Exceptions

The wrapper type

Pipeline using the wrapper

Running

Testing

Notes on exceptions

Catching all exceptions

Safe-exceptions

Using async

Conclusion

Links