Refactoring to pure code and dealing with exceptions.
Overview
(TL;DR: I use a record of functions & a record of wrapper functions that catch all synchronous exceptions and convert to ExceptT for the pure code)
See also the followup post on using Free as an alternative
This is a quick overview of how I refactored one of my first production haskell application that had a majority of IO code to be more pure. I’ve seen several approaches with different trade offs but none that fit exactly what I was doing 100%.
The design I needed was a pipeline of actions that needs to be performed, where each job in the pipeline is considered user code (think plugin)
Some design notes
- Job
- Can run any IO action and as a result these actions can fail with exceptions
- I don’t want to constrain what a job can do in any way, i.e. full IO access
- This is custom code and can fail for any number of reasons, network errors, disk permissions etc
- Pipeline
- Must run impure jobs but be as pure as possible itself
- Must be able to handle job failure (exceptions)
- Supports different storage mechanisms, e.g. on disk for local dev vs cloud for production
Both jobs and the pipeline should be testable
Step 1 - Just use IO and refactor later
Jobs
A job has a name and an IO action to perform. In this demo a job takes a text value, does some transformation and returns a resulting text value. In a real world application this record would contain more operations and probably be polymorphic
Operations
Operations is a record of operations that the pipeline uses to persist job results. I.e. this is the pluggable storage and related functions
I’ve chosen to pass a record of functions rather than using a type class. For discussion on records vs typeclass for this type of design see
- ReaderT design pattern
- mtl-style-example: A small, self-contained example of using mtl style to unit test effectful code in a pure way
- Java interfaces map to Haskell records
A pluggable set of operations feels more natural as a record of functions to me than a typeclass. For me this is because they are just a named set of functions rather than something that has a set of rules (“an algebra”).
Pipeline
runPipeline :: Operations -> Text -> [Job] -> IO Text
runPipeline ops init jobs = do
opWrite ops init
id <- foldlM runJob 0 jobs
putText $ "\nfinal job id = " <> show id
opRead ops
where
runJob (id :: Int) (Job name fn) = do
putText $ "running job: " <> name
prev <- opRead ops
r <- fn prev
opWrite ops r
putText $ " = " <> r
putText " ----"
pure $ id + 1
Run each job in order, for each job
- Load the last data
- Run the job
- Store the result
- Pass a unique (for the run) id to each step
(Obviously I could simply pass the previous state as part of the fold and do away with the load and store. Its done this way for the demonstration)
Storage
import Protolude
import qualified Data.Text.IO as Txt
import qualified Step1.Impl1 as I
readFileOp :: FilePath -> IO Text
readFileOp = Txt.readFile
writeFileOp :: FilePath -> Text -> IO ()
writeFileOp = Txt.writeFile
mkFileOps :: FilePath -> I.Operations
mkFileOps p =
I.Operations { I.opRead = readFileOp p
, I.opWrite = writeFileOp p
}
Here the operations are implemented by reading and writing to a file, using Data.Text.IO. The example code in github also has a STM backed storage implementation
Example jobs
job1 :: Text -> IO Text
job1 v = do
putText "in job1"
pure $ "1:" <> v
job2 :: Text -> IO Text
job2 v = do
putText "in job2"
void . throwIO $ DemoException "oops"
pure $ "2:" <> v
job3 :: Text -> IO Text
job3 v = do
putText "in job3"
pure $ "3:" <> v
newtype DemoException = DemoException Text
instance Show DemoException where
show (DemoException s) = Txt.unpack s
instance Exception DemoException
These are three example jobs. Notice that the second one explicitly throws an exception. A job can explicitly throw an exception like this or it could throw an exception on failure (e.g. network error)
Running
import Protolude hiding (catch, throwIO)
import qualified Step1.Impl1 as I1
import qualified Step1.Storage1 as S1
main :: IO ()
main = do
let jobs = [ I1.Job "j1" job1
, I1.Job "j2" job2
, I1.Job "j3" job3
]
let ops = S1.mkFileOps
r <- I1.runPipeline ops "0" jobs
putText r
When run this will fail with an exception in job 2 and result in the application terminating
---------
Demo1: in memory
---------
running job: j1
in job1
= 1:0
----
running job: j2
in job2
Exception: oops
Step 2 - (Monad m)
So far the example has a working pipeline but everything is in IO and it does nothing about exceptions.
Lets remove some IO constraints.
{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE ScopedTypeVariables #-}
module Step2.Impl2 ( Operations (..)
, Job (..)
, runPipeline
) where
import Protolude
data Operations m = Operations { opRead :: m Text
, opWrite :: Text -> m ()
, opLog :: Text -> m ()
}
data Job m = Job { jobName :: Text
, jobFn :: Text -> m Text
}
runPipeline :: (Monad m) => Operations m -> Text -> [Job m] -> m Text
runPipeline ops init jobs = do
opWrite ops init
id <- foldlM runJob 0 jobs
opLog ops $ "\nfinal job id = " <> show id
opRead ops
where
runJob (id :: Int) (Job name fn) = do
opLog ops $ "running job: " <> name
prev <- opRead ops
r <- fn prev
opWrite ops r
opLog ops $ " = " <> r
opLog ops " ----"
pure $ id + 1
Here I’ve removed all explicit IO actions for the pipeline types and functions
opRead
opRead has changed from
to
i.e. the type no longer explicitly says IO but now accepts any kind * -> * (e.g. any monad)
Operations
The Operations type has changed from
to
runPipeline
runPipeline has changed from
to
There is a constraint saying that m must be a monad. This is so that we can use the monadic type class (pure, >>= etc)
And we use the Operations m and Job m types
Storage
These changes mean that the pipeline and its types no longer require IO
The storage implementations need IO so you specialize the m to IO. Apart from that nothing changes I.e.
readFileOp :: FilePath -> IO Text
readFileOp = Txt.readFile
writeFileOp :: FilePath -> Text -> IO ()
writeFileOp = Txt.writeFile
mkFileOps :: FilePath -> Operations IO
mkFileOps p =
Operations { opRead = readFileOp p
, opWrite = writeFileOp p
, opLog = putText
}
Notice that I added opLog. It is used for logging rather than calling putText etc which can not be done since there is no IO (or MonadIO constraint)
Running
import Protolude hiding (catch, throwIO)
import qualified Step2.Storage2 as S2
import qualified Step2.Impl2 as I2
main :: IO ()
main = do
let jobs = [ I2.Job "j1" job1
, I2.Job "j2" job2
, I2.Job "j3" job3
]
let ops = S2.mkFileOps
r <- I2.runPipeline ops "0" jobs
putText r
Note that we can use the same jobs as we did in step 1 (job1, job2 and job3).
And… so?
This simple change has already resulted in a few nice improvements.
- The pipeline code is pure, no IO at all
- This means that the pipeline can already be tested as pure code.
- The pipeline can be specialized to IO and can run IO jobs or as above kept pure when testing or running non-IO jobs
- The types are now compatible with monad transformers since a concrete monad type was not specified
- The jobs can be used as is.
That is a pretty good for not much extra work. However I have done nothing about exceptions and when this code runs it still breaks with an exception as it did before.
Step 3 - Exceptions
If you have not seen the Exceptions best practices in haskell article, I think its worth looking at before continuing. The two points from the article I want to address are
- Mixing ExceptT and exceptions mean you have to deal with multiple failure modes
- When you are using IO anything can fail and throw an exception
Not only must we accept that jobs can fail with exceptions but we should treat this as normal. I.e. when someone is writing a IO job they should be able to throw exceptions and the pipeline should handle them.
However I don’t want the pure code to have to deal with exceptions, I’d much rather deal with an Either type there. Which may seem to contradict point 1 from the article above. But this is not the case, remember that jobs perform IO and thus should deal with exceptions and the pipeline is pure and should not, so ExceptT/Either is fine.
And so we need something to bridge the gap between the two worlds.
The wrapper type
data OpsError = ErrRead Text
| ErrWrite Text
| ErrLogging Text
| ErrRunning Text
deriving (Show, Eq)
data OperationsWrapper m = OperationsWrapper { opRead :: ExceptT OpsError m Text
, opWrite :: Text -> ExceptT OpsError m ()
, opLog :: Text -> ExceptT OpsError m ()
, opRun :: (Text -> m Text) -> Text -> ExceptT OpsError m Text
}
OperationsWrapper is a record with a wrapper function for each function from the Operations record. Each function will run the corresponding Operation function, catch any synchronous exception and covert it into an ExceptT transformer type. (See the section on catching all exceptions below)
The code for the wrapper is not terribly pretty, but I feel its a fair price to pay for separating the two concerns without resorting to any higher level magic (that I’m not comfortable with yet)
mkOpsWrapper :: (MonadCatch m) => I2.Operations m -> OperationsWrapper m
mkOpsWrapper o =
OperationsWrapper { opRead = E.ExceptT ((Right <$> I2.opRead o) `catch` readError)
, opWrite = \t -> E.ExceptT ((Right <$> I2.opWrite o t) `catch` writeError)
, opLog = \t -> E.ExceptT ((Right <$> I2.opLog o t) `catch` logError)
, opRun = \fn t -> E.ExceptT ((Right <$> fn t) `catch` logError)
}
where
readError :: (Monad m) => SomeException -> m (Either OpsError b)
readError e =
pure . Left . ErrRead $ "Error reading: " <> show e
writeError :: (Monad m) => SomeException -> m (Either OpsError b)
writeError e =
pure . Left . ErrWrite $ "Error writing: " <> show e
logError :: (Monad m) => SomeException -> m (Either OpsError b)
logError e =
pure . Left . ErrLogging $ "Error logging: " <> show e
Each function works as follows
Call the “wrapped” corresponding function and on success return the result as a Right value
Catch any exception and return it as a Left OpsError
Pipeline using the wrapper
runPipeline :: (Monad m) => OperationsWrapper m -> Text -> [I2.Job m] -> m (Either OpsError Text)
runPipeline ops init jobs = runExceptT $ do
opWrite ops init
id <- foldlM runJob 0 jobs
opLog ops $ "\nfinal job id = " <> show id
opRead ops
where
runJob (id :: Int) (I2.Job name fn) = do
opLog ops $ "running job: " <> name
prev <- opRead ops
r <- opRun ops fn prev -- don't just lift, use opRun
opWrite ops r
opLog ops $ " = " <> r
opLog ops " ----"
pure $ id + 1
The changes from Step2 are
- The result type is an Either. I.e. failure is now explicit
- Its using OperationsWrapper not Operations
- It is using the ExceptT monad, so runExceptT is used
- Each function from the wrapper will abort the monad if it returns a Left (i.e. if the wrapped function throws)
Also notice that I added a opRun function to wrap the running of the job. If you just lifted the job’s run function, then the exception would not be handled. So we need a wrapper function for this as well.
Running
import Protolude
import qualified Step2.Storage2 as S2
import qualified Step2.Impl2 as I2
import qualified Step3.Impl3 as I3
main :: IO ()
main = do
let jobs = [ I2.Job "j1" job1
, I2.Job "j2" job2
, I2.Job "j3" job3
]
let ops = S2.mkFileOps
r <- I3.runPipeline (I3.mkOpsWrapper ops) "0" jobs
case r of
Right x -> putText $ "Success: " <> x
Left e -> putText $ "Exception: " <> show e
When run this will catch the exception in job 2 and correctly report the error, i.e. no runtime failure
Demo3: use file
---------
running job: j1
in job1
= 1:0
----
running job: j2
in job2
Exception: ErrLogging "Error logging: oops"
Testing
For completeness here is an example of how the pipeline can be tested using only pure code. I’m using the State monad (which you may not want to use in production but for my test and this example its just fine). Skip this the section if you are not interested in the tests
{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE OverloadedStrings #-}
module PipelineSpec where
import Protolude
import Test.Hspec
import qualified Control.Monad.Except as E
import qualified Control.Monad.State.Strict as S
import qualified Step2.Impl2 as I2
import qualified Step3.Impl3 as I3
spec :: Spec
spec = do
describe "simple pipeline" $ do
it "should run in correct order" $ do
let jobs = [ I2.Job "j1" job1
, I2.Job "j2" job2
]
let (r, _) = S.runState (testPipeline jobs "0") ""
r `shouldBe` (Right "2:1:0")
testPipeline :: [I2.Job (S.State Text)] -> Text -> S.State Text (Either I3.OpsError Text)
testPipeline jobs initial = do
let ops = I3.OperationsWrapper { I3.opRead = E.ExceptT $ do
r <- get
pure . Right $ r
, I3.opWrite = \t -> E.ExceptT $ do
put $ t
pure . Right $ ()
, I3.opRun = \fn t -> E.ExceptT $ do
r <- fn t
pure . Right $ r
, I3.opLog = \t -> E.ExceptT . pure . Right $ ()
}
I3.runPipeline ops initial jobs
job1 :: Text -> (S.State Text) Text
job1 v = pure $ "1:" <> v
job2 :: Text -> (S.State Text) Text
job2 v = pure $ "2:" <> v
Notes on exceptions
Catching all exceptions
Catching all exceptions is generally considered to be a bad idea. See for example the docs for Control.Exception.
For a thorough discussion of alternatives see Catching all exceptions from www.schoolofhaskell.com
Safe-exceptions
In this example I am using the safe-exceptions package. The catch function comes from Control.Exception.Safe not Control.Exception so only synchronous exceptions are caught. Take a look at the source code to see the cabal file and the explicit imports from Control.Exception.Safe
Using async
Another approach that @thumphriees pointed out to me on twitter (and is discussed in the “Catching All Exceptions” article) is to use the async library to help with exceptions. This is the approach I’ll probably be using with the production version of this code as it for almost no extra cost gives me simple timeout and cancellation control.
As you can see from the code below this is a pretty simple to use
demoAsyncCatch :: IO ()
demoAsyncCatch = do
r <- async jobBad >>= waitCatch
case r of
Right _ -> putText "demo async - Right"
Left e -> putText $ "demo async - Left: " <> show e
where
jobBad = do
putText "in jobBad"
void . throwIO $ DemoException "oops"
Conclusion
- The wrapper record separates the pure and effectful worlds and converts all synchronous IO exceptions into Eithers.
- Passing around a record of functions is a alternative to using a typeclass and I believe makes more sense in many cases.
- It is not that hard to refactor much of the impure code from a codebase
The obvious downside to this method is that you end up with boilerplate code for the wrapper. You’ll have to decide on if this is a problem for you or not. For me this is working really well so far.
Links
- Sample code on github
- Exceptions best practices in haskell
- ReaderT design pattern
- mtl-style-example: A small, self-contained example of using mtl style to unit test effectful code in a pure way
- Java interfaces map to Haskell records
- docs for Control.Exception
- Catching all exceptions
- safe-exceptions package