Hakyll compiler to include working code samples

Posted on February 5, 2018

(updated: 09/March/2018 - includes, local path & html, css and js support)

Ensuring that the code you include in a blog post is up to date and works can be a bit of a pain. Often I’ll change code while writing a post and then I have to find and copy anything that has changed. This is manual and error prone.

Fortunately Hakyll is reasonably easy to customise. Here I’ll show one way to write a hakyll compiler to help with this issue.

Goal

What I wanted was

Include code from a git repo
Work with a specific version of the code
Check that the code builds
Check that tests or any other custom actions succeed
Check that the repo is still accessible

Example template markdown

  ---
  title: testing
  ---
  [<code setup.repo>] https://gist.github.com/53e179c4244411493ae1f9deebc3cc3f.git
  [<code setup.sha>] 5a95ece18ecb248fb745b3e7cb19f5c4d410240f
  [<code setup.run>] stack init --resolver lts-12.0
  [<code setup.run>] stack build
  [<code setup.run>] stack test
  
  Some text
  
  [<code>] main
  
  more text

The markdown should be parsed as follows

[<code setup.repo>] - is the git repo to pull the code from
[<code setup.sha>] - is the commit to work with
[<code setup.run>] - any number of commands to run in order.
[<code>] main - gets the section named main from the code and inserts it as a markdown code block

Example haskell file with sections

{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE OverloadedStrings #-}

import           Protolude hiding (onException)
import           System.FilePath ((</>))
import qualified System.FilePath as FP
import qualified System.Directory as Dir
import           Control.Exception.Safe (onException, throwString)

{-! SECTION< main !-}
main :: IO ()
main = hakyll $ do
    match "posts/*" $ do
        route $ setExtension "html"
        compile $ includeCodeCompiler
            >>= renderPandoc 
{-! SECTION> main !-}

{-! SECTION< starts a code section
{-! SECTION> ends a code section
The parser will read all sections from all files in the repo, so section names must be unique. The advantage is that you don’t need to worry about finding paths or paths changing later on.

Result

When pandoc is run the include compiler will insert the code from the main section and add a title showing the source path (repo relative) and the position (line from & to).

app/site.hs (32 to 37)

main :: IO ()
main = hakyll $ do
    match "posts/*" $ do
        route $ setExtension "html"
        compile $ includeCodeCompiler
            >>= renderPandoc

Before including the code, the includeCompiler will checkout the code and run the commands specified in the template. In the example template above I’m cloning from a github gist that does not have a stack.yaml so I run stack init first. You can use the commands to run tests etc to ensure that your sample code is working 100%. If any action fails, the blog generation is aborted.

Constraints

I only use markdown, so I’m assuming that all input is markdown
This is not “production” code. I’m doing a lot of work in IO and throwing exceptions to abort on error
It works for me, feel free to use the code and change it to match your needs.

Code

Customising hakyll

The hakyll tutorial will give you an idea of how to setup hakyll.

This is a fairly standard match clause to run your posts through pandoc to generate HTML output

    match "posts/*" $ do
        route $ setExtension "html"
        compile $ pandocCompiler
            >>= loadAndApplyTemplate "templates/post.html"    postCtx
            >>= saveSnapshot "content"
            >>= loadAndApplyTemplate "templates/default.html" postCtx
            >>= relativizeUrls

Lets modify this route to use a new compiler named includeCodeCompiler and pipe that output through pandoc

site.hs (32 to 39)

  match "posts/*" $ do
      route $ setExtension "html"
      compile $ includeCodeCompiler
          >>= renderPandoc
          >>= loadAndApplyTemplate "templates/post.html"    postCtx
          >>= saveSnapshot "content"
          >>= loadAndApplyTemplate "templates/default.html" postCtx
          >>= relativizeUrls

The two changes to notice are

Call includeCompiler rather than pandocCompiler
The output of includeCompiler is passed to renderPandoc

Preliminaries

Here are the imports I’m using

site.hs (2 to 20)

{-# LANGUAGE NoImplicitPrelude #-}
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE LambdaCase #-}

import           Protolude hiding (onException)
import           Prelude (String)
import           Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import qualified Data.List as Lst
import qualified Data.Text as Txt
import qualified Data.Text.IO as Txt
import           Data.Monoid (mappend)
import           Hakyll
import qualified System.Exit as Xit
import qualified System.Process as Proc
import           System.FilePath ((</>))
import qualified System.FilePath as FP
import qualified System.Directory as Dir
import           Control.Exception.Safe (onException, throwString)

And a few helper functions for running shell processes and finding files

site.hs (220 to 250)

runShell' :: FilePath -> Text -> IO ()
runShell' workingDir cmd = do
  putText cmd
  runShell workingDir cmd >>= \case
    Right _ -> pass
    Left e -> throwString $ Txt.unpack "Error running `" <> Txt.unpack cmd <> "` " <> show e

runShell :: FilePath -> Text -> IO (Either Int ())
runShell workingDir cmd = do
  let p = Proc.shell $ Txt.unpack cmd
  (_, _, _, phandle) <- Proc.createProcess p { Proc.cwd = Just workingDir }
  Proc.waitForProcess phandle >>= \case
    Xit.ExitSuccess -> pure $ Right ()
    Xit.ExitFailure i -> pure $ Left i

getFilesRec :: FilePath -> IO [FilePath]
getFilesRec p = do
  fs <- (p </>) <<$>> getFiles p
  ds <- (p </>) <<$>> getDirs p
  cs <- traverse getFilesRec ds
  pure $ fs <> join cs

getDirs :: FilePath -> IO [FilePath]
getDirs p = do
  entries <- (p </>) <<$>> Dir.listDirectory p
  filterM Dir.doesDirectoryExist entries

getFiles :: FilePath -> IO [FilePath]
getFiles p = do
  entries <- (p </>) <<$>> Dir.listDirectory p
  filterM Dir.doesFileExist entries

The includeCodeCompiler

site.hs (51 to 66)

includeCodeCompiler :: Compiler (Item String)
includeCodeCompiler = do
  p <- getResourceFilePath
  getResourceString >>= withItemBody (unsafeCompiler . includeCompile p)

  where
    includeCompile :: FilePath -> String -> IO String
    includeCompile compilingPath source = 
      includeCompile' source
      `onException`
      putStr ("Exception compiling includes for: " <> compilingPath)

    includeCompile' :: String -> IO String
    includeCompile' source = do
      let ls1 = Txt.lines $ Txt.pack source 
      let (sourceNoSetup, repoPath', sha', cmds', path') = getConfig ls1

A pandoc compiler has the type Compiler (Item String). Since this compiler needs to read file it has to be able to perform IO. To allow IO the unsafeCompiler function is used.

So this code, gets the current file path, the item body and starts the includeCompile in IO

  p <- getResourceFilePath
  getResourceString >>= withItemBody (unsafeCompiler . includeCompile p)

onException is used to print the name of the file being compiled if there is an exception.

Once the config (repo, sha and commands) have been read the main compiler logic can be run.

site.hs (70 to 110)

      case path' of
        Nothing ->
          case (repoPath', sha', cmds') of
            (Nothing, Nothing, []) -> pure $ Txt.unpack . Txt.unlines $ sourceNoSetup
            (Just _, Nothing, _) -> throwString "No sha found"
            (Just _, _, []) -> throwString "No run commands found"
            (Nothing, _, (_:_)) -> throwString "No repo setup found"
            (Nothing, Just _, []) -> throwString "No repo setup found"
            (Just repoPath, Just sha, cmds) -> do
              root <- Dir.getCurrentDirectory
              let tempPath = root </> tmpDirectory defaultConfiguration </> "codeIncludeGit"

              -- Cleanup from previous post
              removeDirectoryRecursiveIfExists tempPath
              Dir.createDirectoryIfMissing True tempPath
              -- Clone the git repo
              runShell' root $ "git clone \"" <> repoPath <> "\" \"" <> Txt.pack tempPath <> "\""
              -- Goto the correct sha (if it was specified)
              gotoSha sha tempPath
              -- Execute the run commands (buid, test etc)
              executeRunCommands cmds tempPath
              -- Delete all dirs we are not interested in (exclude .git and .stack-work)
              removeDirectoryRecursiveIfExists $ tempPath </> ".git"
              removeDirectoryRecursiveIfExists $ tempPath </> ".stack-work"
              includeCode tempPath repoPath sha sourceNoSetup

        Just path -> 
          includeCode (Txt.unpack path) "**local**" "**local**" sourceNoSetup

    includeCode tempPath repoPath sha sourceNoSetup = do
      -- Get all files in the repo 
      files <- getFilesRec tempPath
      -- All sections from all files
      sections' <- Map.fromList . concat <$> traverse getSections files
      let sections = Map.map (\(p, s, e, lang, ls) -> (drop (length tempPath + 1) p, s, e, lang, ls)) sections' 
      -- Replace sections in the file
      replaced' <- traverse (replaceCodeLineSection tempPath sections) sourceNoSetup
      let replaced = Txt.unlines . concat $ replaced'
      -- Replace sha and repo tokens
      pure . Txt.unpack . Txt.replace "2297510b93a903ab23a319f7921351a9725cef0e" sha $ Txt.replace "https://gist.github.com/53e179c4244411493ae1f9deebc3cc3f.git" repoPath replaced

site.hs (114 to 121)

    executeRunCommands :: [Text] -> FilePath -> IO ()
    executeRunCommands cmds path = 
      traverse_ (runShell' path) cmds

    gotoSha :: Text -> FilePath -> IO ()
    gotoSha sha tmpPath = do
      runShell' tmpPath ("git reset " <> sha <> " --hard") 
      void $ runShell tmpPath "git clean -dfx"

This code does the following

Pre-clone cleanup
Clone
Goto the configured commit
Run the commands
Read all the sections from the files
Import the sections into the markdown

Details

Loading the config is done quite simply by filtering the source lines

site.hs (126 to 142)

  getConfig ls =
    let
      cfgPath = "[<code setup.path>]"
      cfgRepo = "[<code setup.repo>]"
      cfgSha = "[<code setup.sha>]"
      cfgRun = "[<code setup.run>]"
      path = Txt.strip . Txt.drop (Txt.length cfgRepo) <$> headMay (filter (Txt.isPrefixOf cfgPath) ls)
      repo = Txt.strip . Txt.drop (Txt.length cfgRepo) <$> headMay (filter (Txt.isPrefixOf cfgRepo) ls)
      sha = Txt.strip . Txt.drop (Txt.length cfgSha) <$> headMay (filter (Txt.isPrefixOf cfgSha) ls)
      run = Txt.strip . Txt.drop (Txt.length cfgRun) <$> filter (Txt.isPrefixOf cfgRun) ls
    in
    (filter (not . Txt.isPrefixOf "[<code setup.") ls, repo, sha, run, path)
      
  removeDirectoryRecursiveIfExists p = 
    Dir.doesDirectoryExist p >>= \case
      True -> Dir.removeDirectoryRecursive p
      False -> pass

And once the sections have been loaded from the source code the tags can be replaced in the markdown. Each [<code>] tag is replaced by a markdown code block, a title showing the source file and offset.

site.hs (147 to 175)

    replaceCodeLineSection :: FilePath -> Map Text (FilePath, Int, Int, Text, [Text]) -> Text -> IO [Text]
    replaceCodeLineSection tempPath sections line = do
      let codeTag = "[<code>]" 
      let includeTag = "[<include>]" 

      if Txt.isPrefixOf codeTag line 
        then
          let secName = Txt.strip . Txt.drop (Txt.length codeTag) $ line in
          case Map.lookup secName sections of
            Nothing -> throwString $ Txt.unpack $ "No section named " <> secName
            Just (path, start, end, lang, code) ->
              let title = Txt.pack path <> " (" <> show start <> " to " <> show end <> ")" in

              pure [ "###### " <> title
                   , ""
                   , "~~~{." <> lang <> "}"
                   , Txt.unlines code
                   , "~~~"
                   , ""
                   ]
        else
          if not $ Txt.isPrefixOf includeTag line 
            then pure [line]
            else
              let
                incRelPath = Txt.strip . Txt.drop (Txt.length includeTag) $ line 
                incFullPath = tempPath </> Txt.unpack incRelPath
              in
              sequenceA [Txt.readFile incFullPath]

Getting sections from the repo

site.hs (181 to 215)

type LineState = (Int, [(Text, (FilePath, Int, Int, Text, [Text]))]) 

getSections :: FilePath -> IO [(Text, (FilePath, Int, Int, Text, [Text]))]
getSections f = 
  case FP.takeExtension f of
    ".hs" -> getLangSections "{-! SECTION< " "{-! SECTION> " "{-! SECTION" "haskell"
    ".js" -> getLangSections "//!SECTION< " "//!SECTION> " "//!SECTION" "javascript"
    ".html" -> getLangSections "<!-- !SECTION+ " "<!-- !SECTION- " "<!-- !SECTION" "html"
    ".css" -> getLangSections "/* !SECTION< " "/* !SECTION> " "/* !SECTION" "html"
    _ -> pure []
  
  where
    getLangSections startToken endToken cleanToken lang = do
      ls <- Txt.lines <$> Txt.readFile f
      (_, r) <- foldlM (parseLine ls) (1, []) ls
      pure r

      where
        parseLine :: [Text] -> LineState -> Text -> IO LineState
        parseLine ls (lineNum, hist) l =
          if not . Txt.isPrefixOf startToken . Txt.strip $ l
          then pure (lineNum + 1, hist)
          else do
            let secName = Txt.strip . fst . Txt.breakOn " " . Txt.strip . Txt.drop (Txt.length startToken) . Txt.strip $ l 
            end <- scanForEnd ls secName lineNum
            pure (lineNum + 1, (secName, (f, lineNum + 1, lineNum + length end, lang, end)) : hist)

        scanForEnd ls secName fromLine =
          let fromOffset = drop fromLine ls in
          case Lst.span (not . Txt.isPrefixOf (endToken <> secName) . Txt.strip) fromOffset of
            (_, []) -> throwString $ "No section end found for: " <> Txt.unpack secName
            (r, _) -> pure $ filter cleanLine r

        cleanLine =
          not . Txt.isPrefixOf cleanToken . Txt.strip

Different types of files will need different tag styles. In the code above I’m handling haskell, javascript, css and HTML. You should be able to fairly easily add this to other languages as well.

parseLine works by going line by line looking for a start token, and for each one that it finds it scans to find the end token. This is a little inefficient but it allows for nested and/or overlapping tags.

Code includes

Sometimes it is useful to include external files into a post. The [<include>] tag makes this simple.

For example [<include>] /home/user/static/interestingStuff.json. Unlike the [<code>] tag, no assumptions are made about the included text. If you want it syntax highlighted simply wrap the text in a code fence.

Speeding up the writing process

Fetching the code from a remote repo and doing a full build each time can be pretty slow. This is fine when you are confirming that everything works correctly, but its not idea when you are writing a post and still making many small changes. To help with this there is a [<code setup.path>] tag. This tag overrides the repo and run settings. If it is present then all code sections will be read from this path directly without any fetching, building or running of commands.

e.g. [<code setup.path>] /home/user/dev/myProject

Obviously it is important that you remove this setting once the post is done.

Using the compiler

Once you add this to your hakyll you can be sure that you are only using working code blocks. While there is a bit of code in this compiler most of it is for dealing with the file IO and parsing. I think it also shows how easily hakyll can be customised to do useful things.

This code works with hakyll 4.10.0.0. See the cabal file in the gist for other dependencies