Hakyll compiler to include working code samples

Posted on February 5, 2018

(Updated 9 March 2018: adds includes, local path support, and HTML, CSS and JS support.)

Ensuring that the code you include in a blog post is up to date and actually works can be a bit of a pain. Often I'll change code while writing a post, and then I have to find and copy everything that has changed. This is manual and error-prone.

Fortunately, Hakyll is reasonably easy to customise. Here I'll show one way to write a Hakyll compiler that helps with this.

Goal

What I wanted was to:

  1. Include code from a git repo
  2. Work with a specific version of the code
  3. Check that the code builds
  4. Check that tests or any other custom actions succeed
  5. Check that the repo is still accessible

Example template markdown

The markdown should be parsed as follows

Example haskell file with sections

Result

When pandoc is run, the include compiler inserts the code from the main section and adds a title showing the source path (relative to the repo) and the position (first and last line).

app/site.hs (32 to 37)

Before including the code, the includeCompiler clones the repo, checks out the configured commit and runs the commands specified in the template. In the example template above I'm cloning from a GitHub gist that does not have a stack.yaml, so I run stack init first. You can use the commands to run tests and so on, to make sure your sample code really works. If any action fails, blog generation is aborted.
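Here is a minimal sketch of that abort-on-failure behaviour, assuming a POSIX shell and the `process` package that ships with GHC. It is not the gist's code, just one way a helper shaped like `executeRunCommands cmds path` could work: each command runs in the checkout directory, and the first non-zero exit code throws an `IOError`, which is enough to stop the Hakyll build.

```haskell
import System.Exit (ExitCode (..))
import System.Process (CreateProcess (..), readCreateProcessWithExitCode, shell)

-- Run the commands in order inside `dir`; return a description of the
-- first failure, or Nothing if every command succeeded.
firstFailure :: FilePath -> [String] -> IO (Maybe String)
firstFailure _ [] = pure Nothing
firstFailure dir (cmd : rest) = do
  (code, _out, err) <- readCreateProcessWithExitCode ((shell cmd) { cwd = Just dir }) ""
  case code of
    ExitSuccess   -> firstFailure dir rest
    ExitFailure n -> pure (Just (cmd ++ " exited with " ++ show n ++ ": " ++ err))

-- Throw (and so abort the site build) if any command fails
executeRunCommands :: [String] -> FilePath -> IO ()
executeRunCommands cmds dir =
  firstFailure dir cmds >>= maybe (pure ()) (ioError . userError)
```

Keeping the sequencing in `firstFailure` and raising the exception only in `executeRunCommands` makes the "stop at the first failing command" logic easy to check on its own.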

Constraints

  1. I only use markdown, so I’m assuming that all input is markdown
  2. This is not “production” code. I’m doing a lot of work in IO and throwing exceptions to abort on error
  3. It works for me; feel free to use the code and change it to suit your needs.

Code

Customising hakyll

The Hakyll tutorials will give you an idea of how to set up Hakyll.

This is a fairly standard match clause that runs your posts through pandoc to generate HTML output.

Let's modify this rule to use a new compiler named includeCodeCompiler and pipe its output through pandoc.

site.hs (32 to 39)

The two changes to notice are

  1. Call includeCompiler rather than pandocCompiler
  2. The output of includeCompiler is passed to renderPandoc

Preliminaries

Here are the imports I’m using

site.hs (2 to 20)

And a few helper functions for running shell processes and finding files

site.hs (220 to 250)
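The real helpers are in the gist. As a rough idea of the file side, here is a sketch of `getFilesRec` and `removeDirectoryRecursiveIfExists` (names as used in the compiler code further down; these bodies are my assumption) using only the `directory` and `filepath` packages:

```haskell
import Control.Monad (filterM, when)
import System.Directory
import System.FilePath ((</>))

-- All regular files under `dir`, recursively, as paths prefixed with `dir`
getFilesRec :: FilePath -> IO [FilePath]
getFilesRec dir = do
  entries <- map (dir </>) <$> listDirectory dir
  files   <- filterM doesFileExist entries
  subdirs <- filterM doesDirectoryExist entries
  nested  <- traverse getFilesRec subdirs
  pure (files ++ concat nested)

-- Delete a directory tree, doing nothing if it is not there
removeDirectoryRecursiveIfExists :: FilePath -> IO ()
removeDirectoryRecursiveIfExists path = do
  exists <- doesDirectoryExist path
  when exists (removeDirectoryRecursive path)
```

The returned paths keep the `dir` prefix, which fits the `drop (length tempPath + 1)` used later to make section paths repo-relative.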

The includeCodeCompiler

site.hs (51 to 66)

A pandoc compiler has the type Compiler (Item String). Since this compiler needs to read files (and run git and shell commands), it has to be able to perform IO. To allow this, it uses Hakyll's unsafeCompiler function, which runs an IO action inside the Compiler monad.

So this code gets the current file path and the item body, and then runs includeCompile in IO.

onException is used to print the name of the file being compiled if there is an exception.
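A minimal sketch of this pattern, assuming hakyll 4.10's `getResourceFilePath`, `getResourceBody`, `unsafeCompiler` and `itemSetBody`; `transformBody` is a hypothetical placeholder for the real include logic:

```haskell
import Control.Exception (onException)
import Hakyll

-- Hypothetical placeholder for the real include logic (clone, build, splice)
transformBody :: FilePath -> String -> IO String
transformBody _path body = pure body

-- A Compiler (Item String) that does its work in IO
ioBodyCompiler :: Compiler (Item String)
ioBodyCompiler = do
  path <- getResourceFilePath   -- file currently being compiled
  item <- getResourceBody       -- its raw, unrendered contents
  newBody <- unsafeCompiler $
    transformBody path (itemBody item)
      `onException` putStrLn ("Error while compiling " ++ path)
  pure (itemSetBody newBody item)
```

`onException` re-throws after printing, so the build still aborts; the message just tells you which post caused it.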

Once the config (repo, sha and commands) has been read, the main compiler logic can run.

site.hs (70 to 110)
      case path' of
        Nothing ->
          case (repoPath', sha', cmds') of
            (Nothing, Nothing, []) -> pure $ Txt.unpack . Txt.unlines $ sourceNoSetup
            (Just _, Nothing, _) -> throwString "No sha found"
            (Just _, _, []) -> throwString "No run commands found"
            (Nothing, _, (_:_)) -> throwString "No repo setup found"
            (Nothing, Just _, []) -> throwString "No repo setup found"
            (Just repoPath, Just sha, cmds) -> do
              root <- Dir.getCurrentDirectory
              let tempPath = root </> tmpDirectory defaultConfiguration </> "codeIncludeGit"

              -- Cleanup from previous post
              removeDirectoryRecursiveIfExists tempPath
              Dir.createDirectoryIfMissing True tempPath
              -- Clone the git repo
              runShell' root $ "git clone \"" <> repoPath <> "\" \"" <> Txt.pack tempPath <> "\""
              -- Check out the configured sha
              gotoSha sha tempPath
              -- Execute the run commands (build, test etc.)
              executeRunCommands cmds tempPath
              -- Delete the dirs we don't want to scan for sections (.git and .stack-work)
              removeDirectoryRecursiveIfExists $ tempPath </> ".git"
              removeDirectoryRecursiveIfExists $ tempPath </> ".stack-work"
              includeCode tempPath repoPath sha sourceNoSetup

        Just path -> 
          includeCode (Txt.unpack path) "**local**" "**local**" sourceNoSetup

    includeCode tempPath repoPath sha sourceNoSetup = do
      -- Get all files in the repo 
      files <- getFilesRec tempPath
      -- All sections from all files
      sections' <- Map.fromList . concat <$> traverse getSections files
      let sections = Map.map (\(p, s, e, lang, ls) -> (drop (length tempPath + 1) p, s, e, lang, ls)) sections' 
      -- Replace sections in the file
      replaced' <- traverse (replaceCodeLineSection tempPath sections) sourceNoSetup
      let replaced = Txt.unlines . concat $ replaced'
      -- Replace sha and repo tokens
      pure . Txt.unpack . Txt.replace "2297510b93a903ab23a319f7921351a9725cef0e" sha $ Txt.replace "https://gist.github.com/53e179c4244411493ae1f9deebc3cc3f.git" repoPath replaced
  
site.hs (114 to 121)

This code does the following

  1. Pre-clone cleanup
  2. Clone
  3. Goto the configured commit
  4. Run the commands
  5. Read all the sections from the files
  6. Import the sections into the markdown

Details

Loading the config is done quite simply, by filtering the setup lines out of the source lines.

site.hs (126 to 142)
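A sketch of that filtering, assuming one tag per line. Only `[<code setup.path>]` is named in this post; the `repo`, `sha` and `run` tag spellings and the `Setup` record are hypothetical:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Data.List (partition)
import Data.Maybe (listToMaybe, mapMaybe)
import qualified Data.Text as Txt

-- Setup values read from the markdown (field names are illustrative)
data Setup = Setup
  { setupRepo :: Maybe Txt.Text
  , setupSha  :: Maybe Txt.Text
  , setupRun  :: [Txt.Text]
  , setupPath :: Maybe Txt.Text
  } deriving (Show, Eq)

-- The value after a tag, if the line starts with that tag
tagValue :: Txt.Text -> Txt.Text -> Maybe Txt.Text
tagValue tag line = Txt.strip <$> Txt.stripPrefix tag (Txt.stripStart line)

-- Split the post into setup values and the remaining markdown
loadSetup :: [Txt.Text] -> (Setup, [Txt.Text])
loadSetup ls = (setup, sourceNoSetup)
  where
    isSetupLine l = "[<code setup." `Txt.isPrefixOf` Txt.stripStart l
    (setupLines, sourceNoSetup) = partition isSetupLine ls
    values tag = mapMaybe (tagValue tag) setupLines
    setup = Setup
      { setupRepo = listToMaybe (values "[<code setup.repo>]")
      , setupSha  = listToMaybe (values "[<code setup.sha>]")
      , setupRun  = values "[<code setup.run>]"   -- kept in document order
      , setupPath = listToMaybe (values "[<code setup.path>]")
      }
```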

Once the sections have been loaded from the source code, the tags in the markdown can be replaced. Each [<code>] tag is replaced by a title showing the source file and line range, followed by a markdown code block.

site.hs (147 to 175)
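A sketch of the replacement for a single line. The section tuple mirrors the `(path, start, end, lang, lines)` shape used in `includeCode`; the `[<code>] name` syntax and the `replaceCodeLine` name are assumptions:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Map.Strict as Map
import qualified Data.Text as Txt

-- (path, first line, last line, language, code lines)
type Section = (FilePath, Int, Int, Txt.Text, [Txt.Text])

-- Expand a "[<code>] name" line into a title plus a fenced code block;
-- any other line is passed through unchanged.
replaceCodeLine :: Map.Map Txt.Text Section -> Txt.Text -> Either String [Txt.Text]
replaceCodeLine sections line =
  case Txt.stripPrefix "[<code>]" (Txt.stripStart line) of
    Nothing -> Right [line]
    Just rest ->
      let name = Txt.strip rest
      in case Map.lookup name sections of
           Nothing -> Left ("Unknown section: " ++ Txt.unpack name)
           Just (path, from, to, lang, code) ->
             Right ([title path from to, "", fence <> lang] ++ code ++ [fence])
  where
    fence = "```"
    title p f t = Txt.pack (p ++ " (" ++ show f ++ " to " ++ show t ++ ")")
```

Returning `Either` instead of throwing keeps the function pure and easy to test; in the compiler a `Left` would be turned into an exception to abort the build.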

Getting sections from the repo

site.hs (181 to 215)

Different types of file need different tag styles. The code above handles Haskell, JavaScript, CSS and HTML; adding other languages should be fairly easy.
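A sketch of per-language marker styles keyed on the file extension. The `[<name>]` / `[</name>]` marker format and the `Style` record are hypothetical; only the list of languages comes from this post:

```haskell
import System.FilePath (takeExtension)

-- How a section marker is written in a given file type
data Style = Style
  { styleLang    :: String  -- language name for the code fence
  , commentOpen  :: String
  , commentClose :: String
  } deriving (Show, Eq)

-- Pick a comment style from the file extension
styleFor :: FilePath -> Maybe Style
styleFor path = case takeExtension path of
  ".hs"   -> Just (Style "haskell" "-- " "")
  ".js"   -> Just (Style "javascript" "// " "")
  ".css"  -> Just (Style "css" "/* " " */")
  ".html" -> Just (Style "html" "<!-- " " -->")
  _       -> Nothing   -- unknown file types are not scanned

-- Hypothetical markers: [<name>] opens a section, [</name>] closes it
startMarker, endMarker :: Style -> String -> String
startMarker s name = commentOpen s ++ "[<" ++ name ++ ">]" ++ commentClose s
endMarker s name = commentOpen s ++ "[</" ++ name ++ ">]" ++ commentClose s
```

Supporting another language is then one more case in `styleFor`.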

parseLine goes through the file line by line looking for start tokens; for each one it finds, it scans forward to the matching end token. This is a little inefficient, but it allows nested and/or overlapping tags.
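A sketch of the same scanning strategy, using the hypothetical Haskell-style markers `-- [<name>]` and `-- [</name>]` (`findSections` is an illustrative name, not the gist's `parseLine`). Each start marker searches forward for its own end marker, so nested and overlapping sections both work, at quadratic cost in the worst case:

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd, findIndex, isPrefixOf, isSuffixOf)

-- Strip leading and trailing whitespace
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- Section name if the line is a start marker "-- [<name>]"
startName :: String -> Maybe String
startName l
  | "-- [<" `isPrefixOf` s && not ("-- [</" `isPrefixOf` s) && ">]" `isSuffixOf` s =
      Just (take (length s - 7) (drop 5 s))
  | otherwise = Nothing
  where
    s = trim l

-- True for both start and end markers
isMarker :: String -> Bool
isMarker l = "-- [<" `isPrefixOf` trim l

-- (name, first code line, last code line, code) with 1-based line numbers;
-- marker lines of nested sections are removed from the code.
findSections :: [String] -> [(String, Int, Int, [String])]
findSections ls =
  [ (name, n + 1, n + k, filter (not . isMarker) (take k (drop n ls)))
  | (n, l) <- zip [1 ..] ls
  , Just name <- [startName l]
  , Just k <- [findIndex ((== endLine name) . trim) (drop n ls)]
  ]
  where
    endLine name = "-- [</" ++ name ++ ">]"
```

Unclosed start markers are silently skipped here; since the compiler in this post aborts on errors, you might prefer to throw instead.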

Code includes

Sometimes it is useful to include external files into a post. The [<include>] tag makes this simple.

For example: [<include>] /home/user/static/interestingStuff.json. Unlike the [<code>] tag, the include tag makes no assumptions about the included text. If you want it syntax highlighted, simply wrap it in a code fence.
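A sketch of the include expansion, assuming the tag sits on its own line followed by a path (`expandInclude` and `expandIncludes` are illustrative names). The file is read verbatim with `Data.Text.IO.readFile`, and a missing file throws, which aborts the build:

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Text as Txt
import qualified Data.Text.IO as TIO

-- Replace a "[<include>] path" line with the file's lines, verbatim;
-- any other line is returned unchanged.
expandInclude :: Txt.Text -> IO [Txt.Text]
expandInclude line =
  case Txt.stripPrefix "[<include>]" (Txt.stripStart line) of
    Nothing   -> pure [line]
    Just rest -> Txt.lines <$> TIO.readFile (Txt.unpack (Txt.strip rest))

-- Expand every include in a document
expandIncludes :: [Txt.Text] -> IO [Txt.Text]
expandIncludes ls = concat <$> traverse expandInclude ls
```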

Speeding up the writing process

Fetching the code from a remote repo and doing a full build each time can be pretty slow. This is fine when you are confirming that everything works correctly, but it's not ideal when you are writing a post and still making many small changes. To help with this there is a [<code setup.path>] tag, which overrides the repo and run settings. If it is present, all code sections are read directly from this path without any fetching, building or running of commands.

e.g. [<code setup.path>] /home/user/dev/myProject

Obviously it is important to remove this setting once the post is done; otherwise the code is never fetched, built or tested.

Using the compiler

Once you add this to your Hakyll site, you can be sure that you are only using working code blocks. While there is a fair amount of code in this compiler, most of it deals with file IO and parsing. I think it also shows how easily Hakyll can be customised to do useful things.

This code works with Hakyll 4.10.0.0. See the cabal file in the gist for the other dependencies.

Links