Why Post-Receive Git Hooks are Broken

Git makes it possible to run code on your Git server when new commits are pushed[1]. The Pre-Receive hook lets you reject changes before they are accepted into the repository. The Post-Receive hook runs after the push is done and notifies you about what has happened.


Many Post-Receive hooks are used to kick off some other post processing step, like triggering a build or deploy system. We use our Post-Receive hooks to look at each commit individually and add a comment to our bug tracking system with the commit message, check for copyright, etc.

I’ve written before about the obscure and confusing world of implementing Git Hooks and about our RubyGitHooks gem that we have open sourced to make writing them easier.

How Post Receive Hooks Work

It turns out that the built-in Git hook functionality for Pre and Post Receive Hooks is even more brain dead than we thought, especially when there is a new branch and you’d like to know exactly which commits were made on this push.

It is not easy to find details about what exactly gets sent to the hooks, but the knowledge is out there. This answer on Stack Overflow gives a great example of a basic post-receive hook. It turns out that the input is a range of commits for each ref (branch) which has changed.

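#!/bin/bash
# Git feeds one "oldrev newrev refname" line per updated ref on stdin.
while read oldrev newrev refname
do
    branch=$(git rev-parse --symbolic --abbrev-ref "$refname")
    if [ "master" = "$branch" ]; then
        # Do something (":" is a no-op; bash rejects an empty "then" body)
        :
    fi
done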

Once you have this information it is fairly straightforward to use it to do something (start a build, etc) based on which branch has been updated.
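For comparison, here is a minimal Ruby sketch of the same loop. Only the "oldrev newrev refname" line format comes from Git; RefUpdate and parse_ref_updates are hypothetical names made up for this illustration (they are not part of RubyGitHooks), and the short SHAs are placeholders.

```ruby
require "stringio"

# Hypothetical value object for one line of hook input.
RefUpdate = Struct.new(:oldrev, :newrev, :refname) do
  # "refs/heads/master" -> "master"
  def branch
    refname.sub(%r{\Arefs/heads/}, "")
  end
end

# Reads "oldrev newrev refname" lines from any IO-like object,
# skipping blank lines.
def parse_ref_updates(io)
  io.each_line
    .map(&:split)
    .reject(&:empty?)
    .map { |fields| RefUpdate.new(*fields) }
end

# In a real hook you would pass $stdin; a StringIO stands in for it here.
sample = StringIO.new(
  "1111111 2222222 refs/heads/master\n" \
  "0000000 3333333 refs/heads/B2\n"
)
updates = parse_ref_updates(sample)
updates.each { |u| puts "#{u.branch}: #{u.oldrev} -> #{u.newrev}" }
```

Keeping the parsing separate from the "do something" step makes it easy to feed the hook canned input while you develop it.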

But it becomes much more difficult if you are interested not only in the fact that changes have happened but exactly what changes they were. “The main PITA is to isolate the correct list of new revisions.” No kidding!

Ruby Git Hooks


The good news is that with the latest version of the Ruby Git Hooks gem we have done this work for you.

RubyGitHooks pre-processes the basic information provided to any hook by Git and turns it into a more user-friendly and consistent set of information. In the Pre/Post-Receive case, we turn the “base commit ref” triples read from stdin into a list of the new commits, which the hook can access through a Ruby object.

Our problem

Previously, we only handled commits to the master branch, and in that case you can just use the information from Git and end up with what you want. When we started trying to handle commits to all branches, though, we ran into serious problems.

Consider this diagram representing commits to an existing repository, where the blue circles represent commits that are already on the server and the red ones are new commits contained in this push: a few new commits to the master branch, and a few commits to a new branch, B2.

In a pre or post receive hook, the “base commit ref” information we would receive looks something like this (if a commit SHA were one character instead of 40).

[Figure: master commits]

For the master branch, which we had already seen before, that gives us just the new commits. The “base” (commit before the first new one) is E, the latest commit is J, so the list of new commits to master can be found by the command git rev-list E..J (everything reachable from J but not from E), which gives:

[Figure: master branch]

BUT for branch B2, which is new, we are given 0 for the base commit, which means start all the way back before the very first commit. git rev-list H gives us the list:

[Figure: B2]

Technically this is accurate, because we haven’t ever seen B2 before. So those are all the commits we haven’t seen for branch B2. But for the purposes of sending an email, adding a Jira comment, or something else that you really want to do once and only once per commit, by this logic we end up reprocessing multiple commits every time a new branch is created.
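That naive logic can be sketched in a few lines of Ruby. This is an illustration of the approach described above, not the gem's actual code: naive_rev_list_args is a hypothetical helper name. The one Git fact it relies on is that the base commit of a newly created ref is sent as an all-zero SHA (40 zeros in a real hook, shown as a single 0 in the one-character example).

```ruby
# Git sends an all-zero SHA as the base of a newly created ref.
ZERO_SHA = "0" * 40

# Hypothetical helper: returns the arguments you would hand to
# `git rev-list` for one "base commit ref" triple.
def naive_rev_list_args(oldrev, newrev)
  if oldrev == ZERO_SHA
    # New branch: no base, so rev-list walks all the way back to the
    # first commit. This is where the duplicates come from.
    [newrev]
  else
    # Existing branch: only what is reachable from newrev but not oldrev.
    ["#{oldrev}..#{newrev}"]
  end
end

existing_branch_args = naive_rev_list_args("e" * 40, "a" * 40)
new_branch_args      = naive_rev_list_args(ZERO_SHA, "b" * 40)
puts "git rev-list #{existing_branch_args.join(' ')}"
puts "git rev-list #{new_branch_args.join(' ')}"
```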

And that’s what we did in the initial version of RubyGitHooks. But it just did not give the results we wanted. For repositories that were creating new branches often, we would generate an unacceptable number of duplicate emails and Jira comments.

Our Solution

It turns out that Git will return a list of commits for one reference point while excluding any commits which are also reachable from one or more other reference points. Since we know that we have already processed all the commits for branches which are not new, we can exclude all commits in those existing branches from the list of commits in a new branch. The resulting command for our example, git rev-list H ^B1 ^master, gives us what we want:

[Figure: B2 correct]
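To see why the exclusion works, here is a toy model of rev-list in plain Ruby, with no Git involved. The parent links and the position of B1 below are an assumed graph shape that is consistent with the example (A through E already on the server, F, I, J new on master, G and H new on B2); they are not necessarily the exact diagram. PARENTS, reachable and rev_list are names made up for this sketch.

```ruby
require "set"

# ASSUMED commit graph: child => parents.
PARENTS = {
  "A" => [], "B" => ["A"], "C" => ["B"], "E" => ["C"], # existing master
  "D" => ["B"],                                        # existing branch B1
  "F" => ["E"], "I" => ["F"], "J" => ["I"],            # new on master
  "G" => ["F"], "H" => ["G"]                           # new branch B2
}.freeze

TIPS = { "master" => "J", "B1" => "D", "B2" => "H" }.freeze

# Every commit reachable from tip by following parent links.
def reachable(tip)
  seen = Set.new
  stack = [tip]
  until stack.empty?
    commit = stack.pop
    next unless seen.add?(commit) # add? returns nil if already present
    stack.concat(PARENTS.fetch(commit))
  end
  seen
end

# Toy equivalent of `git rev-list <tip> ^<exclude> ^<exclude> ...`.
# Sorted in reverse alphabetical order as a stand-in for newest-first.
def rev_list(tip, excludes = [])
  excluded = excludes.reduce(Set.new) { |acc, t| acc | reachable(t) }
  (reachable(tip) - excluded).to_a.sort.reverse
end

naive = rev_list(TIPS["B2"])                              # git rev-list H
fixed = rev_list(TIPS["B2"], [TIPS["B1"], TIPS["master"]]) # git rev-list H ^B1 ^master
puts "naive: #{naive.join(' ')}"
puts "fixed: #{fixed.join(' ')}"
```

Under this assumed graph the naive list contains seven commits, five of which have already been processed elsewhere, while the list with exclusions contains only G and H.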

So that’s what we do now during setup in RubyGitHooks. When you write your pre or post receive hook in Ruby, using the RubyGitHooks framework, you have access to a list of commits and the branch references that they were sent in with. In the example above, the commit_ref_map would look like

{"F" => ["master"],
 "G" => ["B2"],
 "H" => ["B2"],
 "I" => ["master"],
 "J" => ["master"]}

If you want to trigger a build when a certain branch gets any commits, you can still do that. But if you want to take a look at each unique new commit, you can now do that, too. Yay!
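Both views are easy to get from the same map. The sketch below uses a hard-coded copy of the example hash rather than the real hook object, and commits_by_ref is just an illustrative variable name: it groups the commits by branch, so "did master get anything?" becomes a simple lookup.

```ruby
# Hard-coded copy of the example map; in a real hook this comes from
# the framework.
commit_ref_map = {
  "F" => ["master"],
  "G" => ["B2"],
  "H" => ["B2"],
  "I" => ["master"],
  "J" => ["master"]
}

# Invert commit => refs into ref => commits.
commits_by_ref = Hash.new { |hash, ref| hash[ref] = [] }
commit_ref_map.each do |commit, refs|
  refs.each { |ref| commits_by_ref[ref] << commit }
end

# Per-branch view: trigger something if master received commits.
puts "master was updated" if commits_by_ref.key?("master")

# Per-commit view: each new commit appears exactly once.
commit_ref_map.each_key { |commit| puts "new commit #{commit}" }
```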

Here’s a super simple example of what your post receive hook could look like using RubyGitHooks.

#!/usr/bin/env ruby
# Put this file in .git/hooks/post-receive and make it executable!

require "ruby_git_hooks"

class MyPostReceiveHook < RubyGitHooks::Hook
  def check
    commit_ref_map.each do |commit, refs|
      # do something for this commit
      puts "Commit #{commit} was made in branch(es) #{refs}"
    end
  end
end

RubyGitHooks.register MyPostReceiveHook.new
RubyGitHooks.run

For a more comprehensive example, take a look at the code for the JiraAddCommentHook that we use at OnLive. This code is one of the built-in hooks in RubyGitHooks, so you can use it as is, or use it as an example to write your own hook that fits your exact needs.

That’s all there is to it! You can focus on figuring out what you want it to do, not how in the world to convince Git to tell you what actually changed.


Footnotes:

[1]: Actually the server side hooks can’t be used with repositories hosted on GitHub, because you can’t install arbitrary code on even a GitHub Enterprise server. But we do have a solution for that which we will share soon (stay tuned…)
