Blog: Automatically generating changelogs from git, for CakePHP

Graham Weldon

02 November, 2010

So you want to generate a changelog for your cool open source project, but you don't want the hassle of writing it out every time, or of generating change sets and posting them to your website. It's a hassle that you shouldn't have to deal with. Fortunately, there are any number of approaches you can take to automate the process. I'm going to demonstrate my approach to generating changelogs automatically for the CakePHP website, for all releases.

The trick to this approach is an intelligent tagging system. While there are some quirks in the CakePHP tagging strategy, for the most part it's straightforward in terms of ordering. If you use alphabetic characters to name your tags, this approach will need to be modified in order to collate results properly. Here we go. Generating changelogs automatically for CakePHP:

First of all, I'm using PHP to generate the changelogs. This means the script can either be run from the command line, or have its output dumped directly into a webpage somewhere. Let's start by setting up a few variables to define the repository location and the version number quirks.

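<?php
$options = array(
   'repo' => '/Users/predominant/Projects/cakephp/2.0/.git',
);
// Escape the path so spaces or shell metacharacters cannot break the command.
$options['git-dir'] = '--git-dir=' . escapeshellarg($options['repo']);
// List every tag in the repository, one per line.
var_dump(`git {$options['git-dir']} tag`);
?>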

So that's pretty good so far. I have an options array that specifies the location of the ".git" directory, and because the options are an array I can add new ones later as they are required. If you point the path in the options array above at any project that has tags, you'll get a list of its current tags. Below is the current output for CakePHP:

1.2.0
1.2.1
1.2.2
1.2.3
1.2.4
1.2.5
1.2.6
1.2.7
1.2.8
1.3-dev
1.3.0
1.3.0-RC1
1.3.0-RC2
1.3.0-RC3
1.3.0-RC4
1.3.0-alpha
1.3.0-beta
1.3.1
1.3.2
1.3.3
1.3.4
1.3.5

While this is a great start, our tag names include words whose alphabetical order does not match the order they need to be sorted in. For example, git lists 1.3.0 ahead of 1.3.0-RC1, and 1.3.0-alpha after 1.3.0-RC4. The general form is version[-name[iteration]]. The version number is required. Optionally, a name follows the version number, prefixed with a hyphen, and in the case of the 1.3.0-RC releases there is an iteration number indicating the release candidate number. You'll also note that back at the release point for 1.3-dev we omitted the "0" in the version number, so we're going to have to handle two-part as well as three-part version numbers in the version comparison for sorting.

So the next step is pretty major, but it's an implementation of the sorting required for the version comparisons, given our rule set for CakePHP tag naming. The core of this is the usort() call made on the tags array after fetching the list of available tags from the git repository.

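<?php
$options = array(
   'repo' => '/Users/predominant/Projects/cakephp/2.0/.git',
   // Pre-release titles, from earliest to latest.
   'titleOrder' => array(
      'dev', 'alpha', 'beta', 'rc',
   ),
   'regex' => '/(?<version>[\d\.]+)(?:-(?<title>[a-zA-Z]+)(?:(?<iteration>\d)?))?/',
);
$options['git-dir'] = '--git-dir=' . escapeshellarg($options['repo']);

$tags = explode("\n", trim(`git {$options['git-dir']} tag`));

usort($tags, function($from, $to) use ($options) {
   preg_match($options['regex'], $from, $fromMatches);
   preg_match($options['regex'], $to, $toMatches);

   // Compare the numeric version first.
   $version = version_compare($fromMatches['version'], $toMatches['version']);
   if ($version !== 0) {
      return $version;
   }

   // A tag without a title is the final release, so it sorts after any pre-release.
   $fromFinal = empty($fromMatches['title']);
   $toFinal = empty($toMatches['title']);
   if ($fromFinal || $toFinal) {
      return (int)$fromFinal - (int)$toFinal;
   }

   // Compare titles by their position in titleOrder.
   $title = array_search(strtolower($fromMatches['title']), $options['titleOrder']) - array_search(strtolower($toMatches['title']), $options['titleOrder']);
   if ($title !== 0) {
      return $title;
   }

   // Same title: fall back to the iteration number, treating a missing one as 0.
   $fromIteration = isset($fromMatches['iteration']) ? $fromMatches['iteration'] : '0';
   $toIteration = isset($toMatches['iteration']) ? $toMatches['iteration'] : '0';
   return version_compare($fromIteration, $toIteration);
});
var_dump($tags);
?>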

I'm using a regular expression with named groups to make the code more readable when referencing matches from the preg_match() call. This is completely optional, and if you don't like names being used for regex results, you can remove the ?<name> sections in the regex and reference matches traditionally, by index. After parsing the tag into its components (version, title and iteration), I do a version_compare() call on the version number, which PHP provides for us, followed by an array_search() on the title.
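To see what the named groups capture, here is a minimal sketch that runs the same pattern against three of the tags listed earlier. The parseTag() helper is a name made up for this illustration and is not part of the script.

```php
<?php
// Same pattern as in the options array above.
$regex = '/(?<version>[\d\.]+)(?:-(?<title>[a-zA-Z]+)(?:(?<iteration>\d)?))?/';

// Hypothetical helper: split a tag into its named components.
// Groups that did not take part in the match come back as empty strings.
function parseTag($regex, $tag) {
   preg_match($regex, $tag, $matches);
   return array(
      'version' => isset($matches['version']) ? $matches['version'] : '',
      'title' => isset($matches['title']) ? $matches['title'] : '',
      'iteration' => isset($matches['iteration']) ? $matches['iteration'] : '',
   );
}

print_r(parseTag($regex, '1.3.0-RC2')); // version, title and iteration
print_r(parseTag($regex, '1.3-dev'));   // two-part version, no iteration
print_r(parseTag($regex, '1.2.8'));     // version only
```

Returning empty strings for the missing parts is just one way to avoid undefined index notices; checking with isset() at the point of use works as well.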

These titles are stored in order in the options array at the top of the script. The order has to be spelled out explicitly like this, because it does not follow alphabetical order.

Finally, if there is an iteration number available, such as in RC1, then we do a version_compare() call on the iteration number.

Running this code, you will see an ordered list of tags being output, and thus we can begin generating changelogs from one tag to another.

People mostly want the latest changelogs listed first, which is reverse chronological order. So let's array_reverse() the result, and continue from there:

$tags = array_reverse($tags);

Finally, I'm building on the CakePHP "Changelog Generation Scripts" that we currently use, which take the git commit logs between versions and use "awk" for formatting. We enter a couple of formats into our options array and use the GET parameters to detect the required output format.
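Taking the logs "between versions" comes down to pairing each tag with the one before it. The following is only a sketch of that pairing, not the code from the scripts mentioned above: tagRanges() is a hypothetical helper, and the pretty format is an illustrative choice.

```php
<?php
// Hypothetical helper: pair each tag with the one released before it.
// Expects tags ordered newest first, as they are after array_reverse().
function tagRanges(array $tags) {
   $ranges = array();
   for ($i = 0; $i < count($tags) - 1; $i++) {
      // "older..newer" selects the commits in newer that are not in older.
      $ranges[$tags[$i]] = $tags[$i + 1] . '..' . $tags[$i];
   }
   return $ranges;
}

$tags = array('1.3.5', '1.3.4', '1.3.3');
foreach (tagRanges($tags) as $tag => $range) {
   // Print the command rather than run it, so no repository is needed here.
   echo $tag . ': git log --pretty=format:"%h %s" ' . escapeshellarg($range) . "\n";
}
```

The oldest tag has no predecessor, so it gets no range in this sketch. How to handle that first release is left open.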

An alternative to using awk, as I have done here, is to grab each line returned, and parse it with PHP for the commit hash and message. I'm holding off on that for a future version, and only if we need to do the processing in PHP for some reason.
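For the curious, a rough sketch of what that PHP parsing could look like follows. It assumes the log was produced with a "hash, space, subject" format such as git log --pretty=format:"%h %s". The parseLogLines() name and the sample data are made up for the example.

```php
<?php
// Hypothetical helper: turn "hash subject" lines into arrays.
// Assumes output from: git log --pretty=format:"%h %s"
function parseLogLines($output) {
   $commits = array();
   foreach (explode("\n", trim($output)) as $line) {
      // Split on the first space only; the subject may contain spaces.
      $parts = explode(' ', trim($line), 2);
      if ($parts[0] === '') {
         continue; // skip blank lines
      }
      $commits[] = array(
         'hash' => $parts[0],
         'message' => isset($parts[1]) ? $parts[1] : '',
      );
   }
   return $commits;
}

// Made-up sample data, for illustration only.
$sample = "a1b2c3d Fix typo in docblock\ne4f5a6b Update version number";
print_r(parseLogLines($sample));
```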

The final script is a little lengthy, and it has evolved through optimisations by Mark Story and me in the few hours since I started this. So rather than dumping the code out again, I'm linking to the GitHub Gist where the code is being stored and versioned:

Grab the final code here: https://gist.github.com/658338

Users specify the output format by adding ?format=KEY to the URL. Only the key comes from the URL; the format itself does not, so you're relatively safe in that regard. However, you will need to take care with the formatting commands that you enable. The reason the format is provided as a string is to allow complete customisation of the formatting and output of the git log for parsing.
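The key lookup can be sketched as below. The format keys, the format strings and the selectFormat() helper are all assumptions for the sake of the example, and are not necessarily what the Gist uses.

```php
<?php
// Illustrative format table; keys and format strings are assumptions.
$formats = array(
   'plain' => '%h %s',
   'full' => '%H %an %s',
);

// Hypothetical helper: only the KEY comes from the request.
// The format string itself is always taken from the table.
function selectFormat(array $formats, array $query, $default = 'plain') {
   $key = (isset($query['format']) && is_string($query['format']))
      ? $query['format']
      : $default;
   if (!array_key_exists($key, $formats)) {
      $key = $default; // unknown keys fall back to the default format
   }
   return $formats[$key];
}

// In a web request you would pass $_GET instead of a literal array.
echo selectFormat($formats, array('format' => 'full')) . "\n";
echo selectFormat($formats, array('format' => '; rm -rf /')) . "\n";
```

Because anything that is not a known key falls back to the default, nothing typed into the URL ever reaches the git command line.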

One simple way to improve the performance of this script is to cache the result rather than query git on every request. Querying git is not the fastest operation, and some caching (particularly on heavy usage sites such as cakephp.org) will dramatically increase performance for your site viewers.
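One way to do that, sketched here under the assumption that a simple file cache is good enough, is to store the generated output and reuse it until it is older than some lifetime. The cachedOutput() helper, the cache path and the one-hour lifetime are all arbitrary choices for illustration.

```php
<?php
// Hypothetical helper: return the cached output if it is fresh enough,
// otherwise call $generate, store the result and return it.
function cachedOutput($cacheFile, $ttl, $generate) {
   clearstatcache(true, $cacheFile); // avoid stale file information
   if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $ttl) {
      return file_get_contents($cacheFile);
   }
   $output = call_user_func($generate);
   file_put_contents($cacheFile, $output, LOCK_EX);
   return $output;
}

// The cache path and the one-hour lifetime are arbitrary choices.
$cacheFile = sys_get_temp_dir() . '/changelog-example.cache';
echo cachedOutput($cacheFile, 3600, function () {
   // In the real script this is where git would be queried.
   return "changelog generated at " . date('c') . "\n";
});
```

A framework cache would do the same job. The point is only that git is queried once per lifetime rather than once per page view.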

That concludes the overview of this automatic changelog parser mechanism. I hope you find it interesting or useful. Drop me a line if you notice any issues, or have any improvements for this.
