CI cache key over many files

Here’s a quick tip for those of you using CircleCI, GitHub Actions, or any other CI service that supports caching.

One of the ways to improve your CI build times is by caching your dependencies and build artifacts. This lets you fetch your build dependencies only when they actually change (which hopefully won’t be often). For the vast majority of builds the dependencies are unchanged, so caching will save you a ton of time.

A quick overview of caching

On CircleCI, you save and restore a cache using a cache key. You can generate the key in any number of ways, but the approach I use is to take a checksum of the dependency files. On Android these will be your build.gradle files, for Bundler this will be your Gemfile, and so on.

Here’s an example of how you would set this up in CircleCI config:

- restore_cache:
    key: gradle-{{ checksum "build.gradle" }}

# TODO: insert build tasks

- save_cache:
    key: gradle-{{ checksum "build.gradle" }}
    paths:
        - ~/.gradle/caches
        - ~/.gradle/wrapper

Here we’re saving Gradle’s cache and wrapper folders based on a string containing the checksum of the root build.gradle. If that build.gradle file changes, the checksum will change too, and the existing cache will be invalid for future builds.
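
If you want to see this behaviour locally, here is a minimal sketch. It assumes sha256sum from GNU coreutils is available and uses it only as a stand-in for CircleCI's checksum template, so the exact key string will not match what CircleCI produces. The helper name cache_key_for is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: prints "gradle-<hash>",
# where <hash> is the SHA-256 hash of the file passed as $1.
cache_key_for() {
  printf 'gradle-%s\n' "$(sha256sum "$1" | awk '{ print $1 }')"
}
```

Calling cache_key_for build.gradle prints the same key on every run until the contents of build.gradle change, at which point the key changes with them.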

Multiple build files

So that’s a quick overview of how CircleCI caching works. The big issue comes when you have multiple files which contain your dependencies. For big Android apps, a nice way to accelerate your build is by splitting your app into smaller modules. This allows your incremental builds to be faster (since you’ll only be rebuilding the modules which change), but it also adds more build.gradle files to track for checksum purposes.

For a project with three modules, you might end up with the following:

- restore_cache:
    key: gradle-{{ checksum "build.gradle" }}-{{ checksum "app.gradle" }}-{{ checksum "data.gradle" }}-{{ checksum "common.gradle" }}

As you can see, this is hard to maintain and it is really easy to miss a module. With 10 modules, that string would be even longer and more unwieldy.

So what can we do better? Unfortunately, CircleCI does not provide a way to generate a checksum over multiple files, so we have to be a bit more sneaky in how we do it.

We can run a script to generate an MD5 checksum for each of the build.gradle files in our source tree. The script uses find to locate all of them, so none are missed, and writes each checksum to a temporary file. It then sorts the MD5 checksums so that the file contents are consistent across runs, regardless of the order in which find returns the files. Finally, in our CircleCI config we create a checksum of that generated file as the cache key:

- run:
    name: Generate cache key
    command: ./checksum.sh /tmp/checksum.txt
- restore_cache:
    key: gradle-{{ checksum "/tmp/checksum.txt" }}
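
For reference, here is a minimal sketch of what a script like checksum.sh could look like. It is not the script from the commit linked below; it assumes md5sum from GNU coreutils is available on the build image, and the function name generate_cache_key_file is hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of checksum.sh; not the script from the linked commit.
# Assumes md5sum (GNU coreutils) is installed on the build image.
set -eu

# Writes the sorted MD5 hashes of every build.gradle under directory $1
# to the file $2, one hash per line.
generate_cache_key_file() {
  find "$1" -type f -name 'build.gradle' -exec md5sum {} + \
    | awk '{ print $1 }' \
    | sort > "$2"
}

# When invoked as ./checksum.sh /tmp/checksum.txt, scan the current directory.
if [ "$#" -ge 1 ]; then
  generate_cache_key_file . "$1"
fi
```

Because only the hashes are written and then sorted, the output depends on the contents of the build files and not on the order in which find happens to list them.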

Voilà! An automated way to generate a stable cache key across many files.

You can find the commit containing the script here: https://github.com/chrisbanes/tivi/commit/c1219aeee9f62600fcd43d7caf1ea21e6e92930f