We noticed a bunch of issues related to cache hits when using the docker-compose-buildkite-plugin to do docker builds.
- There is (was?) a bug in moby/buildkit where using `--cache-from` was only getting cache hits on every second build
  - The above bug was fixed in 2024, but this fix has not yet made it into the buildkite elastic-ci-stack OSS CI stack
    - The reason the above hasn't happened is that `amazonlinux:2023` is stuck on docker 25.x, despite being 2+ years old
- You are only impacted by all of the above if you upgrade from docker-compose-buildkite-plugin v4.x
  - v4.x doesn't honor `cache-from`; instead it translates this to a list of images that we need to `docker pull` before attempting a build
    - This is obviously a very different form of "caching": it's non-lazy, and it doesn't use docker's `--cache-from` of the same name
    - Importantly, it means we use a different cache importer (local vs remote)
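A rough sketch of the v4.x behaviour described above, written as the equivalent CLI commands (the `ECR_URI` variable and tag names are illustrative placeholders, not the plugin's actual internals):

```shell
# v4.x's "cache-from": eagerly pull every listed image up front...
docker pull "$ECR_URI:cache-tag" || true

# ...then build, relying on the now-locally-present images as the
# cache source rather than passing --cache-from to the build.
docker-compose build app
```

This is why the pull-based approach behaves so differently: the cache images are fetched whether or not the build would have used them.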
- Upgrading to docker-compose-buildkite-plugin v5.x means using the proper lazy version of `--cache-from`
  - The `--cache-from` feature only uses the first item in the list that it finds a cache hit for
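A sketch of the lazy v5-era behaviour using a raw buildx invocation (registry refs are placeholders): BuildKit probes each `--cache-from` source in order and imports from the first one that yields a hit.

```shell
# Multiple cache sources: only the first one with a cache hit is used.
docker buildx build \
  --cache-from "type=registry,ref=$ECR_URI:base-cache-tag" \
  --cache-from "type=registry,ref=$ECR_URI:cache-tag" \
  .
```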
- We (most people?) use `BUILDKIT_INLINE_CACHE=1` by default, to get caching for (almost) free. This stores your cache "inline" (i.e. with the image)
  - When you use multi-stage builds with "inline" cache, only the cache of the final stage is used
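For reference, inline cache is enabled via a build arg (the tag is a placeholder): the cache metadata is embedded in the pushed image itself, and for a multi-stage Dockerfile only the final stage's layers get that metadata.

```shell
# Inline cache: metadata travels with the image, no separate cache ref.
docker buildx build \
  --build-arg BUILDKIT_INLINE_CACHE=1 \
  --tag "$ECR_URI:release-tag" \
  --push \
  .
```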
- To get around the fact that "inline" cache only stores the final stage, you could explicitly build and push each stage, so every stage gets its own "inline" cache
  - If you're on docker-compose-buildkite-plugin v4.x, where the "cache from" feature is a `docker pull` instead, this will work for you
    - When docker finds images locally, it uses a different cache load mechanism that can use the cache from more than one image
  - This may even have been the recommended solution in the README for docker-compose-buildkite-plugin back on v4, but it doesn't work in v5
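The stage-by-stage workaround can be sketched as follows, assuming a hypothetical two-stage Dockerfile with a `base` stage (tags are placeholders):

```shell
# Build and push the intermediate stage so it carries its own inline cache...
docker buildx build --target base \
  --build-arg BUILDKIT_INLINE_CACHE=1 \
  --tag "$ECR_URI:base-tag" --push .

# ...then build and push the final stage the same way.
docker buildx build \
  --build-arg BUILDKIT_INLINE_CACHE=1 \
  --tag "$ECR_URI:release-tag" --push .
```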
- Being stuck on v4.x forever would suck
  - It is missing other features and bugfixes (mostly around caching, but also improved build output and probably other things)
  - Having to explicitly build and push every stage in your multi-stage file, then reference all of those in `cache-from`, kinda sucks
- In order to get proper `--cache-to` (lazy caching) support for multi-stage builds, you need to specify a few extra flags
  - `mode=max` tells docker to cache every possible stage
    - This has a downside: your cache will be very large - it does have every stage and base image in it, after all
  - `type=registry` tells docker to store this cache in the registry at your chosen location, not in the image itself
  - `image-manifest=true` and `oci-mediatypes=true` are arguments you need so that docker creates the cache manifest in a way that is supported by AWS ECR
  - `compression=zstd` optionally selects an optimised compression algorithm
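The flags above combined into one raw buildx invocation (the registry URI and tags are placeholders):

```shell
# Export a full multi-stage cache to the registry, separate from the image.
docker buildx build \
  --cache-to "type=registry,ref=$ECR_URI:cache-tag,mode=max,image-manifest=true,oci-mediatypes=true,compression=zstd" \
  --cache-from "type=registry,ref=$ECR_URI:cache-tag" \
  --tag "$ECR_URI:release-tag" --push .
```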
- docker-compose-buildkite-plugin v4.x does not support `--cache-to`, so you must upgrade to v5.x
- Using `type=registry` for a `cache-to` is not supported by the default docker build driver
  - You can create a custom builder using the docker-compose-buildkite-plugin quite easily
  - Using `driver: docker-container` is what is required for the above options to work
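For context, this is the CLI equivalent of what a custom builder with the `docker-container` driver amounts to (the builder name is arbitrary):

```shell
# Create a builder backed by the docker-container driver and make it
# the default; this driver supports the registry cache exporter.
docker buildx create --name custom-builder --driver docker-container --use
```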
Putting this together in a pipeline step:

```yaml
plugins:
  - docker-compose#v5.10.0:
      build: app
      builder:
        name: custom-builder
        use: true
        create: true
        driver: docker-container
      cache-from: "app:<ecr-uri>:<cache-tag>"
      cache-to: "app:ref=<ecr-uri>:<cache-tag>,type=registry,mode=max,image-manifest=true,oci-mediatypes=true,compression=zstd"
      push: "app:<ecr-uri>:<release-tag>"
```

For comparison, the explicit build-and-push-every-stage approach looks like this:

```yaml
plugins:
  - docker-compose#v5.10.0:
      build: app
      target: base
      cache-from: "app:<ecr-uri>:<base-cache-tag>"
      image-repository: <ecr-uri>
      image-name: <base-tag>
  - docker-compose#v5.10.0:
      build: app
      cache-from:
        - "app:<ecr-uri>:<base-cache-tag>"
        - "app:<ecr-uri>:<cache-tag>"
      image-repository: <ecr-uri>
      image-name: <release-tag>
```