Efficient Dockerfile with CMake?

Hey all,

I’m looking for an efficient Dockerfile for CMake. For context, here’s Docker’s guide on making a Java-based image: Build your Java image | Docker Docs

The main idea is to gather your dependencies before your source code is added, so that changes to your source code don’t unnecessarily cause you to fetch dependencies again.

# syntax=docker/dockerfile:1

FROM eclipse-temurin:17-jdk-jammy

WORKDIR /app

COPY .mvn/ .mvn
COPY mvnw pom.xml ./
RUN ./mvnw dependency:resolve

COPY src ./src

CMD ["./mvnw", "spring-boot:run"]

For CMake, achieving the same end goal isn’t straightforward. For context, I’ve tackled this problem under the following constraints:

  1. I’m using FetchContent for dependencies, including usage of OVERRIDE_FIND_PACKAGE
  2. I’m building with Ninja
  3. I’m using CMake Presets

My attempt was to first make a CMakePresets.json like so:

{
  "version": 6,
  "configurePresets": [
    {
      "name": "ninja",
      "generator": "Ninja Multi-Config",
      "binaryDir": "ninja-build",
      "cacheVariables": {
        "CMAKE_UNITY_BUILD": "ON",
        "CMAKE_INSTALL_MESSAGE": "LAZY",
        "CMAKE_EXPORT_COMPILE_COMMANDS": "ON",
        "BUILD_SHARED_LIBS": "ON",
      }
    },
    {
      "name": "dependencies-only",
      "inherits": "ninja",
      "cacheVariables": {
        "DEPENDENCIES_ONLY": "ON"
      }
    },
    {
      "name": "main-offline",
      "inherits": "ninja",
      "cacheVariables": {
        "DEPENDENCIES_ONLY": "OFF",
        "CMAKE_SKIP_INSTALL_ALL_DEPENDENCY": "ON",
        "FETCHCONTENT_FULLY_DISCONNECTED": "ON",
        "CMAKE_OPTIMIZE_DEPENDENCIES": "ON",
        "CMAKE_LINK_DEPENDS_NO_SHARED": "ON"
      }
    }
  ],
  "buildPresets": [
    {
      "name": "release",
      "configurePreset": "ninja",
      "configuration": "Release"
    }
  ]
}

With a top-level CMakeLists.txt that has something like:

if(DEPENDENCIES_ONLY)
  include(FetchDependencies)
else()
  add_subdirectory(src)
endif()
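
For reference, FetchDependencies.cmake is essentially a list of FetchContent declarations along these lines (fmt, its URL, and the tag are only placeholders for the project’s real dependencies):

include(FetchContent)

# Each dependency is declared with OVERRIDE_FIND_PACKAGE so that later
# find_package() calls under src/ resolve to the fetched copy.
FetchContent_Declare(
  fmt
  GIT_REPOSITORY https://github.com/fmtlib/fmt.git
  GIT_TAG        10.2.1
  OVERRIDE_FIND_PACKAGE
)

FetchContent_MakeAvailable(fmt)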

I figured everything was in place, and I could do the following:

# Copy the minimum needed to configure and build dependencies
COPY FetchDependencies.cmake CMakeLists.txt CMakePresets.json ./

# Download and build dependencies
RUN cmake --preset dependencies-only
RUN cmake --build --preset release

# Copy everything else
COPY . .

# Configure and build the main project
RUN cmake --preset main-offline
RUN cmake --build --preset release --target main

However, it just doesn’t seem to work. I am seeing ninja rebuild everything on the last command in order to build main. I’ve read up on various topics about ninja rebuild issues, flags for build efficiency, etc., but I can’t get a working solution. Perhaps it’s the second configure run, but I’m not really sure.

I feel like this should be a pretty standard setup, but I can’t find anything about it. Anyone out there with an idea for solving this problem?

ninja -d explain will have ninja tell you why it thinks something needs to build; it may be that some file is touched that triggers everything.

For this level of separation, I feel like you might be better served by a “superbuild” that builds your dependencies (rather than FetchContent in the main build), and then you find_package() the dependencies from the main project. That superbuild can provide cache variables to make consumption easy (e.g., generate a script of set(var value CACHE type "") calls with things like dep1_DIR). That file can then be passed to the (initial) configure of the main project via the -C flag.
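
A minimal sketch of what I mean, assuming the superbuild drives each dependency with ExternalProject (dep1, its URL, and the install layout are placeholders):

cmake_minimum_required(VERSION 3.24)
project(superbuild NONE)

include(ExternalProject)

set(deps_prefix "${CMAKE_BINARY_DIR}/deps-install")

# Build and install each dependency into a shared install prefix.
ExternalProject_Add(dep1
  GIT_REPOSITORY https://example.com/dep1.git
  GIT_TAG        v1.0
  CMAKE_ARGS     -DCMAKE_INSTALL_PREFIX=${deps_prefix}
)

# Generate an initial-cache script the main project consumes via -C.
file(WRITE "${CMAKE_BINARY_DIR}/deps-cache.cmake"
  "set(dep1_DIR \"${deps_prefix}/lib/cmake/dep1\" CACHE PATH \"\")\n"
)

The main project then does a plain find_package(dep1 REQUIRED) and is configured with cmake -C path/to/deps-cache.cmake; because the dependencies live entirely in their own build tree, re-configuring the main project never touches them.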

Cc: @craig.scott

I tried ninja -d explain, and it would tell me everything is dirty, but it wouldn’t tell me why it was dirty. So it didn’t help me resolve anything.

In terms of the superbuild structure you mentioned, how should the directory/project layout be structured to best achieve that?

I followed this guide on improving the output of ninja’s explain:
https://david.rothlis.net/ninja-explain/

By pulling in his branch, building ninja, and pointing my PATH at it, I was able to get some confirmation that the second configure (generate) run is most likely what causes everything to rebuild:

ninja explain: command line changed for...
ninja explain: command line changed for...
ninja explain: command line changed for...

So that brings me to the next question: how should one structure their project to achieve this?

I can see a sample of this mentioned in Professional CMake:

Due to improvements in FetchContent and its integration with find_package() in CMake 3.24, it is now more advantageous to bring dependencies into the top level scope rather than a separate subdirectory scope. Defining the dependencies in a separate file that the top level brings in via include() is now the recommended structure. Such a file might mix FetchContent and find_package() to bring in the project’s dependencies, each in the most appropriate way for that dependency.
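
As I read it, the recommended layout is something like the following (project, file, and directory names here are only illustrative):

# Top-level CMakeLists.txt
cmake_minimum_required(VERSION 3.24)
project(MyApp LANGUAGES CXX)

# Dependencies.cmake mixes find_package() for packages already available in
# the environment with FetchContent for anything built from source.
include(cmake/Dependencies.cmake)

add_subdirectory(src)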

But it isn’t clear to me how you can set up the project such that:

  1. You can configure/build dependencies
  2. You can copy the src directory later
  3. You can again configure/build, and it won’t cause a rebuild of dependencies

Could someone provide me a tree-like output of a structure that would work?

Do you have some business requirement to have your source and dependencies’ source in the image?

Last time I did something like this, I just mounted the source and build tree inside the container. Something like the following:

Dockerfile

# You should really pin an image digest here, otherwise your builds won't be reproducible
FROM xyz

# Install OS dependencies here, e.g. apt-get install -y clang cmake <...>
RUN ...

# No more content here

build.sh

#!/usr/bin/bash
docker build -t localhost/<image-name> .
docker run \
  -w /src \
  --mount type=bind,src=${PWD},dst=/src \
  localhost/<image-name> \
  cmake --workflow --preset <workflow-name>

This way your build environment is still isolated from the host system, but incremental builds work exactly as expected.

I believe you’re referring to BuildKit bind mounts, such as cache mounts? If so, I am also using those on top of the above, but they weren’t perfect, so I was trying to get insights from the community on how a general solution to this gets tackled.

@craig.scott could you share your thoughts on the ideal way to do it?

I’m not sure there’s a nice way to get all three of these. The first kind of requires the dependencies to be part of the build, but the latter two really want the dependencies to be “done” (installed). Anything that tries to mix these is just lying to the build graph about what is going on (i.e., not specifying dependencies that may matter in any given incremental build).

A very good example may be this: GitHub - aminya/setup-cpp: Install all the tools required for building and testing C++/C projects.