Distributing CMake workflow steps over several different physical machines

Gornhoth · October 22, 2024, 12:07pm

Hello, I am using CMake for my build, test, and deployment needs, both locally and on my GitLab CI runner. As is normal in CI, the stages and targets of a build pipeline are often written to be agnostic of the physical machine that actually runs them, and as far as I can see CMake cannot easily handle that. For example, one runner may perform the configure and build steps and hand off the resulting build folder as an artifact to the following testing stage, which may run on an entirely different physical machine. The problem is that CMake writes absolute paths into the build artifacts, so I cannot transfer my build folder from one machine to another and continue without a fresh reconfigure and build.

I was able to work around this limitation with a hacky solution that uses sed:

find build/ -type f -exec sed -i "s|$OLD_PATH|$NEW_PATH|g" {} \;

This replaces all relevant absolute paths from the previous machine with those of the machine running the next stage. It is of course very ugly and error-prone should the setup and locations of the runner machines change in the future.

So my question is: Is there a way to make CMake use relative paths, or otherwise allow my build folder to be transferred from one physical machine to another, so that I can run my CMake workflow in a distributed manner?
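
For reference, the sed workaround described above can be written as a small shell function that only touches text files which actually contain the old path. This is a sketch of that same workaround, not a CMake feature: the name rewrite_build_paths is made up for illustration, GNU grep, xargs, and sed are assumed, and binaries such as object files are deliberately left untouched even though they may also embed absolute paths.

```shell
#!/usr/bin/env bash
# Sketch only: rewrite absolute paths inside a transferred CMake build tree.
# rewrite_build_paths is a hypothetical helper name, not part of CMake.
# Assumes GNU grep, xargs, and sed, and that neither path contains '|' or '&'.
# The old path is used as a sed regex, so characters like '.' match loosely.
rewrite_build_paths() {
    local old_path="$1" new_path="$2" build_dir="$3"
    # grep: -r recurse, -l list matching files, -I skip binary files,
    #       -F treat the old path as a fixed string, -Z NUL-terminate names.
    # xargs: -0 read NUL-terminated names, -r do nothing if the list is empty.
    grep -rlIFZ -- "$old_path" "$build_dir" \
        | xargs -0 -r sed -i "s|$old_path|$new_path|g"
}
```

In a pipeline stage this would be called as rewrite_build_paths "$OLD_PATH" "$NEW_PATH" build/ before any further cmake or ctest invocation. It shares the fragility of the original one-liner: it only works as long as both paths are known to the stage.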

scivision · October 22, 2024, 2:07pm

Would a network drive help, where each machine deliberately has an absolute path to the drive that is the same across machines?

ClausKlein · October 22, 2024, 2:42pm

Use a workflow preset, like the one in CMakePresets.json on the develop branch of ClausKlein/cpp-lib-template on GitHub.

Gornhoth · October 22, 2024, 5:49pm

Yes, I thought of that as well, but before I do that I wanted to see whether CMake offers an out-of-the-box solution, because the option you suggested would probably mean changing the behaviour of all our build runners.

Gornhoth · October 22, 2024, 5:52pm

I am already using workflow presets on my local machine, but they are of no use when trying to distribute the steps of a workflow over several different physical machines that have to use artifacts produced by other machines in preceding stages of the build pipeline.

ClausKlein · October 22, 2024, 6:50pm

For dependencies I use a package manager like Conan or CPM.cmake.

scivision · October 23, 2024, 12:05am

CMake generally resolves relative paths to absolute paths soon after input and uses the absolute form consistently from then on. This avoids race conditions involving the working directory and is seen as a best practice.
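
One place where these resolved paths can be seen is CMakeCache.txt in the build folder, which records the source directory as CMAKE_HOME_DIRECTORY and the build directory as CMAKE_CACHEFILE_DIR. The following is a minimal sketch for reading them back; cached_dir is a hypothetical helper name, and the function simply assumes the usual KEY:TYPE=VALUE layout of cache lines.

```shell
# Sketch only: print the value of one entry from a CMakeCache.txt file.
# cached_dir is a hypothetical helper name, not a CMake command.
cached_dir() {
    local key="$1" cache_file="$2"
    # Cache lines look like KEY:TYPE=VALUE; strip the "KEY:TYPE=" prefix
    # and print only the lines where that prefix was found.
    sed -n "s|^${key}:[A-Z]*=||p" "$cache_file"
}
```

For example, cached_dir CMAKE_HOME_DIRECTORY build/CMakeCache.txt prints the source directory as it was on the machine that ran the configure step.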

Throughout the HPC (supercomputer) realm, this is typically handled by setting, on each heterogeneous machine, environment variables for the paths that scripts and code are written to expect.

For example, you might decide that env var CI_BUILD_ROOT_MYPROJ is set on each machine to the absolute path per-machine to a network drive for this project. The CMakeLists or a toolchain script can check if that env variable is defined and ensure that directories are set appropriately. This does require some setup and maintenance but is a widespread practice in HPC.
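
On the CI script side, the same check could look roughly like the sketch below. Only the variable name CI_BUILD_ROOT_MYPROJ comes from the example above; the function name project_build_dir and the "build" subdirectory are assumptions made for illustration.

```shell
# Sketch only: derive the build directory from CI_BUILD_ROOT_MYPROJ.
# project_build_dir and the "build" subdirectory are hypothetical choices.
project_build_dir() {
    # Fail early if the machine has not been set up with the variable.
    if [ -z "${CI_BUILD_ROOT_MYPROJ:-}" ]; then
        echo "CI_BUILD_ROOT_MYPROJ is not set on this machine" >&2
        return 1
    fi
    # Fail if the path is missing, e.g. the network drive is not mounted.
    if [ ! -d "$CI_BUILD_ROOT_MYPROJ" ]; then
        echo "CI_BUILD_ROOT_MYPROJ is not a directory: $CI_BUILD_ROOT_MYPROJ" >&2
        return 1
    fi
    printf '%s\n' "$CI_BUILD_ROOT_MYPROJ/build"
}
```

A configure stage might then run build_dir=$(project_build_dir) && cmake -S . -B "$build_dir", and a later test stage ctest --test-dir "$build_dir". This only standardizes where each stage looks for the build tree; it does not by itself change the absolute paths CMake has already recorded inside it.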

Gornhoth · October 23, 2024, 8:52am

Thank you very much for the insight into the HPC realm and the reasoning behind the early absolute path substitution! I suppose there is no other way around it but to have a common network drive for such distributed builds involving several different physical machines.