Help with recompiling on a cluster

jlquinn · October 12, 2021, 9:37pm

Hi all,

I use cmake on a cluster of systems where for the most part I don’t control what node I get when submitting a job.

I find that when I build my cmake project, it does a full recompile every time instead of an incremental build. As far as I can tell,
this happens because the job lands on a different host than the last time.

Is there a way in cmake to sidestep this behavior and allow it to treat different nodes as the same for configuration
purposes? All the nodes have the same software configuration and filesystems are all shared.

Thanks
Jerry

buildSystemPerson · October 15, 2021, 9:27pm

I looked into it a little bit. Apologies, I’m not familiar with compiling on clusters.

Perhaps this blog might interest you:

It talks about using distcc:

“distcc is a program to distribute compilation of C or C++ code across several machines on a network. distcc should always generate the same results as a local compile, is simple to install and use, and is often two or more times faster than a local compile.”

It sounds basically perfect for your use case and integrates with CMake via the CMAKE_<LANG>_COMPILER_LAUNCHER variable (for example, CMAKE_CXX_COMPILER_LAUNCHER).

jlquinn · October 15, 2021, 10:25pm

Hi and thanks for taking time to respond to my question.

It looks like distcc is a very static environment that needs servers set up for a compiler farm. I would have to launch these servers dynamically (and I don’t get to choose the machines the servers land on), check that they are up and running, then relaunch compilation that can use them. This would be quite messy and from my reading so far, distcc doesn’t seem to address this kind of more dynamic environment.

Perhaps I’m mistaken, but I don’t think it will work amazingly well.

jlquinn · October 15, 2021, 10:26pm

In our env, it’s much simpler to launch a single job with N cores (say, 40) to do the compile, as long as I don’t have to recompile the whole tree every time.

ben.boeckel · October 15, 2021, 11:45pm

I would start by asking the build tool why it thinks it needs to do this. For the “normal” tools, that is make -d and ninja -d explain.
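A minimal sketch of that step for the Unix Makefiles generator, assuming GNU make. Its -d output is very verbose, so a small filter helps. The helper name explain_rebuild is hypothetical, and the two phrases it matches are the wording recent GNU make versions print; other versions may phrase them differently.

```shell
# Hypothetical helper: keep only the lines of GNU make's debug output
# that say why a target is considered out of date.
explain_rebuild() {
    grep -E "is newer than target|Must remake target"
}

# Assumed usage, run from the build directory:
#   make -d 2>&1 | explain_rebuild
# With Ninja no filter is needed:
#   ninja -d explain
```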

jlquinn · October 18, 2021, 7:56pm

Good advice. So the first recompile happens because there is a file out of sync with flags.make. I’m running cmake before compile every time. This wasn’t intentional, but I have a simpler makefile that wraps the cmake call since it makes it easy for me to set up debug vs optimized etc builds how I like. I have also run into strange situations that I remedy by removing CMakeCache.txt and rerunning cmake.

So what seems to happen is running cmake on a new host rewrites flags.make with -DBUILD_HOST=xxx and makes a bunch of targets out of date, even though nothing else about the configuration has changed.
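One possible workaround for the wrapper, sketched here under assumptions rather than taken from the thread: configure only when the build tree has no CMakeCache.txt yet, so a job landing on a new host does not regenerate flags.make by itself. The function name configure_if_needed and its arguments are hypothetical; -S and -B need CMake 3.13 or newer.

```shell
# Hypothetical wrapper step: configure only when the build tree has no
# cache yet, instead of running cmake before every build.
configure_if_needed() {
    src_dir=$1
    build_dir=$2
    if [ ! -f "$build_dir/CMakeCache.txt" ]; then
        cmake -S "$src_dir" -B "$build_dir"
    fi
}

# Assumed usage:
#   configure_if_needed . build && make -C build -j40
```

The generated Makefiles still re-run cmake on their own when a CMakeLists.txt changes, so this only avoids the unconditional reconfigure; it does not remove the BUILD_HOST definition itself.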

ben.boeckel · October 18, 2021, 10:34pm

I would diff them before and after. CMake should have had any “touches a build file without content changes” issues flushed out long ago, though maybe new ones have crept in. If you can make a small reproducer case, please file an issue.
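A rough sketch of the before/after comparison, assuming the Unix Makefiles generator, which writes one flags.make per target under the build tree. The helper names snapshot_flags and diff_flags and the directory names are hypothetical; the snapshot directory should live outside the build tree.

```shell
# Hypothetical helper: copy every flags.make under a build tree into a
# snapshot directory, preserving relative paths.
snapshot_flags() {
    build_dir=$1
    snap_dir=$2
    mkdir -p "$snap_dir"
    (cd "$build_dir" && find . -name flags.make -print) | while IFS= read -r f; do
        mkdir -p "$snap_dir/$(dirname "$f")"
        cp "$build_dir/$f" "$snap_dir/$f"
    done
}

# Hypothetical helper: show what changed in each flags.make since the
# snapshot was taken.
diff_flags() {
    build_dir=$1
    snap_dir=$2
    (cd "$snap_dir" && find . -name flags.make -print) | while IFS= read -r f; do
        # diff exits with status 1 when the files differ; that is expected here.
        diff -u "$snap_dir/$f" "$build_dir/$f" || true
    done
}

# Assumed usage:
#   snapshot_flags build /tmp/flags-before
#   cmake -S . -B build        # re-run on the new host
#   diff_flags build /tmp/flags-before
```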

buildSystemPerson · October 18, 2021, 10:47pm

What generator are you using btw @jlquinn ?

Does this happen with Ninja and Unix Makefiles? What version of CMake?

jlquinn · October 19, 2021, 5:52pm

@ben.boeckel the content of the flags.make file does change. It’s just that the BUILD_HOST change is irrelevant in my environment. I suspect what I’m looking for is either a workaround, or extending CMake support to better work in such a cluster environment.

@buildSystemPerson I’m using Unix Makefiles backend. The current version I’m using is 3.19.6.

ben.boeckel · October 19, 2021, 5:55pm

Then the rebuild is correct (as neither make nor CMake has any idea that this flag is irrelevant). I would recommend turning off/commenting out whatever is giving you a BUILD_HOST setting in the first place.

jlquinn · October 19, 2021, 5:59pm

OK, I just tested a completely trivial CMakeLists.txt that didn’t show this issue. I’ll have to work on creating a reduced test case.

jlquinn · October 19, 2021, 6:21pm

Now that I look in flags.make in my trivial test I don’t see BUILD_HOST. I had assumed that cmake always records it.

Thanks, this gives me direction to debug what’s happening.

ben.boeckel · October 19, 2021, 6:30pm

I would recommend using cmake --trace-expand to get a log of all the CMake code that is executed. The BUILD_HOST should show up in that log.
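A rough sketch of that search. The trace goes to stderr by default; --trace-redirect (available since CMake 3.16) writes it to a file instead. The helper name trace_locations and the file name trace.log are hypothetical, and the sed expression assumes the usual file(line): command(...) layout of trace lines.

```shell
# Hypothetical helper: list the unique file(line) locations in a CMake
# trace log whose traced command mentions the given text.
trace_locations() {
    pattern=$1
    log=$2
    grep -F -- "$pattern" "$log" | sed 's/):.*/)/' | sort -u
}

# Assumed usage, run from the source directory:
#   cmake --trace-expand --trace-redirect=trace.log -S . -B build
#   trace_locations BUILD_HOST trace.log
```

Each reported location is a CMakeLists.txt or .cmake line that touched BUILD_HOST, which is where the definition could be turned off or commented out.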