How to avoid parallel-build race conditions?

airwin · February 27, 2020, 8:32pm

The documentation of add_custom_command states (in part) the following:

“Do not list the output in more than one independent target that may
build in parallel or the two instances of the rule may conflict
(instead use the add_custom_target() command to drive the command
and make the other targets depend on that one).”

I have a use case (building some LaTeX-based documentation where both
intermediate and final products depend on many different generated
files and associated targets) with many interdependent file
dependencies that ensure all custom commands are rerun as necessary.
The targets also have many target dependencies between them. So this
complex build system necessarily has different targets that depend
indirectly on the OUTPUT of a given custom command. To avoid
parallel-build conflicts (or race conditions) in this complex case,
my interpretation of the parenthesized part of the quote above is that
I have to create custom targets corresponding to the custom commands
and use add_dependencies between those targets, as in the simple
example below. Could some CMake expert here look at this simple
example and confirm that it has no parallel-build race conditions?

# First custom command and associated target
add_custom_command(
  OUTPUT
    file_a
  COMMAND ...
)
add_custom_target(target_a
  DEPENDS file_a
)

# Second custom command and associated target
add_custom_command(
  OUTPUT
    file_b
  COMMAND ...
  DEPENDS
    # This command should be rebuilt whenever file_a is updated
    file_a
)
add_custom_target(target_b
  DEPENDS file_b
)
add_dependencies(target_b target_a)

# Third custom command and associated target
add_custom_command(
  OUTPUT
    file_c
  COMMAND ...
  DEPENDS
    # This command should be rebuilt whenever file_a or file_b are updated
    file_a
    file_b
)
add_custom_target(target_c
  DEPENDS file_c
)
add_dependencies(target_c target_b target_a)

Given that those custom command file DEPENDS are necessary (to get
those custom commands rerun as needed) and those targets are
necessary (due to other needs of the build system), are the above
add_dependencies commands sufficient to ensure there are no
parallel-build race conditions in the above simple example?

Let’s assume the above general pattern with add_dependencies is
correct, but I have inadvertently not followed this pattern for my
complex build system. In this case, is there an unambiguous way with
CMake of detecting parallel-build race conditions for any given build
system? Or do build-system implementers that use add_custom_command
have to be constantly on guard about this issue?

Alan

ben.boeckel · February 28, 2020, 8:55pm

I know of no way to always detect such cases.

With LaTeX in particular, I usually have a chain of add_custom_command calls where each chooses an “arbitrary” output file of the command to trigger the next invocation (there is an optimal order, but I suspect it depends on your actual toolchain). I usually end up with 2 pdflatex, a bibtex, then a final pdflatex custom command, the last one having the .pdf as its output. That is then hooked up to the custom target. Yes, it’s a mess.
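
A chain like the one described above might look roughly like the sketch below. It is only an illustration under several assumptions that are not from the post: the file names paper.tex and paper.bib, the target name paper_pdf, the use of stamp files as the “arbitrary” output of each step (so that files rewritten by later passes, such as paper.aux, never appear as a declared OUTPUT), the Unix-style “:” separator in BIBINPUTS, and the pass order itself, which as noted depends on the toolchain.

```cmake
cmake_minimum_required(VERSION 3.10)
project(LatexChainSketch NONE)

# Assumption: pdflatex and bibtex can be found on the PATH.
find_program(PDFLATEX_EXECUTABLE pdflatex)
find_program(BIBTEX_EXECUTABLE bibtex)

# Hypothetical layout: paper.tex and paper.bib live in the source directory.
set(src_dir ${CMAKE_CURRENT_SOURCE_DIR})
set(out_dir ${CMAKE_CURRENT_BINARY_DIR})
set(run_pdflatex
  ${PDFLATEX_EXECUTABLE} -interaction=nonstopmode
  -output-directory ${out_dir} ${src_dir}/paper.tex)

# Pass 1: produces paper.aux; the stamp file is the declared output.
add_custom_command(
  OUTPUT ${out_dir}/pass1.stamp
  COMMAND ${run_pdflatex}
  COMMAND ${CMAKE_COMMAND} -E touch ${out_dir}/pass1.stamp
  DEPENDS ${src_dir}/paper.tex ${src_dir}/paper.bib
  WORKING_DIRECTORY ${src_dir}
  VERBATIM
)

# bibtex: reads paper.aux from the build directory, writes paper.bbl.
add_custom_command(
  OUTPUT ${out_dir}/bibtex.stamp
  COMMAND ${CMAKE_COMMAND} -E env "BIBINPUTS=${src_dir}:" ${BIBTEX_EXECUTABLE} paper
  COMMAND ${CMAKE_COMMAND} -E touch ${out_dir}/bibtex.stamp
  DEPENDS ${out_dir}/pass1.stamp
  WORKING_DIRECTORY ${out_dir}
  VERBATIM
)

# Pass 2: picks up the bibliography.
add_custom_command(
  OUTPUT ${out_dir}/pass2.stamp
  COMMAND ${run_pdflatex}
  COMMAND ${CMAKE_COMMAND} -E touch ${out_dir}/pass2.stamp
  DEPENDS ${out_dir}/bibtex.stamp
  WORKING_DIRECTORY ${src_dir}
  VERBATIM
)

# Final pass: the only command that declares the .pdf as its output.
add_custom_command(
  OUTPUT ${out_dir}/paper.pdf
  COMMAND ${run_pdflatex}
  DEPENDS ${out_dir}/pass2.stamp
  WORKING_DIRECTORY ${src_dir}
  VERBATIM
)

# The single custom target that the chain is hooked up to.
add_custom_target(paper_pdf DEPENDS ${out_dir}/paper.pdf)
```

Because each output is declared by exactly one command and only paper_pdf lists the final .pdf, other targets that need the document would use add_dependencies on paper_pdf rather than listing paper.pdf themselves.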

airwin · February 28, 2020, 10:31pm

That’s painful for CMake-based build system developers such as myself
since it means we have to always be vigilant about the slightest
change in file dependencies with no proof that we got it right.

I request that the CMake developers address this issue with a CMake
command-line option that lists all targets available for a given
configuration, their direct and indirect list of file dependencies,
and also the subset of those targets where those file dependencies
clash without add_dependencies solving that issue.

Let me know if you think this idea is a reasonable one, and
if so I will write it up in the CMake bug tracker.

Alan

airwin · February 28, 2020, 10:43pm

@ben.boeckel said:
add_custom_command calls where each chooses an “arbitrary” output
file of the command to trigger the next invocation (there is an
optimal order, but I suspect it depends on your actual toolchain). I
usually end up with 2 pdflatex, a bibtex, then a final
pdflatex custom command, the last one having the .pdf as its
output. That is then hooked up to the custom target. Yes, it’s a
mess.

For your information I gave up on CMake logic to handle this necessary
*tex tool iteration. Instead, I wrote a bash script
https://sourceforge.net/p/freeeos/freeeos/ci/master/tree/www/iterate_pdflatex.sh
to perform the pdflatex iteration which anyone is welcome to use (or
adapt under the terms of the GPL license for FreeEOS). The iteration
criterion is a general one so I think it should be straightforward to
generalize this script for all *tex tools. But I only use pdflatex
these days so I did not bother with that generalization.

Alan

ben.boeckel · February 29, 2020, 12:17pm

Doing this implies that we have accurate and complete dependencies for all commands in the build system. In the general case this can only be known post-build, and only if every command has accurate depfile outputs, isn’t “lying”, and specifies all of its outputs. I don’t think it is feasible for CMake to provide such a tool. The ninja tool is the closest to having some of this information (the deps log), but that is only as good as what the tools you’re using report about themselves. You need tracking build systems like Tup to do that kind of stuff, but then you usually have to get everything right, and you lose the ability to do things like extract a tarball, use git, etc. during your main build.

fdk17 · March 1, 2020, 2:56pm

There are a lot of “may”s in the documentation you are quoting. In my experience, when the output of a custom command is needed by multiple targets, different generators express this differently in the project files they generate. I recall seeing these commands duplicated for some generators, while for others they show up once. This is one of the reasons the documentation says there “may” be race conditions. It was my understanding that this is why the documentation says to avoid the situation by using a custom target that essentially runs first, with all targets that rely on these outputs running afterwards.

I really think that your example is overly complicated or I don’t understand what you are trying to get at. I don’t see what target_b or target_a is for. The build tools like Ninja or Make or Visual Studio wouldn’t have parallelized running the individual commands in my experience because you clearly define the dependencies between them.

For me, if I had other targets that depended not just on file_c but on file_a, file_b, and file_c, then I would just use add_dependencies to make all those other targets depend on target_c.

But the way I read your use case, it’s the opposite. It sounds like you would just want to build the LaTeX-based documentation last in the entire build process, to ensure that all the targets that feed inputs into the documentation target are finished. Then the custom commands for the documentation target will run as necessary if any of their individual inputs were updated.
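
The two arrangements described in this post could be sketched as follows. This is a minimal illustration, not the original poster's build system: the names consumer_x, producer_y, and docs are hypothetical, and each elided COMMAND from the original example is replaced by a placeholder that merely touches or echoes.

```cmake
cmake_minimum_required(VERSION 3.10)
project(DocsOrderingSketch NONE)

# Placeholder commands: each step only touches its output file.
add_custom_command(
  OUTPUT file_a
  COMMAND ${CMAKE_COMMAND} -E touch file_a
)
add_custom_command(
  OUTPUT file_b
  COMMAND ${CMAKE_COMMAND} -E touch file_b
  DEPENDS file_a
)
add_custom_command(
  OUTPUT file_c
  COMMAND ${CMAKE_COMMAND} -E touch file_c
  DEPENDS file_a file_b
)

# A single target drives the whole file-level chain.
add_custom_target(target_c DEPENDS file_c)

# Arrangement 1: a target that needs file_a, file_b, and file_c does not
# list those files itself; it depends on the driving target instead.
add_custom_target(consumer_x
  COMMAND ${CMAKE_COMMAND} -E echo "using file_a, file_b and file_c"
)
add_dependencies(consumer_x target_c)

# Arrangement 2: the documentation is built last, after every target
# that feeds inputs into it (producer_y is a hypothetical stand-in).
add_custom_target(producer_y
  COMMAND ${CMAKE_COMMAND} -E echo "producing documentation inputs"
)
add_custom_target(docs)
add_dependencies(docs producer_y target_c)
```

In both arrangements only target_c lists file_c, so the file-level DEPENDS decide what is rerun and the target-level add_dependencies decide the order between targets.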

airwin · March 1, 2020, 8:27pm

@ben.boeckel
Actually, I don’t believe having complete and accurate depfile outputs
should be required. In fact, the whole point is to test the custom
command outputs (however inaccurately and incompletely they may be
specified) for the current build system!

Also, could you please explain why you think such tests could only be
done post build?

On the contrary it appears to me that for a given build-system
configuration, CMake must know every depfile and the associated
chain of custom commands specified by the CMake-based build system for
every target. (Or otherwise, it would not know how to configure the
build of those targets.) So shouldn’t it be straightforward to
implement an option for CMake that warns whenever two targets that
have no target dependencies between them refer to the same custom
command in their two different chains of custom commands?

airwin · March 1, 2020, 8:56pm

@fdk17 said:
I really think that your example is overly complicated or I don’t
understand what you are trying to get at.

My use case actually concerns building figures and files for 5
different LaTeX documents with many common files between the figures
and the documents. So for my simple example I wanted at least three
different files with dependencies between them. And in this example,
I tried to follow exactly the advice in the documentation

“(instead use the add_custom_target() command to drive the command and
make the other targets depend on that one).”

concerning the targets to be implemented for those 3 generated files and the
target dependencies between those targets.

@fdk17 said:
The build tools like Ninja or Make or Visual Studio wouldn’t have
parallelized running the individual commands in my experience because
you clearly define the dependencies between them.

Thanks for that confirmation that you are pretty sure that following
this simple rule (if there are common custom commands in the custom
command chains of two different targets, then serialize those two
targets with the appropriate add_dependencies commands) will avoid
race conditions for my (complex) use case.

Also note that in my response to @ben.boeckel I was only asking that
an option be implemented for CMake that identifies for a given
build-system configuration whenever this simple rule is not followed.

ben.boeckel · March 2, 2020, 3:51pm

Ah. I think I see now. @brad.king Thoughts on feasibility?

brad.king · March 2, 2020, 4:07pm

It may be possible to implement a check to detect such races, but we don’t always know they are wrong either. For example a single custom command may be included in two independent targets that are EXCLUDE_FROM_ALL such that the project intends only one will ever be built at a time as a helper utility or something.
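
As a rough illustration of the situation described above, the sketch below puts the tolerated pattern next to the arrangement the documentation recommends. All file and target names are hypothetical, and custom targets declared without ALL stand in here for targets that are not part of the default build.

```cmake
cmake_minimum_required(VERSION 3.10)
project(SharedOutputSketch NONE)

# Pattern 1: one custom command whose OUTPUT is listed by two independent
# targets. Neither target is in the default build (no ALL), so this is
# harmless as long as only one of them is ever requested at a time.
# Requesting both in one parallel build is the case the documentation
# warns about; whether it actually races depends on the generator.
add_custom_command(
  OUTPUT shared_file
  COMMAND ${CMAKE_COMMAND} -E touch shared_file
)
add_custom_target(helper_one DEPENDS shared_file)
add_custom_target(helper_two DEPENDS shared_file)

# Pattern 2: the documented alternative. Exactly one target drives the
# command, and every other target depends on that target.
add_custom_command(
  OUTPUT driven_file
  COMMAND ${CMAKE_COMMAND} -E touch driven_file
)
add_custom_target(make_driven_file DEPENDS driven_file)

add_custom_target(user_one
  COMMAND ${CMAKE_COMMAND} -E echo "user_one reads driven_file"
)
add_custom_target(user_two
  COMMAND ${CMAKE_COMMAND} -E echo "user_two reads driven_file"
)
add_dependencies(user_one make_driven_file)
add_dependencies(user_two make_driven_file)
```

A detection check would have to flag Pattern 1 even though the project may have written it deliberately, which is the ambiguity raised in this post.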

airwin · March 2, 2020, 6:24pm

@brad.king said:

“It may be possible to implement a check to detect such races, but we
don’t always know they are wrong either. For example a single custom
command may be included in two independent targets that are
EXCLUDE_FROM_ALL such that the project intends only one will ever be
built at a time as a helper utility or something.”

Races are always wrong in my particular use cases (not just for the
particular use case I mentioned, but many others spread across quite a
few different projects). The vast majority of my custom targets (all
configured without the ALL signature) require lots of cpu time to
build and are typically part of a hierarchy of add_dependencies on
some overall target (such as the “FreeEOS_papers” target for the present
use case). And I have a computer with 16 hardware threads. So as a
matter of course I always build such overall custom targets with
parallel builds. Therefore, I would welcome the capability of
detecting races for all my use cases for add_custom_target, and I
suspect any other build-system developer who uses add_custom_target
as extensively as I do would also welcome this capability.

@brad.king
What is the next step that you recommend? Do you plan to look more
deeply at feasibility first, in case this requested capability is
trivial to implement? Or do you already know it is going to be hard
to implement, so that the next step should be for me to move this
discussion to the bug tracker? Also, if/when that move occurs, how
should I reference the present preliminary discussion in the bug
tracker?

brad.king · March 2, 2020, 6:58pm

This has been an issue since CMake started. There are probably open issues about it in the issue tracker already. No one has taken the time to try to implement the detection. If someone does I’d be happy to review the work. I do not plan to work on it myself.

airwin · March 2, 2020, 8:10pm

I have made the feature request at https://gitlab.kitware.com/cmake/cmake/issues/20412