add_custom_command parallel race condition with just one cp -p custom command. Is this an internal parallel race bug for at least all versions from 3.18.4 to 3.31.5?

To illustrate this parallel build bug for add_custom_command with just one cp -p command, here is my test CMakeLists.txt file.

CMAKE_MINIMUM_REQUIRED(VERSION 3.18.4 FATAL_ERROR)

project(test_cp NONE)


set(name_list
     announcement-2.2.0.html
     announcement-2.2.1.html
     )

   set(names)
   set(staged_names)
   foreach(name ${name_list})
     list(APPEND names ${CMAKE_CURRENT_SOURCE_DIR}/${name})
     list(APPEND staged_names ${CMAKE_CURRENT_BINARY_DIR}/${name})
   endforeach(name ${name_list})
   message(STATUS "DEBUG: names = ${names}")
   message(STATUS "DEBUG: staged_names = ${staged_names}")

   add_custom_command(
     OUTPUT ${staged_names}
     COMMAND cp -p ${names} ${CMAKE_CURRENT_BINARY_DIR}
     DEPENDS ${names}
     VERBATIM
     COMMAND_EXPAND_LISTS
   )

   add_custom_target(build_staged_names
     DEPENDS
     ${staged_names}
     )

The same directory contains two arbitrary files with older dates that I want to preserve in the copying process using cp -p.

But here is what I get instead in the build tree

irwin@merlin> pwd
/home/irwin/cmake_test/build_dir
irwin@merlin> rm -rf ./*
irwin@merlin> # Note this is cmake-3.31.5 that I built myself with a standard script with
irwin@merlin> # no errors or warnings in the configuration or build process.
irwin@merlin> env PATH=/home/software/cmake/install-3.31.5/bin/:$PATH cmake ../cmake_test_cp/
-- DEBUG: names = /home/irwin/cmake_test/cmake_test_cp/announcement-2.2.0.html;/home/irwin/cmake_test/cmake_test_cp/announcement-2.2.1.html
-- DEBUG: staged_names = /home/irwin/cmake_test/build_dir/announcement-2.2.0.html;/home/irwin/cmake_test/build_dir/announcement-2.2.1.html
-- Configuring done (0.0s)
-- Generating done (0.0s)
-- Build files have been written to: /home/irwin/cmake_test/build_dir

By the way, that cmake build script contains the following lines (plus much else elided logic to determine SCRIPT_PATH and git logic to pull cmake, identify which git tag or commit ID is being used for the build, gpg verify the tag, etc.)

# Tailor this part of the script according to your tastes.
# But if you don't tailor this part of the script, the
# result from the default values below should be OK.

# Number of parallel jobs should be two more than the
# number of hardware threads (16 for a Ryzen 7 1700 system)
# to get the maximum efficiency out of those threads.
JOBS=18
# Compiler options for building cmake
export CFLAGS='-O3'
export CXXFLAGS='-O3'
# End of section that you might want to tailor according to your tastes.
# ... only if my bash script git logic says this is necessary
git clone https://gitlab.kitware.com/cmake/cmake.git cmake.git
#... build logic
mkdir -p ${SCRIPT_PATH}/build_dir
cd ${SCRIPT_PATH}/build_dir
# Use fresh start every time
rm -rf ${SCRIPT_PATH}/build_dir/* ${SCRIPT_PATH}/install-${CMAKE_VERSION}
echo "Bootstrapping the cmake build.  This may take a while...." 
nice -19 ${SCRIPT_PATH}/cmake.git/bootstrap \
--prefix=${SCRIPT_PATH}/install-${CMAKE_VERSION} \
--parallel=$JOBS --qt-gui --system-curl  >& bootstrap.out

echo "Building cmake.  This may take a while..."
nice -19 make -j$JOBS >& make.out
echo "Installing cmake."
nice -19 make -j$JOBS install >& make_install.out
irwin@merlin> make VERBOSE=1 -j18 build_staged_names 
/home/software/cmake/install-3.31.5/bin/cmake -S/home/irwin/cmake_test/cmake_test_cp -B/home/irwin/cmake_test/build_dir --check-build-system CMakeFiles/Makefile.cmake 0
make  -f CMakeFiles/Makefile2 build_staged_names
make[1]: Entering directory '/home/irwin/cmake_test/build_dir'
/home/software/cmake/install-3.31.5/bin/cmake -S/home/irwin/cmake_test/cmake_test_cp -B/home/irwin/cmake_test/build_dir --check-build-system CMakeFiles/Makefile.cmake 0
/home/software/cmake/install-3.31.5/bin/cmake -E cmake_progress_start /home/irwin/cmake_test/build_dir/CMakeFiles 1
make  -f CMakeFiles/Makefile2 CMakeFiles/build_staged_names.dir/all
make[2]: Entering directory '/home/irwin/cmake_test/build_dir'
make  -f CMakeFiles/build_staged_names.dir/build.make CMakeFiles/build_staged_names.dir/depend
make[3]: Entering directory '/home/irwin/cmake_test/build_dir'
cd /home/irwin/cmake_test/build_dir && /home/software/cmake/install-3.31.5/bin/cmake -E cmake_depends "Unix Makefiles" /home/irwin/cmake_test/cmake_test_cp /home/irwin/cmake_test/cmake_test_cp /home/irwin/cmake_test/build_dir /home/irwin/cmake_test/build_dir /home/irwin/cmake_test/build_dir/CMakeFiles/build_staged_names.dir/DependInfo.cmake "--color="
make[3]: Leaving directory '/home/irwin/cmake_test/build_dir'
make  -f CMakeFiles/build_staged_names.dir/build.make CMakeFiles/build_staged_names.dir/build
make[3]: Entering directory '/home/irwin/cmake_test/build_dir'
[100%] Generating announcement-2.2.0.html, announcement-2.2.1.html
cp -p /home/irwin/cmake_test/cmake_test_cp/announcement-2.2.0.html /home/irwin/cmake_test/cmake_test_cp/announcement-2.2.1.html /home/irwin/cmake_test/build_dir
make[3]: Leaving directory '/home/irwin/cmake_test/build_dir'
[100%] Built target build_staged_names
make[2]: Leaving directory '/home/irwin/cmake_test/build_dir'
/home/software/cmake/install-3.31.5/bin/cmake -E cmake_progress_start /home/irwin/cmake_test/build_dir/CMakeFiles 0
make[1]: Leaving directory '/home/irwin/cmake_test/build_dir'
irwin@merlin> ls -l ../cmake_test_cp/*.html
-rw-r--r-- 1 irwin irwin 5278 Jul 23  2007 ../cmake_test_cp/announcement-2.2.0.html
-rw-r--r-- 1 irwin irwin 6684 Jul 26  2008 ../cmake_test_cp/announcement-2.2.1.html
irwin@merlin> ***note bad date for second file***
irwin@merlin> ls -l *.html
-rw-r--r-- 1 irwin irwin 5278 Jul 23  2007 announcement-2.2.0.html
-rw-r--r-- 1 irwin irwin 6684 Feb 17 20:29 announcement-2.2.1.html
irwin@merlin> rm *.html
rm: remove regular file 'announcement-2.2.0.html'? y
rm: remove regular file 'announcement-2.2.1.html'? y
irwin@merlin> cp -p /home/irwin/cmake_test/cmake_test_cp/announcement-2.2.0.html /home/irwin/cmake_test/cmake_test_cp/announcement-2.2.1.html /home/irwin/cmake_test/build_dir
irwin@merlin> ***note good date for second file***
irwin@merlin> ls -lt *.html
-rw-r--r-- 1 irwin irwin 6684 Jul 26  2008 announcement-2.2.1.html
-rw-r--r-- 1 irwin irwin 5278 Jul 23  2007 announcement-2.2.0.html
irwin@merlin> 

So the cp -p command that VERBOSE=1 says the cmake-based build is using preserves the date of the first 2.2.0 file, but uses the current date for the second file (and subsequent files for more complicated experiments). And if I copy that VERBOSE=1 command exactly to the command line then I get the correctly dated result so the issue is not with the Linux cp -p command on my PC. Furthermore, I assume that parallel builds should never mess up commands that are listed by VERBOSE=1 although, of course, it is well-known that parallel builds do intertwine output from commands that are running in parallel. But, of course, that doesn’t happen for this simple test case because there is only one command.

The -j18 option for make is copied from a much larger project that showed the same issue. And those parallel builds for that larger project (FreeEOS) are essential since the test target takes something like 25 minutes to build without that option and only 4 minutes to build with that option. (That is, a Linux Ryzen 7 1700 PC although quite dated is still a powerful development and testing system for parallel builds.) However, as an experiment I tried the build of this simple test without the -j18 option, and that produced a perfect result. So there is likely some build race issue in the above simple project, but I cannot see any race condition in the CMakeLists.txt file above (since only one command is being run) so I am virtually positive this problem is internal to how cmake processes that cp -p command and therefore I have called it a CMake bug.

<aside>
   Hello again to all my CMake developer and coder acquaintences from years ago.
   I sure hope cmake developers continue to be as
   kind as they were in the old days.
   Of course, it is possible there is something obvious I have missed
   because in the last three years or so I have been otherwise occupied
   so it was not possible for me to develop much at all.  Of course,
   starting up again with FreeEOS development after that time lapse I was ultra
   cautious with results so I immediately noticed that date anomaly in
   those `cp -p` results.  Such date problems are practically important since I use
   rsync to upload a generated website to freeeos.sf.net, and those bad
   dates mean extra uploads for files whose content has actually not
   changed from the last rsync.
</aside>

Two other experiments I tried with the parallel build were

  1. Drop COMMAND_EXPAND_LISTS option ==> same (bad date) result for second and subsequent files. So it appears this option is optional at least for simple commands like above. I did not try the experiment of dropping VERBATIM since my impression is non-VERBATIM results are deprecated to a certain extent.

  2. Did the above for CMake 3.18.4 that I built myself with the same standard script I used to build 3.31.5. ==> same (bad date) result for second and subsequent files. So it appears this CMake bug for parallel builds has has been around a long time. I view this bug as pretty serious since
    it includes a subtle corruption of VERBOSE=1 results where the commands being given are not
    actually what is run (since I got different results in the two cases). Is there a cmake option so that VERBOSE=1 gives the exact command(s) for what is run so that parallel race conditions that are internal to cmake can be debugged?

As in the old days, I find CMake bugs pretty well but don’t understand
C++ that well so cannot help much with the actual fix for whatever is
causing this bug. Nevertheless, I would certainly be happy to try
other experiments that the CMake developers might want to suggest
to help narrow down the cause of this bug.

Cheers,

Alan

P.S. I like email normally, but for now please don’t respond to me by email because I cannot stand the gmail GUI and far prefer the local Linux mailing software I have been using for more than two decades. The only problem is google makes downloading gmail email with the getmail application really hard to configure these days, and I haven’t been able to figure out what to do during the several year hiatus I mentioned above. But now I am developing again I will figure it out eventually.

I don’t know if I fully understand your problem, with just one add_custom_command there is nothing that can be built in parallel.
But there is some conceptional flaw in your definition.

Makes do use timestamps to decide if a rule needs to be built. If any of the outputs is older than any of the dependencies, it gets rebuilt.
So repetitive calls of make will always copy both files again and again.

Makes can have only 1 file as output. I never looked at the generated makefiles, but CMake will use some technique to workaround this limitation. But this may only work for normal use cases, not with manipulated timestamps as in your case.

Possible solutions with build (copy) avoidance:

  • Use Ninja instead of Make: Ninja can handle old timestamps as new outputs
  • Use separate add_custom_command for each file: the single output file will have same timestamp as dependency

Possible solution which always copies (like it does today actually):

  • Do no use add_custom_command, use add_custom_target instead: There are no output files to handle by make, just a phony target

Hi Josef:

Thanks for your suggestion (for the Unix Makefiles generator case) of the workaround of using a separate custom cp -pf command (I have added f to the cp options for more robustness) for each individual file to be copied. I have implemented that for the simple test case, and it works. The modified CMakeLists.txt file now looks like this:

CMAKE_MINIMUM_REQUIRED(VERSION 3.18.4 FATAL_ERROR)

project(test_cp NONE)


set(name_list
    announcement-2.2.0.html
    announcement-2.2.1.html
    )
    
  option(WORKAROUND_BUG "Workaround parallel build bug for the Unix Makefiles generator" ON)
  set(names)
  set(staged_names)
  foreach(name ${name_list})
    if(WORKAROUND_BUG)
    add_custom_command(
      OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/${name}
      COMMAND cp -pf ${CMAKE_CURRENT_SOURCE_DIR}/${name} ${CMAKE_CURRENT_BINARY_DIR}
      DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/${name}
      VERBATIM
      COMMAND_EXPAND_LISTS
      )
    endif(WORKAROUND_BUG)
    list(APPEND names ${CMAKE_CURRENT_SOURCE_DIR}/${name})
    list(APPEND staged_names ${CMAKE_CURRENT_BINARY_DIR}/${name})
  endforeach(name ${name_list})
  message(STATUS "DEBUG: names = ${names}")
  message(STATUS "DEBUG: staged_names = ${staged_names}")

  if(NOT WORKAROUND_BUG)
    # This add_custom_command should have the same net effect as the collection
    # of individual add_custom_commands above (which do individual file copies rather
    # then the following mass file copy.  But with the Unix Makefiles generator and
    # -j make option this mass copy fails to give the correct old timestamps on the resulting files
    # for unknown reasons, but the Ninja generator gives the correct result for this case.
    add_custom_command(
      OUTPUT ${staged_names}
      COMMAND cp -pf ${names} ${CMAKE_CURRENT_BINARY_DIR}
      DEPENDS ${names}
      VERBATIM
      COMMAND_EXPAND_LISTS
      )
  endif(NOT WORKAROUND_BUG)

  add_custom_target(build_staged_names
    DEPENDS
    ${staged_names}
    )

For Unix Makefiles and -j 18 option for make the -DWORKAROUND_BUG=ON case just works as you suggested, and the -DWORKAROUND_BUG=OFF case is largely the equivalent of the previous test case and indeed does not work correctly (wrong timestamps) for the copied files.

You also suggested trying Ninja, and that just works regardless of whether WORKAROUND_BUG is OFF or ON.

One additional experiment I tried was running that full cp command on the command line under both bash (the results I initially reported) and dash (the default SHELL for the Makefile), and the result was the desired original datestamps.

These combined results strongly suggests there is either a bug in make or a bug in the Makefiles generated by CMake for the Unix Makefiles generator which interferes with the timestamps of the copied files for the make -j18 option.

I would be happy to run any further test suggested by a CMake developer to try and sort out which of those two conclusions is correct. Of course, if it turns out to be the generator a fix would be great, but in the meanwhile I do have a workaround for my own FreeEOS build-system needs.