CTest performance drops when large number of tests defined

I am using ctest as an infrastructure for randomized testing and I’m seeing some serious perf issues when the number of unique test grows beyond some bounds.

Test are constructed within the test directory’s CMakeLists.txt so when we execute cmake the corresponding build directory contains a CTestTestfile.cmake with lines similar to:

add_test(foo_100 "/path_to_foo/" "--seed" "100")
set_tests_properties(foo_100 PROPERTIES  DEPENDS "foo_testbench" LABELS "smoke; foo" WORKING_DIRECTORY "..." _BACKTRACE_TRIPLES "...")

The initial cmake runtime:

26K tests: 0m12.772s
120K tests: 0m25.365s
1.1M tests: 2m52.856s

ctest -R foo_100 runtime (not including test runtime)

26K tests: 0m5.028s
120K tests: 4m30.792s
1.1M tests: ??? I gave up after ~30 minutes

The tests need to be unique so it’s easy to refer to the specific random seed later for debugging purposes. Is there a better method to handling this case in ctest, or is it just the wrong tool for the job?

Do you have a publicly available project that someone could use to reproduce the sort of performance behavior you are seeing? If you can provide that, it would make it more likely that someone might run some performance analysis on the code to track down any hot spots for improvement.

Unfortunately the project isn’t something I can share publicly. Time permitting, I’ll attempt to distill down the problem to a reproducer that I can share next week.

In my own investigations strace was showing the following:

1691715094.457605 brk(0xaaab24394000)   = 0xaaab24394000
1691715094.458482 getcwd("/build/tests", 2048) = 40
1691715094.459825 getcwd("/build/tests", 2048) = 40
1691715094.461233 getcwd("/build/tests", 2048) = 40
1691715094.462487 getcwd("/build/tests", 2048) = 40
1691715094.463727 getcwd("/build/tests", 2048) = 40
1691715094.465161 getcwd("/build/tests", 2048) = 40
1691715094.466561 getcwd("/build/tests", 2048) = 40
1691715094.467920 getcwd("/build/tests", 2048) = 40
1691715094.469188 getcwd("/build/tests", 2048) = 40
1691715094.470469 getcwd("/build/tests", 2048) = 40
1691715094.471710 getcwd("/build/tests", 2048) = 40
1691715094.472847 getcwd("/build/tests", 2048) = 40
1691715094.473923 getcwd("/build/tests", 2048) = 40
1691715094.475036 getcwd("/build/tests", 2048) = 40
1691715094.476324 getcwd("/build/tests", 2048) = 40
1691715094.477500 getcwd("/build/tests", 2048) = 40
1691715094.478740 getcwd("/build/tests", 2048) = 40
1691715094.479808 getcwd("/build/tests", 2048) = 40
1691715094.481119 getcwd("/build/tests", 2048) = 40
1691715094.482300 getcwd("/build/tests", 2048) = 40
1691715094.483575 getcwd("/build/tests", 2048) = 40
1691715094.484866 getcwd("/build/tests", 2048) = 40
1691715094.485979 getcwd("/build/tests", 2048) = 40
1691715094.486945 getcwd("/build/tests", 2048) = 40
1691715094.488276 getcwd("/build/tests", 2048) = 40
1691715094.489700 getcwd("/build/tests", 2048) = 40
1691715094.491428 getcwd("/build/tests", 2048) = 40
1691715094.492700 getcwd("/build/tests", 2048) = 40
1691715094.494032 getcwd("/build/tests", 2048) = 40
1691715094.495363 getcwd("/build/tests", 2048) = 40
1691715094.496489 getcwd("/build/tests", 2048) = 40
1691715094.497553 getcwd("/build/tests", 2048) = 40
1691715094.498693 getcwd("/build/tests", 2048) = 40
1691715094.500015 getcwd("/build/tests", 2048) = 40
1691715094.501038 getcwd("/build/tests", 2048) = 40
1691715094.502138 getcwd("/build/tests", 2048) = 40
1691715094.503175 getcwd("/build/tests", 2048) = 40
1691715094.504512 getcwd("/build/tests", 2048) = 40
1691715094.505659 getcwd("/build/tests", 2048) = 40
1691715094.506913 getcwd("/build/tests", 2048) = 40
1691715094.508266 getcwd("/build/tests", 2048) = 40
1691715094.509388 getcwd("/build/tests", 2048) = 40
1691715094.510468 getcwd("/build/tests", 2048) = 40
1691715094.511718 getcwd("/build/tests", 2048) = 40
1691715094.513055 getcwd("/build/tests", 2048) = 40
1691715094.514330 getcwd("/build/tests", 2048) = 40
1691715094.515557 getcwd("/build/tests", 2048) = 40
1691715094.516772 getcwd("/build/tests", 2048) = 40
1691715094.518065 getcwd("/build/tests", 2048) = 40
1691715094.519086 getcwd("/build/tests", 2048) = 40
1691715094.520157 getcwd("/build/tests", 2048) = 40
1691715094.521249 getcwd("/build/tests", 2048) = 40
1691715094.522632 getcwd("/build/tests", 2048) = 40
1691715094.523897 getcwd("/build/tests", 2048) = 40
1691715094.525052 getcwd("/build/tests", 2048) = 40
1691715094.526113 getcwd("/build/tests", 2048) = 40
1691715094.527542 getcwd("/build/tests", 2048) = 40
1691715094.528679 getcwd("/build/tests", 2048) = 40
1691715094.529997 getcwd("/build/tests", 2048) = 40
1691715094.531269 getcwd("/build/tests", 2048) = 40
1691715094.532436 getcwd("/build/tests", 2048) = 40
1691715094.533592 getcwd("/build/tests", 2048) = 40
1691715094.534863 getcwd("/build/tests", 2048) = 40
1691715094.536135 brk(0xaaab243b5000)   = 0xaaab243b5000

The stack for each getcwd looks something like this

2259767  > /usr/lib/aarch64-linux-gnu/libc.so.6(getcwd+0x80) [0xd86e0]
2259768  > /Users/pjohnson/git/sandbox/CMake-3.22.1/bin/ctest(_ZN5cmsys11SystemTools26GetCurrentWorkingDirectoryB5cxx11Ev+0x3c) [0x6bce8c]
2259769  > /Users/pjohnson/git/sandbox/CMake-3.22.1/bin/ctest(_ZN13cmSystemTools26GetCurrentWorkingDirectoryB5cxx11Ev+0x30) [0x2c5764]
2259770  > /Users/pjohnson/git/sandbox/CMake-3.22.1/bin/ctest(_ZN18cmCTestTestHandler7AddTestERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS6_EE+0x3d0) [0x105710]
2259771  > /Users/pjohnson/git/sandbox/CMake-3.22.1/bin/ctest(_ZNSt17_Function_handlerIFbRKSt6vectorI18cmListFileArgumentSaIS1_EER17cmExecutionStatusEN12_GLOBAL__N_121cmCTestAddTestCommandEE9_M_invokeERKSt9_Any_dataS5_S7_+0x68) [0xf8968]
2259772  > /Users/pjohnson/git/sandbox/CMake-3.22.1/bin/ctest(_ZN10cmMakefile14ExecuteCommandERK18cmListFileFunctionR17cmExecutionStatusSt8optionalINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEE+0x7a4) [0x260938]
2259773  > /Users/pjohnson/git/sandbox/CMake-3.22.1/bin/ctest(_ZN10cmMakefile11RunListFileERK10cmListFileRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPNS_13DeferCommandsE+0x354) [0x261038]
2259774  > /Users/pjohnson/git/sandbox/CMake-3.22.1/bin/ctest(_ZN10cmMakefile17ReadDependentFileERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEb+0x264) [0x261ac4]
2259775  > /Users/pjohnson/git/sandbox/CMake-3.22.1/bin/ctest(_ZN12_GLOBAL__N_120cmCTestSubdirCommandERKSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS6_EER17cmExecutionStatus+0x134) [0xf8c74]
2259776  > /Users/pjohnson/git/sandbox/CMake-3.22.1/bin/ctest(_ZNSt17_Function_handlerIFbRKSt6vectorI18cmListFileArgumentSaIS1_EER17cmExecutionStatusEZN7cmState17AddBuiltinCommandERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPFbRKS0_ISF_SaISF_EES7_EEUlS5_S7_E_E9_M_invokeERKSt9_Any_dataS5_S7_+0x5c) [0x2aa61c]
2259777  > /Users/pjohnson/git/sandbox/CMake-3.22.1/bin/ctest(_ZN10cmMakefile14ExecuteCommandERK18cmListFileFunctionR17cmExecutionStatusSt8optionalINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEE+0x7a4) [0x260938]
2259778  > /Users/pjohnson/git/sandbox/CMake-3.22.1/bin/ctest(_ZN10cmMakefile11RunListFileERK10cmListFileRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEPNS_13DeferCommandsE+0x354) [0x261038]
2259779  > /Users/pjohnson/git/sandbox/CMake-3.22.1/bin/ctest(_ZN10cmMakefile12ReadListFileERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x2b8) [0x261e28]
2259780  > /Users/pjohnson/git/sandbox/CMake-3.22.1/bin/ctest(_ZN18cmCTestTestHandler14GetListOfTestsEv+0x4bc) [0x10b88c]
2259781  > /Users/pjohnson/git/sandbox/CMake-3.22.1/bin/ctest(_ZN18cmCTestTestHandler15ComputeTestListEv+0x6c) [0x10bcb0]
2259782  > /Users/pjohnson/git/sandbox/CMake-3.22.1/bin/ctest(_ZN18cmCTestTestHandler16ProcessDirectoryERSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS6_EES9_+0x34) [0x10c164]
2259783  > /Users/pjohnson/git/sandbox/CMake-3.22.1/bin/ctest(_ZN18cmCTestTestHandler14ProcessHandlerEv+0x1bc) [0x10ccbc]
2259784  > /Users/pjohnson/git/sandbox/CMake-3.22.1/bin/ctest(_ZN7cmCTest12ProcessStepsEv+0x5b0) [0x70be4]
2259785  > /Users/pjohnson/git/sandbox/CMake-3.22.1/bin/ctest(_ZN7cmCTest12ExecuteTestsEv+0x484) [0x716d4]
2259786  > /Users/pjohnson/git/sandbox/CMake-3.22.1/bin/ctest(main+0x280) [0x3ed90]

You can use c++filt to demangle the above symbols.

I don’t have the brk stack, but i believe it had something to do with std::string concatenation within ctest.

I don’t think necessarily getcwd is a bottleneck here, but it was the only narrowing down I was able to do in the time i had to investigate this.

A reproducer would be the most useful here. Speculation is hard, but there is a lot of duplicated work throughout CTest AFAIK.

code:

cmake_minimum_required(VERSION 3.10)
project(ctest_perf)
enable_testing()

function(instantiate_test)
  set(options)
  set(oneValueArgs TEST_NAME TEST_TARGET)
  set(multiValueArgs TEST_LABELS)
  cmake_parse_arguments("X" "${options}" "${oneValueArgs}"
    "${multiValueArgs}" ${ARGN})

  add_test(NAME ${X_TEST_NAME}
      COMMAND "echo ${X_TEST_NAME}")

# disabling these 2 lines sees dramatic speedup, but even disabled, slowdown way worse than linear
set_tests_properties(${X_TEST_NAME} PROPERTIES DEPENDS ${X_TEST_TARGET})
set_tests_properties(${X_TEST_NAME} PROPERTIES LABELS "${X_TEST_LABELS}")
endfunction()

# Instantiate the unit sim declared with declare_unitsim_testbench
function(instantiate_unitsim_testbench)
  set(oneValueArgs UNIT LABEL COM)
  set(multiValueArgs SCRIPT_ARGS)
  cmake_parse_arguments("X" "${options}" "${oneValueArgs}"
    "${multiValueArgs}" ${ARGN})

  set(X_TARGET_SUFFIX "testbench")
  set(X_NAME_SUFFIX "")

  instantiate_test(
      TEST_NAME ${X_UNIT}${X_NAME_SUFFIX}_${X_TARGET_SUFFIX}_${X_LABEL}_${seed}
      TEST_TARGET ${X_UNIT}_${X_TARGET_SUFFIX}
      TEST_LABELS ${X_LABEL} ${X_UNIT} ${COM})
endfunction()

foreach(seed RANGE 0 200000)
    instantiate_unitsim_testbench(UNIT foo LABEL smoke)
endforeach()

Profiled command (cleaned out build directory and re-executed cmake between profile runs):

time ctest -R foo_testbench_smoke_0

2 set_test_prop lines commented out

50k: 7.43s
100k: 50.7s (5.8x slower than 50K)
200k: 3m15s (25x slower than 50K)

2 set_test_prop lines enabled

50k: 23.2s
100k: 2m56s (6.7X slower than 50K)
200k: 10m54s (27x slower than 50K)

The above was run using cmake 3.22.1 on Macbook M1 (within a docker container).

I manged to figure out what the problem is: for each property setting, there’s a loop over all tests to check to see if the test’s name is one of the test(s) listed in the set_test_properties() call. Instead, we can store them in a multimap and use equal_range to only iterate over those with a name that is relevant. This makes a 40k with the 2 test property lines commented out (to make the benchmarking bearable yet also above the noise) go from 6.2s to 700ms locally.

d688a213d5583098f04bedc4f5c9eafe62c854bd
  Time (mean ± σ):      6.222 s ±  0.228 s    [User: 6.065 s, System: 0.126 s]
  Range (min … max):    5.997 s …  6.782 s    10 runs

a364421c7f cmCTestTestHandler: store test properties in a multimap
  Time (mean ± σ):     696.6 ms ±   6.2 ms    [User: 618.0 ms, System: 75.5 ms]
  Range (min … max):   685.2 ms … 705.5 ms    10 runs
1 Like

https://gitlab.kitware.com/cmake/cmake/-/merge_requests/8743

This is great Ben!

Do you know if after this change the speed of set_test_prop enabled/disabled are similar, or is there still a big gap between the 2?

I believe that just changes a constant on the O(n²) algorithm that was being used before.