Find non-terminated tests with ctest -j

Sometimes a test run by ctest does not terminate, or does not terminate in a timely manner. When running with -j and a high number of parallel jobs, it is very hard to figure out which test has not ended yet. On Linux we have a decent ps or pstree, but on Windows, for example, it seems rather hard to find the test processes and how they were invoked, so it is hard to find the stalled test.

Would it be possible to print the tests it cancels when you stop ctest using Control-C? I.e. something like this:

> ctest -j 16
....
^C
Interrupted tests:
   mytest1 (running for 530 sec)
   mytest7 (running for 10 sec)
   ...
>

I suggest using Sysinternals Process Explorer. On the command line, their pslist may help.

Thanks. At least that helps. I’m not much of a Windows developer :)

Still, I think feedback from ctest is not hard to implement and will simplify this problem for everybody.

I think this could be a nice use case as well. Does this involve how libuv is used (?) to manage the parallel test runs?

Cc: @kyle.edwards

Any news on this? Locally one can find a way out, but today I was faced with a hanging test in GitHub CI. So I copied the failed test report and ran the command below to get a sorted list of finished tests. That allowed me to spot that test #81 did not complete …

 grep Passed ci.txt | sed 's/^.*#\([0-9]*\).*/\1/' | sort -n
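A variant of the same idea, as a rough sketch: instead of eyeballing the gap in the sorted list, compare the tests that were started against the tests that reported any result. This assumes the log contains ctest's usual `Start N: name` and `Test #N: name ...` lines; the function name `unfinished_tests` is made up for illustration.

```shell
# Print "number name" for every test that was started but never
# reported a result (Passed, Failed, Timeout, ...).
# Assumption: the log uses ctest's usual "Start N: name" and
# "Test #N: name ..." line formats.
unfinished_tests() {
    awk '
        # "Start 81: mytest" marks test 81 as running.
        match($0, /Start +[0-9]+: /) {
            rest = substr($0, RSTART + 5)   # text after the word "Start"
            sub(/^ +/, "", rest)            # -> "81: mytest"
            num = rest
            sub(/:.*/, "", num)             # -> "81"
            name = rest
            sub(/^[0-9]+: */, "", name)     # -> "mytest"
            running[num] = name
            next
        }
        # "Test #81: mytest ..." reports a result, whatever it is.
        match($0, /Test +#[0-9]+: /) {
            num = substr($0, RSTART, RLENGTH)
            gsub(/[^0-9]/, "", num)
            delete running[num]
        }
        END {
            for (num in running) print num, running[num]
        }
    ' "$1" | sort -n
}
```

Usage would be `unfinished_tests ci.txt`. Unlike grepping for `Passed`, this also treats failed tests as finished, so only the ones still running when the log ends are listed.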

How do people deal with this? Setting fairly short test timeouts is of course an option, but if the tests are executed on anything from fairly fast hardware to slow emulators, it is hard to find a sensible limit.

My opinion is that this should be a feature request (in the CMake GitLab project). I think the way CTest uses libuv internally would allow it to print a message listing which processes were still running. Not trivial, but it seems doable.

What would trigger the printing of such a report? SIGUSR1 or something similar would work, but that doesn’t help CI. Maybe we wait for some special signal that CI systems send when tearing down due to a timeout? Is there even such a thing, never mind whether it is consistent across systems?

I was thinking about the termination signal, be it SIGINT (from a terminal) or SIGTERM (I guess most CI systems send that when the time limit is exceeded?).
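To make the idea concrete, here is a minimal shell sketch of reporting on SIGINT or SIGTERM. It is not how ctest works: the state file with one `name start_epoch` line per running test, the `STATE_FILE` variable, and the function names are all hypothetical, and `date +%s` is assumed to be available.

```shell
# Hypothetical runner state: one "name start_epoch" line per running test.
STATE_FILE=${STATE_FILE:-running.txt}

# Read "name start_epoch" lines on stdin and print how long each test
# has been running, relative to the epoch time given as $1.
format_interrupted() {
    now=$1
    echo "Interrupted tests:"
    while read -r name start; do
        printf '   %s (running for %d sec)\n' "$name" "$((now - start))"
    done
}

# Print the report, then exit with the conventional 128+signal status.
on_signal() {
    format_interrupted "$(date +%s)" < "$STATE_FILE"
    exit "$1"
}

trap 'on_signal 130' INT    # Control-C in a terminal
trap 'on_signal 143' TERM   # what a CI system might send on timeout
```

Whether a given CI system sends SIGTERM before killing the job, and how long it waits afterwards, would still need to be checked per system.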