Staggering ctest starts

Dear ctest community!

I was wondering about a small problem I have with ctest tests:

I have a test suite of, let’s say, 100 tests. These tests generally run fine in parallel; however, when they are all started at the same time, there is a high likelihood of an out-of-memory error, because each test performs a large internal buffer allocation at roughly the same moment.

My question now, apart from scripting it myself: Is there some smart/integrated way to stagger the startup of tests with cmake/ctest? Just an offset of a few ms would be enough here. Ideally, this would be a setting or configuration option for the ctest command.

Right now, I have opted to hardcode a random 1-second delay into the startup routine of my tests. That works okay-ish, but it can be somewhat annoying when you just want to run a single (ctest) test quickly from the IDE, because you might wait a second for a test that takes less than a second to execute.

Basically I wonder if there is a better way…?

It seems like something that could be managed using RESOURCE_GROUPS.
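For reference, a RESOURCE_GROUPS setup might be sketched roughly like this (the resource type name `mem_gb`, the test name, and the spec file name are all invented for illustration):

```cmake
# Hypothetical sketch: the test declares that it needs 4 slots of a
# user-defined resource type called "mem_gb".
add_test(NAME heavy_test COMMAND heavy_test)
set_tests_properties(heavy_test PROPERTIES RESOURCE_GROUPS "mem_gb:4")

# The available slots come from a resource spec file passed to ctest:
#   ctest --resource-spec-file resources.json
# where resources.json might declare 8 slots of mem_gb:
#   { "version": { "major": 1, "minor": 0 },
#     "local": [ { "mem_gb": [ { "id": "0", "slots": 8 } ] } ] }
```

With that spec, ctest would schedule at most two such tests concurrently, since each claims 4 of the 8 available slots.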

Cc: @kyle.edwards

Hey, thanks for your reply Ben!

Resource groups are fine, but if I understand them correctly, they only let me limit the number of tests that run in parallel… While that is technically the correct behavior (e.g. a good fit on a build server), it is still undesirable for interactive testing:

If I have, let’s say, 8 GB of memory available for testing, and my tests require 4 GB of memory for a brief moment (say 1% of the total test runtime) at approximately the same time after starting, I do not want to limit the tests to running only 2 in parallel per se. I would much rather run all my 20 tests in parallel but space out the temporary 4 GB allocations (as I said, I currently do this by introducing a random sleep).

This is a lot more convenient when running the tests manually, because limiting parallelism makes my tests take much longer (in real time) to run.

I am actually facing this problem in some other projects as well, but the circumstances are slightly different each time: for example, a temporary file lock (while reading a test file), or, when using a shared database (not ideal, I know), locking mechanisms within the database.

Typically this is not a big issue: I just do a “parallel everything” run, and then run the failed tests a second time with ctest --rerun-failed. But if staggered startup were a feature, the second sequential run could probably be omitted most of the time.

But maybe I do not quite understand how RESOURCE_GROUPS could play a role here… I could imagine registering a required FIXTURES_SETUP test before each test that locks a resource (exclusively) and thereby introduces a small delay for each test… I need to experiment with that. I think that could work and would be a neat trick using what is already there.
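The trick I have in mind might look roughly like this (a hypothetical sketch; the test names, fixture names, and the 0.2-second delay are all made up):

```cmake
# Hypothetical sketch: a per-test "sleep" setup test, serialized via
# RESOURCE_LOCK so the delays queue up one after another, staggering
# the start of the real tests without limiting their parallelism.
foreach(t IN ITEMS test_a test_b test_c)
  add_test(NAME ${t}_stagger COMMAND ${CMAKE_COMMAND} -E sleep 0.2)
  set_tests_properties(${t}_stagger PROPERTIES
    FIXTURES_SETUP ${t}_fixture
    RESOURCE_LOCK stagger)   # only one delay test runs at a time
  set_tests_properties(${t} PROPERTIES FIXTURES_REQUIRED ${t}_fixture)
endforeach()
```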

Is that what you mean?

If your test only allocates 4GB for a brief period, it may be better to split it up into several tests if possible: one test for everything before the 4GB allocation, one test for the allocation itself, and one test for everything after. Then you can use test dependencies and/or fixtures to ensure they’re run in the correct order, and use resource groups to ensure that no more than two 4GB allocations run at a time.
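The split described above might be wired up along these lines (the test names, the `--phase` flag, and the `mem_gb` resource type are invented for illustration):

```cmake
# Hypothetical sketch: one test per phase, chained with DEPENDS so they
# run in order; only the allocation-heavy phase claims memory slots.
add_test(NAME foo_pre   COMMAND foo --phase=pre)
add_test(NAME foo_alloc COMMAND foo --phase=alloc)
add_test(NAME foo_post  COMMAND foo --phase=post)
set_tests_properties(foo_alloc PROPERTIES
  DEPENDS foo_pre
  RESOURCE_GROUPS "mem_gb:4")   # with 8 slots, at most two run at once
set_tests_properties(foo_post PROPERTIES DEPENDS foo_alloc)
```

This way, the cheap pre/post phases of all tests still run freely in parallel, and only the brief 4 GB phases are throttled.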

To summarize CTest methods to control parallel test execution:

  • The simplest but least efficient way to avoid parallel runs of certain tests is RESOURCE_LOCK.
  • Fixtures can give you control over test order while still allowing other tests to run in parallel.
  • Resource groups are the most efficient, but they require specifying the particular system resources to be effective. You could write a trivial shell script that takes system parameters on the command line, or read a JSON file that, if present, specifies that system’s resources.

These can be used in combination but for your purposes picking just one method is enough.
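The RESOURCE_LOCK option from the list is the quickest to try; a minimal sketch (test names invented) could look like:

```cmake
# Hypothetical sketch: two tests that share a database must never run
# at the same time. They both lock the same arbitrary name; all other
# tests are unaffected and still run in parallel.
add_test(NAME db_test_1 COMMAND db_test_1)
add_test(NAME db_test_2 COMMAND db_test_2)
set_tests_properties(db_test_1 db_test_2 PROPERTIES RESOURCE_LOCK shared_db)
```

The lock name is purely symbolic: ctest does not know anything about the actual database, it just never schedules two tests holding the same lock concurrently.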

Hey all! Thanks for answering again.

I was able to create a little demo repository that does what I want. It is good enough for what I wanted to achieve. It uses FIXTURES_SETUP/FIXTURES_REQUIRED to queue a fixed (short) delay before each and every test:

Maybe there is a more efficient way than creating a new fixture for each and every test; I am quite new to fixtures and still need to find some time to read the full documentation. But this does pretty much what I want: offset the start of each test by a bit, without fully locking a resource or otherwise limiting the number of tests that run in parallel.
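Since the demo repository itself is not shown here, this is only a guess at the pattern, with invented names and an invented 0.1-second delay:

```cmake
# Hypothetical sketch of the fixture-per-test approach: one sleep
# "setup" test per real test, chained with DEPENDS so that the delays
# fire one after another and stagger the real tests' start times.
set(prev "")
foreach(t IN LISTS my_tests)
  add_test(NAME stagger_${t} COMMAND ${CMAKE_COMMAND} -E sleep 0.1)
  set_tests_properties(stagger_${t} PROPERTIES FIXTURES_SETUP fixture_${t})
  if(prev)
    # queue this delay behind the previous one
    set_tests_properties(stagger_${t} PROPERTIES DEPENDS stagger_${prev})
  endif()
  set_tests_properties(${t} PROPERTIES FIXTURES_REQUIRED fixture_${t})
  set(prev ${t})
endforeach()
```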

What is better about this implementation is that the delays are smaller than with a random startup delay (my current solution). On the other hand, if you run a single test, its fixture always runs alongside it, causing more visual noise, which can be annoying.

I might iterate on this solution a bit if I find the time. If there are any further ideas on this, I welcome comments.

I did not want to go the RESOURCE_GROUPS route because that would limit the number of tests that run in parallel, and there is no real practical need for me to do that (mostly everything is fine; sometimes it crashes). I would rather accept the occasional crash than increase the build time a lot by limiting how many tests run in parallel.

Thanks again for all your suggestions; this discussion has already brought me a bit closer to an ideal solution!