Running FetchContent_MakeAvailable() concurrently with a shared SOURCE_DIR

gerhardol · May 24, 2022, 8:27pm

We fetch some large SDKs (several GB). To speed up builds and avoid filling up disks, the SDKs should be put on a shared and persistent path. The current code is not too complicated and uses flock to make sure that only one installation runs at a time (concurrent installations could otherwise happen on CI servers).

I would like to use FetchContent_MakeAvailable() to do this instead, setting SOURCE_DIR to the persistent path (no binary path right now), but I do not see that the command is protected against concurrent installations. It works fine with local paths, of course.

Did I miss something, or are there other ways to handle concurrency safely?

craig.scott · May 24, 2022, 10:18pm

I’m not clear on what you’re potentially running concurrently. The CMake execution is (currently) inherently single threaded. While I have future plans to potentially make FetchContent_MakeAvailable() (or some other equivalent new function) be able to process its dependency list in parallel, that doesn’t apply right now.

If you’re talking about running multiple instances of CMake at once, then that is indeed asking for trouble. If you are sharing things between such multiple instances, you are also responsible for ensuring they don’t interfere with each other.

I’ve seen people use things like DoIt to parallelise artefact downloads during generation. That’s probably a tangent to your main query here though.

gerhardol · May 24, 2022, 10:37pm

Yes, several builds so several CMake processes.
I guess a semaphore could be implemented too, to use CMake instead of flock/curl (but that will increase the complexity).

This is slightly outside the intended use of CMake, so I will mark the response as the solution.

craig.scott · May 24, 2022, 10:43pm

A change I’m aiming to have included in CMake 3.24 may help you here. If you wanted to do your own sort of locking, the new dependency providers feature would allow you to wrap each call to FetchContent_MakeAvailable(). You could do some kind of file locking, for example, by obtaining the lock, forwarding the call back to FetchContent_MakeAvailable() again, then releasing the lock. The implementation is smart enough to detect the recursive call and not put you in an infinite loop.
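
For readers, a minimal and untested sketch of that idea, assuming CMake 3.24 or later. The file name provider.cmake, the mycomp_ prefix and the MYCOMP_LOCK_DIR variable are hypothetical and not from this thread. The file would be passed through CMAKE_PROJECT_TOP_LEVEL_INCLUDES, which is where a dependency provider has to be set.

```cmake
# provider.cmake (hypothetical name), used as:
#   cmake -DCMAKE_PROJECT_TOP_LEVEL_INCLUDES=provider.cmake ...
include(FetchContent)

# Hypothetical location for lock files, shared by all concurrent CMake runs.
set(MYCOMP_LOCK_DIR "/shared/cmake-locks" CACHE PATH
    "Directory holding one lock file per dependency")

macro(mycomp_provide_dependency method dep_name)
  # One lock file per dependency, kept outside the dependency's SOURCE_DIR
  # so that a re-populate cannot delete it. Waits up to 600 seconds.
  file(LOCK "${MYCOMP_LOCK_DIR}/${dep_name}.lock"
    GUARD PROCESS
    TIMEOUT 600
  )

  # Forward to the built-in implementation. The recursive call for the
  # same dependency is detected and does not re-enter this provider.
  FetchContent_MakeAvailable(${dep_name})

  # Release explicitly instead of holding the lock until CMake exits.
  file(LOCK "${MYCOMP_LOCK_DIR}/${dep_name}.lock" RELEASE)
endmacro()

cmake_language(
  SET_DEPENDENCY_PROVIDER mycomp_provide_dependency
  SUPPORTED_METHODS FETCHCONTENT_MAKEAVAILABLE_SERIAL
)
```

The lock is advisory, so it only helps if every process that touches the shared directory goes through the same provider or otherwise locks the same file.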

It’s an interesting use case for that feature.

https://gitlab.kitware.com/cmake/cmake/-/merge_requests/7276

gerhardol · May 25, 2022, 8:32am

Thanks!
A quick read of the documentation does not make this completely clear to me yet. I will try to look at the change later (even if not in time for the PR).

gerhardol · October 12, 2022, 11:14am

A follow-up, if anyone is interested.
In summary: without some additional implementation, this use case gains no real benefit over using custom scripts.

I implemented it similarly to the example in the documentation, just adding CMake locking:
https://cmake.org/cmake/help/latest/command/cmake_language.html#id3

        # Lock <dir>/cmake.lock, wait up to 600 s, hold at most until this file ends.
        file(
            LOCK "${GOOGLETEST_SOURCE_DIR}" DIRECTORY
            GUARD FILE
            TIMEOUT 600
        )

(An explicit RELEASE is done too.)
I was not sure how to use file(LOCK), as there are not many examples, but GUARD FILE seems to work.
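
Since examples of file(LOCK) are scarce, here is a small sketch of the lock/release pattern with error handling. The path and the mycomp_ names are hypothetical. Only the documented options of file(LOCK) are used (DIRECTORY, RELEASE, GUARD, RESULT_VARIABLE, TIMEOUT).

```cmake
# Hypothetical path; any location visible to all concurrent CMake runs works.
set(MYCOMP_LOCK_FILE "/shared/deps/googletest.lock")

# Without DIRECTORY, the path itself is the lock file. With DIRECTORY,
# CMake locks <path>/cmake.lock instead. Missing files and intermediate
# directories are created.
file(LOCK "${MYCOMP_LOCK_FILE}"
  GUARD PROCESS                        # held until RELEASE or CMake exits
  RESULT_VARIABLE mycomp_lock_result   # 0 on success, else an error message
  TIMEOUT 600                          # seconds to wait for the lock
)
if(NOT mycomp_lock_result EQUAL 0)
  message(FATAL_ERROR
    "Could not lock ${MYCOMP_LOCK_FILE}: ${mycomp_lock_result}")
endif()

# ... populate or inspect the shared directory here ...

file(LOCK "${MYCOMP_LOCK_FILE}" RELEASE)
```

GUARD FUNCTION and GUARD FILE release the lock automatically at the end of the enclosing function or file. With GUARD PROCESS the explicit RELEASE matters, because otherwise the lock is held for the rest of the configure run.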

To avoid unnecessary locking, the lock file is specific to the directory being installed, and that path needs to be stored in a variable. (I would have preferred to have the “FetchContent SOURCE_DIR” available, but have not found out how.)
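
One possible way to get at SOURCE_DIR, offered as an untested sketch: as far as the cmake_language() documentation describes it, a provider called for FETCHCONTENT_MAKEAVAILABLE_SERIAL receives the FetchContent_Declare() details as trailing arguments, with SOURCE_DIR and BINARY_DIR added if they were not declared. The mycomp_ and MYCOMP_ names are hypothetical.

```cmake
macro(mycomp_provide_dependency method dep_name)
  # ARGN carries the FetchContent_Declare() details for this dependency.
  # Arguments that are empty or contain semicolons may not survive this
  # simple forwarding.
  cmake_parse_arguments(MYCOMP_DEP "" "SOURCE_DIR;BINARY_DIR" "" ${ARGN})
  message(STATUS "${dep_name}: SOURCE_DIR is ${MYCOMP_DEP_SOURCE_DIR}")

  # A per-directory lock file could then be derived from it, for example:
  set(MYCOMP_DEP_LOCK_FILE "${MYCOMP_DEP_SOURCE_DIR}.lock")

  # Nothing is populated here, so CMake falls back to the built-in
  # FetchContent implementation for this dependency.
endmacro()

cmake_language(
  SET_DEPENDENCY_PROVIDER mycomp_provide_dependency
  SUPPORTED_METHODS FETCHCONTENT_MAKEAVAILABLE_SERIAL
)
```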

One intention with the change was to be able to test with GoogleTest offline. That does not work, as the populate step always tries to connect, even when the repo already exists with the correct SHA. That might be fixed with a special check that sets the proper variables without calling the built-in implementation.

My initial intention with the change was to replace a separate download script for dependencies. That was not possible, as some setup is done prior to project() and I do not want to copy that into the product repo, at least right now.
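
A different technique from the provider-side check described above, shown as a sketch: FetchContent documents the FETCHCONTENT_SOURCE_DIR_<uppercaseName> override, which skips the download and update steps for one dependency and uses an existing directory as its source tree. MYCOMP_GOOGLETEST_DIR is a hypothetical variable, and the check below does not verify that the checkout matches the declared SHA.

```cmake
include(FetchContent)

# Hypothetical cache variable pointing at a pre-populated googletest tree.
set(MYCOMP_GOOGLETEST_DIR "" CACHE PATH
    "Existing googletest source tree to reuse instead of downloading")

# Must run before the FetchContent_MakeAvailable(googletest) call.
if(MYCOMP_GOOGLETEST_DIR AND EXISTS "${MYCOMP_GOOGLETEST_DIR}/CMakeLists.txt")
  # No download or update is attempted for googletest when this is set.
  set(FETCHCONTENT_SOURCE_DIR_GOOGLETEST "${MYCOMP_GOOGLETEST_DIR}")
endif()
```

FETCHCONTENT_FULLY_DISCONNECTED and FETCHCONTENT_UPDATES_DISCONNECTED are the documented switches that apply to all dependencies at once. Whether they avoid the re-download described in this thread is not something this sketch establishes.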

Summary of limitations:

Getting SOURCE_DIR (and more?)
How to avoid additional downloads
Handling “hooks” prior to project() (I expect that to be rejected, if ever raised).

craig.scott · October 16, 2022, 10:53am

It shouldn’t need to contact the remote after the first run successfully downloaded it in this scenario, but you’re probably experiencing this bug.

gerhardol · October 21, 2022, 2:50pm

Thanks again!

If someone else reading this is unsure of the status of the issue: FetchContent will delete the directory and re-download it (twice in my setup, for some reason). This also happens with a Git SHA, with CMake 3.24.2.
(We initially used a tag and PATCH_COMMAND too, but using a SHA for a committed change is not enough.)

–

Other observations:

Using a URL will not work either (that is more obvious, I just had to test).

The download is done twice in our setup. Something triggers configure to run again (“Re-running CMake…”). Some custom targets (like running git-describe) were running twice too. That is a different problem. (I tried to debug the Ninja setup.)

In some situations, not just the download directory was deleted but also its parent directory. I cannot reliably reproduce this. The global download dir should be gtest/sha, so it is not too bad an issue in my setup.

So I am keeping the separate download script for now; I may look into this myself as well.