Best practices for handling multiple calls to package-config.cmake across large projects

Hello,

I have some questions about what is considered best practice for writing a basic package config file when searching for dependencies, and I haven’t had much luck finding an answer on the internet. To explain my question, assume the following simple installed configuration file for a package foo that has a dependency on MPI of any version. This package foo is a low-level dependency that is consumed in several places throughout a much larger project consisting of several subprojects.

include(CMakeFindDependencyMacro)

find_dependency(MPI REQUIRED)

include("${CMAKE_CURRENT_LIST_DIR}/foo-targets.cmake")

My understanding, based on looking at a trace-expand, is that if a consuming project calls find_package(foo REQUIRED), the code in the installed config file will be executed every time, even if somewhere else in the project (maybe in a subproject) there was already a successful call to find_package(foo REQUIRED). Is that true? This behavior surprised me greatly, but I can see arguments for why it is necessary if a different version is found later.

I have seen some config files which guard against the additional calls to FindMPI, for instance, as follows:

if(NOT TARGET MPI::MPI_C)
  find_dependency(MPI REQUIRED)
endif()

So my questions are:

  • In the first case, will the MPI_C found by the last call to find_package(MPI REQUIRED) be used as the target for all packages linking to MPI::MPI_C? In the second case, is that saying “use whatever MPI has already been found. I will let whoever called it first set the version”?
  • Is one or the other considered “best practice”? Or is it just a preference of “last call wins” versus “first call wins”? What is the justification?
  • Should my config file also be checking whether the foo target is already defined on entry and simply returning, or something similar, to guard against multiple calls to find_package(foo)? (A sketch of what I mean follows this list.) What is the justification if not?
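
For concreteness, the kind of guard I mean in the installed config file would be something like this (the foo::foo target name is just my guess at how the targets are exported):

# Hypothetical guard at the top of foo-config.cmake; foo::foo is an assumed target name.
if(TARGET foo::foo)
  return()
endif()

include(CMakeFindDependencyMacro)
find_dependency(MPI REQUIRED)

include("${CMAKE_CURRENT_LIST_DIR}/foo-targets.cmake")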

Hopefully my questions are clear; let me know if I can clarify!

Simon

The package can’t really change the target within the scope, so it’s not like there’s anything to be done if the target would be different. So yes, it will use whatever was last found in the scope.

FindMPI is tricky, so skipping it if someone already called it seems sensible.

No. You might not need to redefine the targets, but variables might still be important. For example, if you supported components, variables could be different based on the component list.
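
As a contrived sketch (the "bar" component and the foo_bar_FOUND variable are purely illustrative), a config file like this sets different variables on every call even though the target is only created once:

# Result variables depend on the components the caller asked for.
foreach(comp IN LISTS foo_FIND_COMPONENTS)
  if(comp STREQUAL "bar")
    set(foo_bar_FOUND TRUE)
  else()
    set(foo_${comp}_FOUND FALSE)
  endif()
endforeach()

# The imported targets only need to be defined once per scope.
if(NOT TARGET foo::foo)
  include("${CMAKE_CURRENT_LIST_DIR}/foo-targets.cmake")
endif()

An unconditional early return at the top of such a file would skip the component handling for later callers that request different components.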

@ben.boeckel , thanks for the responses!

This topic is actually very interesting, and the question from @sbolding is well formulated and straight to the point. Thanks also @ben.boeckel for the helpful reply.

One thing I’m missing here is a “conclusion”: when are include guards well justified, or maybe even recommended? Do such cases exist?

As an example, our config scripts do not (themselves) depend on components or on any variables, and they do not (themselves) set any variables. Basically, they are just hard-coded lists of find_dependency() statements. But the detected packages may be anything: OpenCV, Qt, xtensor, zlib, Boost, and so on. When I add include_guard(), our typical build speeds up by 5-10% on Windows, which is quite significant for us.
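
To illustrate, our config files are essentially of this shape (the dependency list and the ourlib-targets.cmake name are just examples; I show include_guard(GLOBAL) here on the assumption that the guard has to hold across subprojects in different directories):

include_guard(GLOBAL)

include(CMakeFindDependencyMacro)
find_dependency(ZLIB)
find_dependency(Boost)
find_dependency(OpenCV)

include("${CMAKE_CURRENT_LIST_DIR}/ourlib-targets.cmake")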

But are there obstacles in this use case that I am overlooking? Would it be “better” to execute the scripts repeatedly? In which cases is that so, and what are the pitfalls?

Find packages should “no-op” pretty well: find_* results should be cached, and if(NOT TARGET) checks should avoid recreating targets. Include guards might be good for functions and macros (since they are global anyway) to avoid reparsing them. But beyond these “expensive” things, most other find_package code is (usually) trivial variable and setting logic. Finding out what is actually taking the time might lead to better insights for your project.
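
Roughly, the pattern that makes repeated evaluation cheap looks like this (a generic sketch with placeholder names, not any particular find module):

# find_* results go into the cache, so re-running these is cheap.
find_path(Foo_INCLUDE_DIR NAMES foo.h)
find_library(Foo_LIBRARY NAMES foo)

include(FindPackageHandleStandardArgs)
find_package_handle_standard_args(Foo REQUIRED_VARS Foo_LIBRARY Foo_INCLUDE_DIR)

# The imported target is only created once per scope.
if(Foo_FOUND AND NOT TARGET Foo::Foo)
  add_library(Foo::Foo UNKNOWN IMPORTED)
  set_target_properties(Foo::Foo PROPERTIES
    IMPORTED_LOCATION "${Foo_LIBRARY}"
    INTERFACE_INCLUDE_DIRECTORIES "${Foo_INCLUDE_DIR}")
endif()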

I can say that in one of my core libraries, we have find_package() calls for a number of libraries that execute repeatedly and take significant time on Microsoft Windows. Among these libraries are Intel MKL, libgeotiff, PROJ, and a few others. There may be more that take significant time, but these are the ones that also print a message, so they are easier to identify.

So what you’re saying is that I should rather fix the upstream libraries, not mine? But is the repeated evaluation even sensible and good? It seems to use a lot of runtime on Windows, although I do not know exactly where.

It depends. I prefer to find dependencies as close to where they are used as possible. This makes it easy to make sure that any updates also consider the find_package update. It might mean that packages get found multiple times, but I prefer correctness over performance myself. I think actually profiling where things are slow and fixing them is more likely to be useful here. It could be that package configurations aren’t reusing the cache effectively, but it needs investigation into specific packages to really know.

You could also try adding --profiling-output=<filename> and --profiling-format=google-trace to your cmake command and viewing the results in Chrome (via the about:tracing URL) or another browser that supports that trace format. Not sure if the results will be helpful, but it’s worth a look.
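
For example, a typical invocation might look like this (the build directory and output file name are just placeholders):

cmake -S . -B build --profiling-format=google-trace --profiling-output=cmake-profile.json

Then load cmake-profile.json in the trace viewer to see where configure time is spent.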