Highly duplicate data in cmake_install.cmake files

mugwort · June 7, 2022, 3:28pm

Hi, I’m fairly new to cmake and using it in the project at work.

It mostly seems good, but what’s bugging me is the amount of duplicate data that gets generated in the build folder. It seems that for every cpp file a cmake_install.cmake file is created. Each one contains the same project-wide parameters and includes the sub-directory cmake_install.cmake files, which ultimately only contain parameters already defined at the higher level.

Is there no way to configure this so these don’t get generated for every file unless the file actually has a unique parameter that’s not yet set?

For example, cmake_install.cmake typically looks like:

# Install script for directory: /some/application/directory

# Set the install prefix
if(NOT DEFINED CMAKE_INSTALL_PREFIX)
  set(CMAKE_INSTALL_PREFIX "/usr/local")
endif()
string(REGEX REPLACE "/$" "" CMAKE_INSTALL_PREFIX "${CMAKE_INSTALL_PREFIX}")

# Set the install configuration name.
if(NOT DEFINED CMAKE_INSTALL_CONFIG_NAME)
  if(BUILD_TYPE)
    string(REGEX REPLACE "^[^A-Za-z0-9_]+" ""
           CMAKE_INSTALL_CONFIG_NAME "${BUILD_TYPE}")
  else()
    set(CMAKE_INSTALL_CONFIG_NAME "Debug")
  endif()
  message(STATUS "Install configuration: \"${CMAKE_INSTALL_CONFIG_NAME}\"")
endif()

# Set the component getting installed.
if(NOT CMAKE_INSTALL_COMPONENT)
  if(COMPONENT)
    message(STATUS "Install component: \"${COMPONENT}\"")
    set(CMAKE_INSTALL_COMPONENT "${COMPONENT}")
  else()
    set(CMAKE_INSTALL_COMPONENT)
  endif()
endif()

# Install shared libraries without execute permission?
if(NOT DEFINED CMAKE_INSTALL_SO_NO_EXE)
  set(CMAKE_INSTALL_SO_NO_EXE "1")
endif()

# Is this installation the result of a crosscompile?
if(NOT DEFINED CMAKE_CROSSCOMPILING)
  set(CMAKE_CROSSCOMPILING "TRUE")
endif()

# Set default install directory permissions.
if(NOT DEFINED CMAKE_OBJDUMP)
  set(CMAKE_OBJDUMP "/usr/bin/objdump")
endif()

if(NOT CMAKE_INSTALL_LOCAL_ONLY)
  # Include the install script for the subdirectory.
  include("/some/sub/directory1/cmake_install.cmake")
endif()

if(NOT CMAKE_INSTALL_LOCAL_ONLY)
  # Include the install script for the subdirectory.
  include("some/sub/directory2/cmake_install.cmake")
endif()

...

The first 39 lines of this are the same for every file, apart from the first line saying which directory it’s for. The following lines include the scripts in subfolders, and it looks as though all the if(NOT CMAKE_INSTALL_LOCAL_ONLY) statements could be merged into one by listing the includes line by line, if they are truly necessary at all (in most cases, it seems they aren’t).
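For concreteness, the merged form suggested here might look like the sketch below. It is hand-written to illustrate the proposal and is not something CMake generates; the paths are the placeholders from the excerpt above.

```cmake
# Hypothetical merged form (NOT what CMake emits): one guard, several includes.
if(NOT CMAKE_INSTALL_LOCAL_ONLY)
  # Include the install scripts for the subdirectories.
  include("/some/sub/directory1/cmake_install.cmake")
  include("some/sub/directory2/cmake_install.cmake")
endif()
```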

It’s a similar case for the CTestTestfile.cmakes which are generated… most are there only to link other cmake files. Is there any reason CMake must pollute the build folder (and console output via duplicate messages) this way instead of only creating the files where necessary and then just recursively scanning the whole build directory for any generated cmake files on any level?

It also seems to create an extremely large number of empty directories in CMakeFiles folders, which I feel could be reduced by only generating them where needed.

ferdnyc · June 8, 2022, 6:16am

If that’s something that bugs you, you’re in for a rough ride with CMake, I’m afraid.

OK, now, hold on. If that’s the case, then it sounds like the project you’re working with has… an unusual structure. The norm for cmake_install.cmake files is one per project directory containing a CMakeLists.txt. I suppose if your project builds every .cpp file in a separate directory, what you’re describing could happen, but then it really feels like they brought this on themselves. …Oh, wait, or you’re describing tests. For more on that, read on.

In a word: No.

You have to realize, first and foremost, what CMake is: It’s a build system generator. So, it creates inputs to other systems, in the form of static files that can be loaded during the build/install process. (One of those systems just happens to be cmake, which runs its own installs, but that’s neither here nor there.)

The reason cmake_install.cmake files are generated with so much redundancy is that they’re completely self-contained. They’re just runnable cmake scripts. You can cd into any build directory and run cmake -P cmake_install.cmake, and the targets in that directory will be installed. It works that way because no other files have to be read, there’s no ‘parent state’ to worry about, heck the install script doesn’t even read anything from the local CMakeFiles directory; that’s purely a working directory for the build process.

(One really nice thing about that arrangement is, all of the install scripts are available the moment CMake finishes generating the build directory, even before you’ve actually built anything. So when you’re debugging your install and/or export configuration, you can iterate really quickly without having to run any builds. …Trust me, it comes up.)
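As a rough sketch of that workflow, assume a hypothetical project with a single subdirectory, src/, that installs one header. The project name, file names, and staging path are invented for illustration.

```cmake
# Top-level CMakeLists.txt (hypothetical project layout)
cmake_minimum_required(VERSION 3.15)
project(demo LANGUAGES NONE)

# Each directory that has its own CMakeLists.txt gets its own
# cmake_install.cmake in the matching build directory.
add_subdirectory(src)

# src/CMakeLists.txt would contain something like:
#   install(FILES demo.h DESTINATION include)
#
# After configuring (no build is needed for a plain file install):
#   cmake -S . -B build
#   cd build/src
#   cmake -DCMAKE_INSTALL_PREFIX=/tmp/stage -P cmake_install.cmake
# The -D definition has to come before -P so the script sees it.
```

Because the generated script only sets CMAKE_INSTALL_PREFIX when it is not already defined, overriding it on the command line redirects the install to a scratch location.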

Same deal. Tests are run by ctest, which is a scripted test system. Its inputs are static files, and while it does work recursively (it has a subdirs() command), information doesn’t cross levels; any of the tests in a subdirs() entry can be run independently of the parent directories and their configuration.
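For orientation, a generated CTestTestfile.cmake is mostly a list of add_test() and subdirs() calls. The fragment below is a hand-written approximation with placeholder names and paths, not output copied from a real build tree.

```cmake
# Approximate shape of a generated CTestTestfile.cmake (placeholders throughout)
add_test(some_test "/some/build/directory/some_test")

# Each listed subdirectory has its own CTestTestfile.cmake.
subdirs("directory1")
subdirs("directory2")
```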

Tests are also special, because every test is a self-contained executable. That tends to throw a lot of people who are used to monolithic test systems where you build a single test-runner that executes N tests and reports the results. With CMake + CTest those N unit tests will be N individual test targets, built into N separate executables, each with its own build directory and control files.

That was something that surprised me at first. It seemed crazy to generate 79 different executables just to run 79 tests. But there are two HUGE advantages to setting things up that way:

1. Your tests can be run in parallel, whether or not the code being tested supports any sort of parallelism. (Assuming, of course, that tests don’t try to access each other’s data files at the same time, or anything like that. But unless you actively sabotage yourself, tests can be run in parallel.)
2. Tests are isolated from each other. That’s huge.
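A minimal sketch of the one-executable-per-test arrangement is shown below. The project and test names are hypothetical, and each listed .cpp file is assumed to provide its own main().

```cmake
cmake_minimum_required(VERSION 3.15)
project(demo_tests LANGUAGES CXX)

enable_testing()

# Hypothetical test sources, each with its own main().
set(test_names test_config test_parser test_io)

foreach(name IN LISTS test_names)
  add_executable(${name} ${name}.cpp)
  # One CTest entry per executable, so each test runs in its own process.
  add_test(NAME ${name} COMMAND ${name})
endforeach()

# After building, run several tests at a time from the build directory:
#   ctest -j 4
```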

A project I work with builds a library that uses singletons for “global” configuration. Singletons are notoriously difficult to unit-test, because by definition they can’t be tested in isolation. Their whole deal is side effects. With our old system (monolithic testrunner executable), we had to be very careful not to call anything from any of the tests that would alter the “global” configuration, because if they did the other tests would be affected and it would alter their results! And the actual singleton code itself we’d just resigned ourselves to not being able to test at all.

Once we switched to Catch2 and CTest, for our testing framework, that restriction was finally lifted because, if each test is executed in a separate process, then they each have their own “global” configuration that’s totally isolated from any of the other tests, even when they’re being run simultaneously. Now, not only can tests call whatever they want, but we even have unit tests for the singleton itself.
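One way to wire Catch2 into CTest is sketched below, assuming Catch2 v3 is installed where find_package can locate it. The target and source file names are hypothetical, and the post does not say whether this particular setup is the one the project uses.

```cmake
cmake_minimum_required(VERSION 3.15)
project(demo_catch LANGUAGES CXX)

# Assumes Catch2 v3 is installed and discoverable.
find_package(Catch2 3 REQUIRED)

include(CTest)   # enables testing and defines BUILD_TESTING
include(Catch)   # provides catch_discover_tests()

add_executable(unit_tests test_singleton.cpp)  # hypothetical source file
target_link_libraries(unit_tests PRIVATE Catch2::Catch2WithMain)

# Registers every TEST_CASE as its own CTest test; ctest then launches
# the executable once per test case, so global state is not shared.
catch_discover_tests(unit_tests)
```

Note that this variant builds a single executable and relies on separate process launches for isolation, which differs slightly from building one executable per test.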

I’m torn as to how to respond to this.

Knee-jerk reaction

So what? Don’t look in there. If CMake wasn’t cross-platform they could have named those directories .CMakeFiles and you’d never even notice them. The working space of the build system is just that: working space. It’s not meant for human consumption. Do you go poking around in .git directories? (If so, the structure there wins no prizes. .git/objects/ can have up to 258 subdirectories. There’s always info and pack, and then the objects can sprawl out to as many as 256 more, numbered 00 to ff. Inside them you’ll find nothing but compressed files with 38-character hashes for names. But, hey, can’t fault the efficiency.)

Engaging with the claim at face value

That’s actually kind of unusual, in my experience. Usually you get a few empty directories, but not so many that it would be worth calling out specifically. (Unless even one is too many, in which case… just, no.) Again, I don’t know your project configuration. That may be caused by something unusual in how they’re using CMake. You say this is a project “at work”; many companies have their own structure for how projects are laid out and configured. (Often, there are corporate Reasons™ for why things are done that way, and I assume they’re unlikely to change any of that, which is why you’re posting here instead of on a company chat server.) Point is, the project configuration can be doing things with build targets and configurations that further exacerbate CMake’s existing tendency to create a lot of configuration/control files in the build space.

(Which is, again, its space. My overall advice still boils down to…)

¯\_(ツ)_/¯ Don’t look in there.

ben.boeckel · June 8, 2022, 1:49pm

These are generated files, and the code that writes them out has no idea what else is going on, so they end up being very repetitive. It’s a lot simpler to chain things with includes than to make everything conditional (“oh, this condition is already open in the generated file, so just reuse it”), because that would mean tracking the state of the generated file at every step instead of working from a few basic assumptions.

Why does this matter at all?

What duplicate messages?

Recursively scanning the build directory would mean that if files get left around (say you disable a directory), they still affect the install. This is bad behavior and would not be fun to debug all the time.
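To make the hazard concrete, here is a sketch of what a scan-based approach might look like as a standalone script. The script name and the BUILD_DIR variable are hypothetical; CMake does not work this way.

```cmake
# scan_install.cmake (hypothetical); run with:
#   cmake -DBUILD_DIR=/path/to/build -P scan_install.cmake
if(NOT DEFINED BUILD_DIR)
  message(FATAL_ERROR "Pass -DBUILD_DIR=<build tree>")
endif()

# Collect every cmake_install.cmake found anywhere under the build tree.
file(GLOB_RECURSE install_scripts "${BUILD_DIR}/cmake_install.cmake")

foreach(script IN LISTS install_scripts)
  # A script left behind by a directory that is no longer part of the
  # project would still show up here, because nothing records which
  # scripts are current.
  message(STATUS "Would run: ${script}")
endforeach()
```

With explicit include() chains, by contrast, a stale script is simply never referenced by its parent.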

Again, why does this matter at all?
