[discussion] `COMPILE_OPTIONS` de-duplication "direction" (keep first vs keep last)

The docs for prop_tgt/COMPILE_OPTIONS mention de-duplication, but they don’t specify which duplicate is retained (the first or the last occurrence):

The final set of options used for a target is constructed by accumulating options from the current target and the usage requirements of its dependencies. The set of options is de-duplicated to avoid repetition.

  • It seems that the first occurrence is kept.
  • I think this is worth stating explicitly in the docs, for the reason in the next point.
  • I want to discuss one way that this could result in potentially unexpected/undesirable behaviour.

Note: What I will describe has never been a problem for me, or even relevant to me so far. If it has never been brought up to the CMake team before, it probably means it’s not a problem for anybody, and we can all spend our time on more pressing things. Since this has never been a problem for me, the example I give later is a toy example. It should not be taken as something that someone might actually do.

Current CMake Behaviour

I tried the CMake de-duplication out, and it seems the first occurrence is the one that is kept. I made a target, gave it the compile options -funroll-loops -fno-unroll-loops -funroll-loops, configured, and inspected the compile_commands.json file, which had -funroll-loops -fno-unroll-loops. The relevant source code seems to be in Source/cmGeneratorTarget.cxx, but I didn’t spend the time to read and understand it.
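For anyone who wants to repeat the experiment, a minimal sketch of such a project could look like the following. The project and target name (dedup_demo) and the generated main.cpp are placeholders made up for illustration, and the sketch assumes a gcc-like compiler and a Makefile or Ninja generator, since those are the ones that write compile_commands.json.

```cmake
cmake_minimum_required(VERSION 3.16)
project(dedup_demo LANGUAGES CXX)

# Ask CMake to write compile_commands.json into the build directory
# (only honoured by the Makefile and Ninja generators).
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)

# Generate a trivial source file so the example is self-contained.
file(WRITE "${CMAKE_CURRENT_BINARY_DIR}/main.cpp" "int main() { return 0; }\n")

add_executable(dedup_demo "${CMAKE_CURRENT_BINARY_DIR}/main.cpp")

# The same option appears twice, with its negative form in between.
target_compile_options(dedup_demo PRIVATE
  -funroll-loops
  -fno-unroll-loops
  -funroll-loops
)
```

After configuring (for example with cmake -S . -B build), the entry for main.cpp in build/compile_commands.json shows which copies of the option survived.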

“Problem” Background Info

In gcc (and other compilers that try to stay compatible with it, such as clang and, I think, icc, which I’ve never used), many compiler flags have a counterpart flag that does the “opposite thing” (docs):

Many options have long names starting with ‘-f’ or with ‘-W’—for example, -fmove-loop-invariants, -Wformat and so on. Most of these have both positive and negative forms; the negative form of -ffoo is -fno-foo. This manual documents only one of these two forms, whichever one is not the default.

and when multiple positive and negative forms are given in the same list of compiler options, the last one controlling that behaviour is the one that is used:

For example, from the warning options docs:

Some options, such as -Wall and -Wextra, turn on other options, such as -Wunused, which may turn on further options, such as -Wunused-value. The combined effect of positive and negative forms is that more specific options have priority over less specific ones, independently of their position in the command-line. For options of the same specificity, the last one takes effect. Options enabled or disabled via pragmas (see Diagnostic Pragmas) take effect as if they appeared at the end of the command-line.

The gcc documentation is a little lacking here: as shown above, the warning options docs explicitly state how conflicts are resolved, but the optimization options docs and others do not. You can, however, compile with -frecord-gcc-switches and inspect the output to see which “final” list of options was used, which will show that the last one is used.

“Problem” Description

The pre-deduplication list of compile options seen by CMake orders compile options from least to most specific context. From the docs for CMAKE_<LANG>_FLAGS:

  • CMAKE_CXX_FLAGS: Initialized by the CXXFLAGS environment variable.

The flags in this variable will be passed to the compiler before those in the per-configuration CMAKE_<LANG>_FLAGS_<CONFIG> variant, and before flags added by the add_compile_options() or target_compile_options() commands.

To me, it seems reasonable to believe that, in general, the most specific context “knows best” about which compile options to use, or at least that this is much more reasonable than believing the opposite.

Toy example: imagine a user about to compile a project has CXXFLAGS with -funroll-loops, and the project sets CMAKE_CXX_FLAGS_DEBUG with -fno-unroll-loops, and then for a specific target, there’s a target_compile_options with $<$<CONFIG:Debug>:-funroll-loops>.

It might be surprising that, for the Debug config in this example, the target which requested -funroll-loops now ends up with -fno-unroll-loops.

(Reminder: this is a toy example, and loop unrolling is probably not the best option to use. Perhaps imagine other scenarios with options such as -fzero-call-used-regs, -fconserve-stack, or -fno-inline.) Important note: I don’t know if such setups are even legal/safe, i.e. whether they could result in ODR violations or other badness for inline functions defined in headers that are included in multiple targets with different compiler options.
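Written out as a CMakeLists.txt, the toy example might look roughly like this. It is only a sketch: the project and target name (toy) and the generated main.cpp are invented for illustration. I have also not confirmed that the de-duplication spans flags coming from these three different sources, so treat the outcome as something to check in the generated compile command rather than as a given.

```cmake
cmake_minimum_required(VERSION 3.16)
project(toy LANGUAGES CXX)

# Least specific context: the user's environment. Configuring with
#   CXXFLAGS=-funroll-loops cmake -S . -B build -DCMAKE_BUILD_TYPE=Debug
# initializes CMAKE_CXX_FLAGS on the first configure.

# More specific context: the project appends to the Debug flags.
string(APPEND CMAKE_CXX_FLAGS_DEBUG " -fno-unroll-loops")

# Generate a trivial source file so the example is self-contained.
file(WRITE "${CMAKE_CURRENT_BINARY_DIR}/main.cpp" "int main() { return 0; }\n")
add_executable(toy "${CMAKE_CURRENT_BINARY_DIR}/main.cpp")

# Most specific context: this one target asks for unrolling again,
# but only in the Debug configuration.
target_compile_options(toy PRIVATE $<$<CONFIG:Debug>:-funroll-loops>)
```

Building with a verbose generator (or exporting compile_commands.json) shows the order in which the three flags actually reach the compiler.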

Questions for the CMake Team

  • What was the rationale for retaining the first occurrence of a compile option?
    • Was what I’ve talked about taken into consideration?
    • Are there any conflicts? I.e. are there ways that retaining the first occurrence of an option could be desirable?
  • Would you consider stating that the first occurrence is kept wherever the documentation describes target compile option de-duplication?

Possible Questions for General Discussion

  • Can you think of something better than my toy example? I.e. a setup with a good rationale whose outcome would surprise someone who doesn’t know about CMake’s compile option de-duplication behaviour.

  • (copied from above): Are there any conflicts? I.e. are there ways that retaining the first occurrence of an option could be desirable?

    • On gcc and similar compilers?
    • On compilers not like gcc (e.g. MSVC)?
  • If this is seen as something that should be addressed, how can it best be addressed? Possible options for discussion:

    • Make a policy / CMAKE_ option variable to control the behaviour, and give it a sane default if one exists. Options: keep first, keep last, disable de-duplication.
    • Add special logic in CMake to handle de-duplication of these binary/toggle compile option flags.
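To make the “keep first” versus “keep last” distinction concrete, here is a small script-mode sketch that applies both policies to a plain list variable using CMake’s own list() command. It is only a model of the two policies, not CMake’s internal COMPILE_OPTIONS handling, and the variable names and file name are made up.

```cmake
# Run with: cmake -P dedup_direction.cmake   (the file name is arbitrary)
cmake_minimum_required(VERSION 3.16)

set(opts -funroll-loops -fno-unroll-loops -funroll-loops)

# Keep first: list(REMOVE_DUPLICATES) preserves the first instance
# of each repeated item and keeps the relative order.
set(keep_first ${opts})
list(REMOVE_DUPLICATES keep_first)

# Keep last: reverse, drop duplicates, then reverse back.
set(keep_last ${opts})
list(REVERSE keep_last)
list(REMOVE_DUPLICATES keep_last)
list(REVERSE keep_last)

message(STATUS "keep first: ${keep_first}")
message(STATUS "keep last:  ${keep_last}")
```

With this input, the keep-first list ends in -fno-unroll-loops and the keep-last list ends in -funroll-loops, so a “last one wins” compiler would resolve the two to opposite settings.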

Note: I looked at the table of contents for Craig’s “Professional CMake” book and saw that there is a chapter on option de-duplication, but have not yet purchased a copy of the book to read, so there might be relevant info / answers to my questions there.

I don’t know the history of the de-duplication functionality, but I can provide an example where keeping the first occurrence is important: header search paths. If you retained the last occurrence for header search paths, it would change the search behavior, potentially finding a different header than if no de-duplication occurred. Repeated include paths are pretty common, so I wouldn’t be surprised if this was the first area the de-duplication was used.

For other compiler and linker options, relying on the last one taking precedence can be ill-advised. I think some toolchains will issue a warning about later options overriding earlier options. I’d normally recommend you avoid setting up such a scenario by removing unwanted options from the command line. That’s also much clearer for anyone trying to debug the command line later for some reason.
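As a rough illustration of removing an unwanted option instead of overriding it, something along these lines can work. The target name (toy), the generated main.cpp, and the specific flag are hypothetical, and the sketch assumes the flag appears verbatim in the variable or property being edited.

```cmake
cmake_minimum_required(VERSION 3.16)
project(toy LANGUAGES CXX)

# Strip the unwanted flag from the per-configuration flags string.
string(REPLACE "-fno-unroll-loops" "" CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG}")

# Generate a trivial source file so the example is self-contained.
file(WRITE "${CMAKE_CURRENT_BINARY_DIR}/main.cpp" "int main() { return 0; }\n")
add_executable(toy "${CMAKE_CURRENT_BINARY_DIR}/main.cpp")
target_compile_options(toy PRIVATE -fno-unroll-loops -Wall)

# Remove the unwanted flag from the target's own COMPILE_OPTIONS property.
get_target_property(toy_opts toy COMPILE_OPTIONS)
if(toy_opts)  # false when the property is unset (value ends in -NOTFOUND)
  list(REMOVE_ITEM toy_opts "-fno-unroll-loops")
  set_property(TARGET toy PROPERTY COMPILE_OPTIONS "${toy_opts}")
endif()
```

This only touches options set directly on the target. Options inherited from the usage requirements of dependencies live on those other targets and would have to be handled there.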

The current documentation that talks about de-duplication does lack specific details about how that de-duplication is done. If someone was willing to trace through the code and confirm the behavior (noting whether there are any special cases), a merge request updating that documentation would be welcome.
