Proposal: Dependency pre-scanner for generated includes

Hello. This is my first post.

Abstract

Specify the generated files a target includes, and the action that produces them, instead of using add_dependencies().

How we do it now

Assume that only lib2.c depends on the generated header.

# CMakeLists.txt
cmake_minimum_required(VERSION 3.13)

add_subdirectory(inc)

add_library(lib1 lib1.c)

add_library(lib2 lib2.c) # depends on generated header
target_include_directories(lib2 PRIVATE ${CMAKE_CURRENT_BINARY_DIR}/inc)
add_dependencies(lib2 headergen)

add_library(lib12 lib12.c)
target_link_libraries(lib12 INTERFACE lib1 lib2)

add_executable(main main.c)
target_link_libraries(main PRIVATE lib12)

  • add_dependencies(lib2 headergen) is essential.
  • As a side effect, lib12 and main also get an order-only dependency on headergen. This is not intended.

# inc/CMakeLists.txt
set(template_file ${CMAKE_CURRENT_SOURCE_DIR}/template.h)
set(generated_file ${CMAKE_CURRENT_BINARY_DIR}/generated.h)

add_custom_command(
  OUTPUT ${generated_file}
  COMMAND ${CMAKE_COMMAND} -E copy_if_different
  ${template_file}
  ${generated_file}
  DEPENDS ${template_file}
  )

add_custom_target(headergen DEPENDS ${generated_file})

Proposal

Add the ability to specify headergen as a “dependency on a file generator”.
It would behave more like include_directories than like an order-only dependency. It may be transitive.

For example

If a new command were introduced:

add_library(lib2 lib2.c)
target_include_directories(lib2 PRIVATE ${CMAKE_CURRENT_BINARY_DIR}/inc)
target_include_generated_files(lib2 PRIVATE headergen) # May be custom target or generated file

If target_include_directories() could distinguish directories from targets:

add_library(lib2 lib2.c)
target_include_directories(lib2 PRIVATE 
  ${CMAKE_CURRENT_BINARY_DIR}/inc
  headergen # May be custom target or generated file
  )

Implementation

  • If a generator is capable of dyndep, headergen shall be converted to “a dyndep scanner to discover dependencies”. headergen will be invoked when the scanner discovers a dependency on generated.h.
    • The generator is responsible for extracting the file list to pass to the scanner as the “list of missing files to be generated”.
  • For other generators, headergen may be treated as a local order-only dependency, or as a global order-only dependency (like add_dependencies).
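
To make the dyndep case concrete, here is a minimal sketch in Python of the file such a scanner could hand to Ninja. The layout follows Ninja's dyndep file format: a version line, then one "build <output>: dyndep | <implicit inputs>" statement per affected edge. The helper names ninja_escape and format_dyndep, the example paths, and the assumption that the scanner already knows which object needs which generated header are made up for illustration. They are not part of CMake or of the proposal.

```python
def ninja_escape(path):
    """Escape the characters Ninja treats specially in build-line paths."""
    return path.replace("$", "$$").replace(" ", "$ ").replace(":", "$:")


def format_dyndep(object_inputs):
    """Render a Ninja dyndep file.

    object_inputs maps each object file (hypothetical names) to the
    generated headers the scanner found it to depend on.
    """
    lines = ["ninja_dyndep_version = 1"]
    for obj, headers in sorted(object_inputs.items()):
        line = f"build {ninja_escape(obj)}: dyndep"
        if headers:
            # Generated headers become implicit inputs of the compile edge.
            line += " | " + " ".join(ninja_escape(h) for h in sorted(headers))
        lines.append(line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # lib2.c.o needs the generated header, lib1.c.o does not.
    print(format_dyndep({"lib2.c.o": ["inc/generated.h"], "lib1.c.o": []}), end="")
```

In build.ninja, each compile edge would name this file in its dyndep binding and also list it as an input (an order-only input is enough), so that Ninja builds and loads it before scheduling the edge.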

Thoughts on the granularity of scanning

The dyndep facility that is already implemented seems to emit one scanner per file. I think that is too fine-grained. One scanner per target would suffice.

Requirements for the scanner

  • The scanner should traverse the include path with the missing file list.
  • The scanner should not miss “actually referenced missing files”.
    • Redundant files may be acceptable, i.e. files that the scanner reports although they are not actually referenced. A gross scanner (like cmake_depends) is acceptable. Of course, this is suboptimal.
    • If the scanner is not a gross scanner, it may stop when it meets unknown missing files.
  • The scanner may omit reporting static files. Only missing files need to be reported. DEPFILE (in compilation) shall cover the static ones.
    • That said, the scanner should still traverse all include files. It may just omit emitting them.
  • The scanner should emit a DEPFILE for the scanner itself. It should contain all scanned files, even if they were not emitted.
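
As a rough illustration of these requirements, a “gross” per-target scanner could look like the Python sketch below. It is only a sketch under stated assumptions: every #include line counts as taken (conditionals are ignored, so it may over-report), computed includes such as "#include MACRO" are not handled, and the names scan_target and write_depfile are hypothetical, not existing CMake or Clang interfaces.

```python
import re
from pathlib import Path

# Gross matching: any '#include "x"' or '#include <x>' line counts,
# regardless of surrounding #if/#ifdef blocks.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]', re.MULTILINE)


def scan_target(sources, include_dirs, missing):
    """Scan all sources of one target.

    missing is the list of files that still have to be generated.
    Returns (needed, scanned): the missing files that are referenced,
    and every existing file that was read.
    """
    missing = {Path(m).resolve() for m in missing}
    needed, scanned = set(), set()
    stack = [Path(s).resolve() for s in sources]
    while stack:
        current = stack.pop()
        if current in scanned:
            continue
        scanned.add(current)
        text = current.read_text(errors="replace")
        for name in INCLUDE_RE.findall(text):
            # Look next to the including file first, then on the include path.
            for base in [current.parent, *map(Path, include_dirs)]:
                candidate = (base / name).resolve()
                if candidate in missing:
                    needed.add(candidate)  # report, but cannot scan it yet
                    break
                if candidate.is_file():
                    stack.append(candidate)  # static file: traverse, do not report
                    break
    return sorted(needed), sorted(scanned)


def write_depfile(path, output, scanned):
    """Write a Makefile-style depfile so the scan reruns when any scanned file changes."""
    deps = " ".join(str(p).replace(" ", "\\ ") for p in scanned)
    Path(path).write_text(f"{output}: {deps}\n")
```

Static headers are traversed but not reported, missing files are reported but not traversed, and the depfile lists everything that was read, which matches the last bullet above.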

The Makefiles generator’s “cmake_depends” could be repurposed for this, but I guess it would be hard to use from the Ninja generator.
For now, an external scanner may be specified.

Future of the scanner

The scanner may also become capable of scanning pre-compiled headers, like Clang’s “modules cache”.
In that case, an external scanner, for example clang-scan-deps, shall be specified.

Background and history

I have been experimenting with Ninja dyndep for weeks. I achieved a highly parallelized build.ninja for the Clang/LLVM tree.
See “Clang can be built within one minute”, https://twitter.com/chapuni/status/1401519362058555393

For that, I implemented in CMake a feature to “automatically discover add_dependencies() on generated header files and convert them to dyndep”.

It works fine with the LLVM tree, but I don’t think it is suitable for general use.

  • I don’t think all projects would be ready for discover_dyndep. I guess some of them would be immature.
    • Introducing a new policy might be an option.
  • It was hard to dissolve the cmake_object_order_depends chain.
  • Dissolving add_dependencies() would not help cut unintended dependencies.

I concluded that I should introduce a simpler scheme (see the proposal above).

Thank you.

Starting with 3.20, the Makefiles generators, like Ninja, use the compiler itself (for C, C++, CUDA, etc.) to compute dependencies. So, as soon as the header file is included in some source file, the dependencies will be computed correctly without any special treatment.

Considering the previous remark, is there any added value in creating a target_include_generated_files() command rather than using add_dependencies()?

Excuse my poor example. Could you assume that “lib2.c may or may not include generated.h”?
If lib2.c doesn’t include generated.h, the action headergen isn’t executed.

In my example, each target has one source file.
But in practice, each target has a bunch of sources, and a project has hundreds of targets and thousands of sources.

In such a case, a prescanner can discover the files in lib2 that do not depend on the generated header, and Ninja dyndep can schedule them ahead of generated_h.
Otherwise, generated_h remains a bottleneck.

I am proposing a way to give dyndep that opportunity.
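
The bottleneck argument can be put into a toy calculation. The Python sketch below assumes unlimited parallel jobs and ignores the cost of scanning. The function names and all timings are invented purely for illustration and are not measurements.

```python
def makespan_order_only(gen_time, compile_times):
    """Every object of the target waits for the generator (add_dependencies style)."""
    return gen_time + max(compile_times.values())


def makespan_dyndep(gen_time, compile_times, needs_generated):
    """Only objects that include the generated header wait for the generator."""
    waiting = [t for s, t in compile_times.items() if s in needs_generated]
    free = [t for s, t in compile_times.items() if s not in needs_generated]
    return max(max(free, default=0), gen_time + max(waiting, default=0))


# Hypothetical timings in seconds: the generator takes 10,
# a.c is slow but independent, b.c is fast but includes generated.h.
times = {"a.c": 30, "b.c": 5}
print(makespan_order_only(10, times))       # 10 + 30 = 40
print(makespan_dyndep(10, times, {"b.c"}))  # max(30, 10 + 5) = 30
```

In this made-up case the slow, independent source no longer waits for the generator, so the generated header stops being on the critical path.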

FYI, I experimented with a “header-only interface library”.

# inc/CMakeLists.txt
add_library(generated_h INTERFACE ${generated_file})
target_include_directories(generated_h INTERFACE ${CMAKE_CURRENT_BINARY_DIR})
# CMakeLists.txt
add_library(lib2 lib2.c) # depends on generated header
target_link_libraries(lib2 PRIVATE generated_h)

This prunes lib12’s dependency on generated_h, since lib12 uses target_link_libraries(INTERFACE).
But it could not prune main’s dependency.

It’s my intention that neither lib12 nor main depends on generated_h.

It sounds like this issue is of interest here. Also note that it is only really possible with the Ninja generator(s).

Ok, I understood your point.

I am afraid this goes against the direction things are heading… Until 3.20, the Makefile generators used a custom scanner to discover dependencies for C/C++ files, but this approach has many drawbacks:

  • This scanner is not able to handle all the complexity of the languages and preprocessor directives. A lot of bugs were logged because of this approach.
  • Each new language requires developing a specific scanner, which is nearly impossible at a reasonable cost.

So, the future is to rely on compilers themselves to generate implicit dependencies.

I guess my proposal would resolve it, but it is not the only issue I want to resolve.

I chose to introduce semantics separate from add_dependencies, since I knew about these issues.
(This is because I abandoned the approach of dissolving the order-only deps chain and extracting dyndep from it.)

  • Introduce semantics for local order-only deps, separate from add_dependencies.
    • It may be transitive. It should be available as PRIVATE at least.
  • Dyndep-capable generator(s), like Ninja, can substitute a dyndep scanner for it.

Do you think other generators could handle local order-only deps?

I was also wary of repurposing cmake_depends. I know CMake is moving away from relying on it.

But I also thought that the existing codebase might be worth repurposing.

In my proposal,

  • If a prescanner is specified, the facility is activated. I am thinking of something like CMAKE_EXPERIMENTAL_LANG_PRESCANNER.
    • Clang will provide a compatible scanner, clang-scan-deps, for it in the future.
  • cmake_depends may be enhanced to serve as the default scanner, once “awareness of missing files” is implemented.

The prescanner may be a “gross scanner”. I think cmake_depends would be acceptable as one.

FYI, I found a similar issue.

My proposal would help him, but it seems to be too late.