Dynamic generation of source files at configure/build time

Hi all,

We have several hundred files that must pass through several stages of preprocessing, and the output of all that is eventually a source list which can then be used to build a target.

The preprocessing can be done at configure time - which takes ages, because configure runs on a single thread - but it gives us a consistent list of files to build, and then the build can run using make -j32 or whatever and compilation proceeds reasonably bearably.

I can turn the preprocessing into a set of custom rules/targets so that the files are generated in parallel using the same make -j strategy, but then the target that is actually built has no source files - just dependencies on the preprocessing rules that generate the source list. I can then use the source list as the input to a second CMake step which actually builds the target.
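For concreteness, the two steps look roughly like this (the tool and path names are placeholders for what we actually run):

# Step 1: preprocessing as a custom target, parallelized with make -j
add_custom_target(preprocess
  COMMAND preprocessTool --input-dir ${CMAKE_CURRENT_SOURCE_DIR}/src
                         --output-dir ${CMAKE_BINARY_DIR}/gen)

# Step 2, in a second CMake configure: pick up whatever step 1 produced.
# file(GLOB) only sees files that already exist on disk, which is why
# the preprocess build has to finish before this configure runs.
file(GLOB generated_sources ${CMAKE_BINARY_DIR}/gen/*.c)
add_library(myTarget ${generated_sources})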

A single-step configure->generate followed by a build is desired, but I’d like to parallelize the preprocessing step. Is there a clean way of doing this in CMake? I can imagine a make preprocess step followed by a make target step to do the two-step generation and compilation, but I’d like to know if there is a proper way to do it (dynamically pick up the source list from make preprocess and use it in make target in a single invocation of CMake …).

Many thanks
JB


add_custom_command sounds like what you want. See the docs on generating files.

This allows you to parallelize code generation at build time.

add_custom_command(
  OUTPUT out.c
  COMMAND someTool -o out.c
  VERBATIM)

add_library(myLib out.c)
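If the generated file also depends on some input, list that input with DEPENDS so the command reruns when it changes (in.txt below is just a stand-in for whatever your tool reads):

add_custom_command(
  OUTPUT out.c
  COMMAND someTool -o out.c ${CMAKE_CURRENT_SOURCE_DIR}/in.txt
  DEPENDS ${CMAKE_CURRENT_SOURCE_DIR}/in.txt
  VERBATIM)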

The main thing is that you need to be able to specify the generated source files up front. If you have hundreds of files this can be tricky.

Perhaps you could do the following:

# Get a list of all the source files that will be preprocessed
execute_process(
    COMMAND someTool --output-files
    OUTPUT_VARIABLE preprocessed_file_list
    OUTPUT_STRIP_TRAILING_WHITESPACE)

# e.g. prints "out.c;foobar.c;etc.c"
message(STATUS "${preprocessed_file_list}")

foreach(file IN LISTS preprocessed_file_list)
  add_custom_command(
    OUTPUT ${file}
    COMMAND someTool -o ${file}
    VERBATIM)
endforeach()

add_library(myLib ${preprocessed_file_list})

Now when you build myLib via make -j, the source file generation will be parallelized. The only thing run at configure time is the single execute_process needed to determine the list of generated files.
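Build-wise, that means a single configure followed by one parallel build:

cmake -S . -B build         # runs the single execute_process at configure time
cmake --build build -j 32   # generates the sources and compiles them in parallel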

It seems you misread my original message (actually, I might not have been clear enough, sorry). I have already implemented the add_custom_command method, but I have to use a two-step process because I do not know what source files will be generated by the custom commands. Some could be guessed/known, but many others not.

I’d like to parallelize the generation, but the source file list is only known after generation completes.

(What I’m hoping is that there is some way to use execute_process in the same way as add_custom_command - to get the parallelism - or, alternatively, some way of having a dynamic source list that can be used with add_custom_command but still correctly generate the target.)

> What I’m hoping is that there is some way to use execute_process in the same way as add_custom_command - to get the parallelism

There is no easy/idiomatic way to do this currently. I have seen people create custom solutions to get parallelism in the configure step but it’s a bit complicated. I wouldn’t recommend it for an open source project in particular, since package managers won’t be expecting it.
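For the record, those custom solutions usually take the shape of a nested build driven from the configure step, roughly like this (the preprocess/ subproject and its output directory are illustrative):

# Configure and build a small standalone preprocessing project, in
# parallel, while the outer project is still configuring
execute_process(
  COMMAND ${CMAKE_COMMAND} -S ${CMAKE_CURRENT_SOURCE_DIR}/preprocess
                           -B ${CMAKE_BINARY_DIR}/preprocess)
execute_process(
  COMMAND ${CMAKE_COMMAND} --build ${CMAKE_BINARY_DIR}/preprocess --parallel 32)

# The generated files now exist, so the outer configure can glob them
file(GLOB generated_sources ${CMAKE_BINARY_DIR}/preprocess/gen/*.c)
add_library(myLib ${generated_sources})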

> or, alternatively, some way of having a dynamic source list that can be used with add_custom_command but still correctly generate the target

Currently no such functionality exists. FWIW, this concern has been brought up a fair bit, and it’s an active area of discussion that I’m looking into.